
DeepDanbooru: a prototype NN tagger

Posted under General

I've been playing with this for the past week and so far, the results are promising. Here are some performance numbers.

To start with, I ran all posts in the range id:3368754..3402356 through the model. This corresponds to all uploads in the month of January 2019 (date:2019-01-01..2019-02-01), which is outside the training set. This amounts to 33k posts with 1.28m total tags. 1.08m of these tags are predictable - the model was trained on them and can potentially recognize them (I spoke with the author and he shared some of the code, including the list of trained tags).
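In case anyone wants to poke at a similar evaluation set themselves, here's a rough sketch of pulling it through the public posts.json API. This isn't necessarily how the numbers above were produced (an evaluation like this would more likely run off a database dump), and the pagination details are only illustrative:

```python
import requests

# Sketch: iterate over the evaluation range via Danbooru's public posts.json
# endpoint. The id:.. metatag and tag_string field are part of the public API;
# the limit/page values here are just examples.
BASE = "https://danbooru.donmai.us/posts.json"

def fetch_eval_posts(id_range="id:3368754..3402356", limit=200):
    """Yield (post_id, set_of_tags) for every post in the given id range."""
    page = 1
    while True:
        resp = requests.get(BASE, params={"tags": id_range, "limit": limit, "page": page})
        resp.raise_for_status()
        posts = resp.json()
        if not posts:
            break
        for post in posts:
            yield post["id"], set(post["tag_string"].split())
        page += 1

# Example: tally post and tag counts like the ones quoted above.
# total_posts = total_tags = 0
# for post_id, tags in fetch_eval_posts():
#     total_posts += 1
#     total_tags += len(tags)
```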

This is a breakdown of the overall performance at different confidence levels.

So for example, at >=50% confidence, the model predicted 557k tags. 399k of these tags were correct, giving a precision of 72%. However, there were 1.08m tags in total it could have found, so the recall is only 37%.

(See [1] and [2] for background on precision and recall. In short, precision tells you how accurate it was (how many guesses it got right) and recall tells you how many tags it actually found.)

As you would expect, as you increase the confidence level, the precision goes up but the recall goes down (it gets more accurate but finds fewer tags). The F score, the harmonic mean of precision and recall, shows the sweet spot where the two are best balanced. By that measure, performance is best at >=50% confidence, and we could possibly do even better if the threshold were lowered even further.
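To make the arithmetic concrete, here's the >=50% row worked out from the rounded figures quoted above (so the output is approximate):

```python
# predicted = tags the model output at or above the confidence threshold,
# correct   = predictions that match a tag actually on the post (true positives),
# actual    = predictable tags actually present on the posts (1.08m).
def precision_recall_f1(correct, predicted, actual):
    precision = correct / predicted
    recall = correct / actual
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

p, r, f1 = precision_recall_f1(correct=399_000, predicted=557_000, actual=1_080_000)
print(f"precision={p:.0%} recall={r:.0%} F1={f1:.2f}")
# precision=72% recall=37% F1=0.49
```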

Here's a breakdown of how well it's able to recognize individual tags. This is at >=50% confidence.

  • "Actual posts" is how posts actually had the given tag.
  • "Predicted posts" is how many posts it predicted should have the tag.
  • "Correct predictions" is how many of those predictions it actually got right (true positives).
  • "Actual frequency" is how often the tag actually appeared on uploads, while "Predicted frequency" is how often it predicted the tag should appear. If things are working correctly, a tag's predicted frequency should be close to its actual frequency.

So for example, 1girl was actually tagged on 24.5k posts (73% of uploads), while we predicted it should be tagged on 25.1k posts (75% of uploads). 23.3k of those predictions were right, for a precision of 93% and a recall of 95%.
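For reference, here's roughly how the per-tag columns above can be tallied, assuming you have the actual and predicted tag sets for each post, with predictions already cut off at 50% confidence. The names here are only illustrative, not taken from the actual DeepDanbooru code:

```python
from collections import Counter

def per_tag_stats(posts):
    """posts: iterable of (actual_tags, predicted_tags) pairs, each a set of tag names."""
    actual, predicted, correct = Counter(), Counter(), Counter()
    n_posts = 0
    for actual_tags, predicted_tags in posts:
        n_posts += 1
        actual.update(actual_tags)
        predicted.update(predicted_tags)
        correct.update(actual_tags & predicted_tags)  # true positives
    stats = {}
    for tag in set(actual) | set(predicted):
        stats[tag] = {
            "actual_posts": actual[tag],
            "predicted_posts": predicted[tag],
            "correct_predictions": correct[tag],
            "actual_frequency": actual[tag] / n_posts,
            "predicted_frequency": predicted[tag] / n_posts,
            "precision": correct[tag] / predicted[tag] if predicted[tag] else 0.0,
            "recall": correct[tag] / actual[tag] if actual[tag] else 0.0,
        }
    return stats
```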

Here are the top 100 best and worst tags. Some observations:

  • It also does very well on 1girl and solo, which is surprising considering the number of difficult corner cases these tags can have.
  • The worst tags tend to be very common features that are frequently tagged on new uploads, but not on old uploads. This can happen if a tag is only used by certain power uploaders (but not by other users in general), or if it only recently came into widespread use. For example, eyebrows visible through hair is used on 36% of all uploads from this year, but on nearly 0% of uploads before 2016. Training tags like this is probably very difficult given the huge number of false negatives they will have.

It should be emphasized that all precision and recall values here are underestimates. Many predictions are currently counted as wrong due to missing tags, but a tag not being present on a post doesn't necessarily mean the post shouldn't have the tag. So as missing tags are added, these numbers will improve.

Overall, while there's still ample room for improvement, this is already good enough for many purposes, including suggesting tags during editing and finding missing tags.

That's interesting to hear. How difficult would it be to implement something like 'suggesting tags during editing/uploads' on live Danbooru?

I implemented something similar previously, but in practice I didn't find the tags it returned to be useful. They were incredibly obvious common tags like 1girl or monochrome. It could, for example, identify popular FGO characters, but for frequent uploaders those are some of the least important things that need to be identified. It's kind of a catch-22: an autotagger would be most useful for lesser-known tags, but an autotagger will never learn them because there aren't enough examples of them.

The way users apply tags on Danbooru makes it poorly suited for ML training applications like this: tags are used even if they only describe a small percentage of the image, so the data set is very noisy. There's also a heavy bias towards new shows, games, and characters, which will have small training sets and will probably not get identified unless they get so popular that everyone knows about them.

It's a neat idea and I'm glad projects like this exist, but it just doesn't match the needs of most uploaders.

I am hoping better neural nets come out that can deal with things like rotation and flipping better.

albert said:

It's a neat idea and I'm glad projects like this exist, but it just doesn't match the needs of most uploaders.

I disagree. This is usually able to correctly identify 10-15 tags per upload. That's not bad at all. Even when you know everything it gives you, having half your tags handed to you is very convenient. It saves you the trouble of manually typing every tag out, then running everything through related tags to make sure you didn't forget anything.

It's not the case that it only finds very common tags either. It's often able to find surprisingly specific tags.

Tag suggestions aren't the only use case for this. It's extremely useful for tag gardening. It makes finding missing tags a lot easier.
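The missing-tag check itself is just a set difference: anything the model predicts with high confidence that isn't already on the post is a candidate to review. A minimal sketch, assuming the predictions are available as a tag → confidence mapping from the model:

```python
# Hypothetical "missing tag" helper: any tag predicted above the threshold
# that isn't already on the post is a candidate for gardening.
def missing_tag_candidates(current_tags, predictions, threshold=0.5):
    return {tag: conf
            for tag, conf in predictions.items()
            if conf >= threshold and tag not in current_tags}

# Example:
# missing_tag_candidates({"1girl", "solo"}, {"1girl": 0.98, "smile": 0.81})
# -> {"smile": 0.81}
```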

albert said:

The way users apply tags on Danbooru makes it not suitable for ML training applications like this because tags will be used even if they only describe a small percentage of the image, meaning the data set is very noisy.

I'm not sure how true this is. Some tags are noisier than others, but the difficult tags aren't necessarily the ones you would expect. This is pretty good at tagging eye colors, for example, even though eyes are a very small feature and color tagging is noisy. On the other hand, it's bad at white background, even though this is a big, easy to recognize feature with fairly consistent tagging.

albert said:

I am hoping better neural nets come out that can deal with things like rotation and flipping better.

I'd think rotation and flipping shouldn't be an issue if you rotate and flip images during training, which is normal practice anyway to augment the training set.
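Something like the following is the usual way to do it. This is a generic torchvision sketch, not DeepDanbooru's actual input pipeline, and the size and angle values are just examples:

```python
import torchvision.transforms as T

# Random flips and small rotations applied at load time, so the network sees
# many orientations of the same image over the course of training.
train_transform = T.Compose([
    T.RandomHorizontalFlip(p=0.5),   # mirror half of the images
    T.RandomRotation(degrees=15),    # rotate by up to +/- 15 degrees
    T.Resize((512, 512)),            # example input size
    T.ToTensor(),
])
```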

I'd be very interested to know more about the approach you used when you tried this, to get a better idea of how it compares with the OP.

evazion said:

I personally feel like it's slower that way. It's the same as when you tag garden: you not only have to look for which tags are missing, you also have to scan for incorrect tags. I can tag an image from scratch faster than it takes to correct a list of tags given to me.

As an occasional uploader even basic tagging assistance can be useful. E.g. I might miss trivial things or recognize the franchise but not the character name.
