
DeepDanbooru: a prototype NN tagger

Posted under General

I've been playing with this for the past week and so far, the results are promising. Here are some performance numbers.

To start with, I ran all posts in the range id:3368754..3402356 through the model. This corresponds to all uploads in the month of January 2019 (date:2019-01-01..2019-02-01), which is outside the training set. This amounts to 33k posts with 1.28m total tags. 1.08m of these tags are predictable - the model was trained on them and can potentially recognize them (I spoke with the author and he shared some of the code, including the list of trained tags).
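In case anyone wants to poke at a similar evaluation set themselves, here's a rough sketch of pulling it through the public posts.json API. This isn't necessarily how the numbers above were produced (an evaluation like this would more likely run off a database dump), and the pagination details are only illustrative:

```python
import requests

# Sketch: iterate over the evaluation range via Danbooru's public posts.json
# endpoint. The id:.. metatag and tag_string field are part of the public API;
# the limit/page values here are just examples.
BASE = "https://danbooru.donmai.us/posts.json"

def fetch_eval_posts(id_range="id:3368754..3402356", limit=200):
    """Yield (post_id, set_of_tags) for every post in the given id range."""
    page = 1
    while True:
        resp = requests.get(BASE, params={"tags": id_range, "limit": limit, "page": page})
        resp.raise_for_status()
        posts = resp.json()
        if not posts:
            break
        for post in posts:
            yield post["id"], set(post["tag_string"].split())
        page += 1

# Example: tally post and tag counts like the ones quoted above.
# total_posts = total_tags = 0
# for post_id, tags in fetch_eval_posts():
#     total_posts += 1
#     total_tags += len(tags)
```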

This is a breakdown of the overall performance at different confidence levels.

So for example, at >=50% confidence, the model predicted 557k tags. 399k of these tags were correct, giving a precision of 72%. However, there were 1.08m tags in total it could have found, so the recall is only 37%.

(See [1] and [2] for background on precision and recall. In short, precision tells you how accurate it was (how many guesses it got right) and recall tells you how many tags it actually found.)

As you would expect, as you increase the confidence level, the precision goes up but the recall goes down (it gets more accurate but finds fewer tags). The F score, the harmonic mean of precision and recall, shows the sweet spot where the two are best balanced. By that measure, performance is best at >=50% confidence, and we could possibly do even better if the threshold were lowered even further.
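To make the arithmetic concrete, here's the >=50% row worked out from the rounded figures quoted above (so the output is approximate):

```python
# predicted = tags the model output at or above the confidence threshold,
# correct   = predictions that match a tag actually on the post (true positives),
# actual    = predictable tags actually present on the posts (1.08m).
def precision_recall_f1(correct, predicted, actual):
    precision = correct / predicted
    recall = correct / actual
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

p, r, f1 = precision_recall_f1(correct=399_000, predicted=557_000, actual=1_080_000)
print(f"precision={p:.0%} recall={r:.0%} F1={f1:.2f}")
# precision=72% recall=37% F1=0.49
```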

Here's a breakdown of how well it's able to recognize individual tags. This is at >=50% confidence.

  • "Actual posts" is how posts actually had the given tag.
  • "Predicted posts" is how many posts it predicted should have the tag.
  • "Correct predictions" is how many of those predictions it actually got right (true positives).
  • "Actual frequency" is how often the tag actually appeared on uploads, while "Predicted frequency" is how often it predicted the tag should appear. If things are working correctly, a tag's predicted frequency should be close to its actual frequency.

So for example, 1girl was actually tagged on 24.5k posts (73% of uploads), while we predicted it should be tagged on 25.1k posts (75% of uploads). 23.3k of those predictions were right, for a precision of 93% and a recall of 95%.
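For reference, here's roughly how the per-tag columns above can be tallied, assuming you have the actual and predicted tag sets for each post, with predictions already cut off at 50% confidence. The names here are only illustrative, not taken from the actual DeepDanbooru code:

```python
from collections import Counter

def per_tag_stats(posts):
    """posts: iterable of (actual_tags, predicted_tags) pairs, each a set of tag names."""
    actual, predicted, correct = Counter(), Counter(), Counter()
    n_posts = 0
    for actual_tags, predicted_tags in posts:
        n_posts += 1
        actual.update(actual_tags)
        predicted.update(predicted_tags)
        correct.update(actual_tags & predicted_tags)  # true positives
    stats = {}
    for tag in set(actual) | set(predicted):
        stats[tag] = {
            "actual_posts": actual[tag],
            "predicted_posts": predicted[tag],
            "correct_predictions": correct[tag],
            "actual_frequency": actual[tag] / n_posts,
            "predicted_frequency": predicted[tag] / n_posts,
            "precision": correct[tag] / predicted[tag] if predicted[tag] else 0.0,
            "recall": correct[tag] / actual[tag] if actual[tag] else 0.0,
        }
    return stats
```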

Here are the top 100 best and worst tags. Some observations:

  • It also does very well on 1girl and solo, which is surprising considering the number of difficult corner cases these tags can have.
  • The worst tags tend to be very common features that are frequently tagged on new uploads, but not on old uploads. This can happen if a tag is only used by certain power uploaders (but not by other users in general), or if it only recently came into widespread use. For example, eyebrows visible through hair is used on 36% of all uploads from this year, but on nearly 0% of uploads before 2016. Training tags like this is probably very difficult given the huge number of false negatives they will have.

It should be emphasized that all precision and recall values here are underestimates. Many predictions are currently counted as wrong due to missing tags, but a tag not being present on a post doesn't necessarily mean the post shouldn't have the tag. So as missing tags are added, these numbers will improve.

Overall, while there's still ample room for improvement, this is already good enough for many purposes, including suggesting tags during editing and finding missing tags.

That's interesting to hear. How difficult would it be to implement something like 'suggesting tags during editing/uploads' on live Danbooru?

I implemented something similar previously, but in practice I didn't find the tags it returned to be useful. They were incredibly obvious common tags like 1girl or monochrome. It could, for example, identify popular FGO characters, but for frequent uploaders those are some of the least important things that need to be identified. It's kind of a catch-22: an autotagger would be most useful for lesser-known tags, but an autotagger will never learn them because there aren't enough examples of them.

The way users apply tags on Danbooru makes it poorly suited for ML training applications like this: tags are used even if they only describe a small percentage of the image, so the data set is very noisy. There's also a heavy bias towards new shows, games, and characters, which will have small training sets and will probably not get identified unless they get so popular that everyone knows about them.

It's a neat idea and I'm glad projects like this exist, but it just doesn't match the needs of most uploaders.

I am hoping better neural nets come out that can deal with things like rotation and flipping better.

albert said:

It's a neat idea and I'm glad projects like this exist, but it just doesn't match the needs of most uploaders.

I disagree. This is usually able to correctly identify 10-15 tags per upload. That's not bad at all. Even when you know everything it gives you, having half your tags handed to you is very convenient. It saves you the trouble of manually typing every tag out, then running everything through related tags to make sure you didn't forget anything.

It's not the case that it only finds very common tags either. It's often able to find surprisingly specific tags.

Tag suggestions aren't the only use case for this. It's extremely useful for tag gardening. It makes finding missing tags a lot easier.
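The missing-tag check itself is just a set difference: anything the model predicts with high confidence that isn't already on the post is a candidate to review. A minimal sketch, assuming the predictions are available as a tag → confidence mapping from the model:

```python
# Hypothetical "missing tag" helper: any tag predicted above the threshold
# that isn't already on the post is a candidate for gardening.
def missing_tag_candidates(current_tags, predictions, threshold=0.5):
    return {tag: conf
            for tag, conf in predictions.items()
            if conf >= threshold and tag not in current_tags}

# Example:
# missing_tag_candidates({"1girl", "solo"}, {"1girl": 0.98, "smile": 0.81})
# -> {"smile": 0.81}
```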

albert said:

The way users apply tags on Danbooru makes it not suitable for ML training applications like this because tags will be used even if they only describe a small percentage of the image, meaning the data set is very noisy.

I'm not sure how true this is. Some tags are noisier than others, but the difficult tags aren't necessarily the ones you would expect. This is pretty good at tagging eye colors, for example, even though eyes are a very small feature and color tagging is noisy. On the other hand, it's bad at white background, even though this is a big, easy to recognize feature with fairly consistent tagging.

albert said:

I am hoping better neural nets come out that can deal with things like rotation and flipping better.

I'd think rotation and flipping shouldn't be an issue if you rotate and flip images during training, which is normal practice anyway to augment the training set.
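Something like the following is the usual way to do it. This is a generic torchvision sketch, not DeepDanbooru's actual input pipeline, and the size and angle values are just examples:

```python
import torchvision.transforms as T

# Random flips and small rotations applied at load time, so the network sees
# many orientations of the same image over the course of training.
train_transform = T.Compose([
    T.RandomHorizontalFlip(p=0.5),   # mirror half of the images
    T.RandomRotation(degrees=15),    # rotate by up to +/- 15 degrees
    T.Resize((512, 512)),            # example input size
    T.ToTensor(),
])
```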

I'd be very interested to know more about the approach you used when you tried this, to get a better idea of how it compares with the OP.

evazion said:

I personally feel like it's slower that way. It's the same as when you tag garden: you not only have to look for which tags are missing, you also have to scan for incorrect tags. I can tag an image from scratch faster than it takes to correct a list of tags given to me.

As an occasional uploader even basic tagging assistance can be useful. E.g. I might miss trivial things or recognize the franchise but not the character name.
