Danbooru

Uploading visually identical Twitter/Pixiv images

Posted under General

A number of uploaders have gotten into the problematic habit of uploading images that are visually identical to existing posts, the majority coming from Twitter and Pixiv. These duplicates are usually uploaded with the mistaken mentality that "the Pixiv version is always better" - this may have been true in the past, but it is no longer the case today. There are probably users who know this and upload these duplicates anyway just to have the upload tied to their account, but I like to imagine that for the most part they simply don't know any better.

The problem has become somewhat worse as of late with Twitter updating its image recompression algorithm, leaving fewer and fewer Twitter images that are actually inferior to their Pixiv counterparts. In early 2019, the algorithm was updated to avoid recompressing PNGs that measured ≤900 in either direction, PNGs with a small enough filesize before being compressed, and PNG8 images. More recently, it changed to stop recompressing JPGs (with some exceptions).
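As a rough illustration of the PNG rules just described, here is a minimal Python sketch. It is not Twitter's actual logic, only a restatement of the three conditions above. The names `PngUpload` and `likely_kept_as_is` are made up for this example, "≤900 in either direction" is read as both width and height being at most 900 (an assumption), and the "small enough filesize" cutoff is not given in this thread, so it is left as a parameter the caller must supply.

```python
from dataclasses import dataclass


@dataclass
class PngUpload:
    """Hypothetical description of a PNG before it is posted to Twitter."""
    width: int       # pixels
    height: int      # pixels
    filesize: int    # bytes, before any recompression
    is_png8: bool    # True for palette-based (8-bit indexed) PNGs


def likely_kept_as_is(img: PngUpload, small_file_limit: int, max_side: int = 900) -> bool:
    """Return True if, under the rules described above, the PNG would
    probably not be recompressed.

    small_file_limit is an assumed cutoff in bytes; the real value is unknown.
    """
    small_dimensions = img.width <= max_side and img.height <= max_side
    small_file = img.filesize <= small_file_limit
    return small_dimensions or small_file or img.is_png8
```

Any one of the three conditions is enough, so a large PNG8 or a large but very light PNG would also pass under this reading.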

There have been a number of discussions over the years covering various aspects of the problem, but a decision as to whether we should actively discourage these uploads was never reached. I believe that the uploading of these visually identical posts serves no purpose other than adding meaningless bloat to the site, and as such we should add rules against it.

There are probably users who know this and upload these duplicates anyway just to have the upload tied to their account

I believe topic #14857 should have been able to partially solve this issue, even if you can still show the name of the uploader with userscripts.

About the whole subject, I believe that it is indeed useless to have the same (or a very similar) picture twice. However, having two different sources could be useful (for example to solve a bad_id).

We have already discussed that, and while I understand that it is easier to request a feature when you are not the one building it, I think a solution would be to allow multiple sources for each post. That would imply a modification of the database architecture though.

Indeed, another one would be to discourage the additional uploads and put the additional sources in comments. But then, we cannot do anything for the already existing duplicates.

Rignak said:

I believe topic #14857 should have been able to partially solve this issue, even if you can still show the name of the uploader with userscripts.

About the whole subject, I believe that it is indeed useless to have the same (or a very similar) picture twice. However, having two different sources could be useful (for example to solve a bad_id).

We have already discussed that, and while I understand that it is easier to request a feature when you are not the one building it, I think a solution would be to allow multiple sources for each post. That would imply a modification of the database architecture though.

Indeed, another one would be to discourage the additional uploads and put the additional sources in comments. But then, we cannot do anything for the already existing duplicates.

That thread didn't solve anything; people still snipe and 1up each other over 10kb of difference.
The issue is that, as Zapdos said, twitter versions used to be much inferior to the rest, but that is no longer the case. This however only applies to new posts: all old twitter pictures are still inferior. So basically, the uploader has to apply discretion, because a difference of 200kb means a lot of artifacting, while a difference of 10kb is often not even visible with image difference software.
The problem is that some prolific uploaders don't even care to check for this, so they just upload anything from pixiv if what's on danbooru is from twitter. There's no way to discourage this automatically; the only way is to tell these people to stop doing it, but we need an official stance from the admins for that.
It should be noted though that most of these useless duplicates are uploaded by a few specific uploaders, so the best way to handle this for now would be to just tell those people to check their uploads more carefully.

Also, sources are for direct md5 matches, so they don't apply to this case. If a source doesn't point to the specific file we have on danbooru, it shouldn't be added to the source field. Almost never do twitter and pixiv posts have the same md5, because twitter always recompresses pictures, even if there's no meaningful difference to the human eye.
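For anyone unsure what a "direct md5 match" means in practice, here is a minimal Python sketch that hashes two downloaded files and compares the digests. The function names `md5_of_file` and `same_file` are just illustrative, and this is not how Danbooru itself does the check; it only shows that two files count as the same source only when they are byte-for-byte identical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file(path_a, path_b) -> bool:
    """True only if both files have identical bytes (same MD5)."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

A recompressed copy will fail this check even when it looks identical, which is why a visually identical twitter version and pixiv version usually cannot share a source field.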

Frankly, I much prefer Pixiv as a source over Twitter. With a Pixiv source, you can easily find more artwork from the same artist that hasn't been uploaded to Danbooru in their gallery. Twitter is much more cumbersome to browse as everything is listed linearly with much bigger thumbnails and thus you have to scroll for ages to see everything, not to mention that the artworks are buried amongst non-art tweets.

If it was up to me -and this is an extreme case- posts with Twitter sources would be removed when a Pixiv alternative is posted. That has nothing to do with image compression. Posts from Pixiv are superior to posts from twitter simply because they link to the source on Pixiv rather than Twitter. Twitter is simply a terrible place to post art in my opinion.

So, in Claverhouse's style:

Disagreed -1

I disagree with the idea that Pixiv uploads are automatically superior in that regard; many artists will just dump a big batch of images on Pixiv that they uploaded to Twitter over the last month and you end up losing any commentary that came with the original twitter upload. As it is I frequently end up having to wade through an artist's twitter to try to understand a picture that lost its context on Pixiv. Even when they upload individually and retain the commentary sometimes I like to check the twitter upload for its comments as it usually has more actual discussion of the image rather than stamp spam on Pixiv.

You can always just go to the artist entry to get their pixiv anyway, and it'll almost always be linked in their twitter profile as well. The fact that it's harder to find a specific image on twitter means a direct source link there is actually more valuable.


Kapten-N said:

Frankly, I much prefer Pixiv as a source over Twitter. With a Pixiv source, you can easily find more artwork from the same artist that hasn't been uploaded to Danbooru in their gallery. Twitter is much more cumbersome to browse as everything is listed linearly with much bigger thumbnails and thus you have to scroll for ages to see everything, not to mention that the artworks are buried amongst non-art tweets.

If it was up to me -and this is an extreme case- posts with Twitter sources would be removed when a Pixiv alternative is posted. That has nothing to do with image compression. Posts from Pixiv are superior to posts from twitter simply because they link to the source on Pixiv rather than Twitter. Twitter is simply a terrible place to post art in my opinion.

So, in Claverhouse's style:

Disagreed -1

This scenario works for you if you have a choice of uploading from Twitter or Pixiv. I'm talking about scenarios where the Twitter version was already uploaded and is a pixel-perfect match of the Pixiv version (or, in rare cases, vice versa), which some users feel the need to upload anyway. It doesn't really matter whether users prefer to browse Pixiv over Twitter; this is about making it better to browse Danbooru.

Kapten-N said:

Frankly, I much prefer Pixiv as a source over Twitter. With a Pixiv source, you can easily find more artwork from the same artist that hasn't been uploaded to Danbooru in their gallery. Twitter is much more cumbersome to browse as everything is listed linearly with much bigger thumbnails and thus you have to scroll for ages to see everything, not to mention that the artworks are buried amongst non-art tweets.

I don't disagree with you here, but this is exactly what artist wiki pages are for.

Kapten-N said:

If it was up to me -and this is an extreme case- posts with Twitter sources would be removed when a Pixiv alternative is posted.

This is similar to how such functionality might be implemented, but with some important distinctions. When a better source is found, rather than deleting the inferior-quality post, either:
1. the inferior would be kept as an alternate version as part of the same post as the superior.
2. the source url and image data of the superior would replace those of the inferior.

Both allow the comments, favorites, score, etc. of the original to be preserved. Additionally, any tags the would-be uploader tried to add to the duplicate could be merged with those on the existing post. I could be wrong, but I've gotten the impression that the only reason these changes weren't implemented long ago is that they would prevent bad-faith uploaders from getting credit for uploading things they neither found on their own, nor went to the trouble of tagging, nor risked having deleted (thus blemishing their record). Of course, from the viewpoint of people who merely browse the site, this seems like a god-awful reason not to do it; if anything, the change would be a positive. However, uploaders (even very childish ones) are in heavy demand. There are at least several hundred thousand images available on the internet which would be approved if they were uploaded, but there are only so many people who take the effort to upload in the first place.
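To make the two options above a bit more concrete, here is a small Python sketch of what such a merge might look like. Everything in it is hypothetical: the `Post` fields, the `alternates` list and the `replace_with_better` function are invented for illustration and do not reflect Danbooru's actual database schema or code.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    """Hypothetical, simplified post record (not Danbooru's real schema)."""
    source: str
    md5: str
    tags: set = field(default_factory=set)
    favorites: int = 0
    score: int = 0
    comments: list = field(default_factory=list)
    # (source, md5) pairs of older versions kept alongside the current one
    alternates: list = field(default_factory=list)


def replace_with_better(existing: Post, new_source: str, new_md5: str,
                        new_tags, keep_alternate: bool = True) -> Post:
    """Swap in the better file on the existing post instead of creating a new one.

    keep_alternate=True  -> option 1: the inferior file stays as an alternate version.
    keep_alternate=False -> option 2: the inferior file is simply replaced.
    Comments, favorites and score are untouched either way, and the
    would-be uploader's tags are merged into the existing tag set.
    """
    if keep_alternate:
        existing.alternates.append((existing.source, existing.md5))
    existing.source = new_source
    existing.md5 = new_md5
    existing.tags |= set(new_tags)
    return existing
```

Either way the post keeps a single identity, so nothing attached to it is lost when the better file arrives.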

Question is, when is the filesize difference significant?
100kB, 80kB, 60kB...?

Lacrimosa said:

Question is, when is the filesize difference significant?
100kB, 80kB, 60kB...?

It basically has to be decided on a case by case basis. If the twitter version is 50kb and the pixiv one is 100kb it's only 50kb of difference but it's going to make a world of difference due to artifacts. But those 50kb become irrelevant if both pictures are in the 3MB range.
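To put numbers on that, here is a minimal Python sketch that expresses the gap relative to the larger file rather than in absolute kilobytes. The function name `relative_size_gap` is made up for this example, and it is only a way of seeing the proportions, not a proposed rule for deciding which uploads are acceptable.

```python
def relative_size_gap(size_a: int, size_b: int) -> float:
    """Return the filesize difference as a fraction of the larger file."""
    larger = max(size_a, size_b)
    if larger == 0:
        return 0.0
    return abs(size_a - size_b) / larger


# The two cases from the post above, in bytes:
small_pair = relative_size_gap(50_000, 100_000)        # half the data is missing
large_pair = relative_size_gap(3_000_000, 3_050_000)   # under 2% of the larger file
```

The same 50kb is half of the file in the first case and a rounding error in the second, which is why a fixed kB cutoff doesn't really work.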

If it was up to me -and this is an extreme case- posts with Twitter sources would be removed when a Pixiv alternative is posted. That has nothing to do with image compression. Posts from Pixiv are superior to posts from twitter simply because they link to the source on Pixiv rather than Twitter. Twitter is simply a terrible place to post art in my opinion.

I agree a lot with pixiv being the superior source, but couldn't one just check the artist page to find out their pixiv account? Their remaining artworks could be uploaded that way, without reuploading the ones already on danbooru.

(Oh wait, this was already said.)
