Janitor Quiz

Posted under General

There seem to be quite a LOT of duplicates, even with the hashchecks. I've uploaded a few and seen a whole lot more.

A quick search of some of the more popular artists or even the duplicate tag reveals exact duplicates, mirrored duplicates, lower res duplicates, etc, etc, etc.

My point is that there are a whole TON of duplicates. Janitors should have the power/task to remove them.

They do have the power to remove them, but it's destructive and shouldn't be done without reason. And as Log said, an exact duplicate can't happen, even if the two images are visually similar. While we don't condone uploading duplicates, it is useful to keep them to link to the other versions, and it's not entirely fair to penalize the uploader with a deletion. Especially if the inferior version was uploaded prior to the better version when no duplicate existed.

DschingisKhan said: Is a "janitor" all about approving things, then? I've been thinking lately...One thing I figure a janitor is at least able to do is use that mass-tagger thing to make desirable changes to large numbers of posts quickly (Not sure how it works exactly, but it's certainly an alluring title).

Mass edit isn't used quite as often as you might think. Most of the time I use it, it's for artists. Generally, most of the mass editing happens via implications and aliases. Either way, if we wanted someone to be able to do more than a janitor is allowed, we'd just have them be a mod. ::shrugs::

Oh, it also just came to mind that Albert's preparing for tests of his Pixiv spider

Yeah... about that. I hope we get a LOT of details on what he has planned with that, because from what little I know about it, I don't see it as a good thing.

Shinjidude said: While we don't condone uploading duplicates, it is useful to keep them to link to the other versions, and it's not entirely fair to penalize the uploader with a deletion. Especially if the inferior version was uploaded prior to the better version when no duplicate existed.

And the only way not to penalize the uploader is to delete it twice (this removes it from the user's list of deleted posts), thus removing it from the system completely... ...which leaves it open to being uploaded again. Not really ideal.

Shinjidude said:
Especially if the inferior version was uploaded prior to the better version when no duplicate existed.

Wouldn't the inferior version be deleted, even if it was earlier then?

jxh2154 said:
And the only way not to penalize the uploader is to delete it twice (this removes it from the user's list of deleted posts), thus removing it from the system completely... ...which leaves it open to being uploaded again. Not really ideal.

I get thumbnails of deleted posts all the time whenever I click "find similar." Can't those thumbnails be saved in a database whenever danbooru compares the image uploaded to something it already has?

Shinjidude said:
And as Log said, an exact duplicate can't happen, even if the two images are visually similar.

post #310096
post #309872

Seem to be exact duplicates, with maybe a pixel of difference.

I'm not trying to be difficult, just trying to understand exactly why things work the way they do.

EDIT: Are deleted posts really that big of a penalty? Cleanliness is better than having a perfect score.

Say someone uploaded an image they scanned themselves, then someone uploaded a higher quality scan. Should the first person be penalized for it?

It's my understanding that one of the main determining factors in whether or not someone gains contributor status is their number of deleted images, which makes sense, since these are the people that bypass the queue.

On the note of "exact" duplicates, think of the image as a txt file: the computer reads the file and won't allow that same file to be uploaded. Change one letter, and the computer doesn't recognize it anymore, allowing a near copy to be uploaded.
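To make the txt-file analogy concrete, here is a minimal sketch of a hash-based upload check. It assumes the site stores an MD5 digest of each file's bytes and rejects any upload whose digest is already known; the `UploadStore` class, the file names, and the byte strings are all made up for illustration and are not the site's actual code.

```python
import hashlib


def file_hash(data: bytes) -> str:
    """Return the MD5 hex digest of the raw file bytes."""
    return hashlib.md5(data).hexdigest()


class UploadStore:
    """Hypothetical store that rejects byte-for-byte identical uploads."""

    def __init__(self) -> None:
        self._by_hash: dict[str, str] = {}

    def upload(self, name: str, data: bytes) -> bool:
        """Return True if the file was accepted, False if its hash is already known."""
        digest = file_hash(data)
        if digest in self._by_hash:
            return False
        self._by_hash[digest] = name
        return True


store = UploadStore()

# Stand-ins for image files: the second differs from the first by one byte.
original = b"pretend these bytes are an image file"
near_copy = b"pretend these bytes are an image file."

first = store.upload("scan_a.png", original)        # new file
exact = store.upload("scan_a_copy.png", original)   # same bytes again
near = store.upload("scan_b.png", near_copy)        # one byte changed

print(first, exact, near)
```

The byte-identical re-upload is rejected, while the one-byte variant gets a different digest and goes through. That is why resized, mirrored, or re-saved copies slip past the hash check and have to be caught by a similarity search instead.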

Suiseiseki said:
Say someone uploaded an image they scanned themselves, then someone uploaded a higher quality scan. Should the first person be penalized for it?

It's my understanding that one of the main determining factors in whether or not someone gains contributor status is their number of deleted images, which makes sense, since these are the people that bypass the queue.

On the note of "exact" duplicates, think of the image as a txt file: the computer reads the file and won't allow that same file to be uploaded. Change one letter, and the computer doesn't recognize it anymore, allowing a near copy to be uploaded.

I understand about the way duplicates are found and how hashes are calculated. But my point is shouldn't those "exact" duplicates be eliminated?

I understand the "penalty" of deleted posts, but duplicate posts can easily be checked using the find similar button. Everyone should be using that button anyway to ensure that there are no duplicates. Isn't it someone's own fault if they don't take the extra two seconds to check whether they have a duplicate image?

The thing I have an issue with is deleted posts = penalty. Just because someone has a few deleted posts due to duplicates doesn't make them an idiot (unless they have a ton, or have other reasons for deletion).

I'm going to stop arguing now, thanks for the discussion.

Granola said: I get thumbnails of deleted posts all the time whenever I click "find similar."

There's "deleted" and then there's "REALLY DELETED". Delete once and the image doesn't actually go anywhere. You can still find it. Go back and delete it again and it's actually gone, for real. At least that's how it used to work, and I never heard anything to indicate that changed.

post #310096
post #309872
Seem to be exact duplicates, with maybe a pixel of difference.

Tab back and forth between them - the difference is actually quite apparent. Look at the colors and the sharpness around lines and such.

But, er, yeah this is getting really off topic.

Since I've already been demodded, presumably for not being active enough (although I don't recall ever receiving an actual explanation), there doesn't seem much point in applying.

Suiseiseki said:
Say someone uploaded an image they scanned themselves, then someone uploaded a higher quality scan. Should the first person be penalized for it?

That is the minitokyo way.

Granola said:
I understand about the way duplicates are found and how hashes are calculated. But my point is shouldn't those "exact" duplicates be eliminated?

They can't be exact duplicates down to the hash. If they are exact enough that the human eye cannot distinguish them, which do you delete?

I understand the "penalty" of deleted posts, but duplicate posts can easily be checked using the find similar button. Everyone should be using that button anyway to ensure that there are no duplicates. Isn't it someone's own fault if they don't take the extra two seconds to check whether they have a duplicate image?

Going back to minitokyo again... the number of times I had a message indicating that an upload had been deleted as a duplicate, with a snotty message about checking (through perhaps hundreds of images) first - when mine was a personal scan that had been there for years and the "better" version was a recent upload someone found on another site...

Updated by wanchan

Log said:
These are virtually all things anyone can do.

Right, but what of it?
To clarify, ability is not impetus: the motivation of the average free-willed user to do "extra work" (e.g. beyond the assumed random tagging and uploading) is subject to inertia in much the same way as a large boulder or a donkey, else we'd have millions of open source devs rather than tens of thousands. The "average user" makes a decent ass, but a poor tool to be wielded for the benefit of the masses.

Flowery language and rather nonsensical comparisons aside, there's no reason to equip people with approval rights just so that they can fix tags. Active janitorial work on tags is already one of the criteria for promotion, so there is incentive in place already. We don't need to nominate people to do it.
