Sometimes there will probably still be a bad link, and I don't know if there's an easy backend way to do it, but the manual way is just opening up the link.
If nothing else this can at least help resolve some post sources for someone bored enough to tackle it manually/code a way to do it (source:*i.redd.it*). Thought I'd throw it out there because I have too many other plates to juggle on Danbooru r/n (self-imposed).
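(For anyone picking this up: reddit's media viewer at https://www.reddit.com/media?url=<percent-encoded image link> shows the parent post when reddit knows it, so a per-image check might look something like the sketch below. The element name is an assumption based on the current page markup and could change.)

import requests, bs4
from urllib.parse import quote

def resolve_reddit_image(image_url):
    # Ask reddit's media viewer page about this image
    media_url = 'https://www.reddit.com/media?url=' + quote(image_url, safe='')
    r = requests.get(media_url)
    r.raise_for_status()
    page = bs4.BeautifulSoup(r.content, 'html5lib')
    # The viewer embeds the parent post's permalink in a <post-bottom-bar> element
    bar = page.select_one('post-bottom-bar')
    if bar is None:
        return None  # reddit doesn't know a post for this image
    return 'https://www.reddit.com' + bar['permalink']

# hypothetical example URL:
print(resolve_reddit_image('https://i.redd.it/abcdef123456.jpg'))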
I wasn't bored enough, but you nerd-sniped me. I slapped a script together to update our existing posts. Along with i.redd.it, preview.redd.it can be done the same way. I also noticed that recent tumblr media links can be resolved in a similar way, so I did those too, but unfortunately the vast majority of our tumblr media links are the older style, which doesn't provide post links.
Edit: At first I thought the tumblr media links with more stuff in the path were the ones that could be resolved, but there were 8 tumblr posts with shorter links that were still resolved on the media page. I don't know what tumblr's backend uses to do the mapping, but it seems limited to more recent posts. A funny thing happened: post #2063620 and post #107342 resolved even though they're old. Turns out their tumblr source posts are newer than their Danbooru posts! The sources were set long after upload, and they must have come from somewhere else originally. However, those tumblr posts are still 6+ years old, which is a lot older than the others that resolved. Could it be that tumblr saves the reverse mapping when posts are viewed on their end?
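For reference, the per-image tumblr lookup is just fetching the media URL with browser-like headers and reading the embedded state JSON; a condensed single-URL version of the full script further down:

import requests, bs4, json

def resolve_tumblr_image(image_url):
    # Browser-like headers make tumblr serve the HTML media page instead of the raw image
    r = requests.get(image_url, headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:134.0) Gecko/20100101 Firefox/134.0',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    })
    r.raise_for_status()
    page = bs4.BeautifulSoup(r.content, 'html5lib')
    data = json.loads(page.select_one('script[id=___INITIAL_STATE___]').string)
    post_data = data['ImageUrlPage'].get('post')
    return post_data['postUrl'] if post_data else None  # None: no reverse mapping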
Altogether 393 post sources fixed. (There were 5 where the reddit source post was found but the danbooru post was banned, so I couldn't update it myself; an approver had to. Thanks ANON TOKYO.) 105 from reddit did not have posts shown on the media page, maybe bad_id at the source. For tumblr, most of the recent posts resolved but almost nothing past a year or so. There are still ~2.4k older tumblr images that don't resolve this way.
Of the posts updated, 45 had source_request, 29 had artist_request, and 31 lacked an artist tag but didn't have artist_request either. I didn't auto-remove the source_request tags since some could be third-party sources that can be simplified.
I didn't see an API for this on the reddit side, so I scraped the media pages; as scraping goes, this was not all that bad. Scripts below, but I put zero effort into making them portable or understandable.
reddit script
#!/usr/bin/env python3
import requests
from urllib.parse import quote
import bs4, html5lib  # html5lib imported so a missing parser fails fast
import danbooru
import json
import itertools

with open('auth.json', 'r') as f:
    auth = json.load(f)
danbooru.set_auth(auth['user'], auth['api_key'])

with requests.Session() as reddit_session, open('danbooru_reddit_source_resolve.log', 'a') as log_file:
    def log(danbooru_post, status, details=None):
        json.dump({'danbooru_post_id': danbooru_post['id'], 'status': status, 'details': details}, log_file)
        log_file.write('\n')

    # All posts whose source is a direct reddit image link
    reddit_bad_link_posts = itertools.chain.from_iterable(
        danbooru.get_posts('source:' + source_pattern, 'status:any')
        for source_pattern in ['*i.redd.it/*', '*preview.redd.it/*']
    )
    for danbooru_post in reddit_bad_link_posts:
        image_url = danbooru_post['source']
        # reddit's media page shows the parent post for a direct image link
        media_url = 'https://www.reddit.com/media?url=' + quote(image_url, safe='')
        r = reddit_session.get(media_url)
        r.raise_for_status()
        page = bs4.BeautifulSoup(r.content, 'html5lib')
        post_bottom_bar = page.select_one('post-bottom-bar')
        if post_bottom_bar is None:
            log(danbooru_post, 'no reddit post on media page')
            continue
        post_url = post_bottom_bar['permalink']
        if not post_url.startswith('/r/'):
            raise RuntimeError('unexpected post url: ' + post_url)
        post_url = 'https://www.reddit.com' + post_url
        if danbooru_post['is_banned']:
            log(danbooru_post, 'banned, cannot update', {'old_source': image_url, 'new_source': post_url})
            continue
        danbooru_tags = danbooru_post['tag_string'].split(' ')
        danbooru_patch_body = {
            'post[old_source]': image_url,
            'post[source]': post_url,
        }
        # Drop the bad_link tag too, if it's there
        try:
            danbooru_tags.remove('bad_link')
        except ValueError:
            removed_bad_link_tag = False
        else:
            removed_bad_link_tag = True
            danbooru_patch_body.update({
                'post[old_tag_string]': danbooru_post['tag_string'],
                'post[tag_string]': ' '.join(danbooru_tags),
            })
        # Update the post on Danbooru
        danbooru.api_patch('posts/{}'.format(danbooru_post['id']), danbooru_patch_body)
        log(danbooru_post, 'updated', {
            'old_source': image_url,
            'new_source': post_url,
            'removed_bad_link_tag': removed_bad_link_tag,
            'tag_count_artist': danbooru_post['tag_count_artist'],
            'source_request': 'source_request' in danbooru_tags,
            'artist_request': 'artist_request' in danbooru_tags,
        })
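Each processed post appends one JSON object to danbooru_reddit_source_resolve.log, so the run can be audited afterwards. A successful update record looks like this (IDs and URLs made up):

{"danbooru_post_id": 1234567, "status": "updated", "details": {"old_source": "https://i.redd.it/abcdef123456.jpg", "new_source": "https://www.reddit.com/r/SomeSubreddit/comments/abc123/some_title/", "removed_bad_link_tag": true, "tag_count_artist": 1, "source_request": false, "artist_request": false}}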
tumblr script
#!/usr/bin/env python3
import requests
from urllib.parse import quote
import bs4, html5lib
import danbooru
import json
import itertools

with open('auth.json', 'r') as f:
    auth = json.load(f)
danbooru.set_auth(auth['user'], auth['api_key'])

with requests.Session() as tumblr_session, open('danbooru_reddit_source_resolve.log', 'a') as log_file:
    def log(danbooru_post, status, details=None):
        json.dump({'danbooru_post_id': danbooru_post['id'], 'status': status, 'details': details}, log_file)
        log_file.write('\n')

    # All posts whose source is a direct tumblr media link
    tumblr_bad_link_posts = itertools.chain.from_iterable(
        danbooru.get_posts('source:' + source_pattern, 'status:any')
        for source_pattern in ['*.media.tumblr.com/*/*/*']
    )
    for danbooru_post in tumblr_bad_link_posts:
        image_url = danbooru_post['source']
        # Browser-like headers make tumblr serve the HTML media page instead of the raw image
        r = tumblr_session.get(image_url, headers={
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:134.0) Gecko/20100101 Firefox/134.0',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        })
        r.raise_for_status()
        page = bs4.BeautifulSoup(r.content, 'html5lib')
        # The post URL, if tumblr knows it, is in the embedded state JSON
        data_script = page.select_one('script[id=___INITIAL_STATE___]')
        data = json.loads(data_script.string)
        post_data = data['ImageUrlPage'].get('post')
        if post_data is None:
            log(danbooru_post, 'no tumblr post on media page')
            continue
        post_url = post_data['postUrl']
        if danbooru_post['is_banned']:
            log(danbooru_post, 'banned, cannot update', {'old_source': image_url, 'new_source': post_url})
            continue
        danbooru_tags = danbooru_post['tag_string'].split(' ')
        danbooru_patch_body = {
            'post[old_source]': image_url,
            'post[source]': post_url,
        }
        # Drop the bad_link tag too, if it's there
        try:
            danbooru_tags.remove('bad_link')
        except ValueError:
            removed_bad_link_tag = False
        else:
            removed_bad_link_tag = True
            danbooru_patch_body.update({
                'post[old_tag_string]': danbooru_post['tag_string'],
                'post[tag_string]': ' '.join(danbooru_tags),
            })
        # Update the post on Danbooru
        danbooru.api_patch('posts/{}'.format(danbooru_post['id']), danbooru_patch_body)
        log(danbooru_post, 'updated', {
            'old_source': image_url,
            'new_source': post_url,
            'removed_bad_link_tag': removed_bad_link_tag,
            'tag_count_artist': danbooru_post['tag_count_artist'],
            'source_request': 'source_request' in danbooru_tags,
            'artist_request': 'artist_request' in danbooru_tags,
        })
danbooru.py (needed by both scripts, code I just had sitting around)
import requests, requests.auth

session = requests.Session()

def set_auth(username, api_key):
    session.auth = requests.auth.HTTPBasicAuth(username, api_key)

def api_whatever(method, thing, *args, **kw):
    response = method('https://danbooru.donmai.us/{}.json'.format(thing), *args, **kw)
    response.raise_for_status()
    return response.json()

def api_get(thing, params=None):
    return api_whatever(session.get, thing, params=params)

def api_patch(thing, body):
    return api_whatever(session.patch, thing, data=body)

def api_post(thing, body):
    return api_whatever(session.post, thing, data=body)

def get_posts(*tags, after=None):
    params = {
        'tags': ' '.join(tags),
        'limit': 100,
    }
    if after is not None:
        params['page'] = 'a{}'.format(after)
    # Keep fetching until an empty page comes back.
    # page=aN / page=bN ask for posts with id above/below N.
    for page in iter(lambda: api_get('posts', params), []):
        yield from page
        if after is not None:
            params['page'] = 'a{}'.format(max(post['id'] for post in page))
        else:
            params['page'] = 'b{}'.format(min(post['id'] for post in page))
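In case anyone wants to reuse danbooru.py: get_posts walks the whole result set with Danbooru's id-cursor pagination and yields posts one at a time. Typical usage, with placeholder credentials and an example search:

import danbooru

danbooru.set_auth('your_username', 'your_api_key')  # placeholders, use real credentials
for post in danbooru.get_posts('source:*i.redd.it/*', 'status:any'):
    print(post['id'], post['source'])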