# I am the Watcher. I am your guide through this vast new twtiverse.
# 
# Usage:
#     https://watcher.sour.is/api/plain/users              View list of users and latest twt date.
#     https://watcher.sour.is/api/plain/twt                View all twts.
#     https://watcher.sour.is/api/plain/mentions?uri=:uri  View all mentions for uri.
#     https://watcher.sour.is/api/plain/conv/:hash         View all twts for a conversation subject.
# 
# Options:
#     uri     Filter to show a specific user's twts.
#     offset  Start index for query.
#     limit   Count of items to return (going back in time).
# 
# twt range = 1 19
# self = https://watcher.sour.is/conv/4fmwoaq
reviewing logs this morning and found i have been spammed hard by bots not respecting the robots.txt file. only noticed it because the OpenAI bot was hitting me with a lot of nonsensical requests. here is the list from last month:

- (810) bingbot
- (641) Googlebot
- (624) http://www.google.com/bot.html
- (545) DotBot
- (290) GPTBot
- (106) SemrushBot
- (84) AhrefsBot
- (62) MJ12bot
- (60) BLEXBot
- (55) wpbot
- (37) Amazonbot
- (28) YandexBot
- (22) ClaudeBot
- (19) AwarioBot
- (14) https://domainsbot.com/pandalytics
- (9) https://serpstatbot.com
- (6) t3versionsBot
- (6) archive.org_bot
- (6) Applebot
- (5) http://search.msn.com/msnbot.htm
- (4) http://www.googlebot.com/bot.html
- (4) Googlebot-Mobile
- (4) DuckDuckGo-Favicons-Bot
- (3) https://turnitin.com/robot/crawlerinfo.html
- (3) YandexNews
- (3) ImagesiftBot
- (2) Qwantify-prod
- (1) http://www.google.com/adsbot.html
- (1) http://gais.cs.ccu.edu.tw/robot.php
- (1) YaK
- (1) WBSearchBot
- (1) DataForSeoBot

i have placed some middleware to reject these for now, but it is not a foolproof solution.
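
A minimal sketch of what such middleware could look like in Go's net/http; the handler names and the blocklist subset are illustrative assumptions, not the actual implementation:

package main

import (
    "net/http"
    "strings"
)

// blockedBots is an illustrative subset of the user agents tallied above.
var blockedBots = []string{"DotBot", "GPTBot", "SemrushBot", "AhrefsBot", "MJ12bot"}

// rejectBots refuses any request whose User-Agent contains one of the
// blocked substrings and passes everything else through untouched.
func rejectBots(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        for _, bot := range blockedBots {
            if strings.Contains(r.UserAgent(), bot) {
                http.Error(w, "Forbidden", http.StatusForbidden)
                return
            }
        }
        next.ServeHTTP(w, r)
    })
}

func main() {
    mux := http.NewServeMux()
    mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("hello\n"))
    })
    http.ListenAndServe(":8080", rejectBots(mux))
}

As the twt says, this is not foolproof: a crawler that lies about its User-Agent sails straight through.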
@bmallred Similar story here 😱
Reddit has been complaining about this for years. I am sorry!
@bmallred Surprisingly, my

User-agent: *
Disallow: /

seems to work. Or maybe those bastards change their user agent and claim to be someone nice. In any case, I just added a bunch of

location = /robots.txt {
    add_header Content-Type text/plain;
    return 200 "User-agent: *\nDisallow: /\n";
}

in my nginx config. No need for any bot to visit, crawl and index most of my sites.
(I keep thinking that going back to Gopher or Gemini might be a good idea at this point. They don’t care about that, probably. 🫣)
@lyse yeah, i have the following as well:

User-agent: *
Disallow: /

now i have some middleware that looks at the header, and if they are polite enough to include "bot" in the user agent, they politely get a 404 response.
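
A compact sketch of that check, again assuming Go's net/http; lower-casing the header first is an assumption (it catches "Bot" and "bot" alike), and the actual middleware may differ:

package main

import (
    "net/http"
    "strings"
)

// botTo404 hands a 404 to any client polite enough to say "bot"
// anywhere in its User-Agent header; everyone else passes through.
func botTo404(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if strings.Contains(strings.ToLower(r.UserAgent()), "bot") {
            http.NotFound(w, r)
            return
        }
        next.ServeHTTP(w, r)
    })
}

func main() {
    http.ListenAndServe(":8080", botTo404(http.FileServer(http.Dir("."))))
}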
@bmallred i really need to sit down and add some rate limiting to be honest.
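
A sketch of per-IP rate limiting with golang.org/x/time/rate (fetch with: go get golang.org/x/time/rate); the one-request-per-second budget and the never-evicted map are simplifications for illustration, not a production design:

package main

import (
    "net"
    "net/http"
    "sync"

    "golang.org/x/time/rate"
)

var (
    mu       sync.Mutex
    limiters = map[string]*rate.Limiter{} // one token bucket per client IP; never evicted in this sketch
)

// limiterFor returns the client's bucket, creating it on first sight.
func limiterFor(ip string) *rate.Limiter {
    mu.Lock()
    defer mu.Unlock()
    lim, ok := limiters[ip]
    if !ok {
        lim = rate.NewLimiter(1, 5) // 1 req/s with a burst of 5; tune to taste
        limiters[ip] = lim
    }
    return lim
}

// rateLimit answers 429 once a client exhausts its budget.
func rateLimit(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        ip, _, err := net.SplitHostPort(r.RemoteAddr)
        if err != nil {
            ip = r.RemoteAddr
        }
        if !limiterFor(ip).Allow() {
            http.Error(w, "Too Many Requests", http.StatusTooManyRequests)
            return
        }
        next.ServeHTTP(w, r)
    })
}

func main() {
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("ok\n"))
    })
    http.ListenAndServe(":8080", rateLimit(http.DefaultServeMux))
}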