# I am the Watcher. I am your guide through this vast new twtiverse.
# 
# Usage:
#     https://watcher.sour.is/api/plain/users              View list of users and latest twt date.
#     https://watcher.sour.is/api/plain/twt                View all twts.
#     https://watcher.sour.is/api/plain/mentions?uri=:uri  View all mentions for uri.
#     https://watcher.sour.is/api/plain/conv/:hash         View all twts for a conversation subject.
# 
# Options:
#     uri     Filter to show a specific user's twts.
#     offset  Start index for query.
#     limit   Count of items to return (going back in time).
# 
# twt range = 1 196238
# self = https://watcher.sour.is?offset=179583
# next = https://watcher.sour.is?offset=179683
# prev = https://watcher.sour.is?offset=179483
@movq It's crazy! I thought about it the other day on my hike. There are so many shady areas in winter that are fully blasted by the sun in summer.
@movq Heck yeah, they're both very lovely! I like how you can still see the full disk through the clouds in the first one.
@kat Oh cool, I wish I had a similar subject in school. :-)
I cobbled that together yesterday, @aelaraji. Since I was too lazy to write some tests, I simply hit your feed as I knew it contains two invalid lines right now. Sorry mate! :-( Next thing is to actually write some proper tests, improve the messages, etc.

Here's the code: https://git.mills.io/yarnsocial/validator

Looking forward to that, @prologic. :-)
“We are...so far removed from the realities of production and work that we inhabit a dream world of artificial stimuli and televised experience.” 📀💩 Welcome to the post-digital future
[47°09′10″S, 126°43′48″W] Automatic systems disengaged due to thunderstorm
For the time being... I've just blocked all of OpenAI(s) Bots. They (_thankfully_) publish a JSON endpoint that you can use to block all OpenAI crawlers from reaching your server (_in my case, blocking it at the edge_). Example:


proxy-1:~# curl -qs https://openai.com/gptbot.json | jq -r '.prefixes[].ipv4Prefix' | xargs -I{} ./block-ip.sh {}


Where block-ip.sh is simply:


#!/bin/sh

ufw insert 1 deny from "$1" to any
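
For anyone who would rather not chain curl, jq and xargs, here is a rough Python sketch of the same idea. It assumes only the document shape implied by the jq filter above (a `prefixes` list whose entries carry an `ipv4Prefix`); the function name `ufw_deny_commands` is made up for illustration, and the sample data uses documentation address ranges, not real OpenAI prefixes. It only builds the command strings so they can be reviewed before anything touches the firewall.

```python
import ipaddress
import json


def ufw_deny_commands(gptbot_json: str) -> list[str]:
    """Turn a gptbot.json-style document into `ufw insert 1 deny ...` commands."""
    doc = json.loads(gptbot_json)
    commands = []
    for entry in doc.get("prefixes", []):
        prefix = entry.get("ipv4Prefix")
        if prefix is None:
            continue  # skip entries that carry no IPv4 prefix
        # Validate the prefix before it ends up in a firewall command.
        network = ipaddress.ip_network(prefix, strict=False)
        commands.append(f"ufw insert 1 deny from {network} to any")
    return commands


if __name__ == "__main__":
    # Made-up sample (documentation ranges), not real crawler data.
    sample = '{"prefixes": [{"ipv4Prefix": "192.0.2.0/24"}, {"ipv4Prefix": "198.51.100.16/28"}]}'
    for command in ufw_deny_commands(sample):
        print(command)
```

Printing instead of executing is deliberate here: a malformed prefix raises `ValueError` before any rule is generated.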
@aelaraji Yes! 👏 This is exactly what it is! 🤣 I will of course soon™ be hosting this service, likely at validator.twtxt.net 😅😅
Any idea what this "twtxtfeevalidator/0.0.1" UA is about? I thought I could ask before throwing a 1000GB file at it 🪤 Could it be the same 'xt' thing @lyse was talking about the other day?
[47°09′07″S, 126°43′13″W] Weather forecast alert -- storm from N
@kat Haha 🤣 If someone figures this out, please let me know 🙏🙏 -- In the meantime, I'm going to very soon™ write a daemon that will watch the audit log for repeated violations and add the offenders to the network firewall.
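
The core of such a daemon could be as small as a per-IP counter over a time window. The sketch below is purely illustrative and not the planned implementation: the class name `ViolationTracker`, the threshold and the window length are all assumptions, and reading the audit log or calling the firewall is left out entirely.

```python
from collections import defaultdict, deque


class ViolationTracker:
    """Flag an IP once it racks up `threshold` violations within `window` seconds."""

    def __init__(self, threshold=5, window=60.0):
        self.threshold = threshold  # hypothetical default
        self.window = window        # hypothetical default, in seconds
        self._seen = defaultdict(deque)  # ip -> timestamps of recent violations
        self.blocked = set()

    def record(self, ip, timestamp):
        """Record one violation; return True if this IP should be blocked now."""
        if ip in self.blocked:
            return False  # already handed to the firewall
        recent = self._seen[ip]
        recent.append(timestamp)
        # Drop violations that have fallen out of the time window.
        while recent and timestamp - recent[0] > self.window:
            recent.popleft()
        if len(recent) >= self.threshold:
            self.blocked.add(ip)
            del self._seen[ip]
            return True
        return False


if __name__ == "__main__":
    tracker = ViolationTracker(threshold=3, window=10.0)
    for t in (0.0, 1.0, 2.0):
        if tracker.record("192.0.2.7", t):
            print("block 192.0.2.7")
```

A real daemon would feed `record()` from the parsed audit log and run something like the block-ip script whenever it returns True.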
This is better:


proxy-1:~# ./audit-log-by-ip.sh 4.227.36.76 | coraza-log-formatter -m -
2025/01/04 23:17:04 4.227.36.76 58982 GET /external?aff-HY0BLO=&f=mediaonly&f=noreplies&nick=g1n&uri=https%3A%2F%2Fthe-president-codes.linegames.org null 0  On OWASP_CRS/4.7.0
Actionset: OWASP_CRS/4.7.0
Message: Bad User Agent
Severity: 0
Raw: SecRule REQUEST_HEADERS:User-Agent "@pmFromFile /etc/caddy/waf/bad_user_agents.txt" "id:2000,log,phase:1,deny,msg:'Bad User Agent'"
Nice! I wrote another useful tool 👌


proxy-1:~# ./audit-log-by-ip.sh 4.227.36.76 | coraza-log-formatter -m -
Actionset: OWASP_CRS/4.7.0
Message: Bad User Agent
Severity: 0
Raw: SecRule REQUEST_HEADERS:User-Agent "@pmFromFile /etc/caddy/waf/bad_user_agents.txt" "id:2000,log,phase:1,deny,msg:'Bad User Agent'"
@prologic we live in hell
went out with my family today, brought my camcorder, resulted in a little vlog :) https://memoria.sayitditto.net/view?m=SjbDq15bL
How in da fuq do you _actually_ make these fucking useless AI bots go away?


proxy-1:~# jq '. | select(.request.remote_ip=="4.227.36.76")' /var/log/caddy/access/mills.io.log | jq -s '. | last' | caddy-log-formatter -
4.227.36.76 - [2025-01-05 04:05:43.971 +0000] "GET /external?aff-QNAXWV=&f=mediaonly&f=noreplies&nick=g1n&uri=https%3A%2F%2Fmy-hero-ultra-impact-codes.linegames.org HTTP/2.0" 0 0
proxy-1:~# date
Sun Jan  5 04:05:49 UTC 2025


😱
Done.
@lyse Oh good! It works haha 🤣 I'll bump it up a bit 👌
🧮 USERS:1 FEEDS:2 TWTS:1205 ARCHIVED:83338 CACHE:2807 FOLLOWERS:17 FOLLOWING:14
@prologic Looks like I'm hitting this now when reloading my subscriptions:

$ grep twtxt.net .config/twtxt/config | wc -l
26
And now I've applied rate limits on every site to reasonable values 👌
@bender Isn't that why I'm yarning my progress 🤣
… aaaaaaand I had the first bug in my toy OS that was caused by caching. 😂 Bloody caching. (It only triggered in error conditions, but still.)
@prologic you are documenting everything, right? I am very interested in a HOWTO! ☺️
@kat Yeah, Java itself is somewhat “controversial”, I guess. 😅 But I’ve always found their documentation to be very pleasant to work with, at least that of the standard library.
[47°09′38″S, 126°43′33″W] Non-significant results -- sampling finished
Yesterday we put on a Pavarotti song, and now our little girl (almost 2 years old) keeps asking for the "paparoti song" and it's getting hard to cope :i_cant:
@movq woah it's like a cheatsheet with explanations! java is kind of arcane magic sorcery to me so i'm having trouble understanding it but i have that with most programming languages. this is like so much easier to actually look at and read instead of my eyes glazing over lol
@andros Sorry I missed your messages to #twtxt on IRC. There are people there, but it can take several hours to get a response. E.g. I check it every day or two. I recommend using an IRC bouncer. To answer your question about registries, I used a couple of registries when I first started out, to try to find feeds to follow, but haven't since then. I don't remember which ones, but they were easy to find with web searches.
#petpeeve - when in the middle of a #book series, the publisher decides the books should be 1cm taller
@kat Okay, horrible cookie popup aside, would you say this is easier to read? https://docs.oracle.com/javase/8/docs/api/java/util/List.html#method.summary 🤔
@prologic YEAH it's so cool!!! i was thinking about trying it as sorta practice for golang lol
@kat I've actually moved most of my stuff off of Cloudflare now 🤣 I'm actually very happy with my edge proxy setup that reverse proxies, caches and acts as a web application firewall 🥳
@kat Have you seen the SSG that I built and use on all my static sites? zs 🤔
Oh gawd. I can't enable caching on my edge proxy everywhere 😱 Some shit™ doesn't deal with a caching reverse proxy in front of it very well for some reason I don't have time to dig into right now 🤔
the windows CSS frameworks are sooo epic like you mean i can click a win aero button in my browser?!?! WITCHCRAFT!
morning yarn friends i've been playing with astro the SSG and it's a blast i see why my friends love it and rec it to everyone. i may think javascript was a mistake but this is super cool
@prologic that's iconic af though like i should do the same bc i hate cloudflare that much i just refuse to use them
@lyse oh nah it came out like that lol! i actually love how squished it looks it feels accurate lol

oh yeah i think i might have a tripod around but i do need a sandbag or something i could use as one. maybe yeah a giant bag of rice could work LOL. thanks for the tips!!! i took a video class last year in college and we worked with cameras and tripods with sandbags so it was on my mind
@lyse yeah! as long as it's fun :D experimenting with it like picking up the camera every once in a while to point somewhere else, or in editing inserting more video in between the static angles, that could be fun!
@movq this is why people like me can't code this is boring eyes glazing over kinda stuff lol
[47°09′39″S, 126°43′31″W] Taking samples
What's a reasonable per second or per minute rate limit that I could apply in general at my edge proxy for all clients? (_no matter what_) ... Like a good reasonable upper bound? 🤔
Spent 2 days traveling. Now it's time to stay at home and relax
@movq Yeah I swear to god the engineers that write this shit™ don't know how to write distributed crawlers that don't hammer the shit™ out of their targets 🤦‍♂️
@prologic Yeah, robots.txt or ai.txt are not worth the effort. I have them, but they get ignored. Just now, I saw a stupid AI bot hitting one of my blog posts like crazy. Not just once, but hundreds of times, over and over. 🤦🙄
For some reason, I was using calc all this time. I mean, it’s good, but I need to do base conversions (dec, hex, bin) *very* often and you have to type base(2) or base(16) in calc to do that. That’s exhausting after a while.

So I now replaced calc with a little Python script which always prints the results in dec/hex/bin, grouped in bytes (if the result is an integer). That’s what I need. It’s basically just a loop around Python’s exec().

$ mcalc
> 123
123 0x[7b] 0b[01111011]

> 1234
1234 0x[04 d2] 0b[00000100 11010010]

> 0x7C00 + 0x3F + 512
32319 0x[7e 3f] 0b[01111110 00111111]

> a = 10; b = 0x2b; c = 0b1100101
10 0x[0a] 0b[00001010]

> a + b + 3 * c
356 0x[01 64] 0b[00000001 01100100]

> 2**32 - 1
4294967295 0x[ff ff ff ff] 0b[11111111 11111111 11111111 11111111]

> 4 * atan(1)
3.141592653589793

> cos(pi)
-1.0
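
A minimal sketch of what such a loop around exec() might look like. This is a guess and not the actual mcalc script: the names `make_namespace`, `evaluate`, `format_result` and `repl` are hypothetical, and prefixing each input line with `_ = ` is just one assumed way to reproduce the transcript, where an assignment line prints the first assigned value.

```python
import math


def make_namespace():
    """Expose math functions and constants (atan, cos, pi, ...) by bare name."""
    return {name: getattr(math, name) for name in dir(math) if not name.startswith("_")}


def evaluate(line, namespace):
    """Run one input line via exec() and return the value bound to `_`."""
    # Assumption: "_ = a = 10; b = 0x2b" binds `_` to 10 and still runs the rest.
    exec(f"_ = {line}", namespace)
    return namespace["_"]


def format_result(value):
    """Show non-negative integers as dec/hex/bin grouped in bytes, anything else as-is."""
    if isinstance(value, int) and not isinstance(value, bool) and value >= 0:
        n_bytes = max(1, (value.bit_length() + 7) // 8)
        raw = value.to_bytes(n_bytes, "big")
        hex_part = " ".join(f"{b:02x}" for b in raw)
        bin_part = " ".join(f"{b:08b}" for b in raw)
        return f"{value} 0x[{hex_part}] 0b[{bin_part}]"
    return str(value)


def repl():
    """Read lines until EOF, printing each result followed by a blank line."""
    namespace = make_namespace()
    while True:
        try:
            line = input("> ")
        except EOFError:
            break
        if not line.strip():
            continue
        try:
            print(format_result(evaluate(line, namespace)))
        except Exception as exc:  # keep the calculator alive on typos
            print(f"error: {exc}")
        print()
```

Calling `repl()` starts the prompt. Since everything goes through exec(), this is only sensible as a personal tool fed with your own input.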
@doesnm No. I generally don't put up any robots.txt files at all really, because they mostly get ignored. I don't generally mind if "normal" web crawlers crawl things. But LLM(s) can go fuck themselves 🤣
Did you have a disallow rule in robots.txt? (I think not, because I can google several twtxt.net posts.)
@movq Yeah it's starting to piss me off too 🤣 Not nearly as much as that guy, but still. Anyway I'm having fun! Now I just need to find a good IP/Subnet list that I can blacklist entirely, ideally one that's updated frequently so I can refresh firewall rules.
@prologic You might (not) enjoy this blog post: https://pod.geraspora.de/posts/17342163