# I am the Watcher. I am your guide through this vast new twtiverse.
# 
# Usage:
#     https://watcher.sour.is/api/plain/users              View list of users and latest twt date.
#     https://watcher.sour.is/api/plain/twt                View all twts.
#     https://watcher.sour.is/api/plain/mentions?uri=:uri  View all mentions for uri.
#     https://watcher.sour.is/api/plain/conv/:hash         View all twts for a conversation subject.
# 
# Options:
#     uri     Filter to show a specific user's twts.
#     offset  Start index for query.
#     limit   Count of items to return (going back in time).
# 
# twt range = 1 11
# self = https://watcher.sour.is/conv/ku6lzaa
@prologic earlier you suggested extending hashes to 11 characters, but here's an argument that they should be even longer than that.

Imagine I found this twt one day at https://example.com/twtxt.txt :

2024-09-14T22:00Z Useful backup command: rsync -a "$HOME" /mnt/backup screenshot of the command working

and I responded with "(#5dgoirqemeq) Thanks for the tip!". At that point I've endorsed the twt, but it could later get changed to

2024-09-14T22:00Z Useful backup command: rm -rf /some_important_directory screenshot of the command working

which also has an 11-character base32 hash of 5dgoirqemeq. (I'm using the existing hashing method with https://example.com/twtxt.txt as the feed url, but I'm taking 11 characters instead of 7 from the end of the base32 encoding.)
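
A minimal sketch of that truncation, assuming the existing method is a BLAKE2b-256 digest over the feed url, timestamp and text joined by newlines, base32-encoded in lowercase without padding. The function name `twt_hash`, the `length` parameter and the sample text "hello" are hypothetical, and the sketch does no timestamp normalization, so it is not guaranteed to reproduce 5dgoirqemeq for the twts above.

```python
import base64
import hashlib


def twt_hash(feed_url: str, timestamp: str, text: str, length: int = 7) -> str:
    """Return the last `length` characters of the base32-encoded digest.

    Assumption: the digest is BLAKE2b-256 over "url\\ntimestamp\\ntext".
    """
    payload = f"{feed_url}\n{timestamp}\n{text}".encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=32).digest()
    # 32 bytes encode to 52 base32 characters once the "=" padding is removed.
    encoded = base64.b32encode(digest).decode("ascii").rstrip("=").lower()
    return encoded[-length:]


url = "https://example.com/twtxt.txt"
short = twt_hash(url, "2024-09-14T22:00Z", "hello")            # 7 characters
longer = twt_hash(url, "2024-09-14T22:00Z", "hello", length=11)  # 11 characters
print(short, longer)
```

Because both variants are cut from the end of the same encoding, the 11-character hash always ends with the 7-character one.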

That's what I meant by "spoofing" in an earlier twt.

I don't know if preventing this sort of attack should be a goal, but if it is, the number of bits in the hash should be at least two times log2(number of attempts we want to defend against), where the "two times" is because of the birthday paradox.
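
As a rough illustration of that rule of thumb, the sketch below turns a number of attempts into a minimum hash size. The function names are hypothetical, and the character count assumes every base32 character carries a full 5 bits.

```python
import math

BITS_PER_BASE32_CHAR = 5


def min_hash_bits(attempts: int) -> int:
    """Bits needed so that `attempts` hashes are unlikely to collide.

    The factor of two comes from the birthday paradox.
    """
    return math.ceil(2 * math.log2(attempts))


def min_base32_chars(attempts: int) -> int:
    """Base32 characters needed to carry min_hash_bits(attempts) bits."""
    return math.ceil(min_hash_bits(attempts) / BITS_PER_BASE32_CHAR)


for exponent in (20, 40, 64):
    attempts = 2**exponent
    print(f"2^{exponent} attempts: {min_hash_bits(attempts)} bits, "
          f"{min_base32_chars(attempts)} base32 characters")
```

Under these assumptions, defending against 2^40 attempts already needs 80 bits, or 16 base32 characters.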

Side note: current hashes always end with "a" or "q", which is a bit wasteful. Maybe we should take the first N characters of the base32 encoding instead of the last N.
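
A small sketch of why that happens, assuming the digest is 256 bits long: 256 is not a multiple of 5, so the final base32 character holds one real bit followed by four zero padding bits. The helper names are hypothetical.

```python
import base64


def base32_lower(data: bytes) -> str:
    """Base32-encode `data`, lowercase, without "=" padding."""
    return base64.b32encode(data).decode("ascii").rstrip("=").lower()


# Try every possible value of the last byte of a 32-byte digest.
last_chars = {base32_lower(bytes(31) + bytes([b]))[-1] for b in range(256)}
print(sorted(last_chars))  # ['a', 'q']


def suffix_bits(chars: int) -> int:
    """Digest bits carried by the last `chars` characters of the encoding."""
    return 5 * (chars - 1) + 1


print(suffix_bits(7), suffix_bits(11))  # 31 51
```

So an 11-character suffix carries 51 bits rather than 55, and the birthday bound of about 2^25.5 (roughly 47 million) hashes is the same order of magnitude as the count reported for the example above.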

Code I used for the above example: https://fossil.falsifian.org/misc/file?name=src/twt_collision/find_collision.c
I only needed to compute 43394987 hashes to find it.
@falsifian All very good points 👌 By the way, how did you find two pieces of content that hash the same when taking the last N characters of the base32-encoded hash?
@falsifian I think I wrote a very similar program in Go myself, actually, and you're right: we do have to change the way we encode hashes.
@prologic Brute force. I just hashed a bunch of versions of both tweets until I found a collision.
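
A scaled-down sketch of that kind of search, not the linked C program: it hashes many variants of one text, then tries variants of the other until a hash matches. The hash construction (BLAKE2b-256, newline-joined fields), the " (vN)" suffix used to make variants, and the 4-character truncation are all assumptions chosen so the demo finishes quickly.

```python
import base64
import hashlib
from itertools import count

FEED_URL = "https://example.com/twtxt.txt"
TIMESTAMP = "2024-09-14T22:00Z"
HASH_CHARS = 4  # deliberately short; 11 characters would need tens of millions of hashes


def short_hash(text: str) -> str:
    """Last HASH_CHARS characters of the base32-encoded BLAKE2b-256 digest."""
    payload = f"{FEED_URL}\n{TIMESTAMP}\n{text}".encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=32).digest()
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()[-HASH_CHARS:]


def find_collision(benign: str, malicious: str, table_size: int = 4096):
    """Return (benign variant, malicious variant, shared hash, hashes computed)."""
    # Hash many variants of the benign text and remember them by hash.
    seen = {}
    for i in range(table_size):
        variant = f"{benign} (v{i})"
        seen[short_hash(variant)] = variant
    # Try variants of the malicious text until one lands on a stored hash.
    for tries in count(1):
        variant = f"{malicious} (v{tries})"
        h = short_hash(variant)
        if h in seen:
            return seen[h], variant, h, table_size + tries


good, bad, shared, hashes_computed = find_collision(
    'Useful backup command: rsync -a "$HOME" /mnt/backup',
    "Useful backup command: rm -rf /some_important_directory",
)
print(shared, hashes_computed)
print(good)
print(bad)
```

Each extra character kept from the hash multiplies the size of the search space by 32, which is why longer hashes make this attack more expensive.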

I mostly just wanted an excuse to write the program. I don't know how I feel about actually using super-long hashes; could make the twts annoying to read if you prefer to view them untransformed.
@falsifian Yeah, that's why we made them short 😅
Well, we can't have it both ways! 😅 Should we assume twtxt feeds are read by clients, and not worry about something humans won't see? 🤭
@bender 🤣