# I am the Watcher. I am your guide through this vast new twtiverse.
# 
# Usage:
#     https://watcher.sour.is/api/plain/users              View list of users and latest twt date.
#     https://watcher.sour.is/api/plain/twt                View all twts.
#     https://watcher.sour.is/api/plain/mentions?uri=:uri  View all mentions for uri.
#     https://watcher.sour.is/api/plain/conv/:hash         View all twts for a conversation subject.
# 
# Options:
#     uri     Filter to show a specific user's twts.
#     offset  Start index for query.
#     limit   Count of items to return (going back in time).
# 
# twt range = 1 6196
# self = https://watcher.sour.is?uri=https://lyse.isobeef.org/twtxt.txt&offset=5496
# next = https://watcher.sour.is?uri=https://lyse.isobeef.org/twtxt.txt&offset=5596
# prev = https://watcher.sour.is?uri=https://lyse.isobeef.org/twtxt.txt&offset=5396
I heard a funny saying today: Democracy is when three foxes and a bunny decide what to have for dinner.
I can't make it as I'm on a hike with a mate.
@prologic How is nick@domain any better than a feed URL? Changing the nick now also breaks threading. That's even worse than the current approach. Also, there might be multiple feeds with the same nick on one domain, e.g. on free hosters.
Phew! I now finally called it a day as well. Our customer wanted me to emergency-start implementing some changes. Got an initial version with unit tests, but the final testing must wait until Monday.
@mckinley I could have sworn that it resumed even a partial file the other week. But maybe that was because the first attempt used scp when the connection broke. And then rsync detected that only the last part of that file was incomplete and transferred the missing bits. So, lucky by accident. In any case, I will always include -P from now on. :-)
Ah, I see! Thanks, @bender.
@david Sounds lovely. :-)

We had rain all day long and my mate and I still went for a walk with our umbrellas. It was a bit wet. But now I can send my drying rack over the tub on its maiden voyage. Should have built a second rod for more capacity.
@david Enjoy the day off and fingers crossed that you survive without damages. Stay safe!
Good writeup, @anth! I agree with most of your points.

3.2 Timestamps: I feel no need to mandate UTC. Timezones are fine with me. But I could also live with this new restriction. I fail to see, though, how this change would make things any easier compared to the original format.

3.4 Multi-Line Twts: What exactly do you think is bad about multi-line twts?

4.1 Hash Generation: I do like the idea of a new uuid metadata field! Any thoughts on two feeds selecting the same UUID for whatever reason? Well, the same could happen today with url.

5.1 Reply to last & 5.2 More work to backtrack: I do not understand anything you're saying. Can you rephrase that?

8.1 Metadata should be collected up front: I generally agree, but if the uuid metadata field were a feed URL and not a real UUID, there should probably be an exception allowing the feed URL to change mid-file after a relocation.
I passed a mountain biker with a helmet camera in the forest, saw a four-centimeter-long black beetle that rolled onto its side to change direction and finally spotted three deer in the paddock. An hour well spent, I reckon.
Finally! After hours I figured out my problems.

1. The clever Go code to filter out completely read conversations got in the way once the filtering moved into SQL. Yeah, I also did not think that this could ever conflict. But it did. Initializing the completeConversationRead flag to true now got in my way; it caused a conversation to be removed. Simply deleting all the code around that flag solved it.

2. Generation of missing conversation roots in SQL simply used the oldest (smallest) timestamp from any direct reply in the tree. To find the missing roots, I grouped by subject and then aggregated using min(created_at). Now that I optimized this to only take unread messages into consideration in the first place, I do not necessarily see the oldest child anymore (when it's already read), so the timestamp then moves forward to the next oldest unread reply. As I do not care too much about an accurate timestamp for something made up, I just adjusted my test case accordingly. Good enough for me. :-)
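
For anyone following along, the root generation boils down to roughly this (a sketch only; the is_read flag is made up for illustration, my actual schema and query differ):

SELECT m.subject, MIN(m.created_at) AS created_at
FROM messages m
WHERE m.subject > ''                        -- replies only, roots have no subject
  AND m.is_read = 0                         -- hypothetical unread flag
  AND NOT EXISTS (SELECT 1 FROM messages r
                  WHERE r.hash = m.subject) -- the root itself is missing
GROUP BY m.subject;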

It's an interesting experiment with SQLite so far. I certainly did learn a few things along the way. Mission accomplished.
@prologic Ta! Somehow, my unit tests break, though. Running the same query manually seems to produce a plausible-looking result, however. I do not understand it.
@david As far as I understand it, auto-completion *is* working, that's the issue. :-D Instead of spamming the terminal with bucketloads of possibilities, zsh's auto-complete is nice enough to ask whether to proceed or not.
@david Weird, I always thought that rsync automatically resumes the up- or download when aborted. But the manual indicates otherwise with --partial (-P is --partial --progress).
@prologic I reckon, I could just hash the subject internally to get a shorter version.
Three feeds (prologic, movq and mine) and my database is already 1.3 MiB in size. Hmm. I actually got the read filter working. More on that later after polishing it.
@aelaraji @mckinley rsync -avzr with an optional --progress is what I always use. Ah, I could use the shorter -P, thanks @movq.
@movq Interesting, it's always good to know how things work under the hood. But I'm very glad that I do not have to deal with this low-level stuff. :-)
@prologic @movq Luckily, we were only touched by the thunderstorm cell. Even though the sky lit up a bunch and the thunder roared, there were no close thunderbolts. But it rained cats and dogs. The air smelled lovely.
@eapl.me All the best, see you next life around. :-) On Twtxt I only meet my online friends. I'm staying in touch with some of my real life mates on IRC or e-mail. But that's fine. That's just how it goes.

Thanks, @bender. :-)
@aelaraji Hahaha, brilliant! :-D
We're now having a thunderstorm with rain, lightning and thunder and the severe weather map shows all green. I'd expect it to be violet.
Okay, I figured out the cause of the broken output. I had also replaced the first subject = '' for the existing conversation roots with subject > ''. Somehow, my brain must have read it as subject <> ''. That equality check should not have been touched at all. I just updated the archive for anyone who is interested in following along: https://lyse.isobeef.org/tmp/tt2cache.tar.bz2 (151.1 KiB)
@prologic Yeah, relational databases are definitely not the perfect fit for trees, but I want to give it a shot anyway. :-)

Using EXPLAIN QUERY PLAN I was able to create two indices to avoid some table scans:

CREATE INDEX parent ON messages (hash, subject);
CREATE INDEX subject_created_at ON messages (subject, created_at);

Also, since strings are sortable, instead of str_col <> '' I now use str_col > '' to allow the use of an index.
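
To illustrate (a sketch; the plan output is paraphrased and varies between SQLite versions):

EXPLAIN QUERY PLAN
SELECT hash, subject, created_at FROM messages WHERE subject <> '';
-- SCAN messages

EXPLAIN QUERY PLAN
SELECT hash, subject, created_at FROM messages WHERE subject > '';
-- SEARCH messages USING INDEX subject_created_at (subject>?)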

But somehow, my output seems to be broken at the end, I just noticed. :-? Hmm.

The read status still gives me a headache. I think I either have to filter in the application or create more metadata structures in the database.

I'm wondering if anyone here has already used certain storage systems for tree data.
@prologic I see. I reckon it makes sense to combine 1 and 2, because if we change the hashing anyway, we don't break things twice.
This organigram example got me started: https://www.sqlite.org/lang_with.html#controlling_depth_first_versus_breadth_first_search_of_a_tree_using_order_by

But I feel execution times get worse rather quickly the more data I add. Also, caching helps tremendously: executing it for the first time took over 600ms, from then on I'm down to 40ms.

I think it's particularly bad that parents might be missing. Thus, I cannot use an index, because there is no parent to reference. But my database knowledge is fairly limited, so I have to read up on that.
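
For reference, my query is shaped roughly like the organigram example (a simplified sketch, not my exact query):

WITH RECURSIVE thread(hash, created_at, level) AS (
    SELECT hash, created_at, 0
    FROM messages
    WHERE subject = ''          -- conversation roots carry no subject
    UNION ALL
    SELECT m.hash, m.created_at, t.level + 1
    FROM messages m
    JOIN thread t ON m.subject = t.hash
    ORDER BY 3 DESC             -- depth-first, as in the SQLite docs
)
SELECT hash, created_at, level FROM thread;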
There you go, @prologic, the SQLite database (with a bit more data now) and the sqlitebrowser project file containing the query: https://lyse.isobeef.org/tmp/tt2cache.tar.bz2 (133.9 KiB)
@falsifian I agree. It's an optional header.
@movq Oha! @bender Happy cooling off!
@prologic Well, mentions are also quite lengthy as they always include the feed URL. I know, that's not a good argument.

I just got a very, very wild idea that I have not put any brain power into, so it might be totally stupid: Since many replies also mention the original feed, maybe a mention and a thread identifier could be combined, something like: @<nick url timestamp>. But then we would also need another style for when one does not want to mention the original author.

So, scratch that. But I put it out there anyway. Maybe this inspires someone else to come up with something neat.
@prologic Not sure how many actually care about a 140 character limit. I don't. Not at all.
@prologic I'm wondering what exactly you mean by incremental changes, what are the individual ones? What do you have in mind?
@prologic I find it quite hard to rank the facets. Some go hand in hand or depend on the protocol over which a feed is offered. I feel some are only relevant to specific clients. I'm sure people interpret some of them differently.

I'm curious, is it possible to see each individual poll submission?
I'm experimenting with SQLite and trees. It's going well so far with only my own main feed (439 messages from a few days ago) in the cache. Fetching these 632 rows took 20ms:

SQL query to build up the conversation trees in the cache

Now comes the real tricky part, how do I exclude completely read threads?
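
One idea I might try (a sketch only, assuming a hypothetical is_read flag per message):

-- keep only threads that still contain at least one unread message
SELECT subject
FROM messages
GROUP BY subject
HAVING SUM(is_read = 0) > 0;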
@movq Heaps of mozzies and other stuff that wants to eat you. Yeah, I noticed that as well. But I don't know if it's really more than usual. I might just have forgotten by now how bad it was in the past. :-?

With the wet beginning this year, water-loving insects certainly got a head start.
Voilà: https://git.mills.io/yarnsocial/yarn/pulls/1181
@prologic Correct. The plan is that operators have to manually trust a peer before it is used for fetching missing conversation roots. Preview of the horrible UI:

New trust level management in the Peer Management page
@bender Yeah, it was nice. 23°C and a bit of wind. Quite acceptable in my opinion. :-)
@prologic @movq In all reality, even seconds precision would be enough for this new feed announcement bot. It just has to delay or predate its messages. It hopefully does not find new feeds all the time. :-)
@prologic What should happen if the archive chain is detected to be broken? I don't think that including the hash in the prev field really helps us in practice. What if messages in the archive feed themselves got lost? You can't detect this unless you already knew about them. I reckon we can simply use the relative path and call it good. I know, I know, we have this format already today. But in my opinion, the hash does not add value.
@prologic The Content-Type should probably even include the charset=utf-8 as we learned recently. :-) Iff you want to keep the UTF-8 encoding mandatory. It doesn't say anything about it in that document.
@prologic The reply-to can come anywhere in the message text? Most examples even put it at the very end. Why relax that? It currently has to be at the beginning, which I think makes parsing easier. I have to admit, at the end makes reading the raw feed nicer. But multi-line messages with U+2028 ruin the raw feed reading experience very quickly.
@prologic For hash calculation we could maybe rethink the newlines and use tabs instead. This is more in line with the twtxt file format itself. With tabs it also is much closer to the registry format (minus the nick).
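
Off the top of my head, the hash input would then look something like this (an illustration only, with <TAB> standing in for a literal tab character; URL and text are made up):

https://example.com/twtxt.txt<TAB>2024-09-24T12:34:56+02:00<TAB>Hello world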

What about the timestamp format? Just verbatim as it appears in the feed (what I would recommend) or any other shenanigans with normalization, like +00:00 → Z?

An append style is not required, btw. If one uses prepend style feeds, the new URL simply comes at the beginning of the file, where the old URL is further down.

Clients must use the full-length hash in their storages, but only use the first eleven digits when referencing? This differentiation is a bit odd.
@prologic The multiline example is broken. I don't see any "pipes".
@prologic I notice that in your document it says reply-to, where in the ReplyTo Extension it's without the hyphen. (But they also use different values after the colon. :-))
Thanks again for typing it up, @movq! I left a few comments there. Currently, I'm in favor of the location-based addressing, that's heaps simpler.
@sorenpeter Excellent point! I agree.
@bender @prologic @aelaraji Everything entering over Pod Gossiping is only cached temporarily, but never archived. So, it eventually fell off the cache. If my fake feeds were still up, yarnd would have pulled it from me again. I ran into the situation locally as well and then got it back, though.
@movq Awesome, thank you very much! I'll have a look at it tomorrow.
It was beautiful in nature: https://lyse.isobeef.org/waldspaziergang-2024-09-21/

Fresh hay bales on a field
@prologic Let me try:

Invent anything you want, say feed A writes message text B at timestamp C. You simply create the hash D for it and reply to precisely that D as subject in your own feed E with your message text F at timestamp G. This gets hashed to H.

Now then, some client J fetches your feed E. It sees your response from time G with text F where in the subject you reference hash D. Since client J does not know about hash D, it simply asks some peers about it. If it happens to query your yarnd for it, you could happily serve it your invention: "You wanna know about hash D? Oh, that's easy, feed A wrote B at time C."

The client J then verifies it and since everything lines up, it looks legitimate, so it puts this record in its cache or displays it to the user or whatever. It does not even matter if the client J follows feed A or not. The message text B at C with hash D could have just been deleted or edited in the meantime.

Congrats, you successfully spread rumors. :-D
@prologic This does not hold if the edit happened before I even got the original.
@falsifian Something similar exists over at https://search.twtxt.net/. But a usable search engine would actually be nice (to be fair, yarns improved a bit). :-) I don't care about feed changes over time. In fact, it would even feel creepy to me. Of course, anyone could still surveil, but I'm not looking forward to these stats.
@movq We could still let the client display a warning if it cannot verify it. But yeah.
@movq Reminds me of this beautiful face recognition failure: https://qz.com/823820/carnegie-mellon-made-a-special-pair-of-glasses-that-lets-you-steal-a-digital-identity :-D
@prologic What exactly?
@prologic Just what @bender did. :-D If he'd additionally serve the fake message from his yarnd twt endpoint, everybody querying that hash from him (or any other yarnd that synced it in the meantime) would believe that I didn't like Australians.

In fact, I really don't. I love 'em! 8-)

We would need to sign each message in a feed, so others could verify that this was actually part of that feed and not made up. But then we end up in the crypto debate for identities again, which I'm not a big fan of. :-)

I just want to highlight that one might get a false sense of message authenticity if one only briefly looks at the hashes.
@movq Ah, cool. :-)
It just occurred to me that we're now building some kind of control structures or commands into feeds with (edit:…) and (delete:…). It's not just a simple "add this to your cache" or "replace the cache with this set of messages" anymore. Hmm. We might need to think about the consequences of that: can this be exploited somehow, etc.?
@movq Not sure if I like the idea of keeping the original message around. It goes against the spirit of an edit in my mind.

If that's what we want to enforce, forget about my other message above in the thread.
@prologic @movq I still don't understand it. If the original message has been replaced with the edited one, I cannot verify that the original was in the same feed. I don't know the original text.
Hahahahahaahaaaahaaaaaa, brilliant! I love it, @bender! :'-D
@movq Thanks for the summary!

So, what would happen if there is no original message anymore in the feed and you encounter an "edit" subject? Since you cannot verify that the feed contained it in the first place, would you obey it?

Some feed could just make a client update something from a different feed. In the cache, the client would need to store a flag noting that this message was updated, so that when it later encounters the message from the real feed, it has a chance of reverting that bogus edit. Hmm. The devil is in the details.

It's much easier with a delete subject. When it finds the message in its cache and the feeds match, remove it. Otherwise, just ignore it.
@movq Right. That's why I'd bite the bullet and go for huge URLs. :-)

I haven't looked at the code and I'm too lazy right now, but does jenny also verify the fetched result against the hash?
@movq Yeah, but hashing also uses the main feed URL or whatever is written in the feed's first url metadata field. So, it's not a new problem, it's exactly the same.
@movq @david Yeah, he got a bit older but I could still easily recognize him.
Another thing: At the moment, anyone could claim that some feed contained a certain message which was then removed again, simply by computing the hash over a fake message and an invented timestamp for said feed. Nobody can ever verify that it was completely made up and never existed in the first place. So, our twt hashes have to be taken with a grain of salt.
@david Cool idea actually! The hash would also be shorter than the raw URL and timestamp.
@prologic I get where you're coming from. But is it really that bad in practice? If you follow any link somewhere on the web, you also don't know if its contents have been changed in the meantime. Is that a problem? Almost never in my experience.

Granted, it's a nice property when one can tell that it was not messed with since the author referenced it.
@movq The more I think about it, the more I like the location-based addressing. That feels fairly in line with the spirit of twtxt, just like you stated somewhere else.

The big downside for me is that the subjects then become super long.

And if the feed relocates, we end up with broken conversation trees again. Just like nowadays. At least it's not getting worse. :-)

Using the feed URL in there might become a little challenging for new folks, when the twt rotates away into archive feeds. But I reckon, we already have a similar situation with the hashes. So, probably not too bad.
@quark Yeah, let's see what they reveal!
Nice, @david! The winter palms look nice. And the sky is full of snow.
Yesterday, both temperature and wind picked up. There was even wind in the night, which is rare over here. Today, we also got a lot of sunshine, around 22°C and heaps of wind. The leaves and twigs were blown up against the house door; it reminded me of a snow drift, basically a leaf bank. I should have taken a photo before I swept it away, it looked quite bizarre.

But I photographed something else instead:

Possibly a large roof panel on a crane

My mate and I went out in the woods earlier and we came across 08, which broke off at roughly 6 or 7 meters from 09. When it hit the ground, it made a 30 cm deep hole. Quite impressive. https://lyse.isobeef.org/waldspaziergang-2024-09-19/
@falsifian Yeah, delete requests feel very odd.
@prologic I wish that was true! But I reckon there is still heaps of old stuff out there, that was created on a Windows machine. :-D And I wouldn't be surprised if even today in that environment a new file does not make use of UTF-8.
@quark I'm not convinced. :-D
@quark @movq Yep, they're all RFC3339. Obviously, +02:00 and +01:00 are best, because I use them! :-P In all seriousness, Z might be the best timezone, as it is shortest. And regarding privacy, it leaks the least information about the user's rough location. But of course, one can just look at the activity and narrow down plausible regions, so that's a weak argument.
@falsifian I can confirm, it's fixed. Thank you! Indeed, this is some wild quoting.

I still do not understand why the encoding suddenly broke, though. :-? Anyway, I'll concentrate on my rewrite and do things the right™ way. ;-) Still a long way to go.
@bender I know, I know… A relative time in a static HTML document is questionable at best. ;-)
Now WTF!? Suddenly, @falsifian's feed renders broken in my tt Python implementation. Exactly what I had with my Go rewrite. I haven't touched the Python stuff in ages, though. Also, tt and tt2 do not share any data at all.

By any chance, did you remove the ; charset=utf-8 from your Content-Type: text/plain header, falsifian?

interpreted in some crappy windows charset
@movq Non-ASCII characters were broken. Like U+2028, degrees (°), etc.

Turns out I used a silly library to detect the encoding and transform to UTF-8 if needed. When there is no Content-Type header, like for local files, it looks at the first 1024 bytes. Since it only saw ASCII in that region, the damn thing assumed the data to be in Windows-1252 (which for web pages kinda makes sense):

// TODO: change default depending on user's locale?
return charmap.Windows1252, "windows-1252", false

https://cs.opensource.google/go/x/net/+/master:html/charset/charset.go;l=102

This default is hardcoded and cannot be changed.

Trying to be smart and adding automatic support for other encodings turned out to be a bad move on my end. At least I can reduce my dependency list again. :-)

I now just reject everything that explicitly specifies something other than text/plain with an optional charset other than utf-8 (ignoring case). Otherwise, I assume it's UTF-8 (just like the twtxt file format specification mandates) and hope for the best.
Hmmmm, I somehow run into an encoding problem where my inserted data end up mangled in the database. But, both SQLite and Go use UTF-8. What's happening here? :-?
@prologic Correct. :-D
@prologic I'm basically with @movq, but in contrast to him, I'm not looking forward to implementing something like that. :-)

A feed URL is plenty good enough for me. Since I only fetch feeds that I explicitly follow, there is some basic trust in those feeds already. Spoofing, impersonation and what not are no issues for me. If I were to find out otherwise, I'd just unsubscribe from the evil feed. Done.

To retrieve public feeds, I just rely on TLS. Most are served via HTTPS. If a feed is down, I'm not trying to fetch it from some other source, I just wait and try again later. So signed messages/feeds are not a use case I'm particularly benefitting from.

To me, it's just not worth it at all to add this crypto complexity on top.
Found it: https://github.com/buckket/twtxt/issues/157
@prologic Yeah, but I reckon we can kill both birds with one stone. If we change it to support edits, it should be fairly easy to also tweak it to support feed URL changes. Like outlined in my first reply: https://twtxt.net/twt/n4omfvq The URL part sounds way easier to me. :-)
@sorenpeter There was or maybe still is a competing proposal for multiline twts that combines all twts with the same timestamp to one logical multiline twt. Not sure what happened to that, if it is used in the wild and whether anyone "here" follows a feed with that convention. "Our" solution for multiline twts is to use U+2028 Unicode LINE SEPARATOR as a newline: https://dev.twtxt.net/doc/multilineextension.html.
@movq What's your definition of "complete thread"? ;-) There might be feeds participating in the conversation that you have no idea of.

But yes, this has a nice discoverability bonus. And even simpler than a hash, that's right.
@movq Yeah, I think so.
Keys for identity are too much for me. This steps up the complexity by a lot. Simplicity is what made me join twtxt with its extensions. A feed URL is all I need.

Eventually, twt hashes have to change (lengthen at least), no doubt about that. But I'd like to keep it equally simple.
@prologic When the next hype train departs. :-)
@stigatle Yeah, the sudden drop makes it feel worse than it is. It made me wear a beanie and gloves on my bike ride on Friday evening. In a few weeks, I'll consider the same temperatures a non-issue, maybe even nicely warm. ;-) The body is fairly quick to adapt, but not that fast.

I just saw that we're supposed to hit 19°C mid next week again. Let's see.
@off_grid_living Oh dear, what an epic adventure! Terrible at the time, but hilarious to tell later on. :-D

I do like this photo a lot. It brings up memories of cool scouting trips.
@off_grid_living Hahaha, this is really great, I love it! :-D
@off_grid_living Still a bit different, but this reminds me of the rusk boy on the Brandt boxes which is kinda iconic over here: https://cdn.idealo.com/folder/Product/2151/8/2151814/s1_produktbild_max/brandt-der-markenzwieback-225-g.jpg They should switch to this photo. :-)
@off_grid_living It's kinda cool to see how small cars were back in the days. Especially the left one looks really tiny.
Happy birthday @prologic! :-)
Ta, @bender! Correct, apart from resizing, no further processing on my end. That's just the Japanese sunset photo engineer's magic. :-) In all its original glory (3.2 MiB): https://lyse.isobeef.org/abendhimmel-2024-09-13/02.JPG
@off_grid_living Looks like you're describing a captcha. They do not really work. Bots seem to solve them, too.
@movq Thanks! Yeah, one week for autumn and spring must be enough. Or so the weather thinks. Looks like there is only on or off.