`scp` when the connection broke. And then `rsync` detected that only the last part of that file was incomplete and transferred the missing bits. So, lucky by accident. In any case, I will always include `-P` from now on. :-)
We had rain all day long and my mate and I still went for a walk with our umbrellas. It was a bit wet. But now I can send my drying rack over the tub on its maiden voyage. Should have built a second rod for more capacity.
3.2 Timestamps: I feel no need to mandate UTC. Timezones are fine with me. But I could also live with this new restriction. I fail to see, though, how this change would make things any easier compared to the original format.
3.4 Multi-Line Twts: What exactly do you think is bad about multi-line twts?
4.1 Hash Generation: I do like the idea with a new `uuid` metadata field! Any thoughts on two feeds selecting the same UUID for whatever reason? Well, the same could happen today with `url`.
5.1 Reply to last & 5.2 More work to backtrack: I do not understand anything you're saying. Can you rephrase that?
8.1 Metadata should be collected up front: I generally agree, but if the `uuid` metadata field were a feed URL and not a real UUID, there should probably be an exception to allow changing the feed URL mid-file after relocation.
1. The clever Go code to filter out completely read conversations got in the way once the filtering moved into SQL. Yeah, I also did not think that this could ever conflict. But it did. Initializing the `completeConversationRead` flag to `true` now got in my way, as it caused a conversation to be removed. Simply deleting all the code around that flag solved it.
2. Generation of missing conversation roots in SQL simply used the oldest (smallest) timestamp from any direct reply in the tree. To find the missing roots, I grouped by subject and then aggregated using `min(created_at)`. Now that I optimized this to only take unread messages into consideration in the first place, I do not necessarily see the smallest child anymore (when it's already read), so the timestamp is then moved forward to the next oldest unread reply. As I do not care too much about an accurate timestamp for something made up, I just adjusted my test case accordingly. Good enough for me. :-) (A sketch of that query is below.)

It's an interesting experiment with SQLite so far. I certainly did learn a few things along the way. Mission accomplished.
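For the curious, point 2 boils down to something along these lines; a sketch with placeholder table and column names (`messages`, `subject`, `created_at`, `is_read`), not my actual schema:

```go
// Missing roots: subjects that no cached message's own hash matches.
// The made-up root gets the timestamp of the oldest unread reply.
const missingRootsSQL = `
SELECT subject         AS hash,
       MIN(created_at) AS created_at
FROM   messages
WHERE  is_read = 0
  AND  subject <> ''
  AND  subject NOT IN (SELECT hash FROM messages)
GROUP  BY subject;
`
```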
`--partial` (`-P` is `--partial --progress`).
`rsync -avzr` with an optional `--progress` is what I always use. Ah, I could use the shorter `-P`, thanks @movq.
Thanks, @bender. :-)
`subject = ''` for the existing conversation roots with `subject > ''`. Somehow, my brain must have read `subject <> ''`. That equality check should not have been touched at all. I just updated the archive for anyone who is interested to follow along: https://lyse.isobeef.org/tmp/tt2cache.tar.bz2 (151.1 KiB)
Using `EXPLAIN QUERY PLAN` I was able to create two indices to avoid some table scans:

```sql
CREATE INDEX parent ON messages (hash, subject);
CREATE INDEX subject_created_at ON messages (subject, created_at);
```

Also, since strings are sortable, instead of `str_col <> ''` I now use `str_col > ''` to allow the use of an index. (A quick way to check the resulting plans is sketched below.)

But somehow, my output seems to be broken at the end for some reason, I just noticed. :-? Hmm.
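If anyone wants to poke at their own database, this is roughly how I check whether the indices are picked up; a sketch assuming the mattn/go-sqlite3 driver and a made-up database file name:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/mattn/go-sqlite3" // assumption: any SQLite driver would do
)

func main() {
	db, err := sql.Open("sqlite3", "cache.db") // hypothetical cache file
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// EXPLAIN QUERY PLAN reports whether SQLite plans a
	// "SEARCH ... USING INDEX subject_created_at" or a full "SCAN messages".
	rows, err := db.Query(`EXPLAIN QUERY PLAN
		SELECT hash FROM messages WHERE subject > '' ORDER BY created_at`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var id, parent, notused int
		var detail string
		if err := rows.Scan(&id, &parent, &notused, &detail); err != nil {
			log.Fatal(err)
		}
		fmt.Println(detail)
	}
}
```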
The read status still gives me a headache. I think I either have to filter in the application or create more metadata structures in the database.
I'm wondering if anyone here has already used certain storages for tree data.
But I feel execution times get worse rather quickly the more data I add. Also, caching helps tremendously: executing it for the first time took over 600ms. From then on, I'm down to 40ms.
I think it's particularly bad that parents might be missing. Thus, I cannot use an index, because there is no parent to reference. But my database knowledge is fairly limited, so I have to read up on that.
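The textbook answer for tree data in plain SQLite seems to be a parent reference plus a recursive CTE. A sketch with a hypothetical `parent_hash` column, which of course presumes the parent row exists:

```go
// threadSQL walks a thread top-down from a root hash. parent_hash is a
// hypothetical column; as said above, missing parents are exactly what
// makes a hard reference (or foreign key) impossible here.
const threadSQL = `
WITH RECURSIVE thread(hash, created_at) AS (
    SELECT hash, created_at FROM messages WHERE hash = ?
    UNION ALL
    SELECT m.hash, m.created_at
    FROM   messages m
    JOIN   thread t ON m.parent_hash = t.hash
)
SELECT hash FROM thread ORDER BY created_at;
`
```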
I just got a very, very wild idea that I have not put any brain power into, so it might be totally stupid: Since many replies also mention the original feed, maybe a mention and a thread identifier could be combined, something like `@<nick url timestamp>`. But then we would also need another style if one does not want to mention the original author.

So, scratch that. But I put it out there anyway. Maybe this inspires someone else to come up with something neat.
I'm curious, is it possible to see each individual poll submission?

Now comes the really tricky part: how do I exclude completely read threads?
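One idea I might try first; a sketch with placeholder column names, where `is_read` is a 0/1 flag: keep a subject only if it still has at least one unread message.

```go
// Threads with nothing unread left simply drop out of the result.
const unreadThreadsSQL = `
SELECT subject
FROM   messages
GROUP  BY subject
HAVING MIN(is_read) = 0;
`
```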
With the wet beginning this year, water-loving insects certainly got a head start.

`prev` field does really help us in reality. What if messages in the archive feed themselves got lost? You can't detect this unless you already knew about them. I reckon we can simply use the relative path and call it good. I know, I know, we have this format already today. But in my opinion, the hash does not add value.
`Content-Type` should probably even include the `charset=utf-8`, as we learned recently. :-) Iff you want to keep the UTF-8 encoding mandatory, that is. It doesn't say anything about it in that document.
`reply-to` can come anywhere in the message text? Most examples even put it at the very end. Why relax that? It currently has to be at the beginning, which I think makes parsing easier. I have to admit, at the end makes reading the raw feed nicer. But multi-line messages with U+2028 ruin the raw feed reading experience very quickly.

What about the timestamp format? Just verbatim as it appears in the feed (what I would recommend) or any other shenanigans with normalization, like `+00:00` → `Z`? (See the quick check below.)

An append style is not required, btw. If one uses prepend-style feeds, the new URL simply comes at the beginning of the file, where the old URL is further down.
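To illustrate why this matters: the two spellings denote the same instant but are different strings, so clients have to agree on one form when comparing or hashing subjects. A quick check in Go:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Both spellings are valid RFC3339; they parse to the same instant
	// even though they differ byte-for-byte.
	a, _ := time.Parse(time.RFC3339, "2024-09-19T10:00:00+00:00")
	b, _ := time.Parse(time.RFC3339, "2024-09-19T10:00:00Z")
	fmt.Println(a.Equal(b)) // true
}
```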
Clients must use the full-length hash in their storages, but only use the first eleven digits when referencing? This differentiation is a bit odd.
`reply-to`, where in the ReplyTo Extension it's without the hyphen. (But they also use different values after the colon. :-))

Invent anything you want; say feed A writes message text B at timestamp C. You simply create the hash D for it and reply to precisely that D as subject in your own feed E with your message text F at timestamp G. This gets hashed to H.
Now then, some client J fetches your feed E. It sees your response from time G with text F, where in the subject you reference hash D. Since client J does not know about hash D, it simply asks some peers about it. If it happens to query your yarnd for it, you could happily serve it your invention: "You wanna know about hash D? Oh, that's easy, feed A wrote B at time C."
Client J then verifies it, and since everything lines up, it looks legitimate and puts this record in its cache or displays it to the user or whatever. It does not even matter if client J follows feed A or not. The message text B at C with hash D could have just been deleted or edited in the meantime.
Congrats, you successfully spread rumors. :-D
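To spell out why the verification succeeds; a minimal sketch of the hash computation as I remember the Twt Hash construction (assumptions: blake2b-256 over URL, timestamp and text, lowercase base32, last 7 characters):

```go
package main

import (
	"encoding/base32"
	"fmt"
	"strings"

	"golang.org/x/crypto/blake2b"
)

// Anyone holding the triple (A, C, B) can recompute D, so a matching
// hash only proves the triple is self-consistent, not that feed A ever
// actually published B.
func twtHash(url, timestamp, text string) string {
	sum := blake2b.Sum256([]byte(fmt.Sprintf("%s\n%s\n%s", url, timestamp, text)))
	enc := base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(sum[:])
	return strings.ToLower(enc)[len(enc)-7:]
}
```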
In fact, I really don't. I love 'em! 8-)
We would need to sign each message in a feed, so others could verify that this was actually part of that feed and not made up. But then we end up in the crypto debate for identities again, which I'm not a big fan of. :-)
I just want to highlight that one might get a false sense of message authenticity if one just briefly looks at the hashes.
`(edit:…)` and `(delete:…)` into feeds. It's not just a simple "add this to your cache" or "replace the cache with this set of messages" anymore. Hmm. We might need to think about the consequences of that: can this be exploited somehow, etc.?
If that's what we want to enforce, forget about my other message above in the thread.
So, what would happen if there is no original message anymore in the feed and you encounter an "edit" subject? Since you cannot verify that the feed contained it in the first place, would you obey it?
Some feed could just make a client update something from a different feed. In the cache, the client would need to store a flag indicating that this message was updated, so that when it later encounters the message from the real feed, it has a chance of reverting that bogus edit. Hmm. The devil is in the detail.
It's much easier with a delete subject. When it finds the message in its cache and the feeds match, remove it. Otherwise, just ignore it.
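In cache terms, that delete rule is tiny. A sketch with hypothetical types and field names:

```go
package main

// Twt is a minimal, made-up cache record.
type Twt struct {
	Feed string // URL of the feed this twt was fetched from
	Text string
}

// applyDelete follows the rule above: honour a delete subject only if
// the twt is cached and the feeds match; otherwise silently ignore it.
func applyDelete(cache map[string]Twt, feedURL, hash string) {
	if twt, ok := cache[hash]; ok && twt.Feed == feedURL {
		delete(cache, hash)
	}
}
```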
I haven't looked at the code and I'm too lazy right now, but does jenny also verify the fetched result against the hash?
`url` metadata field. So, it's not a new problem, it's exactly the same.
Granted, it's a nice property when one can tell that it was not messed with since the author referenced it.
The big downside for me is that the subjects then become super long.
And if the feed relocates, we end up with broken conversation trees again. Just like nowadays. At least it's not getting worse. :-)
Using the feed URL in there might become a little challenging for new folks when the twt rotates away into archive feeds. But I reckon we already have a similar situation with the hashes. So, probably not too bad.
But I photographed something else instead:

My mate and I went out in the woods earlier and we came across 08, which broke off roughly 6 or 7 meters up from 09. When it hit the ground, it made a 30 cm deep hole. Quite impressive. https://lyse.isobeef.org/waldspaziergang-2024-09-19/
`+02:00` and `+01:00` are best, because I use them! :-P In all seriousness, `Z` might be the best timezone, as it is the shortest. And regarding privacy, it leaks the least information about the user's rough location. But of course, one can just look at the activity and narrow down plausible regions, so that's a weak argument.
I still do not understand why the encoding suddenly broke, though. :-? Anyway, I'll concentrate on my rewrite and do things the right™ way. ;-) Still a long way to go.
By any chance, did you remove the `; charset=utf-8` from your `Content-Type: text/plain` header, falsifian?
Turns out I used a silly library to detect the encoding and transform to UTF-8 if needed. When there is no Content-Type header, like for local files, it looks at the first 1024 bytes. Since it only saw ASCII in that region, the damn thing assumed the data to be in Windows-1252 (which for web pages kinda makes sense):

```go
// TODO: change default depending on user's locale?
return charmap.Windows1252, "windows-1252", false
```

https://cs.opensource.google/go/x/net/+/master:html/charset/charset.go;l=102

This default is hardcoded and cannot be changed.
Trying to be smart and adding automatic support for other encodings turned out to be a bad move on my end. At least I can reduce my dependency list again. :-)
I now just reject everything that explicitly specifies something different than `text/plain` and an optional charset other than `utf-8` (ignoring casing). Otherwise I assume it's in UTF-8 (just like the twtxt file format specification mandates) and hope for the best.
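That check is small enough to sketch; `mime.ParseMediaType` from the standard library does the heavy lifting (the function name `acceptable` is made up):

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// acceptable mirrors the rule above: take text/plain with no charset
// or charset=utf-8 (any casing), reject anything that explicitly says
// otherwise. A missing header counts as "assume UTF-8".
func acceptable(contentType string) bool {
	if contentType == "" {
		return true
	}
	mediatype, params, err := mime.ParseMediaType(contentType)
	if err != nil || mediatype != "text/plain" {
		return false
	}
	cs, ok := params["charset"]
	return !ok || strings.EqualFold(cs, "utf-8")
}

func main() {
	fmt.Println(acceptable("text/plain; charset=UTF-8"))  // true
	fmt.Println(acceptable("text/plain"))                 // true
	fmt.Println(acceptable("text/plain; charset=latin1")) // false
}
```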
A feed URL is plenty good enough for me. Since I only fetch feeds that I explicitly follow, there is some basic trust in those feeds already. Spoofing, impersonation and whatnot are no issues for me. If I were to find out otherwise, I'd just unsubscribe from the evil feed. Done.
To retrieve public feeds, I just rely on TLS. Most are served via HTTPS. If a feed is down, I'm not trying to fetch it from some other source, I just wait and try again later. So signed messages/feeds are not a use case I'm particularly benefitting from.
To me, it's just not worth at all adding this crypto complexity on top.
But yes, this has a nice discoverability bonus. And even simpler than a hash, that's right.
Eventually, twt hashes have to change (lengthen at least), no doubt about that. But I'd like to keep it equally simple.
I just saw that we're supposed to hit 19°C mid next week again. Let's see.
I do like this photo a lot. It brings up memories of cool scouting trips.