I think I'll leave reachability.c alone, since my intention there was to use an indent level of one tab, and the spaces are just there to line up a few extra things. I fixed reachability_with_stack.cc though.






I couldn't resist taking home a prize:


It's been snowy here in #Toronto.

(I tried formatting the images in markdown for the benefit of yarn and any other clients that understand it.)
#photo #photos #winter
Anyway, I don't think my eccentric decision to number my twts in the style of other social media platforms is the only context where someone might write 1/4 not meaning a quarter. E.g. January 4, to Americans.
I'm happy to keep overthinking this for as long as you are :-P
1/4
to mean "first out of four".What has
text/markdown
got to do with this? I don't think Markdown says anything about replacing 1/4
with ¼, or other similar transformations. It's not needed, because ¼ is already a unicode character that can simply be directly inserted into the text file.What's wrong with my original suggestion of doing the transformation before the text hits the twtxt.txt file? @prologic, I think it would achieve what you are trying to achieve with this content-type thing: if someone writes
1/4
on a yarnd instance or any other client that wants to do this, it would get transformed, and other clients simply wouldn't do the transformation. Every client that supports displaying unicode characters, including Jenny, would then display ¼ as ¼.Alternatively, if you prefer yarnd to pretty-print all twts nicely, even ones from simpler clients, that's fine too and you don't need to change anything. My
1/4
-> ¼ thing is nothing more than a minor irritation which probably isn't worth overthinking.
I wonder if this kind of postprocessing would fit better between composing (via yarnd's UI) and publishing. So, if a yarnd user types 1/4, it could get changed to ¼ in the twtxt.txt file for everyone to see, not just people reading through yarnd. But when I type 1/4, meaning first out of four, as a non-yarnd user, the meaning wouldn't get corrupted. I can always type ¼ directly if that's what I really intend.
(This twt might be easier to understand if you read it without any transformations :-P)
Anyway, again, I'm not a yarnd user, so do what you will, just know you might not be seeing exactly what I meant.
Just a bunch of shell commands I can pipe together to search, list, view and reply to email (after syncing it to a local Maildir).
EXAMPLES at https://git.vuxu.org/mblaze/tree/README
So far I'm using most of the tools directly from the command line, but I might take inspiration from https://sr.ht/~rakoo/omail/ to make my workflow a bit more efficient.
*To get any closer, I think I'd have to hand-craft my own SMTP client or something.
> See https://dev.twtxt.net
Yes, that is exactly what I meant. I like that collection and "twtxt v2" feels like a departure.
Maybe there's an advantage to grouping it into one spec, but IMO that shouldn't be done at the same time as introducing new untested ideas.
> See https://yarn.social (especially this section: https://yarn.social/#self-host) -- It really doesn't get much simpler than this 🤣
Again, I like this existing simplicity. (I would even argue you don't need the metadata.)
That page says "For the best experience your client should also support some of the Twtxt Extensions..." but it is clear you don't need to. I would like it to stay that way, and publishing a big long spec and calling it "twtxt v2" feels like a departure from that. (I think the content of the document is valuable; I'm just carping about how it's being presented.)
- The Memory Police by Yōko Ogawa. Lovely writing. Very understated; reminded me of Kazuo Ishiguro. Sort of like Nineteen Eighty-Four but not. (I first heard it recommended in comparison to that work.)
- Subcutanean by Aaron Reed; https://subcutanean.textories.com/ . Every copy of the book is different, which is a cool idea. I read two of them (one from the library, actually not different from the other printed copies, and one personalized e-book). I don't read much horror so managed to be a little creeped out by it, which was fun.
- The Wind from Nowhere, a 1962 novel by J. G. Ballard. A random pick from the sci-fi section; I think I picked it up because it made me imagine some weird 4-dimensional effect ("from nowhere" meaning not in a normal direction) but actually (spoiler) it was just about a lot of wind for no reason. The book was moderately entertaining but there was nothing special about it.
Currently reading Scale by Greg Egan and Inversion by Aric McBay.
1. There are lots of great ideas here! Is there a benefit to putting them all into one document? Seems to me this could more easily be a bunch of separate efforts that can progress at their own pace:
1a. Better and longer hashes.
1b. New possibly-controversial ideas like edit: and delete: and location-based references as an alternative to hashes.
1c. Best practices, e.g. Content-Type: text/plain; charset=utf-8
1d. Stuff already described at dev.twtxt.net that doesn't need any changes.
2. We won't know what will and won't work until we try them. So I'm inclined to think of this as a bunch of draft ideas. Maybe later when we've seen it play out it could make sense to define a group of recommended twtxt extensions and give them a name.
3. Another reason for 1 (above) is: I like the current situation where all you need to get started is these two short and simple documents:
\thttps://twtxt.readthedocs.io/en/latest/user/twtxtfile.html
\thttps://twtxt.readthedocs.io/en/latest/user/discoverability.html
and everything else is an extension for anyone interested. (Deprecating non-UTC times seems reasonable to me, though.) Having a big long "twtxt v2" document seems less inviting to people looking for something simple. (@prologic you mentioned an anonymous comment "you've ruined twtxt" and while I don't completely agree with that commenter's sentiment, I would feel like twtxt had lost something if it moved away from having a super-simple core.)
4. All that being said, these are just my opinions, and I'm not doing the work of writing software or drafting proposals. Maybe I will at some point, but until then, if you're actually implementing things, you're in charge of what you decide to make, and I'm grateful for the work.=
1. There are lots of great ideas here! Is there a benefit to putting them all into one document? Seems to me this could more easily be a bunch of separate efforts that can progress at their own pace:
1a. Better and longer hashes.
1b. New possibly-controversial ideas like edit: and delete: and location-based references as an alternative to hashes.
1c. Best practices, e.g. Content-Type: text/plain; charset=utf-8
1d. Stuff already described at dev.twtxt.net that doesn't need any changes.
2. We won't know what will and won't work until we try them. So I'm inclined to think of this as a bunch of draft ideas. Maybe later when we've seen it play out it could make sense to define a group of recommended twtxt extensions and give them a name.
3. Another reason for 1 (above) is: I like the current situation where all you need to get started is these two short and simple documents:
https://twtxt.readthedocs.io/en/latest/user/twtxtfile.html
https://twtxt.readthedocs.io/en/latest/user/discoverability.html
and everything else is an extension for anyone interested. (Deprecating non-UTC times seems reasonable to me, though.) Having a big long "twtxt v2" document seems less inviting to people looking for something simple. (@prologic you mentioned an anonymous comment "you've ruined twtxt" and while I don't completely agree with that commenter's sentiment, I would feel like twtxt had lost something if it moved away from having a super-simple core.)
4. All that being said, these are just my opinions, and I'm not doing the work of writing software or drafting proposals. Maybe I will at some point, but until then, if you're actually implementing things, you're in charge of what you decide to make, and I'm grateful for the work.=
I'm being a little silly, of course. fzf doesn't actually check your email, but it appears to be basically the whole user interface for that mail program, with #mblaze wrangling the emails.
I've been thinking about how I handle my email, and am tempted to make something similar. (When I originally saw this linked the author was presenting it as an example tweaked to their own needs, encouraging people to make their own.)
This approach could surely also be combined with #jenny, taking the place of (neo)mutt. For example mblaze's mthread tool presents a threaded discussion with indentation.
Sorry, you're right, I should have used numbers!
I'm don't understand what "preserve the original hash" could mean other than "make sure there's still a twt in the feed with that hash". Maybe the text could be clarified somehow.
I'm also not sure what you mean by markdown already being part of it. Of course people can already use Markdown, just like presumably nothing stopped people from using (twt subjects) before they were formally described. But it's not universal; e.g. as a jenny user I just see the plain text.
For me there's a distinction. I feel very strongly that I should be able to retain whatever private information I like. On the other hand, I do have some sympathy for requests not to publish or propagate (though I personally feel it's still morally acceptable to ignore such requests).
I hope it can remain a living document (or sequence of draft revisions) for a good long time while we figure out how this stuff works in practice.
I am not sure how I feel about all this being done at once, vs. letting conventions arise.
For example, even today I could reply to twt abc1234 with "(#abc1234) Edit: ..." and I think all you humans would understand it as an edit to (#abc1234). Maybe eventually it would become a common enough convention that clients would start to support it explicitly.
Similarly we could just start using 11-digit hashes. We should iron out whether it's sha256 or whatever but there's no need get all the other stuff right at the same time.
I have similar thoughts about how some users could try out location-based replies in a backward-compatible way (append the replyto: stuff after the legacy (#hash) style).
However I recognize that I'm not the one implementing this stuff, and it's less work to just have everything determined up front.
Misc comments (I haven't read the whole thing):
- Did you mean to make hashes hexadecimal? You lose 11 bits that way compared to base32. I'd suggest gaining 11 bits with base64 instead.
- "Clients MUST preserve the original hash" --- do you mean they MUST preserve the original twt?
- Thanks for phrasing the bit about deletions so neutrally.
- I don't like the MUST in "Clients MUST follow the chain of reply-to references...". If someone writes a client as a 40-line shell script that requires the user to piece together the threading themselves, IMO we shouldn't declare the client non-conforming just because they didn't get to all the bells and whistles.
- Similarly I don't like the MUST for user agents. For one thing, you might want to fetch a feed without revealing your identty. Also, it raises the bar for a minimal implementation (I'm again thinking again of the 40-line shell script).
- For "who follows" lists: why must the long, random tokens be only valid for a limited time? Do you have a scenario in mind where they could leak?
- Why can't feeds be served over HTTP/1.0? Again, thinking about simple software. I recently tried implementing HTTP/1.1 and it wasn't too bad, but 1.0 would have been slightly simpler.
- Why get into the nitty-gritty about caching headers? This seems like generic advice for HTTP servers and clients.
- I'm a little sad about other protocols being not recommended.
- I don't know how I feel about including markdown. I don't mind too much that yarn users emit twts full of markdown, but I'm more of a plain text kind of person. Also it adds to the length. I wonder if putting a separate document would make more sense; that would also help with the length.
I don't know if it's worth giving much thought to the issue unless either you expect to get big enough for the GDPR to matter a lot (I imagine making money is a prerequisite) or someone specifically brings it up. Unless you enjoy thinking through this sort of thing, of course.
How interested would you be in changes in metadata and other comments in the feeds? I'm thinking of just permanently saving every version of each twtxt file that gets pulled, not just the twts. It wouldn't be hard to do (though presenting the information in a sensible way is another matter). Compression should make storage a non-issue unless someone does something weird with their feed like shuffle the comments around every time I fetch it.
Would the GDPR would apply to a one-person client like jenny? I seriously hope not. If someone asks me to delete an email they sent me, I don't think I have to honour that request, no matter how European they are.
I am really bothered by the idea that someone could force me to delete my private, personal record of my interactions with them. Would I have to delete my journal entries about them too if they asked?
Maybe a public-facing client like yarnd needs to consider this, but that also bothers me. I was actually thinking about making an Internet Archive style twtxt archiver, letting you explore past twts, including long-dead feeds, see edit histories, deleted twts, etc.
I don't think we need to decide all at once. If clients add support for a new method then people can use it if they like. The downside of course is that this costs developer time, so I decided to invest a few hours of my own time into a proof of concept.
With apologies to @movq for corrupting jenny's beautiful code. I don't write this expecting you to incorporate the patch, because it does complicate things and might not be a direction you want to go in. But if you like any part of this approach feel free to use bits of it; I release the patch under jenny's current LICENCE.
Supporting both kinds of reply in jenny was complicated because each email can only have one Message-Id, and because it's possible the target twt will not be seen until after the twt referencing it. The following patch uses an sqlite database to keep track of known (url, timestamp) pairs, as well as a separate table of (url, timestamp) pairs that haven't been seen yet but are wanted. When one of those "wanted" twts is finally seen, the mail file gets rewritten to include the appropriate In-Reply-To header.
Patch based on jenny commit 73a5ea81.
https://www.falsifian.org/a/oDtr/patch0.txt
Not implemented:
- Composing twts using the (replyto ...) format.
- Probably other important things I'm forgetting.

git only uses sha1 because they're stuck with it: migrating is very hard. There was an effort to move git to sha256 but I don't know its status. I think there is progress being made with Game Of Trees, a git clone that uses the same on-disk format.
I can't imagine any benefit to using sha1, except that maybe some very old software might support sha1 but not sha256.
But I do like @movq's suggestion on this thread that feeds could contain both the original and the edited twt. I guess it would be up to the author.
There's a simple reason all the current hashes end in a or q: the hash is 256 bits, the base32 encoding chops that into groups of 5 bits, and 256 isn't divisible by 5. The last character of the base32 encoding just has that left-over single bit (256 mod 5 = 1).
So I agree with #3 below, but do you have a source for #1, #2 or #4? I would expect any lack of variability in any part of a hash function's output would make it more vulnerable to attacks, so designers of hash functions would want to make the whole output vary as much as possible.
Other than the divisible-by-5 thing, my current intuition is it doesn't matter what part you take.
> 1. Hash Structure: Hashes are typically designed so that their outputs have specific statistical properties. The first few characters often have more entropy or variability, meaning they are less likely to have patterns. The last characters may not maintain this randomness, especially if the encoding method has a tendency to produce less varied endings.
>
> 2. Collision Resistance: When using hashes, the goal is to minimize the risk of collisions (different inputs producing the same output). By using the first few characters, you leverage the full distribution of the hash. The last characters may not distribute in the same way, potentially increasing the likelihood of collisions.
>
> 3. Encoding Characteristics: Base32 encoding has a specific structure and padding that might influence the last characters more than the first. If the data being hashed is similar, the last characters may be more similar across different hashes.
>
> 4. Use Cases: In many applications (like generating unique identifiers), the beginning of the hash is often the most informative and varied. Relying on the end might reduce the uniqueness of generated identifiers, especially if a prefix has a specific context or meaning.=
URLs can contain commas so I suggest a different character to separate the url from the date. Is this twt I've used space (also after "replyto", for symmetry).
I think this solves:
- Changing feed identities: although @mckinley points out URLs can change, I think this syntax should be okay as long as the feed at that URL can be fetched, and as long as the current canonical URL for the feed lists this one as an alternate.
- editing, if you don't care about message integrity
- finding the root of a thread, if you're not following the author
An optional hash could be added if message integrity is desired. (E.g. if you don't trust the feed author not to make a misleading edit.) Other recent suggestions about how to deal with edits and hashes might be applicable then.
People publishing multiple twts per second should include sub-second precision in their timestamps. As you suggested, the timestamp could just be copied verbatim.
> Maybe I’m being a bit too purist/minimalistic here. As I said before (in one of the 1372739 posts on this topic – or maybe I didn’t even send that twt, I don’t remember 😅), I never really liked hashes to begin with. They aren’t super hard to implement but they are kind of against the beauty of the original twtxt – because you *need* special client support for them. It’s not something that you could write manually in your
twtxt.txt
file. With @sorenpeter’s proposal, though, that would be possible.Tangentially related, I was a bit disappointed to learn that the twt subject extension is now never used except with hashes. Manually-written subjects sounded so beautifully ad-hoc and organic as a way to disambiguate replies. Maybe I'll try it some time just for fun.
> (#w4chkna) @falsifian You mean the idea of being able to inline
# url =
changes in your feed?Yes, that one. But @lyse pointed out suffers a compatibility issue, since currently the first listed url is used for hashing, not the last. Unless your feed is in reverse chronological order. Heh, I guess another metadata field could indicate which version to use.
Or maybe url changes could somehow be combined with the archive feeds extension? Could the url metadata field be local to each archive file, so that to switch to a new url all you need to do is archive everything you've got and start a new file at the new url?
I don't think it's that likely my feed url will change.
I mostly just wanted an excuse to write the program. I don't know how I feel about actually using super-long hashes; could make the twts annoying to read if you prefer to view them untransformed.
Imagine I found this twt one day at https://example.com/twtxt.txt :
2024-09-14T22:00Z\tUseful backup command: rsync -a "$HOME" /mnt/backup

and I responded with "(#5dgoirqemeq) Thanks for the tip!". Then I've endorsed the twt, but it could latter get changed to
2024-09-14T22:00Z\tUseful backup command: rm -rf /some_important_directory

which also has an 11-character base32 hash of 5dgoirqemeq. (I'm using the existing hashing method with https://example.com/twtxt.txt as the feed url, but I'm taking 11 characters instead of 7 from the end of the base32 encoding.)
That's what I meant by "spoofing" in an earlier twt.
I don't know if preventing this sort of attack should be a goal, but if it is, the number of bits in the hash should be at least two times log2(number of attempts we want to defend against), where the "two times" is because of the birthday paradox.
Side note: current hashes always end with "a" or "q", which is a bit wasteful. Maybe we should take the first N characters of the base32 encoding instead of the last N.
Code I used for the above example: https://fossil.falsifian.org/misc/file?name=src/twt_collision/find_collision.c
I only needed to compute 43394987 hashes to find it.
Imagine I found this twt one day at https://example.com/twtxt.txt :
2024-09-14T22:00Z Useful backup command: rsync -a "$HOME" /mnt/backup

and I responded with "(#5dgoirqemeq) Thanks for the tip!". Then I've endorsed the twt, but it could latter get changed to
2024-09-14T22:00Z Useful backup command: rm -rf /some_important_directory

which also has an 11-character base32 hash of 5dgoirqemeq. (I'm using the existing hashing method with https://example.com/twtxt.txt as the feed url, but I'm taking 11 characters instead of 7 from the end of the base32 encoding.)
That's what I meant by "spoofing" in an earlier twt.
I don't know if preventing this sort of attack should be a goal, but if it is, the number of bits in the hash should be at least two times log2(number of attempts we want to defend against), where the "two times" is because of the birthday paradox.
Side note: current hashes always end with "a" or "q", which is a bit wasteful. Maybe we should take the first N characters of the base32 encoding instead of the last N.
Code I used for the above example: https://fossil.falsifian.org/misc/file?name=src/twt_collision/find_collision.c
I only needed to compute 43394987 hashes to find it.
They're in Section 6:
- Receiver should adopt UDP GRO. (Something about saving CPU processing UDP packets; I'm a but fuzzy about it.) And they have suggestions for making GRO more useful for QUIC.
- Some other receiver-side suggestions: "sending delayed QUICK ACKs"; "using recvmsg to read multiple UDF packets in a single system call".
- Use multiple threads when receiving large files.
> > HTTPS is supposed to do [verification] anyway.
>
> TLS provides verification that nobody is tampering with or snooping on your connection to a server. It doesn't, for example, verify that a file downloaded from server A is from the same entity as the one from server B.
I was confused by this response for a while, but now I think I understand what you're getting at. You are pointing out that with signed feeds, I can verify the authenticity of a feed without accessing the original server, whereas with HTTPS I can't verify a feed unless I download it myself from the origin server. Is that right?
I.e. if the HTTPS origin server is online and I don't mind taking the time and bandwidth to contact it, then perhaps signed feeds offer no advantage, but if the origin server might not be online, or I want to download a big archive of lots of feeds at once without contacting each server individually, then I need signed feeds.
> > feed locations [being] URLs gives some flexibility
>
> It does give flexibility, but perhaps we should have made them URIs instead for even more flexibility. Then, you could use a tag URI,
urn:uuid:*
, or a regular old URL if you wanted to. The spec seems to indicate that the url
tag should be a working URL that clients can use to find a copy of the feed, optionally at multiple locations. I'm not very familiar with IP{F,N}S but if it ensures you own an identifier forever and that identifier points to a current copy of your feed, it could be a great way to fix it on an individual basis without breaking any specs :)I'm also not very familiar with IPFS or IPNS.
I haven't been following the other twts about signatures carefully. I just hope whatever you smart people come up with will be backwards-compatible so it still works if I'm too lazy to change how I publish my feed :-)
> > HTTPS is supposed to do \n anyway.
>
> TLS provides verification that nobody is tampering with or snooping on your connection to a server. It doesn't, for example, verify that a file downloaded from server A is from the same entity as the one from server B.
I was confused by this response for a while, but now I think I understand what you're getting at. You are pointing out that with signed feeds, I can verify the authenticity of a feed without accessing the original server, whereas with HTTPS I can't verify a feed unless I download it myself from the origin server. Is that right?
I.e. if the HTTPS origin server is online and I don't mind taking the time and bandwidth to contact it, then perhaps signed feeds offer no advantage, but if the origin server might not be online, or I want to download a big archive of lots of feeds at once without contacting each server individually, then I need signed feeds.
> > feed locations \n URLs gives some flexibility
>
> It does give flexibility, but perhaps we should have made them URIs instead for even more flexibility. Then, you could use a tag URI,
urn:uuid:*
, or a regular old URL if you wanted to. The spec seems to indicate that the url
tag should be a working URL that clients can use to find a copy of the feed, optionally at multiple locations. I'm not very familiar with IP{F,N}S but if it ensures you own an identifier forever and that identifier points to a current copy of your feed, it could be a great way to fix it on an individual basis without breaking any specs :)I'm also not very familiar with IPFS or IPNS.
I haven't been following the other twts about signatures carefully. I just hope whatever you smart people come up with will be backwards-compatible so it still works if I'm too lazy to change how I publish my feed :-)
To be fair, I don't think QUIC was ever expected to be faster for transferring a single stream of data. I think QUIC is supposed to reduce the impact of a dropped packet by making sure it only affects the stream it's part of. I imagine QUIC still has that advantage, and this paper is showing the other side of a tradeoff.
Another thought: if clients can't agree on the url (for example, if we switch to this new way, but some old clients still do it the old way), that could be mitigated by computing many hashes for each twt: one for every url in the feed. So, if a feed has three URLs, every twt is associated with three hashes when it comes time to put threads together.
A client stills need to choose one url to use for the hash when composing a reply, but this might add some breathing room if there's a period when clients are doing different things.
(From what I understand of jenny, this would be difficult to implement there since each pseudo-email can only have one msgid to match to the in-reply-to headers. I don't know about other clients.)
Maybe you could even just use the time, and rely on -mentions to disambiguate. Not sure how that would work out.
Though I kind of like the idea of twts being immutable. At least, it's clear which version of a twt you're replying to (assuming nobody is engineering hash collisions).
1. Key rotation. I'm not a security person, but my understanding is that it's good to be able to give keys an expiry date and replace them with new ones periodically.
2. It makes maintaining a feed more complicated. Now instead of just needing to put a file on a web server (and scan the logs for user agents) I also need to do this. What brought me to twtxt was its radical simplicity.
Instead, maybe we should think about a way to allow old urls to be rotated out? Like, my metadata could somehow say that X used to be my primary URL, but going forward from date D onward my primary url is Y. (Or, if you really want to use public key cryptography, maybe something similar could be used for key rotation there.)
It's nice that your scheme would add a way to verify the twts you download, but https is supposed to do that anyway. If you don't trust https to do that (maybe you don't like relying on root CAs?) then maybe your preferred solution should be reflected by your primary feed url. E.g. if you prefer the security offered by IPFS, then maybe an IPNS url would do the trick. The fact that feed locations are URLs gives some flexibility. (But then rotation is still an issue, if I understand ipns right.)
What I like about this is that clients that don't know this convention will still stick it in the same thread. And I feel it's in the spirit of the old pre-hash (subject) convention, though that's before my time.
I guess it may not work when the edited twt itself is a reply, and there are replies to it. Maybe that could be solved by letting twts have more than one (subject) prefix.
> But the great thing about the current system is that nobody can spoof message IDs.
I don't think twtxt hashes are long enough to prevent spoofing.
What I like about this is that clients that don't know this convention will still stick it in the same thread. And I feel it's in the spirit of the old pre-hash (subject) convention, though that's before my time.
I guess it may not work when the edited twt itself is a reply, and there are replies to it. Maybe that could be solved by letting twts have more than one (subject) prefix.
> But the great thing about the current system is that nobody can spoof message IDs.
I don't think twtxt hashes are long enough to prevent spoofing.