The Watcher

xuu

dev.txt.sour.is

There is nothing wrong with how we currently run a diff to see what has been removed. If I build a Merkle tree from all the twt hashes in a feed, I can use it to verify whether a twt belongs in the feed, and gossip that to my peers.
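
A minimal sketch of the Merkle idea, assuming SHA-256 for the tree nodes and the common "duplicate the last node on odd levels" rule. Neither choice comes from the thread, and `merkle_root` is a hypothetical helper name. Two peers holding the same list of twt hashes get the same root, while a removed twt changes it.

```python
import hashlib

def merkle_root(twt_hashes):
    """Return a hex Merkle root over a list of twt hash strings."""
    if not twt_hashes:
        return hashlib.sha256(b"").hexdigest()
    # Leaf level: hash each twt hash so every leaf has a fixed size.
    level = [hashlib.sha256(h.encode("utf-8")).digest() for h in twt_hashes]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the last node on odd levels
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()

# Hash strings borrowed from elsewhere in this timeline, purely as sample data.
full_feed = ["zq4fgq", "p44j3q", "5vbi2ea"]
after_removal = ["zq4fgq", "p44j3q"]
print(merkle_root(full_feed) == merkle_root(after_removal))
```

Comparing roots only says that two views of a feed differ. Finding which twt is missing would still need the intermediate nodes or a plain diff.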

movq

www.uninformativ.de

18 Sep 24 19:34 UTC+0000

View Thread

(Or maybe I’m talking nonsense. That’s known to happen. I’ll go to bed. 😂)

xuu

txt.sour.is

18 Sep 24 13:34 UTC-0600

View Thread

So.. basically a rehash of the email "unsend" requests? What if I were to make a (delete: 5vbi2ea) .. would it delete someone else's twt?

bender

twtxt.net

18 Sep 24 19:31 UTC

View Thread

> Brisbane is coming onboard. Roosters are "singing" all around @prologic, and the dog is begging for the morning poo/pee walk. @prologic throws a slipper at the dog, as he turns around, and hides under his comforter.

😂😂😂

movq

www.uninformativ.de

18 Sep 24 19:29 UTC+0000

View Thread

@quark Printing a version? I’ll think about it. 🤔

It would be easy to do for releases, but it’s a little hard to do for all the commits in between – jenny has no build process, so there’s no easy way to incorporate the output of git describe, for example.

xuu

dev.txt.sour.is

18 Sep 24 13:29 UTC-0600

View Thread

isn't the benefit of blake2b that it is a more efficient algo than sha1 and has the same or similar entropy to sha3? i thought we had partially solved this with some type of expanding hash size? additionally we could increase bit density by using base36 or base64/url-safe...
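
To put numbers on the "bit density" point, here is a small sketch that computes how many bits one character carries in each encoding and how many characters a fixed number of bits needs. The 44-bit figure is the size of an 11-character hex prefix as used elsewhere in this timeline. The helper names are illustrative only.

```python
import math

# Alphabet size of each encoding; bits per character is log2 of that size.
ALPHABETS = {"hex": 16, "base32": 32, "base36": 36, "base64url": 64}

def bits_per_char(encoding):
    return math.log2(ALPHABETS[encoding])

def chars_needed(bits, encoding):
    """Smallest number of characters that can hold `bits` bits."""
    return math.ceil(bits / bits_per_char(encoding))

for name in ALPHABETS:
    print(f"{name}: {bits_per_char(name):.2f} bits/char, "
          f"{chars_needed(44, name)} chars for 44 bits")
```

A denser alphabet shortens the hash for the same collision resistance, at the cost of case sensitivity for base64.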

movq

www.uninformativ.de

18 Sep 24 19:25 UTC+0000

View Thread

I’m not advocating in either direction, btw. I haven’t made up my mind yet. 😅 Just braindumping here.

The (replyto:…) proposal is definitely more in the spirit of twtxt, I’d say. It’s much simpler, anyone can use it even with the simplest tools, no need for any client code. That is certainly a great property, if you ask me, and it’s things like that that brought me to twtxt in the first place.

I’d also say that in our tiny little community, message integrity simply doesn’t matter. Signed feeds don’t matter. I signed my feed for a while using GPG, someone else did the same, but in the end, nobody cares. The community is so tiny, there’s enough “implicit trust” or whatever you want to call it.

If twtxt/Yarn was to grow bigger, then this would become a concern again. *But even Mastodon allows editing*, so how much of a problem can it really be? 😅

I do have to “admit”, though, that hashes *feel* better. It feels good to know that we can clearly identify a certain twt. It feels more correct and stable.

Hm.

I *suspect* that the (replyto:…) proposal would work just as well in practice.

quark

ferengi.one

18 Sep 24 19:24 UTC+0000

View Thread

Hey, @movq, a tiny thing to add to jenny, a -v switch. That way when you twtxt "*That’s an older format that was used before jenny version v23.04*", I can go and run jenny -v, and "duh!" myself on the way to a git pull. :-D

quark

ferengi.one

18 Sep 24 19:13 UTC

View Thread

@movq ooooh, nice! commit 62a2b7735749f2ff3c9306dd984ad28f853595c5:

> Crawl archived feeds in --fetch-context

Like, very much! :-)

movq

www.uninformativ.de

18 Sep 24 19:12 UTC+0000

View Thread

@falsifian @prologic @lyse

> - editing, if you don't care about message integrity

So that’s the big question, because that’s the only real difference between hashes and the (replyto:…) proposal.

Do we care about message integrity?

With (replyto:…), someone could write a twt, then I reply to it, like “you’re absolutely right!”, and then that person could change their twt to something malicious like “the earth is flat!” And then it would look like I’m a nutcase agreeing with that person. 😅

Hashes (in their current form) prevent that. The thread is broken and my reply clearly refers to something else. That’s good, right?

But now take into account that we want to allow editing anyway. Is there even a point to using hashes anymore? Isn’t message integrity ignored anyway now, at least in practice?

There’s no difference (in practice) between someone writing

2024-09-18T12:34Z Brds are great!

and then editing it to either

2024-09-18T12:34Z (original:#12379) Birds are great! (Whoops, fixed a typo.)

or

2024-09-18T12:34Z (original:#12379) The earth is flat!

The actual original message is (potentially) gone. The only thing that we can be sure of now is that the twt was edited in *some* way. *Essentially*, the actual twt message is no longer part of the hash, is it? What does #12379 refer to? The edited message or the original one? We *want* it to refer to the edited one, because we don’t want to break threads, so … what’s the point of using a hash?

quark

ferengi.one

18 Sep 24 19:00 UTC

View Thread

@movq to paraphrase the US President's speech at each State of the Union, "the State of the Jenny is strong!" :-D As for the potential upcoming changes, there has to be a knowledgeable head honcho who will agglomerate and coalesce, and guide the direction that will be taken. All that with strong input from the developers who will be implementing the changes, and a lesser (but not less valuable) input from users.

@jo

comam.es

18 Sep 24 21:00 UTC+0200

View Thread

[47°09′45″S, 126°43′35″W] Transfer completed

ttybitnik

eternodevir.com

18 Sep 24 15:53 UTC-0300

View Thread

Better is better than best. Wisdom droplet from the Google developer documentation style guide.

movq

www.uninformativ.de

18 Sep 24 18:49 UTC+0000

View Thread

Regarding jenny development: There have been enough changes in the last few weeks, imo. I want to let things settle for a while (potential bugfixes aside) and then I’m going to cut a new release.

And I guess the release after that is going to include all the threading/hashing stuff – if we can decide on one of the proposals. 😂

quark

ferengi.one

18 Sep 24 18:39 UTC

View Thread

@lyse I call upon the services of the @yarn_police to further investigate this oddness!

falsifian

www.falsifian.org

18 Sep 24 18:38 UTC

View Thread

@quark Oh, sure, it would be nice if edits didn't break threads. I was just pondering the circumstances under which I get annoyed about data being irrecoverably deleted or otherwise lost.

lyse

lyse.isobeef.org

18 Sep 24 20:30 UTC+0200

View Thread

@falsifian Yeah, delete requests feel very odd.

quark

ferengi.one

18 Sep 24 18:28 UTC+0000

View Thread

@falsifian "*I don't really mind if the twt gets edited before I even fetch it.*", right, that's never the problem. Editing a twtxt before anyone fetches it isn't even editing, right? :-P The problem we are trying to fix is the havoc caused by editing twtxts that have already been replied to, often ad nauseam. That's the real problem.

falsifian

www.falsifian.org

18 Sep 24 18:18 UTC

View Thread

@quark I don't really mind if the twt gets edited before I even fetch it. I think it's the idea of my computer discarding old versions it's fetched, especially if it's shown them to me, that bugs me.

But I do like @movq's suggestion on this thread that feeds could contain both the original and the edited twt. I guess it would be up to the author.

quark

ferengi.one

18 Sep 24 18:16 UTC

View Thread

@lyse now, how am I not surprised at that reply?! Hahahahaha!

lyse

lyse.isobeef.org

18 Sep 24 20:15 UTC+0200

View Thread

@prologic I wish that was true! But I reckon there is still heaps of old stuff out there, that was created on a Windows machine. :-D And I wouldn't be surprised if even today in that environment a new file does not make use of UTF-8.

quark

ferengi.one

18 Sep 24 18:06 UTC+0000

View Thread

@falsifian that would be problematic to do on a fully decentralised system. I am not disagreeing, though. That's the reason I have stopped editing twtxts. I strive to own mistakes, as minor as they might be. Now, if trail editing can be accomplished, I am all for it!

lyse

lyse.isobeef.org

18 Sep 24 20:00 UTC+0200

View Thread

@quark I'm not convinced. :-D

falsifian

www.falsifian.org

18 Sep 24 17:50 UTC

View Thread

@quark None. I like being able to see edit history for the same reason.

lyse

lyse.isobeef.org

18 Sep 24 19:45 UTC+0200

View Thread

@quark @movq Yep, they're all RFC3339. Obviously, +02:00 and +01:00 are best, because I use them! :-P In all seriousness, Z might be the best timezone, as it is shortest. And regarding privacy, it leaks the least information about the user's rough location. But of course, one can just look at the activity and narrow down plausible regions, so that's a weak argument.
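
A short sketch showing that the three offset styles discussed here (Z, +00:00, +02:00) can name the same instant. Only 2024-09-18T05:43:13+00:00 appears in this timeline. The other two strings are hypothetical rewrites of that instant, and `parse_rfc3339` is an illustrative helper, not part of any client.

```python
from datetime import datetime

def parse_rfc3339(ts):
    """Parse the RFC 3339 forms seen in the thread into an aware datetime."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit offset first.
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"
    return datetime.fromisoformat(ts)

a = parse_rfc3339("2024-09-18T05:43:13Z")
b = parse_rfc3339("2024-09-18T05:43:13+00:00")
c = parse_rfc3339("2024-09-18T07:43:13+02:00")
print(a == b == c)  # aware datetimes compare by instant, so this prints True
```

The strings still differ byte for byte, which matters wherever the timestamp text itself is fed into a hash.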

aelaraji

aelaraji.com

18 Sep 24 17:38 UTC+0000

View Thread

@movq You're right! switching from zsh to bash gave me the same result zq4fgq Thanks!

quark

ferengi.one

18 Sep 24 17:38 UTC

View Thread

@falsifian what would the difference be between an edit that changes everything in the original twtxt, and a delete?

falsifian

www.falsifian.org

18 Sep 24 17:35 UTC

View Thread

@prologic Why sha1 in particular? There are known attacks on it. sha256 seems pretty widely supported if you're worried about support.

falsifian

www.falsifian.org

18 Sep 24 17:32 UTC

View Thread

@prologic I wouldn't want my client to honour delete requests. I like my computer's memory to be better than mine, not worse, so it would bug me if I remember seeing something and my computer can't find it.

falsifian

www.falsifian.org

18 Sep 24 17:28 UTC

View Thread

@prologic

There's a simple reason all the current hashes end in a or q: the hash is 256 bits, the base32 encoding chops that into groups of 5 bits, and 256 isn't divisible by 5. The last character of the base32 encoding just has that left-over single bit (256 mod 5 = 1).

So I agree with #3 below, but do you have a source for #1, #2 or #4? I would expect any lack of variability in any part of a hash function's output would make it more vulnerable to attacks, so designers of hash functions would want to make the whole output vary as much as possible.

Other than the divisible-by-5 thing, my current intuition is it doesn't matter what part you take.

> 1. Hash Structure: Hashes are typically designed so that their outputs have specific statistical properties. The first few characters often have more entropy or variability, meaning they are less likely to have patterns. The last characters may not maintain this randomness, especially if the encoding method has a tendency to produce less varied endings.
>
> 2. Collision Resistance: When using hashes, the goal is to minimize the risk of collisions (different inputs producing the same output). By using the first few characters, you leverage the full distribution of the hash. The last characters may not distribute in the same way, potentially increasing the likelihood of collisions.
>
> 3. Encoding Characteristics: Base32 encoding has a specific structure and padding that might influence the last characters more than the first. If the data being hashed is similar, the last characters may be more similar across different hashes.
>
> 4. Use Cases: In many applications (like generating unique identifiers), the beginning of the hash is often the most informative and varied. Relying on the end might reduce the uniqueness of generated identifiers, especially if a prefix has a specific context or meaning.
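
The left-over-bit argument above can be checked with a few lines. This sketch uses SHA-256 as a stand-in for any 256-bit digest (an assumption, since the thread's hashes use BLAKE2) and hashes arbitrary sample inputs. Only the digest length matters for the effect.

```python
import base64
import hashlib

def base32_digest(data):
    """Lowercase, unpadded base32 of a 256-bit digest (SHA-256 used here)."""
    digest = hashlib.sha256(data).digest()  # 32 bytes = 256 bits
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()

encoded = [base32_digest(str(i).encode()) for i in range(1000)]

# 256 bits -> 51 full 5-bit groups plus 1 left-over bit, so 52 characters.
print({len(e) for e in encoded})
# The last character encodes one data bit followed by four zero bits,
# i.e. the value 0 ("a") or 16 ("q").
print(sorted({e[-1] for e in encoded}))
```

Every other character position carries a full 5 bits, which fits the intuition that, apart from the final character, it should not matter which part of the hash is taken.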

movq

www.uninformativ.de

18 Sep 24 17:13 UTC

View Thread

@aelaraji Looks like your shell didn’t turn the \n into actual newlines:


$ echo -n "https://twtxt.net/user/prologic/twtxt.txt\n2020-07-18T12:39:52Z\nHello World! 😊" | openssl dgst -blake2s256 -binary | base32 | tr -d '=' | tr 'A-Z' 'a-z' | tail -c 7
zq4fgq
$ printf "https://twtxt.net/user/prologic/twtxt.txt\\n2020-07-18T12:39:52Z\\nHello World! 😊" | openssl dgst -blake2s256 -binary | base32 | tr -d '=' | tr 'A-Z' 'a-z' | tail -c 7
p44j3q
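
The difference between the two commands comes down to whether the hashed bytes contain real newlines or a literal backslash followed by "n". Below is a sketch that sidesteps the shell entirely. It is intended to mirror the pipeline above (BLAKE2s-256, unpadded lowercase base32, last 6 characters, since `tail -c 7` also counts the trailing newline), but it has not been run against the values quoted in the thread, and `short_hash` is an illustrative name.

```python
import base64
import hashlib

def short_hash(payload):
    """BLAKE2s-256 -> unpadded lowercase base32 -> last 6 characters."""
    digest = hashlib.blake2s(payload.encode("utf-8")).digest()
    encoded = base64.b32encode(digest).decode("ascii").rstrip("=").lower()
    return encoded[-6:]

url = "https://twtxt.net/user/prologic/twtxt.txt"
ts = "2020-07-18T12:39:52Z"
text = "Hello World! 😊"

with_newlines = "\n".join([url, ts, text])   # real line feeds between the parts
with_literals = "\\n".join([url, ts, text])  # backslash + "n", left uninterpreted

print(short_hash(with_newlines), short_hash(with_literals))
```

Because the two payloads differ in their bytes, the two short hashes differ as well, which is the same effect as switching between a shell whose echo interprets \n and one whose echo does not.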

quark

ferengi.one

18 Sep 24 17:08 UTC

View Thread

@aelaraji odd, I ran it under Ubuntu 24.04, and got the same result as @prologic (which is on macOS), zq4fgq.

aelaraji

aelaraji.com

18 Sep 24 17:03 UTC+0000

View Thread

@prologic I ran the same command and got an even different result xD


~ » echo -n "https://twtxt.net/user/prologic/twtxt.txt\n2020-07-18T12:39:52Z\nHello World! 😊" | openssl dgst -blake2s256 -binary | base32 | tr -d '=' | tr 'A-Z' 'a-z' | tail -c 7
p44j3q

akkartik

akkartik.name

18 Sep 24 09:11 UTC-0700

View Thread

Beginnings of a little notebook app. Doesn't actually run any code yet. https://akkartik.name/images/20240917-notebook.png

quark

ferengi.one

18 Sep 24 16:00 UTC+0000

View Thread

@prologic I just realised that jenny also does what I want, as of the latest commit. Simply use jenny --debug-feed <feed url>, and it will do what I wanted too!

@jo

comam.es

18 Sep 24 18:00 UTC+0200

View Thread

[47°09′50″S, 126°43′50″W] Carrier too weak

quark

ferengi.one

18 Sep 24 15:51 UTC+0000

View Thread

@movq alright, fair, and interesting. I was expecting them to be all the same (format wise), but it doesn't matter, for sure, as it works just fine. Thanks!

movq

www.uninformativ.de

18 Sep 24 15:42 UTC

View Thread

@quark They’re all RFC3339, unless I’m mistaken: https://ijmacd.github.io/rfc3339-iso8601/ So they’re all correct.

quark

ferengi.one

18 Sep 24 15:40 UTC

View Thread

I have noticed that twtxt timestamps differ. For example:

* @prologic (and I assume any Yarn user): 2024-09-18T13:16:17Z
* @lyse: 2024-09-17T21:15:00+02:00
* @aelaraji (and @movq, and me): 2024-09-18T05:43:13+00:00

So, which is right, or best?

quark

ferengi.one

18 Sep 24 15:24 UTC+0000

View Thread

I came across this Gallery Theme for Hugo, and @lyse immediately came to mind. I think it would be a very fitting theme to use for all your photos, Lyse!

movq

www.uninformativ.de

18 Sep 24 15:22 UTC+0000

View Thread

@prologic So the feed would contain *two* twts, right?


2024-09-18T23:08:00+10:00	Hllo World
2024-09-18T23:10:43+10:00	(edit:#229d24612a2) Hello World

garfield

feeds.twtxt.net

18 Sep 24 14:30 UTC

View Thread

Solo para boomers ⌘ Read more

prologic

twtxt.net

18 Sep 24 13:16 UTC

View Thread

Finally, @lyse's idea of updating metadata changes in a feed "inline", where the change happened (_with respect to other Twts in whatever order the file is written in_), is used to drive things like "Oh this feed now has a new URI, let's use that from now on as the feed's identity for the purposes of computing Twt hashes". This could extend to # nick = as preferential indicators to clients, as well as other updates such as # description =, not just # url =.
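
One way a client might apply inline metadata is to walk the feed in file order and hash each Twt with whatever `# url =` value is in effect at that point. This is only a sketch under assumptions: the hash is taken as the first 11 hex characters of SHA-1 over URL, timestamp and text joined by real newlines, and the feed URLs and the `hash_feed` helper are made up for illustration.

```python
import hashlib

def twt_hash(feed_url, timestamp, text):
    """First 11 hex characters of SHA-1 (assumed hashing scheme)."""
    payload = f"{feed_url}\n{timestamp}\n{text}".encode("utf-8")
    return hashlib.sha1(payload).hexdigest()[:11]

def hash_feed(lines, initial_url):
    """Walk a feed in file order, switching identity at each '# url =' line."""
    current_url = initial_url
    result = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            key, sep, value = line[1:].partition("=")
            if sep and key.strip() == "url":
                current_url = value.strip()  # new identity from here on
            continue
        if not line.strip():
            continue
        timestamp, _, text = line.partition("\t")
        result.append((twt_hash(current_url, timestamp, text), current_url, text))
    return result

# Hypothetical feed that moves to a new URL between two twts.
old_url = "https://example.com/twtxt.txt"
new_url = "https://new.example.com/twtxt.txt"
feed = [
    "2024-09-18T23:08:00+10:00\tHello",
    f"# url = {new_url}",
    "2024-09-19T08:00:00+10:00\tMoved",
]
hashed = hash_feed(feed, old_url)
for h, url, text in hashed:
    print(h, url, text)
```

Twts written before the change keep the hashes computed from the old URL, so existing replies to them would still resolve.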

prologic

twtxt.net

18 Sep 24 13:14 UTC

View Thread

Likewise we _could_ also support delete:229d24612a2, which would indicate to clients that fetch the feed to delete any cached Twt matching the hash 229d24612a2 if the author wishes to "unpublish" that Twt permanently, rather than just deleting the line from the feed (_which does nothing for clients really_).
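
A question raised elsewhere in this timeline is whether such a delete could remove someone else's Twt. One possible safeguard, sketched here as an assumption rather than anything the proposal specifies, is for the client to honour a delete only when the cached Twt came from the same feed that issued the request. The cache layout and `apply_delete` are hypothetical.

```python
def apply_delete(cache, requesting_feed, target_hash):
    """Drop a cached twt only if it came from the feed asking for the delete.

    `cache` maps twt hash -> (feed_url, text). Returns True if a twt was removed.
    """
    entry = cache.get(target_hash)
    if entry is None:
        return False  # nothing cached under that hash
    owner_feed, _text = entry
    if owner_feed != requesting_feed:
        return False  # ignore deletes aimed at someone else's twt
    del cache[target_hash]
    return True

cache = {"229d24612a2": ("https://example.com/twtxt.txt", "Hllo World")}
print(apply_delete(cache, "https://other.example.org/twtxt.txt", "229d24612a2"))
print(apply_delete(cache, "https://example.com/twtxt.txt", "229d24612a2"))
```

A client could equally choose to ignore delete requests altogether, as some participants in the thread say they would prefer.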

prologic

twtxt.net

18 Sep 24 13:12 UTC

View Thread

An alternate idea for supporting (_properly_) Twt Edits is to denote them as such and extend the meaning of a Twt Subject (_which would need to be called something better?_). For example, let's say I produced the following Twt:


2024-09-18T23:08:00+10:00	Hllo World

And my feed's URI is https://example.com/twtxt.txt. The hash for this Twt is therefore 229d24612a2:


$ echo -n "https://example.com/twtxt.txt\n2024-09-18T23:08:00+10:00\nHllo World" | sha1sum | head -c 11
229d24612a2

You wish to correct your mistake, so you make an amendment to that Twt like so:


2024-09-18T23:10:43+10:00	(edit:#229d24612a2) Hello World

Which would then have a new Twt hash value of 026d77e03fa:


$ echo -n "https://example.com/twtxt.txt\n2024-09-18T23:10:43+10:00\nHello World" | sha1sum | head -c 11
026d77e03fa

Clients would then take this edit:#229d24612a2 to mean that this Twt is an edit of 229d24612a2, which should be replaced in the client's cache or otherwise shown to the user as the intended content.
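
A rough sketch of how a client could process such an edit marker while loading a feed. It assumes the hash is the first 11 hex characters of SHA-1 over URL, timestamp and text joined by real newlines, and that the edited text replaces the cached entry under the original hash. `load_feed` and `EDIT_RE` are illustrative names, and the hash is computed in the sketch rather than copied from the example above.

```python
import hashlib
import re

# Matches a leading "(edit:#<hash>)" marker and captures the hash and the new text.
EDIT_RE = re.compile(r"^\(edit:#([0-9a-f]+)\)\s*(.*)$")

def twt_hash(feed_url, timestamp, text):
    payload = f"{feed_url}\n{timestamp}\n{text}".encode("utf-8")
    return hashlib.sha1(payload).hexdigest()[:11]

def load_feed(feed_url, lines):
    """Return {hash: text}, letting (edit:#hash) twts replace their target."""
    cache = {}
    for line in lines:
        timestamp, _, text = line.rstrip("\n").partition("\t")
        match = EDIT_RE.match(text)
        if match and match.group(1) in cache:
            # Keep the original hash as the key so existing replies still resolve.
            cache[match.group(1)] = match.group(2)
        else:
            cache[twt_hash(feed_url, timestamp, text)] = text
    return cache

url = "https://example.com/twtxt.txt"
original_hash = twt_hash(url, "2024-09-18T23:08:00+10:00", "Hllo World")
feed = [
    "2024-09-18T23:08:00+10:00\tHllo World",
    f"2024-09-18T23:10:43+10:00\t(edit:#{original_hash}) Hello World",
]
cache = load_feed(url, feed)
print(cache)
```

Keeping the original hash as the cache key is one design choice. A client could instead store both versions and mark the older one as superseded, which would preserve the edit history.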

prologic

twtxt.net

18 Sep 24 13:05 UTC

View Thread

@bender Just replace the echo with something like pbpaste or similar. You'd just need to shell-escape things like " and such. That's all. Alternatively, you can shove the 3 lines into a small file and cat file.txt | ...

prologic

twtxt.net

18 Sep 24 13:04 UTC

View Thread

With a SHA1 encoding the probability of a hash collision becomes, at various k (_number of twts_):


>>> import math
>>>
>>> def collision_probability(k, bits):
...     n = 2 ** bits  # Total unique hash values based on the number of bits
...     probability = 1 - math.exp(- (k ** 2) / (2 * n))
...     return probability * 100  # Return as percentage
...
>>> # Example usage:
>>> k_values = [100000, 1000000, 10000000]
>>> bits = 44  # Number of bits for the hash
>>>
>>> for k in k_values:
...     print(f"Probability of collision for {k} hashes with {bits} bits: {collision_probability(k, bits):.4f}%")
...
Probability of collision for 100000 hashes with 44 bits: 0.0284%
Probability of collision for 1000000 hashes with 44 bits: 2.8022%
Probability of collision for 10000000 hashes with 44 bits: 94.1701%
>>> bits = 48
>>> for k in k_values:
...     print(f"Probability of collision for {k} hashes with {bits} bits: {collision_probability(k, bits):.4f}%")
...
Probability of collision for 100000 hashes with 48 bits: 0.0018%
Probability of collision for 1000000 hashes with 48 bits: 0.1775%
Probability of collision for 10000000 hashes with 48 bits: 16.2753%
>>> bits = 52
>>> for k in k_values:
...     print(f"Probability of collision for {k} hashes with {bits} bits: {collision_probability(k, bits):.4f}%")
...
Probability of collision for 100000 hashes with 52 bits: 0.0001%
Probability of collision for 1000000 hashes with 52 bits: 0.0111%
Probability of collision for 10000000 hashes with 52 bits: 1.1041%
>>>

If we adopted this scheme, we would have to increase the no. of characters (_first N_) from 11 to 12 and finally 13 as the global number of Twts grows large enough. I _think_ the last full crawl/scrape found around ~500k (_maybe_)? https://search.twtxt.net/ says only ~99k
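
The same birthday approximation can be turned around to ask how many Twts it takes to reach a given collision probability for 11, 12 and 13 hex characters. This sketch reuses the formula from the session above, returning a fraction instead of a percentage. The 1% threshold is an arbitrary choice for illustration and `twts_for_probability` is a hypothetical helper.

```python
import math

def collision_probability(k, bits):
    """Birthday approximation from the session above, as a fraction."""
    return 1 - math.exp(-(k ** 2) / (2 * 2 ** bits))

def twts_for_probability(p, bits):
    """Invert the approximation: k = sqrt(2 * n * ln(1 / (1 - p)))."""
    n = 2 ** bits
    return math.sqrt(2 * n * math.log(1 / (1 - p)))

for chars in (11, 12, 13):
    bits = 4 * chars  # one hex character carries 4 bits
    k = twts_for_probability(0.01, bits)
    print(f"{chars} hex chars ({bits} bits): ~{k:,.0f} twts for a 1% collision chance")
```

Each extra hex character adds 4 bits, which multiplies the number of Twts needed for the same collision probability by 4.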

bender

twtxt.net

18 Sep 24 12:57 UTC

View Thread

@prologic what would that line look like if the twtxt itself had ", and other "spurious" characters in it?

prologic

twtxt.net

18 Sep 24 12:54 UTC

View Thread

@quark My money is on a SHA1SUM hash encoding to keep things much simpler:


$ echo -n "https://twtxt.net/user/prologic/twtxt.txt\n2020-07-18T12:39:52Z\nHello World! 😊" | sha1sum | head -c 11
87fd9b0ae4e
