# I am the Watcher. I am your guide through this vast new twtiverse.
# 
# Usage:
#     https://watcher.sour.is/api/plain/users              View list of users and latest twt date.
#     https://watcher.sour.is/api/plain/twt                View all twts.
#     https://watcher.sour.is/api/plain/mentions?uri=:uri  View all mentions for uri.
#     https://watcher.sour.is/api/plain/conv/:hash         View all twts for a conversation subject.
# 
# Options:
#     uri     Filter to show a specific user's twts.
#     offset  Start index for query.
#     limit   Count of items to return (going back in time).
# 
# twt range = 1 196279
# self = https://watcher.sour.is?offset=171125
# next = https://watcher.sour.is?offset=171225
# prev = https://watcher.sour.is?offset=171025
Regarding jenny development: There have been enough changes in the last few weeks, imo. I want to let things settle for a while (potential bugfixes aside) and then I’m going to cut a new release.

And I guess the release after that is going to include all the threading/hashing stuff – if we can decide on one of the proposals. 😂
@lyse I call upon the services of the @yarn_police to further investigate this oddness!
@quark Oh, sure, it would be nice if edits didn't break threads. I was just pondering the circumstances under which I get annoyed about data being irrecoverably deleted or otherwise lost.
@falsifian Yeah, delete requests feel very odd.
@falsifian "*I don't really mind if the twt gets edited before I even fetch it.*", right, that's never the problem. Editing a twtxt before anyone fetches it isn't even editing, right? :-P The problem we are trying to fix is the havoc is causes editing twtxts that have already been replied to, often ad nauseam. That's the real problem.
@falsifian "*I don't really mind if the twt gets edited before I even fetch it.*", right, that's never the problem. Editing a twtxt before anyone fetches it isn't even editing, right? :-P The problem we are trying to fix is the havoc is causes editing twtxts that have already been replied to, often ad nauseam. That's the real problem.
@quark I don't really mind if the twt gets edited before I even fetch it. I think it's the idea of my computer discarding old versions it's fetched, especially if it's shown them to me, that bugs me.

But I do like @movq's suggestion on this thread that feeds could contain both the original and the edited twt. I guess it would be up to the author.
@lyse now, how am I not surprised at that reply?! Hahahahaha!
@prologic I wish that was true! But I reckon there is still heaps of old stuff out there that was created on a Windows machine. :-D And I wouldn't be surprised if even today, in that environment, a new file does not make use of UTF-8.
@falsifian that would be problematic to do on a fully decentralised system. I am not disagreeing, though. That's the reason I have stopped editing twtxts. I strive to own mistakes, as minor as they might be. Now, if trail editing can be accomplished, I am all for it!
@quark I'm not convinced. :-D
@quark None. I like being able to see edit history for the same reason.
@quark @movq Yep, they're all RFC3339. Obviously, +02:00 and +01:00 are best, because I use them! :-P In all seriousness, Z might be the best timezone, as it is shortest. And regarding privacy, it leaks the least information about the user's rough location. But of course, one can just look at the activity and narrow down plausible regions, so that's a weak argument.
@movq You're right! Switching from zsh to bash gave me the same result, zq4fgq. Thanks!
@falsifian what would the difference be between an edit that changes everything on the original twtxt, and a delete?
@prologic Why sha1 in particular? There are known attacks on it. sha256 seems pretty widely supported if you're worried about support.
@prologic I wouldn't want my client to honour delete requests. I like my computer's memory to be better than mine, not worse, so it would bug me if I remember seeing something and my computer can't find it.
@prologic

There's a simple reason all the current hashes end in a or q: the hash is 256 bits, the base32 encoding chops that into groups of 5 bits, and 256 isn't divisible by 5. The last character of the base32 encoding just has that left-over single bit (256 mod 5 = 1).
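A quick empirical check of that left-over-bit claim (a minimal Python sketch, using blake2b with a 256-bit digest as per the current spec):


import base64, hashlib, os

# 51 * 5 = 255 bits, so the 52nd base32 character holds the single
# leftover bit padded with four zero bits: 0b00000 -> 'a' or 0b10000 -> 'q'.
for _ in range(1000):
    digest = hashlib.blake2b(os.urandom(16), digest_size=32).digest()
    b32 = base64.b32encode(digest).decode().rstrip("=").lower()
    assert len(b32) == 52 and b32[-1] in "aq"
print("all 1000 digests end in 'a' or 'q'")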

So I agree with #3 below, but do you have a source for #1, #2 or #4? I would expect any lack of variability in any part of a hash function's output would make it more vulnerable to attacks, so designers of hash functions would want to make the whole output vary as much as possible.

Other than the divisible-by-5 thing, my current intuition is it doesn't matter what part you take.

> 1. Hash Structure: Hashes are typically designed so that their outputs have specific statistical properties. The first few characters often have more entropy or variability, meaning they are less likely to have patterns. The last characters may not maintain this randomness, especially if the encoding method has a tendency to produce less varied endings.
>
> 2. Collision Resistance: When using hashes, the goal is to minimize the risk of collisions (different inputs producing the same output). By using the first few characters, you leverage the full distribution of the hash. The last characters may not distribute in the same way, potentially increasing the likelihood of collisions.
>
> 3. Encoding Characteristics: Base32 encoding has a specific structure and padding that might influence the last characters more than the first. If the data being hashed is similar, the last characters may be more similar across different hashes.
>
> 4. Use Cases: In many applications (like generating unique identifiers), the beginning of the hash is often the most informative and varied. Relying on the end might reduce the uniqueness of generated identifiers, especially if a prefix has a specific context or meaning.
@aelaraji Looks like your shell didn’t turn the \n into actual newlines:


$ echo -n "https://twtxt.net/user/prologic/twtxt.txt\n2020-07-18T12:39:52Z\nHello World! 😊" | openssl dgst -blake2s256 -binary | base32 | tr -d '=' | tr 'A-Z' 'a-z' | tail -c 7
zq4fgq
$ printf "https://twtxt.net/user/prologic/twtxt.txt\\n2020-07-18T12:39:52Z\\nHello World! 😊" | openssl dgst -blake2s256 -binary | base32 | tr -d '=' | tr 'A-Z' 'a-z' | tail -c 7
p44j3q
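For the record, a shell-independent way to compute the same value (a minimal Python sketch mirroring the openssl -blake2s256 pipeline above, with no echo/printf escaping involved):


import base64, hashlib

# Join the three hash-input fields with real newlines.
payload = "\n".join([
    "https://twtxt.net/user/prologic/twtxt.txt",
    "2020-07-18T12:39:52Z",
    "Hello World! 😊",
])
digest = hashlib.blake2s(payload.encode("utf-8")).digest()  # 256-bit digest
b32 = base64.b32encode(digest).decode().rstrip("=").lower()
print(b32[-7:])  # last 7 characters of the unpadded base32 encoding


Note that tail -c 7 in the shell pipeline also counts base32's trailing newline, so it actually prints only the last 6 characters (zq4fgq is 6 characters long).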
@aelaraji odd, I ran it under Ubuntu 24.04, and got the same result as @prologic (who is on macOS), zq4fgq.
@prologic I ran the same command and got yet a different result xD


~ » echo -n "https://twtxt.net/user/prologic/twtxt.txt\n2020-07-18T12:39:52Z\nHello World! 😊" | openssl dgst -blake2s256 -binary | base32 | tr -d '=' | tr 'A-Z' 'a-z' | tail -c 7
p44j3q
Beginnings of a little notebook app. Doesn't actually run any code yet. https://akkartik.name/images/20240917-notebook.png
@prologic I just realised that jenny also does what I want, as of the latest commit. Simply use jenny --debug-feed <feed url>, and it will do what I wanted too!
[47°09′50″S, 126°43′50″W] Carrier too weak
@movq alright, fair, and interesting. I was expecting them to all be the same (format-wise), but it doesn't matter, for sure, as it works just fine. Thanks!
@quark They’re all RFC3339, unless I’m mistaken: https://ijmacd.github.io/rfc3339-iso8601/ So they’re all correct.
I have noticed that twtxt timestamps differ. For example:

* @prologic (and I assume any Yarn user)
2024-09-18T13:16:17Z
* @lyse
2024-09-17T21:15:00+02:00
* @aelaraji (and @movq, and me)
2024-09-18T05:43:13+00:00

So, which is right, or best?
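All three styles parse as valid RFC3339/ISO8601; for example, in Python 3.11+ (whose fromisoformat accepts the trailing Z):


from datetime import datetime

for ts in ("2024-09-18T13:16:17Z",
           "2024-09-17T21:15:00+02:00",
           "2024-09-18T05:43:13+00:00"):
    dt = datetime.fromisoformat(ts)
    print(dt.isoformat(), "UTC offset:", dt.utcoffset())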
I came across this Gallery Theme for Hugo, and @lyse immediately came to mind. I think it would be a very fitting theme to use for all your photos, Lyse!
@prologic So the feed would contain *two* twts, right?


2024-09-18T23:08:00+10:00	Hllo World
2024-09-18T23:10:43+10:00	(edit:#229d24612a2) Hello World
Only for boomers
Finally, @lyse 's idea of updating metadata changes in a feed "inline", at the point where the change happened (_with respect to other Twts, in whatever order the file is written_), could be used to drive things like: "Oh, this feed now has a new URI, let's use that from now on as the feed's identity for the purposes of computing Twt hashes." This could extend to # nick = as a preferential indicator to clients, as well as other updates such as # description = -- not just # url =
Likewise we _could_ also support delete:229d24612a2, which would indicate to clients fetching the feed that they should delete any cached Twt matching the hash 229d24612a2, should the author wish to "unpublish" that Twt permanently, rather than just deleting the line from the feed (_which does nothing for clients, really_).
An alternate idea for supporting (_properly_) Twt Edits is to denote them as such and extend the meaning of a Twt Subject (_which would need to be called something better?_); For example, let's say I produced the following Twt:


2024-09-18T23:08:00+10:00	Hllo World


And my feed's URI is https://example.com/twtxt.txt. The hash for this Twt is therefore 229d24612a2:


$ echo -n "https://example.com/twtxt.txt\n2024-09-18T23:08:00+10:00\nHllo World" | sha1sum | head -c 11
229d24612a2


You wish to correct your mistake, so you make an amendment to that Twt like so:


2024-09-18T23:10:43+10:00	(edit:#229d24612a2) Hello World


Which would then have a new Twt hash value of 026d77e03fa:


$ echo -n "https://example.com/twtxt.txt\n2024-09-18T23:10:43+10:00\nHello World" | sha1sum | head -c 11
026d77e03fa


Clients would then take this edit:#229d24612a2 to mean: this Twt is an edit of 229d24612a2 and should be replaced in the client's cache, or indicated as such to the user as the intended content.
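A minimal sketch of that client-side handling (Python; the dict cache, the subject regex, and the delete:# variant mentioned above are illustrative assumptions, not spec):


import re

cache = {}  # twt hash -> (timestamp, content)

SUBJECT = re.compile(r"^\((edit|delete):#(\w+)\)\s*(.*)$", re.S)

def apply_twt(twt_hash, timestamp, content):
    m = SUBJECT.match(content)
    if m is None:
        cache[twt_hash] = (timestamp, content)  # ordinary twt
        return
    action, old_hash, body = m.groups()
    cache.pop(old_hash, None)  # drop the superseded twt, if cached
    if action == "edit":
        cache[twt_hash] = (timestamp, body)  # store the corrected content

apply_twt("229d24612a2", "2024-09-18T23:08:00+10:00", "Hllo World")
apply_twt("026d77e03fa", "2024-09-18T23:10:43+10:00",
          "(edit:#229d24612a2) Hello World")
print(cache)  # only the edited twt remains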
@bender Just replace the echo with something like pbpaste or similar. You'd just need to shell-escape things like " and such. That's all. Alternatively, you can shove the 3 lines into a small file and cat file.txt | ...
With a SHA1 encoding the probability of a hash collision becomes, at various k (_number of twts_):


>>> import math
>>>
>>> def collision_probability(k, bits):
...     n = 2 ** bits  # Total unique hash values based on the number of bits
...     probability = 1 - math.exp(- (k ** 2) / (2 * n))
...     return probability * 100  # Return as percentage
...
>>> # Example usage:
>>> k_values = [100000, 1000000, 10000000]
>>> bits = 44  # Number of bits for the hash
>>>
>>> for k in k_values:
...     print(f"Probability of collision for {k} hashes with {bits} bits: {collision_probability(k, bits):.4f}%")
...
Probability of collision for 100000 hashes with 44 bits: 0.0284%
Probability of collision for 1000000 hashes with 44 bits: 2.8022%
Probability of collision for 10000000 hashes with 44 bits: 94.1701%
>>> bits = 48
>>> for k in k_values:
...     print(f"Probability of collision for {k} hashes with {bits} bits: {collision_probability(k, bits):.4f}%")
...
Probability of collision for 100000 hashes with 48 bits: 0.0018%
Probability of collision for 1000000 hashes with 48 bits: 0.1775%
Probability of collision for 10000000 hashes with 48 bits: 16.2753%
>>> bits = 52
>>> for k in k_values:
...     print(f"Probability of collision for {k} hashes with {bits} bits: {collision_probability(k, bits):.4f}%")
...
Probability of collision for 100000 hashes with 52 bits: 0.0001%
Probability of collision for 1000000 hashes with 52 bits: 0.0111%
Probability of collision for 10000000 hashes with 52 bits: 1.1041%
>>>


If we adopted this scheme, we would have to increase the no. of characters (_first N_) from 11 to 12 and eventually 13 as the global number of Twts grows (11 hex characters × 4 bits = 44 bits, hence the bits values above). I _think_ at the last full crawl/scrape it was around ~500k (_maybe_)? https://search.twtxt.net/ says only ~99k
@prologic what would that line look like if the twtxt itself had ", and other "spurious" characters in it?
@quark My money is on a SHA1SUM hash encoding to keep things much simpler:


$ echo -n "https://twtxt.net/user/prologic/twtxt.txt\n2020-07-18T12:39:52Z\nHello World! 😊" | sha1sum | head -c 11
87fd9b0ae4e
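The same value, shell-independently (a small Python sketch; it should match the 87fd9b0ae4e above, provided your shell expanded the \n escapes into real newlines):


import hashlib

payload = "https://twtxt.net/user/prologic/twtxt.txt\n2020-07-18T12:39:52Z\nHello World! 😊"
print(hashlib.sha1(payload.encode("utf-8")).hexdigest()[:11])  # first 11 hex chars = 44 bits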
I think it was a mistake to take the last n base32-encoded characters of the blake2b 256-bit hash value. It should have been the first n, where n >= 7.
Taking the last n characters of a base32 encoded hash instead of the first n can be problematic for several reasons:

1. Hash Structure: Hashes are typically designed so that their outputs have specific statistical properties. The first few characters often have more entropy or variability, meaning they are less likely to have patterns. The last characters may not maintain this randomness, especially if the encoding method has a tendency to produce less varied endings.

2. Collision Resistance: When using hashes, the goal is to minimize the risk of collisions (different inputs producing the same output). By using the first few characters, you leverage the full distribution of the hash. The last characters may not distribute in the same way, potentially increasing the likelihood of collisions.

3. Encoding Characteristics: Base32 encoding has a specific structure and padding that might influence the last characters more than the first. If the data being hashed is similar, the last characters may be more similar across different hashes.

4. Use Cases: In many applications (like generating unique identifiers), the beginning of the hash is often the most informative and varied. Relying on the end might reduce the uniqueness of generated identifiers, especially if a prefix has a specific context or meaning.

In summary, using the first n characters generally preserves the intended randomness and collision resistance of the hash, making it a safer choice in most cases.
@quark Bloody good question 🙋 God only knows 🤣
@bender Welcome! 🤗
@prologic the real conclusion is, is it going to change, to what, and when? :-P
@prologic this works perfectly. Thanks!
@movq Haha 😝

> What I was referring to in the OP: Sometimes I check the workphone simply out of curiosity. 😂
@movq Fair 👌
Current Twt Hash spec and probability of hash collision:

The probability of a Twt Hash collision depends on the size of the hash and the number of possible values it can take. For the Twt Hash, which uses a Blake2b 256-bit hash, Base32 encoding, and takes the last 7 characters, the space of possible hash values is significantly reduced.

### Breakdown:

1. Base32 encoding: Each character in the Base32 encoding represents 5 bits of information (since \( 2^5 = 32 \)).
2. 7 characters: With 7 characters, the total number of possible hashes is:
\[
 32^7 = 34,359,738,368
 \]
This gives about 34 billion possible hash values.

### Probability of Collision:

The probability of a hash collision depends on the number of hashes generated and can be estimated using the Birthday Paradox. The paradox tells us that collisions are more likely than expected when hashing a large number of items.

The approximate formula for the probability of at least one collision after generating n hashes is:
\[
P(\text{collision}) \approx 1 - e^{-\frac{n^2}{2M}}
\]
Where:
- \(n\) is the number of generated Twt Hashes.
- \(M = 32^7 = 34,359,738,368\) is the total number of possible hash values.

For practical purposes, here are some example probabilities for different numbers of hashes (n):

- For 1,000 hashes:
\[
 P(\text{collision}) \approx 1 - e^{-\frac{1000^2}{2 \cdot 34,359,738,368}} \approx 0.0000146 \, \text{(0.0015%)}
\]
- For 10,000 hashes:
\[
 P(\text{collision}) \approx 1 - e^{-\frac{10000^2}{2 \cdot 34,359,738,368}} \approx 0.00145 \, \text{(0.15%)}
\]
- For 100,000 hashes:
\[
 P(\text{collision}) \approx 1 - e^{-\frac{100000^2}{2 \cdot 34,359,738,368}} \approx 0.135 \, \text{(13.5%)}
\]

### Conclusion:

- For small to moderate numbers of hashes (up to around 10,000), the collision probability is quite low.
- However, as the number of Twts grows (above 100,000), the likelihood of a collision increases significantly due to the relatively small hash space (about 34 billion).
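A quick check of the figures above (Python):


import math

M = 32 ** 7  # 34,359,738,368 possible 7-character base32 values
for n in (1_000, 10_000, 100_000, 1_000_000):
    p = 1 - math.exp(-n * n / (2 * M))
    print(f"{n:>9,} twts -> collision probability {p:.4%}")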
@quark Add here:


* a0826a65 - Add debug sub-command to yarnc (7 weeks ago) <James Mills>


I'd recommend a git pull && make build

$ echo -n "https://twtxt.net/user/prologic/twtxt.txt\n2020-07-18T12:39:52Z\nHello World! 😊" | sha1sum | head -c 11
87fd9b0ae4e

@prologic I don’t get paid for “standing by” and “waiting for a call”, that’s right. But I’m fine with that, because I don’t *have to* be available, either. 😅 If someone were to call me (or send me a text message), I wouldn’t be *obliged* to help them out. If I have the time and energy, I will do it, though. And that extra time will be paid.

It works for us because there are enough people around and there’s a good chance that someone will be able to help.

Really, I am glad that we have this model. The alternative would be actual on-call duty, like, this week you’re the poor bastard who is legally required to fix shit. That’s just horrible, I don’t want that. 😅

What I was referring to in the OP: Sometimes I check the workphone simply out of curiosity. 😂

$ echo -n "https://twtxt.net/user/prologic/twtxt.txt\n2020-07-18T12:39:52Z\nHello World! 😊" | sha256sum | base32 | tr -d '=' | tr 'A-Z' 'a-z' | tail -c 12
tdqmjaeawqu
