(delete: 5vbi2ea) .. would it delete someone elses twt?
(delete: 5vbi2ea) .. would it delete someone elses twt?
😂😂😂
It would be easy to do for releases, but it’s a little hard to do for all the commits in between – jenny has no build process, so there’s no easy way to incorporate the output of
git describe, for example.
It would be easy to do for releases, but it’s a little hard to do for all the commits in between – jenny has no build process, so there’s no easy way to incorporate the output of
git describe, for example.
It would be easy to do for releases, but it’s a little hard to do for all the commits in between – jenny has no build process, so there’s no easy way to incorporate the output of
git describe, for example.
It would be easy to do for releases, but it’s a little hard to do for all the commits in between – jenny has no build process, so there’s no easy way to incorporate the output of
git describe, for example.
The
(replyto:…) proposal is definitely more in the spirit of twtxt, I’d say. It’s much simpler, anyone can use it even with the simplest tools, no need for any client code. That is certainly a great property, if you ask me, and it’s things like that that brought me to twtxt in the first place.I’d also say that in our tiny little community, message integrity simply doesn’t matter. Signed feeds don’t matter. I signed my feed for a while using GPG, someone else did the same, but in the end, nobody cares. The community is so tiny, there’s enough “implicit trust” or whatever you want to call it.
If twtxt/Yarn was to grow bigger, then this would become a concern again. *But even Mastodon allows editing*, so how much of a problem can it really be? 😅
I do have to “admit”, though, that hashes *feel* better. It feels good to know that we can clearly identify a certain twt. It feels more correct and stable.
Hm.
I *suspect* that the
(replyto:…) proposal would work just as well in practice.
The
(replyto:…) proposal is definitely more in the spirit of twtxt, I’d say. It’s much simpler, anyone can use it even with the simplest tools, no need for any client code. That is certainly a great property, if you ask me, and it’s things like that that brought me to twtxt in the first place.I’d also say that in our tiny little community, message integrity simply doesn’t matter. Signed feeds don’t matter. I signed my feed for a while using GPG, someone else did the same, but in the end, nobody cares. The community is so tiny, there’s enough “implicit trust” or whatever you want to call it.
If twtxt/Yarn was to grow bigger, then this would become a concern again. *But even Mastodon allows editing*, so how much of a problem can it really be? 😅
I do have to “admit”, though, that hashes *feel* better. It feels good to know that we can clearly identify a certain twt. It feels more correct and stable.
Hm.
I *suspect* that the
(replyto:…) proposal would work just as well in practice.
The
(replyto:…) proposal is definitely more in the spirit of twtxt, I’d say. It’s much simpler, anyone can use it even with the simplest tools, no need for any client code. That is certainly a great property, if you ask me, and it’s things like that that brought me to twtxt in the first place.I’d also say that in our tiny little community, message integrity simply doesn’t matter. Signed feeds don’t matter. I signed my feed for a while using GPG, someone else did the same, but in the end, nobody cares. The community is so tiny, there’s enough “implicit trust” or whatever you want to call it.
If twtxt/Yarn was to grow bigger, then this would become a concern again. *But even Mastodon allows editing*, so how much of a problem can it really be? 😅
I do have to “admit”, though, that hashes *feel* better. It feels good to know that we can clearly identify a certain twt. It feels more correct and stable.
Hm.
I *suspect* that the
(replyto:…) proposal would work just as well in practice.
The
(replyto:…) proposal is definitely more in the spirit of twtxt, I’d say. It’s much simpler, anyone can use it even with the simplest tools, no need for any client code. That is certainly a great property, if you ask me, and it’s things like that that brought me to twtxt in the first place.I’d also say that in our tiny little community, message integrity simply doesn’t matter. Signed feeds don’t matter. I signed my feed for a while using GPG, someone else did the same, but in the end, nobody cares. The community is so tiny, there’s enough “implicit trust” or whatever you want to call it.
If twtxt/Yarn was to grow bigger, then this would become a concern again. *But even Mastodon allows editing*, so how much of a problem can it really be? 😅
I do have to “admit”, though, that hashes *feel* better. It feels good to know that we can clearly identify a certain twt. It feels more correct and stable.
Hm.
I *suspect* that the
(replyto:…) proposal would work just as well in practice.
jenny, a -v switch. That way when you twtxt "*That’s an older format that was used before jenny version v23.04*", I can go and run jenny -v, and "duh!" myself on the way to a git pull. :-D
jenny, a -v switch. That way when you twtxt "*That’s an older format that was used before jenny version v23.04*", I can go and run jenny -v, and "duh!" myself on the way to a git pull. :-D
commit 62a2b7735749f2ff3c9306dd984ad28f853595c5:> Crawl archived feeds in --fetch-context
Like, very much! :-)
commit 62a2b7735749f2ff3c9306dd984ad28f853595c5:> Crawl archived feeds in --fetch-context
Like, very much! :-)
> - editing, if you don't care about message integrity
So that’s the big question, because that’s the only real difference between hashes and the
(replyto:…) proposal.Do we care about message integrity?
With
(replyto:…), someone could write a twt, then I reply to it, like “you’re absolutely right!”, and then that person could change their twt to something malicious like “the earth is flat!” And then it would look like I’m a nutcase agreeing with that person. 😅Hashes (in their current form) prevent that. The thread is broken and my reply clearly refers to something else. That’s good, right?
But now take into account that we want to allow editing anyway. Is there even a point to using hashes anymore? Isn’t message integrity ignored anyway now, at least in practice?
There’s no difference (in practice) between someone writing
2024-09-18T12:34Z Brds are great!
and then editing it to either
2024-09-18T12:34Z (original:#12379) Birds are great! (Whoops, fixed a typo.)
or
2024-09-18T12:34Z (original:#12379) The earth is flat!
The actual original message is (potentially) gone. The only thing that we can be sure of now is that the twt was edited in *some* way. *Essentially*, the actual twt message is no longer part of the hash, is it? What does
#12379 refer to? The edited message or the original one? We *want* it to refer to the edited one, because we don’t want to break threads, so … what’s the point of using a hash?
> - editing, if you don't care about message integrity
So that’s the big question, because that’s the only real difference between hashes and the
(replyto:…) proposal.Do we care about message integrity?
With
(replyto:…), someone could write a twt, then I reply to it, like “you’re absolutely right!”, and then that person could change their twt to something malicious like “the earth is flat!” And then it would look like I’m a nutcase agreeing with that person. 😅Hashes (in their current form) prevent that. The thread is broken and my reply clearly refers to something else. That’s good, right?
But now take into account that we want to allow editing anyway. Is there even a point to using hashes anymore? Isn’t message integrity ignored anyway now, at least in practice?
There’s no difference (in practice) between someone writing
2024-09-18T12:34Z Brds are great!
and then editing it to either
2024-09-18T12:34Z (original:#12379) Birds are great! (Whoops, fixed a typo.)
or
2024-09-18T12:34Z (original:#12379) The earth is flat!
The actual original message is (potentially) gone. The only thing that we can be sure of now is that the twt was edited in *some* way. *Essentially*, the actual twt message is no longer part of the hash, is it? What does
#12379 refer to? The edited message or the original one? We *want* it to refer to the edited one, because we don’t want to break threads, so … what’s the point of using a hash?
> - editing, if you don't care about message integrity
So that’s the big question, because that’s the only real difference between hashes and the
(replyto:…) proposal.Do we care about message integrity?
With
(replyto:…), someone could write a twt, then I reply to it, like “you’re absolutely right!”, and then that person could change their twt to something malicious like “the earth is flat!” And then it would look like I’m a nutcase agreeing with that person. 😅Hashes (in their current form) prevent that. The thread is broken and my reply clearly refers to something else. That’s good, right?
But now take into account that we want to allow editing anyway. Is there even a point to using hashes anymore? Isn’t message integrity ignored anyway now, at least in practice?
There’s no difference (in practice) between someone writing
2024-09-18T12:34Z Brds are great!
and then editing it to either
2024-09-18T12:34Z (original:#12379) Birds are great! (Whoops, fixed a typo.)
or
2024-09-18T12:34Z (original:#12379) The earth is flat!
The actual original message is (potentially) gone. The only thing that we can be sure of now is that the twt was edited in *some* way. *Essentially*, the actual twt message is no longer part of the hash, is it? What does
#12379 refer to? The edited message or the original one? We *want* it to refer to the edited one, because we don’t want to break threads, so … what’s the point of using a hash?
> - editing, if you don't care about message integrity
So that’s the big question, because that’s the only real difference between hashes and the
(replyto:…) proposal.Do we care about message integrity?
With
(replyto:…), someone could write a twt, then I reply to it, like “you’re absolutely right!”, and then that person could change their twt to something malicious like “the earth is flat!” And then it would look like I’m a nutcase agreeing with that person. 😅Hashes (in their current form) prevent that. The thread is broken and my reply clearly refers to something else. That’s good, right?
But now take into account that we want to allow editing anyway. Is there even a point to using hashes anymore? Isn’t message integrity ignored anyway now, at least in practice?
There’s no difference (in practice) between someone writing
2024-09-18T12:34Z Brds are great!
and then editing it to either
2024-09-18T12:34Z (original:#12379) Birds are great! (Whoops, fixed a typo.)
or
2024-09-18T12:34Z (original:#12379) The earth is flat!
The actual original message is (potentially) gone. The only thing that we can be sure of now is that the twt was edited in *some* way. *Essentially*, the actual twt message is no longer part of the hash, is it? What does
#12379 refer to? The edited message or the original one? We *want* it to refer to the edited one, because we don’t want to break threads, so … what’s the point of using a hash?
And I guess the release after that is going to include all the threading/hashing stuff – if we can decide on one of the proposals. 😂
And I guess the release after that is going to include all the threading/hashing stuff – if we can decide on one of the proposals. 😂
And I guess the release after that is going to include all the threading/hashing stuff – if we can decide on one of the proposals. 😂
And I guess the release after that is going to include all the threading/hashing stuff – if we can decide on one of the proposals. 😂
But I do like @movq's suggestion on this thread that feeds could contain both the original and the edited twt. I guess it would be up to the author.
+02:00 and +01:00 are best, because I use them! :-P In all seriousness, Z might be the best timezone, as it is shortest. And regarding privacy, it leaks the least information about the user's rough location. But of course, one can just look at the activity and narrow down plausible regions, so that's a weak argument.
zq4fgq Thanks!
zq4fgq Thanks!
zq4fgq Thanks!
There's a simple reason all the current hashes end in a or q: the hash is 256 bits, the base32 encoding chops that into groups of 5 bits, and 256 isn't divisible by 5. The last character of the base32 encoding just has that left-over single bit (256 mod 5 = 1).
So I agree with #3 below, but do you have a source for #1, #2 or #4? I would expect any lack of variability in any part of a hash function's output would make it more vulnerable to attacks, so designers of hash functions would want to make the whole output vary as much as possible.
Other than the divisible-by-5 thing, my current intuition is it doesn't matter what part you take.
> 1. Hash Structure: Hashes are typically designed so that their outputs have specific statistical properties. The first few characters often have more entropy or variability, meaning they are less likely to have patterns. The last characters may not maintain this randomness, especially if the encoding method has a tendency to produce less varied endings.
>
> 2. Collision Resistance: When using hashes, the goal is to minimize the risk of collisions (different inputs producing the same output). By using the first few characters, you leverage the full distribution of the hash. The last characters may not distribute in the same way, potentially increasing the likelihood of collisions.
>
> 3. Encoding Characteristics: Base32 encoding has a specific structure and padding that might influence the last characters more than the first. If the data being hashed is similar, the last characters may be more similar across different hashes.
>
> 4. Use Cases: In many applications (like generating unique identifiers), the beginning of the hash is often the most informative and varied. Relying on the end might reduce the uniqueness of generated identifiers, especially if a prefix has a specific context or meaning.=
\\n into actual newlines:
$ echo -n "https://twtxt.net/user/prologic/twtxt.txt\\n2020-07-18T12:39:52Z\\nHello World! 😊" | openssl dgst -blake2s256 -binary | base32 | tr -d '=' | tr 'A-Z' 'a-z' | tail -c 7
zq4fgq
$ printf "https://twtxt.net/user/prologic/twtxt.txt\\\\n2020-07-18T12:39:52Z\\\\nHello World! 😊" | openssl dgst -blake2s256 -binary | base32 | tr -d '=' | tr 'A-Z' 'a-z' | tail -c 7
p44j3q
\n into actual newlines:
$ echo -n "https://twtxt.net/user/prologic/twtxt.txt\n2020-07-18T12:39:52Z\nHello World! 😊" | openssl dgst -blake2s256 -binary | base32 | tr -d '=' | tr 'A-Z' 'a-z' | tail -c 7
zq4fgq
$ printf "https://twtxt.net/user/prologic/twtxt.txt\\n2020-07-18T12:39:52Z\\nHello World! 😊" | openssl dgst -blake2s256 -binary | base32 | tr -d '=' | tr 'A-Z' 'a-z' | tail -c 7
p44j3q
\n into actual newlines:
$ echo -n "https://twtxt.net/user/prologic/twtxt.txt\n2020-07-18T12:39:52Z\nHello World! 😊" | openssl dgst -blake2s256 -binary | base32 | tr -d '=' | tr 'A-Z' 'a-z' | tail -c 7
zq4fgq
$ printf "https://twtxt.net/user/prologic/twtxt.txt\\n2020-07-18T12:39:52Z\\nHello World! 😊" | openssl dgst -blake2s256 -binary | base32 | tr -d '=' | tr 'A-Z' 'a-z' | tail -c 7
p44j3q
\n into actual newlines:
$ echo -n "https://twtxt.net/user/prologic/twtxt.txt\n2020-07-18T12:39:52Z\nHello World! 😊" | openssl dgst -blake2s256 -binary | base32 | tr -d '=' | tr 'A-Z' 'a-z' | tail -c 7
zq4fgq
$ printf "https://twtxt.net/user/prologic/twtxt.txt\\n2020-07-18T12:39:52Z\\nHello World! 😊" | openssl dgst -blake2s256 -binary | base32 | tr -d '=' | tr 'A-Z' 'a-z' | tail -c 7
p44j3q
zq4fgq.
zq4fgq.
~ » echo -n "https://twtxt.net/user/prologic/twtxt.txt\n2020-07-18T12:39:52Z\nHello World! 😊" | openssl dgst -blake2s256 -binary | base32 | tr -d '=' | tr 'A-Z' 'a-z' | tail -c 7
p44j3q
~ » echo -n "https://twtxt.net/user/prologic/twtxt.txt\n2020-07-18T12:39:52Z\nHello World! 😊" | openssl dgst -blake2s256 -binary | base32 | tr -d '=' | tr 'A-Z' 'a-z' | tail -c 7
p44j3q
~ » echo -n "https://twtxt.net/user/prologic/twtxt.txt\\n2020-07-18T12:39:52Z\\nHello World! 😊" | openssl dgst -blake2s256 -binary | base32 | tr -d '=' | tr 'A-Z' 'a-z' | tail -c 7
p44j3q
jenny also does what I want, as of latest commit. Simply use jenny --debug-feed <feed url>, and it will do what I wanted too!
jenny also does what I want, as of latest commit. Simply use jenny --debug-feed <feed url>, and it will do what I wanted too!
* @prologic (and I assume any Yarn user)
2024-09-18T13:16:17Z* @lyse
2024-09-17T21:15:00+02:00* @aelaraji (and @movq, and me)
2024-09-18T05:43:13+00:00So, which is right, or best?*
* @prologic (and I assume any Yarn user)
2024-09-18T13:16:17Z* @lyse
2024-09-17T21:15:00+02:00* @aelaraji (and @movq, and me)
2024-09-18T05:43:13+00:00So, which is right, or best?*
2024-09-18T23:08:00+10:00 Hllo World
2024-09-18T23:10:43+10:00 (edit:#229d24612a2) Hello World
2024-09-18T23:08:00+10:00\tHllo World
2024-09-18T23:10:43+10:00\t(edit:#229d24612a2) Hello World
2024-09-18T23:08:00+10:00 Hllo World
2024-09-18T23:10:43+10:00 (edit:#229d24612a2) Hello World
2024-09-18T23:08:00+10:00 Hllo World
2024-09-18T23:10:43+10:00 (edit:#229d24612a2) Hello World
Solo para boomers ⌘ Read more****
# nick = as preferential indicators to clients as well as even other updates such as # description = -- Not just # url =
# nick = as preferential indicators to clients as well as even other updates such as # description = -- Not just # url =
delete:229d24612a2, which would indicate to clients that fetch the feed to delete any cached Twt matching the hash 229d24612a2 if the author wishes to "unpublish" that Twt permanently, rather than just deleting the line from the feed (_which does nothing for clients really_).
delete:229d24612a2, which would indicate to clients that fetch the feed to delete any cached Twt matching the hash 229d24612a2 if the author wishes to "unpublish" that Twt permanently, rather than just deleting the line from the feed (_which does nothing for clients really_).
2024-09-18T23:08:00+10:00 Hllo World
And my feed's URI is
https://example.com/twtxt.txt. The hash for this Twt is therefore 229d24612a2:
$ echo -n "https://example.com/twtxt.txt\n2024-09-18T23:08:00+10:00\nHllo World" | sha1sum | head -c 11
229d24612a2
You wish to correct your mistake, so you make an amendment to that Twt like so:
2024-09-18T23:10:43+10:00 (edit:#229d24612a2) Hello World
Which would then have a new Twt hash value of
026d77e03fa:
$ echo -n "https://example.com/twtxt.txt\n2024-09-18T23:10:43+10:00\nHello World" | sha1sum | head -c 11
026d77e03fa
Clients would then take this
edit:#229d24612a2 to mean, this Twt is an edit of 229d24612a2 and should be replaced in the client's cache, or indicated as such to the user that this is the intended content._
2024-09-18T23:08:00+10:00 Hllo World
And my feed's URI is
https://example.com/twtxt.txt. The hash for this Twt is therefore 229d24612a2:
$ echo -n "https://example.com/twtxt.txt\n2024-09-18T23:08:00+10:00\nHllo World" | sha1sum | head -c 11
229d24612a2
You wish to correct your mistake, so you make an amendment to that Twt like so:
2024-09-18T23:10:43+10:00 (edit:#229d24612a2) Hello World
Which would then have a new Twt hash value of
026d77e03fa:
$ echo -n "https://example.com/twtxt.txt\n2024-09-18T23:10:43+10:00\nHello World" | sha1sum | head -c 11
026d77e03fa
Clients would then take this
edit:#229d24612a2 to mean, this Twt is an edit of 229d24612a2 and should be replaced in the client's cache, or indicated as such to the user that this is the intended content._
2024-09-18T23:08:00+10:00\tHllo World
And my feed's URI is
https://example.com/twtxt.txt. The hash for this Twt is therefore 229d24612a2:
$ echo -n "https://example.com/twtxt.txt\\n2024-09-18T23:08:00+10:00\\nHllo World" | sha1sum | head -c 11
229d24612a2
You wish to correct your mistake, so you make an amendment to that Twt like so:
2024-09-18T23:10:43+10:00\t(edit:#229d24612a2) Hello World
Which would then have a new Twt hash value of
026d77e03fa:
$ echo -n "https://example.com/twtxt.txt\\n2024-09-18T23:10:43+10:00\\nHello World" | sha1sum | head -c 11
026d77e03fa
Clients would then take this
edit:#229d24612a2 to mean, this Twt is an edit of 229d24612a2 and should be replaced in the client's cache, or indicated as such to the user that this is the intended content._
echo with something like pbpaste or similar. You'd just need to shell escape things like " and such. That's all. Alternatives you can shove the 3 lines into a small file and cat file.txt | ...
echo with something like pbpaste or similar. You'd just need to shell escape things like " and such. That's all. Alternatives you can shove the 3 lines into a small file and cat file.txt | ...
>>> import math
>>>
>>> def collision_probability(k, bits):
... n = 2 ** bits # Total unique hash values based on the number of bits
... probability = 1 - math.exp(- (k ** 2) / (2 * n))
... return probability * 100 # Return as percentage
...
>>> # Example usage:
>>> k_values = [100000, 1000000, 10000000]
>>> bits = 44 # Number of bits for the hash
>>>
>>> for k in k_values:
... print(f"Probability of collision for {k} hashes with {bits} bits: {collision_probability(k, bits):.4f}%")
...
Probability of collision for 100000 hashes with 44 bits: 0.0284%
Probability of collision for 1000000 hashes with 44 bits: 2.8022%
Probability of collision for 10000000 hashes with 44 bits: 94.1701%
>>> bits = 48
>>> for k in k_values:
... print(f"Probability of collision for {k} hashes with {bits} bits: {collision_probability(k, bits):.4f}%")
...
Probability of collision for 100000 hashes with 48 bits: 0.0018%
Probability of collision for 1000000 hashes with 48 bits: 0.1775%
Probability of collision for 10000000 hashes with 48 bits: 16.2753%
>>> bits = 52
>>> for k in k_values:
... print(f"Probability of collision for {k} hashes with {bits} bits: {collision_probability(k, bits):.4f}%")
...
Probability of collision for 100000 hashes with 52 bits: 0.0001%
Probability of collision for 1000000 hashes with 52 bits: 0.0111%
Probability of collision for 10000000 hashes with 52 bits: 1.1041%
>>>
If we adopted this scheme, we could have to increase the no. of characters (_first N_) from
11 to 12 and finally 13 as we approach globally larger enough Twts across the space. I _think_ at least full crawl/scrape it was around ~500k (_maybe_)? https://search.twtxt.net/ says only ~99k
>>> import math
>>>
>>> def collision_probability(k, bits):
... n = 2 ** bits # Total unique hash values based on the number of bits
... probability = 1 - math.exp(- (k ** 2) / (2 * n))
... return probability * 100 # Return as percentage
...
>>> # Example usage:
>>> k_values = [100000, 1000000, 10000000]
>>> bits = 44 # Number of bits for the hash
>>>
>>> for k in k_values:
... print(f"Probability of collision for {k} hashes with {bits} bits: {collision_probability(k, bits):.4f}%")
...
Probability of collision for 100000 hashes with 44 bits: 0.0284%
Probability of collision for 1000000 hashes with 44 bits: 2.8022%
Probability of collision for 10000000 hashes with 44 bits: 94.1701%
>>> bits = 48
>>> for k in k_values:
... print(f"Probability of collision for {k} hashes with {bits} bits: {collision_probability(k, bits):.4f}%")
...
Probability of collision for 100000 hashes with 48 bits: 0.0018%
Probability of collision for 1000000 hashes with 48 bits: 0.1775%
Probability of collision for 10000000 hashes with 48 bits: 16.2753%
>>> bits = 52
>>> for k in k_values:
... print(f"Probability of collision for {k} hashes with {bits} bits: {collision_probability(k, bits):.4f}%")
...
Probability of collision for 100000 hashes with 52 bits: 0.0001%
Probability of collision for 1000000 hashes with 52 bits: 0.0111%
Probability of collision for 10000000 hashes with 52 bits: 1.1041%
>>>
If we adopted this scheme, we could have to increase the no. of characters (_first N_) from
11 to 12 and finally 13 as we approach globally larger enough Twts across the space. I _think_ at least full crawl/scrape it was around ~500k (_maybe_)? https://search.twtxt.net/ says only ~99k
", and other "spurious" characters in it?
$ echo -n "https://twtxt.net/user/prologic/twtxt.txt\n2020-07-18T12:39:52Z\nHello World! 😊" | sha1sum | head -c 11
87fd9b0ae4e
$ echo -n "https://twtxt.net/user/prologic/twtxt.txt\\n2020-07-18T12:39:52Z\\nHello World! 😊" | sha1sum | head -c 11
87fd9b0ae4e