yarnd
and cp -a yarn.db yarn.db.bak
before testing this PR/branch.
yarnd
PR that upgrades the Bitcask dependency for its internal database to v2? VERY IMPORTANT: if you do, please, please, please back up your yarn.db database first!
Heaven knows I don't want to be responsible for fucking up a production database here or there 🤣
yarnd
that I _think_ have always been there, but were only recently uncovered by the Go 1.23 compiler. https://git.mills.io/yarnsocial/yarn/issues/1175
yarnd
currently uses. twtxt2html uses Goldmark and appears to behave better 🤣
# nick =
as preferential indicators to clients as well as even other updates such as # description =
-- Not just # url =
delete:229d24612a2, which would indicate to clients that fetch the feed to delete any cached Twt matching the hash 229d24612a2 if the author wishes to "unpublish" that Twt permanently, rather than just deleting the line from the feed (_which does nothing for clients really_).
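A client-side handling of such a delete directive could look roughly like the sketch below. It is only an illustration: the cache layout (a dict keyed by Twt hash), the `apply_deletes` helper and the exact directive pattern are assumptions, not part of any agreed spec.

```python
import re

# Hypothetical directive pattern, modelled on the "delete:229d24612a2" example.
DELETE_RE = re.compile(r"delete:#?([0-9a-f]+)")


def apply_deletes(cache: dict[str, str], twt_contents: list[str]) -> dict[str, str]:
    """Return a copy of the cache without any Twt named by a delete directive."""
    remaining = dict(cache)
    for content in twt_contents:
        for twt_hash in DELETE_RE.findall(content):
            # Unknown hashes are ignored; there is simply nothing to drop.
            remaining.pop(twt_hash, None)
    return remaining


cache = {"229d24612a2": "Hllo World", "026d77e03fa": "Hello World"}
cache = apply_deletes(cache, ["delete:229d24612a2"])
print(sorted(cache))  # ['026d77e03fa']
```

A real client would presumably also check that the directive comes from the same feed as the cached Twt before dropping it.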
2024-09-18T23:08:00+10:00 Hllo World
And my feed's URI is https://example.com/twtxt.txt. The hash for this Twt is therefore 229d24612a2:
$ echo -n "https://example.com/twtxt.txt\n2024-09-18T23:08:00+10:00\nHllo World" | sha1sum | head -c 11
229d24612a2
You wish to correct your mistake, so you make an amendment to that Twt like so:
2024-09-18T23:10:43+10:00 (edit:#229d24612a2) Hello World
Which would then have a new Twt hash value of 026d77e03fa:
$ echo -n "https://example.com/twtxt.txt\n2024-09-18T23:10:43+10:00\nHello World" | sha1sum | head -c 11
026d77e03fa
Clients would then take this edit:#229d24612a2 to mean that this Twt is an edit of 229d24612a2, which should be replaced in the client's cache, or otherwise indicated to the user as being the intended content.
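The edit flow above can be sketched in Python as follows. This is a minimal sketch under a few assumptions: the hash is the first 11 hex characters of the SHA-1 of `url\ntimestamp\ncontent` with real newlines (i.e. a shell whose `echo` expands `\n`), the `(edit:#...)` subject is stripped before hashing as in the commands above, and the `twt_hash` / `apply_twt` helpers and the dict-based cache are made up for illustration.

```python
import hashlib
import re

# Hypothetical pattern for the "(edit:#<hash>)" subject shown above.
EDIT_RE = re.compile(r"^\(edit:#([0-9a-f]+)\)\s*")


def twt_hash(feed_url: str, timestamp: str, content: str, length: int = 11) -> str:
    """First `length` hex characters of SHA-1 over url, timestamp and content."""
    payload = f"{feed_url}\n{timestamp}\n{content}"
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()[:length]


def apply_twt(cache: dict[str, str], feed_url: str, timestamp: str, content: str) -> str:
    """Store a Twt in the cache; if it is an edit, drop the Twt it replaces."""
    match = EDIT_RE.match(content)
    if match:
        cache.pop(match.group(1), None)   # forget the superseded Twt
        content = content[match.end():]   # hash and store the content only
    new_hash = twt_hash(feed_url, timestamp, content)
    cache[new_hash] = content
    return new_hash


url = "https://example.com/twtxt.txt"
cache: dict[str, str] = {}
old = apply_twt(cache, url, "2024-09-18T23:08:00+10:00", "Hllo World")
new = apply_twt(cache, url, "2024-09-18T23:10:43+10:00", f"(edit:#{old}) Hello World")
print(list(cache.values()))  # ['Hello World']
```

A client that prefers to show the edit history instead of replacing the Twt could keep the old entry and just mark it as superseded.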
echo with something like pbpaste or similar. You'd just need to shell escape things like " and such. That's all. Alternatively, you can shove the 3 lines into a small file and cat file.txt | ...
>>> import math
>>>
>>> def collision_probability(k, bits):
...     n = 2 ** bits  # Total unique hash values based on the number of bits
...     probability = 1 - math.exp(- (k ** 2) / (2 * n))
...     return probability * 100  # Return as percentage
...
>>> # Example usage:
>>> k_values = [100000, 1000000, 10000000]
>>> bits = 44 # Number of bits for the hash
>>>
>>> for k in k_values:
...     print(f"Probability of collision for {k} hashes with {bits} bits: {collision_probability(k, bits):.4f}%")
...
Probability of collision for 100000 hashes with 44 bits: 0.0284%
Probability of collision for 1000000 hashes with 44 bits: 2.8022%
Probability of collision for 10000000 hashes with 44 bits: 94.1701%
>>> bits = 48
>>> for k in k_values:
...     print(f"Probability of collision for {k} hashes with {bits} bits: {collision_probability(k, bits):.4f}%")
...
Probability of collision for 100000 hashes with 48 bits: 0.0018%
Probability of collision for 1000000 hashes with 48 bits: 0.1775%
Probability of collision for 10000000 hashes with 48 bits: 16.2753%
>>> bits = 52
>>> for k in k_values:
...     print(f"Probability of collision for {k} hashes with {bits} bits: {collision_probability(k, bits):.4f}%")
...
Probability of collision for 100000 hashes with 52 bits: 0.0001%
Probability of collision for 1000000 hashes with 52 bits: 0.0111%
Probability of collision for 10000000 hashes with 52 bits: 1.1041%
>>>
If we adopted this scheme, we might have to increase the no. of characters (_first N_) from 11 to 12 and finally 13 as the global number of Twts across the space grows large enough. I _think_ at the last full crawl/scrape it was around ~500k (_maybe_)? https://search.twtxt.net/ says only ~99k
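To turn the numbers above into a rule of thumb, the same birthday approximation can be used to ask how many hex characters are needed for a given number of Twts. This is only a sketch: the `min_hex_chars` helper and the 2% tolerance are arbitrary assumptions picked for illustration, not a proposed threshold.

```python
import math


def collision_probability(k: int, bits: int) -> float:
    """Birthday approximation, returned as a fraction between 0 and 1."""
    n = 2 ** bits
    return 1 - math.exp(-(k ** 2) / (2 * n))


def min_hex_chars(k: int, max_probability: float, start: int = 11) -> int:
    """Smallest hex prefix length (4 bits per character) that stays under
    max_probability for k hashes, starting the search at `start` characters."""
    chars = start
    while collision_probability(k, 4 * chars) > max_probability:
        chars += 1
    return chars


for k in (100_000, 1_000_000, 10_000_000):
    print(k, min_hex_chars(k, max_probability=0.02))
```

With a 2% tolerance this walks through 11, 12 and 13 characters for the three sizes, which lines up with the REPL output above; a stricter tolerance would push the switch-over points earlier.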
$ echo -n "https://twtxt.net/user/prologic/twtxt.txt\n2020-07-18T12:39:52Z\nHello World! π" | sha1sum | head -c 11
87fd9b0ae4e
1. Hash Structure: Hashes are typically designed so that their outputs have specific statistical properties. The first few characters often have more entropy or variability, meaning they are less likely to have patterns. The last characters may not maintain this randomness, especially if the encoding method has a tendency to produce less varied endings.
2. Collision Resistance: When using hashes, the goal is to minimize the risk of collisions (different inputs producing the same output). By using the first few characters, you leverage the full distribution of the hash. The last characters may not distribute in the same way, potentially increasing the likelihood of collisions.
3. Encoding Characteristics: Base32 encoding has a specific structure and padding that might influence the last characters more than the first. If the data being hashed is similar, the last characters may be more similar across different hashes.
4. Use Cases: In many applications (like generating unique identifiers), the beginning of the hash is often the most informative and varied. Relying on the end might reduce the uniqueness of generated identifiers, especially if a prefix has a specific context or meaning.
In summary, using the first n characters generally preserves the intended randomness and collision resistance of the hash, making it a safer choice in most cases.
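Point 3 is the one that is easy to check empirically. The sketch below assumes a 32-byte digest (SHA-256 is used purely as a stand-in) that is Base32-encoded with the padding stripped; both choices are assumptions for illustration. 256 bits do not divide evenly into 5-bit Base32 characters, so the final character carries only one bit of the digest.

```python
import base64
import hashlib


def base32_digest(data: bytes) -> str:
    """Base32 (unpadded) encoding of a 32-byte digest; SHA-256 is a stand-in."""
    digest = hashlib.sha256(data).digest()
    return base64.b32encode(digest).decode("ascii").rstrip("=")


encoded = [base32_digest(f"twt {i}".encode()) for i in range(1000)]
first_chars = {e[0] for e in encoded}
last_chars = {e[-1] for e in encoded}

print(len(encoded[0]))      # 52 characters for a 32-byte digest
print(sorted(first_chars))  # spread across the Base32 alphabet
print(sorted(last_chars))   # only 'A' or 'Q' can ever appear here
```

So taking the last n characters of such an encoding throws away about four bits compared to taking the first n. For a plain hex digest there is no padding effect of this kind.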