# I am the Watcher. I am your guide through this vast new twtiverse.
# 
# Usage:
#     https://watcher.sour.is/api/plain/users              View list of users and latest twt date.
#     https://watcher.sour.is/api/plain/twt                View all twts.
#     https://watcher.sour.is/api/plain/mentions?uri=:uri  View all mentions for uri.
#     https://watcher.sour.is/api/plain/conv/:hash         View all twts for a conversation subject.
# 
# Options:
#     uri     Filter to show a specific user's twts.
#     offset  Start index for query.
#     limit   Count of items to return (going back in time).
# 
# twt range = 1 60515
# self = https://watcher.sour.is?uri=https://twtxt.net/user/prologic/twtxt.txt&offset=56991
# next = https://watcher.sour.is?uri=https://twtxt.net/user/prologic/twtxt.txt&offset=57091
# prev = https://watcher.sour.is?uri=https://twtxt.net/user/prologic/twtxt.txt&offset=56891
@lyse This is why hashes provide that level of integrity. The hash can be verified in the cache or archive as belonging to said feed.
@movq I think the order of the lines in a feed doesn't matter as long as we can guarantee the order of Twts. Clients should already be ordering by Timestamp anyway.
@movq Pretty much 👌
@lyse Sorry, could you explain this differently?
Do you know what you clicked on before going back?
@eldersnake Sweet, thank you! 🙇‍♂️ I'll merge this PR tonight I think.
@david I think we can!
e.g.: Shut down yarnd and cp -a yarn.db yarn.db.bak before testing this PR/branch.
Can I get someone like maybe @xuu or @abucci or even @eldersnake -- if you have some spare time -- to test this yarnd PR that upgrades the Bitcask dependency for its internal database to v2? 🙏

VERY IMPORTANT: If you do, please please please back up your yarn.db database first! 😅 Heaven knows I don't want to be responsible for fucking up a production database here or there 🤣
nevermind; I _think_ this might be due to some internal changes in Go 1.23 and a dependency I needed to update 🤞
Can someone much smarter than me help me figure out a couple of newly discovered deadlocks in yarnd that I _think_ have always been there, but were only recently uncovered by the Go 1.23 compiler?

https://git.mills.io/yarnsocial/yarn/issues/1175
Location Addressing is fine in smaller or single systems. But when you're talking about large decentralised systems with no single point of control (_kind of the point_), things like independently verifiable integrity become quite important.
What is being proposed as a counter to content-addressing is called location-addressing. Two very different approaches, both with pros/cons of course. But a location cannot be verified: the content cannot be guaranteed to be authentic in any way; you just have to implicitly trust that the location points to the right thing.
For example, without content-addressing, you'd never have been able to find, let alone pull up, that ~3yr old Twt of mine (_my very first_); hell, I'd even thought I lost my first feed file or that it became corrupted or something 🤣 -- If that were the case, it would actually be possible to reconstruct the feed and verify every single Twt against the caches of all of you 🤣
@david I _really_ think articles like this explain the benefits far better than I can.
@david Oh! 🤦‍♂️
@david Without including the content, it's no longer really "content addressing" now is it? You're essentially only addressing, say, nick+timestamp or url+timestamp.
Speaking of AI tech (_sorry!_); Just came across this really cool tool built by some engineers at Google™ (_currently completely free to use without any signup_) called NotebookLM 👌 Looks really good for summarizing and talking to documents 📃
@eldersnake Yeah I'm looking forward to that myself 🤣 It'll be great to see the technology grow to a level of maturity and efficiency where you can run the tools on your own PC or device and use them for what, so far, I've found them to be somewhat decent at: auto-complete, search and Q&A.
@sorenpeter I really don't think we can ignore the last ~3 years and a bit of this threading model working quite well for us as a community across a very diverse set of clients and platforms. We cannot just drop something that "mostly works just fine" for the sake of "simplicity". We have to weigh up all the options. There are very real benefits to using content addressing here that IMO shouldn't be disregarded so lightly; it provides a lot of implicit value that users of various clients just don't get to see. I'd recommend reading up on the ideas behind content addressing before simply dismissing the Twt Hash spec entirely. It wasn't even written or formalised by me, but I understand how it works quite well 😅 The guy that wrote the spec was (is?) way smarter than I was back then, probably still is now 🤣
@falsifian Right I see. Yeah maybe we want to avoid that 🤣 I do kind of tend to agree with @xuu in another thread that there isn't actually anything wrong with our use of Blake2 at all really, but we may want to consider all our options.
@xuu I don't think this is a lextwt problem tbh. Just the Markdown parser that yarnd currently uses. twtxt2html uses Goldmark and appears to behave better 🤣
@xuu A long while back, I experimented with using similarity algorithms to detect if two Twts were similar enough to be considered an "Edit".
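Something along these lines is roughly what I had in mind; a minimal sketch using Python's difflib, where the 0.8 threshold is purely an assumption for illustration:

import difflib

def is_probable_edit(old_text: str, new_text: str, threshold: float = 0.8) -> bool:
    # Ratio of matching characters between the two Twts (0.0 .. 1.0).
    ratio = difflib.SequenceMatcher(None, old_text, new_text).ratio()
    return ratio >= threshold

print(is_probable_edit("Hllo World", "Hello World"))               # True: likely an edit
print(is_probable_edit("Hllo World", "Something else entirely"))   # False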
Right I see what you mean @xuu -- Can you maybe come up with a fully fleshed out proposal for this? 🤔 This would help solve the problem of hash collisions that result from the Twt/hash space growing larger over time, without us having to change anything about the way we construct hashes in the first place. We'd just assume spec-compliant clients will dynamically handle this as the space grows.
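Roughly the kind of thing I imagine a client could do locally; a hypothetical sketch where the starting length of 11 and the set of known full hashes are just assumptions:

def shortest_unique_prefix(full_hash: str, all_hashes: set, min_len: int = 11) -> str:
    # Grow the prefix until no other known hash shares it.
    for n in range(min_len, len(full_hash) + 1):
        prefix = full_hash[:n]
        if not any(h != full_hash and h.startswith(prefix) for h in all_hashes):
            return prefix
    return full_hash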
@xuu I _think_ we never progressed this idea further because we weren't sure how to tell if a hash collision would occur in the first place, right? In other words, how does Client A know to expand a hash vs. Client B in a 100% decentralised way? 🤔
Plus these so-called "LLMs" have a pretty good grasp of the "shape" of language, so they _appear_ to be quite intelligent or produce intelligible responses (_when they're actually quite stupid really_).
@eldersnake You don't get left behind at all 🤣 It's hyped up so much, it's not even funny anymore. Basically at this point (_so far at least_) I've concluded that all this GenAI / LLM stuff is just a fancy auto-complete and indexing + search reinvented 🤣
@bender This is down to the different Markdown parsers being used. Goldmark vs. gomarkdown. We need to switch to Goldmark 😅
@quark I'm guessing the quoted text should've been emphasized?
@slashdot NahahahahHa 🤣 So glad I don't use LinkedIn 🤦‍♂️
@falsifian No you don't, sorry. But I tend to agree with you, and I think if we continue to use hashes we should keep the remainder in mind as we choose truncation values of N.
@falsifian Mostly because Git uses it 🤣 Known attacks that would affect our use? 🤔
@xuu I don't recall where that discussion ended up, though?
@bender wut da fuq?! 🤣
@xuu you mean my original idea of basically just automatically detecting Twt edits from the client side?
@xuu This is where you would need to prove that the edit or delete request actually came from that feed's author. Hence why integrity is much more important here.
@falsifian Without supporting deletes properly, though, you're running into GDPR issues and the right to be forgotten 🤣 We've had pretty lengthy discussions about this in years past as well, but we never came to a conclusion we're all happy with.
@movq it would work, you are right, however, it has drawbacks, and I think in the long term would create a new set of problems that we would also then have to solve.
@david Hah 🤣
@david We'll get there soon™ 🔜
@david Hah Welcome back! 😅
Finally, @lyse's idea of recording metadata changes in a feed "inline", where the change happened (_with respect to other Twts in whatever order the file is written in_), could be used to drive things like "Oh, this feed now has a new URI, let's use that from now on as the feed's identity for the purposes of computing Twt hashes". This could extend to # nick = as a preferential indicator to clients, as well as other updates such as # description = -- not just # url =
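Very roughly, a client processing a feed top to bottom could then track the "current" URI like this; a minimal sketch, assuming an inline # url = comment applies to every Twt that follows it:

def parse_feed(feed_text: str, initial_uri: str):
    # Walk the feed in file order, switching the URI used for hashing
    # whenever an inline "# url = ..." metadata comment is encountered.
    current_uri = initial_uri
    twts = []
    for line in feed_text.splitlines():
        if line.startswith("# url ="):
            current_uri = line.split("=", 1)[1].strip()
        elif line and not line.startswith("#"):
            timestamp, _, content = line.partition("\t")
            twts.append((current_uri, timestamp, content))
    return twts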
Likewise we _could_ also support delete:229d24612a2, which would indicate to clients that fetch the feed to delete any cached Twt matching the hash 229d24612a2 if the author wishes to "unpublish" that Twt permanently, rather than just deleting the line from the feed (_which does nothing for clients really_).
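On the client side that could be as simple as something like this; a minimal sketch, assuming the cache is a plain dict keyed by Twt hash:

def handle_deletes(cache: dict, deleted_hashes: list) -> None:
    # Permanently drop any cached Twts the author has asked to "unpublish".
    for twt_hash in deleted_hashes:
        cache.pop(twt_hash, None)

cache = {"229d24612a2": "Hllo World", "026d77e03fa": "Hello World"}
handle_deletes(cache, ["229d24612a2"])
print(cache)   # {'026d77e03fa': 'Hello World'}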
An alternate idea for (_properly_) supporting Twt Edits is to denote them as such by extending the meaning of a Twt Subject (_which would need to be called something better?_). For example, let's say I produced the following Twt:


2024-09-18T23:08:00+10:00	Hllo World


And my feed's URI is https://example.com/twtxt.txt. The hash for this Twt is therefore 229d24612a2:


$ echo -n "https://example.com/twtxt.txt\n2024-09-18T23:08:00+10:00\nHllo World" | sha1sum | head -c 11
229d24612a2


You wish to correct your mistake, so you make an amendment to that Twt like so:


2024-09-18T23:10:43+10:00	(edit:#229d24612a2) Hello World


Which would then have a new Twt hash value of 026d77e03fa:


$ echo -n "https://example.com/twtxt.txt\n2024-09-18T23:10:43+10:00\nHello World" | sha1sum | head -c 11
026d77e03fa


Clients would then take this edit:#229d24612a2 to mean: this Twt is an edit of 229d24612a2 and should replace it in the client's cache, or be indicated to the user as the intended content.
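In client terms I imagine it boils down to something like this; a minimal sketch, assuming the cache is keyed by Twt hash and the subject sits at the start of the content:

import re

EDIT_SUBJECT = re.compile(r"^\(edit:#(?P<hash>[0-9a-z]+)\)\s*")

def apply_twt(cache: dict, new_hash: str, content: str) -> None:
    # If the Twt declares itself an edit, drop the original it supersedes.
    match = EDIT_SUBJECT.match(content)
    if match:
        cache.pop(match.group("hash"), None)
    cache[new_hash] = EDIT_SUBJECT.sub("", content)

cache = {"229d24612a2": "Hllo World"}
apply_twt(cache, "026d77e03fa", "(edit:#229d24612a2) Hello World")
print(cache)   # {'026d77e03fa': 'Hello World'}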
@bender Just replace the echo with something like pbpaste or similar. You'd just need to shell escape things like " and such. That's all. Alternatively, you can shove the 3 lines into a small file and cat file.txt | ...
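For example (file.txt here being whatever hypothetical file you pasted the 3 lines into; the command substitution strips the file's trailing newline so it matches the echo -n behaviour):

$ pbpaste | sha1sum | head -c 11
$ printf '%s' "$(cat file.txt)" | sha1sum | head -c 11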
With a SHA1 encoding, the probability of a hash collision at various k (_number of twts_) becomes:


>>> import math
>>>
>>> def collision_probability(k, bits):
...     n = 2 ** bits  # Total unique hash values based on the number of bits
...     probability = 1 - math.exp(- (k ** 2) / (2 * n))
...     return probability * 100  # Return as percentage
...
>>> # Example usage:
>>> k_values = [100000, 1000000, 10000000]
>>> bits = 44  # Number of bits for the hash
>>>
>>> for k in k_values:
...     print(f"Probability of collision for {k} hashes with {bits} bits: {collision_probability(k, bits):.4f}%")
...
Probability of collision for 100000 hashes with 44 bits: 0.0284%
Probability of collision for 1000000 hashes with 44 bits: 2.8022%
Probability of collision for 10000000 hashes with 44 bits: 94.1701%
>>> bits = 48
>>> for k in k_values:
...     print(f"Probability of collision for {k} hashes with {bits} bits: {collision_probability(k, bits):.4f}%")
...
Probability of collision for 100000 hashes with 48 bits: 0.0018%
Probability of collision for 1000000 hashes with 48 bits: 0.1775%
Probability of collision for 10000000 hashes with 48 bits: 16.2753%
>>> bits = 52
>>> for k in k_values:
...     print(f"Probability of collision for {k} hashes with {bits} bits: {collision_probability(k, bits):.4f}%")
...
Probability of collision for 100000 hashes with 52 bits: 0.0001%
Probability of collision for 1000000 hashes with 52 bits: 0.0111%
Probability of collision for 10000000 hashes with 52 bits: 1.1041%
>>>


If we adopted this scheme, we would have to increase the no. of characters (_first N_) from 11 to 12 and finally 13 as the global number of Twts across the space grows large enough. I _think_ at the last full crawl/scrape it was around ~500k (_maybe_)? https://search.twtxt.net/ says only ~99k
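For reference, those bit counts map directly onto the number of hex characters kept, since each hex character of a SHA1 digest carries 4 bits; that's where the 11 → 12 → 13 progression comes from:

>>> for chars in (11, 12, 13):
...     print(f"{chars} hex chars = {chars * 4} bits")
...
11 hex chars = 44 bits
12 hex chars = 48 bits
13 hex chars = 52 bits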
@quark My money is on a SHA1SUM hash encoding to keep things much simpler:


$ echo -n "https://twtxt.net/user/prologic/twtxt.txt\n2020-07-18T12:39:52Z\nHello World! 😊" | sha1sum | head -c 11
87fd9b0ae4e
I think it was a mistake to take the last n base32 encoded characters of the blake2b 256-bit hash value. It should have been the first n, where n >= 7.
Taking the last n characters of a base32 encoded hash instead of the first n can be problematic for several reasons:

1. Hash Structure: Hashes are typically designed so that their outputs have specific statistical properties. The first few characters often have more entropy or variability, meaning they are less likely to have patterns. The last characters may not maintain this randomness, especially if the encoding method has a tendency to produce less varied endings.

2. Collision Resistance: When using hashes, the goal is to minimize the risk of collisions (different inputs producing the same output). By using the first few characters, you leverage the full distribution of the hash. The last characters may not distribute in the same way, potentially increasing the likelihood of collisions.

3. Encoding Characteristics: Base32 encoding has a specific structure and padding that might influence the last characters more than the first. If the data being hashed is similar, the last characters may be more similar across different hashes.

4. Use Cases: In many applications (like generating unique identifiers), the beginning of the hash is often the most informative and varied. Relying on the end might reduce the uniqueness of generated identifiers, especially if a prefix has a specific context or meaning.

In summary, using the first n characters generally preserves the intended randomness and collision resistance of the hash, making it a safer choice in most cases.
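To make the comparison concrete, here's a small sketch of computing both variants; the construction (blake2b-256 over url, timestamp and content, base32 encoded, lowercase, no padding) is my understanding of the current Twt Hash spec, and n = 7 is the current truncation:

import base64, hashlib

def twt_hash_variants(uri: str, timestamp: str, content: str, n: int = 7):
    # blake2b-256 of "<uri>\n<timestamp>\n<content>", base32 encoded.
    payload = f"{uri}\n{timestamp}\n{content}".encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=32).digest()
    encoded = base64.b32encode(digest).decode("ascii").rstrip("=").lower()
    return encoded[:n], encoded[-n:]   # first n vs. last n characters

first_n, last_n = twt_hash_variants(
    "https://example.com/twtxt.txt", "2024-09-18T23:08:00+10:00", "Hllo World")
print("first n:", first_n, "| last n:", last_n)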
@quark Bloody good question 🙋 God only knows 🤣