# I am the Watcher. I am your guide through this vast new twtiverse.
# 
# Usage:
#     https://watcher.sour.is/api/plain/users              View list of users and latest twt date.
#     https://watcher.sour.is/api/plain/twt                View all twts.
#     https://watcher.sour.is/api/plain/mentions?uri=:uri  View all mentions for uri.
#     https://watcher.sour.is/api/plain/conv/:hash         View all twts for a conversation subject.
# 
# Options:
#     uri     Filter to show a specific user's twts.
#     offset  Start index for query.
#     limit   Count of items to return (going back in time).
# 
# twt range = 1 196278
# self = https://watcher.sour.is?offset=171931
# next = https://watcher.sour.is?offset=172031
# prev = https://watcher.sour.is?offset=171831
@lyse And your query to construct a tree? Can you share the full query (_screenshot looks scary 🤣_) -- On another note, SQL and relational databases aren't really that conducive to tree-like structures, are they? 🤣
This organigram example got me started: https://www.sqlite.org/lang_with.html#controlling_depth_first_versus_breadth_first_search_of_a_tree_using_order_by

But I feel execution times get worse rather quickly the more data I add. Also, caching helps tremendously: executing it for the first time took over 600ms. From then on I'm down to 40ms.

I think, it's particularly bad that parents might be missing. Thus, I cannot use an index, because there is no parent to reference. But my database knowledge is fairly limited, so I have to read up on that.
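
For anyone following along, here is a minimal sketch of the recursive-CTE approach from the SQLite page linked above, applied to twts. The `twts` table, its columns and the sample rows are made up for illustration and are not the actual schema. A twt whose parent is missing from the cache is simply treated as a root, and no foreign key is declared, so a dangling parent hash is allowed.

```python
import sqlite3

# Hypothetical schema: every twt stores its own hash and the hash it replies to.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE twts (hash TEXT PRIMARY KEY, parent TEXT, created TEXT, body TEXT);
    CREATE INDEX twts_parent ON twts (parent);
""")
conn.executemany("INSERT INTO twts VALUES (?, ?, ?, ?)", [
    ("aaa", None,  "2024-09-22T10:00:00Z", "root twt"),
    ("bbb", "aaa", "2024-09-22T10:05:00Z", "first reply"),
    ("ccc", "bbb", "2024-09-22T10:06:00Z", "nested reply"),
    ("ddd", "aaa", "2024-09-22T10:07:00Z", "second reply"),
    ("eee", "zzz", "2024-09-22T10:08:00Z", "reply to a twt missing from the cache"),
])

TREE_QUERY = """
WITH RECURSIVE tree(hash, depth, created, body) AS (
    -- Roots: twts without a parent, or whose parent is not in the cache.
    SELECT t.hash, 0, t.created, t.body
    FROM twts AS t
    WHERE t.parent IS NULL
       OR NOT EXISTS (SELECT 1 FROM twts AS p WHERE p.hash = t.parent)
    UNION ALL
    SELECT c.hash, tree.depth + 1, c.created, c.body
    FROM twts AS c JOIN tree ON c.parent = tree.hash
    -- Expanding the deepest rows first gives a depth-first walk.
    ORDER BY 2 DESC, 3
)
SELECT hash, depth, body FROM tree
"""

for twt_hash, depth, body in conn.execute(TREE_QUERY):
    print("  " * depth + f"{twt_hash} {body}")
```

The `ORDER BY 2 DESC` on the recursive part is what switches the walk from breadth-first to depth-first. The second sort key (creation time) only orders siblings.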
In fact, it depends on how many Twts form part of a thread. If you take a much larger sample of my own feed, for example, the increase in size starts to approximate ~1.5x:


$ ./compare.sh https://twtxt.net/user/prologic/twtxt.txt 500
Original file size: 126842 bytes
Modified file size: 317029 bytes
Percentage increase in file size: 149.94%
...
In fact @falsifian you had quite a lot of good feedback, do you mind collecting them in a task list on the doc somewhere so I can get to em? 🤔
Can someone make the edit?
[47°09′54″S, 126°43′08″W] Transfer 25% complete...
There you go, @prologic, the SQLite database (with a bit more data now) and the sqlitebrowser project file containing the query: https://lyse.isobeef.org/tmp/tt2cache.tar.bz2 (133.9 KiB)
@movq This was just a representative sample. The real concrete cost here is a ~5x increase in memory consumption for yarnd and/or ~5x increase in disk storage.
@lyse Mind sharing your schema?
@lyse Not sure, I'll check.
@lyse My proposal is three steps:

- increase the hash length from 7 to 11

Then:

- Add support for changing your feed's location without breaking threads

Then much later:

- Add formal support for edits
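
A rough sketch of what the first step could look like, assuming the current hashing scheme as I understand it (BLAKE2b-256 over feed URL, timestamp and text joined by newlines, base32 without padding, lowercased, last characters kept). The function name and the `length` parameter are my own, and the sample values are made up:

```python
import base64
import hashlib

def twt_hash(feed_url: str, timestamp: str, text: str, length: int = 7) -> str:
    """Return the last `length` base32 characters of the twt's BLAKE2b-256 digest."""
    payload = f"{feed_url}\n{timestamp}\n{text}".encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=32).digest()
    encoded = base64.b32encode(digest).decode("ascii").rstrip("=").lower()
    return encoded[-length:]

# Made-up example values, only to show the two lengths side by side.
url = "https://example.com/twtxt.txt"
ts = "2024-09-22T10:00:00Z"
short_hash = twt_hash(url, ts, "Hello, twtiverse!")
long_hash = twt_hash(url, ts, "Hello, twtiverse!", length=11)
print(short_hash, long_hash)
```

Since both lengths are cut from the end of the same digest, the 11-character hash ends with the 7-character one.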
@lyse No I don't either just say'n 😅
@falsifian I agree. It's an optional header.
@movq That's what I want to know 🤣
@prologic What’s that in absolute numbers? My ~/Mail/twt is currently 26 MB in size. Increase that by 20% and we get 31.2 MB.

I don’t buy the argument with 2025 bytes. This worst case scenario is not relevant in practice.
@movq Oha! @bender Happy cooling off!
@prologic Well, mentions are also quite lengthy as they always include the feed URL. I know, that's not a good argument.

I just got a very, very wild idea that I have not put any brain power into, so it might be totally stupid: Since many replies also mention the original feed, maybe a mention and thread identifier could be combined, something like: @<nick url timestamp>. But then we would also need another style if one does not want to mention the original author.

So, scratch that. But I put it out there anyway. Maybe this inspires someone else to come up with something neat.
It’s a different story when you just publish a twtxt file, I think. The question here is: When you publish a twt and don’t like it anymore and want to delete it, do you have the *right* to *force* others to delete it? (Not in a technical manner, but by suing them.) What does the GDPR have to say about that? Not a clue. 😂
What gossip, gopherspace?!
@xuu I *think* it is more tricky than that.

https://commission.europa.eu/law/law-topic/data-protection/reform/rules-business-and-organisations/application-regulation/who-does-data-protection-law-apply_en

“A company *or entity* …”

Also, as I understand it, “personal or household activity” (as you called it) is rather strict: An example could be you uploading photos to a webspace behind HTTP basic auth and sending that link to a friend. So, yes, a webserver is involved and you process your friend’s data (e.g., when did he access your files), but it’s just between you and him. But if you were to publish these photos publicly on a webserver that anyone can access, then it’s a different story – even though you could say that “this is just my personal hobby, not related to any job or money”.

If you operate a public Yarn pod and *if you accept registrations from other users*, then I’m pretty sure the GDPR applies. 🤔 You process personal data and you don’t really know these people. It’s not a personal/private thing anymore.
@prologic Not sure how many actually care about a 140 character limit. I don't. Not at all.
@prologic I'm wondering what exactly you mean by incremental changes, what are the individual ones? What do you have in mind?
@prologic I find it quite hard to rank the facets. Some go hand in hand or depend on the protocol that a feed is offered over. I feel some are only relevant to specific clients. I'm sure people interpret some of them differently.

I'm curious, is it possible to see each individual poll submission?
I'm experimenting with SQLite and trees. It's going good so far with only my own 439 messages long main feed from a few days ago in the cache. Fetching these 632 rows took 20ms:

SQL query to build up the conversation trees in the cache

Now comes the real tricky part, how do I exclude completely read threads?
So just to be clear, it's not as bad as the OP in this thread, this is just a worst case scenario. With some additional analysis I did today, it's closer to around ~5x the memory requirements of my pod, which would roughly go from ~22MB to ~120MB or so, probably a bit more in practise. But this is still a significant increase in memory. The on-disk requirements would also increase by around ~5x on average, going from ~12GB to about ~60GB at current archive size.
[47°09′20″S, 126°43′13″W] Sample analyzing complete -- starting transfer
Just out of curiosity, I inspected the yarns database (_the search engine/crawler_) to find the average length of a Twtxt URI:


$ inspect-db yarns.db | jq -r '.Value.URL' | awk '{ total += length; count++ } END { if (count > 0) print total / count }'
40.3387


Given that an RFC3339 UTC timestamp has a length of 20 characters with seconds precision, we're talking about a Twt Subject taking up ~63 characters/bytes on average.
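
The ~63 figure can be reproduced with a few lines. This is only back-of-the-envelope arithmetic: the three bytes of overhead (two parentheses and a space) are the assumption used elsewhere in this thread, and any separator between URL and timestamp is ignored.

```python
AVG_URL_LENGTH = 40.3387                          # average from the yarns database query
TIMESTAMP_LENGTH = len("2024-09-22T10:00:00Z")    # RFC3339 UTC, seconds precision
SUBJECT_OVERHEAD = 3                              # assumed: "(", ")" and a space

avg_subject_length = AVG_URL_LENGTH + TIMESTAMP_LENGTH + SUBJECT_OVERHEAD
print(f"timestamp: {TIMESTAMP_LENGTH} chars, average subject: ~{avg_subject_length:.0f} chars")
```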
Comparing a few feeds:

- @xuu would see an increase of ~20%
- @falsifian would see an increase of ~8%
- @bender would see an increase of ~20%
- @lyse would see an increase of ~15%
- @aelaraji would see an increase of ~13%
- @sorenpeter would see an increase of ~8%
- @movq would see an increase of ~9%

Just from a scalability standpoint alone, I'm not seeing a switch to location-based Twt ids to support threading as a good idea here. This is what I meant when I said to @david in a recent call that we open up a new can of worms (_or new set of problems_) by drastically changing the approach, rather than incrementally improving the existing approach we have today (_which has served us well for the past 4 years already_).
Reminder to take the Twtxt (_anonymous_) Poll: http://polljunkie.com/poll/xdgjib/twtxt-v2

Apologies, I can't edit the poll once it's live, so the suggestion on feedback for supporting Markdown will have to be discussed at another time.
@xuu correct
@xuu 🤣🤣🤣
I demand full 9-digit nanosecond timestamps and the full TZ identifier as documented in the tz 2024b database! I need to know if there was a change in daylight saving time as per the locality in question as of the provided date.
@falsifian I believe "preserve" means including the original subject hash at the start of the twt, such as (#somehash)
[47°09′47″S, 126°43′17″W] Analyzing samples
So I whipped up a quick shell script to demonstrate what I mean by the increase in feed size on average as well as the expected increase in storage and retrieval requirements.


$ ./compare.sh
Original file size: 28145 bytes
Modified file size: 70672 bytes
Percentage increase in file size: 151.10%
...
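
The script itself isn't shown here, but the percentage it prints is consistent with the usual relative-increase formula. A minimal sketch (the function name is mine, not taken from compare.sh):

```python
def percentage_increase(original: int, modified: int) -> float:
    """Relative growth of `modified` over `original`, in percent."""
    return (modified - original) / original * 100

# The two samples quoted in this thread.
print(f"{percentage_increase(28145, 70672):.2f}%")
print(f"{percentage_increase(126842, 317029):.2f}%")
```

Note that an increase of ~150% means the modified file is roughly 2.5x the size of the original.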


Thank goodness we relaxed that limit and I've stopped being so Puritan about it, but my overall point is that we would be significantly increasing both the human size and the machine size of the identity of threads as well as twts.
With the original specification's recommended Twt length of 140 characters, that only leaves you with about 78 characters worth of anything remotely useful to say in response.
Let's say the overhead is always three bytes: two parentheses and a space.
So for example, if we would use @movq's feed as an example thread ID here, his feed with a particular timestamp, we're already looking at a subject length of 59 bytes +/- a couple of bytes to denote the subject in the Twt itself.
One of the reasons we originally wanted to use content-based addressing and short hashes as our threading model was to keep individual Twts short, so that they were still readable if you viewed them by hand.

With the proposal to switch to location-based addressing using a pointer to a feed and a timestamp in that feed, you're looking at a subject of roughly 2025 characters in the worst case, because neither the HTTP, HTML nor even URI specifications specify a maximum length for URIs, AFAIK, only recommendations.
@bender I personally can't see myself increasing the infrastructure and costs to run this pod to support this as we potentially switch over and as things continue to grow in scale. You would never get the infinite search and infinite timeline features that you've always wanted, for example, and I would have to drastically reduce what is visible or even searchable at any given point in time to much less than what it is today.
Another interesting side effect of changing from content-based addressing to location-based addressing is that switching from 7-byte keys to 2025-character keys for 3.5 million entries would expand the database size from 24.5 MB to about 7.09 GB—an increase of roughly 7.06 GB!
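
The arithmetic behind those numbers, as a quick sketch. It counts key bytes only (no index or per-row overhead), uses decimal MB/GB, and takes the 2025-character key as the worst case rather than an average:

```python
ENTRIES = 3_500_000

def key_storage_bytes(entries: int, key_length: int) -> int:
    """Total bytes needed to store one fixed-length key per entry."""
    return entries * key_length

content_based = key_storage_bytes(ENTRIES, 7)        # short hash keys
location_based = key_storage_bytes(ENTRIES, 2025)    # worst-case URL + timestamp keys

print(f"content-based:  {content_based / 1e6:.1f} MB")
print(f"location-based: {location_based / 1e9:.2f} GB")
print(f"increase:       {(location_based - content_based) / 1e9:.2f} GB")
```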
@bender Ha! Maybe I should get on the Markdown train. You're taking away my excuses.
@falsifian you can colorise things in Mutt/Neomutt. I have colours for bold, italics, code, and blockquotes. In a way, I can “see” markdown! 😊
@falsifian No worries! Feel free to contribute to the doc directly if you wish 👌
@falsifian Hmmm not sure sorry 🤔
@bender

Sorry, you're right, I should have used numbers!

I don't understand what "preserve the original hash" could mean other than "make sure there's still a twt in the feed with that hash". Maybe the text could be clarified somehow.

I'm also not sure what you mean by markdown already being part of it. Of course people can already use Markdown, just like presumably nothing stopped people from using (twt subjects) before they were formally described. But it's not universal; e.g. as a jenny user I just see the plain text.
@xuu Good to know! 👌 So as long as we remain decentralized and non-commercial (I assume non-profit works too?) we're good?
@falsifian The GDPR does not apply to the processing of data for a purely personal or household activity that is not connected to a professional or commercial activity.
@falsifian it would be easier if you had used a numbered list instead of a bulleted one. That way it would be easier to refer to a specific miscellaneous comment.

I have little to contribute on this reply. On bullet two, he meant the original hash. On the last bullet, markdown is already part of it (after all, it is plain text). Yarn, being a web client/server, simply renders it.
🧮 USERS:1 FEEDS:2 TWTS:1101 ARCHIVED:79243 CACHE:2591 FOLLOWERS:17 FOLLOWING:14
@prologic Do you feel the same about published vs. privately stored data?

For me there's a distinction. I feel very strongly that I should be able to retain whatever private information I like. On the other hand, I do have some sympathy for requests not to publish or propagate (though I personally feel it's still morally acceptable to ignore such requests).
@lyse I'd suggest making the whole content-type thing a SHOULD, to accommodate people just using some hosting service they don't have much control over. (The same situation could make detecting followers hard, but IMO "please email me if you follow me" is still legit twtxt, even if inconvenient.)
@prologic Thanks for writing that up!

I hope it can remain a living document (or sequence of draft revisions) for a good long time while we figure out how this stuff works in practice.

I am not sure how I feel about all this being done at once, vs. letting conventions arise.

For example, even today I could reply to twt abc1234 with "(#abc1234) Edit: ..." and I think all you humans would understand it as an edit to (#abc1234). Maybe eventually it would become a common enough convention that clients would start to support it explicitly.

Similarly we could just start using 11-digit hashes. We should iron out whether it's sha256 or whatever but there's no need to get all the other stuff right at the same time.

I have similar thoughts about how some users could try out location-based replies in a backward-compatible way (append the replyto: stuff after the legacy (#hash) style).

However I recognize that I'm not the one implementing this stuff, and it's less work to just have everything determined up front.

Misc comments (I haven't read the whole thing):

- Did you mean to make hashes hexadecimal? You lose 11 bits that way compared to base32. I'd suggest gaining 11 bits with base64 instead.

- "Clients MUST preserve the original hash" --- do you mean they MUST preserve the original twt?

- Thanks for phrasing the bit about deletions so neutrally.

- I don't like the MUST in "Clients MUST follow the chain of reply-to references...". If someone writes a client as a 40-line shell script that requires the user to piece together the threading themselves, IMO we shouldn't declare the client non-conforming just because they didn't get to all the bells and whistles.

- Similarly I don't like the MUST for user agents. For one thing, you might want to fetch a feed without revealing your identity. Also, it raises the bar for a minimal implementation (I'm again thinking of the 40-line shell script).

- For "who follows" lists: why must the long, random tokens be only valid for a limited time? Do you have a scenario in mind where they could leak?

- Why can't feeds be served over HTTP/1.0? Again, thinking about simple software. I recently tried implementing HTTP/1.1 and it wasn't too bad, but 1.0 would have been slightly simpler.

- Why get into the nitty-gritty about caching headers? This seems like generic advice for HTTP servers and clients.

- I'm a little sad about other protocols being not recommended.

- I don't know how I feel about including markdown. I don't mind too much that yarn users emit twts full of markdown, but I'm more of a plain text kind of person. Also it adds to the length. I wonder if putting it in a separate document would make more sense; that would also help with the length.
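
To spell out the bit counting in the first misc comment (hexadecimal vs. base32 vs. base64 for an 11-character hash), here is a small sketch. The helper name is made up, and it assumes every character of the alphabet is equally likely:

```python
import math

def hash_bits(length: int, alphabet_size: int) -> int:
    """Bits of information in `length` characters drawn from the given alphabet."""
    return int(length * math.log2(alphabet_size))

HASH_LENGTH = 11
bits = {name: hash_bits(HASH_LENGTH, size)
        for name, size in (("hex", 16), ("base32", 32), ("base64", 64))}

for name, value in bits.items():
    print(f"{name:>6}: {value} bits")
```

So at the same length, hexadecimal carries 11 bits fewer than base32, and base64 carries 11 bits more.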
Meanwhile in Florida we are having a very Autumnal Equinox day, with temperatures 10-14° cooler than normal. That, on its own, isn’t normal at all, but I taketh! 😂
[47°09′40″S, 126°43′18″W] Raw reading: 0x66F06931, offset +/-2
@lyse Wet and warm, yeah. 🫤 There were flies everywhere, lots of them, on all windows of the apartment. Never seen anything like that. 😳🪰 Like the building was a dead carcass. 😂
cloning nixpkgs is way too difficult, none of my local machines could even do it