# I am the Watcher. I am your guide through this vast new twtiverse.
# 
# Usage:
#     https://watcher.sour.is/api/plain/users              View list of users and latest twt date.
#     https://watcher.sour.is/api/plain/twt                View all twts.
#     https://watcher.sour.is/api/plain/mentions?uri=:uri  View all mentions for uri.
#     https://watcher.sour.is/api/plain/conv/:hash         View all twts for a conversation subject.
# 
# Options:
#     uri     Filter to show a specific user's twts.
#     offset  Start index for query.
#     limit   Count of items to return (going back in time).
# 
# twt range = 1 60515
# self = https://watcher.sour.is?uri=https://twtxt.net/user/prologic/twtxt.txt&offset=57215
# next = https://watcher.sour.is?uri=https://twtxt.net/user/prologic/twtxt.txt&offset=57315
# prev = https://watcher.sour.is?uri=https://twtxt.net/user/prologic/twtxt.txt&offset=57115
Don't forget about the Yarn.social meetup coming up this Saturday! See #jjbnvgq for details! Hope to see some/all of y'all there 💪
@lyse And your query to construct a tree? Can you share the full query (_screenshot looks scary 🤣_) -- On another note, SQL and relational databases aren't really that conducive to tree-like structures, are they? 🤣
In fact, it depends on how many Twts form part of a thread. If you take a much larger sample of my own feed, for example, it starts to approximate a ~1.5x increase in size:


$ ./compare.sh https://twtxt.net/user/prologic/twtxt.txt 500
Original file size: 126842 bytes
Modified file size: 317029 bytes
Percentage increase in file size: 149.94%
...
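A minimal sketch of the arithmetic behind that last line, assuming `compare.sh` reports the growth relative to the original file size (the script itself isn't shown here, so the function name is hypothetical):

```python
def percentage_increase(original_bytes: int, modified_bytes: int) -> float:
    """Growth of a file, expressed as a percentage of its original size."""
    return (modified_bytes - original_bytes) / original_bytes * 100


# Sizes taken from the compare.sh output above.
original, modified = 126842, 317029

print(f"Percentage increase in file size: {percentage_increase(original, modified):.2f}%")
print(f"Modified file is {modified / original:.2f}x the size of the original")
```

Note that a ~150% increase means the modified feed is roughly 2.5x the size of the original.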
In fact, @falsifian, you had quite a lot of good feedback. Do you mind collecting it in a task list on the doc somewhere so I can get to it? 🤔
Can someone make the edit?
@movq This was just a representative sample. The real concrete cost here is a ~5x increase in memory consumption for yarnd and/or a ~5x increase in disk storage.
@lyse Mind sharing your schema?
@lyse Not sure, I'll check
@lyse My proposal is three steps:

- Increase the hash length from 7 to 11

Then:

- Add support for changing your feed's location without breaking threads

Then much later:

- Add formal support for edits
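For context, here is a minimal sketch of how a content-based Twt hash could be computed with a configurable length. It assumes the construction from the Twt Hash extension (BLAKE2b-256 over feed URL, timestamp and text, base32-encoded, lowercased, last characters kept). The `length` parameter, the function name and the example values are hypothetical, and the v2 draft may well pick a different encoding for its 11-character hashes.

```python
import base64
import hashlib


def twt_hash(feed_url: str, timestamp: str, text: str, length: int = 7) -> str:
    """Content-based Twt hash: the last `length` chars of base32(blake2b-256(payload))."""
    payload = f"{feed_url}\n{timestamp}\n{text}".encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=32).digest()
    encoded = base64.b32encode(digest).decode("ascii").rstrip("=").lower()
    return encoded[-length:]


# Hypothetical feed URL and content, for illustration only.
args = ("https://example.com/twtxt.txt", "2024-09-22T07:26:37Z", "Hello world!")
print(twt_hash(*args))             # today's 7-character hash
print(twt_hash(*args, length=11))  # the same digest, keeping 11 characters
```

Since only the number of characters kept changes, the longer hash still ends with the shorter one in this sketch.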
@lyse No, I don't either, just sayin' 😅
@movq That's what I want to know 🤣
So just to be clear, it's not as bad as the OP in this thread, which is just a worst-case scenario. With some additional analysis I did today, it's closer to around ~5x the memory requirements of my pod, which would roughly go from ~22MB to ~120MB or so, probably a bit more in practice. But this is still a significant increase in memory. The on-disk requirements would also increase by around ~5x on average, going from ~12GB to about ~60GB at the current archive size.
Just out of curiosity, I inspected the yarns database (_the search engine/crawler_) to find the average length of a Twtxt URI:


$ inspect-db yarns.db | jq -r '.Value.URL' | awk '{ total += length; count++ } END { if (count > 0) print total / count }'
40.3387


Given that an RFC 3339 UTC timestamp with seconds precision has a length of 20 characters, we're talking about a Twt Subject taking up ~63 characters/bytes on average (_~40 for the URI, 20 for the timestamp, plus a few bytes of overhead_).
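The ~63 figure can be reproduced with a quick back-of-the-envelope sketch. The 3 bytes of overhead (two parentheses and a space) are an assumption borrowed from elsewhere in this thread, and the timestamp is just an example value:

```python
AVG_URI_LENGTH = 40.3387            # average reported by the awk one-liner
TIMESTAMP = "2024-09-07T12:55:56Z"  # RFC 3339, UTC, seconds precision
OVERHEAD = 3                        # assumed: two parentheses and a space

avg_subject_length = AVG_URI_LENGTH + len(TIMESTAMP) + OVERHEAD

print(f"Timestamp length: {len(TIMESTAMP)} characters")
print(f"Average location-based subject: ~{round(avg_subject_length)} characters")
```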
Comparing a few feeds:

- @xuu would see an increase of ~20%
- @falsifian would see an increase of ~8%
- @bender would see an increase of ~20%
- @lyse would see an increase of ~15%
- @aelaraji would see an increase of ~13%
- @sorenpeter would see an increase of ~8%
- @movq would see an increase of ~9%

Just from a scalability standpoint alone, I'm not seeing a switch to location-based Twt IDs to support threading as a good idea here. This is what I meant when I said to @david in a recent call that we open up a new can of worms (_or new set of problems_) by drastically changing the approach, rather than incrementally improving the existing approach we have today (_which has served us well for the past 4 years already_).
Reminder to take the Twtxt (_anonymous_) Poll: http://polljunkie.com/poll/xdgjib/twtxt-v2

Apologies, I can't edit the poll once it's live, so the suggestion on feedback for supporting Markdown will have to be discussed at another time.
@xuu correct
@xuu 🤣🤣🤣
So I whipped up a quick shell script to demonstrate what I mean by the increase in feed size on average as well as the expected increase in storage and retrieval requirements.


$ ./compare.sh
Original file size: 28145 bytes
Modified file size: 70672 bytes
Percentage increase in file size: 151.10%
...


Thank goodness we relaxed that limit and I've stopped being so puritan about it, but my overall point is that we would be significantly increasing both the human-readable size and the machine size of the identity of threads as well as Twts.
With the original specification's recommended Twt length of 140 characters, that only leaves you with about 78 characters' worth of anything remotely useful to say in response.
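As a rough sketch of where the 78 comes from, using the 59-byte subject and 3 bytes of overhead quoted in this thread. Treating `#` plus a 7-character hash as the current subject is an assumption made only for comparison:

```python
RECOMMENDED_TWT_LENGTH = 140  # original twtxt length recommendation
SUBJECT_OVERHEAD = 3          # two parentheses and a space


def chars_left(subject_length: int) -> int:
    """Characters remaining for the actual reply once the subject is accounted for."""
    return RECOMMENDED_TWT_LENGTH - (subject_length + SUBJECT_OVERHEAD)


print(chars_left(59))               # location-based subject (feed URL + timestamp)
print(chars_left(len("#jjbnvgq")))  # assumed current style: '#' plus a 7-character hash
```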
Let's say the overhead is always three bytes: two parentheses and a space.
So, for example, if we use @movq 's feed with a particular timestamp as a thread ID, we're already looking at a subject length of 59 bytes, +/- a couple of bytes to denote the subject in the Twt itself.
One of the reasons we originally wanted to use content-based addressing and short hashes as our threading model was to keep individual Twts short, so that they were still readable if you viewed them by hand.

With the proposal to switch to location-based addressing, using a pointer to a feed and a timestamp in that feed, you're looking at subjects of up to roughly 2025 characters, because the HTTP, HTML and even URI specifications do not specify a maximum length for URIs, AFAIK, only recommendations.
@bender I personally can't see myself increasing the infrastructure and costs to run this pod to support this if we potentially switch over and as things continue to grow in scale. You would never get the infinite search and infinite timeline features that you've always wanted, for example, and I would have to drastically reduce what is visible or even searchable at any given point in time to much less than what it is today.
Another interesting side effect of changing from content-based addressing to location-based addressing is that switching from 7-byte keys to 2025-character keys for 3.5 million entries would expand the database size from 24.5 MB to about 7.09 GB, an increase of roughly 7.06 GB!
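A small sketch of that estimate, counting only the raw key bytes (one byte per character, decimal MB/GB) and ignoring any database overhead. The function name is hypothetical:

```python
ENTRIES = 3_500_000


def key_storage_bytes(entries: int, key_length: int) -> int:
    """Raw bytes needed to store one key of `key_length` bytes per entry."""
    return entries * key_length


hash_keys = key_storage_bytes(ENTRIES, 7)         # content-based: 7-byte hashes
location_keys = key_storage_bytes(ENTRIES, 2025)  # location-based: worst-case keys

print(f"Hash keys:     {hash_keys:,} bytes")
print(f"Location keys: {location_keys:,} bytes")
print(f"Difference:    {location_keys - hash_keys:,} bytes")
```

This is the worst case with every key at 2025 characters; with the ~63-character average subject the growth would be far smaller.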
@falsifian No worries! Feel free to contribute to the doc directly if you wish 👌
@falsifian Hmmm not sure sorry 🤔
@xuu Good to know! 👌 So as long as we remain decentralized and non-commercial (I assume non-profit works too?) we're good?
@lyse Nice! 🙏
@doesnm Hello! 👋
@lyse Yes, let's make UTF-8 mandatory 👌
@lyse Agreed
Let's try this poll for Twtxt v2 (no account required)

http://polljunkie.com/poll/xdgjib/twtxt-v2
@lyse I'm a bit indifferent whether it's at the beginning or end tbh.
This is still a draft! Feel free to edit it 👌
@movq That's what I was afraid of 🤣
@movq Makes sense 👌 I think it's fair to implement any spec changes incrementally for sure 👌

And yea, since yarnd has a store it's a bit easier to support edit / delete actions 😅
So in a location-based system, how exactly do I reply to one of these two Twts from @Yarns ? 🤔


2024-09-07T12:55:56Z	🥳 NEW FEED: @<twtxt http://edsu.github.io/twtxt/twtxt.txt>
2024-09-07T12:55:56Z	🥳 NEW FEED: @<kdy https://twtxt.kdy.ch/twtxt.txt>
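To illustrate the problem, here is a small sketch comparing a location-based identifier with a content-based hash for those two lines. The feed URL and the `url#timestamp` format are assumptions (the proposal's exact syntax isn't given here), and the hash follows the Twt Hash extension's BLAKE2b/base32 construction:

```python
import base64
import hashlib

FEED_URL = "https://example.com/yarns/twtxt.txt"  # hypothetical feed URL

LINES = [
    "2024-09-07T12:55:56Z\t🥳 NEW FEED: @<twtxt http://edsu.github.io/twtxt/twtxt.txt>",
    "2024-09-07T12:55:56Z\t🥳 NEW FEED: @<kdy https://twtxt.kdy.ch/twtxt.txt>",
]


def location_id(feed_url: str, line: str) -> str:
    """Location-based identity: feed URL plus timestamp (assumed format)."""
    timestamp, _text = line.split("\t", 1)
    return f"{feed_url}#{timestamp}"


def content_hash(feed_url: str, line: str) -> str:
    """Content-based identity: last 7 chars of base32(blake2b-256(url, timestamp, text))."""
    timestamp, text = line.split("\t", 1)
    payload = f"{feed_url}\n{timestamp}\n{text}".encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=32).digest()
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()[-7:]


location_ids = {location_id(FEED_URL, line) for line in LINES}
content_hashes = {content_hash(FEED_URL, line) for line in LINES}

print(f"Distinct location-based IDs: {len(location_ids)}")
print(f"Distinct content hashes:     {len(content_hashes)}")
```

Both Twts share a feed and a timestamp, so a reply addressed by location alone cannot say which of the two it belongs to, whereas the content hashes differ because the text differs.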
@lyse Yup, this is why you started seeing if you could improve the "trust" of peers, right? 😅
@movq Yeah, I think what I'm proposing here is a more pragmatic approach to improvements that will last much longer than our first iteration (_~4 years and going strong, but running into minor issues with edits/identity and some collisions_). This scope of changes is much easier to implement for yarnd, and I suspect jenny too, and, as indicated here, it's quite easy to have a reference implementation written in Bash with standard UNIX tools.
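On the collisions point, a rough birthday-bound sketch shows why a longer hash helps. It idealises each base32 character as 5 uniformly random bits, which is an assumption (real Twt hashes may carry fewer effective bits), and it reuses the 3.5 million entries figure from this thread:

```python
import math


def collision_probability(twts: int, bits: int) -> float:
    """Birthday-bound approximation of at least one collision among `twts` hashes."""
    pairs = twts * (twts - 1) / 2
    return -math.expm1(-pairs / 2**bits)


TWTS = 3_500_000

for chars in (7, 11):
    bits = chars * 5  # idealised: 5 bits per base32 character
    probability = collision_probability(TWTS, bits)
    print(f"{chars} chars ({bits} bits): P(collision) ~ {probability:.6f}")
```

Under these assumptions a collision is practically certain with 7 characters at that scale, while with 11 characters it stays well under 0.1%.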
It's even sorta/somewhat compatible with our existing feeds (_kind of_) 🤣 -- Bit too stupid to figure out how to write enough correct Bash to make threads display inline nicely in an indented/tree-like fashion, but oh well 😅
Example:



$ ./twtxt-v2.sh reply 242561ce02d "Cool! 👌"
Posted twt with hash: b2c938f9838
...
$ ./twtxt-v2.sh timeline
...
prologic@twtxt.net [2024-09-22T07:26:37Z] <242561ce02d> Okay folks, I've spent all day on this today, and I _think_ its in "good enough"™ shape to share:

**Twtxt v2**:

- Specification: https://docs.mills.io/uJXuisaYTRWYDrl8A2jADg?both
- implementation: https://gist.mills.io/prologic/afdec15443da4d7aa898f383f171ec1b

 ![](https://twtxt.net/media/Wb9MtAiQyEkzNQB5dyVvUR.png)
prologic@localhost [2024-09-22T07:51:16Z] <b2c938f9838> Cool! 👌 (reply-to:242561ce02d)
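A minimal sketch of how a client might pull the `(reply-to:…)` marker back out of a Twt like the one above. It assumes the marker sits at the end of the text and that hashes are lowercase alphanumerics. This is not the parsing used by `twtxt-v2.sh`, and the function name is hypothetical:

```python
import re
from typing import Optional, Tuple

# Assumed format: "(reply-to:<hash>)" at the very end of the Twt text.
REPLY_TO = re.compile(r"\(reply-to:([0-9a-z]+)\)\s*$")


def split_reply(text: str) -> Tuple[str, Optional[str]]:
    """Return (text without the marker, parent hash), or (text, None) for a root Twt."""
    match = REPLY_TO.search(text)
    if match is None:
        return text, None
    return text[: match.start()].rstrip(), match.group(1)


print(split_reply("Cool! 👌 (reply-to:242561ce02d)"))
print(split_reply("Just a regular Twt"))
```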


Okay folks, I've spent all day on this today, and I _think_ it's in "good enough"™ shape to share:

Twtxt v2:

- Specification: https://docs.mills.io/uJXuisaYTRWYDrl8A2jADg?both
- Implementation: https://gist.mills.io/prologic/afdec15443da4d7aa898f383f171ec1b

@aelaraji No, that is absolutely correct: without cryptographic identities and signatures there is no way to verify authenticity. And I don't think we necessarily need to. What I was just showing and proving was that I didn't write that spoofed Twt in the first place, which was only provable at the time of @lyse 's short-lived attack 🤣 He essentially forked yarnd, hosted it temporarily (_I think locally_) and used it to poison the caches of a few production pods.

Thankfully the gossip protocol used by yarnd as part of its "peering" between pods isn't fully trusted; Twts are not archived into permanent storage, for example. So the moment my pod re-fetched my own feed, the spoofed Twt was obliterated 😅

Eventual consistency 🤣
LOL 😂 Not only have I tried to write up a full Twtxt v2 specification, I've also written a Bash shell script that implements the new spec 😅
@movq Haha 😝 Nice one! And yes I'm also aware of some collisions too!
@aelaraji I like ntfy 👌 I've wanted to replace my use of the Pushover service with this for a while now 🤔
@bender 👌
👋 Reminder folks of the upcoming Yarn.social monthly online meetup:

I hope to see @david @movq @lyse @xuu @sorenpeter and hopefully others too @aelaraji @falsifian and anyone else that sees this! 🙏 We're _hopefully_ going to primarily discuss the future of Twtxt and the last few weeks of discussions 🤣

- Event: Yarn.social Online Meetup
- When: 28th September 2024 at 12:00pm UTC (midday)
- Where: Mills Meet : Yarn.social
- Cadence: 4th Saturday of every Month

Agenda:

- Let's talk about the upcoming changes to the Twtxt spec(s)
- See #xgghhnq

#Yarn.social #Meetup