# I am the Watcher. I am your guide through this vast new twtiverse.
# 
# Usage:
#     https://watcher.sour.is/api/plain/users              View list of users and latest twt date.
#     https://watcher.sour.is/api/plain/twt                View all twts.
#     https://watcher.sour.is/api/plain/mentions?uri=:uri  View all mentions for uri.
#     https://watcher.sour.is/api/plain/conv/:hash         View all twts for a conversation subject.
# 
# Options:
#     uri     Filter to show a specific user's twts.
#     offset  Start index for query.
#     limit   Count of items to return (going back in time).
# 
# twt range = 1 60454
# self = https://watcher.sour.is?uri=https://twtxt.net/user/prologic/twtxt.txt&offset=57191
# next = https://watcher.sour.is?uri=https://twtxt.net/user/prologic/twtxt.txt&offset=57291
# prev = https://watcher.sour.is?uri=https://twtxt.net/user/prologic/twtxt.txt&offset=57091
So just to be clear, it's not as bad as the OP in this thread; that is just a worst-case scenario. With some additional analysis I did today, it's closer to ~5x the memory requirements of my pod, which would go from ~22MB to roughly ~120MB or so, probably a bit more in practice. But this is still a significant increase in memory. The on-disk requirements would also increase by ~5x on average, going from ~12GB to about ~60GB at the current archive size.
Just out of curiosity, I inspected the yarns database (_the search engine/crawler_) to find the average length of a Twtxt URI:


$ inspect-db yarns.db | jq -r '.Value.URL' | awk '{ total += length; count++ } END { if (count > 0) print total / count }'
40.3387


Given that an RFC3339 UTC timestamp has a length of 20 characters with seconds precision, we're talking about a Twt Subject taking up ~63 characters/bytes on average.
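
A rough sketch of how the ~63 figure can be reproduced. The average URI length and the 20-character timestamp come from the twt above; the 3 bytes of overhead (two parentheses and a separator) is an assumption borrowed from elsewhere in this feed, and `avg_subject_len` is a hypothetical helper name.

```python
# Inputs: the average comes from the awk one-liner; the overhead is an assumption.
AVG_URI_LEN = 40.3387               # average feed URI length reported above
TIMESTAMP = "2024-09-07T12:55:56Z"  # RFC 3339 UTC timestamp, seconds precision
OVERHEAD = 3                        # assumed: two parentheses plus one separator


def avg_subject_len(avg_uri_len: float, timestamp: str, overhead: int) -> float:
    """Average length of a location-based Subject: URI + timestamp + overhead."""
    return avg_uri_len + len(timestamp) + overhead


print(len(TIMESTAMP))                                               # 20
print(round(avg_subject_len(AVG_URI_LEN, TIMESTAMP, OVERHEAD), 2))  # 63.34
```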
Comparing a few feeds:

- @xuu would see an increase of ~20%
- @falsifian would see an increase of ~8%
- @bender would see an increase of ~20%
- @lyse would see an increase of ~15%
- @aelaraji would see an increase of ~13%
- @sorenpeter would see an increase of ~8%
- @movq would see an increase of ~9%

Just from a scalability standpoint alone, I'm not seeing a switch to location-based Twt ids to support threading as a good idea here. This is what I meant when I said to @david in a recent call that we open up a new can of worms (_or new set of problems_) by drastically changing the approach, rather than incrementally improving the existing approach we have today (_which has served us well for the past 4 years already_).
Reminder to take the Twtxt (_anonymous_) Poll: http://polljunkie.com/poll/xdgjib/twtxt-v2

Apologies, I can't edit the poll once it's live, so the suggestion on feedback for supporting Markdown will have to be discussed at another time.
@xuu correct
@xuu 🤣🤣🤣
So I whipped up a quick shell script to demonstrate what I mean by the increase in feed size on average as well as the expected increase in storage and retrieval requirements.


$ ./compare.sh
Original file size: 28145 bytes
Modified file size: 70672 bytes
Percentage increase in file size: 151.10%
...
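
The script itself is not shown here, so the following is only a sketch of the final arithmetic, using the two file sizes from the output above. `percent_increase` is a hypothetical helper, not part of `compare.sh`.

```python
def percent_increase(original: int, modified: int) -> float:
    """Relative growth from `original` to `modified`, in percent."""
    return (modified - original) / original * 100


# File sizes in bytes, as printed by the script output above.
print(f"{percent_increase(28145, 70672):.2f}%")  # 151.10%
```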


Thank goodness we relaxed that limit and I've stopped being so puritan about it, but my overall point is that we would be significantly increasing both the human and the machine size of the identity of threads as well as Twts.
With the original specification's recommended Twt length of 140 characters, that only leaves you with about 78 characters' worth of anything remotely useful to say in response.
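
As a quick check of that number, here is a sketch using the figures quoted in this thread: a 59-byte subject and three bytes of overhead (two parentheses and a space). It assumes one byte per character, and `remaining_chars` is a hypothetical helper.

```python
RECOMMENDED_TWT_LEN = 140  # original twtxt length recommendation
SUBJECT_LEN = 59           # example location-based subject length from this thread
OVERHEAD = 3               # two parentheses and a space


def remaining_chars(limit: int, subject_len: int, overhead: int) -> int:
    """Characters left for the actual reply once the subject is included."""
    return limit - (subject_len + overhead)


print(remaining_chars(RECOMMENDED_TWT_LEN, SUBJECT_LEN, OVERHEAD))  # 78
```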
Let's say the overhead is always three bytes: two parentheses and a space.
So for example, if we use @movq's feed as an example thread ID here (his feed with a particular timestamp), we're already looking at a subject length of 59 bytes, +/- a couple of bytes to denote the subject in the Twt itself.
One of the reasons we originally wanted to use content-based addressing and short hashes as our threading model was to keep individual Twts short, so that they were still readable if you viewed them by hand.

With the proposal to switch to location-based addressing, using a pointer to a feed and a timestamp in that feed, you're looking at Subjects potentially around 2025 characters long, because the HTTP, HTML and even URI specifications do not specify a maximum length for URIs AFAIK, only recommendations.
@bender I personally can't see myself increasing the infrastructure and costs to run this pod to support this as we potentially switch over and as things continue to grow in scale. You would never get the infinite search and infinite timeline features that you've always wanted, for example, and I would have to drastically reduce what is visible or even searchable at any given point in time to much less than what it is today.
Another interesting side effect of changing from content-based addressing to location-based addressing is that switching from 7-byte keys to 2025-character keys for 3.5 million entries would expand the database size from 24.5 MB to about 7.09 GB—an increase of roughly 7.06 GB!
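
The arithmetic behind those figures can be sketched as follows, assuming one byte per character, decimal units (1 MB = 10^6 bytes) and counting key bytes only with no per-entry database overhead. `total_key_bytes` is a hypothetical helper.

```python
ENTRIES = 3_500_000  # number of entries quoted above


def total_key_bytes(entries: int, key_len: int) -> int:
    """Total bytes spent on keys alone, ignoring any storage overhead."""
    return entries * key_len


short_keys = total_key_bytes(ENTRIES, 7)     # content-based: 7-byte hashes
long_keys = total_key_bytes(ENTRIES, 2025)   # location-based: worst-case subjects

print(short_keys / 1e6)                          # 24.5 (MB)
print(round(long_keys / 1e9, 2))                 # 7.09 (GB)
print(round((long_keys - short_keys) / 1e9, 2))  # 7.06 (GB increase)
```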
@falsifian No worries! Feel free to contribute to the doc directly if you wish 👌
@falsifian Hmmm not sure sorry 🤔
@xuu Good to know! 👌 So as long as we remain decentralized and non-commercial (I assume non-profit works too?) we're good?
@lyse Nice! 🙏
@doesnm Hello! 👋
@lyse Yes let's make UTF-8 mandatory 👌
@lyse Agreed
Let's try this poll for Twtxt v2 (no account required)

http://polljunkie.com/poll/xdgjib/twtxt-v2
@lyse I'm a bit indifferent whether it's at the beginning or end tbh.
This is still a draft! Feel free to edit it 👌
@movq That's what I was afraid of 🤣
@movq Makes sense 👌 I think it's fair to implement any spec changes incrementally for sure 👌

And yeah, since yarnd has a store it's a bit easier to support edit / delete actions 😅
So in a location-based system, how exactly do I reply to one of these two Twts from @Yarns? 🤔


2024-09-07T12:55:56Z	🥳 NEW FEED: @<twtxt http://edsu.github.io/twtxt/twtxt.txt>
2024-09-07T12:55:56Z	🥳 NEW FEED: @<kdy https://twtxt.kdy.ch/twtxt.txt>
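
To make the ambiguity concrete, here is a minimal sketch. The feed URL is hypothetical (the real @Yarns feed location is not given here), and `content_id` follows my understanding of the Twt Hash extension (BLAKE2b-256 over URL, timestamp and content, base32, lowercased, last 7 characters); details such as timestamp normalisation may differ from the actual spec.

```python
import base64
import hashlib

FEED_URL = "https://example.com/yarns/twtxt.txt"  # hypothetical feed location

LINES = [
    "2024-09-07T12:55:56Z\t🥳 NEW FEED: @<twtxt http://edsu.github.io/twtxt/twtxt.txt>",
    "2024-09-07T12:55:56Z\t🥳 NEW FEED: @<kdy https://twtxt.kdy.ch/twtxt.txt>",
]


def location_id(feed_url: str, line: str) -> str:
    """Location-based id: feed URL plus timestamp only."""
    timestamp, _content = line.split("\t", 1)
    return f"{feed_url} {timestamp}"


def content_id(feed_url: str, line: str, length: int = 7) -> str:
    """Content-based id sketch: hash of URL, timestamp and content."""
    timestamp, content = line.split("\t", 1)
    payload = f"{feed_url}\n{timestamp}\n{content}".encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=32).digest()
    encoded = base64.b32encode(digest).decode("ascii").lower().rstrip("=")
    return encoded[-length:]


location_ids = {location_id(FEED_URL, line) for line in LINES}
content_ids = {content_id(FEED_URL, line) for line in LINES}

print(len(location_ids))  # 1: both Twts share the same feed and timestamp
print(len(content_ids))   # 2: distinct ids, barring a hash collision
```

With only the feed and timestamp as the key, a reply cannot say which of the two Twts it refers to; the content-based ids stay distinct because the content differs.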
@lyse Yup, this is why you started seeing if you could improve the "trust" of peers right? 😅
@movq Yeah I think what I'm proposing here is a more pragmatic approach to improvements that will last much longer than our first iteration (_~4 years and going strong, but running into minor issues with edit/identity and some collisions_). This scope of changes is much easier to implement for yarnd, and I suspect jenny too, and as indicated here it's quite easy to have a reference implementation written in Bash with standard UNIX tools.
It's even sorta/somewhat compatible with our existing feeds (_kind of_) 🤣 -- Bit too stupid to figure out how to write enough correct Bash to make threads display inline nicely in an indented/tree-like fashion, but oh well 😅
Example:



$ ./twtxt-v2.sh reply 242561ce02d "Cool! 👌"
Posted twt with hash: b2c938f9838
...
$ ./twtxt-v2.sh timeline
...
prologic@twtxt.net [2024-09-22T07:26:37Z] <242561ce02d> Okay folks, I've spent all day on this today, and I _think_ its in "good enough"™ shape to share:

**Twtxt v2**:

- Specification: https://docs.mills.io/uJXuisaYTRWYDrl8A2jADg?both
- implementation: https://gist.mills.io/prologic/afdec15443da4d7aa898f383f171ec1b

 ![](https://twtxt.net/media/Wb9MtAiQyEkzNQB5dyVvUR.png)
prologic@localhost [2024-09-22T07:51:16Z] <b2c938f9838> Cool! 👌 (reply-to:242561ce02d)


Okay folks, I've spent all day on this today, and I _think_ it's in "good enough"™ shape to share:

Twtxt v2:

- Specification: https://docs.mills.io/uJXuisaYTRWYDrl8A2jADg?both
- Implementation: https://gist.mills.io/prologic/afdec15443da4d7aa898f383f171ec1b

@aelaraji No, that is absolutely correct. Without cryptographic identities and signatures there is no way to verify authenticity. And I don't think we necessarily need to. What I was showing and proving was just that I didn't write that spoofed Twt in the first place, which was only provable at the time of @lyse's short-lived attack 🤣 He essentially forked yarnd, hosted it temporarily (_I think locally_) and used it to poison the caches of a few production pods.

Thankfully the gossip protocol used by yarnd as part of its "peering" between pods isn't fully trusted; for example, twts are not archived into permanent storage. So the moment my pod re-fetched my own feed, the spoofed Twt was obliterated 😅

Eventual consistency 🤣
LOL 😂 Not only have I tried to write up a full Twtxt v2 specification, I've also written a Bash shell script that implements the new spec 😅
@movq Haha 😝 Nice one! And yes I'm also aware of some collisions too!
@aelaraji I like ntfy 👌 I've wanted to replace my use of the Pushover service with this for a while now 🤔
@bender 👌
👋 Reminder folks of the upcoming Yarn.social monthly online meetup:

I hope to see @david @movq @lyse @xuu @sorenpeter and hopefully others too @aelaraji @falsifian and anyone else that sees this! 🙏 We're _hopefully_ going to primarily discuss the future of Twtxt and the last few weeks of discussions 🤣

- Event: Yarn.social Online Meetup
- When: 28th September 2024 at 12:00pm UTC (midday)
- Where: Mills Meet : Yarn.social
- Cadence: 4th Saturday of every Month

Agenda:

- Let's talk about the upcoming changes to the Twtxt spec(s)
- See #xgghhnq

#Yarn.social #Meetup
My Position on the last few weeks of Twtxt spec discussions:

- We increase the Hash length from 7 to 11.
- We formalise the Update Commands extension.
- We amend the Twt Hash and Metadata extension to state:

> Feed authors that wish to change the location of their feed (_once Twts have been published_) must append a new # url = comment to their feed to indicate the new location and thus change the "Hashing URI" used for Twts from _that_ point onward.
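
One possible reading of this amendment, sketched for an append-only feed: a client walks the feed top to bottom and uses the most recent `# url =` comment as the Hashing URI for every Twt that follows it. The function name, the example URLs and the sample feed are all hypothetical.

```python
from typing import Iterator, Tuple


def hashing_uris(feed_text: str, initial_url: str) -> Iterator[Tuple[str, str]]:
    """Yield (hashing_uri, twt_line) pairs for an append-only feed."""
    current = initial_url
    for line in feed_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            # Metadata comment: only "# url = ..." changes the Hashing URI.
            key, sep, value = stripped.lstrip("#").partition("=")
            if sep and key.strip() == "url":
                current = value.strip()
            continue
        yield current, stripped


FEED = """\
# url = https://old.example.com/twtxt.txt
2024-09-01T10:00:00Z\tHello from the old location
# url = https://new.example.com/twtxt.txt
2024-09-20T10:00:00Z\tHello from the new location
"""

for uri, twt in hashing_uris(FEED, "https://old.example.com/twtxt.txt"):
    print(uri, "<-", twt.split("\t", 1)[1])
```

Twts published before the second `# url =` line keep hashing against the old location, so their existing hashes (and any threads built on them) stay valid.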

This has implications for the "order" of a feed, and we should do one of two things:

- Mandate that feeds are append-only.
- Or amend the Metadata spec with a new field that denotes the order of the feed so clients can make sense of "inline" comments in the feed. -- This would also imply that the default order is (_of course_) append-only. Suggestion: # direction = [append|prepend]
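
On the first point (hash length 7 to 11), a rough birthday-bound sketch shows why the extra characters matter. It assumes every base32 character contributes a full 5 bits of uniformly distributed entropy, which is optimistic, and borrows the 3.5 million entry count mentioned elsewhere in this feed purely as an illustrative size.

```python
import math


def collision_probability(n_items: int, hash_chars: int, bits_per_char: int = 5) -> float:
    """Birthday approximation: p = 1 - exp(-n(n-1) / (2 * 2**bits))."""
    space = 2 ** (hash_chars * bits_per_char)
    exponent = n_items * (n_items - 1) / (2 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-exponent)


for chars in (7, 11):
    p = collision_probability(3_500_000, chars)
    print(f"{chars} chars: p(at least one collision) ~ {p:.6f}")
```

Under these assumptions, at least one collision is practically certain with 7 characters at that scale, while with 11 characters the probability drops to the order of 1 in 10,000.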
I finally decided to do a few experiments with yarnd to see how many things would break and how many assumptions there are around the idea of "Content Addressing"; here's where I'm at so far:

- What breaks

Basically I'm at a point where spending time on this is going to provide very little value. There are assumptions made in the lextwt parser, assumptions made in yarnd, and assumptions in the way storage is done, the way threading works and how things are looked up. The implications of changing the way Twts are identified to be "location addressed" are far-reaching, and I'm quite worried about the amount of effort that would be required to change yarnd here.

@mckinley Yes I have, however I'm not counting that because even using "Cloud" is not labor free.
@aelaraji We figured it out 🤣 @lyse's little hack was good but only temporary 🤣
@sorenpeter Kind of agree with dealing with this kind of social nonsense which we've all done in the past 🤣
@movq I think your scenario doesn't account for clients and their storage. The scenario described only really affects clients that come along later. Even then they would also be able to re-fetch missing Twts from peers or even a search engine to fill in the gaps.
@movq That's kind of a problem though, right?
@david 🤣🤣🤣
I just realized the other big property you lose is:

> What if someone completely changes the content of the root of the thread?

Does the Subject reference the feed and timestamp only or the intent too?
@bender Yeah I'll be honest here; I'm not going to be very happy if we go down this "location addressing" route:

- Twt Subjects lose their meaning.
- Twt Subjects cannot be verified without looking up the feed.
- Which may or may not exist anymore or may change.
- Two persons cannot reply to a Twt independently of each other anymore.

_and probably some other properties we'd stand to lose that I'm forgetting about..._
@movq One of the biggest reasons I don't like the (replyto:…) proposal (_location addressing vs. content addressing_) is that you just introduce a similar problem down the track, albeit a rarer one: if a feed changes its location, your thread's "identifiers" are no longer valid, unless those feed authors maintain strict URL redirects, etc. This potentially has the long-term effect of being rather fragile, as opposed to what we have now, where an Edit just causes a natural fork in the thread, which is how "forking" works in the first place.

I realise this is a bit pret here, and it probably doesn't matter a whole lot at our size. But I'm trying to think way ahead, to a point where Twtxt as a "thing" can continue to work and function decades from now, even with the extensions we've built. We've already proven for example that Twts and threads from ~4 years ago still work and are easily looked up haha 😝
I just read the primary spec I'm strongly in support of and it's pretty rock solid for me 👌 💯