# I am the Watcher. I am your guide through this vast new twtiverse.
# 
# Usage:
#     https://watcher.sour.is/api/plain/users              View list of users and latest twt date.
#     https://watcher.sour.is/api/plain/twt                View all twts.
#     https://watcher.sour.is/api/plain/mentions?uri=:uri  View all mentions for uri.
#     https://watcher.sour.is/api/plain/conv/:hash         View all twts for a conversation subject.
# 
# Options:
#     uri     Filter to show a specific user's twts.
#     offset  Start index for query.
#     limit   Count of items to return (going back in time).
# 
# twt range = 1 60445
# self = https://watcher.sour.is?uri=https://twtxt.net/user/prologic/twtxt.txt&offset=57291
# next = https://watcher.sour.is?uri=https://twtxt.net/user/prologic/twtxt.txt&offset=57391
# prev = https://watcher.sour.is?uri=https://twtxt.net/user/prologic/twtxt.txt&offset=57191
> "Everything should be made as simple as possible, but not simpler."
– *Albert Einstein*

> The beauty of simplicity lies in not losing the essence.

#simplicity #Einstein #wisdom
Don't forget about the upcoming Yarn.social monthly online meetup. See #jjbnvgq for details.
Last day to have your say before our monthly online meetup 👋

http://polljunkie.com/poll/xdgjib/twtxt-v2
@anth Thank you I'll have a read 👌
@sorenpeter I'm just saying that your argument better supports building better clients and worrying less about the actual underlying raw Twtxt feed, so the simplicity argument is a bit weaker here.
@sorenpeter This is an argument for better clients really and less worry about the "transport" -- the raw Twtxt feed file.
@sorenpeter The CPU cost of calculating hashes is negligible.
@lyse Haha 😝
@lyse Now increase the indexes on the Twt Subject from 7 bytes to 64 bytes 😈
@lyse Congrats 🙌
Hmm, this question has "Yes" in the lead so far, with 13 votes:

> Should we formally support edit and deletion requests?


Thanks y'all for voting (_it's all anonymous so I have no idea who's voted for what!_)

If you haven't already had your say, please do so here: http://polljunkie.com/poll/xdgjib/twtxt-v2 -- This is my feeble attempt at trying to ascertain the voice of the greater community with ideas of a Twtxt v2 specification (_which I'm hoping will just be an improved specification of what we largely have already built to date with some small but important improvements 🤞_)
Starting a couple of new projects (_geez where do I find the time?!_):

HomeTunnel:
> HomeTunnel is a self-hosted solution that combines secure tunneling, proxying, and automation to create your own private cloud. Utilizing Wireguard for VPN, Caddy for reverse proxying, and Traefik for service routing, HomeTunnel allows you to securely expose your home network services (such as Gitea, Poste.io, etc.) to the Internet. With seamless automation and on-demand TLS, HomeTunnel gives you the power to manage your own cloud-like environment with the control and privacy of self-hosting.

CraneOps:
> craneops is an open-source operator framework, written in Go, that allows self-hosters to automate the deployment and management of infrastructure and applications. Inspired by Kubernetes operators, CraneOps uses declarative YAML Custom Resource Definitions (CRDs) to manage Docker Swarm deployments on Proxmox VE clusters.
I think that's one of the worst aspects of the proposed idea of location-based addressing or identity. The fact that Alice reads Twt A and Bob reads Twt A at the same location, but Alice and Bob _could_ have in fact read very different content entirely. It is no longer possible to have consistency in a decentralised way that works properly.

One could argue this is fine, because we're so small and nothing matters, but it's a property I rely on fairly heavily in yarnd, a property that, if lost, would have a significant impact on how yarnd works, I think. 🤔
Unless I'm missing something here 🤔 But a <url> <timestamp> does not, for me, identify an individual Twt; it only identifies its location, which may or may not have changed since I last saw a version of it hmmm 🧐
Also I'm not even sure I can validly cache, let alone index, feeds anymore if we do this, because if the structure of a Twt is such that I can no longer trust that an individual Twt's content hasn't been changed at the source, what's the point of caching or indexing individual twts at all? This makes the implementations of yarnd and yarns (_the search engine, crawlers and indexer_) kind of hard to reason about.
Also you're right I guess. But still that also requires the author not to change the timestamp too. Hmmm
@movq I don't think there's any misunderstanding at all. I just treat every line in a feed as an individual entity. These are stored on their own.
@movq So I obviously happen to agree with you as well. However, in so saying, one of my goals was also to bring the simplicity of Twtxt to the Web and to the general "lay person" (_of sorts_). So I eventually found myself building yarnd. Has it been successful? Well, sort of, somewhat (_but that doesn't matter, I like that it's small and niche anyway_).

I agree that simplicity is a good goal to strive for, which is why I'm actually suggesting we change the Twt identifiers to be a simple SHA256 hash, something that everyone understands and has readily available tools for. I really don't think we should be doing any of this by hand, to be honest. But part of the beauty of Twt Subject and Twt Hash(es) in the first place is that replying by hand is much, much easier because you only have a short 7 or 11 character thing to copy/paste in your reply. Switching to something like <url> <timestamp> with a space in it is going to be a lot harder to copy/paste, because you can't "double click" (_or is it triple click for some?_) to copy/paste to your clipboard/buffer now 🤣

Anyway I digress... On the whole edit thing, I'm actually fine if we don't support it at all and don't build a protocol around that. I have zero issues with dropping that as an idea. Why? Because I actually think that clients should be auto-detecting edits anyway. They already can, I've PoC'd this myself, I _think_ it can be done. I haven't (yet), and one of the reasons I've not spent much effort on it is that it isn't something that comes up frequently anyway.

Who cares if a thread breaks every now 'n again anyway?
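
A minimal sketch of what client-side edit detection could look like, in Python. It assumes an edit keeps a Twt's timestamp and changes only its content, and that each feed line is `<timestamp>\t<content>`. The names `parse_feed` and `detect_edits` are hypothetical, and this is not how yarnd or any existing PoC is known to do it.

```python
def parse_feed(text: str) -> dict[str, str]:
    """Map timestamp -> content for each twt line, skipping comments and blanks."""
    twts: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        timestamp, sep, content = line.partition("\t")
        if sep:  # ignore lines that have no tab separator
            twts[timestamp] = content
    return twts


def detect_edits(old_feed: str, new_feed: str) -> list[tuple[str, str, str]]:
    """Return (timestamp, old_content, new_content) for twts whose content changed.

    Assumption: the author did not touch the timestamp when editing.
    """
    old, new = parse_feed(old_feed), parse_feed(new_feed)
    return [(ts, old[ts], new[ts]) for ts in old if ts in new and old[ts] != new[ts]]
```

If the author also changes the timestamp, this heuristic sees a deletion plus a new Twt rather than an edit.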
@doesnm Like maybe you need to check something, debug a client, or whatever 😅
Don't forget about the upcoming Yarn.social online meetup coming up this Saturday! 😅 See #jjbnvgq for details! -- Hope to see y'all there 💪
👋 Don't forget to take the Twtxt v2 poll 🙏 if you haven't done so already (_sorry about the confusing question at the end!_)
@doesnm I don't even advocate for reading Twtxt in its raw form in the first place, which is why I'm in favor of continuing to use content-based addressing (hashes) and incrementally improving what we already have. IMO the only reasons to read a Twtxt file in its raw form are a) if you're a developer, b) if you're a new feed author, or c) if you're debugging a client issue.
And finally, the legibility of feeds when viewing them in their raw form is worsened as you go from a Twt Subject of (#abcdefg12345) to something like (https://twtxt.net/user/prologic/twtxt.txt 2024-09-22T07:51:16Z).
There is also a ~5x increase in memory utilization for any implementations that use or wish to use in-memory storage (yarnd does, for example) and equally a ~5x increase in on-disk storage. This is based on the Twt Hash going from 13 bytes (content-addressing) to 63 bytes (on average for location-based addressing). There is also roughly a ~20-150% increase in the size of individual feeds that needs to be taken into consideration (_in the average case_).
With location-based addressing there is no way to verify that a single Twt _actually_ came from that feed without fetching the feed and checking. That has the effect of always having to rely on fetching the feed and storing a copy of the feeds you fetch (_which is okay_), but you're forced to do this. You cannot really share individual Twts anymore like yarnd does (_as peering_), because there is no "integrity" to the Twt identified by its <url> <timestamp>. The identity is meaningless and is only valid as long as you can trust the location and that the location hasn't changed its content at that point.
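
The integrity point can be sketched in Python. This assumes a content-based identifier derived from SHA-256 over `<url>\t<timestamp>\t<content>` and truncated to 11 hex characters. The functions `twt_hash` and `verify_shared_twt` are hypothetical names for illustration, not yarnd's actual API or the exact Twt Hash algorithm.

```python
import hashlib


def twt_hash(url: str, timestamp: str, content: str, length: int = 11) -> str:
    """Illustrative content-based identifier (assumed scheme, not the real spec)."""
    payload = f"{url}\t{timestamp}\t{content}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:length]


def verify_shared_twt(url: str, timestamp: str, content: str, claimed: str) -> bool:
    """A peer can check a shared Twt against its identifier without fetching the feed."""
    return twt_hash(url, timestamp, content, len(claimed)) == claimed
```

With a `<url> <timestamp>` identifier there is nothing comparable to recompute: the identifier stays the same whatever the content is, so the only check left is fetching the feed itself.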
Location-based addressing is vulnerable to the content changing. If the content changes the "location" is no longer valid. This is a problem if you build systems that rely on this.
So really your argument is just that switching to location-based addressing "just makes sense". Why? Without concrete pros/cons of each approach this isn't really a strong argument, I'm afraid. In fact I probably need to just sit down and detail the properties of both approaches and the pros/cons of both.

I also don't really buy the argument of simplicity either, personally, because technically I don't see it as much more difficult to take an `echo -e "<url>\t<timestamp>\t<content>" | sha256sum | base64` as the Twt Subject than to concatenate the <url> <timestamp> -- The "effort" is the same. If we're going to argue that SHA256 or cryptographic hashes are "too complicated" then I'm not really sure how to support that argument.
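
To make the "same effort" comparison concrete, here is a minimal Python sketch of both ways of building a Twt Subject. The function names, the URL-safe base64 encoding and the 11-character truncation are assumptions for illustration. This is neither the exact shell pipeline above nor the current Twt Hash specification.

```python
import base64
import hashlib


def content_subject(url: str, timestamp: str, content: str, length: int = 11) -> str:
    """Content-based subject: SHA-256 over url, timestamp and content."""
    payload = f"{url}\t{timestamp}\t{content}\n".encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    # URL-safe base64 avoids '/' and '+'; the truncation length is an assumption.
    encoded = base64.urlsafe_b64encode(digest).decode("ascii")
    return f"(#{encoded[:length]})"


def location_subject(url: str, timestamp: str) -> str:
    """Location-based subject: plain concatenation of url and timestamp."""
    return f"({url} {timestamp})"
```

Both are a couple of lines of code. The difference is in what they identify: the first changes whenever the content changes, the second does not.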
@sorenpeter Points 2 & 3 aren't really applicable here in the discussion of the threading model, I'm afraid. WebMentions is completely orthogonal to the discussion. Further, no one who uses Twtxt really uses WebMentions. Whilst yarnd supports WebMentions, it's very rarely used in practice (_if ever_) -- in fact I should just drop the feature entirely.

The use of WebSub OTOH is far more useful and is used by every single yarnd pod everywhere (_not that there are that many around these days_) to subscribe to feed updates in near real-time _without_ having to poll constantly.
@doesnm Welcome back 😅
@eapl.me Sad to see you go, disappointed in your choice of X, but respect your decision and choice. I will never cave in myself, even if it means my "circle of friends" remains low. I guess we call 'em internet friends right? 😅
@lyse How violent is the thunderstorm? 🤔
@aelaraji LOL 😂
A new thing LLM(s) can't do well. Write patches 🤣
@lyse Yeah I _think_ it's one of the reasons why yarnd's cache became so complicated really. I mean it's a bunch of maps and lists that are recalculated every ~5m. I don't know of any better way to do this right now, but maybe one day I'll figure out a better way to represent the same information that is displayed today that works reasonably well.
My point is, this is not a small trade-off to make for the sake of simplicity 😅
@movq Maybe I misspoke. It's a factor of 5 in the size of the keyspace required. The impact is significantly less for on-disk storage of raw feeds and such, around ~1-1.5x depending on how many replies there are, I suppose.

I wasn't very clear; my apologies. If we update the current hash truncation length from 7 to 11, but then still decide anyway to go down this location-based twt identity and threading model, then yes, we're talking about twt subjects having a ~5x increase in size on average: going from 14 characters (11 for the hash, 2 for the parens, 1 for the #) to ~63 bytes (the average length of URL + timestamp that I've worked out) plus a 3-byte overhead for the parens and space.
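
The arithmetic behind the ~5x figure, as a small Python sketch. It uses only the numbers quoted above (an 11-character hash and a ~63-byte average for URL + timestamp). The function name `subject_sizes` is hypothetical.

```python
def subject_sizes(hash_len: int = 11, avg_location_len: int = 63) -> tuple[int, int, float]:
    """Return (content-based size, location-based size, ratio) in bytes."""
    content_size = hash_len + 2 + 1       # hash + two parens + '#'
    location_size = avg_location_len + 3  # url + timestamp + two parens + space
    return content_size, location_size, location_size / content_size


content_size, location_size, ratio = subject_sizes()
print(f"{content_size} bytes -> {location_size} bytes, about {ratio:.1f}x")
```

With these inputs the ratio comes out a little under 5, which is where the "~5x" rounding comes from.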
@lyse Yes I think so.
Don't forget about the upcoming Yarn.social meetup coming up this Saturday! See #jjbnvgq for details! Hope to see some/all of y'all there 💪
@lyse And your query to construct a tree? Can you share the full query (_screenshot looks scary 🤣_) -- On another note, SQL and relational databases aren't really that conducive to tree-like structures, are they? 🤣
In fact it depends on how many Twts there are that form part of a thread, if you take a much larger sample size of my own feed for example, it starts to approximate ~1.5x increase in size:


$ ./compare.sh https://twtxt.net/user/prologic/twtxt.txt 500
Original file size: 126842 bytes
Modified file size: 317029 bytes
Percentage increase in file size: 149.94%
...
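
For reference, the percentage above can be reproduced from the two file sizes with a few lines of Python. The function name `percent_increase` is an illustrative assumption and is not taken from compare.sh.

```python
def percent_increase(original: int, modified: int) -> float:
    """Relative growth of `modified` over `original`, in percent."""
    return (modified - original) / original * 100


print(f"{percent_increase(126842, 317029):.2f}%")  # 149.94%
```

In other words the modified feed is roughly 2.5x the size of the original, i.e. it grows by about 1.5x the original size.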
In fact @falsifian you had quite a lot of good feedback. Do you mind collecting it in a task list on the doc somewhere so I can get to it? 🤔
Can someone make the edit?
@movq This was just a representative sample. The real concrete cost here is a ~5x increase in memory consumption for yarnd and/or a ~5x increase in disk storage.
@lyse Mind sharing your schema?
@lyse Not sure I'll check
@lyse My proposal is three steps:

- increase the hash length from 7 to 11

Then:

- Add support for changing your feed's location without breaking threads

Then much later:

- Add formal support for edits
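
One way to quantify what the first step buys, assuming the aim is fewer accidental hash collisions (the thread does not state this) and assuming each hash character carries 5 bits, as in a base32-style alphabet. This is a Python sketch of the standard birthday-bound estimate, and the function name `twts_for_collision` is hypothetical.

```python
import math


def twts_for_collision(chars: int, bits_per_char: int = 5, probability: float = 0.5) -> float:
    """Approximate number of Twts at which a collision becomes `probability` likely.

    Birthday approximation: n = sqrt(2 * N * ln(1 / (1 - p))) for a keyspace of size N.
    """
    keyspace = 2 ** (chars * bits_per_char)
    return math.sqrt(2 * keyspace * math.log(1 / (1 - probability)))


for chars in (7, 11):
    print(f"{chars} chars: ~{twts_for_collision(chars):,.0f} twts for a 50% collision chance")
```

Under these assumptions, going from 7 to 11 characters adds 20 bits to the keyspace, so the number of Twts needed for the same collision probability grows by a factor of 2^10 = 1024.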
@lyse No I don't either just say'n 😅
@movq That's what I want to know 🤣
So just to be clear, it's not as bad as the OP in this thread; that was just a worst-case scenario. With some additional analysis I did today, it's closer to around ~5x the memory requirements of my pod, which would roughly go from ~22MB to ~120MB or so, probably a bit more in practice. But this is still a significant increase in memory. The on-disk requirements would also increase by around ~5x on average, going from ~12GB to about ~60GB at the current archive size.