# I am the Watcher. I am your guide through this vast new twtiverse.
#
# Usage:
# https://watcher.sour.is/api/plain/users View list of users and latest twt date.
# https://watcher.sour.is/api/plain/twt View all twts.
# https://watcher.sour.is/api/plain/mentions?uri=:uri View all mentions for uri.
# https://watcher.sour.is/api/plain/conv/:hash View all twts for a conversation subject.
#
# Options:
# uri Filter to show a specific user's twts.
# offset Start index for query.
# limit Count of items to return (going back in time).
#
# twt range = 1 52
# self = https://watcher.sour.is/conv/tuizh4q
@nexeq given the lightweight nature of yarnd and the twtxt protocol in general, relying on a text file and a cache layer, a poderator who desired to have an n(x) timeline (i.e. all tweets from 2017 to today) would have to invest heavily in infrastructure, and the protocol twtxt and the client yarnd would have to be redesigned from the ground up.
now that being said, let's say there's a post that turns into a yarn and people respond to it frequently; it may be more prevalent and show up in your feed if you are indeed engaged in said yarn.
@mutefall @prologic I don't understand this answer at all from a technical perspective (leaving any philosophical arguments aside). A twtxt file is *literally* a flat file containing a list of all of a person's posts. Surely simply displaying all of that person's posts in Yarn should be the *easiest possible* thing to do, way easier than threading etc. Why would it require "investing heavily in infrastructure" or for the protocol to be "redesigned from the ground up"?
I'm guessing I've misunderstood what you're saying; can you help me understand?
@caesar What if that text file is 1MB in size? How do you display this in any reasonable way? What if it was recently rotated (something that occurs once feeds reach a certain size)? Moreover, even if the feed file itself was relatively small, you would incur processing overhead, as you would have to parse it over and over just to serve the purpose. Which is what? To view the entire contents of someone's feed? 😅
Hope this helps 😅
@prologic
> How do you display this in any reasonable way?
Pagination? Like Yarn uses elsewhere. Or infinite scroll, but from the server side that's still pagination.
> Which is what? To view the entire contents of someone's feed? 😅
Exactly. Every other social network has that feature; I've missed it here several times already and it looks like I'm not the only one.
I still don't get the difficulty from a technical point of view I'm afraid. 🤔
@caesar
> Pagination? Like Yarn uses elsewhere. Or infinite scroll, but from the server side that’s still pagination.
Sure. Possible. Infinite scroll on an SSR isn't really possible without significant use of JS AFAIK.
> Exactly. Every other social network has that feature; I've missed it here several times already and it looks like I'm not the only one.
We don't 😀 See philosophical reasons.
> I still don’t get the difficulty from a technical point of view I’m afraid. 🤔
It's a design decision...
Feeds are periodically fetched, the cache is updated, and views are rendered or API responses are provided from the cache. The cache is limited by Size per Feed and TTL.
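To make that cycle concrete, here is a minimal Go sketch of a feed cache with a per-feed item cap and a TTL; the type names and limits are illustrative only, not yarnd's actual internals:

```go
package main

import (
	"fmt"
	"time"
)

// Twt is an illustrative parsed entry from a twtxt feed.
type Twt struct {
	Created time.Time
	Text    string
}

// cachedFeed holds the most recent twts for one feed plus when it was fetched.
type cachedFeed struct {
	Twts      []Twt
	FetchedAt time.Time
}

const (
	maxItemsPerFeed = 150             // per-feed size cap (illustrative)
	cacheTTL        = 240 * time.Hour // how long a cached feed is considered fresh
)

// cache maps feed URL -> cached entries; views and API responses read from here.
var cache = map[string]*cachedFeed{}

// updateFeed stores freshly fetched twts, trimming to the per-feed cap.
func updateFeed(url string, twts []Twt) {
	if len(twts) > maxItemsPerFeed {
		twts = twts[len(twts)-maxItemsPerFeed:] // keep only the newest entries
	}
	cache[url] = &cachedFeed{Twts: twts, FetchedAt: time.Now()}
}

// stale reports whether a feed needs refetching on the next cycle.
func stale(url string) bool {
	entry, ok := cache[url]
	return !ok || time.Since(entry.FetchedAt) > cacheTTL
}

func main() {
	updateFeed("https://example.com/twtxt.txt", []Twt{
		{Created: time.Now(), Text: "hello world"},
	})
	fmt.Println("needs refetch:", stale("https://example.com/twtxt.txt"))
}
```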
@prologic
> philosophical reasons [...] design decision
That I can understand (though to the extent that I understand it, I think I disagree with it 😄). I was asking more about the technical barriers @mutefall mentioned.
> responses are provided from the cache
I see, so we're talking about an architectural limitation in Yarn, rather than twtxt. Still, I know cache invalidation is famously hard, but surely an intentional page load from a user trying to view a feed that isn't (fully) cached is about the best signal you could get to fetch that data from the origin? 🤔
> Sure. Possible. Infinite scroll on an SSR isn't really possible without significant use of JS AFAIK.
accurate. the backend has to catch the event from the browser, and there's a disconnect unless you have some sort of client-side hook to trigger the backend to advance to the next pagination call.
> A twtxt file is literally a flat file containing a list of all of a person’s posts. Surely simply displaying all of that person’s posts in Yarn should be the easiest possible thing to do, way easier than threading etc. Why would it require “investing heavily in
displaying a flat text file is in principle not problematic at all. the crux of the situation is the scale-factor.
put aside the idea of a 1mb text file. what if your pod's users grow their feeds to 60mb over months or years? how do you effectively cache and serve this without putting more compute resources behind it? to illustrate, go to github and try loading the raw text of one of those big blocklists with tons of entries. takes 10-20s, yes? a similar principle applies here.
pagination would be a good way to work around this but without js being involved i'm not sure how it would be done.
@mutefall pagination is also kind of tricky to do in the first place, because the entire feed has to be parsed, loaded into memory, then paginated. it's terribly inefficient. one could argue you could use a big SQL database and come up with some kind of schema, but I don't think that's really the point, nor is it really desirable for many reasons.
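As an illustration of why that is, a naive offset/limit pager over a raw twtxt file (purely a sketch, not yarnd code) has to read and split the whole file before it can slice out even a single page:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// pageFeed returns one page of twts from a raw twtxt file.
// The entire file must be read and split before any page can be sliced out.
func pageFeed(path string, offset, limit int) ([]string, error) {
	raw, err := os.ReadFile(path) // whole feed in memory, even for page 50
	if err != nil {
		return nil, err
	}
	var twts []string
	for _, line := range strings.Split(string(raw), "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blanks and metadata/comment lines
		}
		twts = append(twts, line)
	}
	if offset >= len(twts) {
		return nil, nil
	}
	end := offset + limit
	if end > len(twts) {
		end = len(twts)
	}
	return twts[offset:end], nil
}

func main() {
	page, err := pageFeed("twtxt.txt", 0, 10)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(page), "twts on this page")
}
```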
@prologic indeed. even with a db backing it pagination is usually an expensive query operation in addition to loading it into the heap.
i think there's an rfc for adding pagination to the hardest problems in computer science. if there isn't, there should be :-)
I'm clearly going to have to take a proper look at the code and get a feeling for the data architecture to understand this! From the outside I have to say if something as simple as "display all of a user's posts" is impossible – especially when a twtxt file *is* literally a list of all of a user's posts – it feels like some *very* strange architectural choices must have been made… but I am also well aware that a lot of painstaking thought by very clever people has gone into this, and I haven't even looked at the code, so don't mind me 😆
I also totally get what you're saying about a twtxt file *potentially* growing to be huge. I guess that, and the fact that it's necessary to work around it with a significant caching architecture, is a major downside to the model of twtxt itself which I hadn't considered.
I guess I should go read the code before asking too many questions, but I'm a little puzzled why the same issues with a feed being huge don't present an issue *every time* you want to poll for updates? Particularly with the apparent convention of the newest posts being at the bottom of the file.
As for pagination, sure, it can be hard, but why would it be harder in this case than in the cases where Yarn already does it?
(As for infinite scroll, if you have pagination on the server side already, it's trivial on the client side. Yes you need JS of course, but not a lot)
> I guess I should go read the code before asking too many questions, but I’m a little puzzled why the same issues with a feed being huge don’t present an issue every time you want to poll for updates?
you're reading from cache, so it's quicker. memory will always have significantly faster iops vs disk-bound read operations. also recommend giving the codebase a look. there's always room for contributors. i'm planning to take a crack at a few issues.
@caesar
> but I’m a little puzzled why the same issues with a feed being huge don’t present an issue every time you want to poll for updates?
They do! As I said in af4el2q, Pods will refuse to fetch feeds over the --max-fetch-limit in size. Feeds are also rotated on Pods. There is also a spec for this.
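One common way such a guard is implemented (this is a generic sketch of the technique, not necessarily how yarnd's --max-fetch-limit actually works) is to cap how many bytes of the response body are ever read:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

const maxFetchLimit = 1 << 20 // 1 MiB budget for a single feed fetch (illustrative)

// fetchFeed downloads a feed but refuses to read more than maxFetchLimit bytes.
func fetchFeed(url string) ([]byte, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	// Read at most maxFetchLimit+1 bytes; the extra byte tells us the feed is too big.
	body, err := io.ReadAll(io.LimitReader(resp.Body, maxFetchLimit+1))
	if err != nil {
		return nil, err
	}
	if len(body) > maxFetchLimit {
		return nil, fmt.Errorf("feed %s exceeds fetch limit of %d bytes", url, maxFetchLimit)
	}
	return body, nil
}

func main() {
	if _, err := fetchFeed("https://example.com/twtxt.txt"); err != nil {
		fmt.Println(err)
	}
}
```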
@caesar
> Particularly with the apparent convention of the newest posts being at the bottom of the file.
This is generally the convention, yes. And folks like @lyse @xuu @movq and I have considered and talked about formalizing the "direction" of a feed, including supporting "Range" requests. These are both things that I will likely do myself at some point, because it further helps with optimizing the traffic/bandwidth used and helps keep things running smoothly as the network scales over time.
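For context, HTTP Range requests already make it possible to pull just the tail of a feed, where the newest twts would live, provided the web server hosting the feed supports them. A rough Go sketch (the URL and sizes are made up):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// fetchTail asks the server for only the last n bytes of a feed using an
// HTTP Range request ("bytes=-n" means the final n bytes of the resource).
func fetchTail(url string, n int64) ([]byte, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=-%d", n))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	// 206 Partial Content means the server honoured the range;
	// 200 OK means it ignored it and sent the whole feed anyway.
	if resp.StatusCode != http.StatusPartialContent {
		fmt.Println("server did not honour the Range header, got", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	tail, err := fetchTail("https://example.com/twtxt.txt", 64*1024)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("fetched %d bytes of feed tail\n", len(tail))
}
```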
@caesar
> As for pagination, sure, it can be hard, but why would it be harder in this case than in the cases where Yarn already does it?
It's done as a background job. See this Dashboard for a visual:
> (As for infinite scroll, if you have pagination on the server side already, it’s trivial on the client side. Yes you need JS of course, but not a lot)
Remember the builtin Web Interface (an SSR) is designed to be usable without Javascript (graceful degradation).
@mutefall
> you’re reading from cache, so it’s quicker. memory will always have significantly faster iops vs disk-bound read operations. also recommend giving the codebase a look. there’s always room for contributors. i’m planning to take a crack at a few issues.
It's even more than just "memory is faster than disk". The Cache is designed to have O(1) lookups on all Profile (think Feed) and User Timeline as well as Pod Discover views. This is very important for the UX.
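A toy illustration of that shape, where the heavy lifting happens at ingestion time and the read path is a single map lookup (not yarnd's real data structures):

```go
package main

import (
	"fmt"
	"sync"
)

// views holds pre-built timelines keyed by feed URL (or view name),
// so rendering a Profile or Timeline is a single O(1) lookup.
type views struct {
	mu     sync.RWMutex
	byFeed map[string][]string // feed URL -> rendered twts, newest first
}

func newViews() *views {
	return &views{byFeed: map[string][]string{}}
}

// rebuild is called from the background fetch job; it does the heavy work.
func (v *views) rebuild(feed string, twts []string) {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.byFeed[feed] = twts
}

// timeline is what the web handler calls; no parsing, no sorting, just a lookup.
func (v *views) timeline(feed string) []string {
	v.mu.RLock()
	defer v.mu.RUnlock()
	return v.byFeed[feed]
}

func main() {
	v := newViews()
	v.rebuild("https://example.com/twtxt.txt", []string{"newest twt", "older twt"})
	fmt.Println(v.timeline("https://example.com/twtxt.txt"))
}
```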
One thing I want to point out is that this "problem" (per se, remember it's a design decision) also exists in other places like:
Cache expired posts vanish from threads with no warning - yarn - Mills
As Twts fall off the active Cache and are archived in an on-disk Archive, Yarns and Twts eventually "disappear" (they don't really, they are still searchable and accessible as everything is content addressable).
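Roughly speaking, "content addressable" means an archived Twt can still be found by a key derived from its content rather than by its position in anyone's cache. Any real twt-hash scheme defines its own inputs and encoding; the SHA-256 sketch below only shows the shape of the idea:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// archive maps a content-derived key to the stored twt, so a twt that has
// fallen out of the active cache can still be found by its hash.
var archive = map[string]string{}

// key derives a short content-addressable key for a twt. The real twt hash
// spec uses its own inputs and encoding; SHA-256 here is purely illustrative.
func key(feedURL, timestamp, text string) string {
	sum := sha256.Sum256([]byte(feedURL + "\n" + timestamp + "\n" + text))
	return hex.EncodeToString(sum[:])[:7] // short prefix, like a twt hash
}

func main() {
	k := key("https://example.com/twtxt.txt", "2022-01-01T00:00:00Z", "hello world")
	archive[k] = "hello world"
	fmt.Println("archived under", k, "->", archive[k])
}
```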
There are very good technical reasons for this design, but there are also very good human reasons for this too.
As my old man said to me many moons ago when I was first designing this (he helped and contributed ideas here!):
> If I said something X ago, I don't want someone to say "Hey but X ago you said this". What if I've changed my mind since then and now have a different opinion?
I'm paraphrasing here of course, we talk regularly on the phone, but a lot of ideas and inspiration have come from my Dad 👌 -- The idea here is that Humans forget, so should Yarn.social
One more thing @caesar I forgot to add here is that the Cache Size and TTL are actually configurable at a Pod level via the -I, --max-cache-items and -C, --max-cache-ttl options, which default to 150 and 240h respectively. As you are a user on my pod at twtxt.net, these settings directly impact you. If you were to run your own pod (for example) you could choose to tweak these to your "taste". @david for example runs his pod netbros.com with quite high Cache settings.
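Incidentally, the 240h default looks like a Go duration string, so a pod operator experimenting with a longer TTL could sanity-check a candidate value like this (the values here are hypothetical):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// "240h" (10 days) is the stated default; "720h" (30 days) would triple it.
	for _, v := range []string{"240h", "720h"} {
		d, err := time.ParseDuration(v)
		if err != nil {
			fmt.Println("invalid duration:", v, err)
			continue
		}
		fmt.Printf("--max-cache-ttl %s keeps twts cached for %.0f days\n", v, d.Hours()/24)
	}
}
```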
Sorry I'm late to the discussion, but as I see it, a big redis cluster will solve that issue (Twitter uses it), plus a bit of js for the pagination client side. BUT the ability to edit a post (impossible in twitter) makes it hard to have a big redis cluster.
The hardest part will mainly be for the client command line app.
@prologic indeed, but was trying to ease into an unpacking and ran out of characters before i blacked out.
@tkanos
> BUT the ability to be able to edit a post (impossible in twitter) makes it hard to have a big redis cluster.
you've no idea how close to home this is for me. one day i'll explain why. :-)
@mutefall It's pretty easy to delete or even edit a Twt you posted on Yarn.social 😂 -- But it has unintended side-effects, due to the decentralised nature, you end up with UX problems where for example, someone makes a Twt A, realizes they've made a typo or mistake or something, then edits it (which is equivalent to delete + repost) and posts a new Twt A'
Dealing with this is hard™ But I have some ideas 😅
@prologic i noticed this in my own testing. definitely something that would be good to address. the doorslam method would be a prompt: "are you sure you're not knackered?" then the pod member posts. reminds me of google's beer goggles feature.
@mutefall The ideas I have in mind to deal with this are basically to get good at "detecting edits" in the first place at ingestion time. I've played around with a few "text similarity" algorithms and I _think_ we can reasonably (with high confidence) say that Twt A' was an edit of Twt A -- We _would_ cache and archive them both, but in the User Interface collapse them and show the Twt A' (with a visual indication/link that it was an edit of Twt A)
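The metric itself is still an open question, but the general shape would be something like the toy check below: normalise both twts, score their similarity, and treat a high score between two twts from the same feed as a probable edit (Jaccard over words here is just a stand-in for whatever algorithm ends up being used):

```go
package main

import (
	"fmt"
	"strings"
)

// similarity returns a crude Jaccard index over the words of two twts:
// |intersection| / |union| of their word sets, in [0, 1].
func similarity(a, b string) float64 {
	setA := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(a)) {
		setA[w] = true
	}
	setB := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(b)) {
		setB[w] = true
	}
	inter := 0
	for w := range setA {
		if setB[w] {
			inter++
		}
	}
	union := len(setA) + len(setB) - inter
	if union == 0 {
		return 0
	}
	return float64(inter) / float64(union)
}

// probableEdit flags Twt A' as a likely edit of Twt A above some threshold.
func probableEdit(a, aPrime string) bool {
	return similarity(a, aPrime) > 0.8 // threshold is illustrative
}

func main() {
	a := "just pushed a new release of yarnd with a bunch of small fixes and improvments"
	aPrime := "just pushed a new release of yarnd with a bunch of small fixes and improvements"
	fmt.Printf("similarity %.2f, probable edit: %v\n", similarity(a, aPrime), probableEdit(a, aPrime))
}
```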
@prologic given an rfc 3339 timestamp, that could be a starting point. if given n(x) similarities, the one with the greater distance from current time could be considered the genesis perhaps? life becomes tricky without int || bigint ids :-)
@prologic @mutefall or we could have a "pencil edit" icon near the date-time, or wherever, to indicate that it has been edited if the text edit algorithm has detected a significant change.
@ullarah that's a great signal method. could have a little hover over the pencil saying "this post may have been edited, we're working to confirm".
don't mind me. i'm still bewildered by lorenzo lamas and how his hair can stay in one place.
@mutefall Re RFC 3339 timestamps, if I understand you correctly, I _think_ it's extremely unlikely for someone to repost a Twt (an edit) within the same second (at least not humanly possible). In any case, I've only validated the ideas so far in isolation; the algorithm(s) need to be built, feature gated, measured, understood and finally put in place with some UX (I like @ullarah's suggestion)
Timestamp + user id + hash of the twt should be enough. So you can have an append log and do the edit in the application.
@tkanos That's right, I _believe_ we have enough data to identify if a Twt was edited and I _think_ we can figure out a nice way to deal with this. Essentially it causes forks.
@prologic i meant using the timestamp as a poor-man's id and treating whichever is oldest as the genesis, that type of thing. was thinking out loud, doesn't mean it's coherent :-)
@prologic whereas i'm sitting here thinking how i can devote a full stick of ram to my own :-\\