# I am the Watcher. I am your guide through this vast new twtiverse.
#
# Usage:
# https://watcher.sour.is/api/plain/users View list of users and latest twt date.
# https://watcher.sour.is/api/plain/twt View all twts.
# https://watcher.sour.is/api/plain/mentions?uri=:uri View all mentions for uri.
# https://watcher.sour.is/api/plain/conv/:hash View all twts for a conversation subject.
#
# Options:
# uri Filter to show a specific user's twts.
# offset Start index for query.
# limit Count of items to return (going back in time).
#
# twt range = 1 52
# self = https://watcher.sour.is/conv/tuizh4q
@nexeq given the lightweight nature of yarnd and the twtxt protocol in general, relying on a text file and a cache layer, a poderator who desired to have an n(x) timeline (i.e. all tweets from 2017 to today) would have to invest heavily in infrastructure, and the protocol twtxt and the client yarnd would have to be redesigned from the ground up.
now that being said, let's say there's a post that turns into a yarn and people respond to it frequently; it may be more prevalent and show up in your feed if you are indeed engaged in said yarn.
@mutefall @prologic I don't understand this answer at all from a technical perspective (leaving any philosophical arguments aside). A twtxt file is *literally* a flat file containing a list of all of a person's posts. Surely simply displaying all of that person's posts in Yarn should be the *easiest possible* thing to do, way easier than threading etc. Why would it require "investing heavily in infrastructure" or for the protocol to be "redesigned from the ground up"?
I'm guessing I've misunderstood what you're saying; can you help me understand?
@caesar What if that text file is 1MB in size? How do you display this in any reasonable way? What if it was recently rotated (something that occurs once feeds reach a certain size)? Moreover, even if the feed file itself was relatively small, you would incur processing overhead, as you would have to parse it over and over just to serve the purpose. Which is what? To view the entire contents of someone's feed? 😅
Hope this helps 😅
@prologic
> How do you display this in any reasonable way?
Pagination? Like Yarn uses elsewhere. Or infinite scroll, but from the server side that's still pagination.
> Which is what? To view the entire contents of someone's feed? 😅
Exactly. Every other social network has that feature; I've missed it here several times already and it looks like I'm not the only one.
I still don't get the difficulty from a technical point of view I'm afraid. 🤔
@caesar
> Pagination? Like Yarn uses elsewhere. Or infinite scroll, but from the server side that’s still pagination.
Sure. Possible. Infinite scroll on an SSR isn't really possible without significant use of JS AFAIK.
> Exactly. Every other social network has that feature; I've missed it here several times already and it looks like I'm not the only one.
We don't 😀 See philosophical reasons.
> I still don’t get the difficulty from a technical point of view I’m afraid. 🤔
It's a design decision...
Feeds are periodically fetched, the cache is updated, and views are rendered or API responses are provided from the cache. The cache is limited by Size per Feed and TTL.
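To make that cycle concrete, here is a minimal Go sketch of a feed cache with a per-feed item cap and a TTL; the type names and limits are illustrative only, not yarnd's actual internals:

```go
package main

import (
	"fmt"
	"time"
)

// Twt is an illustrative parsed entry from a twtxt feed.
type Twt struct {
	Created time.Time
	Text    string
}

// cachedFeed holds the most recent twts for one feed plus when it was fetched.
type cachedFeed struct {
	Twts      []Twt
	FetchedAt time.Time
}

const (
	maxItemsPerFeed = 150             // per-feed size cap (illustrative)
	cacheTTL        = 240 * time.Hour // how long a cached feed is considered fresh
)

// cache maps feed URL -> cached entries; views and API responses read from here.
var cache = map[string]*cachedFeed{}

// updateFeed stores freshly fetched twts, trimming to the per-feed cap.
func updateFeed(url string, twts []Twt) {
	if len(twts) > maxItemsPerFeed {
		twts = twts[len(twts)-maxItemsPerFeed:] // keep only the newest entries
	}
	cache[url] = &cachedFeed{Twts: twts, FetchedAt: time.Now()}
}

// stale reports whether a feed needs refetching on the next cycle.
func stale(url string) bool {
	entry, ok := cache[url]
	return !ok || time.Since(entry.FetchedAt) > cacheTTL
}

func main() {
	updateFeed("https://example.com/twtxt.txt", []Twt{
		{Created: time.Now(), Text: "hello world"},
	})
	fmt.Println("needs refetch:", stale("https://example.com/twtxt.txt"))
}
```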
@prologic
> philosophical reasons [...] design decision
That I can understand (though to the extent that I understand it, I think I disagree with it 😄). I was asking more about the technical barriers @mutefall mentioned.
> responses are provided from the cache
I see, so we're talking about an architectural limitation in Yarn, rather than twtxt. Still, I know cache invalidation is famously hard, but surely an intentional page load from a user trying to view a feed that isn't (fully) cached is about the best signal you could get to fetch that data from the origin? 🤔
> Sure. Possible. Infinite scroll on an SSR isn't really possible without significant use of JS AFAIK.
accurate. the backend has to catch the event from the browser, and there's a disconnect unless you have some sort of client-side hook to trigger the backend to advance to the next pagination call.
> A twtxt file is literally a flat file containing a list of all of a person’s posts. Surely simply displaying all of that person’s posts in Yarn should be the easiest possible thing to do, way easier than threading etc. Why would it require “investing heavily in
displaying a flat text file is in principle not problematic at all. the crux of the situation is the scale-factor.
put aside the idea of a 1mb text file. what if your pod's users grow their feeds to 60mb over months or years? how do you effectively cache and serve this without putting more compute resources behind it? to illustrate, go to github and try loading the raw text of one of those big blocklists with tons of entries. takes 10-20s, yes? a similar principle applies here.
pagination would be a good way to work around this but without js being involved i'm not sure how it would be done.
@mutefall pagination is also kind of tricky to do in the first place, because the entire feed has to be parsed, loaded into memory, then paginated. it's terribly inefficient. one could argue you could use a big SQL database and come up with some kind of schema, but I don't think that's really the point, nor is it really desirable for many reasons.
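As an illustration of why that is, a naive offset/limit pager over a raw twtxt file (purely a sketch, not yarnd code) has to read and split the whole file before it can slice out even a single page:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// pageFeed returns one page of twts from a raw twtxt file.
// The entire file must be read and split before any page can be sliced out.
func pageFeed(path string, offset, limit int) ([]string, error) {
	raw, err := os.ReadFile(path) // whole feed in memory, even for page 50
	if err != nil {
		return nil, err
	}
	var twts []string
	for _, line := range strings.Split(string(raw), "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blanks and metadata/comment lines
		}
		twts = append(twts, line)
	}
	if offset >= len(twts) {
		return nil, nil
	}
	end := offset + limit
	if end > len(twts) {
		end = len(twts)
	}
	return twts[offset:end], nil
}

func main() {
	page, err := pageFeed("twtxt.txt", 0, 10)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(page), "twts on this page")
}
```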
@prologic indeed. even with a db backing it pagination is usually an expensive query operation in addition to loading it into the heap.
i think there's an rfc for adding pagination to the hardest problems in computer science. if there isn't, there should be :-)
I'm clearly going to have to take a proper look at the code and get a feeling for the data architecture to understand this! From the outside I have to say if something as simple as "display all of a user's posts" is impossible – especially when a twtxt file *is* literally a list of all of a user's posts – it feels like some *very* strange architectural choices must have been made… but I am also well aware that a lot of painstaking thought by very clever people has gone into this, and I haven't even looked at the code, so don't mind me 😆
I also totally get what you're saying about a twtxt file *potentially* growing to be huge. I guess that, and the fact that it's necessary to work around it with a significant caching architecture, is a major downside to the model of twtxt itself which I hadn't considered.
I guess I should go read the code before asking too many questions, but I'm a little puzzled why the same issues with a feed being huge don't present an issue *every time* you want to poll for updates? Particularly with the apparent convention of the newest posts being at the bottom of the file.
As for pagination, sure, it can be hard, but why would it be harder in this case than in the cases where Yarn already does it?
(As for infinite scroll, if you have pagination on the server side already, it's trivial on the client side. Yes you need JS of course, but not a lot)
> I guess I should go read the code before asking too many questions, but I’m a little puzzled why the same issues with a feed being huge don’t present an issue every time you want to poll for updates?
you're reading from cache, so it's quicker. memory will always have significantly faster iops vs disk-bound read operations. also recommend giving the codebase a look. there's always room for contributors. i'm planning to take a crack at a few issues.
@caesar
> but I’m a little puzzled why the same issues with a feed being huge don’t present an issue every time you want to poll for updates?
They do! As I said in af4el2q, Pods will refuse to fetch feeds over the --max-fetch-limit in size. Feeds are also rotated on Pods. There is also a spec for this.
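One common way such a guard is implemented (this is a generic sketch of the technique, not necessarily how yarnd's --max-fetch-limit actually works) is to cap how many bytes of the response body are ever read:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

const maxFetchLimit = 1 << 20 // 1 MiB budget for a single feed fetch (illustrative)

// fetchFeed downloads a feed but refuses to read more than maxFetchLimit bytes.
func fetchFeed(url string) ([]byte, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	// Read at most maxFetchLimit+1 bytes; the extra byte tells us the feed is too big.
	body, err := io.ReadAll(io.LimitReader(resp.Body, maxFetchLimit+1))
	if err != nil {
		return nil, err
	}
	if len(body) > maxFetchLimit {
		return nil, fmt.Errorf("feed %s exceeds fetch limit of %d bytes", url, maxFetchLimit)
	}
	return body, nil
}

func main() {
	if _, err := fetchFeed("https://example.com/twtxt.txt"); err != nil {
		fmt.Println(err)
	}
}
```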
@caesar
> Particularly with the apparent convention of the newest posts being at the bottom of the file.
This is generally the convention, yes. And folks like @lyse @xuu @movq and I have considered and talked about formalizing the "direction" of a feed, including supporting "Range" requests. These are both things that I will likely do myself at some point, because it further helps with optimizing the traffic/bandwidth used and helps keep things running smoothly as the network scales over time.
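For context, HTTP Range requests already make it possible to pull just the tail of a feed, where the newest twts would live, provided the web server hosting the feed supports them. A rough Go sketch (the URL and sizes are made up):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// fetchTail asks the server for only the last n bytes of a feed using an
// HTTP Range request ("bytes=-n" means the final n bytes of the resource).
func fetchTail(url string, n int64) ([]byte, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=-%d", n))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	// 206 Partial Content means the server honoured the range;
	// 200 OK means it ignored it and sent the whole feed anyway.
	if resp.StatusCode != http.StatusPartialContent {
		fmt.Println("server did not honour the Range header, got", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	tail, err := fetchTail("https://example.com/twtxt.txt", 64*1024)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("fetched %d bytes of feed tail\n", len(tail))
}
```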
@caesar
> As for pagination, sure, it can be hard, but why would it be harder in this case than in the cases where Yarn already does it?
It's done as a background job. See this Dashboard for a visual:
> (As for infinite scroll, if you have pagination on the server side already, it’s trivial on the client side. Yes you need JS of course, but not a lot)
Remember the builtin Web Interface (an SSR) is designed to be usable without Javascript (graceful degradation).
@mutefall
> you’re reading from cache, so it’s quicker. memory will always have significantly faster iops vs disk-bound read operations. also recommend giving the codebase a look. there’s always room for contributors. i’m planning to take a crack at a few issues.
It's even more than just "memory is faster than disk". The Cache is designed to have O(1) lookups on all Profile (think Feed) and User Timeline as well as Pod Discover views. This is very important for the UX.
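A toy illustration of that shape, where the heavy lifting happens at ingestion time and the read path is a single map lookup (not yarnd's real data structures):

```go
package main

import (
	"fmt"
	"sync"
)

// views holds pre-built timelines keyed by feed URL (or view name),
// so rendering a Profile or Timeline is a single O(1) lookup.
type views struct {
	mu     sync.RWMutex
	byFeed map[string][]string // feed URL -> rendered twts, newest first
}

func newViews() *views {
	return &views{byFeed: map[string][]string{}}
}

// rebuild is called from the background fetch job; it does the heavy work.
func (v *views) rebuild(feed string, twts []string) {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.byFeed[feed] = twts
}

// timeline is what the web handler calls; no parsing, no sorting, just a lookup.
func (v *views) timeline(feed string) []string {
	v.mu.RLock()
	defer v.mu.RUnlock()
	return v.byFeed[feed]
}

func main() {
	v := newViews()
	v.rebuild("https://example.com/twtxt.txt", []string{"newest twt", "older twt"})
	fmt.Println(v.timeline("https://example.com/twtxt.txt"))
}
```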
One thing I want to point out is that this "problem" (per se, remember it's a design decision) also exists in other places like:
Cache expired posts vanish from threads with no warning - yarn - Mills
As Twts fall off the active Cache and are archived in an on-disk Archive, Yarns and Twts eventually "disappear" (they don't really, they are still searchable and accessible as everything is content addressable).
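Roughly speaking, "content addressable" means an archived Twt can still be found by a key derived from its content rather than by its position in anyone's cache. Any real twt-hash scheme defines its own inputs and encoding; the SHA-256 sketch below only shows the shape of the idea:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// archive maps a content-derived key to the stored twt, so a twt that has
// fallen out of the active cache can still be found by its hash.
var archive = map[string]string{}

// key derives a short content-addressable key for a twt. The real twt hash
// spec uses its own inputs and encoding; SHA-256 here is purely illustrative.
func key(feedURL, timestamp, text string) string {
	sum := sha256.Sum256([]byte(feedURL + "\n" + timestamp + "\n" + text))
	return hex.EncodeToString(sum[:])[:7] // short prefix, like a twt hash
}

func main() {
	k := key("https://example.com/twtxt.txt", "2022-01-01T00:00:00Z", "hello world")
	archive[k] = "hello world"
	fmt.Println("archived under", k, "->", archive[k])
}
```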
There are very good technical reasons for this design, but there are also very good human reasons for this too.
As my old man said to me many moons ago when I was first designing this (he helped and contributed ideas here!):
> If I said something X ago, I don't want someone to say "Hey but X ago you said this". What if I've changed my mind since then and now have a different opinion?
I'm paraphrasing here of course, we talk regularly on the phone, but a lot of ideas and inspiration have come from my Dad 👌 -- The idea here is that Humans forget, so should Yarn.social
One more thing @caesar I forgot to add here is that the Cache Size and TTL are actually configurable at a Pod level via the -I, --max-cache-items and -C, --max-cache-ttl options, which default to 150 and 240h respectively. As you are a user on my pod at twtxt.net, these settings directly impact you. If you were to run your own pod (for example) you could choose to tweak these to your "taste". @david for example runs his pod netbros.com with quite high Cache settings.
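Incidentally, the 240h default looks like a Go duration string, so a pod operator experimenting with a longer TTL could sanity-check a candidate value like this (the values here are hypothetical):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// "240h" (10 days) is the stated default; "720h" (30 days) would triple it.
	for _, v := range []string{"240h", "720h"} {
		d, err := time.ParseDuration(v)
		if err != nil {
			fmt.Println("invalid duration:", v, err)
			continue
		}
		fmt.Printf("--max-cache-ttl %s keeps twts cached for %.0f days\n", v, d.Hours()/24)
	}
}
```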
Sorry I'm late to the discussion, but as I see it, a big redis cluster will solve that issue (Twitter uses it), plus a bit of js for the pagination client side. BUT the ability to edit a post (impossible in twitter) makes it hard to have a big redis cluster.
The hardest part will mainly be for the client command line app.
@prologic indeed, but was trying to ease into an unpacking and ran out of characters before i blacked out.
@tkanos
> BUT the ability to be able to edit a post (impossible in twitter) makes it hard to have a big redis cluster.
you've no idea how close to home this is for me. one day i'll explain why. :-)
@mutefall It's pretty easy to delete or even edit a Twt you posted on Yarn.social 😂 -- But it has unintended side-effects, due to the decentralised nature, you end up with UX problems where for example, someone makes a Twt A, realizes they've made a typo or mistake or something, then edits it (which is equivalent to delete + repost) and posts a new Twt A'
Dealing with this is hard™ But I have some ideas 😅
@prologic i noticed this in my own testing. definitely something that would be good to address. the doorslam method would be a prompt: "are you sure you're not knackered?" then the pod member posts. reminds me of google's beer goggles feature.
@mutefall The ideas I have in mind to deal with this are basically to get good at "detecting edits" in the first place at ingestion time. I've played around with a few "text similarity" algorithms and I _think_ we can reasonably (with high confidence) say that Twt A' was an edit of Twt A -- We _would_ cache and archive them both, but in the User Interface collapse them and show the Twt A' (with a visual indication/link that it was an edit of Twt A)
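The metric itself is still an open question, but the general shape would be something like the toy check below: normalise both twts, score their similarity, and treat a high score between two twts from the same feed as a probable edit (Jaccard over words here is just a stand-in for whatever algorithm ends up being used):

```go
package main

import (
	"fmt"
	"strings"
)

// similarity returns a crude Jaccard index over the words of two twts:
// |intersection| / |union| of their word sets, in [0, 1].
func similarity(a, b string) float64 {
	setA := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(a)) {
		setA[w] = true
	}
	setB := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(b)) {
		setB[w] = true
	}
	inter := 0
	for w := range setA {
		if setB[w] {
			inter++
		}
	}
	union := len(setA) + len(setB) - inter
	if union == 0 {
		return 0
	}
	return float64(inter) / float64(union)
}

// probableEdit flags Twt A' as a likely edit of Twt A above some threshold.
func probableEdit(a, aPrime string) bool {
	return similarity(a, aPrime) > 0.8 // threshold is illustrative
}

func main() {
	a := "just pushed a new release of yarnd with a bunch of small fixes and improvments"
	aPrime := "just pushed a new release of yarnd with a bunch of small fixes and improvements"
	fmt.Printf("similarity %.2f, probable edit: %v\n", similarity(a, aPrime), probableEdit(a, aPrime))
}
```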
@prologic given an rfc 3339 timestamp, that could be a starting point. if given n(x) similarities, the one with the greater distance from current time could be considered the genesis perhaps? life becomes tricky without int || bigint ids :-)
@prologic @mutefall or we could have a "pencil edit" icon near the date-time, or wherever, to indicate that it has been edited if the text edit algorithm has detected a significant change.
@ullarah that's a great signal method. could have a little hover over the pencil saying "this post may have been edited, we're working to confirm".
don't mind me. i'm still bewildered by lorenzo lamas and how his hair can stay in one place.
@mutefall Re RFC 3339 timestamps, if I understand you correctly, I _think_ it's extremely unlikely for someone to repost a Twt (an edit) within the same second (at least not humanly possible). In any case, I've only validated the ideas so far in isolation; the algorithm(s) need to be built, feature gated, measured, understood and finally put in place with some UX (I like @ullarah's suggestion)
Timestamp + user id + hash of the twt should be enough. So you can have an append log and do the edit in the application.
@tkanos That's right, I _believe_ we have enough data to identify if a Twt was edited and I _think_ we can figure out a nice way to deal with this. Essentially it causes forks.
@prologic i meant using the timestamp as a poor-man's id and treating whichever is oldest as the genesis, that type of thing. was thinking out loud, doesn't mean it's coherent :-)
@prologic whereas i'm sitting here thinking how i can devote a full stick of ram to my own :-\\