The Watcher

prologic

twtxt.net

23 Sep 24 13:41 UTC

View Thread

@aelaraji LOL 😂

lyse

lyse.isobeef.org

23 Sep 24 15:30 UTC+0200

View Thread

Okay, I figured out the cause of the broken output. I had also replaced the first subject = '' for the existing conversation roots with subject > ''. Somehow, my brain must have read subject <> ''. That equality check should not have been touched at all. I just updated the archive for anyone who is interested in following along: https://lyse.isobeef.org/tmp/tt2cache.tar.bz2 (151.1 KiB)

aelaraji

aelaraji.com

23 Sep 24 13:18 UTC+0000

View Thread

LMAO 🤣 ... I've been scrolling through mutt(1) man page and found this:

> BUGS
> None. Mutts have fleas, not bugs.

prologic

twtxt.net

23 Sep 24 13:03 UTC

View Thread

A new thing LLM(s) can't do well. Write patches 🤣

prologic

twtxt.net

23 Sep 24 13:00 UTC

View Thread

@lyse Yeah I _think_ it's one of the reasons why yarnd's cache became so complicated really. I mean it's a bunch of maps and lists that are recalculated every ~5m. I don't know of any better way to do this right now, but maybe one day I'll figure out a better way to represent the same information that is displayed today that works reasonably well.

lyse

lyse.isobeef.org

23 Sep 24 14:45 UTC+0200

View Thread

@prologic Yeah, relational databases are definitely not the perfect fit for trees, but I want to give it a shot anyway. :-)

Using EXPLAIN QUERY PLAN I was able to create two indices, to avoid some table scans:

CREATE INDEX parent ON messages (hash, subject);
CREATE INDEX subject_created_at ON messages (subject, created_at);

Also, since strings are sortable, instead of str_col <> '' I now use str_col > '' to allow the use of an index.

But I just noticed that my output seems to be broken at the end for some reason. :-? Hmm.

The read status still gives me a headache. I think I either have to filter in the application or create more metadata structures in the database.

I'm wondering if anyone here has already used a particular kind of storage for tree data.
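
A minimal sketch of the index experiment described above, using Python's built-in sqlite3 module. The table layout (hash, subject, created_at, text) and the helper names are assumptions for illustration, not the actual schema from the archive; only the two CREATE INDEX statements are taken from the post. It compares the query plans for `<>` and `>` on the subject column:

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory cache with a hypothetical messages table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE messages (
            hash       TEXT NOT NULL,
            subject    TEXT NOT NULL DEFAULT '',
            created_at TEXT NOT NULL,
            text       TEXT NOT NULL
        );
        -- The two indices quoted in the post.
        CREATE INDEX parent ON messages (hash, subject);
        CREATE INDEX subject_created_at ON messages (subject, created_at);
        """
    )
    return conn


def query_plan(conn: sqlite3.Connection, condition: str) -> list[str]:
    """Return the human-readable steps of EXPLAIN QUERY PLAN for a WHERE condition."""
    sql = (
        "EXPLAIN QUERY PLAN "
        "SELECT subject, created_at FROM messages WHERE " + condition
    )
    # Each row is (id, parent, notused, detail); the detail column holds the text.
    return [row[3] for row in conn.execute(sql)]


if __name__ == "__main__":
    conn = make_db()
    for condition in ("subject <> ''", "subject > ''"):
        print(condition, "->", query_plan(conn, condition))
```

On the SQLite builds this was reasoned against, the `<>` variant is reported as a SCAN and the `>` variant as a SEARCH on subject_created_at, but the exact wording of the plan differs between SQLite versions.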

prologic

twtxt.net

23 Sep 24 12:26 UTC

View Thread

My point is, this is not a small trade-off to make for the sake of simplicity 😅

prologic

twtxt.net

23 Sep 24 12:26 UTC

View Thread

@movq Maybe I misspoke. It's a factor of 5 in the size of the keyspace required. The impact is significantly less for on-disk storage of raw feeds and such, around ~1-1.5x depending on how many replies there are I suppose.

I wasn't very clear; my apologies. If we update the current hash truncation length from 7 to 11, but then still decide to go down this location-based twt identity and threading model anyway, then yes, we're talking about twt subjects having a ~5x increase in size on average: going from 14 characters (11 for the hash, 2 for the parens, 1 for the #) to ~63 bytes (the average I've worked out for the length of URL + timestamp) + 3 bytes of overhead for the parens and a space.
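
The arithmetic behind the ~5x figure can be checked with a few lines of Python. This is only a sketch of the numbers quoted in this post; the constant names are made up for illustration:

```python
# Hash-based subject, e.g. "(#abcdefghijk)": 11 hash characters, 2 parens, 1 '#'.
HASH_LEN = 11
hash_subject_len = HASH_LEN + 2 + 1

# Location-based subject: ~63 bytes for URL + timestamp (figure from the post)
# plus 3 bytes of overhead for the parens and a space.
AVG_URL_PLUS_TIMESTAMP = 63
OVERHEAD = 3
location_subject_len = AVG_URL_PLUS_TIMESTAMP + OVERHEAD

# How many times larger the location-based subject is on average.
growth_factor = location_subject_len / hash_subject_len

if __name__ == "__main__":
    print(f"hash-based subject:     {hash_subject_len} bytes")
    print(f"location-based subject: {location_subject_len} bytes")
    print(f"growth factor:          {growth_factor:.2f}x")
```

This only covers the subject itself; how much a whole feed grows depends on how many of its twts are replies.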

movq

www.uninformativ.de

23 Sep 24 12:06 UTC+0000

View Thread

@prologic A factor of 5 is hard to believe, to be honest. Especially disk usage. I know nothing about the internals of yarnd, but still.

If this constitutes a hard “no” to the proposal, then I think we don’t need to discuss it further.

prologic

twtxt.net

23 Sep 24 11:49 UTC

View Thread

@lyse Yes I think so.

lyse

lyse.isobeef.org

23 Sep 24 13:30 UTC+0200

View Thread

@prologic I see. I reckon it makes sense to combine 1 and 2, because if we change the hashing anyway, we don't break it twice.

prologic

twtxt.net

23 Sep 24 11:20 UTC

View Thread

Don't forget about the Yarn.social meetup coming up this Saturday! See #jjbnvgq for details! Hope to see some/all of y'all there 💪

prologic

twtxt.net

23 Sep 24 11:18 UTC

View Thread

@lyse And your query to construct a tree? Can you share the full query (_screenshot looks scary 🤣_) -- On another note, SQL and relational databases aren't really that conducive to tree-like structures, are they? 🤣

lyse

lyse.isobeef.org

23 Sep 24 13:15 UTC+0200

View Thread

This organigram example got me started: https://www.sqlite.org/lang_with.html#controlling_depth_first_versus_breadth_first_search_of_a_tree_using_order_by

But I feel execution times get worse rather quickly the more data I add. Also, caching helps tremendously: executing it for the first time took over 600ms. From then on I'm down to 40ms.

I think it's particularly bad that parents might be missing. Thus, I cannot use an index, because there is no parent to reference. But my database knowledge is fairly limited, so I have to read up on that.
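
For readers following along, here is a rough sketch of a depth-first recursive CTE in the style of the linked SQLite example, wrapped in Python's sqlite3 module. It is not the query from the archive. It assumes a hypothetical messages table in which subject holds the hash of the message being replied to ('' for roots), and it treats replies whose parent is missing from the cache as additional roots:

```python
import sqlite3

TREE_QUERY = """
WITH RECURSIVE tree(hash, depth, created_at) AS (
    -- Roots: no subject, or a subject pointing at a message we do not have.
    SELECT m.hash, 0, m.created_at
      FROM messages AS m
     WHERE m.subject = ''
        OR NOT EXISTS (SELECT 1 FROM messages AS p WHERE p.hash = m.subject)
    UNION ALL
    SELECT c.hash, tree.depth + 1, c.created_at
      FROM messages AS c
      JOIN tree ON c.subject = tree.hash
     -- Deepest rows first gives depth-first order; ties are broken by time.
     ORDER BY 2 DESC, 3
)
SELECT hash, depth FROM tree
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two small made-up conversations."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (hash TEXT, subject TEXT, created_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?)",
        [
            ("a", "", "2024-09-23T09:00:00Z"),         # conversation root
            ("b", "a", "2024-09-23T09:05:00Z"),        # reply to a
            ("c", "b", "2024-09-23T09:10:00Z"),        # reply to b
            ("d", "a", "2024-09-23T09:15:00Z"),        # second reply to a
            ("e", "missing", "2024-09-23T09:20:00Z"),  # parent not in the cache
            ("f", "e", "2024-09-23T09:25:00Z"),        # reply to the orphan
        ],
    )
    return conn


def fetch_tree(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Return (hash, depth) pairs in depth-first display order."""
    return conn.execute(TREE_QUERY).fetchall()


if __name__ == "__main__":
    for msg_hash, depth in fetch_tree(build_demo_db()):
        print("  " * depth + msg_hash)
```

The NOT EXISTS subquery is the price paid for missing parents: every non-root row needs one extra lookup by hash, which an index on hash should keep cheap.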

prologic

twtxt.net

23 Sep 24 11:10 UTC

View Thread

In fact it depends on how many Twts there are that form part of a thread. If you take a much larger sample size, of my own feed for example, it starts to approximate a ~1.5x increase in size (i.e. roughly 2.5x the original size):


$ ./compare.sh https://twtxt.net/user/prologic/twtxt.txt 500
Original file size: 126842 bytes
Modified file size: 317029 bytes
Percentage increase in file size: 149.94%
...
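
To make the reported percentage easier to interpret, here is a small Python sketch that recomputes it from the two file sizes in the output above. The function name is made up, and this is not the compare.sh script itself:

```python
def percent_increase(original: int, modified: int) -> float:
    """Relative growth of a file, in percent of the original size."""
    return (modified - original) / original * 100


# File sizes in bytes, copied from the compare.sh output above.
ORIGINAL_SIZE = 126842
MODIFIED_SIZE = 317029

increase = percent_increase(ORIGINAL_SIZE, MODIFIED_SIZE)
size_ratio = MODIFIED_SIZE / ORIGINAL_SIZE

if __name__ == "__main__":
    print(f"Percentage increase in file size: {increase:.2f}%")
    print(f"Modified file is {size_ratio:.2f}x the original size")
```

An increase of about 150% means the modified feed is about 2.5 times as large as the original, which is worth keeping in mind when comparing it with the per-feed percentages quoted elsewhere in the thread.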

prologic

twtxt.net

23 Sep 24 11:04 UTC

View Thread

In fact @falsifian you had quite a lot of good feedback. Do you mind collecting it in a task list on the doc somewhere so I can get to it? 🤔

prologic

twtxt.net

23 Sep 24 11:00 UTC

View Thread

Can someone make the edit?

@jo

comam.es

23 Sep 24 13:00 UTC+0200

View Thread

[47°09′54″S, 126°43′08″W] Transfer 25% complete...

lyse

lyse.isobeef.org

23 Sep 24 13:00 UTC+0200

View Thread

There you go, @prologic, the SQLite database (with a bit more data now) and the sqlitebrowser project file containing the query: https://lyse.isobeef.org/tmp/tt2cache.tar.bz2 (133.9 KiB)

prologic

twtxt.net

23 Sep 24 10:57 UTC

View Thread

@movq This was just a representative sample. The real concrete cost here is a ~5x increase in memory consumption for yarnd and/or ~5x increase in disk storage.

prologic

twtxt.net

23 Sep 24 10:51 UTC

View Thread

@lyse Mind sharing your schema?

prologic

twtxt.net

23 Sep 24 10:50 UTC

View Thread

@lyse Not sure, I'll check.

prologic

twtxt.net

23 Sep 24 10:49 UTC

View Thread

@lyse My proposal is three steps:

- Increase the hash length from 7 to 11

Then:

- Add support for changing your feed's location without breaking threads

Then much later:

- Add formal support for edits
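
As a rough illustration of what the first step buys, the sketch below estimates collision probabilities with the usual birthday approximation. It assumes each hash character comes from a 32-symbol alphabet and that hashes are uniformly distributed; both are assumptions made here, not statements from this thread, and the function names are hypothetical:

```python
import math


def keyspace(hash_len: int, alphabet_size: int = 32) -> int:
    """Number of distinct hashes of the given length (an upper bound)."""
    return alphabet_size ** hash_len


def collision_probability(num_twts: int, hash_len: int, alphabet_size: int = 32) -> float:
    """Birthday approximation: chance that at least two of num_twts hashes collide."""
    exponent = num_twts * (num_twts - 1) / (2 * keyspace(hash_len, alphabet_size))
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    for length in (7, 11):
        for count in (10_000, 1_000_000):
            p = collision_probability(count, length)
            print(f"len={length:2d} twts={count:>9,d} p(collision)~{p:.3g}")
```

Under these assumptions, four extra characters multiply the keyspace by 32^4, which is why a collision that is close to certain across a million twts at length 7 becomes very unlikely at length 11.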

prologic

twtxt.net

23 Sep 24 10:45 UTC

View Thread

@lyse No I don't either, just sayin' 😅

lyse

lyse.isobeef.org

23 Sep 24 12:45 UTC+0200

View Thread

@falsifian I agree. It's an optional header.

prologic

twtxt.net

23 Sep 24 10:43 UTC

View Thread

@movq That's what I want to know 🤣

movq

www.uninformativ.de

23 Sep 24 10:33 UTC+0000

View Thread

@prologic What’s that in absolute numbers? My ~/Mail/twt is currently 26 MB in size. Increase that by 20% and we get 31.2 MB.

I don’t buy the argument with 2025 bytes. This worst case scenario is not relevant in practice.

lyse

lyse.isobeef.org

23 Sep 24 12:30 UTC+0200

View Thread

@movq Oha! @bender Happy cooling off!

lyse

lyse.isobeef.org

23 Sep 24 12:15 UTC+0200

View Thread

@prologic Well, mentions are also quite lengthy as they always include the feed URL. I know, that's not a good argument.

I just got a very, very wild idea that I have not put any brain power into, so it might be totally stupid: Since many replies also mention the original feed, maybe a mention and thread identifier could be combined, something like: @<nick url timestamp>. But then we would also need another style if one does not want to mention the original author.

So, scratch that. But I put it out there anyway. Maybe this inspires someone else to come up with something neat.

movq

www.uninformativ.de

23 Sep 24 10:14 UTC+0000

View Thread

It’s a different story when you just publish a twtxt file, I think. The question here is: When you publish a twt and don’t like it anymore and want to delete it, do you have the *right* to *force* others to delete it? (Not in a technical manner, but by suing them.) What does the GDPR have to say about that? Not a clue. 😂

GopherChat

magical.fish:70

23 Sep 24 04:14 UTC-0600

View Thread

What gossip, gopherspace?!

movq

www.uninformativ.de

23 Sep 24 10:13 UTC+0000

View Thread

@xuu I *think* it is more tricky than that.

https://commission.europa.eu/law/law-topic/data-protection/reform/rules-business-and-organisations/application-regulation/who-does-data-protection-law-apply_en

“A company *or entity* …”

Also, as I understand it, “personal or household activity” (as you called it) is rather strict: An example could be you uploading photos to a webspace behind HTTP basic auth and sending that link to a friend. So, yes, a webserver is involved and you process your friend’s data (e.g., when did he access your files), but it’s just between you and him. But if you were to publish these photos publicly on a webserver that anyone can access, then it’s a different story – even though you could say that “this is just my personal hobby, not related to any job or money”.

If you operate a public Yarn pod and *if you accept registrations from other users*, then I’m pretty sure the GDPR applies. 🤔 You process personal data and you don’t really know these people. It’s not a personal/private thing anymore.

lyse

lyse.isobeef.org

23 Sep 24 12:00 UTC+0200

View Thread

@prologic Not sure how many actually care about a 140 character limit. I don't. Not at all.

lyse

lyse.isobeef.org

23 Sep 24 11:45 UTC+0200

View Thread

@prologic I'm wondering what exactly you mean by incremental changes, what are the individual ones? What do you have in mind?

lyse

lyse.isobeef.org

23 Sep 24 11:30 UTC+0200

View Thread

@prologic I find it quite hard to rank the facets. Some go hand in hand or depend on the protocol over which a feed is offered. I feel some are only relevant to specific clients. I'm sure people interpret some of them differently.

I'm curious, is it possible to see each individual poll submission?

lyse

lyse.isobeef.org

23 Sep 24 11:15 UTC+0200

View Thread

I'm experimenting with SQLite and trees. It's going well so far with only my own 439-message main feed from a few days ago in the cache. Fetching these 632 rows took 20ms:

[Screenshot: SQL query to build up the conversation trees in the cache]

Now comes the really tricky part: how do I exclude completely read threads?

prologic

twtxt.net

23 Sep 24 07:58 UTC

View Thread

So just to be clear, it's not as bad as the OP in this thread; that is just a worst-case scenario. With some additional analysis I did today, it's closer to around ~5x the memory requirements of my pod, which would roughly go from ~22MB to ~120MB or so, probably a bit more in practice. But this is still a significant increase in memory. The on-disk requirements would also increase by around ~5x on average, going from ~12GB to about ~60GB at the current archive size.

@jo

comam.es

23 Sep 24 09:00 UTC+0200

View Thread

[47°09′20″S, 126°43′13″W] Sample analyzing complete -- starting transfer

prologic

twtxt.net

23 Sep 24 06:46 UTC

View Thread

Just out of curiosity, I inspected the yarns database (_the search engine/crawler_) to find the average length of a Twtxt URI:


$ inspect-db yarns.db | jq -r '.Value.URL' | awk '{ total += length; count++ } END { if (count > 0) print total / count }'
40.3387

Given that an RFC3339 UTC timestamp has a length of 20 characters with seconds precision, we're talking about a Twt Subject taking up ~63 characters/bytes on average.
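
The estimate above can be reproduced in a few lines of Python. This sketch checks the 20-character timestamp length and adds it to the average URL length reported by the awk command. The three extra bytes are an assumption made here so that the total lands near the ~63 quoted in the post (the thread elsewhere mentions parens and a space), and all names are made up for illustration:

```python
from datetime import datetime, timezone


def rfc3339_utc(dt: datetime) -> str:
    """Format a datetime as an RFC 3339 UTC timestamp with seconds precision."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def average_length(values: list[str]) -> float:
    """Mean string length, mirroring the awk one-liner above (0.0 if empty)."""
    return sum(len(v) for v in values) / len(values) if values else 0.0


AVG_URL_LEN = 40.3387  # output of the awk command in the post
TIMESTAMP_LEN = len(rfc3339_utc(datetime(2024, 9, 23, 6, 46, tzinfo=timezone.utc)))
EXTRA_BYTES = 3  # assumption: separator/wrapping bytes, not stated in the post

estimated_subject_len = AVG_URL_LEN + TIMESTAMP_LEN + EXTRA_BYTES

if __name__ == "__main__":
    print(f"timestamp length:       {TIMESTAMP_LEN}")
    print(f"estimated subject size: {estimated_subject_len:.1f} bytes")
```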

prologic

twtxt.net

23 Sep 24 06:30 UTC

View Thread

Comparing a few feeds:

- @xuu would see an increase of ~20%
- @falsifian would see an increase of ~8%
- @bender would see an increase of ~20%
- @lyse would see an increase of ~15%
- @aelaraji would see an increase of ~13%
- @sorenpeter would see an increase of ~8%
- @movq would see an increase of ~9%

Just from a scalability standpoint alone, I'm not seeing a switch to location-based Twt ids to support threading as a good idea here. This is what I meant when I said to @david in a recent call that we open up a new can of worms (_or new set of problems_) by drastically changing the approach, rather than incrementally improving the existing approach we have today (_which has served us well for the past 4 years already_).

prologic

twtxt.net

23 Sep 24 06:23 UTC

View Thread

Reminder to take the Twtxt (_anonymous_) Poll: http://polljunkie.com/poll/xdgjib/twtxt-v2

Apologies, I can't edit the poll once it's live, so the suggestion on feedback for supporting Markdown will have to be discussed at another time.

prologic

twtxt.net

23 Sep 24 06:16 UTC

View Thread

@xuu correct

prologic

twtxt.net

23 Sep 24 06:16 UTC

View Thread

@xuu 🤣🤣🤣

xuu

dev.txt.sour.is

22 Sep 24 23:10 UTC-0600

View Thread

I demand full 9-digit nanosecond timestamps and the full TZ identifier as documented in the tz 2024b database! I need to know if there was a change in daylight savings as per the locality in question as of the provided date.

xuu

txt.sour.is

22 Sep 24 23:03 UTC-0600

View Thread

@falsifian I believe "preserve" means to include the original subject hash at the start of the twt, such as (#somehash)

@jo

comam.es

23 Sep 24 07:00 UTC+0200

View Thread

[47°09′47″S, 126°43′17″W] Analyzing samples

prologic

twtxt.net

23 Sep 24 04:57 UTC

View Thread

So I whipped up a quick shell script to demonstrate what I mean by the increase in feed size on average as well as the expected increase in storage and retrieval requirements.


$ ./compare.sh
Original file size: 28145 bytes
Modified file size: 70672 bytes
Percentage increase in file size: 151.10%
...

prologic

twtxt.net

23 Sep 24 04:12 UTC

View Thread

Thank goodness we relaxed that limit and I've stopped being so puritanical about it, but my overall point is that we would be significantly increasing the human size as well as the machine size of the identity of threads as well as twts.

prologic

twtxt.net

23 Sep 24 04:12 UTC

View Thread

With the original specification's recommendation of a 140-character Twt length, that only leaves you with about 78 characters' worth of anything remotely useful to say in response.

prologic

twtxt.net

23 Sep 24 04:10 UTC

View Thread

Let's say the overhead is always three bytes: two parentheses and a space.

prologic

twtxt.net

23 Sep 24 04:10 UTC

View Thread

So for example, if we use @movq's feed as an example thread ID here, his feed with a particular timestamp, we're already looking at a subject length of 59 bytes +/- a couple of bytes to denote the subject in the Twt itself.

prologic

twtxt.net

23 Sep 24 04:05 UTC

View Thread

One of the reasons we originally wanted to use content-based addressing and short hashes as our threading model was to keep individual Twts short, so that they were still readable if you viewed them manually.

With the proposal to switch to location-based addressing using a pointer to a feed and a timestamp in that feed, you're looking at up to roughly 2025 characters, because the HTTP, HTML and even URI specifications do not specify a maximum length for URI(s) AFAIK, only recommendations.

prologic

twtxt.net

23 Sep 24 03:59 UTC

View Thread

@bender Personally, I can't see myself increasing the infrastructure and costs to run this pod to support this as we potentially switch over and as things continue to grow in scale. You would never get the infinite search and infinite timeline features that you've always wanted, for example, and I would have to drastically reduce what is visible or even searchable at any given point in time to much less than what it is today.
