The Watcher

prologic

twtxt.net

Another interesting side effect of changing from content-based addressing to location-based addressing is that switching from 7-byte keys to 2025-character keys for 3.5 million entries would expand the database size from 24.5 MB to about 7.09 GB, an increase of roughly 7.06 GB!
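The figures in the post are keys-only arithmetic: key length times number of entries. The following minimal Python sketch reproduces them. It assumes 1 byte per character (ASCII URLs and timestamps) and decimal units (1 MB = 10^6 bytes), and it ignores values, indexes and storage-engine overhead. `key_space_bytes` is a hypothetical helper, not part of yarnd.

```python
# Key-space arithmetic from the post: 3.5 million entries, keys only
# (values, indexes and storage-engine overhead are ignored).
ENTRIES = 3_500_000
OLD_KEY_LEN = 7      # truncated content hash, in bytes
NEW_KEY_LEN = 2025   # worst-case location-based key; assumes 1 byte per character


def key_space_bytes(key_len: int, entries: int = ENTRIES) -> int:
    """Total bytes spent on keys alone (hypothetical helper)."""
    return key_len * entries


old = key_space_bytes(OLD_KEY_LEN)  # 24_500_000 bytes (24.5 MB)
new = key_space_bytes(NEW_KEY_LEN)  # 7_087_500_000 bytes (~7.09 GB)
growth = new - old                  # 7_063_000_000 bytes (~7.06 GB)

print(f"{old / 1e6:.1f} MB -> {new / 1e9:.4f} GB (+{growth / 1e9:.3f} GB)")
# prints: 24.5 MB -> 7.0875 GB (+7.063 GB)
```

Because the growth is linear in key length, any real-world average key that is shorter than the 2025-character worst case shrinks the result proportionally.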

23 Sep 24 03:57 UTC

prologic

twtxt.net

23 Sep 24 03:59 UTC

@bender Personally, I can't see myself increasing the infrastructure and running costs of this pod to support this, both while we potentially switch over and as things continue to grow in scale. For example, you would never get the infinite search and infinite timeline features you've always wanted, and I would have to drastically reduce what is visible or searchable at any given point in time to much less than it is today.

prologic

twtxt.net

23 Sep 24 07:58 UTC

So just to be clear, it's not as bad as the OP in this thread; that was just a worst-case scenario. Based on some additional analysis I did today, it's closer to ~5x the memory requirements of my pod, which would go from roughly ~22MB to ~120MB, probably a bit more in practice. That is still a significant increase in memory. The on-disk requirements would also increase by ~5x on average, going from ~12GB to ~60GB at the current archive size.

movq

www.uninformativ.de

23 Sep 24 12:06 UTC+0000

@prologic A factor of 5 is hard to believe, to be honest. Especially disk usage. I know nothing about the internals of yarnd, but still.

If this constitutes a hard “no” to the proposal, then I think we don’t need to discuss it further.

prologic

twtxt.net

23 Sep 24 12:26 UTC

@movq Maybe I misspoke. It's a factor of 5 in the size of the keyspace required. The impact on on-disk storage of raw feeds and such is significantly smaller, around ~1-1.5x, depending on how many replies there are, I suppose.

I wasn't very clear; my apologies. Suppose we update the current hash truncation length from 7 to 11, but then still decide to go down this location-based twt identity and threading model anyway. Then yes, we're talking about twt subjects being ~5x larger on average: going from 14 characters (11 for the hash, 2 for the parens, 1 for the #) to ~63 bytes (the average length of URL + timestamp I've worked out) plus 3 bytes of overhead for the parens and space.
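The ~5x figure follows from the per-subject arithmetic in the reply. Here is a minimal Python sketch of it using the numbers quoted there (11-character hash, ~63-byte average URL + timestamp). The `(<url> <timestamp>)` layout for location-based subjects is an assumption; the reply only says the overhead is the parens plus a space. `hash_subject` and `location_subject` are hypothetical helpers, not yarnd code.

```python
# Back-of-the-envelope comparison of twt subject sizes, using the figures
# quoted in the thread (11-char hash, ~63-byte average URL + timestamp).
HASH_LEN = 11         # proposed hash truncation length (up from 7)
AVG_URL_TS_LEN = 63   # average URL + timestamp length quoted in the thread


def hash_subject(twt_hash: str) -> str:
    """Content-addressed subject, e.g. '(#abcdefghijk)'."""
    return f"(#{twt_hash})"


def location_subject(url: str, timestamp: str) -> str:
    """Location-based subject; this '(<url> <timestamp>)' layout is assumed."""
    return f"({url} {timestamp})"


# Fixed overhead of each format: '(#)' = 3 chars, '( )' = 3 chars.
hash_size = HASH_LEN + len(hash_subject(""))
location_size = AVG_URL_TS_LEN + len(location_subject("", ""))
ratio = location_size / hash_size

print(hash_size, location_size, round(ratio, 2))  # 14 66 4.71
```

A ratio of about 4.7 is what rounds to the "~5x" quoted. It applies to subject/key size, not to raw feed storage, where the subject is only a fraction of each twt.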

prologic

twtxt.net

23 Sep 24 12:26 UTC

My point is, this is not a small trade-off to make for the sake of simplicity 😅