subject = '' for the existing conversation roots with subject > ''. Somehow, my brain must have read subject <> ''. That equality check should not have been touched at all. I just updated the archive for anyone who is interested in following along: https://lyse.isobeef.org/tmp/tt2cache.tar.bz2 (151.1 KiB)
> BUGS
> None. Mutts have fleas, not bugs.
yarnd's cache has really become complicated. I mean, it's a bunch of maps and lists that are recalculated every ~5m. I don't know of any better way to do this right now, but maybe one day I'll figure out a better way to represent the same information that is displayed today, one that works reasonably well.
Using EXPLAIN QUERY PLAN I was able to create two indices to avoid some table scans:
CREATE INDEX parent ON messages (hash, subject);
CREATE INDEX subject_created_at ON messages (subject, created_at);
Also, since strings are sortable, instead of str_col <> '' I now use str_col > '' to allow the use of an index.
But somehow, my output seems to be broken at the end for some reason, I just noticed. :-? Hmm.
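To make the index point concrete, here is a minimal sketch using Python's sqlite3 module. It assumes the database is SQLite (EXPLAIN QUERY PLAN suggests so) and uses a hypothetical three-column messages table; only the index definitions are taken from the post. It prints the query plans for both comparison styles so they can be compared side by side.

```python
import sqlite3

# Hypothetical minimal schema: only the index definitions come from the post,
# the column types and the in-memory database are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (hash TEXT, subject TEXT, created_at TEXT);
CREATE INDEX parent ON messages (hash, subject);
CREATE INDEX subject_created_at ON messages (subject, created_at);
""")


def query_plan(sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]


# "<>" cannot be turned into an index range, so every row has to be visited.
plan_ne = query_plan("SELECT subject, created_at FROM messages WHERE subject <> ''")
# ">" is a range constraint, so the index on (subject, created_at) can be searched.
plan_gt = query_plan("SELECT subject, created_at FROM messages WHERE subject > ''")

print(plan_ne)
print(plan_gt)
```

The exact wording of the plan lines differs between SQLite versions, but the first plan should report a SCAN and the second a SEARCH. Note that the two predicates are only interchangeable when the column holds nothing but text: in SQLite, numeric values sort before any string, so they would pass <> '' but fail > ''.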
The read status still gives me a headache. I think I either have to filter in the application or create more metadata structures in the database.
I'm wondering if anyone here already used certain storages for tree data.
I wasn't very clear; my apologies. If we update the current hash truncation length from 7 to 11, but then still decide to go down this location-based twt identity and threading model anyway, then yes, we're talking about twt subjects having a ~5x increase in size on average: going from 14 characters (11 for the hash, 2 for the parens, 1 for the #) to ~63 bytes (the average I've worked out for the length of URL + timestamp) + 3 bytes of overhead for parens and space.
yarnd, but still. If this constitutes a hard “no” to the proposal, then I think we don’t need to discuss it further.
But I feel execution times get worse rather quickly the more data I add. Also, caching helps tremendously: executing it for the first time took over 600ms. From then on I'm down to 40ms.
I think it's particularly bad that parents might be missing. Thus, I cannot use an index, because there is no parent to reference. But my database knowledge is fairly limited, so I have to read up on that.
$ ./compare.sh https://twtxt.net/user/prologic/twtxt.txt 500
Original file size: 126842 bytes
Modified file size: 317029 bytes
Percentage increase in file size: 149.94%
...
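The compare.sh script itself is not shown, so as a sanity check, here is a small sketch of how the reported percentage presumably falls out of the two file sizes. The function name is made up for illustration.

```python
def percentage_increase(original_bytes, modified_bytes):
    """Relative growth of a file in percent (hypothetical helper)."""
    return (modified_bytes - original_bytes) / original_bytes * 100


# File sizes as reported by compare.sh for the feed above.
print(f"{percentage_increase(126842, 317029):.2f}%")  # prints 149.94%
```

In other words, the modified feed is about 2.5 times the size of the original.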
yarnd and/or ~5x increase in disk storage.
- increase the hash length from 7 to 11
Then:
- Add support for changing your feed's location without breaking threads
Then much later:
- Add formal support for edits
~/Mail/twt is currently 26 MB in size. Increase that by 20% and we get 31.2 MB. I don’t buy the argument with 2025 bytes. This worst case scenario is not relevant in practice.
I just got a very, very wild idea that I have not put any brain power into, so it might be totally stupid: Since many replies also mention the original feed, maybe a mention and thread identifier could be combined, something like @<nick url timestamp>.
But then we would also need another style if one does not want to mention the original author. So, scratch that. But I put it out there anyway. Maybe this inspires someone else to come up with something neat.
https://commission.europa.eu/law/law-topic/data-protection/reform/rules-business-and-organisations/application-regulation/who-does-data-protection-law-apply_en
“A company *or entity* …”
Also, as I understand it, “personal or household activity” (as you called it) is rather strict: An example could be you uploading photos to a webspace behind HTTP basic auth and sending that link to a friend. So, yes, a webserver is involved and you process your friend’s data (e.g., when did he access your files), but it’s just between you and him. But if you were to publish these photos publicly on a webserver that anyone can access, then it’s a different story – even though you could say that “this is just my personal hobby, not related to any job or money”.
If you operate a public Yarn pod and *if you accept registrations from other users*, then I’m pretty sure the GDPR applies. 🤔 You process personal data and you don’t really know these people. It’s not a personal/private thing anymore.
I'm curious, is it possible to see each individual poll submission?
SQL query to build up the conversation trees in the cache
Now comes the really tricky part: how do I exclude completely read threads?
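One possible direction, sketched here under loud assumptions: the real schema and query are not shown in this thread, so the messages table below, its is_read flag and the sample rows are all hypothetical. A recursive CTE walks every conversation from its root, and a GROUP BY then keeps only the threads that still contain at least one unread message. Messages whose parent is missing are grouped under the missing parent's hash.

```python
import sqlite3

# Hypothetical schema and data: subject holds the parent's hash, '' marks a root,
# is_read is an assumed per-message flag.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (
    hash       TEXT PRIMARY KEY,
    subject    TEXT NOT NULL DEFAULT '',
    created_at TEXT NOT NULL,
    is_read    INTEGER NOT NULL DEFAULT 0
);
INSERT INTO messages VALUES
    ('a', '',  '2024-09-01T10:00:00Z', 1),
    ('b', 'a', '2024-09-01T11:00:00Z', 1),
    ('c', 'b', '2024-09-01T12:00:00Z', 0),  -- unread reply deep in thread a
    ('d', '',  '2024-09-02T10:00:00Z', 1),
    ('e', 'd', '2024-09-02T11:00:00Z', 1),  -- thread d is completely read
    ('f', 'x', '2024-09-03T10:00:00Z', 0);  -- parent x is missing
""")

UNREAD_THREADS = """
WITH RECURSIVE tree(root, hash) AS (
    -- Anchor: real roots, plus orphans grouped under their missing parent.
    SELECT CASE WHEN subject = '' THEN hash ELSE subject END, hash
    FROM messages
    WHERE subject = '' OR subject NOT IN (SELECT hash FROM messages)
    UNION ALL
    -- Step: attach every reply to the tree its parent belongs to.
    SELECT tree.root, m.hash
    FROM messages AS m JOIN tree ON m.subject = tree.hash
)
SELECT tree.root
FROM tree JOIN messages AS m ON m.hash = tree.hash
GROUP BY tree.root
HAVING MIN(m.is_read) = 0   -- at least one message is still unread
ORDER BY tree.root
"""

unread_roots = [row[0] for row in conn.execute(UNREAD_THREADS)]
print(unread_roots)  # ['a', 'x']
```

Whether this performs acceptably on a real mailbox is a separate question; the sketch only shows that the filtering can, in principle, stay inside the database instead of moving into the application.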
$ inspect-db yarns.db | jq -r '.Value.URL' | awk '{ total += length; count++ } END { if (count > 0) print total / count }'
40.3387
Given that an RFC3339 UTC timestamp has a length of 20 characters with seconds precision, we're talking about a Twt Subject taking up ~63 characters/bytes on average.
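For anyone who wants to retrace the arithmetic, here is a rough sketch. The average URL length is the awk result above, the 14-character hash subject and the 3 bytes of overhead are the figures mentioned earlier in the thread, and the example date is arbitrary.

```python
from datetime import datetime, timezone

# An arbitrary example date; only the length of the formatted string matters.
timestamp = datetime(2024, 9, 18, 12, 0, 0, tzinfo=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
TIMESTAMP_LEN = len(timestamp)  # "2024-09-18T12:00:00Z" is 20 characters

AVG_URL_LEN = 40.3387           # average feed URL length from the awk output
OVERHEAD = 3                    # two parens plus one space
HASH_SUBJECT_LEN = 11 + 2 + 1   # 11-character hash, two parens, one '#'

location_subject_len = AVG_URL_LEN + TIMESTAMP_LEN + OVERHEAD
ratio = location_subject_len / HASH_SUBJECT_LEN

print(f"location-based subject: ~{location_subject_len:.1f} bytes")
print(f"hash-based subject:      {HASH_SUBJECT_LEN} bytes")
print(f"ratio:                  ~{ratio:.1f}x")
```

This only compares the subjects themselves. How much a whole feed grows depends on how many of its twts are replies and how long the rest of each twt is.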
- @xuu would see an increase of ~20%
- @falsifian would see an increase of ~8%
- @bender would see an increase of ~20%
- @lyse would see an increase of ~15%
- @aelaraji would see an increase of ~13%
- @sorenpeter would see an increase of ~8%
- @movq would see an increase of ~9%
Just from a scalability standpoint alone, I'm not seeing a switch to location-based Twt ids to support threading as a good idea here. This is what I meant when I said to @david in a recent call that we open up a new can of worms (_or new set of problems_) by drastically changing the approach, rather than incrementally improving the existing approach we have today (_which has served us well for the past 4 years already_).
Apologies, I can't edit the poll once it's live, so the suggestion on feedback for supporting Markdown will have to be discussed at another time.
$ ./compare.sh
Original file size: 28145 bytes
Modified file size: 70672 bytes
Percentage increase in file size: 151.10%
...
With the proposal to switch to location-based addressing using a pointer to a feed and a timestamp in that feed, you're looking at roughly 2025 characters, because the HTTP, HTML and even URI specifications do not specify a maximum length for URIs; AFAIK there are only recommendations.