But I feel execution times get worse rather quickly the more data I add. Also, caching helps tremendously: executing it for the first time took over 600 ms, and from then on I'm down to 40 ms.
I think it's particularly bad that parents might be missing. Thus, I cannot use an index, because there is no parent to reference. But my database knowledge is fairly limited, so I have to read up on that.
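In case it helps with the reading-up: in SQLite a plain index on the parent column does not require the parent row to exist. Only a `FOREIGN KEY` constraint would, and SQLite only enforces those when `PRAGMA foreign_keys` is on. Below is a minimal sketch using Python's built-in `sqlite3`. It assumes a hypothetical `twts(hash, subject, text)` table in which `subject` holds the parent's hash. The actual cache schema isn't shown here.

```python
import sqlite3

def build_db() -> sqlite3.Connection:
    """Create an in-memory cache with a hypothetical twts(hash, subject, text) schema."""
    con = sqlite3.connect(":memory:")
    # No FOREIGN KEY on subject: replies to twts we never fetched are allowed.
    con.execute("CREATE TABLE twts (hash TEXT PRIMARY KEY NOT NULL, subject TEXT, text TEXT)")
    # A plain index on subject still speeds up "find all replies to X",
    # whether or not X itself is present in the table.
    con.execute("CREATE INDEX idx_twts_subject ON twts(subject)")
    con.executemany(
        "INSERT INTO twts VALUES (?, ?, ?)",
        [
            ("aaaaaaa", None, "root we have"),
            ("bbbbbbb", "aaaaaaa", "reply to a known root"),
            ("ccccccc", "zzzzzzz", "reply to a root we never fetched"),
        ],
    )
    return con

def replies_to(con: sqlite3.Connection, parent: str) -> list[str]:
    """Hashes of all direct replies to `parent`, present in the cache or not."""
    rows = con.execute("SELECT hash FROM twts WHERE subject = ? ORDER BY hash", (parent,))
    return [hash_ for (hash_,) in rows]

def lookup_uses_index(con: sqlite3.Connection) -> bool:
    # EXPLAIN QUERY PLAN names the index when SQLite decides to use it.
    plan = con.execute("EXPLAIN QUERY PLAN SELECT hash FROM twts WHERE subject = ?", ("x",))
    return any("idx_twts_subject" in str(row) for row in plan)

if __name__ == "__main__":
    con = build_db()
    print(replies_to(con, "zzzzzzz"))  # parent is missing, lookup still works
    print(lookup_uses_index(con))
```

Whether this helps with the slowdown depends on the queries actually being run. The point is only that missing parents don't rule out an index.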
$ ./compare.sh https://twtxt.net/user/prologic/twtxt.txt 500
Original file size: 126842 bytes
Modified file size: 317029 bytes
Percentage increase in file size: 149.94%
...
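The percentages printed by compare.sh match the usual relative-growth formula. The script's source isn't shown, so the sketch below is a reconstruction from its output, not the script itself. The byte counts come from the two compare.sh runs in this thread.

```python
def percent_increase(original: int, modified: int) -> float:
    """Relative growth of `modified` over `original`, in percent."""
    if original <= 0:
        raise ValueError("original size must be positive")
    return (modified - original) / original * 100

# Byte counts reported by the two compare.sh runs in this thread.
print(f"{percent_increase(126842, 317029):.2f}%")  # 149.94%
print(f"{percent_increase(28145, 70672):.2f}%")    # 151.10%
```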
yarnd and/or ~5x increase in disk storage.
- Increase the hash length from 7 to 11
Then:
- Add support for changing your feed's location without breaking threads
Then much later:
- Add formal support for edits
~/Mail/twt is currently 26 MB in size. Increase that by 20% and we get 31.2 MB. I don’t buy the argument with 2025 bytes. This worst-case scenario is not relevant in practice.
I just got a very, very wild idea that I have not put any brain power into, so it might be totally stupid: since many replies also mention the original feed, maybe a mention and a thread identifier could be combined, something like:
@<nick url timestamp>. But then we would also need another style if one does not want to mention the original author. So, scratch that. But I put it out there anyway. Maybe this inspires someone else to come up with something neat.
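The sketch below is purely illustrative and makes the scratched idea concrete. A parser could accept both today's `@<nick url>` mention and the floated three-field `@<nick url timestamp>` form. The three-field form is hypothetical and not part of any spec.

```python
import re
from typing import NamedTuple, Optional

class Mention(NamedTuple):
    nick: str
    url: str
    timestamp: Optional[str]  # None for an ordinary mention

# Accepts "@<nick url>" and the hypothetical "@<nick url timestamp>".
MENTION_RE = re.compile(r"@<(?P<nick>[^\s>]+) (?P<url>[^\s>]+)(?: (?P<ts>[^\s>]+))?>")

def parse_mentions(text: str) -> list[Mention]:
    """Return every mention in `text`, with a timestamp if the thread form is used."""
    return [Mention(m["nick"], m["url"], m["ts"]) for m in MENTION_RE.finditer(text)]
```

It also shows the drawback noted above. A reply that doesn't mention the author has nowhere to put the timestamp.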
https://commission.europa.eu/law/law-topic/data-protection/reform/rules-business-and-organisations/application-regulation/who-does-data-protection-law-apply_en
“A company *or entity* …”
Also, as I understand it, “personal or household activity” (as you called it) is rather strict: An example could be you uploading photos to a webspace behind HTTP basic auth and sending that link to a friend. So, yes, a webserver is involved and you process your friend’s data (e.g., when did he access your files), but it’s just between you and him. But if you were to publish these photos publicly on a webserver that anyone can access, then it’s a different story – even though you could say that “this is just my personal hobby, not related to any job or money”.
If you operate a public Yarn pod and *if you accept registrations from other users*, then I’m pretty sure the GDPR applies. 🤔 You process personal data and you don’t really know these people. It’s not a personal/private thing anymore.
I'm curious, is it possible to see each individual poll submission?
SQL query to build up the conversation trees in the cache.
Now comes the really tricky part: how do I exclude completely read threads?
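One possible approach is sketched below with Python's `sqlite3`. It assumes a hypothetical `twts(hash, subject, is_read)` table, since the real cache schema isn't shown. A recursive CTE computes each twt's thread root. The query then keeps only roots where at least one twt is unread, that is, where the minimum `is_read` value is 0.

```python
import sqlite3

# Hypothetical cache schema: subject is the parent hash and may point at a
# twt that was never fetched; is_read is 0 or 1.
SCHEMA = """
CREATE TABLE twts (
    hash    TEXT PRIMARY KEY NOT NULL,
    subject TEXT,
    is_read INTEGER NOT NULL
)
"""

UNREAD_THREADS = """
WITH RECURSIVE thread(hash, root) AS (
    -- Seeds: top-level twts, plus replies whose parent is missing. For the
    -- latter, the missing parent's hash is the thread id, so orphans that
    -- reply to the same parent end up in the same thread.
    SELECT hash, COALESCE(subject, hash) FROM twts
    WHERE subject IS NULL OR subject NOT IN (SELECT hash FROM twts)
    UNION ALL
    -- Walk down: every reply inherits the root of its parent.
    SELECT t.hash, thread.root FROM twts AS t JOIN thread ON t.subject = thread.hash
)
SELECT thread.root
FROM thread JOIN twts ON twts.hash = thread.hash
GROUP BY thread.root
HAVING MIN(twts.is_read) = 0  -- drop threads in which every twt is read
ORDER BY thread.root
"""

def load(rows: list[tuple]) -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.execute(SCHEMA)
    con.executemany("INSERT INTO twts VALUES (?, ?, ?)", rows)
    return con

def unread_threads(con: sqlite3.Connection) -> list[str]:
    """Thread roots that still contain at least one unread twt."""
    return [root for (root,) in con.execute(UNREAD_THREADS)]

if __name__ == "__main__":
    con = load([
        ("r1", None, 1), ("a", "r1", 1),       # fully read thread
        ("r2", None, 1), ("b", "r2", 0),       # has an unread reply
        ("c", "missing", 1), ("d", "c", 0),    # orphans under a missing parent
        ("e", "gone", 1),                      # fully read orphan
    ])
    print(unread_threads(con))  # ['missing', 'r2']
```

The recursion runs again on every query. If that turns out to be slow, one option is to store each twt's root in its own column at insert time.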
$ inspect-db yarns.db | jq -r '.Value.URL' | awk '{ total += length; count++ } END { if (count > 0) print total / count }'
40.3387
Given that an RFC3339 UTC timestamp with seconds precision is 20 characters long, we're talking about a Twt Subject taking up ~63 characters/bytes on average.
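The ~63 figure is consistent with the 40.34-byte average URL, plus the 20-byte timestamp, plus about 3 bytes of delimiters. The delimiter count rests on an assumed subject layout of `(<url> <timestamp>)`, which is hypothetical: two parentheses and one space.

```python
from datetime import datetime, timezone

def rfc3339_utc(dt: datetime) -> str:
    """Format a datetime as an RFC 3339 UTC timestamp with seconds precision."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def subject_len(url_len: float, ts_len: int, overhead: int = 3) -> float:
    # Assumed layout "(<url> <timestamp>)": two parentheses plus one space = 3 bytes.
    return url_len + ts_len + overhead

ts = rfc3339_utc(datetime(2024, 9, 18, 12, 34, 56, tzinfo=timezone.utc))
avg_url_len = 40.3387  # average from the inspect-db | jq | awk pipeline above
print(ts, len(ts))                               # 2024-09-18T12:34:56Z 20
print(round(subject_len(avg_url_len, len(ts))))  # 63
```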
- @xuu would see an increase of ~20%
- @falsifian would see an increase of ~8%
- @bender would see an increase of ~20%
- @lyse would see an increase of ~15%
- @aelaraji would see an increase of ~13%
- @sorenpeter would see an increase of ~8%
- @movq would see an increase of ~9%
Just from a scalability standpoint alone, I don't see switching to location-based Twt IDs to support threading as a good idea here. This is what I meant when I said to @david in a recent call that we open up a new can of worms (_or a new set of problems_) by drastically changing the approach, rather than incrementally improving the existing approach we have today (_which has already served us well for the past 4 years_).
Apologies, I can't edit the poll once it's live, so the suggestion on feedback for supporting Markdown will have to be discussed at another time.
$ ./compare.sh
Original file size: 28145 bytes
Modified file size: 70672 bytes
Percentage increase in file size: 151.10%
...
With the proposal to switch to location-based addressing, using a pointer to a feed and a timestamp in that feed, you're looking at subjects of potentially up to roughly 2025 characters. That's because neither the HTTP, HTML, nor URI specifications specify a maximum length for URIs; AFAIK there are only recommendations.
Sorry, you're right, I should have used numbers!
I don't understand what "preserve the original hash" could mean other than "make sure there's still a twt in the feed with that hash". Maybe the text could be clarified somehow.
I'm also not sure what you mean by markdown already being part of it. Of course people can already use Markdown, just like presumably nothing stopped people from using (twt subjects) before they were formally described. But it's not universal; e.g. as a jenny user I just see the plain text.
I have little to contribute on this reply. On bullet two, he meant the original hash. On the last bullet, markdown is already part of it (after all, it is plain text). Yarn, being a web client/server, simply renders it.
For me there's a distinction. I feel very strongly that I should be able to retain whatever private information I like. On the other hand, I do have some sympathy for requests not to publish or propagate (though I personally feel it's still morally acceptable to ignore such requests).
I hope it can remain a living document (or sequence of draft revisions) for a good long time while we figure out how this stuff works in practice.
I am not sure how I feel about all this being done at once, vs. letting conventions arise.
For example, even today I could reply to twt abc1234 with "(#abc1234) Edit: ..." and I think all you humans would understand it as an edit to (#abc1234). Maybe eventually it would become a common enough convention that clients would start to support it explicitly.
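A client could pick up such a convention with very little code. The minimal sketch below assumes the exact `(#hash) Edit:` prefix from the example and 7-character lowercase alphanumeric hashes. None of this is specified anywhere yet.

```python
import re
from typing import Optional

# Informal convention from the example: "(#<hash>) Edit: <new text>".
# Assumption: 7 lowercase alphanumeric characters, as in "abc1234".
EDIT_RE = re.compile(r"^\(#(?P<hash>[a-z0-9]{7})\)\s+Edit:\s*(?P<body>.*)$", re.DOTALL)

def parse_edit(text: str) -> Optional[tuple[str, str]]:
    """Return (edited_hash, new_text) if a twt looks like an edit, else None."""
    m = EDIT_RE.match(text)
    return (m["hash"], m["body"]) if m else None
```

A client that doesn't know the convention just shows the twt as an ordinary reply, which is what makes this kind of gradual adoption possible.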
Similarly, we could just start using 11-digit hashes. We should iron out whether it's sha256 or whatever, but there's no need to get all the other stuff right at the same time.
I have similar thoughts about how some users could try out location-based replies in a backward-compatible way (append the replyto: stuff after the legacy (#hash) style).
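The sketch below shows how a client might read both references from one twt. Older clients keep threading on the leading `(#hash)`, and newer ones can prefer the location. The `(replyto:<url> <timestamp>)` layout is an assumption for illustration, since the exact syntax isn't given here.

```python
import re
from typing import NamedTuple, Optional

class ReplyRef(NamedTuple):
    legacy_hash: Optional[str]
    url: Optional[str]
    timestamp: Optional[str]

# Legacy subject at the very start of the twt, e.g. "(#abc1234)".
LEGACY_RE = re.compile(r"^\(#([a-z0-9]+)\)")
# Hypothetical location-based reference appended after it.
REPLYTO_RE = re.compile(r"\(replyto:([^\s)]+) ([^\s)]+)\)")

def parse_reply(text: str) -> ReplyRef:
    """Extract whichever reply references are present; missing ones are None."""
    legacy = LEGACY_RE.match(text)
    loc = REPLYTO_RE.search(text)
    return ReplyRef(
        legacy.group(1) if legacy else None,
        loc.group(1) if loc else None,
        loc.group(2) if loc else None,
    )

def thread_key(ref: ReplyRef) -> Optional[tuple[str, ...]]:
    # Prefer the location-based reference when present, else fall back to the hash.
    if ref.url and ref.timestamp:
        return ("location", ref.url, ref.timestamp)
    if ref.legacy_hash:
        return ("hash", ref.legacy_hash)
    return None
```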
However I recognize that I'm not the one implementing this stuff, and it's less work to just have everything determined up front.
Misc comments (I haven't read the whole thing):
- Did you mean to make hashes hexadecimal? You lose 11 bits that way compared to base32. I'd suggest gaining 11 bits with base64 instead.
- "Clients MUST preserve the original hash" --- do you mean they MUST preserve the original twt?
- Thanks for phrasing the bit about deletions so neutrally.
- I don't like the MUST in "Clients MUST follow the chain of reply-to references...". If someone writes a client as a 40-line shell script that requires the user to piece together the threading themselves, IMO we shouldn't declare the client non-conforming just because they didn't get to all the bells and whistles.
- Similarly, I don't like the MUST for user agents. For one thing, you might want to fetch a feed without revealing your identity. Also, it raises the bar for a minimal implementation (I'm again thinking of the 40-line shell script).
- For "who follows" lists: why must the long, random tokens be only valid for a limited time? Do you have a scenario in mind where they could leak?
- Why can't feeds be served over HTTP/1.0? Again, thinking about simple software. I recently tried implementing HTTP/1.1 and it wasn't too bad, but 1.0 would have been slightly simpler.
- Why get into the nitty-gritty about caching headers? This seems like generic advice for HTTP servers and clients.
- I'm a little sad about other protocols being not recommended.
- I don't know how I feel about including markdown. I don't mind too much that yarn users emit twts full of markdown, but I'm more of a plain text kind of person. Also, it adds to the length. I wonder if putting it in a separate document would make more sense; that would also help with the length.
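The bit counts in the first bullet follow from the bits each character carries: hex carries 4, base32 5, and base64 6. So 11 characters give 44, 55 and 66 bits respectively. The sketch below compares the three encodings. It assumes a BLAKE2b-256 digest purely for illustration, since the thread hasn't settled on sha256 or anything else.

```python
import base64
import hashlib

BITS_PER_CHAR = {"hex": 4, "base32": 5, "base64": 6}

def bits(length: int, encoding: str) -> int:
    """Bits carried by `length` characters of a uniformly random digest."""
    return length * BITS_PER_CHAR[encoding]

def short_hash(payload: bytes, length: int, encoding: str) -> str:
    # Assumption: BLAKE2b-256 digest. Take the *leading* characters, because the
    # final character of a base32/base64 encoding of 32 bytes carries only a
    # partial group of bits.
    digest = hashlib.blake2b(payload, digest_size=32).digest()
    if encoding == "hex":
        encoded = digest.hex()
    elif encoding == "base32":
        encoded = base64.b32encode(digest).decode().rstrip("=").lower()
    elif encoding == "base64":
        encoded = base64.urlsafe_b64encode(digest).decode().rstrip("=")
    else:
        raise ValueError(f"unknown encoding: {encoding}")
    return encoded[:length]

if __name__ == "__main__":
    for enc in ("hex", "base32", "base64"):
        print(enc, short_hash(b"example payload", 11, enc), bits(11, enc), "bits")
```

URL-safe base64 is used here so that the hash never contains `/` or `+`, which seems friendlier inside a twt subject.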