# I am the Watcher. I am your guide through this vast new twtiverse.
#
# Usage:
# https://watcher.sour.is/api/plain/users View list of users and latest twt date.
# https://watcher.sour.is/api/plain/twt View all twts.
# https://watcher.sour.is/api/plain/mentions?uri=:uri View all mentions for uri.
# https://watcher.sour.is/api/plain/conv/:hash View all twts for a conversation subject.
#
# Options:
# uri Filter to show a specific users twts.
# offset Start index for quey.
# limit Count of items to return (going back in time).
#
# twt range = 1 16
# self = https://watcher.sour.is/conv/l3ifp4a
I'm experimenting with SQLite and trees. It's going good so far with only my own 439 messages long main feed from a few days ago in the cache. Fetching these 632 rows took 20ms:
SQL query to build up the conversation trees in the cache
Now comes the real tricky part, how do I exclude completely read threads?
@lyse Mind sharing your schema?
@lyse Mind sharing your schema?
@lyse And your query to construct a tree? Can you share the full query (_screenshot looks scary 🤣_) -- On another note, SQL and relational databases aren't really that conduces to tree-like structures are they? 🤣_
@lyse And your query to construct a tree? Can you share the full query (_screenshot looks scary 🤣_) -- On another note, SQL and relational databases aren't really that conduces to tree-like structures are they? 🤣_
@prologic Yeah, relational databases are definitely not the perfect fit for trees, but I want to give it a shot anyway. :-)
Using EXPLAIN QUERY PLAN
I was able to create two indices, to avoid some table scans:
CREATE INDEX parent ON messages (hash, subject);
CREATE INDEX subject_created_at ON messages (subject, created_at);
Also, since strings are sortable, instead of str_col <> ''
I now use str_col > ''
to allow the use of an index.
But somehow, my output seems to be broken at the end for some reason, I just noticed. :-? Hmm.
The read status still gives me headache. I think I either have to filter in the application or create more meta data structures in the database.
I'm wondering if anyone here already used certain storages for tree data.
@lyse Yeah I _think_ it's one of the reasons why yarnd
's cache became so complicated really. I mean it's a bunch of maps and lists that is recalculated every ~5m. I don't know of any better way to do this right now, but maybe one day I'll figure out a better way to represent the same information that is displayed today that works reasonably well.~
@lyse Yeah I _think_ it's one of the reasons why yarnd
's cache became so complicated really. I mean it's a bunch of maps and lists that is recalculated every ~5m. I don't know of any better way to do this right now, but maybe one day I'll figure out a better way to represent the same information that is displayed today that works reasonably well.~
Okay, I figured out the cause of the broken output. I also replaced the first subject = ''
for the existing conversation roots with subject > ''
. Somehow, my brain must have read subject <> ''
. That equality check should not have been touched at all. I just updated the updated archive for anyone who is interested to follow along: https://lyse.isobeef.org/tmp/tt2cache.tar.bz2 (151.1Â KiB)
Three feeds (prologic, movq and mine) and my database is already 1.3Â MiB in size. Hmm. I actually got the read filter working. More on that later after polishing it.
@prologic Ta! Somehow, my unit tests break, though. Running the same query manually looks like it's producing a plausible looking result, though. I do not understand it.
Finally! After hours I figured out my problems.
1. The clever Go code to filter out completely read conversations got in the way with the filtering now moved into SQL. Yeah, I also did not think that this could ever conflict. But it did. Initializing the completeConversationRead
flag to true
got now in my way, this caused a conversation to be removed. Simply deleting all the code around that flag solved it.
2. Generation of missing conversation roots in SQL simply used the oldest (smallest) timestamp from any direct reply in the tree. To find the missing roots I grouped by subject and then aggregated using min(created_at)
. Now that I optimized this to only take unread messages into consideration in the first place, I do not necessarily see the smallest child anymore (when it's already read), so the timestamp is then moved forward to the next oldest unread reply. As I do not care too much about an accurate timestamp for something made up, I just adjusted my test case accordingly. Good enough for me. :-)
It's an interesting experiment with SQLite so far. I certainly did learn a few things along the way. Mission accomplished.