# I am the Watcher. I am your guide through this vast new twtiverse.
# 
# Usage:
#     https://watcher.sour.is/api/plain/users              View list of users and latest twt date.
#     https://watcher.sour.is/api/plain/twt                View all twts.
#     https://watcher.sour.is/api/plain/mentions?uri=:uri  View all mentions for uri.
#     https://watcher.sour.is/api/plain/conv/:hash         View all twts for a conversation subject.
# 
# Options:
#     uri     Filter to show a specific user's twts.
#     offset  Start index for query.
#     limit   Count of items to return (going back in time).
# 
# twt range = 1 11
# self = https://watcher.sour.is/conv/zrsvbza
VizierDB, a Data-Centric Notebook

Wow, this looks interesting. A nice departure from Jupyter. It resembles Polynote, superficially, but is funded by the US National Science Foundation instead of Netflix OSS the way Polynote is. Interested in taking it for a spin.

Random thoughts:

I have nothing against Jupyter or JupyterLab, and use them regularly. However, the promise of truly polyglot notebook tools like Polynote is so high. I've never done a non-trivial data analysis in a single language/tool. Inevitably, there's a great library for doing X in a language other than the one you started the analysis in, and you really want to do X without trying to rewrite it from the ground up. It's been common for me to bounce between two or more of Scala, Python, Sage, R, and KNIME in a single project.

I've been tinkering with Quarto, and while I like it a lot and the flexibility of its output formats is amazing, it's a bit stiff the way Jupyter is when it comes to using multiple languages in one project. It's also more tailored for publishing as opposed to being a notebook where you tinker. CoCalc is great and has amazing features, but it's expensive if you pay for it and I'm unsure whether their Docker container for self-hosting is going to survive forever. I do like Polynote, but I don't like that it looks to be supported largely by a corporation. So, the search goes on.
I upped the size limit for posts on my pod, per @prologic 's suggestion 😆
welp,
$ vizier
Checking for dependencies...
Setting up project library...
Starting Mimir...
Exception in thread "main" java.lang.ExceptionInInitializerError
...


...after the install process went smoothly and didn't throw any errors. Does not bode well.

It also created a directory in my home directory, which I hate 😠 Why don't people use $HOME/.local or some equivalent ffs
OK, turns out that was (yet again) a JVM version number problem. vizier depends on Apache Spark libraries that are cranky unless you use JDK 8 or JDK 11, apparently. It runs fine for me if I explicitly specify Java 1.8.
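
Part of what makes this kind of error confusing is that "1.8" and "8" name the same Java release: before Java 9, version strings were written as 1.N. A minimal sketch of a version check, assuming only what the post reports (that the Spark libraries behave on 8 or 11); the function names and the `SUPPORTED_MAJORS` set are hypothetical and not part of vizier:

```python
import re

# Assumption taken from the post above, not from vizier's documentation.
SUPPORTED_MAJORS = {8, 11}


def java_major_version(version: str) -> int:
    """Return the major release from a Java version string.

    '1.8.0_292' uses the legacy 1.N scheme, so the major release is 8.
    '11.0.2' uses the current scheme, so the major release is 11.
    """
    match = re.match(r"(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Java version string: {version!r}")
    first = int(match.group(1))
    second = match.group(2)
    # Legacy scheme: the real major release is the second component.
    if first == 1 and second is not None:
        return int(second)
    return first


def is_supported(version: str) -> bool:
    """True if the version falls in the (assumed) supported set."""
    return java_major_version(version) in SUPPORTED_MAJORS
```

A launcher script could run a check like this up front and print a readable message instead of letting the JVM die with an ExceptionInInitializerError.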

If I weren't so used to seeing errors like this, it'd be extremely off-putting and probably a showstopper. How would someone new to the JVM world know what to do with an error like that unless they got lucky with a Stack Overflow search? The JVM ecosystem really took a shit after 1.8, with all these bizarre incompatibilities and uninterpretable error messages. If the C ecosystem weren't worse I'd consider going full Scala Native and ditching the JVM.
My assessment so far is that vizier has a lot of potential but it feels early stage.

I started with a text file with 70,000 lines of tab-delimited data. Three columns are ints, and one column is a date string. The closest data format vizier had was CSV. However, it did not give any options for changing the delimiter. So, I preprocessed the data to make it a "standard" CSV with comma as the delimiter, and it imported fine. However, vizier did not autodetect the date, instead treating that column as a string. Strike 1.
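
The preprocessing step described above can be sketched with Python's standard `csv` module. This is only an illustration: the column order (date first, then three ints) and the `%Y-%m-%d` date format are assumptions, since the post does not say what the file actually looks like.

```python
import csv
import io
from datetime import datetime


def tsv_to_csv(tsv_text: str) -> str:
    """Rewrite tab-delimited text as comma-delimited CSV."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    # The csv writer adds quoting if a field happens to contain a comma.
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


def parse_row(row, date_format="%Y-%m-%d"):
    """Parse one row into (date, int, int, int).

    Hypothetical layout: the date string is the first column and the
    three integer columns follow it.
    """
    date = datetime.strptime(row[0], date_format).date()
    return (date, *[int(value) for value in row[1:4]])
```

Parsing the date explicitly like this is the step a tool would otherwise have to do by autodetection.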

Next, I tried to make a line plot out of the int columns. vizier stewed on that a bit, then told me that there were too many data points. 70,000 is a lot, so that's fair. But any other plotting tool I use regularly can handle this automagically, e.g. by downsampling (and giving you the ability to fine-tune what it does if you want). Strike 2.
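
For reference, the simplest form of downsampling is plain stride decimation: keep every k-th point so the result fits under a cap. The sketch below shows only that technique; it is not what vizier or any particular plotting tool does, and the `downsample` name and `max_points` parameter are hypothetical.

```python
def downsample(points, max_points):
    """Return at most max_points items by keeping every k-th element."""
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    n = len(points)
    if n <= max_points:
        return list(points)
    # Ceiling division, so the stride is large enough to stay under the cap.
    step = -(-n // max_points)
    return list(points[::step])
```

With 70,000 rows and a cap of 1,000 the stride comes out to 70. Stride decimation can skip narrow spikes, which is why real plotting tools often use min/max bucketing or similar instead.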

I added a cell to downsample the data, and tried the plot again. This time it worked fine. I don't see any obvious way to change the appearance or axes of the plot once it's made. There is a Download button next to the chart, but when I clicked it, nothing happened at first. Eventually, after I'd decided it probably failed, a PDF file was downloaded. Presumably my chart. However, the file was empty. Strike 3.

One thing that's really nice about vizier is that it keeps track of the dependencies between these steps, and if you alter anything in the dependency chain it will regenerate only what's needed to update your views. For instance, it knows that the chart is built from the downsampled data, and that the downsampled data comes from a data file. If I altered the file from within vizier, the downsampling and charting would re-run. If I altered the downsampling parameters, the chart would be regenerated. All of this is version controlled and can be rolled back. This feature solves a world of headaches in data analysis, and I'd love to see the rest of the tool come together well enough to make it usable on a daily basis. Not there yet though, for me.
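
The dependency-tracking behaviour described above can be illustrated with a toy dependency graph: each step caches its result, and changing a step marks only that step and everything downstream as stale. This is a sketch of the general idea under that assumption, not vizier's implementation, and it leaves out the versioning and rollback part. The `Cell` class and all of its methods are hypothetical.

```python
class Cell:
    """One step in a pipeline, recomputed only when it is stale."""

    def __init__(self, name, compute, inputs=()):
        self.name = name
        self.compute = compute
        self.inputs = list(inputs)
        self.dependents = []
        self.dirty = True   # nothing has been computed yet
        self.value = None
        self.runs = 0       # how many times compute() actually ran
        for cell in self.inputs:
            cell.dependents.append(self)

    def invalidate(self):
        """Mark this cell and everything downstream as stale."""
        self.dirty = True
        for cell in self.dependents:
            cell.invalidate()

    def set_compute(self, compute):
        """Change how this cell is computed, e.g. new parameters."""
        self.compute = compute
        self.invalidate()

    def get(self):
        """Return the cached value, recomputing only if stale."""
        if self.dirty:
            args = [cell.get() for cell in self.inputs]
            self.value = self.compute(*args)
            self.runs += 1
            self.dirty = False
        return self.value


# File -> downsample -> chart, mirroring the example in the post.
data = Cell("file", lambda: list(range(100)))
sampled = Cell("downsample", lambda rows: rows[::10], [data])
chart = Cell("chart", lambda rows: f"chart of {len(rows)} points", [sampled])

chart.get()                                   # first run computes everything
sampled.set_compute(lambda rows: rows[::20])  # change downsampling parameters
result = chart.get()                          # reruns downsample and chart only
```

After the second `chart.get()`, the file cell has run once while the downsample and chart cells have each run twice, which is the "regenerate only what's needed" behaviour.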
Hmm, it's also really sluggish. There's no reason it should be struggling with a 3.8 Mbyte 70,000-line data file on the machine I have. But every operation I'm performing takes tens of seconds to complete. I can see from the logs that the backend is multithreaded, so I don't know how to explain the sluggishness. Anyway, it's not usable like this.
Before anyone tries to pooh-pooh the JVM, I've written highly performant Scala code that can easily handle manipulating and searching 100-million-document corpora without breaking a sweat on 2010-era rack server hardware. The platform and language are not the performance problem here.
@abucci Hah 😅 Never, I know you're a fan of Scala 🤣 But cool little project 👌