# I am the Watcher. I am your guide through this vast new twtiverse.
#
# Usage:
# https://watcher.sour.is/api/plain/users View list of users and latest twt date.
# https://watcher.sour.is/api/plain/twt View all twts.
# https://watcher.sour.is/api/plain/mentions?uri=:uri View all mentions for uri.
# https://watcher.sour.is/api/plain/conv/:hash View all twts for a conversation subject.
#
# Options:
# uri Filter to show a specific user's twts.
# offset Start index for query.
# limit Count of items to return (going back in time).
#
# twt range = 1 16
# self = https://watcher.sour.is/conv/nlcsjfa
I love simple, lightweight, small, minimal tools that just do the bare minimum. Based on that, does anyone have any good recommendations for a key value store that is:
- lightweight
- clustered
- sharded (so if I have 5 instances and 100 keys, each node will have roughly 20 keys on it).
- easy to join nodes: as in `kv-server --join somehost:1111`
For reference, I think Consul is too heavy (and not sharded I believe).
It would be great to have a small Go executable that I can run on 10 servers, all connected up, that exposes a Redis-like API. Simple GET, PUT and STREAM would be great.
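Something like this is the shape of API I'm imagining. Purely a sketch against a hypothetical server that speaks the Redis protocol (so PUT maps to SET, and STREAM could map to Redis Streams), using the go-redis client; the address and key names are placeholders:

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()

	// Hypothetical kv-server speaking the Redis protocol; address is a placeholder.
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// PUT -> SET
	if err := rdb.Set(ctx, "greeting", "hello", 0).Err(); err != nil {
		panic(err)
	}

	// GET -> GET
	val, err := rdb.Get(ctx, "greeting").Result()
	if err != nil {
		panic(err)
	}
	fmt.Println("greeting =", val)

	// STREAM -> XADD / XREAD, if the server implements Redis Streams.
	if err := rdb.XAdd(ctx, &redis.XAddArgs{
		Stream: "events",
		Values: map[string]interface{}{"type": "greeting-updated"},
	}).Err(); err != nil {
		panic(err)
	}
}
```

The point being that any existing Redis client or tooling would just work against it.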
@prologic anyone else here I can ping?
Thanks guys. Bitraft is awesome @prologic but yeah, not sharded :( I did try etcd before @abucci but I did find it trickier to set up than Bitraft. But again, it's not sharded :(
@prologic ever done any stress testing on Bitraft? In a cluster, do you know what the throughput would be? Like, PUTs per second and GETs per second?
@markwylde No but I could do some testing and publish the results 👌
As for the sharding though... Let's discuss this?
@prologic I'm happy to do it. Might try now actually. It was just in case you knew. I'll post in the README if I get it working. I'm hoping redis-benchmark will work since it's got the same API as Redis.
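If redis-benchmark turns out to be fussy about unsupported commands, a crude Go loop would give rough numbers too. A minimal sketch, assuming the node speaks the Redis protocol on localhost:6379 (placeholder address) and using the go-redis client:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()

	// Placeholder address; point it at one of the cluster nodes.
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	const n = 10000

	// Time n sequential SETs (PUTs).
	start := time.Now()
	for i := 0; i < n; i++ {
		if err := rdb.Set(ctx, fmt.Sprintf("bench:%d", i), "value", 0).Err(); err != nil {
			panic(err)
		}
	}
	fmt.Printf("SET: %.0f ops/sec\n", n/time.Since(start).Seconds())

	// Time n sequential GETs.
	start = time.Now()
	for i := 0; i < n; i++ {
		if _, err := rdb.Get(ctx, fmt.Sprintf("bench:%d", i)).Result(); err != nil {
			panic(err)
		}
	}
	fmt.Printf("GET: %.0f ops/sec\n", n/time.Since(start).Seconds())
}
```

It's sequential over a single connection, so it understates real throughput, but it's enough for a rough comparison between setups.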
I wonder if sharding could be implemented by:
Presumptions:
- redis can broadcast to all nodes in the cluster
- REPLICA_COUNT is 3
PUT workflow:
- a PUT gets forwarded to REPLICA_COUNT random nodes in the cluster
GET workflow:
- a broadcast is made to the cluster saying "I NEED A VALUE FOR KEY 'TEST'"
- all nodes that contain that value reply to the server
- the first response gets forwarded to the client
- the other responses are discarded
I'm sure there would be some edge cases, like syncing.
- What if one of the random nodes is full and therefore only REPLICA_COUNT-1 nodes receive the document?
- This could mean 2 nodes have the new value, but the 3rd has the old value
Maybe it could be solved by only committing once REPLICA_COUNT nodes successfully receive the message.
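To make that workflow concrete, here's a rough in-memory simulation of the idea (just my sketch, not how Bitraft works; the map-backed nodes, the NODE_COUNT of 10 and the REPLICA_COUNT of 3 are all stand-ins):

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"sync"
)

const ReplicaCount = 3

// node simulates one KV server in the cluster.
type node struct {
	mu   sync.RWMutex
	data map[string]string
}

func (n *node) put(key, value string) error {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.data[key] = value
	return nil
}

func (n *node) get(key string) (string, bool) {
	n.mu.RLock()
	defer n.mu.RUnlock()
	v, ok := n.data[key]
	return v, ok
}

type cluster struct {
	nodes []*node
}

// Put forwards the write to ReplicaCount random nodes and only reports
// success if every chosen replica acked (the "commit" rule above).
func (c *cluster) Put(key, value string) error {
	picks := rand.Perm(len(c.nodes))[:ReplicaCount]
	for _, i := range picks {
		if err := c.nodes[i].put(key, value); err != nil {
			return errors.New("put not committed: a replica failed")
		}
	}
	return nil
}

// Get broadcasts "I need a value for this key" to every node and returns
// the first reply; later replies are discarded.
func (c *cluster) Get(key string) (string, bool) {
	replies := make(chan string, len(c.nodes))
	var wg sync.WaitGroup
	for _, n := range c.nodes {
		wg.Add(1)
		go func(n *node) {
			defer wg.Done()
			if v, ok := n.get(key); ok {
				replies <- v
			}
		}(n)
	}
	go func() { wg.Wait(); close(replies) }()
	v, ok := <-replies // first responder wins; ok is false if nobody has the key
	return v, ok
}

func main() {
	c := &cluster{}
	for i := 0; i < 10; i++ { // NODE_COUNT = 10
		c.nodes = append(c.nodes, &node{data: map[string]string{}})
	}
	if err := c.Put("test", "hello"); err != nil {
		panic(err)
	}
	fmt.Println(c.Get("test"))
}
```

The first-responder-wins GET is also exactly where the stale-value edge case above would bite: a replica that missed an update could answer first.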
@markwylde If you could benchmark this that would be wonderful! 👌 -- Also, reading your thoughts on "Sharding", I _think_ you might be slightly confused, because what you just described is essentially "High Availability", not Sharding.
In fact Bitraft already has this anyway. It fully supports forming a High Availability Cluster.
But in Bitraft every node contains every key + value, right? I probably wasn't clear above, but in my idea REPLICA_COUNT would be 3 while NODE_COUNT may be 10. So a PUT would go to 3 of the 10 nodes.
And why not use redis then?
Or memcached with mcrouter.
@markwylde How about Riak? I've had good experience with that in a clustered setup.
I have a feeling you'd have better luck googling if you used "partitioning" rather than "sharding" as a keyword. In my experience (which may be overly limited, of course), "sharding" is a term used for relational databases and their cousins. In the KV world, I've seen the word "partitioning" used to mean what I think you want: each node stores a subset of the full set of key/value pairs.
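For illustration, a minimal sketch of that kind of partitioning (hashing the key to decide which nodes own it; the 10-node / 3-replica numbers are just the ones from earlier in the thread):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// nodesForKey picks replicaCount distinct nodes for a key by hashing the
// key to a "home" node and taking its successors on a ring of nodeCount nodes.
func nodesForKey(key string, nodeCount, replicaCount int) []int {
	h := fnv.New32a()
	h.Write([]byte(key))
	home := int(h.Sum32() % uint32(nodeCount))
	nodes := make([]int, 0, replicaCount)
	for i := 0; i < replicaCount; i++ {
		nodes = append(nodes, (home+i)%nodeCount)
	}
	return nodes
}

func main() {
	// 10 nodes, each key lives on 3 of them; every node ends up holding
	// roughly 3/10 of the data instead of a full copy.
	for _, key := range []string{"user:1", "user:2", "session:abc"} {
		fmt.Println(key, "->", nodesForKey(key, 10, 3))
	}
}
```

Real systems usually layer consistent hashing or virtual nodes on top so that adding or removing a node only moves a fraction of the keys.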