# I am the Watcher. I am your guide through this vast new twtiverse.
#
# Usage:
# https://watcher.sour.is/api/plain/users View list of users and latest twt date.
# https://watcher.sour.is/api/plain/twt View all twts.
# https://watcher.sour.is/api/plain/mentions?uri=:uri View all mentions for uri.
# https://watcher.sour.is/api/plain/conv/:hash View all twts for a conversation subject.
#
# Options:
# uri Filter to show a specific user's twts.
# offset Start index for query.
# limit Count of items to return (going back in time).
#
# twt range = 1 16
# self = https://watcher.sour.is/conv/nlcsjfa
I love simple, lightweight, small, minimal tools that just do the bare minimum. Based on that, does anyone have any good recommendations for a key value store that is:
- lightweight
- clustered
- sharded (so if I have 5 instances and 100 keys, each node will have roughly 20 keys on it).
- easy to join nodes: as in `kv-server --join somehost:1111`
For reference, I think Consul is too heavy (and not sharded I believe).
It would be great to have a small Go executable that I can run on 10 servers, all connected up, that exposes a Redis-like API. Simple GET, PUT and STREAM would be great.
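Something like this is the shape of API I'm imagining. Purely a sketch against a hypothetical server that speaks the Redis protocol (so PUT maps to SET, and STREAM could map to Redis Streams), using the go-redis client; the address and key names are placeholders:

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()

	// Hypothetical kv-server speaking the Redis protocol; address is a placeholder.
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// PUT -> SET
	if err := rdb.Set(ctx, "greeting", "hello", 0).Err(); err != nil {
		panic(err)
	}

	// GET -> GET
	val, err := rdb.Get(ctx, "greeting").Result()
	if err != nil {
		panic(err)
	}
	fmt.Println("greeting =", val)

	// STREAM -> XADD / XREAD, if the server implements Redis Streams.
	if err := rdb.XAdd(ctx, &redis.XAddArgs{
		Stream: "events",
		Values: map[string]interface{}{"type": "greeting-updated"},
	}).Err(); err != nil {
		panic(err)
	}
}
```

The point being that any existing Redis client or tooling would just work against it.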
@prologic anyone else here I can ping?
Thanks guys. Bitraft is awesome @prologic but yeah, not sharded :( I did try etcd before @abucci but I did find it trickier to set up than Bitraft. But again, it's not sharded :(
@prologic ever done any stress testing on Bitraft? In a cluster, do you know what the throughput would be? Like, PUTs per second and GETs per second?
@markwylde No but I could do some testing and publish the results 👌
As for the sharding though... Let's discuss this?
@prologic I'm happy to do it. Might try now actually. It was just in case you knew. I'll post in the README if I get it working. I'm hoping redis-benchmark will work since it's got the same API as Redis.
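If redis-benchmark turns out to be fussy about unsupported commands, a crude Go loop would give rough numbers too. A minimal sketch, assuming the node speaks the Redis protocol on localhost:6379 (placeholder address) and using the go-redis client:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()

	// Placeholder address; point it at one of the cluster nodes.
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	const n = 10000

	// Time n sequential SETs (PUTs).
	start := time.Now()
	for i := 0; i < n; i++ {
		if err := rdb.Set(ctx, fmt.Sprintf("bench:%d", i), "value", 0).Err(); err != nil {
			panic(err)
		}
	}
	fmt.Printf("SET: %.0f ops/sec\n", n/time.Since(start).Seconds())

	// Time n sequential GETs.
	start = time.Now()
	for i := 0; i < n; i++ {
		if _, err := rdb.Get(ctx, fmt.Sprintf("bench:%d", i)).Result(); err != nil {
			panic(err)
		}
	}
	fmt.Printf("GET: %.0f ops/sec\n", n/time.Since(start).Seconds())
}
```

It's sequential over a single connection, so it understates real throughput, but it's enough for a rough comparison between setups.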
I wonder if sharding could be implemented by:
Presumptions:
- redis can broadcast to all nodes in the cluster
- REPLICA_COUNT is 3
PUT workflow:
- a PUT gets forwarded to REPLICA_COUNT random nodes in the cluster
GET workflow:
- a broadcast is made to the cluster saying "I NEED A VALUE FOR KEY 'TEST'"
- all nodes that contain that value reply to the server
- the first response gets forwarded to the client
- the other responses are discarded
I'm sure there would be some edge cases, like syncing.
- What if one of the random nodes is full and therefore only REPLICA_COUNT-1 nodes receive the document?
- This could mean 2 nodes have the new value, but the 3rd has the old value
Maybe it could be solved by only committing once REPLICA_COUNT nodes successfully receive the message.
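To make that workflow concrete, here's a rough in-memory simulation of the idea (just my sketch, not how Bitraft works; the map-backed nodes, the NODE_COUNT of 10 and the REPLICA_COUNT of 3 are all stand-ins):

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"sync"
)

const ReplicaCount = 3

// node simulates one KV server in the cluster.
type node struct {
	mu   sync.RWMutex
	data map[string]string
}

func (n *node) put(key, value string) error {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.data[key] = value
	return nil
}

func (n *node) get(key string) (string, bool) {
	n.mu.RLock()
	defer n.mu.RUnlock()
	v, ok := n.data[key]
	return v, ok
}

type cluster struct {
	nodes []*node
}

// Put forwards the write to ReplicaCount random nodes and only reports
// success if every chosen replica acked (the "commit" rule above).
func (c *cluster) Put(key, value string) error {
	picks := rand.Perm(len(c.nodes))[:ReplicaCount]
	for _, i := range picks {
		if err := c.nodes[i].put(key, value); err != nil {
			return errors.New("put not committed: a replica failed")
		}
	}
	return nil
}

// Get broadcasts "I need a value for this key" to every node and returns
// the first reply; later replies are discarded.
func (c *cluster) Get(key string) (string, bool) {
	replies := make(chan string, len(c.nodes))
	var wg sync.WaitGroup
	for _, n := range c.nodes {
		wg.Add(1)
		go func(n *node) {
			defer wg.Done()
			if v, ok := n.get(key); ok {
				replies <- v
			}
		}(n)
	}
	go func() { wg.Wait(); close(replies) }()
	v, ok := <-replies // first responder wins; ok is false if nobody has the key
	return v, ok
}

func main() {
	c := &cluster{}
	for i := 0; i < 10; i++ { // NODE_COUNT = 10
		c.nodes = append(c.nodes, &node{data: map[string]string{}})
	}
	if err := c.Put("test", "hello"); err != nil {
		panic(err)
	}
	fmt.Println(c.Get("test"))
}
```

The first-responder-wins GET is also exactly where the stale-value edge case above would bite: a replica that missed an update could answer first.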
@markwylde If you could benchmark this that would be wonderful! 👌 -- Also, reading your thoughts on "Sharding", I _think_ you might be slightly confused, because what you just described is essentially "High Availability", not Sharding.
In fact Bitraft already has this anyway. It fully supports forming a High Availability Cluster.
But in Bitraft every node contains every key + value, right? I probably wasn't clear above, but in my idea REPLICA_COUNT would be 3 while NODE_COUNT may be 10. So a PUT would go to 3 of the 10 nodes.
And why not use redis then?
Or memcached with mcrouter.
@markwylde How about Riak? I've had good experience with that in a clustered setup.
I have a feeling you'd have better luck googling if you used "partitioning" rather than "sharding" as a keyword. In my experience (which may be overly limited, of course), "sharding" is a term used for relational databases and their cousins. In the KV world, I've seen the word "partitioning" used to mean what I think you want: each node stores a subset of the full set of key/value pairs.
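For illustration, a minimal sketch of that kind of partitioning (hashing the key to decide which nodes own it; the 10-node / 3-replica numbers are just the ones from earlier in the thread):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// nodesForKey picks replicaCount distinct nodes for a key by hashing the
// key to a "home" node and taking its successors on a ring of nodeCount nodes.
func nodesForKey(key string, nodeCount, replicaCount int) []int {
	h := fnv.New32a()
	h.Write([]byte(key))
	home := int(h.Sum32() % uint32(nodeCount))
	nodes := make([]int, 0, replicaCount)
	for i := 0; i < replicaCount; i++ {
		nodes = append(nodes, (home+i)%nodeCount)
	}
	return nodes
}

func main() {
	// 10 nodes, each key lives on 3 of them; every node ends up holding
	// roughly 3/10 of the data instead of a full copy.
	for _, key := range []string{"user:1", "user:2", "session:abc"} {
		fmt.Println(key, "->", nodesForKey(key, 10, 3))
	}
}
```

Real systems usually layer consistent hashing or virtual nodes on top so that adding or removing a node only moves a fraction of the keys.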