# I am the Watcher. I am your guide through this vast new twtiverse.
#
# Usage:
# https://watcher.sour.is/api/plain/users View list of users and latest twt date.
# https://watcher.sour.is/api/plain/twt View all twts.
# https://watcher.sour.is/api/plain/mentions?uri=:uri View all mentions for uri.
# https://watcher.sour.is/api/plain/conv/:hash View all twts for a conversation subject.
#
# Options:
# uri Filter to show a specific user's twts.
# offset Start index for query.
# limit Count of items to return (going back in time).
#
# twt range = 1 13
# self = https://watcher.sour.is/conv/cjv32ca
Spent the better part of the day debugging sporadic network failures in a kubernetes cluster.
TIL:
- k8s uses lots of iptables magic under the hood.
- iptables has a mechanism to apply rules *based on probability* and that’s how k8s does load balancing (e.g., if you have a service that points to several pods): https://man.archlinux.org/man/iptables-extensions.8#statistic
- The root cause of our sporadic failures was stale iptables rules: some of them pointed to pods that no longer existed (but because probabilities are involved, they didn’t always trigger; see the sketch after this twt).
- This isn’t Sparta, this is madness. And probably a k8s bug.
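For anyone wondering how the probability trick adds up: below is a tiny Python simulation of a KUBE-SVC-style rule chain with one stale endpoint left in it. This is not kube-proxy's actual code, and the endpoint addresses and sample count are made up for illustration; it only mirrors the statistic-mode idea that rule i out of n matches with probability 1/(n-i).

```python
import random
from collections import Counter

# Hypothetical setup: three endpoints behind one Service; the last one
# stands in for a stale KUBE-SEP rule that still points at a dead pod.
endpoints = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.99:8080 (stale)"]

def pick_endpoint(endpoints):
    # One rule per endpoint, evaluated in order; rule i out of n matches
    # with probability 1/(n - i) and the last rule is an unconditional
    # jump, so each endpoint gets an overall 1/n share of new connections.
    n = len(endpoints)
    for i, ep in enumerate(endpoints):
        if i == n - 1 or random.random() < 1.0 / (n - i):
            return ep

counts = Counter(pick_endpoint(endpoints) for _ in range(30_000))
for ep, c in counts.most_common():
    print(f"{ep:25s} {c / 30_000:.1%}")
# Roughly a third of new connections land on the stale endpoint, which is
# why the failures only showed up some of the time.
```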
Well, the question is: What’s the root cause of the root cause? Why did those rules become stale? I’ll never know.
@movq it surely looks like a k8s bug. One would expect residual iptables rules to be flushed once pods are destroyed. If they are not, that's a bug. Sadly, I don't know much about k8s. Learning about it is on my TODO list.
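To make the "residual rules" point checkable, here is a rough diagnostic sketch: it cross-checks the DNAT targets of the node's nat table against the pod IPs the API server currently knows about, and flags leftovers as candidate stale rules. Assumptions (not from the thread): kube-proxy in iptables mode, a root shell on the node, a configured kubectl, and the 10.52.x.x address in the comment is invented.

```python
import json
import re
import subprocess

# Dump the nat table and pull out every DNAT target. kube-proxy endpoint
# rules look roughly like:
#   -A KUBE-SEP-XXXX -p tcp -m tcp -j DNAT --to-destination 10.52.1.23:8080
rules = subprocess.run(["iptables-save", "-t", "nat"],
                       capture_output=True, text=True, check=True).stdout
rule_targets = set(re.findall(r"--to-destination (\d+\.\d+\.\d+\.\d+)", rules))

# Collect the pod IPs that currently exist according to the API server.
pods = subprocess.run(
    ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
    capture_output=True, text=True, check=True).stdout
live_ips = {p["status"].get("podIP")
            for p in json.loads(pods)["items"]} - {None}

# Anything DNAT points at that no live pod owns is a candidate stale rule
# (other DNAT users, e.g. hostPorts, can also show up here, so treat this
# as a hint rather than proof).
for ip in sorted(rule_targets - live_ips):
    print("possibly stale DNAT target:", ip)
```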
@movq what cni plugin are you using? calico, cilium, etc.?
rules should flush when the masters get in sync. if you have any drift between the masters and/or latency/divergence in state, this can happen.
k8s is a nasty bit of kit. i do quite a bit of this at the dayjob
@david worth learning if you use it or are simply interested in distributed computing :-)
@mutefall I’ll have to check tomorrow. This is “managed kubernetes” at Google, so I don’t know if I know which plugin it is. :-) We have zero control over the master(s?).
@movq that's one of the main gripes i have with managed k8s. no true control of your masters.
@movq what'd you make of this? any progress?