# I am the Watcher. I am your guide through this vast new twtiverse.
#
# Usage:
# https://watcher.sour.is/api/plain/users View list of users and latest twt date.
# https://watcher.sour.is/api/plain/twt View all twts.
# https://watcher.sour.is/api/plain/mentions?uri=:uri View all mentions for uri.
# https://watcher.sour.is/api/plain/conv/:hash View all twts for a conversation subject.
#
# Options:
# uri Filter to show a specific user's twts.
# offset Start index for query.
# limit Count of items to return (going back in time).
#
# twt range = 1 35
# self = https://watcher.sour.is/conv/zxvfzja
Google support just told me: "Sure, if your k8s pod does $thing, then that can corrupt the ext4 on attached volumes." I'm sure this is just a silly misunderstanding.
@prologic Yeah, we got corrupted disks. Basically, some pods died too "suddenly" or "abrupt" and that confused the shit out of k8s ... I don't claim to understand the details here (and they didn't share them, either).
I really hope it *is* just a misunderstanding. I mean, if some pod can just do the wrong thing and thus corrupt an ext4 disk, then ... dude, what the heck. :D
@movq Dude what the actual fuck 😳🤦‍♂️ That's like saying if Docker dies too suddenly it'll corrupt the file system
@prologic Got another mail: They literally compared a pod dying at the wrong time to removing a USB stick without unmounting it. They said I should talk to our devs to find out what that pod is doing.
Okay, seriously, am I misunderstanding something here? How can a quitting pod cause that? I mean, the processes in the containers of a pod are just ... processes. When they quit, then they quit, end of story. They don't automatically unmount anything -- that's the job of k8s, isn't it? The applications running in pods have nothing to do with this layer.
Sure, when a pod dies at the wrong time, it might corrupt data *inside* of a filesystem -- but not the filesystem *itself*. (We were getting "bad superblock" messages in dmesg and all that.) Maybe the support guys think that's what was happening ...
@movq Whaaaaat… O_o No offence, but there's often a reason that first level support works at first level support. I'm not helpful, I know.
I'm with @lyse on this. Level 1 support are morons, push back and escalate.
The Pod(s) are supposed to be managed by Google's GKE service, no? Or is that the Node(s)? 🤔
In any case it's the responsibility of the CSI driver to deal with mounting and unmounting the file system into the Pod's namespace somewhere.
@prologic @movq @lyse
they are right in one sense but wrong in their delivery.
if a pod had a pv attached on storage plane and the pod does something (could be anything) to corrupt the pv, the scheduler will continue to kill the pod since its pvc pointing to pv cannot be fulfilled.
there is no proper time for pod death. kubernetes will kill pods for whatever reason it deems necessary at any given time. its purpose is to ensure declared state is met at all times.
now all that being said ext4 corruption can and does happen on the underlying storage that supports your storage plane (ceph+rook, talos, nfs, iscsi, etc) but a pod cannot directly cause this.
if the csi driver/storage plane had some bug or takes a flaming shit sure it can corrupt the blob storage but not a pod.
basically it means they gave you the right answer to the wrong question.
if you need help im happy to discuss.
@retrocrash I think you said the same thing as me but you said it much better as you're way more experienced with k8s
Yeah, I’m beginning to think this support guy probably doesn’t understand the difference between “a corrupted filesystem” and “corrupted files on an intact filesystem”. That’s the only explanation.
I’m just too naive for this. 🤣 I always take replies from support people too literally and I always assume that they know what they’re doing. 🤣 I mean, the guy even said he talked to a team of experts, so …
@movq He talked to a team of experts?! 😳 Did they find evidence, do root cause analysis? Produce a repro? 🤔
@prologic I hope I’ll find out soon!
@movq From my limited experience in two companies I can tell you anecdotally that what we developers told our support work mates after analyzing things and what they replied back to the enquirers was not always the same. That also happened when we gave them answers in written form. Always super nice support folks, not a single doubt, but their basic technical knowledge was pretty much non-existent. And plenty of them didn't even really know the software they were supposed to support. Granted, those were not easy programs, one was indeed super complex. But if they use them on a daily basis for years one would expect that they know them quite well. At least the main features and workflows. We also often had to tell them basic stuff several times, which was quite frustrating for both sides.
But, I was super glad, that we had them in the front row. You wouldn't believe what crap queries they had to deal with and what utter bullshit they kept off our shoulders. Sometimes people wrote really offensive e-mails for no reason. Holy moly. I wouldn't want to trade with them, not in a hundred years. Lots of my developer work mates, however, didn't value our first level support at all. I mean, I totally understand, that after telling the same things over and over and over and over again it pisses you off, but treating them in a way they feel like shit, doesn't help either. It only makes things worse. I had the impression that there was a slight war between development and support.
One thing that was totally stupid is that the POs didn't listen to improvements and suggestions on how to make things easier for the support team and also all our users. I mean, support has to deal with this software all day long and also gets the same questions about workflows and stuff that's too complicated or unintuitive. So a lot of things were really low-hanging fruit to improve everybody's life. But when they suggested anything, the POs always declined it: nah, it's the support's job. Period. A few times I teamed up with the support work mates and told the POs the same things the support team had been suggesting, and then it was accepted without hesitation. So that clearly shows there really was a two-tier society.
In my current project we don't have a support team, so we need to handle all the support queries ourselves. In that regard I miss the old project. But luckily, it's basically just other developers who are needing our help, so that's fairly okay.
@lyse Yeah first level support guys and gals are undervalued really. The good ones are great and have awesome people skills. Still thick as bricks but they're there to quell the idiots at the front door
@prologic Hahahahaha, very nicely put, mate! :-D
@lyse Hm, yeah. I’m probably a bit spoiled.
(Aside from being too naive and too trusting.) In my current company, there is no traditional “first level support” that just talks to the customers and has basically no idea what they’re saying. Sure, there are different “tiers” and different sets of skills among the teams, but there are no “support monkeys”. When customers open tickets, they pretty much immediately get to tech-savvy people, who are actual devs/sysadmins (or at least worked as such in the past, as far as I know).
Probably quite unusual in this field. 🤔 But I wouldn’t really know, I’ve only seen three companies in the IT field and I’ve been with the current one for a good decade, so …
> I had the impression that there was a slight war between development and support. […] the POs didn't listen to improvements and suggestions on how to make things easier for the support team
Oof, that’s harsh. 😳
@movq Yeah, it's also a bit of a chicken-and-egg problem. If you have unqualified people, they can't do a lot of stuff but they have to do something, so then they're shunted off to support. And there they can't really improve because they're always overloaded. And not getting the respect they deserve also doesn't help their motivation, so the downward spiral continues. There's more to it, but in my opinion that's one key factor.