Getting "Stale file handle" in kubernetes deployment #25
Comments
Thanks for your question. I'm not really sure if this is a bug in the image or just normal behavior of the NFS protocol. Neither of the RFCs for NFSv4 or NFSv3 goes into much detail about stale file handles, other than to say (in RFC 1813) that the error means the file handle given in the arguments is invalid, e.g. it refers to a file or directory that no longer exists.
I'm also not intimately familiar with Kubernetes, so you might have to help me a bit. But when the kubelet evicts the pod: do you get this error message after the server is back up again? Or during the outage? Both? My hunch is that, unfortunately, the only workaround will be to gracefully detect this error on your clients and re-mount the shares :/ But I'm not giving up yet!
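As a rough illustration of that client-side workaround, here is a minimal watchdog sketch in Python. It assumes the share is mounted at a hypothetical `/mnt/nfs` with a matching `/etc/fstab` entry, and that the script runs with enough privileges to unmount and remount:

```python
import errno
import os
import subprocess
import time

MOUNT_POINT = "/mnt/nfs"  # hypothetical mount point for the NFS share


def handle_is_stale(path: str) -> bool:
    """Return True if accessing the path fails with ESTALE (stale NFS file handle)."""
    try:
        os.stat(path)
        return False
    except OSError as exc:
        return exc.errno == errno.ESTALE


def remount(path: str) -> None:
    """Lazily unmount the share, then remount it (assumes an /etc/fstab entry exists)."""
    subprocess.run(["umount", "-l", path], check=False)
    subprocess.run(["mount", path], check=True)


if __name__ == "__main__":
    while True:
        if handle_is_stale(MOUNT_POINT):
            print(f"Stale file handle on {MOUNT_POINT}, remounting...")
            remount(MOUNT_POINT)
        time.sleep(30)
```

This could run as a sidecar or a small DaemonSet on the client nodes; it only papers over the problem rather than fixing the root cause.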
Thanks for the answer, @ehough! OK, here is how it works in Kubernetes: by default, the cluster is set up so that if the available free storage space on the node drops below a threshold (in my case, around 85% disk usage), the kubelet starts evicting PODs. Answering your question: yes, the POD (and the container as a consequence) transitions to a Failed PodPhase. This means that it will terminate. The POD is not restarted after the space is freed up. Actually, the kubelet keeps trying to restart it until the POD gets a chance to be in a Running, i.e. normal, state again, which will probably only happen once you free up space yourself. I received this error during the outage. When I free up the space and PODs are no longer evicted, I see all of them Running normally, but if I go into any of them and run a simple command on the mounted share, I get the Stale file handle error.
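For reference, the eviction state described above is visible through the Kubernetes API: evicted PODs end up in the Failed phase with reason Evicted. A small sketch using the official `kubernetes` Python client (the namespace name is just a placeholder):

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside a pod
v1 = client.CoreV1Api()

# List pods in the namespace where the NFS server and its clients run
# ("default" here is only an example).
for pod in v1.list_namespaced_pod("default").items:
    if pod.status.phase == "Failed" and pod.status.reason == "Evicted":
        print(f"{pod.metadata.name} was evicted: {pod.status.message}")
```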
I'd advise against using NFS inside a k8s cluster. When adding or removing nodes, the NFS server may get scheduled onto another node. When that happens, PODs that were using NFS will no longer work properly since the NFS server is gone, and they will get stuck in the "Terminating" state if you try to delete them. You can use NFS inside k8s if you never add/remove nodes or update the k8s version, or if you bind the NFS server to a specific node, but even then, when the NFS server gets restarted, all your PODs will behave abnormally. At that point your only choice will be to tear everything down and rebuild the deployments from scratch. This is my experience from a recent incident that caused one hour of downtime in my production environment... Right now I'm moving my NFS server to a dedicated node.
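A sketch of the node-pinning workaround mentioned above, using the `kubernetes` Python client; the Deployment name, namespace, and hostname value are placeholders, not anything this image ships with:

```python
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# Patch the NFS server Deployment so its pod is only scheduled onto one node.
# That node keeps the exported directory, so the server always comes back up there.
patch = {
    "spec": {
        "template": {
            "spec": {
                "nodeSelector": {"kubernetes.io/hostname": "storage-node-1"}
            }
        }
    }
}
apps.patch_namespaced_deployment(name="nfs-server", namespace="default", body=patch)
```

Pinning like this trades away scheduling flexibility: if that node goes down, the NFS server goes down with it.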
Hi @shinebayar-g, do you have any other idea, other than NFS, for sharing a volume between PODs?
I think the only option to achieve that kind of shared volume across nodes is a network filesystem like NFS.
Yeah, I researched this and came to the same conclusion. You can actually use the same PVC for two different PODs. If, by luck, they get scheduled onto the same node in the cluster and share the same PVC, then only one volume is created and shared. But if the scheduler sends each one to a different node, you end up with the same PVC but different volumes. Other than that, I was thinking about a storage cluster, etc., but I'm not sure whether that would actually solve sharing between different PODs in the cluster.
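One way to take the luck out of that co-scheduling is a pod affinity rule that forces the second POD onto the node that already runs the first. A rough sketch with the `kubernetes` Python client; the POD name, label, image, and PVC name are made up for illustration:

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Second pod: mounts the same PVC and requires scheduling onto the node that
# already runs a pod labelled app=shared-volume-app (i.e. the first pod).
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "consumer-2", "labels": {"app": "shared-volume-app"}},
    "spec": {
        "affinity": {
            "podAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": [
                    {
                        "labelSelector": {"matchLabels": {"app": "shared-volume-app"}},
                        "topologyKey": "kubernetes.io/hostname",
                    }
                ]
            }
        },
        "containers": [
            {
                "name": "app",
                "image": "busybox",
                "command": ["sleep", "3600"],
                "volumeMounts": [{"name": "shared", "mountPath": "/data"}],
            }
        ],
        "volumes": [
            {"name": "shared", "persistentVolumeClaim": {"claimName": "shared-pvc"}}
        ],
    },
}
v1.create_namespaced_pod(namespace="default", body=pod)
```

This only helps when both PODs can legitimately live on the same node; it doesn't solve true cross-node sharing.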
Hello, I used this image to deploy a single-pod NFSv4 service to share a volume among multiple k8s nodes, and it works well at present.
If we change the replicas of the k8s Deployment from 1 to 3, in other words, if multiple NFS server pods use the same exported directory to provide the service, will this bring any unexpected problems?
@cpu100 I think you're fine. Multiple replicas and traditional multiple network-mounted users are logically the same. Actually, NFS is the only storage provider that lets you scale up the replica count.
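For what it's worth, the scale-up itself is just a patch on the Deployment; a sketch with the `kubernetes` Python client (the Deployment name and namespace are placeholders):

```python
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# Bump the NFS server Deployment from 1 to 3 replicas; all replicas would
# then export the same underlying directory, as discussed above.
apps.patch_namespaced_deployment_scale(
    name="nfs-server",
    namespace="default",
    body={"spec": {"replicas": 3}},
)
```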
cephfs! It's more expensive but more stable.
Hi,
I've been using this image for a while in a Kubernetes deployment and it's working fine. I can create my NFS POD and connect to it from other PODs.
However, there is one specific situation that keeps happening. When the Kubernetes node that hosts the NFS server POD starts to run out of space, i.e. it reaches 85% or more disk usage, Kubernetes starts to put PODs into the evicted state. This is fine and is normal Kubernetes behaviour: Kubernetes evicts the PODs and keeps trying to reschedule new ones. Once I clean up files on the node to free more space, all the PODs become stable again and everything from then on is in the Running, i.e. normal, state. However, after that, the NFS share starts to return Stale File Handle errors in the shared folders that the PODs use to connect to the NFS POD.
Any idea why it is failing? What I understood from searching about this issue on the internet is that the stale NFS handle indicates that the client has a file open, but the server no longer recognizes the file handle (https://serverfault.com/questions/617610/stale-nfs-file-handle-after-reboot). Shouldn't the NFS server container itself recover from this situation? It's important to note here too that the IPs don't change in this case, even after the pod eviction, because they are backed by Kubernetes Services.