Node Ephemeral Storage #930
Without wishing to "me too" too much, I'm also seeing this on Standard_D8s_v3 nodes (which are meant to have 64GiB of ephemeral storage). I've reached the same conclusion - that K8s ephemeral storage isn't on the right drive.
I have also just experienced the same with a Standard_D4s_v3; I can see /dev/sdb is mounted but unused.
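A quick way to confirm this on a node, sketched below under the assumption (per this thread) that the temp disk is /dev/sdb mounted at /mnt:

```bash
lsblk -o NAME,SIZE,MOUNTPOINT /dev/sdb   # the temp disk, typically mounted at /mnt on AKS
df -h /mnt /var/lib/kubelet              # compare the temp disk with the kubelet root dir
```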
I've recently been in touch with support regarding this issue.
I also have the same problem. I am running a Kubernetes cluster in Azure with Standard_D8_v3 nodes, which are supposed to have 200GB of temporary storage. However, the node capacity only shows ~30GB of ephemeral storage available for us to use. Also created this issue: Azure/aks-engine#2145
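The ~30GB figure is the node's reported ephemeral-storage capacity; a sketch of reading it directly, with the node name as a placeholder:

```bash
# Prints the ephemeral-storage capacity the kubelet reports for the node;
# on these VMs it tracks the OS disk, not the 200GB temp disk.
kubectl get node <node-name> \
  -o jsonpath="{.status.capacity['ephemeral-storage']}"
```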
AKS currently seems to use the first configuration, where the root dir contains both "ephemeral storage" and other files, so we end up with /var/lib/kubelet + /var/log together on the OS disk. The bit from the k8s docs on splitting the two explains the implications. I'm a bit curious about what happens to non-ephemeral things that normally live in /var/lib/kubelet. Right now, root-dir is just on the OS disk, which is why you see the mismatch (I think).
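To check which disk actually backs the kubelet root dir on a node, a small sketch (/var/lib/kubelet is the kubelet's default root-dir; everything else here is illustrative):

```bash
findmnt -T /var/lib/kubelet   # filesystem backing the kubelet root dir (the OS disk today)
findmnt /mnt                  # the temp disk mount, when one is attached
```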
I had a similar issue where I created a B2S node pool and each node had a P10 128GiB disk attached. I expected something more like a P2 8GiB disk, as the B2S VM size is meant to have 8GiB of temp storage. I found that by creating the node pool on the command line with […]
Action required from @Azure/aks-pm
Issue needing attention of @Azure/aks-leads
This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.
Has this issue been resolved, do we know?
Not yet. I'll likely fix this alongside some related changes in the next AKS API release, but it might take some time.
@alexeldeib Is the fix for this you refer to going to allow emptyDir volumes to utilize the /dev/sda temporary storage, which currently mounts to /mnt? That's the issue that brought me here, as that's something I could really use. If there's a workaround to enable that sort of behavior short of whatever other changes are being considered, that would be helpful to know.
I think you mean the other way around, no? /dev/sda -> OS disk. You can take a look at https://github.com/Azure/AgentBaker/pull/414/files#diff-1f2ff99eb3c00905af727ae08e89679744166bdfa52bb09826cf8ec250cdd3b1 to see how to make this work with bind mounts and systemd units. On a live node, you'd need to stop the kubelet service, copy /var/lib/kubelet to your desired mount point under /mnt, start the bind-mount service, and then restart kubelet.
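Spelled out as a sketch for anyone following along; it's loosely modeled on the linked AgentBaker change, and the target path under /mnt and the unit file are assumptions rather than the exact AKS implementation:

```bash
# Run on the node as root. Sketch only: relocate the kubelet root dir onto
# the temp disk via a bind mount, then persist it with a systemd mount unit.
systemctl stop kubelet

mkdir -p /mnt/aks/kubelet
cp -a /var/lib/kubelet/. /mnt/aks/kubelet/

# systemd derives mount unit names from the mount point:
# /var/lib/kubelet -> var-lib-kubelet.mount
cat <<'EOF' > /etc/systemd/system/var-lib-kubelet.mount
[Unit]
Description=Bind mount kubelet root dir onto the temp disk

[Mount]
What=/mnt/aks/kubelet
Where=/var/lib/kubelet
Type=none
Options=bind

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable --now var-lib-kubelet.mount
systemctl start kubelet
```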
Sorry - yes. Remembered it backwards. Perfect - exactly the breadcrumbs I was looking for. Thanks!
In our case, one of our containers is performing a fair bit of disk-based processing (because of the size of the working set), so having a fast temporary directory available to our containers is our primary concern. In the VM world we see a pretty significant performance improvement when we use the temp disk vs. the OS disk, so we were just hoping to achieve the same in AKS.
I'm not sure I follow. If you use kubeletDiskType: Temporary, everything should be on the temp disk: the container images, emptyDir directories, writable container layers, container logs, etc. I don't think shuffling things around further would get you any better perf given the existing disk layouts. Give it a try and let me know if that works for you?
You can also always mount the temp disk into any container directly as a host mount; it's just not as nice as emptyDir. There are some clever ways to make that experience smooth with e.g. multiple pods, but I think what AKS already has probably meets your needs (?)
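A minimal sketch of that hostPath variant, assuming the temp disk is mounted at /mnt on the node; the pod and directory names are placeholders:

```bash
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: tempdisk-demo
spec:
  containers:
  - name: worker
    image: busybox
    command: ["sh", "-c", "df -h /scratch && sleep 3600"]
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    hostPath:
      path: /mnt/scratch        # subdirectory on the temp disk
      type: DirectoryOrCreate   # create it on the host if missing
EOF
```

The usual hostPath caveats apply (the directory outlives the pod, and cleanup is on you), which is part of why it's not as nice as emptyDir.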
Thanks for the reply @alexeldeib. Like most users who manage AKS via the portal or azure-cli, I've stumbled into this issue looking into the standard docs here and the recommended way to add ephemeral storage here. I guess I might have missed it, but I don't see any mention of being able to achieve that with standard tools. So my understanding is that a user who attempts to use a node type that would benefit from temp storage would follow the docs, get an error, and understand they either have to set up a larger OS disk or give up and use a managed disk... So now I know that I can probably do that from the REST API, but shouldn't such a feature also be available via more standard methods such as azure-cli? Or am I missing something? Many thanks.
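For reference, a sketch of what the REST API route looks like; the resource names are placeholders, and the api-version is taken from the stable spec linked later in this thread:

```bash
# Create a node pool whose kubelet disk sits on the temp disk, via the ARM API.
az rest --method put \
  --url "https://management.azure.com/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.ContainerService/managedClusters/<cluster>/agentPools/temppool?api-version=2022-01-01" \
  --body '{
    "properties": {
      "count": 3,
      "vmSize": "Standard_D8s_v3",
      "mode": "User",
      "kubeletDiskType": "Temporary"
    }
  }'
```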
Hrm, yes, it should. I thought this was in the aks-preview CLI, but I'm not seeing it over there. Definitely a TODO.
@alexeldeib I tried to start using the KubeletDisk config in AKS node pools, but doing so gave me: Error: creating Node Pool: (Agent Pool Name "standard3" / Managed Cluster Name "aks-dev-we-aks1" / Resource Group "rg-dev-we-aks"): containerservice.AgentPoolsClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="PreviewFeatureNotRegistered" Message="Preview feature Microsoft.ContainerService/KubeletDisk not registered."
I'm trying to find any documentation pointing towards KubeletDisk still being a preview feature, but I'm unable to do so. In the new API version, https://github.com/Azure/azure-rest-api-specs/blob/12573bd124f45f2fe8f7cd95af6374ff812e8cce/specification/containerservice/resource-manager/Microsoft.ContainerService/stable/2022-01-01/managedClusters.json#L2647, I can't see any notice about this. The same applies to https://docs.microsoft.com/en-us/rest/api/aks/agent-pools/create-or-update#kubeletdisktype. Do you have any pointers? Also, when should we expect KubeletDisk to be out of preview?
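The error message itself names the flag to register; the standard az CLI pattern for preview features, using the exact feature name from the error, would be:

```bash
az feature register --namespace Microsoft.ContainerService --name KubeletDisk
# Wait until this reports "Registered":
az feature show --namespace Microsoft.ContainerService --name KubeletDisk \
  --query properties.state
# Then propagate the registration to the resource provider:
az provider register --namespace Microsoft.ContainerService
```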
This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.
This issue is not solved. Our workaround: we don't use this VM type. Actually, I wonder whether people really use this type and know that they would need hostPath to use the storage they pay for.
This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.
I really hope that you will fix this, as it's currently a disadvantage of AKS 🙏
This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.
I am pretty sure this bot is doing you guys a disservice - the issue is not stale, it's work in progress. It is a bit funny, though, that your work-in-progress times are longer than your "stale" thresholds.
This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.
I hope you can take some time to solve this, as it's been more than 3 years since the issue was opened. Please note that this issue really makes HPC applications on AKS less cost-efficient than the competition.
This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.
This issue will now be closed because it hasn't had any activity for 7 days after being marked stale. 01100010011001010110010101110000 Feel free to comment again within the next 7 days to reopen, or open a new issue after that time if you still have a question/issue or suggestion.
What happened:
I deployed an AKS cluster using VMs of size standard_e8s_v3, resulting in each node having an ephemeral storage capacity of ~32 Gi.

What you expected to happen:
Since this class of VM has 128GiB of temp disk, I would expect the ephemeral storage to be mounted on that scratch disk and have roughly that much capacity.

How to reproduce it (as minimally and precisely as possible):
I'm not certain how or where the ephemeral storage is mounted on a node or how much is allocated to it, but presumably just deploy an AKS cluster using a VM size with a temp disk larger than 32 GiB.

Anything else we need to know?:

Environment:
- Kubernetes version (kubectl version): 1.12.6
- aks-engine version: v0.30.1-aks