Node Ephemeral Storage #930

Closed
01100010011001010110010101110000 opened this issue Apr 25, 2019 · 48 comments · Fixed by Azure/AgentBaker#569

@01100010011001010110010101110000

What happened:
I deployed an AKS cluster using VMs of size Standard_E8s_v3, resulting in each node having an ephemeral storage capacity of only ~30 GiB:

Name:               aks-nodepool1-47621401-0
Roles:              agent
CreationTimestamp:  Mon, 04 Mar 2019 13:46:53 -0600
Capacity:
 attachable-volumes-azure-disk:  16
 cpu:                            8
 ephemeral-storage:              30428648Ki
 hugepages-1Gi:                  0
 hugepages-2Mi:                  0
 memory:                         65970940Ki
 pods:                           20
Allocatable:
 attachable-volumes-azure-disk:  16
 cpu:                            7911m
 ephemeral-storage:              28043041951
 hugepages-1Gi:                  0
 hugepages-2Mi:                  0
 memory:                         60122876Ki
 pods:                           20

What you expected to happen:
Since this class of VM has 128 GiB of temp disk, I would expect the ephemeral storage to be mounted on that scratch disk and to have roughly that much capacity.

How to reproduce it (as minimally and precisely as possible):
I'm not certain where the ephemeral storage is mounted on a node or how much is allocated to it, but presumably just deploy an AKS cluster with a VM size whose temp disk is larger than 32 GiB and check what the node reports.
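
For example, a quick way to check what a node reports (the node name is the one from the output above):

# Compare the node's reported ephemeral-storage capacity with the VM's temp disk size.
kubectl describe node aks-nodepool1-47621401-0 | grep -i ephemeral-storage
# Or query the capacity and allocatable values directly:
kubectl get node aks-nodepool1-47621401-0 \
  -o jsonpath='{.status.capacity.ephemeral-storage}{"\n"}{.status.allocatable.ephemeral-storage}{"\n"}'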

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.12.6
  • Size of cluster (how many worker nodes are in the cluster?) 13
  • Others:
    • aks-engine version: v0.30.1-aks
@ball-hayden

Without wishing to "me too" too much, I'm also seeing this on Standard_D8s_v3 nodes (which are meant to have 64GiB of ephemeral storage).

I've reached the same conclusion - that K8s ephemeral storage isn't on the right drive.

@alexjmoore

I have also just experienced the same with a Standard_D4s_v3; I can see /dev/sdb is mounted but unused:

Filesystem      Size  Used Avail Use% Mounted on
udev            7.9G     0  7.9G   0% /dev
tmpfs           1.6G  2.5M  1.6G   1% /run
/dev/sda1        30G   25G  4.3G  86% /
tmpfs           7.9G     0  7.9G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           7.9G     0  7.9G   0% /sys/fs/cgroup
/dev/sdb1        32G   48M   30G   1% /mnt
tmpfs           1.6G     0  1.6G   0% /run/user/1001

@ball-hayden

I've recently been in touch with support regarding this issue.
The advice seems to be to increase the OS partition size (although I'm not convinced the support agent really understood the problem).

@ericsuhong

ericsuhong commented Oct 11, 2019

I also have the same problem.

I am running a Kubernetes cluster in Azure with Standard_D8_v3, which is supposed to have 200 GB of temporary storage.

However, I am only getting ~30 GB available to use:

capacity:
attachable-volumes-azure-disk: "8"
cpu: "4"
ephemeral-storage: 30428648Ki

Also created this issue: Azure/aks-engine#2145

@alexeldeib
Contributor

alexeldeib commented Jul 16, 2020

Looking at https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#configurations-for-local-ephemeral-storage

AKS currently seems to use the first configuration, where the root dir contains both "ephemeral storage" and other files. So we end up with /var/lib/kubelet + /var/log. This bit from the k8s docs explains the implications of splitting the two (emphasis mine):

You have a filesystem on the node that you're using for ephemeral data that comes from running Pods: logs, and emptyDir volumes. You can use this filesystem for other data (for example: system logs not related to Kubernetes); it can even be the root filesystem.

The kubelet also writes node-level container logs into the first filesystem, and treats these similarly to ephemeral local storage.

You also use a separate filesystem, backed by a different logical storage device. In this configuration, the directory where you tell
the kubelet to place container image layers and writeable layers is on this second filesystem.

The first filesystem does not hold any image layers or writeable layers.

Your node can have as many other filesystems, not used for Kubernetes, as you like.

Kubelet has --root-dir and --log-dir; after digging through the code, it seems root-dir maps to the desired cadvisor FsStats: https://github.com/kubernetes/kubernetes/blob/5ed7b1afb8958fe0d5ddd3660582add89ab9a372/pkg/kubelet/stats/cri_stats_provider.go#L122-L124 (and similar)

I'm a bit curious about what happens to non-ephemeral things that normally live in /var/lib/kubelet. Right now, root-dir is just on the OS disk, which is why you see the mismatch (I think).
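
A rough way to confirm this on a node (getting a shell on the node is assumed, e.g. via kubectl debug or SSH; the /mnt mount point matches the df output earlier in the thread):

# Which block devices back the kubelet root dir and the temp disk mount?
df -h /var/lib/kubelet /mnt
# Which --root-dir / --log-dir flags was the kubelet actually started with?
ps aux | grep '[k]ubelet' | tr ' ' '\n' | grep -E 'root-dir|log-dir'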

@jamesthurley

I had a similar issue where I created a B2S node pool and each node had a P10 128GiB disk attached. I expected something more like a P2 8GiB disk, as the B2S VM size is meant to have 8GiB temp storage.

I found that by creating the node pool on the command line with az aks nodepool add you can specify --node-osdisk-size to override this. In my case I did --node-osdisk-size 32 to create a 32GiB disk (the smallest it would allow).
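
For example (resource group, cluster, and pool names are placeholders):

# Add a node pool whose OS disk is 32 GiB instead of the default size.
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name smallpool \
  --node-vm-size Standard_B2s \
  --node-osdisk-size 32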

@ghost ghost added the action-required label Aug 16, 2020
@ghost

ghost commented Aug 21, 2020

Action required from @Azure/aks-pm

@ghost ghost added the Needs Attention 👋 label Aug 21, 2020
@ghost

ghost commented Sep 6, 2020

Issue needing attention of @Azure/aks-leads

@alexeldeib alexeldeib self-assigned this Sep 6, 2020
@ghost ghost removed the action-required and Needs Attention 👋 labels Sep 6, 2020
@ghost ghost added the action-required label Oct 1, 2020
@ghost ghost added the stale Stale issue label Nov 30, 2020
@ghost

ghost commented Nov 30, 2020

This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.

@alexjmoore

Has this issue been resolved do we know?

@ghost ghost removed the stale Stale issue label Nov 30, 2020
@alexeldeib
Contributor

Not yet. I'll likely fix this alongside some related changes in the next AKS API release, but it might take some time.

@ghost ghost removed the action-required label Dec 1, 2020
@ghost ghost added the action-required label Dec 27, 2020
@mprigge

mprigge commented Feb 1, 2021

@alexeldeib Is the fix you refer to going to allow emptyDir volumes to utilize the /dev/sda temporary storage which currently mounts to /mnt? That's the issue that brought me here, as that's something I could really use.

If there's a workaround to enable that sort of behavior short of whatever other changes are being considered, that would be helpful to know.

@alexeldeib
Contributor

I think you mean the other way around, no?

/dev/sda -> os disk
/dev/sdb mounted at /mnt -> temp disk

You can take a look at https://github.com/Azure/AgentBaker/pull/414/files#diff-1f2ff99eb3c00905af727ae08e89679744166bdfa52bb09826cf8ec250cdd3b1 to see how to make this work with bind mounts and systemd units. On a live node, you'd need to stop the kubelet service, copy /var/lib/kubelet to your desired mount point under /mnt, start the bind-mount service, and then restart kubelet.
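
A minimal sketch of those manual steps (paths are illustrative; the linked PR's systemd bind-mount unit is the proper, persistent version of this):

# Stop kubelet, move its state onto the temp disk, bind-mount it back, restart.
systemctl stop kubelet
mkdir -p /mnt/aks/kubelet
cp -a /var/lib/kubelet/. /mnt/aks/kubelet/
mount --bind /mnt/aks/kubelet /var/lib/kubelet   # or start the bind-mount systemd unit
systemctl start kubelet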

@ghost ghost removed the action-required label Feb 1, 2021
@mprigge

mprigge commented Feb 2, 2021

Sorry - yes. Remembered it backwards. Perfect - exactly the breadcrumbs I was looking for. Thanks!

@sgerace

sgerace commented Feb 3, 2022

In our case, one of our containers is performing a fair bit of disk-based processing (because of the size of the working set) so having a fast temporary directory available to our containers is our primary concern. In the VM world we see a pretty significant performance improvement when we use the temp disk vs. the OS disk, so we were just hoping to achieve the same in AKS.

@alexeldeib
Contributor

alexeldeib commented Feb 3, 2022

I'm not sure I follow. If you use kubeletDiskType: Temporary, everything should be on the temp disk: the container images, emptyDir directories, writable container layers, container logs, etc. I don't think shuffling things around further would get you any better perf given the existing disk layouts.

give it a try and let me know if that works for you?

@alexeldeib
Contributor

You can also always mount the temp disk into any container directly as a host mount; it's just not as nice as emptyDir. There are some clever ways to make that experience smooth with e.g. multiple pods, but I think what AKS already has probably meets your needs (?)

@RaananHadar

Thanks for the reply @alexeldeib

Like most users who manage AKS via the portal or azure-cli, I've stumbled into this issue after looking into the standard docs here and the recommended way to add ephemeral storage here. I guess I might have missed it, but I don't see any mention of being able to achieve this with standard tools. So my understanding is that a user who tries to use a node that would benefit from temp storage will follow the docs, get an error, and conclude they either have to set up a larger OS disk or give up and use a managed disk...

So now I know that I can probably do that from the REST API, but shouldn't such a feature also be available via more standard methods such as azure-cli? Or am I missing something?

Many thanks.

@alexeldeib
Contributor

shouldn't such a feature also be available via more standard methods such as azure-cli

Hrm, yes it should. I thought this was in the aks-preview CLI but I'm not seeing it over there. Definitely a TODO.
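
For what it's worth, later CLI versions surface this as a --kubelet-disk-type flag on az aks nodepool add (a sketch only, assuming a CLI/extension version that exposes it; resource names are placeholders):

# Create a node pool that keeps kubelet data, emptyDir volumes, and image layers on the temp disk.
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name tempdisk \
  --node-vm-size Standard_D8s_v3 \
  --kubelet-disk-type Temporary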

@NissesSenap

@alexeldeib I tried to start using the KubeletDisk config in AKS node pools, but doing so gave me:

│ Error: creating Node Pool: (Agent Pool Name "standard3" / Managed Cluster Name "aks-dev-we-aks1" / Resource Group "rg-dev-we-aks"): containerservice.AgentPoolsClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="PreviewFeatureNotRegistered" Message="Preview feature Microsoft.ContainerService/KubeletDisk not registered."

I'm trying to find any documentation indicating that KubeletDisk is still a preview feature, but I'm unable to do so.
In the old API spec that you shared we can see a note about it: https://github.com/Azure/azure-rest-api-specs/blob/5582a35deb1bfa4aa22bac8f1d51b7934ead94ac/specification/containerservice/resource-manager/Microsoft.ContainerService/stable/2021-02-01/managedClusters.json#L3856

but in the new version (https://github.com/Azure/azure-rest-api-specs/blob/12573bd124f45f2fe8f7cd95af6374ff812e8cce/specification/containerservice/resource-manager/Microsoft.ContainerService/stable/2022-01-01/managedClusters.json#L2647) I can't see any mention of it.

The same applies to: https://docs.microsoft.com/en-us/rest/api/aks/agent-pools/create-or-update#kubeletdisktype

Do you have any pointers? Also, when should we expect KubeletDisk to be out of preview?
I understand this last question is hard to answer.
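
For the PreviewFeatureNotRegistered error above, the usual pattern is to register the feature flag first (the flag name comes straight from the error message; whether it is open for self-service registration at any given time is an assumption):

# Register the preview feature, wait for it to show Registered, then refresh the provider.
az feature register --namespace Microsoft.ContainerService --name KubeletDisk
az feature show --namespace Microsoft.ContainerService --name KubeletDisk --query properties.state
az provider register --namespace Microsoft.ContainerService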

@ghost ghost added the action-required label Mar 19, 2022
@ghost ghost added the stale Stale issue label May 18, 2022
@ghost

ghost commented May 18, 2022

This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.

@schuhu

schuhu commented May 18, 2022

This issue is not solved. Our workaround: we don't use this VM type. Actually, I wonder whether people really use this type and know that they would need hostPath to use the storage they pay for.

@ghost ghost removed the stale Stale issue label May 18, 2022

@ghost ghost added the stale Stale issue label Jul 17, 2022
@ghost

ghost commented Jul 17, 2022

This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.

@RaananHadar

I really hope that you will fix this as it's currently a disadvantage of AKS 🙏

@ghost ghost removed the stale Stale issue label Jul 18, 2022
@ghost ghost added the stale Stale issue label Sep 16, 2022
@ghost

ghost commented Sep 16, 2022

This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.

@jpc

jpc commented Sep 16, 2022

I am pretty sure this bot is doing you guys a disservice: the issue is not stale, it's work in progress. It is a bit funny, though, that your work-in-progress times are longer than your "stale" thresholds.

@ghost ghost removed the stale Stale issue label Sep 16, 2022
@ghost ghost added the stale Stale issue label Nov 15, 2022
@ghost

ghost commented Nov 15, 2022

This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.

@fooo999992

I hope you can take some time to solve this, as it's been more than 3 years since the issue was opened. Please note that this issue really makes HPC applications on AKS less cost-efficient than the competition.

@ghost ghost removed the stale Stale issue label Nov 18, 2022
@ghost ghost added the stale Stale issue label Jan 17, 2023
@ghost

ghost commented Jan 17, 2023

This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.

@ghost ghost closed this as completed Jan 25, 2023
@ghost

ghost commented Jan 25, 2023

This issue will now be closed because it hasn't had any activity for 7 days after being marked stale. 01100010011001010110010101110000, feel free to comment again in the next 7 days to reopen, or open a new issue after that time if you still have a question/issue or suggestion.

@ghost ghost locked as resolved and limited conversation to collaborators Feb 24, 2023