Feature Request: AKS + VMSS + Ephemeral OS Disk #1370
Comments
@jluk I can see this item is on the backlog, which is nice. Do you have an ETA? |
We would love to see this. Right now we cannot use AKS at all because of this. The disk performance for AKS vs VM for the same SKU is significantly different:
I do not understand how to do the following: Any help is appreciated. |
@mohamedmansour The Standard_F32s_v2 has the following limits: 64000/512 (IOPS / MB/s). But you can reach another limit first, based on your disk size. An HDD has a 500/60 limit, and Premium disk limits vary by disk size. |
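To make the "you can reach another limit first" point concrete, here is a minimal sketch that takes the lower of the VM cap and the disk cap for each metric. The figures are the ones quoted in the comment above; the `IoLimit` type and the `effective_limit` helper are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class IoLimit(NamedTuple):
    """A pair of caps: IOPS and throughput in MB/s."""
    iops: int
    mbps: int


def effective_limit(vm: IoLimit, disk: IoLimit) -> IoLimit:
    """Return the cap that applies in practice: the lower of VM and disk per metric."""
    return IoLimit(min(vm.iops, disk.iops), min(vm.mbps, disk.mbps))


# Figures quoted in the thread: Standard_F32s_v2 (64000/512) and an HDD (500/60).
vm_cap = IoLimit(iops=64000, mbps=512)
hdd_cap = IoLimit(iops=500, mbps=60)

print(effective_limit(vm_cap, hdd_cap))  # the disk cap wins on both metrics
```

With these numbers the disk, not the VM SKU, is the bottleneck, which is why changing the disk tier changes the observed throughput.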
@edernucci How can I make sure my pod doesn't have that limit, when doing:
It returns a
When I set up the VM with the same SKU, I told it to use Ephemeral instead of Premium and it was 4X faster. Do you happen to know how we can convert our AKS cluster to attach an Ephemeral Disk (in this case the Temporary disk)? The |
Hi @mohamedmansour. The disk IOPS is size based. You can verify sizes and performance here. Try changing the disk size to another tier to see if you can get more throughput. |
This is unfortunately a regression. I have filed a bug here: Azure/aks-engine#3227. We are using the exact same SKU for the VM and AKS. Our pipeline requires fast disks, and previously we took advantage of the Temporary Disks, which are pretty quick at around 400 MB/s. When trying out and migrating to AKS, the temporary disk is missing. I added a P30 Premium SSD, which gives us 1 TB at 200 MB/s, and we are getting just 70 MB/s. We are not running anything, just testing the provisioned hardware. |
@mohamedmansour the temporary ephemeral disk on AKS nodes is mounted in /mnt by default
AKS itself does not support mutating or remounting the disks of the underlying virtual machines (the VMs are managed by AKS), and ephemeral OS disks are not yet supported (shipping this year). To resize the worker nodes within AKS, you can use ARM to change the size of the OS disk up to the 2 TB maximum (however, that also has a hard IOPS quota limit). The size of the OS disk specifically constrains all disk IO: AKS and most other deployments use a 30 or 120 GB OS disk with a low IOPS allotment, leading to high disk IO latency that would also impact using the ephemeral disk in /mnt. Additional information is in #1373. Ephemeral OS disk support is only a partial remedy: using ephemeral for the operating system without isolating container and other unbounded (logging, metrics) IO from the IO path of the kernel and OS will still saturate the VM's IOPS. |
@jnoller It doesn't seem like a temporary disk though,
Any ideas why this is happening? |
@mohamedmansour refresh my comment, I was typing 🤣 - see issue #1373 |
@mohamedmansour Also check:
|
The basic difference in your testing is: 1) Bare VM: you access the VM directly. 2) AKS pod: you have not accessed the VM or VMSS directly but tested using a pod. If you log in directly to the node (kubectl get nodes -o wide, pick one IP and log in), you should be able to get performance similar to your bare VM. |
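For an apples-to-apples comparison, the same measurement can be run on the bare VM, on the node, and inside a pod. The following is a rough sketch of a sequential write test, not the tool used earlier in this thread; the `write_throughput_mb_s` name and its default sizes are assumptions, and the number it reports depends heavily on caching and block size.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory: str, total_mb: int = 64, block_mb: int = 4) -> float:
    """Write roughly total_mb of random data into `directory`, fsync it,
    and return the observed sequential write rate in MB/s."""
    block = os.urandom(block_mb * 1024 * 1024)
    blocks = max(1, total_mb // block_mb)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # force the data to the device before stopping the clock
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return (blocks * block_mb) / elapsed


# Point this at the path under test, for example "/mnt" on a node (per the thread)
# or a volume mount inside a pod. The system temp directory is used here as a stand-in.
rate = write_throughput_mb_s(tempfile.gettempdir(), total_mb=16)
print(f"{rate:.1f} MB/s")
```

Running it against the same path from each environment shows whether the gap comes from the disk itself or from where the test is run.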
any ETA (apart from this year) on this, please? |
August 17th release. Thanks for your patience. |
We were able to wrap it up for preview a bit sooner and since a lot of folks are keen to try it, we've released it :) Give it a shot: https://aka.ms/aks/ephemeral-os |
okay, it's a bit sooner, but how to really use it for anything? not a single automation solution supports sending custom headers to enable this feature. I have numerous things configured on the cluster. I can't just repoint my pipelines to a random cluster, the application just won't work -..-. is this supposed to GA in 2 weeks' time, or when can I expect this to start working properly through the REST API so automation solutions would be able to actually create such clusters? |
Could you be more specific? Are you using ARM templates, Terraform? We're aware of difficulties with custom headers and ARM templates, but I can't share a better answer yet.
There isn't an associated API change currently for ephemeral OS. At GA we would remove the headers/preview feature but the behavior would be identical with no added fields. We would enable it when possible by default, based on requested OS disk size and VM SKU. |
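As a rough illustration of defaulting "based on requested OS disk size and VM SKU", the sketch below assumes the rule is that the OS disk must fit in the SKU's local cache. That rule, the `pick_os_disk_type` helper, and the 200 GB cache figure are assumptions made for illustration, not something confirmed in this thread.

```python
def pick_os_disk_type(requested_os_disk_gb: int, sku_cache_gb: int) -> str:
    """Hypothetical defaulting rule: use an ephemeral OS disk when the requested
    OS disk fits in the SKU's local cache, otherwise fall back to a managed disk."""
    if requested_os_disk_gb <= sku_cache_gb:
        return "Ephemeral"
    return "Managed"


# 120 GB and 2 TB are OS disk sizes mentioned earlier in the thread;
# the 200 GB cache size is a made-up figure for a hypothetical SKU.
print(pick_os_disk_type(120, sku_cache_gb=200))
print(pick_os_disk_type(2048, sku_cache_gb=200))
```

Under this assumption, a large requested OS disk would silently stay on a managed disk, which is the kind of opaque defaulting raised later in the thread.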
We'll be adding a few options so you're able to try these previews that don't land on the API via ARM too. Stay tuned over the next few weeks. GA behavior will just use this as Ace mentioned. We're currently still defining the GA ETA based on the feedback from preview. Thanks for your patience. |
@palma21 presumably that'll be a |
These properties (with very rare exceptions) will not be part of the API, preview or stable; that's why this issue exists in the first place. We just want to give folks the opportunity to flight them before they change in the background, but they are not going to be provided as options/knobs, e.g. AKS Ubuntu 18.04 as base image or containerd as runtime. For this particular property, we're discussing, based on the current signal, whether we should consider it a knob or just a default. |
well, guess what, it's impossible to test this in the current implementation. same with 18.04 and containerd. I'm not against setting ephemeral OS Disk as default, but I don't really see a rationale behind not letting people use an API to tweak this. All the other APIs use preview API versions. Please, name me a single automation solution that supports sending custom headers to the ARM APIs. And no, Az CLI is not an automation solution. it's a thing to accomplish one-off tasks. |
we will, we have been working on it since your first comment. For some context, these features were designed to be used effectively as "one offs" just to validate, which is why the CLI header was considered, as they'd never be part of the API. But we understand that might not suit everyone, so we'll provide an option that allows other deployment mechanisms to leverage this feature on test environments. |
well, I validated, sample cluster gets created ;) now I'd like to test my cluster and my application somehow :) just to clarify, I can convert existing clusters just by recreating the node pools? but I won't be able to update existing pools, so I'd need to recreate them? |
@palma21 for what it's worth, we're seeing a large number of issues due to AKS not following the standard If the intention is to roll these out to users on an opt-in basis by default, isn't the recommended way to do that using the |
@tombuildsstuff I get the ask behind hashicorp/terraform-provider-azurerm#6793 and that's something we're looking at as mentioned. The issue you linked directly is a break in TF provider because you changed API versions (comment). |
hey, sorry for bugging, any updates? @alexeldeib :) |
We added it to the 2020-09-01 API based on feedback and the defaulting behavior being a bit opaque: Azure/azure-rest-api-specs#10598. It takes a bit of time for this to propagate everywhere (RP, SDK, CLI); we're working towards ~Ignite for the bunch. Appreciate your feedback + patience |
Correct -- not being able to change to/from ephemeral OS is unfortunately a limitation on the whole scale set, rather than per-instance. It's a compute limitation unrelated to AKS. In existing clusters you can add new pools with this feature. |
okay, I was only worried about having to recreate whole AKS for that, recreating node pools is fine. |
The switch to use proper API types is about done, the 2020-09-01 API is available and the updates are included in the 0.4.63 aks-preview CLI. Probably pending some doc updates and will close this out. |
docs are updated: https://docs.microsoft.com/en-us/azure/aks/cluster-configuration#ephemeral-os feel free to ping if you have any issues, but I'll close this. |
Would like to use an ephemeral OS disk, can't seem to change this flag after VMSS created
aks-engine
https://github.com/Azure/aks-engine/tree/master/examples/disks-ephemeral
https://github.com/Azure/aks-engine/blob/master/pkg/engine/virtualmachinescalesets.go#L678-L680
Azure/aks-engine#2145 (comment)
something like