
Feature Request: AKS + VMSS + Ephemeral OS Disk #1370

Closed
djeeg opened this issue Jan 6, 2020 · 31 comments
Labels
feature-request Requested Features

Comments

@djeeg

djeeg commented Jan 6, 2020

I'd like to use an ephemeral OS disk, but I can't seem to change this flag after the VMSS is created:

az vmss update -g MC_removed -n aks-agentpool-removed-vmss \
    --set virtualMachineProfile.storageProfile.osDisk.diffDiskSettings.option=Local \
    --set virtualMachineProfile.priority=Spot \
    --set virtualMachineProfile.billingProfile.maxPrice=0.10 \
    --set virtualMachineProfile.evictionPolicy=Delete

Cannot change differencing disk settings
The specified priority value 'Spot' cannot be applied ... since no priority was specified while originally creating the Virtual Machine Scale Set

aks-engine
https://github.com/Azure/aks-engine/tree/master/examples/disks-ephemeral
https://github.com/Azure/aks-engine/blob/master/pkg/engine/virtualmachinescalesets.go#L678-L680
Azure/aks-engine#2145 (comment)

something like

az aks create --storageprofile Ephemeral
@edernucci

@jluk I can see this item on the backlog, which is nice. Do you have an ETA?

@mohamedmansour

We would love to see this. Right now we cannot use AKS at all because of this.

The disk performance for AKS vs VM for the same SKU is significantly different:

  • VM + Ephemeral (Standard_F32s_v2) = 400MB/s
  • AKS (Standard_F32s_v2) = 100MB/s

I do not understand how to do the following:
Azure/aks-engine#2145

Any help is appreciated.

@edernucci

> We would love to see this. Right now we cannot use AKS at all because of this. […] Any help is appreciated.

@mohamedmansour The Standard_F32s_v2 has the following limits: 64000 / 512 (IOPS / MB/s). But you can hit another limit first, based on your disk size: HDD has a 500 / 60 limit, and Premium disk limits vary by disk size.
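The "smaller cap wins" rule described above can be sketched as a quick calculation. The VM number comes from the comment above; the disk number is a hypothetical cap for a small Premium disk tier, purely for illustration:

```shell
# Effective throughput is bounded by BOTH the VM SKU cap and the disk-tier cap;
# whichever is smaller wins. Numbers are illustrative, not authoritative.
vm_cap_mbps=512    # Standard_F32s_v2 throughput cap (per the comment above)
disk_cap_mbps=100  # hypothetical cap for a small Premium disk tier

# Take the minimum of the two caps.
if [ "$vm_cap_mbps" -lt "$disk_cap_mbps" ]; then
  effective=$vm_cap_mbps
else
  effective=$disk_cap_mbps
fi

echo "effective throughput cap: ${effective} MB/s"
```

So even on a VM capable of 512 MB/s, a small managed-disk tier drags the observed rate down to its own limit, which matches the ~100 MB/s seen on the AKS nodes.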

@mohamedmansour

@edernucci How can I make sure my pod doesn't have that limit, when doing:

az vmss list

It returns a virtualMachineProfile.storageProfile with a managed disk:

 "managedDisk": {
       "storageAccountType": "Premium_LRS"
  },

When I set up a VM with the same SKU and told it to use Ephemeral instead of Premium, it was 4x faster.

Do you happen to know how we can convert our AKS cluster to attach an Ephemeral Disk? (in this case the Temporary disk) The Standard_F32s_v2 is 250GB but I am only seeing 100GB.

@edernucci

> @edernucci How can I make sure my pod doesn't have that limit […] The Standard_F32s_v2 is 250GB but I am only seeing 100GB.

Hi @mohamedmansour. Disk IOPS is size-based. You can verify sizes and performance in the Azure managed disk documentation.

Try changing the disk size to another tier to see if you can get more throughput.

@mohamedmansour

This is unfortunately a regression. I have filed a bug here Azure/aks-engine#3227

We are using the exact same SKU for the VM and AKS. Our pipeline requires fast disks, and previously we took advantage of the temporary disks, which are pretty quick at around 400 MB/s. When trying out and migrating to AKS, the temporary disk is missing.

I added a P30 Premium SSD, which gives us 1 TB at 200 MB/s, yet we are only getting 70 MB/s. We are not running anything; we are just testing the provisioned hardware.

@jnoller
Contributor

jnoller commented May 11, 2020

@mohamedmansour the temporary ephemeral disk on AKS nodes is mounted at /mnt by default:

Filesystem      Size  Used Avail Use% Mounted on
udev            3.9G     0  3.9G   0% /dev
tmpfs           796M   33M  763M   5% /run
/dev/sda1        29G  2.7G   27G  10% /
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/sda15      105M  3.6M  101M   4% /boot/efi
/dev/sdb1        16G   44M   15G   1% /mnt

AKS itself does not support mutating or remounting the disks of the underlying virtual machines (the VMs are managed by AKS), and ephemeral OS disks are not yet supported (shipping this year).

In order to resize the worker nodes within AKS, you can use ARM to change the size of the OS disk up to the 2 TB maximum (however, that also has a hard IOPS quota limit).

The size of the OS disk specifically will constrain all disk IO. AKS and most other deployments use a 30 or 120 GB OS disk, whose low IOPS allotment leads to high disk IO latency; that latency would also impact use of the ephemeral disk in /mnt. Additional information is in #1373.

Ephemeral OS disk support is only a partial remedy for this issue: using an ephemeral disk for the operating system without isolating container IO and other unbounded IO (logging, metrics) from the kernel and OS IO path will still result in IOPS saturation of the VM.

@mohamedmansour

@jnoller It doesn't seem like a temporary disk though:

  • Bare VM (400 MB/s)
root@test-vm:/mnt# dd if=/dev/zero of=/mnt/output conv=fdatasync bs=384k count=1k; rm -f /mnt/output
1024+0 records in
1024+0 records out
402653184 bytes (403 MB, 384 MiB) copied, 1.08742 s, 370 MB/s
  • AKS (70 MB/s)
root@devops-agent-98c94bd77-kz7cl:/mnt# dd if=/dev/zero of=/mnt/output conv=fdatasync bs=384k count=1k; rm -f /mnt/output
1024+0 records in
1024+0 records out
402653184 bytes (403 MB, 384 MiB) copied, 5.23303 s, 76.9 MB/s

Any ideas why this is happening?
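As a sanity check on the two dd runs above: the rate dd prints is just bytes copied divided by elapsed seconds. Recomputing both from the numbers in the output confirms the roughly 5x gap:

```shell
# Recompute dd's reported throughput: bytes / seconds / 1e6 = MB/s
# (decimal megabytes, as dd reports). Bytes and timings come from the output above.
bytes=402653184  # 1024 blocks of 384 KiB

vm_rate=$(awk -v b="$bytes" 'BEGIN { printf "%.0f", b / 1.08742 / 1000000 }')
aks_rate=$(awk -v b="$bytes" 'BEGIN { printf "%.1f", b / 5.23303 / 1000000 }')

echo "bare VM: ${vm_rate} MB/s"
echo "AKS pod: ${aks_rate} MB/s"
```

This reproduces the 370 MB/s and 76.9 MB/s figures dd printed, so the gap is real and not a quirk of dd's reporting.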

@jnoller
Contributor

jnoller commented May 11, 2020

@mohamedmansour refresh my comment, I was typing 🤣 - see issue #1373

@jnoller
Contributor

jnoller commented May 11, 2020

@mohamedmansour Also check:

  • Worker node IO scheduler - may be [none], try setting it to mq-deadline
  • Use ARM or the API to shut off write caching on the OS and all data disks (it is on by default)
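A minimal way to check the first suggestion on a worker node (device paths like sda are environment-dependent, and changing the scheduler needs root):

```shell
# Print the active IO scheduler for each visible block device.
# The scheduler currently in use is shown in [brackets], e.g. "[none] mq-deadline".
found=0
for f in /sys/block/*/queue/scheduler; do
  if [ -r "$f" ]; then
    found=1
    echo "$f: $(cat "$f")"
  fi
done
[ "$found" -eq 1 ] || echo "no block device schedulers visible (not a bare Linux host?)"

# To switch a device to mq-deadline (as root, and only if the kernel offers it):
#   echo mq-deadline > /sys/block/sda/queue/scheduler
```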

@rufusrising

> @jnoller It doesn't seem like a temporary disk though […] Any ideas why this is happening?

The basic difference in your testing is: 1) bare VM: you access the VM directly; 2) AKS pod: you tested from a pod rather than accessing the VM or VMSS directly. If you log in directly to the node (kubectl get nodes -o wide, pick an IP, and log in), you should be able to get performance similar to your bare VM.

@4c74356b41

any ETA (apart from this year) on this, please?

@palma21
Member

palma21 commented Jul 16, 2020

August 17th release. Thanks for your patience.

@palma21
Member

palma21 commented Aug 10, 2020

We were able to wrap it up for preview a bit sooner and since a lot of folks are keen to try it, we've released it :)
https://github.com/Azure/AKS/releases/tag/2020-08-03

Give it a shot: https://aka.ms/aks/ephemeral-os
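For CLI users, the preview announcement describes opting in via a custom header at creation time. A sketch of that shape, with hypothetical resource names; the header name is taken from the preview docs, and this requires the aks-preview CLI extension and a real Azure subscription, so treat it as illustrative only:

```shell
# Illustrative only: resource names are hypothetical, and this custom-header
# preview mechanism was later superseded by a proper API field.
az aks create \
  --resource-group myResourceGroup \
  --name myEphemeralCluster \
  --node-vm-size Standard_DS3_v2 \
  --node-osdisk-size 48 \
  --aks-custom-headers EnableEphemeralOSDisk=true
```

Note the explicit OS disk size: ephemeral OS disks must fit within the VM's cache or temp storage, so small sizes are typical.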

@4c74356b41

okay, it's a bit sooner, but how do I really use it for anything? Not a single automation solution supports sending custom headers to enable this feature. I have numerous things configured on the cluster; I can't just repoint my pipelines to a random cluster, the application just won't work -..-. Is this supposed to GA in two weeks, or when can I expect this to start working properly through the REST API, so that automation solutions would be able to actually create such clusters?

@alexeldeib
Contributor

> not a single automation solution

Could you be more specific? Are you using ARM templates, Terraform? We're aware of difficulties with custom headers and ARM templates, but I can't share a better answer yet.

> when can I expect this to start working properly through REST API

There isn't an associated API change currently for ephemeral OS. At GA we would remove the headers/preview feature but the behavior would be identical with no added fields. We would enable it when possible by default, based on requested OS disk size and VM SKU.

@palma21
Member

palma21 commented Aug 10, 2020

We'll be adding a few additional options so you're able to try these previews that don't land on the API via ARM too. Stay tuned over the next few weeks.

GA behavior will just use this as Ace mentioned. We're currently still defining the GA ETA based on the feedback from preview.

Thanks for your patience.

@tombuildsstuff

@palma21 presumably that'll be a -preview API, to match the convention set by the ARM APIs?

@palma21
Member

palma21 commented Aug 11, 2020

These properties (with very rare exceptions) will not be part of the API, preview or stable; that's why this issue exists in the first place. We just want to give folks the opportunity to flight these changes before they roll out in the background, but they are not going to be provided as options/knobs.

E.g. AKS Ubuntu 18.04 as the base image, or containerd as the runtime.
We'll create a better way for non-CLI users to test these ^

For this particular property we're discussing based on the current signal if we should consider it as a knob or just a default.

@4c74356b41

4c74356b41 commented Aug 12, 2020

well, guess what, it's impossible to test this with the current implementation. Same with 18.04 and containerd. I'm not against setting ephemeral OS disk as the default, but I don't really see the rationale behind not letting people use an API to tweak this. All the other APIs use preview API versions. Please, name me a single automation solution that supports sending custom headers to the ARM APIs. And no, az CLI is not an automation solution; it's a thing to accomplish one-off tasks.

@palma21
Member

palma21 commented Aug 12, 2020

We will; we have been working on it since your first comment.
#1370 (comment)

For some context, these features were designed to be used effectively as "one-offs" just to validate, which is why the CLI header was considered; they'd never be part of the API. But we understand that might not suit everyone, so we'll provide an option that allows other deployment mechanisms to leverage this feature in test environments.

@4c74356b41

well, I validated, sample cluster gets created ;) now I'd like to test my cluster and my application somehow :) Just to clarify: I can convert existing clusters just by recreating the node pools? But I won't be able to update existing pools, so I'd need to recreate them?

@tombuildsstuff

@palma21 for what it's worth, we're seeing a large number of issues due to AKS not following the standard -preview pattern, particularly when breaking behavioural changes are rolled out to an existing API version (today's example) rather than in a new API version.

If the intention is to roll these out to users on an opt-in basis by default, isn't the recommended way to do that using the features functionality for a Resource Provider?

@alexeldeib
Contributor

@tombuildsstuff I get the ask behind hashicorp/terraform-provider-azurerm#6793 and that's something we're looking at as mentioned. The issue you linked directly is a break in TF provider because you changed API versions (comment).

@4c74356b41

hey, sorry for bugging, any updates? @alexeldeib :)

@alexeldeib
Contributor

We added it to the 2020-09-01 API based on feedback and the defaulting behavior being a bit opaque: Azure/azure-rest-api-specs#10598

It takes a bit of time for this to propagate everywhere (RP, SDK, CLI), we're working towards ~Ignite for the bunch. Appreciate your feedback + patience

@alexeldeib
Contributor

alexeldeib commented Aug 28, 2020

> just to clarify, I can convert existing clusters just by recreating the node pools? but I wont be able to update existing pools, so I'd need to recreate them?

Correct -- not being able to change to/from ephemeral OS is unfortunately a limitation on the whole scale set, rather than per-instance. It's a compute limitation unrelated to AKS. In existing clusters you can add new pools with this feature.
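Under that constraint, migrating an existing cluster means adding a new pool with the feature enabled, then draining and deleting the old one. A rough sketch with the preview-era custom-header opt-in (all resource and pool names are hypothetical, and the header mechanism is the one discussed earlier in this thread):

```shell
# Sketch only: names are hypothetical; requires the aks-preview CLI extension
# and an Azure subscription. The custom header shown is the preview-era opt-in.
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myCluster \
  --name ephpool \
  --node-osdisk-size 48 \
  --aks-custom-headers EnableEphemeralOSDisk=true

# Then move workloads off the old pool and remove it:
#   kubectl cordon -l agentpool=oldpool
#   kubectl drain -l agentpool=oldpool --ignore-daemonsets --delete-local-data
#   az aks nodepool delete -g myResourceGroup --cluster-name myCluster -n oldpool
```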

@4c74356b41

okay, I was only worried about having to recreate whole AKS for that, recreating node pools is fine.

@alexeldeib
Contributor

The switch to proper API types is about done: the 2020-09-01 API is available, and the updates are included in the 0.4.63 aks-preview CLI. Probably pending some doc updates, then I will close this out.

@alexeldeib
Contributor

docs are updated: https://docs.microsoft.com/en-us/azure/aks/cluster-configuration#ephemeral-os

feel free to ping if you have any issues, but I'll close this.
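Per the linked docs, once the 2020-09-01 API and the 0.4.63 aks-preview CLI are in place, the opt-in is a real flag, `--node-osdisk-type`. A minimal example with hypothetical names (the OS disk must fit within the VM's cache or temp storage, hence the explicit small size):

```shell
# Requires the Azure CLI and a subscription; resource names are hypothetical.
az aks create \
  --resource-group myResourceGroup \
  --name myCluster \
  --node-vm-size Standard_DS3_v2 \
  --node-osdisk-type Ephemeral \
  --node-osdisk-size 48

# Or, per the migration discussion above, add an ephemeral-OS node pool
# to an existing cluster instead of recreating it:
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myCluster \
  --name ephpool \
  --node-osdisk-type Ephemeral \
  --node-osdisk-size 48
```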

@ghost ghost locked as resolved and limited conversation to collaborators Jan 1, 2021
@aritraghosh aritraghosh moved this to Archive (GA older than 1 month) in Azure Kubernetes Service Roadmap (Public) Jul 10, 2024