Azure Linux 3 VMs not provisioning correctly #4307

Open
1 of 3 tasks
ilyas1974 opened this issue Oct 18, 2024 · 5 comments
Assignees
Labels
Ops - Service Maintenance Used to track issues related to maintaining the services .NET Eng Supports

Comments

@ilyas1974
Contributor

ilyas1974 commented Oct 18, 2024

As reported in Teams, our customers are having issues standing up workloads in the Azure Linux 3 queue.

Error they are seeing

Console log: 'Microsoft.Extensions.Logging.Generators.Roslyn3.11.Tests' from job 67363cb9-a077-426f-b406-8ef5c7ec2ad6 (azurelinux.3.amd64.open.rt) using docker image mcr.microsoft.com/dotnet-buildtools/prereqs:centos-stream8-helix on a00002K running $HELIX_CORRELATION_PAYLOAD/scripts/57b07292d7f64d4591ad8e370868206b/execute.sh in /mnt/work/A5D80980/w/A67909B4/e max 900 seconds

Output: Unable to clean up running docker containers
Exit Code:-6

Release Note Category

  • Feature changes/additions
  • Bug fixes
  • Internal Infrastructure Improvements

Release Note Description

restarted generating the azurelinux.3.amd64 image. this will affect all azurelinux.3.amd64* queues. changes address deployment problems (due to a missing Python package) and add full Docker support.

@ilyas1974 ilyas1974 added the Ops - Service Maintenance Used to track issues related to maintaining the services .NET Eng Supports label Oct 18, 2024
@garath garath self-assigned this Nov 18, 2024
@garath garath removed their assignment Nov 26, 2024
@dougbu dougbu self-assigned this Nov 27, 2024
@dougbu
Member

dougbu commented Feb 21, 2025

fixed in !46122 but reverted in !47331 and reintroduced in !47334

@richlander
Member

Thanks! I'm looking forward to using these VMs.

@richlander
Member

Is there any testing we should do ahead of time with real pipelines?

@dougbu
Member

dougbu commented Feb 22, 2025

Is there any testing we should do ahead of time with real pipelines?

dotnet-helix-machines now tests simple Assert.True(true) xUnit scenarios using the Helix SDK. that includes a number of Docker containers w/ the azurelinux.3.amd64 host. I also plan to run somewhat more complicated arcade-validation scenarios on our staging environment today

going further w/ your real tests may help us get started on additional work we need to do (if any). from my perspective, starting early won't make a huge difference but it's up to you

let's chat offline if you want to test during the few days before our production rollout. using our staging environment requires a different token than most pipelines have available

@richlander
Member

We've had to roll this deployment back twice (and last time broke dotnet/runtime CI all-up). That points to needing a different strategy. Certainly, if it fails a third time, that suggests we're not adapting to signal.

This is very similar to dotnet/dotnet-buildtools-prereqs-docker#1369 (comment). If we merge that PR w/o a test run, then we are at the mercy of the devops gods. We've been successful doing test runs on new prereqs images by pushing them to ACR and creating a PR in dotnet/runtime that consumed those images. I don't have enough context to propose what an analog of that would be for our VMs queues.
