
AzOps - Discovery Performance Issues #438

Closed · reckitt-maciejglowacki opened this issue Oct 6, 2021 · 24 comments · Fixed by #748

Labels: area/powershell · enhancement (New feature or request) · long-term (Long term item - used for automation)

@reckitt-maciejglowacki

The AzOps - Pull pipeline of the AzOps Accelerator, run in Azure DevOps, fails to gather information about all of the subscriptions and times out.

The SPN has been given privileges over the root management group, which contains about 250 subscriptions. The build times out after 4 hours, which is the limit I've set in the pipeline itself. I'm actually not sure whether it does anything for that long (or just hangs midway), because the log file is too large to browse effectively.

Here's a screenshot:
[screenshot: 2021-10-06_15h40_06]

Has anyone experienced anything like this? Is this tool designed to handle that many subscriptions? Or could it be a problem with the DevOps pool? Any help would be much appreciated.

@daltondhcp (Contributor)

Hey @reckitt-maciejglowacki,
We have customers running this with over 1,000 subscriptions, so it should definitely work.
Can you share how the settings below are configured in the settings.json file?
[screenshot of the relevant settings]

@daltondhcp added the waiting-for-response (Maintainers have replied and are awaiting a response from the bug/issue/feature creator) label · Oct 8, 2021
@reckitt-maciejglowacki (Author)

I'm using the defaults from https://github.com/Azure/AzOps-Accelerator/blob/main/settings.json

[screenshot: 2021-10-11_09h50_19]

The only thing that I have changed is timeoutInMinutes in the pipeline itself.
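
For reference, that timeout is set per job in the pipeline YAML. A minimal sketch, assuming a job layout like the AzOps-Accelerator pull pipeline (the job name and step are illustrative, not the actual pipeline definition):

```yaml
jobs:
  - job: pull
    # Per-job timeout in minutes; note that Microsoft-hosted agents cap the
    # effective run time regardless of the value configured here.
    timeoutInMinutes: 240
    steps:
      - script: echo "AzOps pull steps would run here"
```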

@daltondhcp (Contributor)

Thank you. For troubleshooting purposes, could you please try changing the Core.SkipResourceGroup setting to true and report back the results?
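
For readers following along, that change maps to the fragment of settings.json below; a minimal sketch assuming the Core block layout from the AzOps-Accelerator defaults linked above, with all other keys omitted:

```json
{
  "Core": {
    "SkipResourceGroup": true
  }
}
```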

@reckitt-maciejglowacki (Author)

That certainly helped :) The pipeline now runs in about 2 hours, but it still fails due to #439.

Are there any disadvantages to skipping RG discovery?

@daltondhcp (Contributor)

You would only want RG discovery if you intend to do RG-level deployments (like VMs or other resources) with AzOps, which I assume is not the intent here?

@reckitt-maciejglowacki (Author)

It's not, but we do want to be able to differentiate policy and role assignments between different resource groups.

@daltondhcp (Contributor)

Are you going to manage that from a central platform perspective via AzOps or let the individual LZ teams do it?

@reckitt-maciejglowacki (Author)

We're doing it centrally, I'm afraid.

@daltondhcp (Contributor)

Understood. Can you try changing the setting back to discover RGs, increase the pipeline timeout in ADO to 6 hours, and see if it completes successfully?

@reckitt-maciejglowacki (Author)

Okay. I'll do that today and let you know the results.

@reckitt-maciejglowacki (Author)

Same :(

[screenshot: 2021-10-19_08h58_59]

@daltondhcp added the area/powershell and enhancement (New feature or request) labels and removed the waiting-for-response (Maintainers have replied and are awaiting a response from the bug/issue/feature creator) label · Oct 21, 2021
@daltondhcp (Contributor)

Thank you for confirming this. We will take a look at this and see what we can do. Our advice would be to disable resource group discovery for now.

@reckitt-maciejglowacki (Author)

Hi @daltondhcp, just wanted to check: have you managed to look into this issue? Thanks.

@reckitt-maciejglowacki (Author)

@daltondhcp bump

@daltondhcp (Contributor)

Hey @reckitt-maciejglowacki - we are currently working on this; unfortunately, there is no short-term fix. I will make sure to keep the progress updated in this issue.

@reckitt-maciejglowacki (Author)

Got it. Thanks for the info.

@reckitt-maciejglowacki (Author)

Hi, any update on this?

@jtracey93 (Contributor)

Hey @reckitt-maciejglowacki,

Unfortunately, we are still investigating this, but as a workaround you could use self-hosted agents, which have an unlimited run time, as per: https://docs.microsoft.com/en-us/azure/devops/pipelines/process/phases?view=azure-devops&tabs=yaml#timeouts

Guidance on creating self-hosted agents can be found here: https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/agents?view=azure-devops&tabs=browser#install

Hope that helps move you forward in the near term 👍
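
For anyone applying this workaround, removing the cap on a self-hosted agent comes down to pointing the job at the self-hosted pool and setting the timeout to zero. A minimal sketch (the pool name and job layout are hypothetical):

```yaml
jobs:
  - job: pull
    pool:
      name: MySelfHostedPool   # hypothetical self-hosted agent pool name
    # On self-hosted agents, 0 removes the timeout entirely;
    # Microsoft-hosted agents remain capped regardless of this value.
    timeoutInMinutes: 0
    steps:
      - script: echo "AzOps pull steps would run here"
```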

@daltondhcp added the long-term (Long term item - used for automation) label · Jan 13, 2022
@Jefajers added this to the Release - v2.0.0 milestone · Apr 27, 2022
@SomilGanguly self-assigned this · Apr 28, 2022
@daltondhcp changed the title from "AzOps - Pull times out" to "AzOps - Discovery Performance Issues" · May 24, 2022
@daltondhcp mentioned this issue · Feb 17, 2023
@daltondhcp linked a pull request that will close this issue · Feb 17, 2023
@reckitt-maciejglowacki (Author)

Hi. Just wanted to let you know that this update definitely hasn't fixed anything. Quite the opposite.

I'm getting various random errors when trying to execute this in an ADO pipeline. Even when it does run uninterrupted (which seems completely random), it times out after an hour.

[screenshots of the errors]

@Jefajers (Member)

Hi @reckitt-maciejglowacki, thanks for updating this issue and sharing.

Your experience matches ours: a variety of different errors ultimately cause pipeline executions to fail.

We started seeing this as well once we released 2.0.0 into the wild, and determined that the majority of the different errors are due to the expanded use of parallel processing. When an execution machine is configured with a "high" throttle limit but has a "low" number of cores, the errors start to show up frequently.

Our response was to implement logic in the module that detects this misalignment and overrides the throttle limit when it is found. In addition, we created a wiki page on performance considerations.

As of release 2.0.2, the AzOps module includes improvements intended to resolve this behavior.
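
To illustrate the kind of misalignment described above, here is a hedged sketch of the idea (hypothetical values, not the AzOps module's actual code): with a throttle limit well above the core count, parallel runspaces oversubscribe the CPU, so the check clamps the limit to something the machine can sustain.

```powershell
# Illustrative sketch only - not AzOps internals.
$coreCount     = [Environment]::ProcessorCount
$throttleLimit = 20   # example of a "high" configured throttle limit

if ($throttleLimit -gt ($coreCount * 2)) {
    # Misalignment detected: override the configured throttle limit.
    Write-Warning "ThrottleLimit $throttleLimit is high for $coreCount cores; overriding."
    $throttleLimit = $coreCount * 2
}

# The effective limit then drives parallel discovery work (PowerShell 7+):
1..50 | ForEach-Object -ThrottleLimit $throttleLimit -Parallel {
    # ... per-item discovery work would go here ...
    Start-Sleep -Milliseconds 10
}
```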

Could you confirm whether you still have these issues on the latest release? (If yes, let's re-open the issue.)

@reckitt-maciejglowacki (Author)

Thank you @Jefajers. The latest update does seem to work. I haven't tried it at the resource level yet, but it runs well for subscriptions and resource groups.

@reckitt-maciejglowacki (Author)

Turns out my enthusiasm was premature...

[screenshot of the error]

@daltondhcp (Contributor)

Can you share the details of the errors? Same as before or something else?

@reckitt-maciejglowacki (Author)

The same, I think:

[screenshots of the errors]

They seem to appear above a certain number of objects, but I haven't drilled down into it yet.

We're using AZOPS_MODULE_VERSION 2.1.2 and a mostly default settings.json from the AzOps-Accelerator project, with "Core.SkipResourceGroup": false.
