Scanning large environments exit with: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full. #249

Closed
cmendible opened this issue Aug 20, 2024 · 15 comments

@cmendible
Member

Expected Behavior

Scanning large environments should work without issues

Actual Behavior

Scanning large environments exits with the following exception: FTL Failed to get diagnostic settings error="Post \"https://management.azure.com/batch?api-version=2020-06-01\": dial tcp [2603:1030:a0c::10]:443: bind: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full."

@cmendible
Member Author

As mentioned in #248:

For large environments, azqr attempts too many concurrent requests when querying diagnostic settings, and the scan fails.

If you hit this issue, please use the -s flag to set a subscription ID and, if needed, the -g flag to specify a resource group, in order to reduce the number of scanned services.
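
For readers who want to see the general idea, here is a minimal Go sketch of capping the number of requests in flight with a counting semaphore. It is not azqr's code: `forEachLimited`, the resource IDs, and the limit of 2 are all hypothetical names and values chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// forEachLimited calls fn once per item, running at most limit calls at a time.
// It returns the first error reported by any call (nil if none failed).
// Hypothetical helper for illustration; not part of azqr.
func forEachLimited(items []string, limit int, fn func(string) error) error {
	if limit < 1 {
		limit = 1
	}
	sem := make(chan struct{}, limit) // counting semaphore
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	for _, item := range items {
		sem <- struct{}{} // blocks while limit calls are already in flight
		wg.Add(1)
		go func(it string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this call ends
			if err := fn(it); err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
			}
		}(item)
	}
	wg.Wait()
	return firstErr
}

func main() {
	// Made-up resource IDs; a real scan would pass Azure resource IDs.
	ids := []string{"res-1", "res-2", "res-3", "res-4", "res-5"}
	err := forEachLimited(ids, 2, func(id string) error {
		fmt.Println("querying diagnostic settings for", id)
		return nil
	})
	fmt.Println("done, err =", err)
}
```

With a cap like this, the number of sockets opened at once stays bounded no matter how many resources are scanned, which is the kind of pressure the "lacked sufficient buffer space" error points to.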

cmendible added a commit that referenced this issue Aug 20, 2024
@cmendible
Member Author

@red-erik can you check whether preview version v.2.0.0-preview.5 works for you?

Thanks!

@red-erik

Hello,
scanning the whole env I receive this:

fatal error: all goroutines are asleep - deadlock!

goroutine 1 [chan send, 2 minutes]:
github.com/Azure/azqr/internal/scanners.(*DiagnosticSettingsScanner).ListResourcesWithDiagnosticSettings(0xc0000088b8, {0xc03271c000, 0x42b93, 0x48400})
D:/a/azqr/azqr/internal/scanners/diagnostics_settings.go:57 +0x111
github.com/Azure/azqr/internal/scanners.(*DiagnosticSettingsScanner).Scan(0xc03254e000?, {0xc03271c000?, 0xc00014e050?, 0x22b1c00?})
D:/a/azqr/azqr/internal/scanners/diagnostics_settings.go:158 +0x18
github.com/Azure/azqr/internal.Scanner.Scan({}, 0xc02f77d808)
D:/a/azqr/azqr/internal/scanner.go:142 +0x11fd
github.com/Azure/azqr/cmd/azqr.scan(0x2e40920, {0xc000144008, 0x40, 0x40})
D:/a/azqr/azqr/cmd/azqr/scan.go:76 +0x525
github.com/Azure/azqr/cmd/azqr.init.func55(0x2e40920, {0xc000110140?, 0x4?, 0x2039c96?})
D:/a/azqr/azqr/cmd/azqr/scan.go:39 +0x2b
github.com/spf13/cobra.(*Command).execute(0x2e40920, {0xc000110120, 0x2, 0x2})
C:/Users/runneradmin/go/pkg/mod/github.com/spf13/[email protected]/command.go:989 +0xab1
github.com/spf13/cobra.(*Command).ExecuteC(0x2e3f7e0)
C:/Users/runneradmin/go/pkg/mod/github.com/spf13/[email protected]/command.go:1117 +0x3ff
github.com/spf13/cobra.(*Command).Execute(...)
C:/Users/runneradmin/go/pkg/mod/github.com/spf13/[email protected]/command.go:1041
github.com/Azure/azqr/cmd/azqr.Execute()
D:/a/azqr/azqr/cmd/azqr/root.go:36 +0x428
main.main()
D:/a/azqr/azqr/cmd/main.go:11 +0xf

    Will check with a single sub

Regards,
Red.

@cmendible
Member Author

You can also try running with the flag --azqr=false. This scans using only the APRL rules and should run without issues.

@red-erik

red-erik commented Aug 21, 2024

With a single sub I have this:

2024-08-21T09:13:13+02:00 INF Scanning subscriptions for Resource Count per Subscription and Type
2024-08-21T09:13:13+02:00 INF Generating Report: azqr_action_plan_2024_08_21_T091134.xlsx
2024-08-21T09:13:13+02:00 INF Skipping ImpactedResources. No data to render
2024-08-21T09:13:13+02:00 INF Skipping ResourceTypes. No data to render
2024-08-21T09:13:13+02:00 INF Skipping Services. No data to render
2024-08-21T09:13:13+02:00 INF Skipping Advisor. No data to render
2024-08-21T09:13:13+02:00 INF Skipping Defender. No data to render
2024-08-21T09:13:13+02:00 INF Skipping Costs. No data to render
2024-08-21T09:13:14+02:00 INF Scan completed.

Only one Excel file was produced: no pbit, no subfolder, etc.

Checking a little bigger one

@red-erik

fatal error: all goroutines are asleep - deadlock!

goroutine 1 [chan send]:
github.com/Azure/azqr/internal/scanners.(*DiagnosticSettingsScanner).ListResourcesWithDiagnosticSettings(0xc0000083f0, {0xc0043b4000, 0x2225, 0x2c00})
D:/a/azqr/azqr/internal/scanners/diagnostics_settings.go:57 +0x111
github.com/Azure/azqr/internal/scanners.(*DiagnosticSettingsScanner).Scan(0xc00368a000?, {0xc0043b4000?, 0xc0000920a0?, 0x22b1c00?})
D:/a/azqr/azqr/internal/scanners/diagnostics_settings.go:158 +0x18
github.com/Azure/azqr/internal.Scanner.Scan({}, 0xc000053808)
D:/a/azqr/azqr/internal/scanner.go:142 +0x11fd
github.com/Azure/azqr/cmd/azqr.scan(0x2e40920, {0xc0000ae008, 0x40, 0x40})
D:/a/azqr/azqr/cmd/azqr/scan.go:76 +0x525
github.com/Azure/azqr/cmd/azqr.init.func55(0x2e40920, {0xc0000581c0?, 0x4?, 0x2039c96?})
D:/a/azqr/azqr/cmd/azqr/scan.go:39 +0x2b
github.com/spf13/cobra.(*Command).execute(0x2e40920, {0xc000058180, 0x4, 0x4})
C:/Users/runneradmin/go/pkg/mod/github.com/spf13/[email protected]/command.go:989 +0xab1
github.com/spf13/cobra.(*Command).ExecuteC(0x2e3f7e0)
C:/Users/runneradmin/go/pkg/mod/github.com/spf13/[email protected]/command.go:1117 +0x3ff
github.com/spf13/cobra.(*Command).Execute(...)
C:/Users/runneradmin/go/pkg/mod/github.com/spf13/[email protected]/command.go:1041
github.com/Azure/azqr/cmd/azqr.Execute()
D:/a/azqr/azqr/cmd/azqr/root.go:36 +0x428
main.main()
D:/a/azqr/azqr/cmd/main.go:11 +0xf
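
A note on reading the trace above: `goroutine 1 [chan send]` means the main goroutine is blocked sending on a channel, and the runtime aborts because no other goroutine is left that could ever receive. The azqr source is not shown in this thread, so the following Go sketch is only a generic illustration of a producer/worker layout that avoids that state. `processAll` and its parameters are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// processAll fans items out to a fixed number of workers and collects one
// result per item. Hypothetical helper for illustration; not azqr's code.
func processAll(items []int, workers int, work func(int) int) []int {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan int)
	results := make(chan int)

	// Start the receivers before anything is sent.
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs { // receive until jobs is closed
				results <- work(j)
			}
		}()
	}

	// Send from a separate goroutine so the caller stays free to drain
	// results; sending here directly could block while workers are
	// themselves blocked on results.
	go func() {
		for _, it := range items {
			jobs <- it
		}
		close(jobs)
	}()

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	var out []int
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	out := processAll([]int{1, 2, 3, 4, 5}, 2, func(n int) int { return n * n })
	fmt.Println(len(out), "results")
}
```

The key points are that every send has a live receiver, and that each channel is closed by the side that knows no more values are coming.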

@red-erik

--azqr=false seems to be working

cmendible added a commit that referenced this issue Aug 21, 2024
@cmendible
Member Author

@red-erik version v.2.0.0-preview.6 significantly improves how azqr handles scans with a high number of resources.

I ran some tests, and scanning diagnostic settings for 260000 resources can take about 20 minutes. After that, each subscription scan can take about 2 or 3 more minutes.

Please note that using a Managed Identity or a Service Principal instead of the Azure CLI improves performance due to token caching.
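
To illustrate why token caching matters at this scale, here is a small Go sketch of a cache that reuses a token until shortly before it expires. It does not show how any Azure credential type is implemented. `tokenCache`, `fetch`, `now`, and the 60-second refresh margin are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// refreshMargin is how many seconds before expiry a new token is fetched.
// The value is an arbitrary choice for this sketch.
const refreshMargin = 60

// tokenCache reuses a token until shortly before it expires.
// Hypothetical type for illustration only.
type tokenCache struct {
	mu        sync.Mutex
	token     string
	expiresAt int64 // Unix seconds
	fetch     func() (token string, expiresAt int64, err error)
	now       func() int64
}

func newTokenCache(fetch func() (string, int64, error), now func() int64) *tokenCache {
	return &tokenCache{fetch: fetch, now: now}
}

// Token returns the cached token, calling fetch only when no token is cached
// or the cached one is within refreshMargin seconds of expiring.
func (c *tokenCache) Token() (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.token != "" && c.now() < c.expiresAt-refreshMargin {
		return c.token, nil
	}
	tok, exp, err := c.fetch()
	if err != nil {
		return "", err
	}
	c.token, c.expiresAt = tok, exp
	return tok, nil
}

func main() {
	fetches := 0
	cache := newTokenCache(func() (string, int64, error) {
		fetches++ // stands in for a slow token acquisition
		return fmt.Sprintf("token-%d", fetches), time.Now().Unix() + 3600, nil
	}, func() int64 { return time.Now().Unix() })

	for i := 0; i < 1000; i++ {
		if _, err := cache.Token(); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Println("token requests:", 1000, "fetches:", fetches)
}
```

When each request has to acquire a fresh token, that cost is paid once per call; with a cache it is paid roughly once per token lifetime.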

@cmendible cmendible self-assigned this Aug 21, 2024
@red-erik

Hello,
running with ".\azqr.exe scan --mask=false -c=false -f" I receive this:

2024-08-22T11:03:34+02:00 INF Scanning subscriptions for Diagnostic Settings
2024-08-22T11:03:48+02:00 FTL Failed to get diagnostic settings error="POST https://management.azure.com/batch\n--------------------------------------------------------------------------------\nRESPONSE 429: 429 Too Many Requests\nERROR CODE: TenantRequestsThrottled\n--------------------------------------------------------------------------------\n{\n "error": {\n "code": "TenantRequestsThrottled",\n "message": "Number of 'read' requests for tenant actor 'xxxx-xxxxx-xxxxx-xxxxx-xxxxx' exceeded. Please try again after '5' seconds after additional tokens are available. Refer to https://aka.ms/arm-throttling for additional information."\n }\n}\n--------------------------------------------------------------------------------\n"

Regards,
Red.
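
For context on the 429 above: the response tells the client to try again after 5 seconds. A common way to handle this is to wait for the advertised interval and retry a bounded number of times. The Go sketch below shows that pattern under stated assumptions. It is not azqr's fix, `retryAfterSeconds` and `withRetry` are hypothetical helpers, the endpoint is simulated, and only the integer-seconds form of `Retry-After` is parsed.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// retryAfterSeconds parses a Retry-After header given in seconds.
// It returns fallback when the header is missing or not a positive integer.
func retryAfterSeconds(header string, fallback int) int {
	n, err := strconv.Atoi(header)
	if err != nil || n <= 0 {
		return fallback
	}
	return n
}

// withRetry invokes call until it returns a status other than 429 or until
// maxAttempts calls have been made. call returns the HTTP status and the raw
// Retry-After header value. wait is injected so callers control sleeping.
// It returns the last status and the number of calls made.
func withRetry(maxAttempts int, call func() (int, string), wait func(seconds int)) (int, int) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	status := 0
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		var header string
		status, header = call()
		if status != http.StatusTooManyRequests {
			return status, attempt
		}
		if attempt < maxAttempts {
			// Fall back to 5 seconds, the delay named in the error above.
			wait(retryAfterSeconds(header, 5))
		}
	}
	return status, maxAttempts
}

func main() {
	calls := 0
	// Simulated endpoint: throttled twice, then succeeds.
	fake := func() (int, string) {
		calls++
		if calls <= 2 {
			return http.StatusTooManyRequests, "1"
		}
		return http.StatusOK, ""
	}
	status, attempts := withRetry(5, fake, func(s int) {
		time.Sleep(time.Duration(s) * time.Millisecond) // shortened for the demo
	})
	fmt.Println("status:", status, "attempts:", attempts)
}
```

Retrying alone does not reduce the request rate, so in practice it is usually combined with a cap on concurrent requests.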

@red-erik

It only works with --azqr=false

cmendible added a commit that referenced this issue Aug 22, 2024
@cmendible
Member Author

I'll keep testing. I didn't get throttled yesterday while I was running some tests.

Disabling azqr works because it disables the diagnostic settings checks.

I'll release 2.0.0-preview.7 shortly.

@red-erik

Preview 7 seems to be working fine with all required parameters (without disabling AZQR).
Thanks.

Red.

@cmendible
Member Author

That is great news! @red-erik, thanks for your feedback! Out of curiosity, how long did the scan take?

@red-erik

started
2024-08-26T09:45:37+02:00 INF Scanning subscriptions for Microsoft.Automation/automationAccounts
now
2024-08-26T10:50:24+02:00 INF Generating Report: azqr_action_plan_2024_08_26_T094535.xlsx
and I am still waiting for the file. Should the pbit be generated too?

@cmendible
Member Author

cmendible commented Aug 26, 2024

The pbit file is no longer generated and populated automatically (it was a maintenance nightmare). But you can now run:

azqr pbi -p .

which will create the pbit file in the current folder. Then open it and select the xlsx result from the scan as the source for the dashboard.

@cmendible cmendible added the bug Something isn't working label Nov 5, 2024