Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate high extension install error rate #190233

Closed
isidorn opened this issue Aug 11, 2023 · 13 comments
Closed

Investigate high extension install error rate #190233

isidorn opened this issue Aug 11, 2023 · 13 comments
Assignees
Labels
debt Code quality issues extensions Issues concerning extensions
Milestone

Comments

@isidorn
Copy link
Contributor

isidorn commented Aug 11, 2023

@dtivel discovered that we are having a high error rate of extension install.

Here are some kusto queries with interesting results:

RawEventsVSCode
| where ServerUploadTimestamp < ago(1d)
| where EventName == 'monacoworkbench/extensiongallery:install'
| summarize total_count = count(), failure_count = countif(Measures["success"] == 0.0) by bin(ServerUploadTimestamp, 1d)
| extend failure_percentage = todouble(failure_count) / total_count * 100
| project ServerUploadTimestamp, failure_percentage
| render timechart

Screenshot 2023-08-11 at 10 47 34

And here is the ErrorCodeDetail

RawEventsVSCode
| where ServerUploadTimestamp > ago(60d)
| where EventName == 'monacoworkbench/extensiongallery:install'
| where Measures["success"] == 0.0
| summarize count() by tostring(Properties["errorcode"])
| order by count_ desc

Properties_errorcode count_
Internal 10,542,542
Download 5,688,506
CorruptZip 3,909,848
Incompatible 2,331,489
Extract 780,667
IncompatiblePreRelease 453,860
Signature 227,637
incompatible 120,565
EPERM 88,294
EACCES 74,313
unknown 67,296
Invalid 51,401
Delete 28,992
downloading 27,435
ReleaseVersionNotFound 20,460
  8,735
gallery

@sandy081 why do we have such a high failure rate? What is the Internal error code?
It looks like there is a high error rate on work days when there is a higher load on MP service so it also might be influenced by the service.

I can not go back more in time than 10 months to check if we always had these error rates.
Error rates on extension update are considerably lower.

fyi @joaomoreno

@isidorn isidorn added the extensions Issues concerning extensions label Aug 11, 2023
@isidorn isidorn changed the title High extension error rate High extension install error rate Aug 11, 2023
@sandy081
Copy link
Member

Internal errors are uncaught errors and hence cannot be classified. They stood the most (50%). I will try to see if there are places in the code where we can catch more error and classify them. The remaining are classified and can be explained by the error code name. Download errors could be network or service related and corrupt zips are service related.

@isidorn
Copy link
Contributor Author

isidorn commented Sep 18, 2023

Thanks for looking into this. Being able to further classify the internal errors would greatly help here.

@joaomoreno joaomoreno added this to the October 2023 milestone Oct 4, 2023
@sandy081 sandy081 added the debt Code quality issues label Oct 23, 2023
@sandy081 sandy081 modified the milestones: October 2023, November 2023 Oct 23, 2023
@sandy081 sandy081 modified the milestones: November 2023, Backlog Nov 27, 2023
@juancampa
Copy link

juancampa commented Nov 29, 2023

I just hit this. It doesn't fix itself even after a restart.

A corrupted file is stored in CachedExtensionVSIXs and never re-downloaded (I guess there's no checksum being checked anywhere).
The cached vsix, instead of being a zip, it's an HTML with the following content:

Screenshot 2023-11-29 at 6 36 45 PM

Located in /Users/<username>/Library/Application Support/Code/CachedExtensionVSIXs/membrane.membrane-1.6.1-darwin-arm64.

An easy bandaid fix might be to delete the file once the CorruptZip error is thrown so that it gets redownloaded.

Update

Deleting the corrupted file indeed fixes the issue.

@sandy081
Copy link
Member

An easy bandaid fix might be to delete the file once the CorruptZip error is thrown so that it gets redownloaded

It should be doing this already. Can you please confirm if this is not happening? Please share the log - F1 > Open View... > Shared

@juancampa
Copy link

@sandy081 Perhaps is broken in MacOS?

Here are the logs. Notice the timestamp difference after the CorruptZip error, that was the install after I manually deleted the corrupt file.

2023-11-29 17:44:16.548 [info] Deleted uninstalled extension from disk github.copilot /Users/juan/.vscode/extensions/github.copilot-1.138.0
2023-11-29 17:44:16.549 [info] Deleted uninstalled extension from disk rust-lang.rust-analyzer /Users/juan/.vscode/extensions/rust-lang.rust-analyzer-0.3.1740-darwin-arm64
2023-11-29 17:44:16.551 [info] Deleted uninstalled extension from disk timonwong.shellcheck /Users/juan/.vscode/extensions/timonwong.shellcheck-0.34.0-darwin-arm64
2023-11-29 17:44:16.552 [info] Deleted uninstalled extension from disk ms-vscode-remote.remote-ssh /Users/juan/.vscode/extensions/ms-vscode-remote.remote-ssh-0.107.0
2023-11-29 17:44:16.552 [info] Deleted uninstalled extension from disk llvm-vs-code-extensions.vscode-clangd /Users/juan/.vscode/extensions/llvm-vs-code-extensions.vscode-clangd-0.1.24
2023-11-29 17:44:29.300 [info] Getting Manifest... membrane.membrane
2023-11-29 17:44:29.303 [info] Installing extension: membrane.membrane
2023-11-29 17:44:29.900 [error] Error while installing the extension: membrane.membrane
2023-11-29 17:44:29.903 [error] Failed to install extension. membrane.membrane
2023-11-29 17:44:29.903 [error] CorruptZip: end of central directory record signature not found
    at d.download (/Applications/Visual Studio Code.app/Contents/Resources/app/out/vs/code/node/sharedProcess/sharedProcessMain.js:109:141)
    at async ee.x (/Applications/Visual Studio Code.app/Contents/Resources/app/out/vs/code/node/sharedProcess/sharedProcessMain.js:109:19904)
    at async ee.h (/Applications/Visual Studio Code.app/Contents/Resources/app/out/vs/code/node/sharedProcess/sharedProcessMain.js:109:17947)
2023-11-29 17:44:29.903 [error] [uncaught exception in sharedProcess]: end of central directory record signature not found: CorruptZip: end of central directory record signature not found
    at d.download (/Applications/Visual Studio Code.app/Contents/Resources/app/out/vs/code/node/sharedProcess/sharedProcessMain.js:109:141)
    at async ee.x (/Applications/Visual Studio Code.app/Contents/Resources/app/out/vs/code/node/sharedProcess/sharedProcessMain.js:109:19904)
    at async ee.h (/Applications/Visual Studio Code.app/Contents/Resources/app/out/vs/code/node/sharedProcess/sharedProcessMain.js:109:17947)
2023-11-29 18:40:24.887 [info] Getting Manifest... membrane.membrane
2023-11-29 18:40:24.893 [info] Installing extension: membrane.membrane
2023-11-29 18:40:27.007 [info] Extension signature is verified: membrane.membrane
2023-11-29 18:40:27.172 [info] Extracted extension to file:///Users/juan/.vscode/extensions/membrane.membrane-1.6.1-darwin-arm64: membrane.membrane
2023-11-29 18:40:27.182 [info] Renamed to /Users/juan/.vscode/extensions/membrane.membrane-1.6.1-darwin-arm64
2023-11-29 18:40:27.183 [info] Extracting extension completed. membrane.membrane
2023-11-29 18:40:27.193 [info] Extension installed successfully: membrane.membrane

About Visual Studio Code:

Version: 1.84.2
Commit: 1a5daa3a0231a0fbba4f14db7ec463cf99d7768e
Date: 2023-11-09T10:52:57.054Z
Electron: 25.9.2
ElectronBuildId: 24603566
Chromium: 114.0.5735.289
Node.js: 18.15.0
V8: 11.4.183.29-electron.0
OS: Darwin arm64 23.1.0

@sandy081
Copy link
Member

@juancampa You are right, there is a bug in cleaning up the downloaded VSIX where it is a corrupt zip. Can you pleas file a separate issue?

@plattenschieber
Copy link

Another thing we found out today is that sometimes the vsix files seem to be packed into another zip container. When downloading the extensions via curl <extension-url> -o <extension-filename> and trying to install via code --install-extension we get the same error as in the Windows Output of VSCode:

2023-11-30 14:24:58.281 [error] end of central directory record signature not found: CorruptZip: end of central directory record signature not found

Whereas when downloading the extension via curl <extension-url> --compressed -o <extension-filename> results directly in the correct vsix file, without another zip layer around it.

A colleague and I tried debugging the fetch call in the marketplace library and coming up with a minimal example to ensure zipped vsix files can be handled correctly, but are not there yet.

@isidorn
Copy link
Contributor Author

isidorn commented Dec 8, 2023

@sandy081 could we investigate a bit more into this during our Debt / Cleanup milestones.

I would like us to get more precise reporting of extension install error rates since it is important to have precise numbers when we enable repository and publisher signing.

fyi @dtivel

@joaomoreno joaomoreno changed the title High extension install error rate Investigate high extension install error rate Dec 15, 2023
@joaomoreno joaomoreno modified the milestones: Backlog, December / January 2024 Dec 15, 2023
@sandy081
Copy link
Member

Added specific error codes to understand internal errors. Also fixed a bug that is reporting error twice.

@sandy081 sandy081 modified the milestones: December / January 2024, Backlog Candidates, February 2024 Jan 18, 2024
@starball5
Copy link

Note sure how many of these are related, but I've seen a noticeable number of extension installation failure questions on SO recently:

@sandy081
Copy link
Member

Here are the error code results from 1.90.0.

Outcome: Able to resolve the reason for internal errors. Seems majority errors are happening while updating metadata. Added additional telemetry data to further understand about metadata update errors.

Image

@sandy081 sandy081 modified the milestones: June 2024, July 2024 Jun 24, 2024
@sandy081 sandy081 modified the milestones: July 2024, August 2024 Jul 22, 2024
@sandy081
Copy link
Member

sandy081 commented Aug 19, 2024

Capturing the latest charts.

Image

Error percentages are coming down gradually. It is averaging around 10% in last 3 weeks.

Image

I am able to resolve/fix internal & update metadata errors almost completely. The next bigger percentage of errors falls under Extract category.

@sandy081 sandy081 modified the milestones: August 2024, September 2024 Aug 19, 2024
@sandy081 sandy081 modified the milestones: September 2024, October 2024 Sep 23, 2024
@sandy081 sandy081 modified the milestones: October 2024, November 2024 Oct 21, 2024
@sandy081
Copy link
Member

Closing this. Lets create new issues for new findings

@vs-code-engineering vs-code-engineering bot locked and limited conversation to collaborators Jan 13, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
debt Code quality issues extensions Issues concerning extensions
Projects
None yet
Development

No branches or pull requests

7 participants