NuGetPackageExplorerToCsv

This driver runs select NuGet Package Explorer analyzers on NuGet packages and saves the results to CSV. This driver was added to investigate the extent of package reproducibility on NuGet.org.

| Property | Value |
| --- | --- |
| CatalogScanDriverType enum value | NuGetPackageExplorerToCsv |
| Driver implementation | NuGetPackageExplorerToCsvDriver |
| Processing mode | process latest catalog leaf per package ID and version |
| Cursor dependencies | V3 package content: this driver needs the .nupkg from the package content resource |
| Components using driver output | Kusto ingestion via KustoIngestionMessageProcessor, since this driver produces CSV data |
| Temporary storage config | Table Storage: CsvRecordTableName (name prefix) holds CSV records before they are added to a CSV blob; TaskStateTableName (name prefix) tracks completion of CSV blob aggregation |
| Persistent storage config | Blob Storage: NuGetPackageExplorerContainerName contains CSVs for the NuGetPackageExplorers table; NuGetPackageExplorerFileContainerName contains CSVs for the NuGetPackageExplorerFiles table |
| Output CSV tables | NuGetPackageExplorerFiles, NuGetPackageExplorers |
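
The "latest catalog leaf per package ID and version" processing mode can be pictured with a short Python sketch. This is an illustration only, not the driver's code: the `CatalogLeaf` class, its fields, and the `latest_leaves` function are hypothetical names, and the sketch assumes each leaf carries a commit timestamp that orders it against other leaves for the same package.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CatalogLeaf:
    # Hypothetical shape for this sketch; a real catalog leaf carries more fields.
    package_id: str
    version: str
    commit_timestamp: datetime


def latest_leaves(leaves):
    """Keep only the newest leaf for each (ID, version) pair."""
    latest = {}
    for leaf in leaves:
        # Assumption: IDs and versions are compared case-insensitively.
        key = (leaf.package_id.lower(), leaf.version.lower())
        current = latest.get(key)
        if current is None or leaf.commit_timestamp > current.commit_timestamp:
            latest[key] = leaf
    return list(latest.values())
```

With this approach a package that appears several times in one batch of catalog leaves is analyzed once, using its most recent state.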

Algorithm

This driver downloads the whole package (.nupkg) to disk from the NuGet.org V3 package content resource. The NuGet Package Explorer APIs require this because they cannot operate on a generic, seekable package stream. After the download completes, a NuGet Package Explorer ZipPackage is instantiated and passed to the NuGet Package Explorer SymbolValidator, which performs much of the desired validation.
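
The download-to-disk step might look roughly like the following Python sketch. It is not the driver's implementation (which is a .NET component); the function names are hypothetical, and the sketch assumes the public NuGet.org package content base URL and an already normalized version string.

```python
import shutil
import tempfile
import urllib.request

# Public NuGet.org package content (flat container) base URL.
BASE_URL = "https://api.nuget.org/v3-flatcontainer"


def nupkg_url(package_id, version, base_url=BASE_URL):
    """Build the .nupkg URL; the resource expects a lowercase ID and version."""
    lower_id = package_id.lower()
    lower_version = version.lower()
    return f"{base_url}/{lower_id}/{lower_version}/{lower_id}.{lower_version}.nupkg"


def save_to_disk(stream, directory=None):
    """Copy a readable byte stream to a temporary .nupkg file and return its path."""
    with tempfile.NamedTemporaryFile(suffix=".nupkg", dir=directory, delete=False) as file:
        shutil.copyfileobj(stream, file)
        return file.name


def download_nupkg(package_id, version):
    """Download the whole package to disk before any analysis starts."""
    with urllib.request.urlopen(nupkg_url(package_id, version)) as response:
        return save_to_disk(response)
```

The caller owns the returned path and should delete the file once analysis is done, since `delete=False` leaves it on disk.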

NuGet Package Explorer validation has many failure modes, so the resulting CSV record captures the failure instead of throwing an exception and blocking the processing pipeline. For example, analysis of some packages times out, and analysis of others fails because of poorly (or creatively) authored structure or metadata.
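
One way to turn failures into data rather than exceptions is sketched below in Python. The `ResultType` values, the record keys, and `analyze_safely` are hypothetical names chosen for this illustration; the real driver's result types and timeout handling may differ.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from enum import Enum


class ResultType(Enum):
    # Hypothetical result names, chosen for this sketch only.
    AVAILABLE = "Available"
    TIMEOUT = "Timeout"
    INVALID_PACKAGE = "InvalidPackage"


def analyze_safely(analyze, package_path, timeout_seconds):
    """Run `analyze` and return a record dict instead of raising."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(analyze, package_path)
    try:
        details = future.result(timeout=timeout_seconds)
        return {"ResultType": ResultType.AVAILABLE.value, "Details": details}
    except FutureTimeout:
        return {"ResultType": ResultType.TIMEOUT.value, "Details": None}
    except Exception as error:
        return {"ResultType": ResultType.INVALID_PACKAGE.value, "Details": str(error)}
    finally:
        # Do not block on a slow analysis; its thread keeps running until it finishes.
        executor.shutdown(wait=False)
```

Because every outcome becomes a record, a single problematic package produces a row describing the failure and the rest of the batch continues.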

Package-level analysis is stored in the NuGetPackageExplorers table. More granular file-level analysis is stored in the NuGetPackageExplorerFiles table.
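
The split between one package-level row and many file-level rows can be sketched in Python as follows. The column names (`Id`, `Version`, `ResultType`, `Path`, `HasSourceLink`) and both function names are assumptions made for this illustration and are not the actual table schemas.

```python
import csv
import io


def write_csv(rows, columns):
    """Serialize dict rows to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def split_records(package_id, version, result_type, files):
    """Return (package_rows, file_rows) for one analyzed package."""
    identity = {"Id": package_id, "Version": version}
    # One summary row per package.
    package_rows = [{**identity, "ResultType": result_type}]
    # One row per analyzed file, repeating the package identity for joins.
    file_rows = [
        {**identity, "Path": f["Path"], "HasSourceLink": f["HasSourceLink"]}
        for f in files
    ]
    return package_rows, file_rows
```

Repeating the package identity on each file row keeps the two tables joinable on ID and version.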

This workflow parses symbol data related to a package, so some Source Link information is available. This can supplement the package repository information from the .nuspec <repository> element read by the PackageManifestToCsv driver.
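
As an illustration of how Source Link data can hint at a repository, the Python sketch below reads the `documents` map of a Source Link JSON file and derives repository URLs. The `repository_urls` function is hypothetical, and the sketch assumes only the `raw.githubusercontent.com/{owner}/{repo}/{commit}/*` URL layout; other hosts are ignored rather than guessed at.

```python
import json
from urllib.parse import urlparse


def repository_urls(source_link_json):
    """Guess repository URLs from the 'documents' map of a Source Link JSON file."""
    documents = json.loads(source_link_json).get("documents", {})
    found = set()
    for target in documents.values():
        parsed = urlparse(target)
        parts = [part for part in parsed.path.split("/") if part]
        # Assumption: only the raw.githubusercontent.com layout is handled here.
        if parsed.netloc == "raw.githubusercontent.com" and len(parts) >= 3:
            found.add(f"https://github.com/{parts[0]}/{parts[1]}")
    return sorted(found)
```

A result from this kind of parsing could be compared with the .nuspec repository URL to spot packages where the two disagree or where only one source is present.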