PackageIconToCsv

This driver reads package icon files from NuGet.org and maps image metadata to CSV records.

| Property | Value |
| --- | --- |
| CatalogScanDriverType enum value | PackageIconToCsv |
| Driver implementation | PackageIconToCsvDriver |
| Processing mode | process latest catalog leaf per package ID and version |
| Cursor dependencies | V3 package content: blocks on this cursor to align with other drivers |
| Components using driver output | Kusto ingestion via KustoIngestionMessageProcessor, since this driver produces CSV data |
| Temporary storage config | Table Storage:<br />CsvRecordTableName (name prefix): holds CSV records before they are added to a CSV blob<br />TaskStateTableName (name prefix): tracks completion of CSV blob aggregation |
| Persistent storage config | Blob Storage:<br />PackageIconContainerName: contains CSVs for the PackageIcons table |
| Output CSV tables | PackageIcons |
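
The processing mode listed above ("latest catalog leaf per package ID and version") can be pictured with a small sketch. This is not the driver's code: the `CatalogLeaf` type, its field names, and the case-insensitive comparison are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CatalogLeaf:
    # Hypothetical shape; the real catalog leaf model has more fields.
    package_id: str
    version: str
    commit_timestamp: datetime


def latest_leaves(leaves):
    """Keep only the most recent leaf for each (ID, version) pair."""
    latest = {}
    for leaf in leaves:
        # Assumption: IDs and versions are compared case-insensitively.
        key = (leaf.package_id.lower(), leaf.version.lower())
        current = latest.get(key)
        if current is None or leaf.commit_timestamp > current.commit_timestamp:
            latest[key] = leaf
    return list(latest.values())
```

With this shape, several catalog entries for the same package version collapse into a single unit of work, so the icon would be fetched once per package version rather than once per catalog entry.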

Algorithm

For each catalog leaf passed to the driver, the package icon is downloaded from the NuGet.org V3 package content resource. A package may not have an icon. The icon stored in the package content resource is either a cached copy of the icon specified by the package author in the legacy iconUrl package metadata field or a copy of the embedded icon file.
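
As a rough illustration of the download step, the sketch below builds an icon URL under the V3 package content base address and treats HTTP 404 as "no icon". The base address is the public NuGet.org one; the trailing `/icon` path segment and both function names are assumptions for this example, not taken from the driver. The version is expected to be normalized already.

```python
import urllib.error
import urllib.request

# Assumption: the public NuGet.org package content base address.
PACKAGE_BASE_ADDRESS = "https://api.nuget.org/v3-flatcontainer/"


def icon_url(package_id, version, base=PACKAGE_BASE_ADDRESS):
    """Build the assumed icon URL using lowercase ID and version segments."""
    return f"{base.rstrip('/')}/{package_id.lower()}/{version.lower()}/icon"


def download_icon(package_id, version):
    """Return the icon bytes, or None when no icon exists (HTTP 404)."""
    try:
        url = icon_url(package_id, version)
        with urllib.request.urlopen(url, timeout=30) as response:
            return response.read()
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return None
        raise
```

Returning `None` for a missing icon keeps the "package may not have an icon" case distinct from real failures, which are re-raised.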

Once the icon file is downloaded, the driver attempts to detect the image format using logic based on NuGet.org's own image format detection.
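
Format detection of this kind is commonly done by inspecting the leading "magic" bytes of the file. The sketch below shows that general technique for a few well-known signatures. It is not NuGet.org's actual detection logic, and the set of formats checked here is an assumption.

```python
def detect_format(data):
    """Guess an image format from its leading bytes; None if unrecognized."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    if data.startswith(b"\x00\x00\x01\x00"):
        return "ico"
    # Unknown or text-based formats are left unclassified in this sketch.
    return None
```

A signature check like this is cheap and does not decode the image, so it can only suggest a format rather than confirm that the file is valid.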

Magick.NET is then used to perform a more thorough analysis of the image file, and various properties from the Magick.NET results are written to a CSV record. Magick.NET is used because it can read a wide range of image formats and determine details such as image dimensions and Exif metadata.
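
To make the "image properties into a CSV record" idea concrete, here is a minimal stand-in that does not use Magick.NET (a .NET library). It reads the width and height from a PNG header using only the Python standard library and writes one CSV line. The column set and function names are hypothetical and do not reflect the real PackageIcons schema.

```python
import csv
import io
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data):
    """Read (width, height) from a PNG's IHDR chunk, or None if not a PNG."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type,
    # then the IHDR data, which starts with big-endian width and height.
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE):
        return None
    if data[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def to_csv_row(package_id, version, data):
    """Render one CSV line; the columns here are hypothetical."""
    dimensions = png_dimensions(data)
    width, height = dimensions if dimensions else ("", "")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([package_id, version, len(data), width, height])
    return buffer.getvalue()
```

A full image library handles many more formats and failure modes than this header parse, which is the reason the driver relies on Magick.NET rather than hand-written parsing.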