Add support for package instances #444

Closed
JonoYang opened this issue Jun 8, 2022 · 7 comments
@JonoYang
Member

JonoYang commented Jun 8, 2022

In scancode-toolkit, we report each specific instance with a package_uid to differentiate multiple detections of the same package in a codebase. We should implement this idea in scancode.io when we discover packages in pipelines.
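
To make the idea concrete, here is a minimal sketch of a per-instance identifier. It assumes a package_uid is a Package URL with a random `uuid` qualifier appended; `make_package_uid` is a hypothetical helper for illustration, not the scancode-toolkit API.

```python
import uuid


def make_package_uid(purl: str) -> str:
    """Return a per-instance identifier built from a Package URL.

    Assumption: the uid is the purl plus a random ``uuid`` qualifier.
    Purls with a ``#subpath`` are not handled in this sketch.
    """
    separator = "&" if "?" in purl else "?"
    return f"{purl}{separator}uuid={uuid.uuid4()}"


# Two detections of the same package get two distinct identifiers.
first = make_package_uid("pkg:npm/lodash@4.17.21")
second = make_package_uid("pkg:npm/lodash@4.17.21")
```

Both values share the same purl prefix, so the package identity stays recoverable while each copy remains distinguishable.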

@tdruez
Contributor

tdruez commented Jun 20, 2022

@JonoYang Do the recently merged changes address this? Or is there more we can do?

@JonoYang
Member Author

@tdruez The recent changes add the package_uid field to the DiscoveredPackage model, which is populated when we use the scan_package pipeline. However, the docker, rootfs, and scan_codebase pipelines call the function scan_for_application_packages (https://github.com/nexB/scancode.io/blob/2d342fa1a1b4f06fea9be73298a00e627cbb6a46/scanpipe/pipes/scancode.py#L311) during their scan_for_application_packages step. This function doesn't assign a package_uid to discovered application packages; we would need to add that to bring scancode.io application package scanning more in line with scancode-toolkit.

Package scanning in scancode is now a two-step process: we first scan the individual Resources for package data, and then we assemble the package data from multiple resources and assign package instances. We would need a way to do the package assembly step in scancode.io.
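
A rough sketch of the two steps, for illustration only. It assumes step one has already produced package data per resource path, and it approximates assembly by grouping data that shares a directory and a purl. The real assembly logic lives in scancode-toolkit's per-datafile handlers; `assemble` and the sample data here are hypothetical.

```python
import uuid
from collections import defaultdict

# Step 1 (assumed already done): package data found on each resource.
resource_package_data = {
    "app/package.json": [{"purl": "pkg:npm/app@1.0.0"}],
    "app/package-lock.json": [{"purl": "pkg:npm/app@1.0.0"}],
    "vendor/app/package.json": [{"purl": "pkg:npm/app@1.0.0"}],
}


def assemble(resource_package_data):
    """Step 2: merge package data into package instances.

    Simplification: data with the same purl in the same directory is
    treated as one instance, and each instance gets its own package_uid.
    """
    groups = defaultdict(list)
    for path, package_data in sorted(resource_package_data.items()):
        directory = path.rpartition("/")[0]
        for data in package_data:
            groups[(directory, data["purl"])].append(path)

    return [
        {
            "purl": purl,
            "package_uid": f"{purl}?uuid={uuid.uuid4()}",
            "datafile_paths": paths,
        }
        for (_directory, purl), paths in groups.items()
    ]


packages = assemble(resource_package_data)
```

With this sample data, the two manifests under `app/` collapse into one instance, and the vendored copy becomes a second instance of the same purl.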

JonoYang added a commit that referenced this issue Jul 21, 2022
    * Update scan_for_application_packages to save detected Package data to the CodebaseResource it is from, then iterate through the CodebaseResources with Package data and use the proper Package handler to process the Package data
    * Create DiscoveredDependency model
    * Add package_data JSON field to CodebaseResource

Signed-off-by: Jono Yang <[email protected]>
@pombredanne pombredanne added this to the v32.0.0 milestone Jul 28, 2022
@JonoYang
Member Author

JonoYang commented Aug 3, 2022

After talking to @pombredanne, it may be necessary to create a new model for Package and Dependency instances. The current charts in the project detail view show the total number of Packages or Dependencies discovered in a Project. With the introduction of Package and Dependency instances, the Project chart stats are thrown off because we could have multiple copies of the same packages and dependencies.

  1. Using an instance for packages and deps as an intermediate model
    Pkg -> PkgInstance -> Res
    Pkg -> PkgInstance -> Dep -> DepInstance -> (and when we resolve -> Pkg or PkgInstance)

Here, we can show each unique Package and Dependency, then show the individual instances of the packages, and then show which resources correspond to which Package instance.
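
The Pkg -> PkgInstance -> Res chain could be sketched with plain dataclasses as below. This is only an illustration of the shape of the relationships; the class and field names are hypothetical and are not scancode.io models.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    path: str


@dataclass
class PackageInstance:
    # One detected copy of a package and the resources that belong to it.
    package_uid: str
    resources: list[Resource] = field(default_factory=list)


@dataclass
class Package:
    # One unique package, identified by its purl, with all of its copies.
    purl: str
    instances: list[PackageInstance] = field(default_factory=list)


pkg = Package(purl="pkg:npm/app@1.0.0")
pkg.instances.append(
    PackageInstance("pkg:npm/app@1.0.0?uuid=1", [Resource("app/package.json")])
)
pkg.instances.append(
    PackageInstance("pkg:npm/app@1.0.0?uuid=2", [Resource("vendor/app/package.json")])
)

# Walk from the unique package down to the resources of each instance.
instance_paths = [
    resource.path for instance in pkg.instances for resource in instance.resources
]
```

Charts would count `Package` objects, while the detail views would list the instances and their resources.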

@pombredanne
Member

The latest thinking is to treat the current DiscoveredPackage record as being a package instance. We would add a UID and we would display each copy of a package as its own. Duplication in the UI and reports is not exceptional, but it is not the main case either, and we can solve it in the UI in the future.
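
If each DiscoveredPackage record is an instance, chart stats could still be derived by grouping on the purl part of the UID. A small sketch, assuming the UID is the purl followed by a `?uuid=` qualifier; `purl_of` and the sample values are made up for illustration.

```python
from collections import Counter

# Hypothetical package_uid values, one per DiscoveredPackage record.
package_uids = [
    "pkg:npm/app@1.0.0?uuid=aaa",
    "pkg:npm/app@1.0.0?uuid=bbb",
    "pkg:pypi/click@8.0.4?uuid=ccc",
]


def purl_of(package_uid: str) -> str:
    """Drop the assumed ``?uuid=`` qualifier to recover the purl."""
    return package_uid.split("?uuid=")[0]


# Number of copies per unique package.
copies = Counter(purl_of(uid) for uid in package_uids)

unique_packages = len(copies)
total_instances = sum(copies.values())
```

This keeps one flat table of instances while still letting the UI report unique packages separately from copies.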

@pombredanne
Member

Should datafile_paths be an actual relationship to a CodebaseResource? ... IMHO yes and we are missing it https://github.com/nexB/scancode-toolkit/blob/bfa6d9632c9ef63250267c8ce3be2d5ddae1a9fc/src/packagedcode/models.py#L1224

JonoYang added a commit that referenced this issue Aug 9, 2022
Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Aug 10, 2022
Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Aug 10, 2022
    * This is so we are consistent with scancode-toolkit JSON output

Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Aug 10, 2022
    * This is so we are consistent with scancode-toolkit JSON output

Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Aug 11, 2022
    * This is so we are consistent with scancode-toolkit JSON output

Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Aug 11, 2022
    * This is so we are consistent with scancode-toolkit JSON output
    * Update expected test results

Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Aug 11, 2022
Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Aug 11, 2022
    * This is so we are consistent with scancode-toolkit JSON output
    * Update expected test results

Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Aug 12, 2022
Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Aug 12, 2022
    * This is so we are consistent with scancode-toolkit JSON output
    * Update expected test results

Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Aug 12, 2022
JonoYang added a commit that referenced this issue Aug 12, 2022
JonoYang added a commit that referenced this issue Aug 15, 2022
Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Aug 15, 2022
    * This is so we are consistent with scancode-toolkit JSON output
    * Update expected test results

Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Aug 15, 2022
JonoYang added a commit that referenced this issue Aug 15, 2022
JonoYang added a commit that referenced this issue Aug 23, 2022
Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Aug 23, 2022
    * This is so we are consistent with scancode-toolkit JSON output
    * Update expected test results

Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Aug 23, 2022
JonoYang added a commit that referenced this issue Aug 23, 2022
JonoYang added a commit that referenced this issue Aug 25, 2022
Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Aug 25, 2022
    * This is so we are consistent with scancode-toolkit JSON output
    * Update expected test results

Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Aug 25, 2022
JonoYang added a commit that referenced this issue Aug 25, 2022
tdruez added a commit that referenced this issue Aug 25, 2022
* Implement package assembly in scancode.io #447

Signed-off-by: Jono Yang <[email protected]>

* Minor formatting changes for consistency #447

Signed-off-by: Thomas Druez <[email protected]>

* Create DiscoveredPackages before other models #447

Signed-off-by: Jono Yang <[email protected]>

* Revert "Create DiscoveredPackages before other models #447"

This reverts commit c9b8bed.

Sorting Packages, Dependencies, and Resources from DatafileHandler.assemble() will never work. The code needs to be changed in scancode-toolkit.

Signed-off-by: Jono Yang <[email protected]>

* Update migration #444

Signed-off-by: Jono Yang <[email protected]>

* Return package_uids in for_packages #444

    * This is so we are consistent with scancode-toolkit JSON output
    * Update expected test results

Signed-off-by: Jono Yang <[email protected]>

* Add test for assemble_packages #444

Signed-off-by: Jono Yang <[email protected]>

* Update has_package_data filter logic #444

Signed-off-by: Jono Yang <[email protected]>

* Create directory Resources in docker pipeline #485

    * Update test expectations

Signed-off-by: Jono Yang <[email protected]>

* Bump scancode-toolkit and commoncode #485

Signed-off-by: Jono Yang <[email protected]>

* Add test for pypi wheel #485

Signed-off-by: Jono Yang <[email protected]>

Signed-off-by: Jono Yang <[email protected]>
Signed-off-by: Thomas Druez <[email protected]>
Co-authored-by: Thomas Druez <[email protected]>
@tdruez
Contributor

tdruez commented Aug 25, 2022

@JonoYang Is this one ready to be closed since #485 is merged?

@JonoYang
Member Author

This is ready to be closed now that we have implemented package assembly in v31.0.0.

JonoYang added a commit that referenced this issue Aug 25, 2022
Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Aug 25, 2022
    * This is so we are consistent with scancode-toolkit JSON output
    * Update expected test results

Signed-off-by: Jono Yang <[email protected]>
tdruez added a commit that referenced this issue Aug 31, 2022
* Implement package assembly in scancode.io #447

Signed-off-by: Jono Yang <[email protected]>

* Minor formatting changes for consistency #447

Signed-off-by: Thomas Druez <[email protected]>

* Create DiscoveredPackages before other models #447

Signed-off-by: Jono Yang <[email protected]>

* Revert "Create DiscoveredPackages before other models #447"

This reverts commit c9b8bed.

Sorting Packages, Dependencies, and Resources from DatafileHandler.assemble() will never work. The code needs to be changed in scancode-toolkit.

Signed-off-by: Jono Yang <[email protected]>

* Update migration #444

Signed-off-by: Jono Yang <[email protected]>

* Return package_uids in for_packages #444

    * This is so we are consistent with scancode-toolkit JSON output
    * Update expected test results

Signed-off-by: Jono Yang <[email protected]>

* Create directory Resources in docker pipeline #485

    * Update test expectations

Signed-off-by: Jono Yang <[email protected]>

* Implement package assembly in scancode.io #447

Signed-off-by: Jono Yang <[email protected]>

* Implement package assembly in scancode.io #447

Signed-off-by: Jono Yang <[email protected]>

* Create DiscoveredDependency model #447

    * Create new dependency list and detail views
    * Update assemble_packages() to create DiscoveredDependencies
    * Update test expectations

Signed-off-by: Jono Yang <[email protected]>

* Update fields on DiscoveredDependency #447

    * Remove for_package_uid and replace with ForeignKey for_package
    * Remove datafile_path and replace with ForeignKey datafile_resource
    * Create properties for the two removed fields
    * Update dependency views to link to datafile_resource
    * Update expected test results

Signed-off-by: Jono Yang <[email protected]>

* Properly pluralize verbose name #447

Signed-off-by: Jono Yang <[email protected]>

* Create new argument for create_from_data #447

    * Add strip_datafile_path_root to DiscoveredDependency.create_from_data
    * This argument strips the root path segment from `datafile_path` before using the path to look up the corresponding CodebaseResource
    * This is used in the case where we are importing a scan from scancode-toolkit, where the root path segments are not stripped by default
    * Update expected test results

Signed-off-by: Jono Yang <[email protected]>

* Update prefetch_related #447

    * Used cached_property for DiscoveredDependency properties

Signed-off-by: Jono Yang <[email protected]>

* Prefetch related models in output code #447

Signed-off-by: Jono Yang <[email protected]>

* Import scancode.io 30.2.0 scans in load_codebase

    * Order DiscoveredDependencies by is_runtime, is_optional, is_resolved, and dependency_uid
    * Do not show dependency_uid value in DiscoveredDependency list view

Signed-off-by: Jono Yang <[email protected]>

* Revert changes for importing old scancode.io scans

Signed-off-by: Jono Yang <[email protected]>

* Regen migrations for DiscoveredDependency #447

Signed-off-by: Jono Yang <[email protected]>

* Migrate DiscoveredPackage.dependencies #447

    * Create migrations to generate new DiscoveredDependency objects from DiscoveredPackage.dependencies before removing the dependencies field

Signed-off-by: Jono Yang <[email protected]>

* Update test expectations #447

Signed-off-by: Jono Yang <[email protected]>

* Remove accidentally committed files #447

Signed-off-by: Jono Yang <[email protected]>

* Update migration logic #447

    * Remove unnecessary else from DiscoveredDependency properties

Signed-off-by: Jono Yang <[email protected]>

* Add PackageURLMixin to DiscoveredDependency #447

Signed-off-by: Jono Yang <[email protected]>

* Set DiscoveredDependencies purl fields #447

    * Create migration that populates purl fields for existing DiscoveredDependencies

Signed-off-by: Jono Yang <[email protected]>

* Store purl values in purl fields #447

    * Do not store dependency_uid in purl fields

Signed-off-by: Jono Yang <[email protected]>

* Remove purl field from DiscoveredDependency #447

    * We are already storing this info in the purl fields
    * Create purl property on DiscoveredDependency for compatibility

Signed-off-by: Jono Yang <[email protected]>

* Update DependencyFilterSet #447

    * Add search and purl fields

Signed-off-by: Jono Yang <[email protected]>

* Don't show DiscoveredDependencies purl fields #447

Signed-off-by: Jono Yang <[email protected]>

* Update package detail view dependencies tab #447

Signed-off-by: Jono Yang <[email protected]>

* Add package_type to dependency serializer #511

    * Update test expectations

Signed-off-by: Jono Yang <[email protected]>

* Update expected test results

Signed-off-by: Jono Yang <[email protected]>

* Add dependency table column #447

Signed-off-by: Jono Yang <[email protected]>

* Use tabset in dependency detail view #447

    * Add package_type property to DiscoveredDependency

Signed-off-by: Jono Yang <[email protected]>

* Update dependency list view #447

    * Use updated table header include
    * Update dependency presentation in package detail view
    * Show package uid on hover on for package tab

Signed-off-by: Jono Yang <[email protected]>

* Set DiscoveredDependency serializer fields #511

    * Update DiscoveredDependency ordering

Signed-off-by: Jono Yang <[email protected]>

* Create donut chart for package type #447

Signed-off-by: Jono Yang <[email protected]>

* Consolidate migrations #447

    * Update DiscoveredDependency ordering
    * Update daglib test expectations

Signed-off-by: Jono Yang <[email protected]>

* Update dependency JSON ordering #447

    * Update test expectations

Signed-off-by: Jono Yang <[email protected]>

* Set proper discovereddependencies related_name #447

Signed-off-by: Thomas Druez <[email protected]>

* Fix template indentation #447

Signed-off-by: Thomas Druez <[email protected]>

* Refactor update_from_data method into a UpdateFromDataMixin #447

Signed-off-by: Thomas Druez <[email protected]>

* Fix the ProjectSerializer fields #447

Signed-off-by: Thomas Druez <[email protected]>

* Fix test_scanpipe_api_project_detail unit test #447

Signed-off-by: Thomas Druez <[email protected]>

* Add HTML title for list views #506

Signed-off-by: Thomas Druez <[email protected]>

* Update dependency tabs #447

    * Only show links in dependency for_package tab or dependency datafile_resource tab if there is a value

Signed-off-by: Jono Yang <[email protected]>

* Use UpdateFromDataMixin #447

    * Use UpdateFromDataMixin in DiscoveredDependency
    * Create test for DiscoveredDependency.update_from_data()

Signed-off-by: Jono Yang <[email protected]>

* Fix formatting #447

Signed-off-by: Thomas Druez <[email protected]>

Signed-off-by: Jono Yang <[email protected]>
Signed-off-by: Thomas Druez <[email protected]>
Co-authored-by: Thomas Druez <[email protected]>