-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Package scanning update #482
Conversation
* Update scan_for_application_packages to save detected Package data to the CodebaseResource it is from, then iterate through the CodebaseResources with Package data and use the proper Package handler to process the Package data * Create DiscoveredDependency model * Add package_data JSON field to CodebaseResource Signed-off-by: Jono Yang <[email protected]>
* Increase field sizes in DiscoveredDependency Signed-off-by: Jono Yang <[email protected]>
Signed-off-by: Jono Yang <[email protected]>
* Create DiscoveredDependencySerializer Signed-off-by: Jono Yang <[email protected]>
Signed-off-by: Jono Yang <[email protected]>
Signed-off-by: Jono Yang <[email protected]>
Signed-off-by: Jono Yang <[email protected]>
* We never run into the situation where we are updating a DiscoveredDependency when scanning packages * Clean up migrations Signed-off-by: Jono Yang <[email protected]>
* Update scancode-toolkit to 31.0.0rc3 #447 * Update expected test results * Update code formatting Signed-off-by: Jono Yang <[email protected]>
* This is to reflect the changes made to the Package model in scancode-toolkit * Update test expectations Signed-off-by: Jono Yang <[email protected]>
* purl and dependency_uid are the only required fields on DiscoveredDependency * Update migrations * Check for created DiscoveredDependency in tests Signed-off-by: Jono Yang <[email protected]>
* Create DiscoveredDependencies in load_codebase pipeline * Update tests to check for DiscoveredDependencies * Update test expectations Signed-off-by: Jono Yang <[email protected]>
* Update test expectations Signed-off-by: Jono Yang <[email protected]>
* This will fail until scancode-toolkit 31.0.0rc4 is released * Update test expectations Signed-off-by: Jono Yang <[email protected]>
Signed-off-by: Jono Yang <[email protected]>
Signed-off-by: Jono Yang <[email protected]>
Signed-off-by: Jono Yang <[email protected]>
Signed-off-by: Jono Yang <[email protected]>
Signed-off-by: Jono Yang <[email protected]>
Some more to do's:
|
* Remove related dependencies when resetting a project Signed-off-by: Jono Yang <[email protected]>
Let's try to make it part of the same migration to limit the number of migration files.
Can you try to refresh/reset your cache? The tab system is working fine on my side in chrome. On a more general note, I think there's a bit too much within this one PR and it makes it harder to review and test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code quality is great! I think we are also missing a few tests for the new functions and methods.
scanpipe/models.py
Outdated
def parent_path(self): | ||
""" | ||
Return the parent path for this CodebaseResource or None. | ||
""" | ||
return parent_directory(self.path, with_trail=False) | ||
|
||
def has_parent(self): | ||
""" | ||
Return True if this CodebaseResource has a parent CodebaseResource or | ||
False otherwise. | ||
""" | ||
parent_path = self.parent_path() | ||
if not parent_path: | ||
return False | ||
if self.project.codebaseresources.filter(path=parent_path).exists(): | ||
return True | ||
return False | ||
|
||
def parent(self, codebase=None): | ||
""" | ||
Return the parent CodebaseResource object for this CodebaseResource or | ||
None. | ||
|
||
`codebase` is not used in this context but required for compatibility | ||
with the commoncode.resource.Codebase class API. | ||
""" | ||
parent_path = self.parent_path() | ||
return parent_path and self.project.codebaseresources.get(path=parent_path) | ||
|
||
def has_siblings(self, codebase=None): | ||
""" | ||
Return True is this CodebaseResource has siblings. | ||
|
||
`codebase` is not used in this context but required for compatibility | ||
with the commoncode.resource.Codebase class API. | ||
""" | ||
return self.has_parent() and self.parent(codebase).has_children() | ||
|
||
def siblings(self, codebase=None): | ||
""" | ||
Return a sequence of sibling Resource objects for this Resource | ||
or an empty sequence. | ||
|
||
`codebase` is not used in this context but required for compatibility | ||
with the commoncode.resource.Codebase class API. | ||
""" | ||
if self.has_parent(): | ||
return self.parent(codebase).children(codebase) | ||
return [] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are those new method related to this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These methods are called in the .assembly()
methods of different Package handlers in packagedcode. Some of these methods has logic that determine the Resources of a Package based on the files around a Package manifest. e.g. https://github.com/nexB/scancode-toolkit/blob/develop/src/packagedcode/pypi.py#L185
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've also updated the CodebaseResource walk test to also test for these methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok let's make sure we have full test coverage for those.
_scan_and_save( | ||
resource_qs=resource_qs, | ||
scan_func=scan_for_package_data, | ||
save_func=save_scan_package_results, | ||
) | ||
|
||
# Iterate through CodebaseResources with Package data and handle them using | ||
# the proper Package handler from packagedcode | ||
assemble_packages(project=project) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are the performances of the new assemble? Do we have some stats about the time spent to assemble compared to the scan_and_save processing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have any metrics on the performance impact that assemble_packages
has on application package scanning. What is the process you do to collect performance metrics on different scancode.io parts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We run the pipeline using various inputs pre-changes then again post-changes and we can compare the execution time of the relevant steps.
Also, a profiling utility is available to go deeper into optimizations: https://github.com/nexB/scancode.io/blob/main/scanpipe/pipelines/__init__.py#L155
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are the performances of the new assemble? Do we have some stats about the time spent to assemble compared to the scan_and_save processing?
The new assemble is a must have functional change that we should get working and tested before we can optimize if we ever need to optimize? ... @tdruez did you experience an issue?
Signed-off-by: Jono Yang <[email protected]>
5671373
to
2640c2f
Compare
* Show and link relations in the views Signed-off-by: Jono Yang <[email protected]>
* Remove dependency field from CodebaseResourceSerializer output * Update dependency donut chart labels * Revert Package URL hover to package_uid value in package_list view Signed-off-by: Jono Yang <[email protected]>
83f545d
to
4da9b62
Compare
* Update walk test to also test for new parent and sibling related methods on CodebaseResource * Remove has_sibling method from CodebaseResource Signed-off-by: Jono Yang <[email protected]>
4da9b62
to
205dddf
Compare
@tdruez Thanks! I've responded to your comments. Now, I am wondering about the best approach for #444, where we want to represent individual instances of Packages and Dependencies. Right now, in the case where we have detected the same package multiple times, we create unique packages for each instance even though it is essentially the same package. |
I am closing this PR and splitting it into two:
|
This PR updates application package scanning in scancode.io to reflect the updates made to package scanning in scancode-toolkit. Most notably, the DiscoveredDependency model has been added to scancode.io (as the analogue to the Dependency model from scancode-toolkit), and the
scan_for_application_packages
scancode pipe function has been updated to run theassemble()
methods from scancode-toolkit Package handlers. This allows us to create DiscoveredDependencies for DiscoveredPackages and to properly associate application packages to CodebaseResources.The Project API has been updated with the following:
dependency_count
discovered_dependency_summary
total
is_runtime
is_optional
is_resolved
The Project detail view has been updated to show the DiscoveredDependencies stats and a DiscoveredDependency view, but more work on the UI will be needed. In particular, the field labels (I'm not sure what the best labels are) and colors used for boolean values should be updated.
The DiscoveredDependency view field labels, column widths, and search box (doesn't filter) need to be updated as well.
This feature requires aboutcode-org/scancode-toolkit#3035, which will be available in scancode-toolkit 31.0.0rc4