Some packages scanned but license was not discovered although the license file exists #662

Closed
RabeeaEgbareia opened this issue Apr 4, 2023 · 22 comments

Comments

@RabeeaEgbareia

I scanned a zip file containing several packages with the scan_codebase pipeline. The job finished successfully, but some packages (for example, the iperf package) were scanned and no license was discovered, although the license file (LICENSE) exists.
I'm using the API endpoint /api/projects/[PROJECT_ID]/packages/ to get the package licenses.

I attached the example zip file I mentioned above (iperf.zip)

  • I'm using v32.10
    image

iperf.zip

@RabeeaEgbareia RabeeaEgbareia changed the title Some packages scanned but license was not discovered although the license file exist Some packages scanned but license was not discovered although the license file exists Apr 4, 2023
@mjherzog
Member

mjherzog commented Apr 4, 2023

iperf.zip is an archive, not a Package in the ScanCode.io context. SCIO will only report package data where you have package metadata as for npms, RubyGems, etc. You should be able to get the file-level data for iperf from the Scan Resources.

@RabeeaEgbareia
Author

@mjherzog
So it seems I misunderstood something.

  1. "SCIO will only report package data where you have package metadata as for npms, RubyGems, etc." Where can I find the full list?
  2. What do you suggest I do to find the license(s) used in iperf.zip? I see the scan resources results and it is a lot of information. Can you please explain which fields I should use?
    I'm using the API endpoint: api/projects/[PROJECT_ID]/results/

@tdruez
Contributor

tdruez commented Apr 5, 2023

"SCIO will only report package data where you have package metadata as for npms, RubyGems, etc.", where can I find all the list ?

https://scancode-toolkit.readthedocs.io/en/doc-update-licenses/reference/available_package_parsers.html

What do you suggest I do to find the license(s) used in iperf.zip?

  1. You can have a look at the charts from the project page UI to get an overview of the licenses detected in the input archive:
    Screenshot 2023-04-05 at 11 59 35

  2. You can click on the chart to get a list of resources related to a specific license or check the whole list of resources.
    Screenshot 2023-04-05 at 12 02 43

  3. You can look at a file's content to visualize the snippets of text where the licenses were detected:
    Screenshot 2023-04-05 at 12 08 49

I see the scan resources results and it is a lot of information.

Indeed.

I'm using the API

REST APIs are a great way to exchange data, not so much to review scan results.

@RabeeaEgbareia
Author

RabeeaEgbareia commented Apr 5, 2023

@tdruez
I'm integrating scancode.io as an automation tool into my product. For the automation I'm using only the REST API of scancode.io (including uploading files and starting pipelines), and I need to report all the results with TeamCity (the final result for me is to know all the licenses used in my different projects and which packages they belong to).
So the graphs in this case are nice extra information and good to know, but the graphs are not the goal.

REST APIs are a great way to exchange data, not so much to review scan results.

So what's your suggestion? What's the fastest way to reach the goal I mentioned above?

@tdruez
Contributor

tdruez commented Apr 5, 2023

You can get the whole scan data results for each individual file resource and package from the results API endpoint: /api/projects/UUID/results/

The fields you are looking for are ["packages"]["license_expression"], ["files"]["license_expressions"], and ["files"]["for_packages"]
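
As a rough illustration of the fields named above, the following Python sketch walks a results payload that has already been parsed from JSON. It assumes the payload has top-level "packages" and "files" lists shaped as described in this comment. The "purl", "name" and "path" keys, the function name, and the sample data are assumptions made for the example and do not come from this thread.

```python
from collections import defaultdict


def summarize_licenses(results):
    """Collect license expressions per package and per file from a results payload."""
    package_licenses = {}
    for package in results.get("packages", []):
        # Assumption: packages carry a "purl" (or at least a "name") usable as a key.
        key = package.get("purl") or package.get("name", "<unnamed>")
        package_licenses[key] = package.get("license_expression") or None

    file_licenses = {}
    licenses_by_package = defaultdict(set)
    for resource in results.get("files", []):
        expressions = resource.get("license_expressions") or []
        if not expressions:
            continue
        # Assumption: each file entry has a "path" key.
        file_licenses[resource.get("path")] = sorted(set(expressions))
        # "for_packages" links a file to the packages it belongs to (may be empty).
        for package_ref in resource.get("for_packages") or []:
            licenses_by_package[package_ref].update(expressions)

    return package_licenses, file_licenses, dict(licenses_by_package)


# Made-up sample payload, only to show the expected shape.
sample = {
    "packages": [],
    "files": [
        {"path": "example/LICENSE", "license_expressions": ["apache-2.0", "apache-2.0"], "for_packages": []},
        {"path": "example/src/main.c", "license_expressions": [], "for_packages": []},
    ],
}
package_licenses, file_licenses, licenses_by_package = summarize_licenses(sample)
```

For an archive without package manifests, such as the one discussed here, "packages" would be empty and the useful data would be in the per-file mapping.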

@RabeeaEgbareia
Author

@tdruez
Thanks a lot !

@RabeeaEgbareia
Author

@tdruez
A question:
Why does the ["packages"] field have only one license expression, while ["files"] can have several license expressions?
Can a package like rpm, npm, RubyGems, etc. have only one license, while an archive folder (like iperf) can have more than one license?

@AyanSinhaMahapatra
Member

Hey @RabeeaEgbareia

TL;DR: this was a wart, for no specific reason. It is being standardized in later versions, with one license-expression string for both files and packages.

This is the scancode-toolkit v31.x output format, btw. We are releasing v32.x shortly, and it has a separate output format where both files and packages have a single (detected/declared) license expression (and a secondary license expression for packages). We will also update scancode.io to this new output format, see #569.

Previously, in v31 (which is what the latest scancode.io is currently using), we had, for files, licenses containing the license matches and license-expressions holding their respective license expressions, and this was a list. But for packages we didn't show license matches similarly, and in most cases we performed case-specific license detection; hence we had just one license expression as a result.

See also https://scancode-toolkit.readthedocs.io/en/doc-update-licenses/reference/license-detection-reference.html for more details on this.
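
A client that has to survive this format change could read both shapes through one small helper. The sketch below is only an assumption-laden illustration: "license_expressions" and "license_expression" are the field names mentioned in this thread, while "detected_license_expression" and "declared_license_expression" are assumed names for the newer single-expression fields and should be checked against the changelog of the release you actually run.

```python
def file_license_expressions(resource):
    """Return the license expressions of one file entry as a list, in either format."""
    # Older format discussed above: a list under "license_expressions".
    if "license_expressions" in resource:
        return list(resource["license_expressions"] or [])
    # Newer format: a single expression string (key name is an assumption).
    expression = resource.get("detected_license_expression")
    return [expression] if expression else []


def package_license_expression(package):
    """Return the single license expression of one package entry, or None."""
    # "declared_license_expression" is an assumed name for the newer format.
    return (
        package.get("declared_license_expression")
        or package.get("license_expression")
        or None
    )
```

Returning a list for files in both cases keeps the calling code unchanged when the server is upgraded.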

@RabeeaEgbareia
Author

@AyanSinhaMahapatra
Thank you for the clarification.
For now I will manage with the current output; I will update my project to the new v32.x release once it is released.
Do you have an estimated time? How can I be notified about the new release?

@AyanSinhaMahapatra
Member

AyanSinhaMahapatra commented Apr 10, 2023

Do you have an estimated time ?

I hope this week 🤞 (this is a stable release of scancode-toolkit we are talking about btw, a scancode.io update will take longer of course)

I will update my project to the new v32.x release once it is released

Great! It has a lot of improvements.
But note that v32.0.0rc3 is also stable in terms of output format, we are not changing anything there anymore.

How can I be notified about the new release?

We do announce new releases in https://matrix.to/#/#aboutcode-org_discuss:gitter.im and https://matrix.to/#/#aboutcode-org_scancode:gitter.im, and we are also considering automating release updates for all our projects (aboutcode-org/aboutcode#122), but we are not there yet.

@pombredanne
Member

@tdruez A question: Why does the ["packages"] field have only one license expression, while ["files"] can have several license expressions? Can a package like rpm, npm, RubyGems, etc. have only one license, while an archive folder (like iperf) can have more than one license?

@RabeeaEgbareia The license_expression of a package is derived from a single input, which is the package manifest field that stores a license, such as a Maven POM <licenses> tag or an npm "license" package.json attribute: we derive a single license expression from this single tag.

In contrast, each file may have many different discrete license statements in multiple positions: we report one license expression for each in these cases.

Note that as @AyanSinhaMahapatra mentioned above we have updated formats coming up.

@RabeeaEgbareia on another note you wrote:

For now I will manage with the current output; I will update my project to the new v32.x release once it is released.

I am curious about what your project is! Can you share some details about it?

@RabeeaEgbareia
Author

@pombredanne
Thanks! So, do you suggest that I wait with integrating scancode.io into my project until the new version is released? Does the new version have significant changes in the format?

@pombredanne for your second question

I am curious about what your project is! Can you share some details about it?

As I wrote above:

I'm integrating scancode.io as an automation tool into my product. For the automation I'm using only the REST API of scancode.io (including uploading files and starting pipelines), and I need to report all the results with TeamCity (the final result for me is to know all the licenses used in my different projects and which packages they belong to).
So the graphs in this case are nice extra information and good to know, but the graphs are not the goal.

And:
I'm integrating the scancode.io API in Java, which means writing Java code that uses the scancode.io API to scan the projects we use in our company, so that we are aware of which licenses we use in our projects and which packages use these licenses.
My scan results and analysis should look like the following:
image

Every week we should scan our projects and know whether we started using new packages, which licenses were added that week, and which licenses were already in use ("old licenses").

I hope I explained myself clearly enough; if not, let me know if you have more questions.
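
The weekly comparison described above comes down to set arithmetic over the license expressions collected from two scans. Here is a minimal Python sketch of that idea (the thread's integration is in Java, but the logic carries over directly). The function name, the report keys, and the sample values are hypothetical.

```python
def compare_scans(previous, current):
    """Split the licenses of the current scan into new ones and ones already in use."""
    previous, current = set(previous), set(current)
    return {
        "new": sorted(current - previous),             # first seen this week
        "already_in_use": sorted(current & previous),  # the "old licenses"
        "no_longer_seen": sorted(previous - current),  # dropped since last week
    }


# Made-up license keys for two consecutive weekly scans.
last_week = {"apache-2.0", "mit"}
this_week = {"apache-2.0", "mit", "gpl-2.0"}
report = compare_scans(last_week, this_week)
```

The same function could be applied to sets of package identifiers to report newly added packages.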

@pombredanne
Member

Thanks! So, do you suggest that I wait with integrating scancode.io into my project until the new version is released? Does the new version have significant changes in the format?

I would not wait. We will try to minimize the impact and will provide upgrade instructions when this happens.

@RabeeaEgbareia
Author

Ok thanks

@RabeeaEgbareia
Author

Hi all,
@pombredanne @tdruez @AyanSinhaMahapatra @mjherzog
I have another question:

What is the difference between "license_expression" and "declared_license" in "packages"?
Why am I getting results like this?
(I'm using the same image versions.)

"license_expression": "unknown",
"declared_license": "[{'url': 'http://www.apache.org/licenses/LICENSE-2.0.txt', 'name': 'Apache 2', 'comments': None, 'distribution': 'repo'}]",

@pombredanne
Member

@RabeeaEgbareia You wrote

I have another question:

It is usually better to start a new issue for a new topic ;)

What is the difference between "license_expression" and "declared_license" in "packages"?

  • "declared_license" is the original license tag, field or structure found in a package manifest, converted to a plain data structure. For instance, this "declared_license": "[{'url': 'http://www.apache.org/licenses/LICENSE-2.0.txt', 'name': 'Apache 2', 'comments': None, 'distribution': 'repo'}]" is likely seen in a Maven POM XML such as https://repo1.maven.org/maven2/org/parboiled/parboiled-core/0.10.1/parboiled-core-0.10.1.pom, but here the XML is converted to plain data. Note that the "declared_license" field is renamed to "extracted_license_statement" in the upcoming release of ScanCode Toolkit and will also be renamed in ScanCode.io.

  • "license_expression" is the normalized license expression detected from the declared_license/extracted_license_statement field. It is a bug that we report this as "unknown" here. There are a lot of pending fixes for Maven POM licenses being worked on in "License detection improvements and review" (scancode-toolkit#3346), FWIW.
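
Until that bug is fixed, a client could at least surface the raw declared value when "license_expression" comes back as "unknown". The "declared_license" string quoted above looks like the text form of a Python list of dicts, so the following sketch parses it with `ast.literal_eval` from the standard library. That reading of the format is an assumption based on this single example, and the function name is hypothetical. The result is unnormalized manifest text, not a license expression.

```python
import ast


def parse_declared_license(declared_license):
    """Parse a stringified declared_license value into a list of dicts, if possible."""
    try:
        value = ast.literal_eval(declared_license)
    except (ValueError, SyntaxError):
        # Plain text such as "MIT" is not a Python literal: keep it as-is.
        return [{"name": declared_license}]
    if isinstance(value, dict):
        return [value]
    if isinstance(value, list):
        return [item if isinstance(item, dict) else {"name": str(item)} for item in value]
    return [{"name": str(value)}]


# The value quoted in this thread.
declared = "[{'url': 'http://www.apache.org/licenses/LICENSE-2.0.txt', 'name': 'Apache 2', 'comments': None, 'distribution': 'repo'}]"
entries = parse_declared_license(declared)
names = [entry.get("name") for entry in entries]
```

`ast.literal_eval` only evaluates literals, so it is safe to run on untrusted scan output, unlike `eval`.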

@RabeeaEgbareia
Author

RabeeaEgbareia commented May 1, 2023

@pombredanne

It is usually better to start a new issue for a new topic ;)

Ok, Got it :)

Thanks for the explanation.
I used "license_expression" in my project. What do you suggest I do? Is there any estimate of when this will be fixed?
For now, it does not seem trivial (for me) to start using "declared_license" instead of "license_expression".

@AyanSinhaMahapatra
Member

@RabeeaEgbareia this is fixed in scancode-toolkit and we are working on #569 which would be in scancode.io v33 here. I'm not sure I have an accurate estimate on the release, but this is being actively worked on and is top priority for us at this point.

As mentioned above, we have also streamlined and renamed some of our license fieldnames as seen here in the CHANGELOG, and we would have detailed documentation on this too for upgrading.

I used "license_expression" in my project, what do you suggest me to do

This would be the way IMHO for now, as opposed to using declared_license, which is not normalized/mapped to an actual license expression: it is just the text as-is, as found in the manifest.

@RabeeaEgbareia
Author

Hi again, @AyanSinhaMahapatra
Is there any progress on this issue? Any estimate for the stable version?

@RabeeaEgbareia this is fixed in scancode-toolkit and we are working on #569 which would be in scancode.io v33 here. I'm not sure I have an accurate estimate on the release, but this is being actively worked on and is top priority for us at this point.

As mentioned above, we have also streamlined and renamed some of our license fieldnames as seen here in the CHANGELOG, and we would have detailed documentation on this too for upgrading.

I used "license_expression" in my project, what do you suggest me to do

This would be the way IMHO for now, as opposed to using declared_license, which is not normalized/mapped to an actual license expression: it is just the text as-is, as found in the manifest.

@AyanSinhaMahapatra
Member

Hi @RabeeaEgbareia
Yes!

We do have a stable release of scancode-toolkit out now, currently at 32.0.4 and this is also supported in the latest scancode.io release: https://github.com/nexB/scancode.io/releases/tag/v32.3.0 (upgrade done in #752 and #772, see these and the documentation at https://scancode-toolkit.readthedocs.io/en/stable/reference/license-detection-reference.html for updating reference).

@RabeeaEgbareia
Author

@AyanSinhaMahapatra
Thank you ! I will install it again and try.

@tdruez
Contributor

tdruez commented Jul 3, 2023

Closing as completed.

@tdruez tdruez closed this as completed Jul 3, 2023