-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(spdx): use the hasExtractedLicensingInfos
field for licenses that are not listed in the SPDX
#8077
base: main
Are you sure you want to change the base?
fix(spdx): use the hasExtractedLicensingInfos
field for licenses that are not listed in the SPDX
#8077
Conversation
pkg/sbom/spdx/marshal.go
Outdated
return fmt.Sprintf("(%s)", license) | ||
}), " AND ") | ||
|
||
normalizedLicense, err := expression.Normalize(license, licensing.NormalizeLicense, expression.NormalizeForSPDX) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to override the expression.NormalizeForSPDX
function, but we use With
as a delimiter to parse licenses:
%token<token> IDENT OR AND WITH |
So there are problems with overwriting licenses with the wrong exception
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about my approach?
#8257
It would be as below. I just wrote a PoC, so it doesn't work as is, of course, but I hope you'll get my point.
otherLicenses := map[string]*spdx.OtherLicense{}
replaceOtherLicenses := func(expr expression.Expression) expression.Expression {
var licenseName string
switch e := expr.(type) {
case expression.SimpleExpression:
if expression.ValidSpdxLicense(e.License) {
return license
}
licenseName = e.License
case expression.CompoundExpression:
if e.Conjunction() != expression.TokenWith {
return expr
}
if expression.ValidSpdxLicense(e.Left().String()) && expression.ValidSpdxLicense(e.Right().String()) {
return expr
}
licenseName = e.String()
}
license := newOtherLicense(licenseName, false)
if license == nil {
...
}
otherLicenses[license.LicenseIdentifier] = license
return expression.SimpleExpr{License: license.LicenseIdentifier}
}
normalizedLicense, err := expression.Normalize(license, licensing.NormalizeLicense, expression.NormalizeForSPDX, replaceOtherLicenses)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
case expression.CompoundExpression:
if e.Conjunction() != expression.TokenWith {
return expr
}
if expression.ValidSpdxLicense(e.Left().String()) && expression.ValidSpdxLicense(e.Right().String()) {
return expression.SimpleExpr{License: expr.String()}
}
licenseName = e.String()
It looks like we need to return SimpleExpr
to a valid SPDX license to avoid a separate license and exception checking, but overall your approach should work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we need to return SimpleExpr to a valid SPDX license to avoid a separate license and exception checking
It's just a waste of time as it will validate the same licenses and exceptions twice, but it doesn't cause any bugs, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
won't we get something like duplicates?
I mean license with exception
, license
and separate exception
.
But maybe I'm wrong - I'll check and let you know
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe not, but let's see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we use otherLicense for exception.
e.g. for happy path with WITH operator
test:
-AFL-2.0 AND AFL-3.0 WITH Autoconf-exception-3.0
+AFL-2.0 AND AFL-3.0 WITH LicenseRef-ab9a5d8bfe1416c4
So i saved license with exception as SimpleExp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated in a8a85ad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this code will fix the issue and provide compliance SPDX license expressions.
I had a few suggestions to produce a bit more understandable results and a few coding suggestions.
} | ||
if text { | ||
otherLicense.ExtractedText = license | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} else { |
I would always include the license as the license name - not just when the text is present
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second thought, this may be appropriate if the license text is the actual text of the license. In most cases, the metadata for packages includes text that are supposed to be a license name or identifier in which case it should also be in the name. If we know the license text is really the text, then the existing code is OK.
If, however, the license text is what is found in the package metadata files and they are not the actual text, I would add the same field as the name PLUS add a LicenseComment to explain - something like `otherLicense.LicenseComment = "The license text represents text found in package metadata and may not represent the full text of the license"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are cases (e.g. license
field from python METADATA files) when we can't understand that license is name/id/text.
We tried to detect text
of license:
trivy/pkg/licensing/normalize.go
Lines 596 to 617 in f6acdf7
var licenseTextKeywords = []string{ | |
"http://", | |
"https://", | |
"(c)", | |
"as-is", | |
";", | |
"hereby", | |
"permission to use", | |
"permission is", | |
"use in source", | |
"use, copy, modify", | |
"using", | |
} | |
func isLicenseText(str string) bool { | |
for _, keyword := range licenseTextKeywords { | |
if strings.Contains(str, keyword) { | |
return true | |
} | |
} | |
return false | |
} |
So Trivy split license name/id and license text(text://
prefix).
That is why i used both LicenseName
and ExtractedText
fields:
https://github.com/DmitriyLewen/trivy/blob/f851f9bb18411db838fc65e0c6c4351b04953f8f/pkg/sbom/spdx/marshal_private_test.go#L92-L112
Another question - i used NOASSERTION
for LicenseName
and ExtractedText
fields (see link above), because these fields are mandatory (https://github.com/spdx/tools-golang/blob/f6e45fdb9e4e0c993105f798bee5f8aa8ea70f84/spdx/v2/v2_3/other_license.go#L12-L20).
Is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a LicenseComment to explain - something like `otherLicense.LicenseComment = "The license text represents text found in package metadata and may not represent the full text of the license"
I liked this idea 👍
Added in 041ab21
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So Trivy split license name/id and license text(text:// prefix).
That is why i used both LicenseName and ExtractedText fields
Makes sense - I wasn't sure how the text was captured. Since you are capturing the text and name distinctly, your approach should work. BTW, it's a bit tricky to find the start and end of the license text and even trickier to match it to know licenses - something we've been working on in the SPDX java libraries for about 10 years and still don't have it perfected ;).
i used NOASSERTION for LicenseName and ExtractedText fields (see link above), because these fields are mandatory (https://github.com/spdx/tools-golang/blob/f6e45fdb9e4e0c993105f798bee5f8aa8ea70f84/spdx/v2/v2_3/other_license.go#L12-L20).
Is this correct?
For the license name, this is OK. The spec isn't very clear on how these should be treated, so many people make up a name based on the text. Unlike other parts of the spec, NOASSERTION is not required if you don't know - but in this case I think it would be fine to use NOASSERTION - for the name.
For the text, I would put in whatever string was actually found - even if it is the name. The definition of the field is the license text found - so if someone puts in "This software is licensed under mylicense" - you can put that exact string in the text field even though it is not technically the text of "mylicense". I wouldn't use "NOASSERTION" for the text field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it! Thanks for your opinion!
Updated text field in 659f992
pkg/licensing/expression/category.go
Outdated
"VSFTPD-OPENSSL-EXCEPTION", | ||
"WXWINDOWS-EXCEPTION-3.1", | ||
"X11VNC-OPENSSL-EXCEPTION", | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment may deserve to be a separate suggestion, but in reading the code I would recommend building the license and exception IDs from the JSON files maintained by the SPDX legal team. The license list is updated every 3 months with new IDs and maintaining these in code can be a challenge to keep up and maintain. What I do in the code I maintain is attempt to access the current JSON files on the website https://spdx.org/licenses/licenses.json and https://spdx.org/licenses/exceptions.json. If I can not access the website or if the user specified not to use the online version, I'll use a cached version of the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are cases when users run multiple times.
Downloading these files for each run is not good.
But we can save licenses.json
and exceptions.json
files in the cache dir and use them.
The files contain releaseDate
field, so we can update this file only when releaseDate + 3 months
has expired.
The license list is updated every 3 months
How strictly is this rule followed?
Anyway let's move this discussion into another issue/pr.
What I do in the code I maintain is attempt to access the current JSON files on the website https://spdx.org/licenses/licenses.json and https://spdx.org/licenses/exceptions.json
I found that https://spdx.org/licenses/exceptions.json and https://github.com/spdx/license-list-data/blob/592c2dcb8497c6fe829eea604045f77d3bce770b/json/exceptions.json are different (see harbour-exception
).
Which file would be more correct to use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How strictly is this rule followed?
Not very strictly. There is, however, a license list version field which is reliably incremented on release.
I found that https://spdx.org/licenses/exceptions.json and https://github.com/spdx/license-list-data/blob/592c2dcb8497c6fe829eea604045f77d3bce770b/json/exceptions.json are different (see harbour-exception).
Which file would be more correct to use?
The lists at https://spdx.org/licenses - these will always be the latest released version. The github repo master will have the latest in development version which may not be stable. The github repo is tagged with release versions, so if you go to the tag for the latest release in github, it will match what is on the website.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, update exception list in 659f992
} | ||
return NormalizeLicense(c.Licenses) | ||
otherLicense.LicenseIdentifier = LicenseRefPrefix + "-" + licenseID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor suggestion - you could simplify this by including the "-" in the LicenseRefPrefix definition and just use LicenseRefPrefix + licenseID
The string "LicenseRef" will likely never be used without the trailing "-".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is our internal decision - we don't use -
suffixes in constants :)
pkg/licensing/expression/category.go
Outdated
// SpdxLicenseExceptions contains all supported SPDX Exceptions | ||
// cf. https://spdx.org/licenses/exceptions.json | ||
// used `awk -F'"' '/"licenseExceptionId":/ {print toupper("\"" $4 "\"," )}' exceptions.json ` command |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we need to keep it up-to-date, it should be done by mage spdx
or something like that. I think we should create a separate file for the list and add
// Code generated by "mage spdx", DO NOT EDIT.
// source: https://spdx.org/licenses/exceptions.json
to the header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to check exceptions.json
file in tests?
I mean the same as for mage docs:generate
This will help keep the file up-to-date, but can be noisy for PRs when a new version of file is released.
On the other hand, we can add a separate action to check the file's relevance once a week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using curl
and awk
through go generate
is the easiest way, but some environments don't have curl
, and CLI flags might be different. Ideally, we should do that in Go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to check exceptions.json file in tests?
No, we don't need it for now. We can update the file when we notice that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's what i planned
this command (using awk
and other commands) is just a quick way to get all exceptions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can add a separate action to check the file's relevance once a week.
This sounds better. Also, we don't need to fail the test. We can notify it on Microsoft Teams.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated PR.
Take a look, when you have time, please.
pkg/licensing/expression/category.go
Outdated
}) | ||
|
||
// ValidSpdxLicense returns true if SPDX license lists contain licenseID and license exception (if exists) | ||
func ValidSpdxLicense(license string) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: IsValidSPDXLicense or ValidateSPDXLicense are more aligned with the naming style in Trivy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed in 48a46b8
pkg/licensing/expression/category.go
Outdated
if spdxLicenses == nil { | ||
initSpdxLicenses() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we can just call this function as nil is checked at the beginning of the function.
if spdxLicenses == nil { | |
initSpdxLicenses() | |
} | |
initSpdxLicenses() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed in 8a96a8b
pkg/licensing/expression/category.go
Outdated
if spdxExceptions == nil { | ||
initSpdxExceptions() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed in 8a96a8b
pkg/licensing/expression/category.go
Outdated
@@ -359,3 +371,79 @@ var ( | |||
ZeroBSD, | |||
} | |||
) | |||
|
|||
var spdxLicenses map[string]struct{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After merging #8149 into this branch, you can use set.Set.
var spdxLicenses map[string]struct{} | |
var spdxLicenses set.Set[string] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated in 82a52f1
pkg/licensing/expression/category.go
Outdated
//go:embed exceptions.json | ||
var exceptions []byte | ||
|
||
var spdxExceptions map[string]struct{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated in 82a52f1
pkg/licensing/expression/category.go
Outdated
id, exception, ok := strings.Cut(license, " WITH ") | ||
if _, licenseIdFound := spdxLicenses[id]; licenseIdFound { | ||
if !ok { | ||
return true | ||
} | ||
if _, exceptionFound := spdxExceptions[strings.ToUpper(exception)]; exceptionFound { | ||
return true | ||
} | ||
} | ||
return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I tried to make the nest less, but it's just my preference. You can decide it.
id, exception, hasException := strings.Cut(license, " WITH ")
if !spdxLicenses.Contains(id) {
return false
}
if !hasException {
return true
}
if !spdxExceptions.Contains(strings.ToUpper(exception)) {
return false
}
return true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I separated this logic in a8a85ad
pkg/sbom/spdx/marshal.go
Outdated
@@ -99,6 +102,7 @@ func NewMarshaler(version string, opts ...marshalOption) *Marshaler { | |||
format: spdx.Document{}, | |||
hasher: hashstructure.Hash, | |||
appVersion: version, | |||
logger: log.WithPrefix("SPDX"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The current prefix is always lowercase.
https://github.com/aquasecurity/trivy/blob/a2ebc32f7172961f8081af2a99f1467d67e568ec/pkg/log/logger.go/#L20-L27
logger: log.WithPrefix("SPDX"), | |
logger: log.WithPrefix("spdx"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed in b38b6eb
pkg/licensing/expression/category.go
Outdated
|
||
var exs []string | ||
if err := json.Unmarshal(exceptions, &exs); err != nil { | ||
log.WithPrefix("SPDX").Warn("Unable to parse SPDX exception file", log.Err(err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to define a const for SPDX as it's used in several places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added in b38b6eb
pkg/sbom/spdx/marshal.go
Outdated
return fmt.Sprintf("(%s)", license) | ||
}), " AND ") | ||
|
||
normalizedLicense, err := expression.Normalize(license, licensing.NormalizeLicense, expression.NormalizeForSPDX) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about my approach?
#8257
It would be as below. I just wrote a PoC, so it doesn't work as is, of course, but I hope you'll get my point.
otherLicenses := map[string]*spdx.OtherLicense{}
replaceOtherLicenses := func(expr expression.Expression) expression.Expression {
var licenseName string
switch e := expr.(type) {
case expression.SimpleExpression:
if expression.ValidSpdxLicense(e.License) {
return license
}
licenseName = e.License
case expression.CompoundExpression:
if e.Conjunction() != expression.TokenWith {
return expr
}
if expression.ValidSpdxLicense(e.Left().String()) && expression.ValidSpdxLicense(e.Right().String()) {
return expr
}
licenseName = e.String()
}
license := newOtherLicense(licenseName, false)
if license == nil {
...
}
otherLicenses[license.LicenseIdentifier] = license
return expression.SimpleExpr{License: license.LicenseIdentifier}
}
normalizedLicense, err := expression.Normalize(license, licensing.NormalizeLicense, expression.NormalizeForSPDX, replaceOtherLicenses)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I save and check exceptions in uppercase because we don't have normalization for exceptions like we do for licenses.
I suggest waiting for user feedback.
Maybe we need to update this logic (e.g. check in uppercase but save in original (from this file) case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we need to update this logic (e.g. check in uppercase but save in original (from this file) case).
In the SPDX tools I maintain, I take the approach of saving in the original case but comparing ignoring case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description
For cases where the SPDX license list does not contain the detected license, we should use the
hasExtractedLicensingInfos
field.See #7721 for more details.
Before:
After:
test runs for cron action:
Related issues
hasExtractedLicensingInfos
for licenses not in the SPDX license list #7721Checklist