Idea: REUSE materialize to annotate code based on reuse.toml #921

Open
nicorikken opened this issue Feb 25, 2024 · 4 comments
Labels
discussion needed enhancement New feature or request

Comments

@nicorikken
Member

Idea: An option to take the copyright and license information from reuse.toml and apply it to an existing codebase.

You would run:

$ reuse materialize

And all files would be annotated with headers or .license files with the information provided in the reuse.toml files.

Reason

There are some use-cases where annotating a codebase directly is not ideal, because the code needs to be kept up to date or because it would result in many .license files cluttering the codebase. The .reuse/dep5 file and the future reuse.toml cover these use-cases by providing a designated location to place annotations. Still, this method is less explicit than annotating all files and should be considered a last resort.

Use-cases

Annotate for release

This Materialize option would enable projects to explicitly annotate a codebase when packaging it for release. This way they can distribute code that is as REUSE-able as possible.

Annotate declaratively

If implemented, it would also allow users to use a reuse.toml file to declaratively annotate codebases that have more complex copyright and license information: first describe the copyright and license information in the reuse.toml file, then annotate files accordingly using Materialize.

Use as preprocessor for license scanner

Not all software license scanners properly detect the copyright and license information of codebases considered REUSE-compliant, because they lack support for .license files, let alone the .reuse/dep5 or reuse.toml file. With this feature, those tools could use the Materialize option as a preprocessor for REUSE-compliant projects. Note that those tools would still have to support .license files.

What if we don't implement it?

Codebases can still be REUSE-compliant. Users can manually run reuse annotate on files according to the reuse.toml. They can even create a patch set to repeatedly apply to codebases that shouldn't be touched.

Risks of feature creep

When users start using the reuse.toml file as a base to annotate, they might want to have more control mechanisms in place, like how the files should be annotated (comment style, modify files or add .license file).

@mxmehl
Member

mxmehl commented Feb 26, 2024

I like the idea, but I wonder about the input. Wouldn't it make more sense to use an SBOM on the input side? This could broaden the use-cases: an organisation could export an SBOM for a project and run the materialize command to make the project REUSE-compliant.

@silverhook
Contributor

I see pros and cons of relying on a (full) SBOM for this.

pro:

  • whoever runs the reuse materialize would have (presumably) already made a full scan of the package (but this can be worked around with a simple reuse spdx)
  • in certain cases, the user (esp. companies) might already have an SBOM lying around

contra:

  • with SBOMs parsing gets pretty complex pretty fast
    • SPDX and/or CycloneDX?
    • which version(s)?
    • if SPDX, which of the flavours?
    • do you take into account “declared” or “concluded” info? What if the SBOM includes both, but not every file with a declared field also has a concluded field? What if the person does not run a tool that supports conclusions?
  • how would someone create an SBOM with a scanner if there are no REUSE tags (and, I assume, no other license/copyright info) in the source code already? One option would be to fire up FOSSology, select everything and batch add the copyright and license info …which sounds like globbing in reuse.toml with extra steps and a preference for a specific (type of) tool
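
To illustrate the “declared” vs “concluded” question, here is a minimal sketch of the kind of preference rule an SBOM importer would need. It assumes SPDX 2.x JSON, where each file entry may carry `licenseConcluded` and `licenseInfoInFiles`; the latter stands in here for the non-concluded information. The function name `pick_licenses` and the `prefer` parameter are hypothetical, and the fallback order is just one possible policy.

```python
# Values that SPDX uses to signal "nothing usable here".
NO_INFO = {None, "", "NOASSERTION", "NONE"}


def pick_licenses(file_entry: dict, prefer: str = "concluded") -> list[str]:
    """Return license expressions for one SPDX file entry.

    Uses the preferred field when it holds real information and
    falls back to the other field otherwise.
    """
    concluded = file_entry.get("licenseConcluded")
    concluded_list = [] if concluded in NO_INFO else [concluded]
    found_list = [
        item
        for item in file_entry.get("licenseInfoInFiles", [])
        if item not in NO_INFO
    ]
    if prefer == "concluded":
        first, second = concluded_list, found_list
    else:
        first, second = found_list, concluded_list
    return first or second
```

An empty result means the SBOM gives no usable license for that file, which is exactly the case where an importer would have to ask the user or skip the file.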

(playing devil’s advocate a bit, don’t be mad)

Ultimately, I think the SBOM as input idea is also good, but perhaps in addition to the reuse.toml input idea.

@mxmehl
Member

mxmehl commented Feb 26, 2024

You have a point there, definitely.

I do wonder, also with regard to the risk of feature creep, whether standalone scripts (for ingesting reuse.toml and SBOMs) would be the better way. The reuse.toml script would probably depend on the output of reuse lint --json, while the SBOM "importers" would probably need to ask for preferences on the license field(s).
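
As a rough sketch of the first half of such a standalone script, the snippet below takes an already parsed `reuse lint --json` report and lists the files whose information comes from somewhere other than the file itself or its .license sibling. It assumes the report has a `files` list whose entries carry `path`, `copyrights` and `spdx_expressions`, each item with a `source` field; that shape should be checked against the installed version of the tool. The function name `files_needing_materialization` is hypothetical.

```python
def files_needing_materialization(report: dict) -> dict[str, list[str]]:
    """Map each file to the external sources its licensing info comes from.

    "External" means anything other than the file itself or the
    adjacent `<file>.license`, e.g. `.reuse/dep5` or `reuse.toml`.
    """
    result = {}
    for entry in report.get("files", []):
        path = entry["path"]
        # Assumed shape: each item is {"value": ..., "source": ...}.
        items = entry.get("copyrights", []) + entry.get("spdx_expressions", [])
        local = (path, path + ".license")
        external = sorted(
            {item["source"] for item in items if item.get("source") not in local}
        )
        if external:
            result[path] = external
    return result
```

The report could be obtained with `json.loads` on the output of `reuse lint --json`; the resulting mapping tells the script which files would still need a header or a .license file.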

Another reason why I somehow dislike the idea of putting the command into the reuse core is that it promotes a non-recommended practice (putting everything in reuse.toml). That said, I appreciate the general idea even if I currently wouldn't need it in my daily life.

@silverhook
Contributor

How about importing from .ABOUT files?
