Canonical license texts? #1452
Comments
Also see the related discussion at #1924. |
This has also come up in the context of FSFE's REUSE tooling, and their desire to use the test files as source content for license texts. There's a mildly-related discussion in the long thread at https://lists.spdx.org/g/Spdx-legal/topic/88334638#3095 My two cents:
|
@swinslow to your last point: people should consume them from https://github.com/spdx/license-list-data. We can obviously create another repo (e.g., spdx/canonical-texts) instead of a directory in this repo; this might be cleaner. |
Reference #1396 (comment) Perhaps we should have a real-time discussion on this to finally decide what the solution is? I can support any reasonable solution with changes in the LicenseListPublisher. |
I tend to agree to that. Plus, that separate repo could run a GitHub action to continuously crawl the locations for the canonical license text and auto-commit them, so we'd get the diffs if the upstream license text ever changes. |
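To make the "crawl and auto-commit" idea a bit more concrete, here is a minimal sketch of the change-detection step only. Everything in it is an assumption for illustration: the function name `sync_canonical_texts`, the `sources` mapping of license ID to upstream URL, and the one-file-per-license layout are all hypothetical, and the download function is passed in so the sketch runs without network access. A scheduled job would call something like this and commit whatever it reports as changed.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, List


def sync_canonical_texts(
    sources: Dict[str, str],
    fetch: Callable[[str], bytes],
    out_dir: Path,
) -> List[str]:
    """Store each fetched text under out_dir; return the IDs whose bytes changed.

    `sources` maps a license ID to its upstream URL (hypothetical layout).
    `fetch` is any callable that returns the raw bytes for a URL.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    changed: List[str] = []
    for license_id, url in sorted(sources.items()):
        new_bytes = fetch(url)
        target = out_dir / f"{license_id}.txt"
        # A missing file counts as a change, so first runs report every license.
        old_digest = (
            hashlib.sha256(target.read_bytes()).hexdigest()
            if target.exists()
            else None
        )
        if hashlib.sha256(new_bytes).hexdigest() != old_digest:
            target.write_bytes(new_bytes)
            changed.append(license_id)
    return changed
```

Comparing raw bytes is deliberate here: if the claim is a byte-for-byte canonical text, even a whitespace change upstream should show up as a diff.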
Just to level-set, what are the criteria that you're picturing would be used to determine whether a "canonical" text version exists for a given license? I'd assume something like all of the following, if the intention is to claim that this is a byte-for-byte "canonical" version of the license:
There might be other criteria, but that's what comes to mind offhand. If so, do we have a guess at what percentage of the License List would actually fall into this category? Skimming through the list and making some assumptions, I'd guess maybe the CC licenses, probably some or all of the GNU licenses (though I know GNU has changed their content from time to time), some of the others here and there. I'd guess it's significantly less than a majority of what's on the License List. I would not be in favor of putting anything inside of a "canonical texts" repo that isn't official according to the accepted upstream steward for that license. For example, for the MIT license, MIT is not actually the steward and there's lots of replaceable text, so I assume nothing would be included in the "canonical texts" repo for it. I suspect there's a lot of similar, widely-used licenses that would fall into that category. |
If "steward" here is not limited to a person, but it could also be an organization / foundation, I'd agree.
That depends on what you count as a "version". E.g. Apache (formally) has versions 1.1 and 2.0, so that's (at least) two versions "of that license". Also, do you count different file formats of the same text as different versions? To me, "canonical" is specific to the file format. For a specific license, there could be a canonical plain-text version, a canonical PDF version, and so on.
I basically agree, but since "canonical" is a file-format-specific thing to me, it's not necessarily limited to plain text.
That would not be a criterion for me. E.g. https://www.apache.org/licenses/LICENSE-2.0.txt does contain an appendix about how the license should be applied (incl. placeholders), but I still regard it as the canonical license. |
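As a rough illustration of "canonical is specific to the file format", an index of canonical texts could be keyed by license ID and format together rather than by license ID alone. This is only a sketch under that assumption: the `CanonicalIndex` type and the `canonical_location` helper are hypothetical names, and the single entry reuses the Apache URL mentioned above.

```python
from typing import Dict, Optional, Tuple

# Key: (SPDX license ID, file format). Value: upstream location of that rendition.
CanonicalIndex = Dict[Tuple[str, str], str]

index: CanonicalIndex = {
    ("Apache-2.0", "txt"): "https://www.apache.org/licenses/LICENSE-2.0.txt",
}


def canonical_location(
    index: CanonicalIndex, license_id: str, file_format: str
) -> Optional[str]:
    """Return the upstream location for one format, or None if none is recorded."""
    return index.get((license_id, file_format.lower()))
```

With this shape, a license that has a canonical plain-text file but no canonical PDF simply has no `("…", "pdf")` entry, and a license with no canonical text at all has no entries.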
Is the proposal to: |
Just FYI - one of the recent GSoC projects implemented a license text scraper in the LicenseListPublisher for the purpose of verifying the license URLs. Some of that code could be leveraged for this purpose. The code can be found here: https://github.com/spdx/LicenseListPublisher/tree/master/src/org/spdx/crossref |
How can you have a canonical BSD license? There's no license steward, the text has a huge number of variations (even when you omit the ones that talk about the voices in Bill Paul's head), and the 'original' isn't at all templated and uses terms that are specific to a tape distribution of a known version, which fit less well with the continuous releases that all open source projects with internet-facing SCMs make. At best we can have an idealized license, constructed after the fact, for this class of licenses. And it's a large and important class, not some obscure backwater of open source. I'd love to have this, as it would make my job of having files with only the SPDX License Expression indirectly refer to the license a lot easier to explain in our policy documents (which is required, imho, to create the legal contract (or whatever the right word is for a one-sided grant) by making it clear what that license grant is). The nuts and bolts of having it in a separate repo, APIs to access it, etc. are interesting. I rather like that too, but I'm stumbling on 'canonical' to describe it. The best we can get is more of a 'specimen': a license that is as representative and as generic as possible, which would certainly be more than adequate to drive whatever testing use case prompted this request. |
As a gut instinct, I feel strongly against a new, separate repo for this. That is another thing to maintain and therefore have criteria around, etc., which, for the reasons already stated, is going to be more challenging than it seems. Based on previous discussions, it seems like we got to a point of 1) recommending against using the text files in this repo for this purpose; and 2) pointing people to something either a) already in the license-list-data repo; or b) something to-be-created in the license-list-data repo. I'd strongly recommend we pick up there and, as @goneall suggests, perhaps try out using some iteration of what has been discussed recently in terms of identifying some key aspects in terms of: 1) what is the issue to be solved; 2) how does it fit with the SPDX mission/vision; and 3) is this something we should/have time/will solve (and then if so, how) is solving and |
also discussed at #1575 |
Do we want to keep a "canonical" text for licenses that have one?
The question has been raised in the past, and is currently triggering #1396.
I would be in favor of having another directory with canonical texts, if they exist.
I would not be in favor of using the test files for this purpose.
What do others think?