JSON and website inexactly match for AGPL-1.0 which forbids non-verbatim copies #2358
Comments
The text in the JSON file actually comes from a text file, not the XML. For context, please refer to this pull request for the tool that generates the JSON and website from the XML and test data: spdx/LicenseListPublisher#83 If the JSON data is incorrect, then the test data is incorrect. BTW, there is a flag in the LicenseListPublisher tool to generate the JSON file from the XML instead of the test data. If we change the switch, it will reopen many issues raised in the above-mentioned pull request.
Referencing the Wayback Machine archive for http://affero.org/oagpl.html on 2006-01-05 gives me this:
From this HTML:
<td width="99%" valign="Top" align="Center">
<div align="Left">
<p><b><big><big>AFFERO GENERAL PUBLIC LICENSE</big></big></b><br>
</p>
<p><big>Version 1, March 2002</big><br>
<br>
Copyright © 2002 Affero Inc.<br>
510 Third Street - Suite 225, San Francisco, CA 94107,
USA</p>
So yes, it seems that in this case:
Obviously, no one is really using the AGPL 1.0 for new work right now; indeed, as far as I am aware, it was never very popular, and then the AGPL 3.0 happened only a few years later. But that was why I chose it as an initial test case: it's fairly easy to reference its canonical version, and I had, at the time, figured its lack of popularity meant there wouldn't be as much dispute over its exact contents, which is an issue that plagues e.g. MIT, the various BSD-N-clauses, etc.
@workingjubilee Thanks for submitting this. Apologies as I may not be fully following the question here. From SPDX's perspective, the primary purpose of the license list is to enable testing of license texts against the license list entries according to the SPDX Matching Guidelines. As reflected in the guidelines, whitespace differences are disregarded and all whitespace should be treated as a single blank space. It has not been SPDX's purpose to ensure that website, JSON, etc. versions of license texts are exactly identical (especially from a whitespace perspective) to the license stewards' encodings of those license texts. Just to confirm, then, are you seeing any differences in the substantive text between the original author's text for AGPL-1.0, and the version of AGPL-1.0 tracked by SPDX? Thanks!
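To make the whitespace rule above concrete, here is a minimal Python sketch of "treat all whitespace as a single blank space". The function name `normalize_whitespace` and both sample strings are made up for illustration; this is not the SPDX tooling, and it covers only the whitespace rule, not the rest of the Matching Guidelines.

```python
def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace into one ASCII space.

    str.split() with no arguments splits on any Unicode whitespace,
    which includes newlines and U+2007 (FIGURE SPACE).
    """
    return " ".join(text.split())


# Hypothetical samples, NOT the real AGPL-1.0 text.
website_copy = "Line one\n\nLine\u2007two"
json_copy = "Line one\nLine two"

# Byte-for-byte the two copies differ...
print(website_copy == json_copy)
# ...but under the whitespace rule they compare equal.
print(normalize_whitespace(website_copy) == normalize_whitespace(json_copy))
```

A tool that compares texts exactly, rather than after this kind of normalization, would report the two samples as different.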
I haven't seen a response to the latest comment, so I'm going to go ahead and close this one. Please feel free to open a new issue if there are differences in substantive text (not just whitespace) between the original author's text and the version tracked by SPDX. Thank you!
THE VERY SHORT VERSION: Translating XML to JSON seems to result in significant differences between the JSON and rendered website text.
I printed the JSON text data from https://github.com/spdx/license-list-data/blob/main/json/details/AGPL-1.0.json using a Rust program after transforming the `\u2007` escape sequence into the Rust-recognized `\u{2007}` sequence. Later experiments with JS REPLs seem to yield an exactly matching text output. I acquired this: LICENSE.txt. Yet this is different from what the website renders, because the website's rendered version looks like:
However, the JSON-tripped version is:
Note that both get the first line right and then start on the same second line but then disagree on the next three. The JSON data for `"licenseText"` up to that point is the following:
The XML data looks like:
That is, it includes a pair of `<br/>`s here, one in each `<p></p>` pair, which I believe accounts for the rendered spacing on the website. As a result, copying the version from the website yields a LICENSE-RIGHTCLICK.txt, and running that through tools like askalono returns an inexact match, despite it being, as far as I know, an exact copy!
Note that the AGPL 1.0 has the clause:
"Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed."
I have excerpted this quote in a standard citational form but I have not added emphasis because, as the license says... changing it is not allowed. This suggests one of the two forms, the XML-encoded text, or the JSON string, is meaningfully incorrect, as they render to substantively different displayed text by typical renderers for their encoding.
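As a rough illustration of how `<br/>` handling alone can produce different displayed text, the Python sketch below renders one small XML fragment two ways. The fragment, the `render` helper, and its `keep_breaks` flag are hypothetical; this is neither the actual AGPL-1.0 XML nor what LicenseListPublisher does.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment, NOT the real SPDX XML for AGPL-1.0.
FRAGMENT = (
    "<text>"
    "<p>Title line<br/></p>"
    "<p>Version line<br/><br/>Copyright line</p>"
    "</text>"
)


def render(elem: ET.Element, keep_breaks: bool) -> str:
    """Flatten an element to plain text.

    keep_breaks=True turns each <br/> into a newline (roughly what a
    browser displays); keep_breaks=False turns it into a plain space.
    """
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "br":
            parts.append("\n" if keep_breaks else " ")
        else:
            parts.append(render(child, keep_breaks))
            if child.tag == "p":
                parts.append("\n")  # end of paragraph
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(FRAGMENT)
with_breaks = render(root, keep_breaks=True)
without_breaks = render(root, keep_breaks=False)

print(repr(with_breaks))
print(repr(without_breaks))
# The two renderings are not identical as exact strings.
print(with_breaks == without_breaks)
```

The two outputs contain the same words but differ in line structure, which is the kind of gap that an exact-match comparison would flag.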
I have no idea if this actually matters, of course. I am not a lawyer, this is not legal advice, etc. etc. etc. However, it seems that the generation of the JSON data from the XML masters may be dropping important formatting details, and it would not seem strange to me if a legal case, however frivolous-seeming, hinged on this difference, given how many cases have been decided on the presence or absence of commas.
This seems to have fellow issues in, but does not seem to be an exact duplicate of,
The reason why it does not seem to be an exact copy of #1924 is that it seems like all the data necessary to achieve a replication of the website's formatting is there in the XML, but not in the JSON, and that the checked-in test data seems to be derived from a JSONified-first form?
This could also be, say, an HTML vs. XML difference.