JSON and website inexactly match for AGPL-1.0 which forbids non-verbatim copies #2358

workingjubilee · 2024-01-31T01:17:38Z

THE VERY SHORT VERSION: Translating XML to JSON seems to result in significant differences between the JSON and rendered website text.

I printed the JSON text data from https://github.com/spdx/license-list-data/blob/main/json/details/AGPL-1.0.json using a Rust program, after converting the \u2007 escape sequence into the Rust-recognized \u{2007} form. Later experiments in JS REPLs produced exactly the same output. The result is LICENSE.txt. Yet it differs from what the website renders; the website's rendered version looks like:

AFFERO GENERAL PUBLIC LICENSE
Version 1, March 2002

Copyright © 2002 Affero Inc.
510 Third Street - Suite 225, San Francisco, CA 94107, USA

However, the version printed from the JSON is:

AFFERO GENERAL PUBLIC LICENSE
Version 1, March 2002 Copyright © 2002 Affero Inc. 510 Third Street - Suite 225, San Francisco, CA 94107, USA

Note that both versions get the first line right and begin the same second line, but then disagree on the next three lines. The JSON data for `"licenseText"` up to that point is the following:

"licenseText": "AFFERO GENERAL PUBLIC LICENSE\nVersion 1, March 2002 Copyright © 2002 Affero Inc. 510 Third Street - Suite 225, San Francisco, CA 94107, USA\n\n
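The Rust program mentioned above is not included in this issue. As a hypothetical, std-only sketch of the same step, the escapes inside a JSON string value such as `licenseText` (including `\n` and `\u2007`) can be decoded like this; `unescape_json` is an illustrative name, and a real program would more likely read the whole file with a full JSON parser such as the serde_json crate, which decodes these escapes itself. The sketch assumes it is given the string's contents without the surrounding quotes, and it only handles Basic Multilingual Plane `\uXXXX` escapes.

```rust
/// Decode the escape sequences inside a JSON string's contents (hypothetical helper).
/// Assumption: surrogate pairs such as `\uD83D\uDE00` are not combined; any escape
/// that cannot be decoded is copied through verbatim.
fn unescape_json(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            Some('r') => out.push('\r'),
            Some('b') => out.push('\u{8}'),
            Some('f') => out.push('\u{c}'),
            Some(ch @ ('"' | '\\' | '/')) => out.push(ch),
            Some('u') => {
                // Exactly four hex digits name one UTF-16 code unit, e.g. 2007.
                let hex: String = chars.by_ref().take(4).collect();
                let decoded = if hex.len() == 4 && hex.chars().all(|h| h.is_ascii_hexdigit()) {
                    u32::from_str_radix(&hex, 16).ok().and_then(char::from_u32)
                } else {
                    None
                };
                match decoded {
                    Some(ch) => out.push(ch),
                    // Lone surrogate halves or malformed escapes: keep the source text.
                    None => {
                        out.push_str("\\u");
                        out.push_str(&hex);
                    }
                }
            }
            // Unknown escape: keep it verbatim.
            Some(other) => {
                out.push('\\');
                out.push(other);
            }
            // Trailing lone backslash.
            None => out.push('\\'),
        }
    }
    out
}

fn main() {
    // Raw string literal: the escapes are still undecoded, as they are in the file.
    let raw = r"line one\nfigure\u2007space";
    let text = unescape_json(raw);
    assert_eq!(text, "line one\nfigure\u{2007}space");
    println!("{text}");
}
```

In Rust source, the decoded character can be written as `'\u{2007}'`, which is the same transformation described above, only done by the program rather than by hand.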

The XML data looks like:

      <titleText>
         <p>AFFERO GENERAL PUBLIC LICENSE
          <br/>Version 1, March 2002 </p>
      </titleText>
      <p>Copyright © 2002 Affero Inc.
      <br/> 510 Third Street - Suite 225, San Francisco, CA 94107, USA</p>

That is, it includes two <br/> elements here, one in each <p></p> pair, which I believe account for the line breaks rendered on the website. As a result, if I copy the text from the website (saved as LICENSE-RIGHTCLICK.txt) and run it through tools like askalono, they report an inexact match, even though, as far as I know, it is an exact copy!
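To illustrate how those two `<br/>` elements can produce the website's layout, here is a rough, string-based sketch. It is not an XML parser and not the SPDX website's actual rendering code: it assumes each `<p>` is a paragraph, each `<br/>` is a line break, and every other run of whitespace collapses to one space, as in HTML. The function names are hypothetical, and attributes and any other markup are ignored.

```rust
/// Return the text between each `<p>` and the next `</p>`.
/// Assumption: paragraphs are not nested and `<p>` carries no attributes.
fn paragraph_bodies(xml: &str) -> Vec<&str> {
    let mut bodies = Vec::new();
    let mut rest = xml;
    while let Some(start) = rest.find("<p>") {
        let after_open = &rest[start + "<p>".len()..];
        match after_open.find("</p>") {
            Some(end) => {
                bodies.push(&after_open[..end]);
                rest = &after_open[end + "</p>".len()..];
            }
            None => break,
        }
    }
    bodies
}

/// Render one paragraph: `<br/>` becomes a newline, and every other run of
/// whitespace (source line breaks, indentation) becomes a single space.
fn render_paragraph(body: &str) -> String {
    body.split("<br/>")
        .map(|line| line.split_whitespace().collect::<Vec<_>>().join(" "))
        .collect::<Vec<_>>()
        .join("\n")
}

/// Render all paragraphs, separated by one blank line.
fn render_text(xml: &str) -> String {
    paragraph_bodies(xml)
        .into_iter()
        .map(render_paragraph)
        .collect::<Vec<_>>()
        .join("\n\n")
}

fn main() {
    // The AGPL-1.0 XML excerpt quoted above, including its indentation.
    let xml = "<titleText>
         <p>AFFERO GENERAL PUBLIC LICENSE
          <br/>Version 1, March 2002 </p>
      </titleText>
      <p>Copyright © 2002 Affero Inc.
      <br/> 510 Third Street - Suite 225, San Francisco, CA 94107, USA</p>";
    println!("{}", render_text(xml));
}
```

Run on the excerpt, this prints the same five lines as the website rendering quoted earlier, whereas treating every `<br/>` and paragraph boundary as a plain space would put the copyright notice and address on the "Version 1" line.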

Note that the AGPL 1.0 has the clause:

"Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed."

I have quoted this in a standard citational form, without adding emphasis, because, as the license says... changing it is not allowed. This suggests that one of the two forms, the XML-encoded text or the JSON string, is meaningfully incorrect, since typical renderers for each encoding display substantively different text.

I have no idea whether this actually matters, of course. I am not a lawyer, this is not legal advice, etc. etc. etc. However, the generation of the JSON data from the XML masters may be dropping important formatting details, and it would not surprise me if a legal case, however frivolous it seemed, hinged on this difference, given how many cases have been decided on the presence or absence of a comma.

This seems related to other issues, but does not seem to be an exact duplicate of them.

It does not seem to be an exact duplicate of #1924 because all the data needed to replicate the website's formatting appears to be present in the XML but not in the JSON, and the checked-in test data seems to be derived from a JSONified-first form?

This could also be, say, an HTML vs. XML difference.


goneall · 2024-01-31T11:08:53Z

The text in the JSON file actually comes from a text file, not the XML.

For context, please refer to this pull request for the tool that generates the JSON and website from the XML and test data: spdx/LicenseListPublisher#83

If the JSON data is incorrect, then the test data is incorrect.

BTW, there is a flag in the LicenseListPublisher tool to generate the JSON file from the XML instead of from the test data. If we flip that switch, it will reopen many of the issues raised in the above-mentioned pull request.

workingjubilee · 2024-01-31T17:29:24Z

The Wayback Machine archive of http://affero.org/oagpl.html from 2006-01-05 shows this:

AFFERO GENERAL PUBLIC LICENSE

Version 1, March 2002

Copyright © 2002 Affero Inc.
510 Third Street - Suite 225, San Francisco, CA 94107, USA

From this HTML:

     <td width="99%" valign="Top" align="Center">
      <div align="Left">
      <p><b><big><big>AFFERO GENERAL PUBLIC LICENSE</big></big></b><br>
      </p>
      <p><big>Version 1, March 2002</big><br>
      <br>
                    Copyright © 2002 Affero Inc.<br>
                    510 Third Street - Suite 225, San Francisco, CA 94107,
 USA</p>

So yes, it seems that in this case:

the AGPL-1.0 XML data is correct (or "more correct")
the AGPL-1.0 JSON data is incorrect
thus the AGPL-1.0 test data is incorrect

Obviously, no one is really using the AGPL 1.0 for new work now; as far as I am aware it was never very popular, and the AGPL 3.0 arrived only a few years later. But that is why I chose it as an initial test case: its canonical version is fairly easy to reference, and at the time I figured its lack of popularity meant there would be less dispute over its exact contents, an issue that plagues e.g. MIT, the various BSD-N-clause licenses, etc.

swinslow · 2024-12-15T12:36:05Z

@workingjubilee Thanks for submitting this. Apologies as I may not be fully following the question here.

From SPDX's perspective, the primary purpose of the license list is to enable testing of license texts against the license list entries according to the SPDX Matching Guidelines. As reflected in the guidelines, whitespace differences are disregarded and all whitespace should be treated as a single blank space.
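A minimal sketch of that whitespace rule, assuming it is the only rule applied: collapse every run of whitespace to a single space on both sides before comparing. The function names are hypothetical, and the guidelines' other rules (for example, on capitalization and punctuation) are not modelled here. Applied to the two AGPL-1.0 renderings quoted at the top of this issue, the comparison succeeds even though the raw strings differ.

```rust
/// Collapse every run of whitespace to one space and trim both ends.
/// `split_whitespace` splits on the Unicode White_Space property, which also
/// covers characters such as U+2007 FIGURE SPACE.
fn normalize_whitespace(text: &str) -> String {
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Compare two license texts while ignoring whitespace differences only.
fn matches_ignoring_whitespace(a: &str, b: &str) -> bool {
    normalize_whitespace(a) == normalize_whitespace(b)
}

fn main() {
    // Website rendering: line breaks and a blank line between paragraphs.
    let website = "AFFERO GENERAL PUBLIC LICENSE\nVersion 1, March 2002\n\n\
                   Copyright © 2002 Affero Inc.\n\
                   510 Third Street - Suite 225, San Francisco, CA 94107, USA";
    // JSON rendering: everything after the title on one line.
    let json = "AFFERO GENERAL PUBLIC LICENSE\nVersion 1, March 2002 Copyright © 2002 \
                Affero Inc. 510 Third Street - Suite 225, San Francisco, CA 94107, USA";
    assert_ne!(website, json); // the raw strings differ...
    assert!(matches_ignoring_whitespace(website, json)); // ...but match after normalization
    println!("{}", normalize_whitespace(json));
}
```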

It has not been SPDX's purpose to ensure that website, JSON, etc. versions of license texts are exactly identical (especially from a whitespace perspective) to the license stewards' encodings of those license texts.

Just to confirm, then, are you seeing any differences in the substantive text between the original author's text for AGPL-1.0, and the version of AGPL-1.0 tracked by SPDX? Thanks!

swinslow · 2024-12-30T14:42:02Z

I haven't seen a response to the latest comment, so I'm going to go ahead and close this one. Please feel free to re-open a new issue if there are differences in substantive text (not just whitespace) between the original author's text and the version tracked by SPDX. Thank you!

jlovejoy added this to the 3.24 milestone Feb 7, 2024

jlovejoy added the technical issue, XML schema change, potential policy change, and change to spec (also) labels Feb 7, 2024

swinslow modified the milestones: 3.24, 3.25 May 21, 2024

swinslow modified the milestones: 3.25.0, 3.26.0 Aug 19, 2024

swinslow closed this as not planned (won't fix, can't repro, duplicate, stale) Dec 30, 2024

swinslow removed this from the 3.26.0 milestone Dec 30, 2024
