Add all the CC licenses to the ScanCode detectable licenses #514
Comments
And this contains some code to handle them: https://github.com/warpr/licensedb.git |
What is needed to be done to solve this issue? |
Well, this is about adding new licenses, rules, and detection tests as needed, by checking that every one of the CC licenses, including all the translations, exists as a ScanCode license in src/licensedcode/data/licenses |
The best way to do this would be to use a simple script to automate the basics |
You should also check recent PRs related to adding new licenses, rules and tests for examples. |
Any particular one that you want to point out? |
Check the open and closed PRs. There are plenty of these, and some are linked in the wiki doc on licenses |
So what you are saying is that I need to create a script that compares the file names in the directory with the file names at the link, and reports the file names that do not exist in the directory you mentioned? |
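A minimal sketch of such a check, for illustration only. It assumes the upstream legalcode file names follow the `by-nc-sa_3.0_es_gl.html` pattern and that a key such as `cc-by-nc-sa-3.0-es-gl` can be derived from them. The key scheme and the helper names `legalcode_to_key` and `missing_licenses` are hypothetical, not the project's confirmed convention:

```python
from pathlib import Path


def legalcode_to_key(filename):
    """Derive a candidate license key from a CC legalcode file name.

    Assumed scheme (not confirmed by the project): drop the extension,
    replace underscores with dashes, lowercase, and prefix with "cc-".
    """
    stem = Path(filename).stem
    return "cc-" + stem.replace("_", "-").lower()


def missing_licenses(legalcode_names, licenses_dir):
    """Return the candidate keys that have no <key>.LICENSE file yet.

    The comparison is case-insensitive, so returned keys are lowercase.
    """
    existing = {p.stem.lower() for p in Path(licenses_dir).glob("*.LICENSE")}
    candidates = {legalcode_to_key(name) for name in legalcode_names}
    return sorted(candidates - existing)
```

The list of upstream file names would come from a checkout of the creativecommons.org repository, for example `[p.name for p in Path("docroot/legalcode").glob("*.html")]`.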
Please check whether the procedure I am going to follow is right. Let's say that I am adding the license for this one: https://github.com/creativecommons/creativecommons.org/blob/45420471049bbf7a6420a12c737376b6bf3fc9dd/docroot/legalcode/by-nc-sa_3.0_es_gl.html First of all, I will check whether the license already exists. As this is an HTML file, I need to open it in the browser. After that, I copy the content and save it in src/licensedcode/data/licenses, with a file name ending in .LICENSE, along with a corresponding .yml file. And then I am done. |
@singh1114 let me come back with more details tomorrow. I actually have a script for something similar that I am working on for SPDX licenses as part of #41 .... Once I push this you could use it as a template and/or update it to support CC licenses. |
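The final step of that procedure could be scripted roughly as follows. This is a sketch only: the helper name `write_license_pair` is hypothetical, and the `.yml` fields written here (`key`, `name`, `text_urls`) are assumed for illustration, so copy the actual fields from an existing license in the repository before relying on it.

```python
from pathlib import Path


def write_license_pair(licenses_dir, key, text, source_url):
    """Write <key>.LICENSE and a minimal <key>.yml, refusing to overwrite."""
    base = Path(licenses_dir)
    license_file = base / f"{key}.LICENSE"
    data_file = base / f"{key}.yml"
    # Step one of the procedure: check whether the license already exists.
    if license_file.exists() or data_file.exists():
        raise FileExistsError(f"license {key!r} already exists")
    license_file.write_text(text.strip() + "\n", encoding="utf-8")
    # Hypothetical minimal metadata; the field names are an assumption.
    data_file.write_text(
        f"key: {key}\n"
        f"name: {key}\n"
        "text_urls:\n"
        f"    - {source_url}\n",
        encoding="utf-8",
    )
    return license_file, data_file
```

Raising on an existing file keeps the script from silently replacing a license that was already curated by hand.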
Hi @pombredanne . |
@aviaryan PRs welcome. The work consists of:
Not much more would be needed for now, as adding a license or rule automatically creates a test for it. |
In the first repo, licenses are in HTML form with HTML formatting, not plain text. I assumed we won't be able to use them as they are, would we? Also, some HTML pages there point to an external link for the LICENSE. (Example) I am occupied for the next 48 hours, so I will only be able to start work on this after that. I hope this isn't a problem. |
@aviaryan I think for the HTML format, we need to open the file in the browser and copy the real content from there. @pombredanne also mentioned something about a script. @pombredanne, any update on that part? |
Yes, but I would just write a script to do that for me. |
@aviaryan That was what I wanted to say. Probably I missed it in the last comment. |
You could open the pages in lynx and you would get clean text.
This would avoid most of the parsing. |
@singh1114 the script I mentioned is something different: it deals with the need to reconcile new licenses with existing license instances (e.g. there are some CC licenses already there). |
Note also that some .txt files exist ... but most translations and older licenses do not have a plain-text version |
@pombredanne That script will be very useful. Does that mean that, if a license is in HTML format, we have to put the whole HTML text into the .txt file? |
@singh1114 we want the plain text, not the HTML: e.g. a lynx dump or similar, with no markup |
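For anyone scripting this step, here is a rough stand-in for a lynx dump that uses only Python's standard `html.parser` module. It is a sketch under stated assumptions: the `TextExtractor` class and `html_to_text` function are hypothetical names, the set of block-level tags is a guess at what the CC pages use, and the output will not match lynx's layout exactly.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script>, <style> and <head> content."""

    SKIPPED = {"script", "style", "head"}
    # Assumed set of tags that should start a new line in the output.
    BLOCKS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "tr"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif tag in self.BLOCKS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1
        elif tag in self.BLOCKS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the text of an HTML page with markup removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Normalize whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

This version drops blank lines between paragraphs; if the license texts should keep their paragraph spacing, that filtering step would need to change.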
@pombredanne How should we name translated licenses? |
Thanks! A massive 800 files PR would take too long to review. |
@aviaryan note that a script to automate adding these CC licenses, if you wrote such a thing, is of as much interest as the licenses themselves. Doing it manually is OK too: once we are over the big hurdle, new additions will likely be small in the future |
@pombredanne Thanks for your feedback. Yes, I have written a script for the task. I am busy trying to make it as robust as possible. |
* cc-GPL-2.0-pt
* cc-LGPL-2.1-pt
* cc-by-nc-nd-2.0-at
* cc-by-nc-nd-2.0-au
Signed-off-by: Avi Aryan <[email protected]>
Also fix spdx key values Signed-off-by: Avi Aryan <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
* These introduce a bias in word frequencies that needs to be supported first. It otherwise skews license detection too much and is a risk for low perf and false positives. Link: #514 Signed-off-by: Philippe Ombredanne <[email protected]>
At this stage, I have checked that we have all the CC licenses except the non-English translations. This is tracked in #139, so I am closing this now. |
See https://github.com/creativecommons/creativecommons.org/tree/45420471049bbf7a6420a12c737376b6bf3fc9dd/docroot/legalcode
With all the translations and variants, this would be a good addition even if some are rather exotic...