Add or test all licenses in https://github.com/okfn/licenses #863
Comments
@pombredanne There are more than 90 licenses on this site (http://licenses.opendefinition.org/licenses/groups/all.json). How do you want to go about it? I know most of them are available in scancode, but running the scans manually is still a huge task, right? |
Can I take up this issue? @pombredanne |
@starlord1311 I think that @SaravananOffl is working on it, though you could split up the work alright |
IMHO
|
@pombredanne I'm thinking of writing a Python script to get (i.e. scrape) the name of each license from the JSON file (http://licenses.opendefinition.org/licenses/groups/all.json). This could certainly reduce the amount of manual work for us. |
@SaravananOffl Are you done with the script? |
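A minimal sketch of what such a script could look like, for reference. It assumes the JSON file is a single object that maps each license id to an entry with `id`, `title` and `url` keys; that layout is an assumption and should be checked against the actual file. The names `extract_licenses` and `fetch_licenses` are hypothetical and are not taken from any script posted in this thread.

```python
import json
from urllib.request import urlopen

ALL_LICENSES_URL = "http://licenses.opendefinition.org/licenses/groups/all.json"


def extract_licenses(data):
    """Return sorted (id, title, url) tuples from the parsed JSON mapping.

    Assumption: ``data`` maps a license id to a dict that may carry
    "id", "title" and "url" keys. Missing keys fall back to the mapping
    key (for the id) or to an empty string.
    """
    rows = []
    for key, entry in data.items():
        rows.append((entry.get("id", key), entry.get("title", ""), entry.get("url", "")))
    return sorted(rows)


def fetch_licenses(url=ALL_LICENSES_URL):
    """Download the JSON file and return the extracted license tuples."""
    with urlopen(url) as response:
        return extract_licenses(json.load(response))
```

Calling `fetch_licenses()` needs network access; `extract_licenses` can be tried offline on any dict of the assumed shape.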
@SaravananOffl Any word? |
@aviral1701 go for it :) |
Hi I wish to contribute to this issue. Is it available? |
@dakshaladia Sorry for the late reply... actually @AyanSinhaMahapatra is already working on that one |
@pombredanne I've already written a script to download all these license texts (and some notices) and performed a scan of them. I'm still analyzing the results, as the scan result JSON is pretty long (10k lines). I had a few questions, by the way:
Btw the scan results
This is a rough summary of the license detection, out of 114 license texts. |
@pombredanne Could you take a look at these questions above? |
You may want to run one scan per file for ease of handling too. |
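One way to run one scan per file is sketched below. It assumes the `scancode` command is on the `PATH` and uses its `--license` and `--json-pp` options; the directory arguments and the function names `build_scan_commands` and `run_scans` are hypothetical and only serve as an illustration.

```python
import subprocess
from pathlib import Path


def build_scan_commands(texts_dir, results_dir):
    """Build one scancode command per license text file, in sorted order.

    Each command writes its own pretty-printed JSON result, named after
    the scanned file, so that results can be reviewed one file at a time.
    """
    commands = []
    for text_file in sorted(Path(texts_dir).iterdir()):
        if not text_file.is_file():
            continue
        output = Path(results_dir) / (text_file.name + ".json")
        commands.append(
            ["scancode", "--license", "--json-pp", str(output), str(text_file)]
        )
    return commands


def run_scans(texts_dir, results_dir):
    """Run the scans sequentially, stopping at the first failing command."""
    Path(results_dir).mkdir(parents=True, exist_ok=True)
    for command in build_scan_commands(texts_dir, results_dir):
        subprocess.run(command, check=True)
```

Keeping the command construction separate from the execution makes it easy to inspect the commands before running anything.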
@AyanSinhaMahapatra you wrote:
This seems like a one off, so you can instead paste the script in the ticket or related PR comment.
In general yes, smaller PRs are easier for new licenses. And for rules, it's OK to have a batch of many new rules at once.
We likely already have the licenses for these, so there are likely very few new licenses to add, but we should consider adding rules if they are not detected correctly (and within reason, as the web scraping may be introducing quirks that we may not care for, and therefore a new rule may not always be warranted).
For very old/deprecated licenses we want to have at least a license entry and possibly a few rules for their typical notices, but that should be rather limited rule-wise. Also, the caveat about the bias of screen-scraped texts still applies. |
So this was before we had
Okay Sure, I'll paste that instead.
So yes, as these are scraped there is extra text, and although I cleaned it up a bit manually, some of it still remains.
Yes, there aren't a lot of new ones, so we should be good to add just those, get rid of the extra text, and then see if there are any detection problems.
Understood. |
So my plan was to use all of the licenses in this issue as tests for scancode-results-analyzer, both for the problems being detected and for generating as many of the rules as possible. You've also tagged me in two other issues in the same sense, #2275 (comment) and #2274 (comment); these are valid tests too, as the texts and rules will be generated automatically from the JSON scan results (not just of that particular file, but of the whole scan of that package). While looking at the scan results of the whole package to see if these cases are successfully detected, we detect more issues that we don't have tickets for. You'd remember this from our conversations, and you wanted me to add them to scancode ASAP. So I'll open one PR for these where all the rules are automatically generated, and we discuss if the rules and modifications in the Does that sound okay? |
Perfect! |
how i contributed |
We likely have most, but we need to: 1. add tests, 2. add rules, 3. eventually add new licenses.