Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#514 Add some CC licenses #587

Merged
merged 2 commits into from
Apr 18, 2017

Conversation

aviaryan
Copy link
Contributor

@aviaryan aviaryan commented Apr 4, 2017

As discussed at #514 (comment), this PR adds some CC licenses to the ScanCode database.
These files were auto-generated using a script which can be found here (ugly code, no docs as of now). https://github.com/aviaryan/cc-licenses-parser-scancode

Note - I wasn't able to parse the category from the licenses. Therefore, they are fixed now at Copyleft Limited. @pombredanne Can you guide me a little bit about how these categories are decided?

@aviaryan
Copy link
Contributor Author

aviaryan commented Apr 5, 2017

@pombredanne Can you please help me interpret the build logs. I can't figure out how the diff in this PR causes the build to fail.

@pombredanne
Copy link
Member

@aviaryan Thanks! Please check https://github.com/nexB/aboutcode/wiki/Writing-good-commit-messages
As for the CI build logs they are exactly the same as a local test run

In your case if you check this job on windows for the license tests you see a failure in that test TestMatchAccuracyWithFullIndex.test_match_has_correct_line_positions_in_automake_perl_file that will likely fail too if you run the test suite locally.
And since we do not run tests verbosely on the CI (otherwise we may exceed their max log size) you can rerun this test locally and see it fail. (You will need to check the py.test help for these options):
py.test -vvs -k test_match_has_correct_line_positions_in_automake_perl_file will likely do it.

@aviaryan
Copy link
Contributor Author

aviaryan commented Apr 5, 2017

Hi @pombredanne
Thanks for replying.

[Don't bother; rebase is needed]
So I ran the test locally and this is what I get. As you can see, none of the licenses mentioned here belong to the set that I have changed in this PR. So I don't get what I have done wrong. Sorry for bugging you again but I'm almost lost here. How is the build failing showing only the un-changed license files.
PS - I ran the same test as the job you mentioned above. (py.test -vvs -k test_match_has_correct_line_positions_in_automake_perl_file)

E       AssertionError: assert [('gpl-2.0-pl...(1291, 1314))] == [('gpl-2.0-plu...(1292, 1315))]
E         At index 1 diff: (u'fsf-mit', (231, 238), Span(950, 1014)) != (u'fsf-mit', (231, 238), Span(951, 1015))
E         Full diff:
E         [(u'gpl-2.0-plus', (12, 25), Span(48, 159)),
E         -  (u'fsf-mit', (231, 238), Span(950, 1014)),
E         ?                                  ^     ^
E         +  (u'fsf-mit', (231, 238), Span(951, 1015)),
E         ?                                  ^     ^
E         -  (u'free-unknown', (306, 307), Span(1291, 1314))]
E         ?                                        ^     ^
E         +  (u'free-unknown', (306, 307), Span(1292, 1315))]
E         ?                                        ^     ^

UPDATE:
My base branch is outdated, let me just update it.

UPDATE:
Fixed commit message.

@aviaryan aviaryan force-pushed the batch-add-cc-licenses branch from 0082efc to d0d8992 Compare April 5, 2017 21:15
* cc-GPL-2.0-pt
* cc-LGPL-2.1-pt
* cc-by-nc-nd-2.0-at
* cc-by-nc-nd-2.0-au

Signed-off-by: Avi Aryan <[email protected]>
@aviaryan aviaryan force-pushed the batch-add-cc-licenses branch from d0d8992 to 78d3b3d Compare April 5, 2017 21:31
@@ -0,0 +1,11 @@
key: cc-GPL-2.0-pt
short_name: CC-GPL-2.0-PT
name: >-
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep the names on one line

@@ -0,0 +1,11 @@
key: cc-GPL-2.0-pt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use all lowercase keys

category: Copyleft Limited
owner: Creative Commons
homepage_url: https://creativecommons.org/licenses/GPL/2.0/
spdx_license_key: CC-GPL-2.0-PT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this is a valid SPDX license identifier at spdx.org?

@pombredanne
Copy link
Member

What is likely happening here that by adding new licenses especially the ones in Portuguese and German you introduced a new batch of words aka. tokens that were never seen in the current set of licenses and rules. The way matching works is based on this set of tokens and whether they are known or not. This may subtly affect the way words are found and change the offsets where they are found (as these may only consider known words). Or this highlighted a bug somewhere in the code.

The fix is easy: e.g. update the expected spans in the test with the new observed values. Yet this is likely the sign of some bug elsewhere: so the fix should not be applied for now.

What would be great is to manually check in develop and in your branch what the matched words are exactly on the query side by running a scan of the test files in that test with --license --diag --license-text and verify if and then why the captured license texts may be different (or the same).

Also fix spdx key values

Signed-off-by: Avi Aryan <[email protected]>
@pombredanne
Copy link
Member

Thanks for the quick commit!

@aviaryan
Copy link
Contributor Author

aviaryan commented Apr 6, 2017

@pombredanne After rebase, build is still failing in the same manner. So I tried your suggestion of testing how captured licenses are differing in develop and my branch but it seems that the results come out identical. (See https://www.diffchecker.com/FiIaJdna)
Is there a scancode param to let us know at which position was a license matched ? (not line number, but absolute char position).
As it seems from the error message I posted above, licenses are being detected correctly but they are being detected at a different position from what is specified in the test. How should I modify the numbers in the test?

Thanks.

@pombredanne
Copy link
Member

@aviaryan can you post the results of running scancode --license --diag --license-text --format json-pp on the same file as used in the failing test and in both develop and your branch?

that should tell us where the quirks is. It smells like a bug for now. Not in your code. But induced by the new data. So we need to fully investigate this

@aviaryan
Copy link
Contributor Author

aviaryan commented Apr 6, 2017

@aviaryan can you post the results of running scancode --license --diag --license-text --format json-pp on the same file as used in the failing test and in both develop and your branch?

Both outputs are identical. Please see https://www.diffchecker.com/hf6g9ZHZ for the logs.
I ran the following command.

./scancode tests/licensedcode/data/positions/automake.pl  --license --diag --license-text --format json-pp

Also, I would like to repeat the question that I asked in my previous comment.

Is there a scancode param to let us know at which position was a license matched ? (not line number, but absolute char position).

Thank you @pombredanne for your patience.
Regards.

@pombredanne
Copy link
Member

Is there a scancode param to let us know at which position was a license matched ? (not line number, but absolute char position).

not in the CLI. You could call the Index.match() and get LicenseMatch objects back with this shortcut. Note that the offset in a Span is never a character offset but instead a token position e.g. this is based on the tokenized "words" stream and not the character stream.

@pombredanne pombredanne merged commit 5f06d5b into aboutcode-org:develop Apr 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants