Improve copyright detection #3910

Merged
merged 45 commits into from
Sep 12, 2024
45 commits
1847eb4
Improve copyright detection
pombredanne Sep 6, 2024
ba17dbb
Improve copyright detection more
pombredanne Sep 6, 2024
1f5b8e6
Bump commoncode to latest version
pombredanne Sep 6, 2024
f248399
Use general test data files
pombredanne Sep 6, 2024
cf42292
Add new copyright tests
pombredanne Sep 6, 2024
dc1d358
Format test data
pombredanne Sep 6, 2024
d101937
Join lines in copyright tests
pombredanne Sep 6, 2024
e21fdea
Detect year in the form of 2001-201x
pombredanne Sep 6, 2024
e95f904
Add more copyright tests
pombredanne Sep 6, 2024
ed84329
Merge latest develop
pombredanne Sep 6, 2024
be5cdba
Align license and tests with latest copyright
pombredanne Sep 6, 2024
7f72ab9
Improve copyright detection
pombredanne Sep 6, 2024
49387c6
Correct copyright tests
pombredanne Sep 7, 2024
e272d56
Improve more copyrights from linux
pombredanne Sep 7, 2024
59e4121
Format code
pombredanne Sep 7, 2024
b56b961
Improve copyrights more
pombredanne Sep 7, 2024
4939a9b
Improve copyright detection more
pombredanne Sep 8, 2024
db79e7a
Improve copyright detection more
pombredanne Sep 8, 2024
9139681
Improve copyright detection
pombredanne Sep 8, 2024
6903f6f
Improv copyright detection even more
pombredanne Sep 8, 2024
bb30a1f
Ensure we run demarkup tests
pombredanne Sep 9, 2024
5062e3b
Use correct copyright expectation
pombredanne Sep 9, 2024
c1bd189
Remove impractical test
pombredanne Sep 9, 2024
bde4f92
Rename test files for clarity
pombredanne Sep 9, 2024
9c3d731
Fix test failures
AyanSinhaMahapatra Sep 9, 2024
f0e775d
Remove very large markup test files
pombredanne Sep 9, 2024
e76e5ab
Add tests for markup stripper
pombredanne Sep 9, 2024
f6696fe
Combine markup stripping code in markup.py
pombredanne Sep 9, 2024
370f1ec
Improve markup handling for copyrights
pombredanne Sep 10, 2024
b3caf23
Improve copyright detection
pombredanne Sep 11, 2024
7d46ce9
Refine marker stripper for copyrights
pombredanne Sep 11, 2024
e71f8af
Remove unused test code, inline imports
pombredanne Sep 11, 2024
2d5b269
Improve copyright detection
pombredanne Sep 11, 2024
fb8aea8
Add new copyright tests
pombredanne Sep 11, 2024
e1c12ca
Improve copyright detection
pombredanne Sep 11, 2024
b4309dd
Improve copyright detection more
pombredanne Sep 11, 2024
55b9717
Update tests for copyright with URLs
pombredanne Sep 11, 2024
79fc8ee
Fix copyright doctests
pombredanne Sep 11, 2024
cb40371
Improve copyright detection
pombredanne Sep 11, 2024
373cac7
Do not crash if license is missing
pombredanne Sep 11, 2024
5d2c0e7
Correct copyright tests
pombredanne Sep 11, 2024
c340812
Only skip tags with case considered
pombredanne Sep 11, 2024
9ec10fd
Update CHANGELOG for Copyright
pombredanne Sep 11, 2024
b30d99a
Fix tests for demarkup updates
pombredanne Sep 11, 2024
6594909
Merge latest develop
pombredanne Sep 11, 2024
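
Commit e21fdea above mentions year ranges of the form 2001-201x, where the end year has a placeholder digit. The page does not show how ScanCode recognizes them, so the following is only a minimal sketch under stated assumptions: the regular expression and the helper `find_year_ranges` are hypothetical illustrations, not the grammar used in the project.

```python
import re

# Hypothetical pattern (an assumption, not ScanCode's actual rule):
# a four-digit year starting with 19 or 20, optionally followed by a dash
# and an end year whose last character may be a digit or a literal "x".
YEAR_RANGE = re.compile(
    r"\b((?:19|20)\d{2})(?:\s*-\s*((?:19|20)\d(?:\d|x)))?(?!\w)"
)


def find_year_ranges(text):
    """Return (start, end) tuples; end is None when only one year is given."""
    return [(m.group(1), m.group(2)) for m in YEAR_RANGE.finditer(text)]


print(find_year_ranges("Copyright 2001-201x Example Corp"))
print(find_year_ranges("Copyright 2015 Foo"))
```

The trailing negative lookahead keeps the pattern from matching the first four digits of a longer number or token.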
4 changes: 4 additions & 0 deletions CHANGELOG.rst
@@ -46,6 +46,10 @@ v33.0.0 (next next, roadmap)
license/exception texts added, and also 1 license was deprecated.
For more details see https://github.com/aboutcode-org/scancode-toolkit/pull/3897

- New and improved copyright detection with many false positive removed
and refined detection added.


v32.2.1 - 2024-07-02
---------------------

2 changes: 1 addition & 1 deletion etc/scripts/gen_copyright_tests.py
@@ -93,7 +93,7 @@ def create_copyright_tests(
content = url

if end_line != 0:
-    content = "".join(content.strip().splitlines()[start_line:end_line])
+    content = "\n".join(content.strip().splitlines()[start_line:end_line])

with open(name, "w") as out:
    out.write(content)
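
The one-line change above swaps the join separator from an empty string to a newline. A small sketch of the difference follows; the helper `slice_lines` and the sample text are made up for illustration and are not part of `gen_copyright_tests.py`.

```python
def slice_lines(content, start_line, end_line, sep):
    """Keep lines [start_line:end_line] of content and rejoin them with sep."""
    return sep.join(content.strip().splitlines()[start_line:end_line])


# Made-up sample input with three lines.
sample = (
    "Copyright (c) 2001 Foo\n"
    "Copyright (c) 2002 Bar\n"
    "All rights reserved.\n"
)

# Old behaviour: the selected lines are glued together with no separator.
joined_without_newlines = slice_lines(sample, 0, 2, sep="")
# New behaviour: each selected line stays on its own line.
joined_with_newlines = slice_lines(sample, 0, 2, sep="\n")

print(repr(joined_without_newlines))
print(repr(joined_with_newlines))
```

Since `splitlines()` drops the line endings, joining with `""` merges the end of one statement into the start of the next, which would plausibly change what a line-oriented detector sees in the generated test file.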
4 changes: 2 additions & 2 deletions requirements.txt
@@ -10,7 +10,7 @@ chardet==5.0.0
charset-normalizer==2.1.0
click==8.1.7
colorama==0.4.5
-commoncode==31.2.1
+commoncode==32.0.0
construct==2.10.68
container-inspector==31.1.0
cryptography==42.0.5
@@ -54,7 +54,7 @@ ply==3.11
publicsuffix2==2.20191221
pyahocorasick==2.1.0
pycparser==2.21
-pygmars==0.7.0
+pygmars==0.9.0
Pygments==2.13.0
pymaven-patch==0.3.2
pyparsing==3.0.9
7 changes: 4 additions & 3 deletions setup-mini.cfg
@@ -70,7 +70,7 @@ install_requires =
chardet >= 3.0.0
click >= 6.7, !=7.0
colorama >= 0.3.9
-commoncode >= 31.0.3
+commoncode >= 32.0.0
container-inspector >= 31.0.0
debian-inspector >= 31.1.0
dparse2 >= 0.7.0
@@ -100,7 +100,7 @@ install_requires =
plugincode >= 32.0.0
publicsuffix2
pyahocorasick >= 2.0.0
-pygmars >= 0.7.0
+pygmars >= 0.9.0
pygments
pymaven_patch >= 0.2.8
requests >= 2.7.0
@@ -112,7 +112,7 @@ install_requires =
xmltodict >= 0.11.0
zipp >= 3.0.0; python_version < "3.9"
typecode >= 30.0.1
-# typecode[full] >= 30.0.0
+# typecode[full] >= 30.0.1
# extractcode[full] >= 31.0.0


@@ -199,6 +199,7 @@ scancode_post_scan =
filter-clues = cluecode.plugin_filter_clues:RedundantCluesFilter
consolidate = summarycode.plugin_consolidate:Consolidator
license-references = licensedcode.licenses_reference:LicenseReference
todo = summarycode.todo:AmbiguousDetectionsToDoPlugin
classify = summarycode.classify_plugin:FileClassifier


8 changes: 4 additions & 4 deletions setup.cfg
@@ -70,7 +70,7 @@ install_requires =
chardet >= 3.0.0
click >= 6.7, !=7.0
colorama >= 0.3.9
-commoncode >= 31.0.3
+commoncode >= 32.0.0
container-inspector >= 31.0.0
debian-inspector >= 31.1.0
dparse2 >= 0.7.0
@@ -100,7 +100,7 @@ install_requires =
plugincode >= 32.0.0
publicsuffix2
pyahocorasick >= 2.0.0
-pygmars >= 0.7.0
+pygmars >= 0.9.0
pygments
pymaven_patch >= 0.2.8
requests >= 2.7.0
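
The exact pins in requirements.txt and the minimums in setup.cfg are raised together in this change. A rough sketch of why the old pins could not stay follows. It assumes plain dotted numeric versions only; the helpers `parse_version` and `satisfies_minimum` are hypothetical and do not implement full PEP 440 comparison the way a real installer does.

```python
def parse_version(text):
    """Turn a dotted release such as "32.0.0" into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies_minimum(pin, minimum):
    """Return True when an exact "name==X" pin meets a "name >= Y" minimum."""
    pin_name, _, pin_version = pin.partition("==")
    min_name, _, min_version = minimum.partition(">=")
    if pin_name.strip().lower() != min_name.strip().lower():
        raise ValueError(f"different packages: {pin!r} vs {minimum!r}")
    return parse_version(pin_version) >= parse_version(min_version)


# New pins from requirements.txt against the new setup.cfg minimums.
print(satisfies_minimum("commoncode==32.0.0", "commoncode >= 32.0.0"))
print(satisfies_minimum("pygmars==0.9.0", "pygmars >= 0.9.0"))

# Old pins against the new minimums: these no longer satisfy them.
print(satisfies_minimum("commoncode==31.2.1", "commoncode >= 32.0.0"))
print(satisfies_minimum("pygmars==0.7.0", "pygmars >= 0.9.0"))
```

Comparing tuples of integers avoids the string-comparison trap where "9.0" would sort after "32.0".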
@@ -138,9 +138,9 @@ testing =

docs =
Sphinx == 5.1.0
-sphinx-rtd-theme >= 0.5.0
-doc8 >= 0.8.1
+sphinx_rtd_theme >= 0.5.1
sphinx-reredirects >= 0.1.2
+doc8 >= 0.8.1
sphinx-autobuild
sphinx-rtd-dark-mode>=1.3.0
sphinx-copybutton