
SCTK 30.1.0 detects classpath-exception-2.0 based only on word "classpath" in Java comments #2769

Open
mjherzog opened this issue Nov 26, 2021 · 2 comments


@mjherzog
Member

Description

Please leave a brief description of the bug or feature request:

SCTK reports a license_score=7.33 for a set of Java files based only on the word "classpath" in comments.
The match details are:
matched_rule_identifier = classpath-exception-2.0_5.RULE
matched_rule_matcher = 2-aho
matched_rule_length = 2
matched_rule_match_coverage = 2
matched_rule_relevance = 11
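The reported score fits a simple calculation. This is an assumption, not something stated in the thread: suppose the score is the rule relevance scaled by the rule coverage and by the fraction of tokens in the matched region that the rule actually matched. Then a 2-token rule with relevance 11, matched across a 3-token region with one extra word in between, scores 11 × 2/3 ≈ 7.33. The hypothetical approx_score function below only approximates this idea. It is not ScanCode's scoring code.

```python
def approx_score(relevance: int, matched_len: int, region_len: int,
                 rule_coverage: float = 1.0) -> float:
    """Hypothetical approximation of a match score (assumption, not ScanCode's code).

    relevance: rule relevance from 0 to 100
    matched_len: number of query tokens matched by the rule
    region_len: number of query tokens in the matched region, extra words included
    rule_coverage: fraction of the rule text that was matched (1.0 = all of it)
    """
    query_coverage = matched_len / region_len
    return round(query_coverage * rule_coverage * relevance, 2)

print(approx_score(relevance=11, matched_len=2, region_len=3))  # 7.33
```

Under this assumption, the same rule matched with no extra word in between would score the full relevance of 11.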

Since the word "classpath" is likely to appear frequently in Java comments, it would be good to avoid this false positive.

  • What version of scancode-toolkit was used to generate the scan file? version 30.1.0
@pombredanne
Member

This makes sense, but https://github.com/nexB/scancode-toolkit/blob/develop/src/licensedcode/data/rules/classpath-exception-2.0_5.RULE has the text "classpath exception".

Would you have a file with the problematic detection to link or attach?

@pombredanne
Member

Never mind, I can see the issue now. For instance, this C++ snippet from https://github.com/SanDisk-Open-Source/SSD_Dashboard/blob/f0240a983544a86989eec80a9a5210f2b14fa1c1/uefi/gcc/gcc-4.6.3/libjava/gnu/classpath/jdwp/natVMVirtualMachine.cc#L280:

	    using namespace gnu::classpath::jdwp::exception;
	    throw new InvalidLocationException ();

is detected by this rule, with matched_text: classpath::[jdwp]::exception. It should not be detected when there are extra words between "classpath" and "exception".
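Here is one way such a match can happen, sketched under simplifying assumptions. The brackets around jdwp in matched_text suggest that the word was not part of the match. If words missing from the matcher's dictionary (here "jdwp") are skipped before exact matching, then "classpath jdwp exception" looks like the rule text "classpath exception" to an exact sequence matcher. The tokenizer, DICTIONARY, and helper names below are hypothetical and much simpler than ScanCode's own.

```python
import re
from typing import List, Set, Tuple

def tokenize(text: str) -> List[str]:
    # Simplified tokenizer: lowercase alphanumeric runs; "::" and ";" act as separators.
    return re.findall(r"[a-z0-9]+", text.lower())

def known_tokens(tokens: List[str], dictionary: Set[str]) -> List[Tuple[int, str]]:
    # Keep (original position, token) pairs for dictionary words only;
    # unknown words such as "jdwp" are silently dropped.
    return [(pos, tok) for pos, tok in enumerate(tokens) if tok in dictionary]

def exact_matches(rule: List[str], stream: List[Tuple[int, str]]) -> List[List[int]]:
    # Exact sequence matching over the filtered stream; returns the original
    # query positions of each occurrence of the rule tokens.
    n = len(rule)
    hits = []
    for i in range(len(stream) - n + 1):
        window = stream[i:i + n]
        if [tok for _, tok in window] == rule:
            hits.append([pos for pos, _ in window])
    return hits

# Hypothetical dictionary of words known from license texts and rules.
DICTIONARY = {"using", "namespace", "gnu", "classpath", "exception", "throw", "new"}

tokens = tokenize("using namespace gnu::classpath::jdwp::exception;")
hits = exact_matches(["classpath", "exception"], known_tokens(tokens, DICTIONARY))
print(hits)  # [[3, 5]]: a match with a gap at position 4 ("jdwp")
```

The gap between positions 3 and 5 is the signal that an extra word sat between "classpath" and "exception".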

pombredanne added a commit that referenced this issue Dec 23, 2021
Signed-off-by: Philippe Ombredanne
pombredanne added a commit that referenced this issue Dec 23, 2021
This applies the renaming done in the code to the actual rules.

Signed-off-by: Philippe Ombredanne
pombredanne added a commit that referenced this issue Dec 23, 2021
Rename all match filter functions to use more explicit names.

Refactor function to set the lines as a LicenseMatch method.

Add misc. new and improved license detection rules.

Improve the order in which some match filters are processed.
For instance, this helps ensure that smaller, non-spurious matches are
not merged into short spurious matches and discarded too early.

Refine non-continuous matches filter for #2769
Rename filter_if_only_known_words_rule() to filter_non_continuous_matches()
Also rename "continuous" Rule field to "is_continuous"
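The idea behind the non-continuous filter can be sketched as follows. The sketch assumes that each match records the query positions it covers, counting unknown words so that a skipped word leaves a gap, and that it carries a copy of the rule's is_continuous flag. MatchSketch and the helper functions are hypothetical and do not reproduce ScanCode's implementation. Marking classpath-exception-2.0_5.RULE as continuous is also an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class MatchSketch:
    rule_id: str
    is_continuous: bool    # hypothetical copy of the rule's is_continuous flag
    qpositions: List[int]  # query token positions, counting unknown words too

def is_gapless(positions: List[int]) -> bool:
    # True when the positions form one unbroken run, e.g. [3, 4] but not [3, 5].
    ordered = sorted(positions)
    return all(b - a == 1 for a, b in zip(ordered, ordered[1:]))

def filter_non_continuous_matches_sketch(
    matches: List[MatchSketch],
) -> Tuple[List[MatchSketch], List[MatchSketch]]:
    # Discard matches to "continuous" rules that skipped over extra words.
    kept, discarded = [], []
    for m in matches:
        if m.is_continuous and not is_gapless(m.qpositions):
            discarded.append(m)
        else:
            kept.append(m)
    return kept, discarded

matches = [
    MatchSketch("classpath-exception-2.0_5.RULE", True, [3, 5]),    # classpath [jdwp] exception
    MatchSketch("classpath-exception-2.0_5.RULE", True, [10, 11]),  # classpath exception
]
kept, discarded = filter_non_continuous_matches_sketch(matches)
print([m.qpositions for m in kept])  # [[10, 11]]
```

Rules without the flag are left alone. The stricter check then applies only where a rule explicitly asks for it.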

Add new filter_short_matches_scattered_on_too_many_lines() filter
This works by discarding some short matches that are scattered on too
many lines to be a correct match.
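A minimal sketch of such a filter follows. Both the threshold for what counts as "short" (max_short_len=3) and the rule "spans more lines than it has tokens" are hypothetical choices made for illustration. The commit message does not give the actual criteria, and LineMatch is an invented type.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class LineMatch:
    rule_id: str
    length: int      # number of matched tokens
    start_line: int
    end_line: int

    def lines_spanned(self) -> int:
        return self.end_line - self.start_line + 1

def filter_short_scattered_sketch(
    matches: List[LineMatch], max_short_len: int = 3
) -> Tuple[List[LineMatch], List[LineMatch]]:
    # Hypothetical heuristic: a short match that spans more lines than it has
    # tokens is treated as scattered and discarded.
    kept, discarded = [], []
    for m in matches:
        if m.length <= max_short_len and m.lines_spanned() > m.length:
            discarded.append(m)
        else:
            kept.append(m)
    return kept, discarded

matches = [
    LineMatch("short-rule", length=2, start_line=10, end_line=14),  # 2 tokens over 5 lines
    LineMatch("short-rule", length=2, start_line=20, end_line=20),  # 2 tokens on 1 line
    LineMatch("long-rule", length=40, start_line=30, end_line=36),
]
kept, discarded = filter_short_scattered_sketch(matches)
print([m.start_line for m in kept])  # [20, 30]
```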

Improve the overlap filter for two-token matches that precede or
follow longer matches and overlap them only on the word "license". Such
matches may be spurious and may be discarded.
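The overlap check described above can be sketched like this. The sketch assumes that matches are represented by the sets of query token positions they cover. It treats a two-token match as spurious when every position it shares with a longer match holds the word "license". It does not check whether the short match comes before or after the longer one. QMatch and the filter function are hypothetical, not ScanCode's code.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

@dataclass(frozen=True)
class QMatch:
    rule_id: str
    qpositions: FrozenSet[int]  # query token positions covered by the match

def filter_license_only_overlaps_sketch(
    matches: List[QMatch], query_tokens: List[str]
) -> Tuple[List[QMatch], List[QMatch]]:
    kept, discarded = [], []
    for m in matches:
        spurious = False
        if len(m.qpositions) == 2:
            for other in matches:
                # Only compare a two-token match against longer matches.
                if len(other.qpositions) <= 2:
                    continue
                shared = m.qpositions & other.qpositions
                if shared and all(query_tokens[p] == "license" for p in shared):
                    spurious = True
                    break
        (discarded if spurious else kept).append(m)
    return kept, discarded

tokens = ["released", "under", "the", "mit", "license", "copyright"]
long_match = QMatch("long-rule", frozenset({1, 2, 3, 4}))    # "under the mit license"
short_match = QMatch("two-token-rule", frozenset({4, 5}))  # "license copyright"
kept, discarded = filter_license_only_overlaps_sketch([long_match, short_match], tokens)
print([m.rule_id for m in discarded])  # ['two-token-rule']
```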

Add new and improved license detection rules, and improve existing
license metadata.

Improve code formatting and logging.

Move model field comments into help text on the model field definitions,
such as for License and Rule. This will help generate API documentation
later.

Signed-off-by: Philippe Ombredanne
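Moving field comments into help text lets documentation be generated from the models themselves. The minimal sketch below uses metadata from Python's standard dataclasses. That choice is an assumption: ScanCode's License and Rule models may be defined with a different library and different fields. RuleSketch and field_docs are hypothetical names.

```python
from dataclasses import dataclass, field, fields

@dataclass
class RuleSketch:
    identifier: str = field(
        default="",
        metadata={"help": "Unique identifier of this rule, such as its file name."},
    )
    relevance: int = field(
        default=100,
        metadata={"help": "Relevance of a match to this rule, from 0 to 100."},
    )
    is_continuous: bool = field(
        default=False,
        metadata={"help": "If True, a match must cover the rule text with no extra words in between."},
    )

def field_docs(model) -> str:
    # Build simple API documentation from each field's help text.
    return "\n".join(f"{f.name}: {f.metadata.get('help', '')}" for f in fields(model))

print(field_docs(RuleSketch))
```

Keeping the help text next to each field definition means the generated documentation cannot drift away from the fields it describes.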