Some ideas for extra heuristics #29

Open
pombredanne opened this issue Nov 13, 2020 · 2 comments
Labels
enhancement: New feature or request
new heuristic: Adds an enhancement in the analysis heuristic to make the issue detection more correct.

Comments

@pombredanne
Member

from a chat with @maxhbr

  • a short license reference (single word?) detected after several hundred lines (say, after the 1000th line) is likely a false positive

  • a short license reference (single word?) detected in a binary, where the actual text has a different case than the rule, is likely a false positive. For instance, gPL or gPl. These could also be handled directly in ScanCode Toolkit
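The two ideas above could be sketched roughly as follows. This is only an illustration: the `Match` class, its field names, and the two thresholds are hypothetical and are not taken from ScanCode Toolkit or the analyzer.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """Hypothetical, simplified view of one license match (not the ScanCode data model)."""
    matched_text: str  # text actually found in the scanned file
    rule_text: str     # text of the rule that matched
    start_line: int    # line where the match starts
    in_binary: bool    # True if the scanned file is a binary


# Assumed thresholds: the discussion only says "single word?" and
# "say, after the 1000th line", so both values are guesses to tune.
MAX_SHORT_WORDS = 1
LATE_LINE = 1000


def is_short(match: Match) -> bool:
    """True if the rule has at most MAX_SHORT_WORDS whitespace-separated words."""
    return len(match.rule_text.split()) <= MAX_SHORT_WORDS


def late_short_reference(match: Match) -> bool:
    """First idea: a short reference found far into a file."""
    return is_short(match) and match.start_line > LATE_LINE


def case_mismatch_in_binary(match: Match) -> bool:
    """Second idea: a short reference in a binary that differs from the rule only by case."""
    return (
        match.in_binary
        and is_short(match)
        and match.matched_text != match.rule_text
        and match.matched_text.lower() == match.rule_text.lower()
    )


def likely_false_positive(match: Match) -> bool:
    """Flag a match if either idea applies."""
    return late_short_reference(match) or case_mismatch_in_binary(match)
```

With these assumed thresholds, `Match("gPl", "GPL", 10, True)` is flagged by the case check, while `Match("GPL", "GPL", 10, False)` is not flagged at all.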

@AyanSinhaMahapatra
Member

@pombredanne The second case is definitely better handled as a scancode-toolkit change, IMHO.

The first one could be added as a results-analyzer heuristic to detect false positives, as short references may show up more often after 100 lines? I will integrate this as soon as the plugin is ready.

These extra heuristics would be very easy to add once the structure is ready, and they would be very important in the analysis process. We could also look at more statistics to find more of them!

I'm pushing a PR for the docs at #22 where the current classification/heuristics are detailed.

Also, in both cases, shouldn't these be license-tags rather than license-references? From the examples I have seen, false positives mostly get detected by single-word license-tag rules?

Thanks!
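One way to sidestep the tag-versus-reference question would be to accept either rule kind when selecting candidates. The sketch below assumes each match carries a `matched_rule` dictionary with `is_license_tag`, `is_license_reference`, and `rule_length` keys; these key names and the helper itself are assumptions for illustration, so check them against the actual scan output before relying on them.

```python
def is_single_word_tag_or_reference(matched_rule: dict) -> bool:
    """Return True for a one-word rule flagged as a license tag or a license reference.

    `matched_rule` is assumed to be a plain dict; missing keys are treated
    as "not set" so that partial data does not raise a KeyError.
    """
    is_tag_or_reference = bool(
        matched_rule.get("is_license_tag", False)
        or matched_rule.get("is_license_reference", False)
    )
    return is_tag_or_reference and matched_rule.get("rule_length", 0) == 1
```

A filter like this would feed both kinds of single-word rules into the same false-positive checks, whichever label a given rule happens to carry.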

AyanSinhaMahapatra added a commit to AyanSinhaMahapatra/scancode-analyzer that referenced this issue Jan 27, 2021
Adds an extra heuristic for false positives detection, based on
line number and rule length, and their related tests.

Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
@AyanSinhaMahapatra
Member

This commit adds the first of the two extra heuristics and partially resolves this issue: #34

AyanSinhaMahapatra added the enhancement and new heuristic labels on Jan 28, 2021
AyanSinhaMahapatra added a commit to AyanSinhaMahapatra/scancode-analyzer that referenced this issue Jan 28, 2021
Adds an extra heuristic for false positives detection, based on
line number and rule length, and their related tests. Fixes error
reporting issue and other keyerrors.

Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
AyanSinhaMahapatra added a commit to AyanSinhaMahapatra/scancode-analyzer that referenced this issue Jan 29, 2021
Adds an extra heuristic for false positives detection, based on
line number and rule length, and their related tests. Fixes error
reporting issue and other keyerrors.

Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
AyanSinhaMahapatra added a commit to AyanSinhaMahapatra/scancode-analyzer that referenced this issue Feb 12, 2021
Adds an extra heuristic for false positives detection, based on
line number and rule length, and their related tests. Fixes error
reporting issue and other keyerrors.

Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
2 participants