Fix documentation errors
Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
AyanSinhaMahapatra committed Feb 17, 2021
1 parent 74f3d45 commit 1b23422
Showing 4 changed files with 43 additions and 29 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci-docs.yml
Original file line number Diff line number Diff line change
@@ -28,7 +28,7 @@ jobs:
run: cd docs && pip install -r requirements.txt

- name: Check Sphinx Documentation build minimally
run: sphinx-build -E source build
run: sphinx-build -E ./source ./build

- name: Check for documentation style errors
run: ./scripts/doc8_style_check.sh
5 changes: 2 additions & 3 deletions INSTALL.rst
@@ -1,12 +1,11 @@
Quickstart - Scancode Plugin
----------------------------

``scancode-results-analyzer`` can be installed as a scancode post-scan plugin,
using ``pip``.
``scancode-results-analyzer`` can be installed as a scancode post-scan plugin.

1. Clone the Repository and navigate to the ``scancode-results-analyzer`` directory.

2. Configure::
2. Configure (Installs the requirements, and scancode-toolkit with the plugin)::

./configure

34 changes: 25 additions & 9 deletions docs/source/how-analysis-is-performed/cases-incorrect-scans.rst
@@ -70,53 +70,69 @@ All Issue Types
---------------

.. list-table::
:widths: 15 15
:widths: 5 15 15
:header-rows: 1

* - ``text/notice/tag/reference``
* - ``license``
- ``issue_type::classification_id``
- ``Description``

* - ``text``
- ``text-legal-lic-files``
- The matched text is present in a file whose name is a known legal filename.

* - ``text``
- ``text-non-legal-lic-files``
- The matched license text is present in a file whose name is not a known legal filename.

* - ``text``
- ``text-lic-text-fragments``
- ``lic-text-fragments``
- Only parts of a larger license text are detected.

* - ``notice``
- ``notice-and-or-with-notice``
- ``and-or-with-notice``
- A notice with a complex license expression (i.e. exceptions, choices or combinations).

* - ``notice``
- ``notice-single-key-notice``
- ``single-key-notice``
- A notice with a single license.

* - ``notice``
- ``notice-has-unknown-match``
- License notices with unknown licenses detected.

* - ``notice``
- ``notice-false-positive``
- A piece of code/text is incorrectly detected as a license.

* - ``tag``
- ``tag-tag-coverage``
- ``tag-low-coverage``
- A part of a license tag is detected.

* - ``tag``
- ``tag-other-tag-structures``
- ``other-tag-structures``
- A new or common tag structure is detected that could be handled differently.

* - ``tag``
- ``tag-false-positives``
- A piece of code/text is incorrectly detected as a license.

* - ``reference``
- ``reference-lead-in-or-unknown-refs``
- ``lead-in-or-unknown-reference``
- Lead-ins to known license references are detected.

* - ``reference``
- ``reference-low-coverage-refs``
- ``low-coverage-reference``
- License references with an incomplete match.

* - ``reference``
- ``reference-to-local-file``
- Matched to an unknown rule as the license information is present in another file,
which is referred to in this matched piece of text.

* - ``reference``
- ``reference-false-positive``
- A piece of code/text is incorrectly detected as a license.
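
As a rough illustration (not code from the plugin), the table can be read as a lookup from the ``license`` column to its classification ids. The names ``ISSUE_TYPES`` and ``license_type_of`` are hypothetical, and the ids assume that the second line of each changed pair in this hunk is the post-commit value.

```python
# Hypothetical lookup built from the table above; this is not the plugin's
# actual data structure. Ids are the post-commit values as read from the diff.
ISSUE_TYPES = {
    "text": [
        "text-legal-lic-files",
        "text-non-legal-lic-files",
        "lic-text-fragments",
    ],
    "notice": [
        "and-or-with-notice",
        "single-key-notice",
        "notice-has-unknown-match",
        "notice-false-positive",
    ],
    "tag": [
        "tag-low-coverage",
        "other-tag-structures",
        "tag-false-positives",
    ],
    "reference": [
        "lead-in-or-unknown-reference",
        "low-coverage-reference",
        "reference-to-local-file",
        "reference-false-positive",
    ],
}


def license_type_of(classification_id):
    """Return the ``license`` column value for a classification id, or None."""
    for license_type, classification_ids in ISSUE_TYPES.items():
        if classification_id in classification_ids:
            return license_type
    return None
```

For instance, ``license_type_of("tag-low-coverage")`` looks the id up under ``tag``, and an id that is not in the table yields ``None``.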

.. _case_lic_text:

Original file line number Diff line number Diff line change
@@ -98,20 +98,21 @@ this is efficient enough, and passes through the list of matches once.
File-regions with Incorrect Scans
---------------------------------

The attribute ``license_scan_analysis_result`` in the analysis results has information on if the
The attribute ``issue_id`` in the analysis results has information on if the
file-region has any license detection issue in it, based on coverage values, presence of extra words
or false positive tags.

.. note::

The 6 possible values of ``license_scan_analysis_result`` are:
The 5 possible values of ``issue_id`` are:

1. ``correct-license-detection``
2. ``imperfect-match-coverage``
3. ``near-perfect-match-coverage``
4. ``extra-words``
5. ``false-positive``
6. ``unknown-match``
1. ``imperfect-match-coverage``
2. ``near-perfect-match-coverage``
3. ``extra-words``
4. ``false-positive``
5. ``unknown-match``

If we do not have an issue, it is a correct license detection.

Scancode detects most licenses accurately, so our focus is only on the parts where the detection has
issues, and so primarily in the first step we separate this from the Correct Scans.
@@ -126,7 +127,7 @@ So in ``Step 1``::
are wrong detections, and also detections where all the matches have a perfect
``match_coverage``, i.e. 100.

These fall into the first category::
These fall into the first category:

1. ``correct-license-detection``

@@ -151,7 +152,7 @@ There is also another case where ``score != matched_coverage * rule_relevance``,
some extra words, i.e. the entire rule was matched, but there were some extra words which caused the
decrease in score.

So the 3 categories of issues classified in this step are::
So the 3 categories of issues classified in this step are:

2. ``imperfect-match-coverage``
3. ``near-perfect-match-coverage``
@@ -165,12 +166,12 @@ less than a threshold (i.e. say less than 4 words) and the start-line of the mat
be more than a threshold (i.e. say more than 1000) for it to be considered a false positive.

This is ``Step 3``, and here an NLP sentence classifier could be used to improve accuracy.
The issue class is called::
The issue class is called:

5. ``false-positive``

Even if all the matches have perfect `match_coverage`, if there are `unknown` license
matches there, there's likely a license detection issue. This issue is a::
matches there, there's likely a license detection issue. This issue is a:

6. ``unknown-match``
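
The steps described in these hunks could be sketched roughly as follows. This is an illustration under stated assumptions, not the plugin's implementation: the function names, the dictionary keys (``rule_length``, ``start_line``, ``license_expression``), the order of the checks, and the ``NEAR_PERFECT_COVERAGE`` cut-off are all hypothetical. The two false-positive thresholds use the example values from the text, and the score check assumes that coverage and relevance are both percentages.

```python
# Example values taken from the text ("say less than 4 words", "say more than 1000").
MAX_FALSE_POSITIVE_RULE_LENGTH = 4
MIN_FALSE_POSITIVE_START_LINE = 1000
# Hypothetical cut-off between imperfect and near-perfect coverage; the text gives none.
NEAR_PERFECT_COVERAGE = 80.0


def looks_like_false_positive(match):
    """A short rule matched far into a file is treated as a likely false positive."""
    return (
        match["rule_length"] < MAX_FALSE_POSITIVE_RULE_LENGTH
        and match["start_line"] > MIN_FALSE_POSITIVE_START_LINE
    )


def has_extra_words(match):
    """True when score differs from match_coverage * rule_relevance (as percentages)."""
    expected = match["match_coverage"] * match["rule_relevance"] / 100
    return abs(match["score"] - expected) > 0.01


def classify_region(matches):
    """Return an issue id for a file-region, or None for a correct detection."""
    if not matches:
        return None
    if all(looks_like_false_positive(m) for m in matches):
        return "false-positive"
    lowest_coverage = min(m["match_coverage"] for m in matches)
    if lowest_coverage < NEAR_PERFECT_COVERAGE:
        return "imperfect-match-coverage"
    if lowest_coverage < 100:
        return "near-perfect-match-coverage"
    if any(has_extra_words(m) for m in matches):
        return "extra-words"
    # Perfect coverage everywhere, but an unknown license was matched.
    if any("unknown" in m["license_expression"] for m in matches):
        return "unknown-match"
    return None
```

Returning ``None`` for a correct detection mirrors the changed note, which no longer lists ``correct-license-detection`` among the ``issue_id`` values.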

@@ -212,8 +213,6 @@ I.e. the policy is::
“matched_rule_identifier” and “match_coverage” across these multiple files, we keep only
one file among them and discard the others.

This is performed in the summary plugin, where all the unique license detection issues are
reported in the summary together, each with a list of their occurrences.

For example, in `scancode-toolkit#1920 <https://github.com/nexB/scancode-toolkit/issues/1920>`_, socat-2.0.0 has
multiple (6) files with each file having the same 3 matched rules and match_coverage sets, i.e. -
@@ -226,5 +225,5 @@ So, we need to keep only one of these files, as the others have the same license

.. note::

This isn't followed in the ``scancode`` ``post-scan plugin`` as the processing is per-file,
and this is a codebase-level operation.
This is performed in the summary plugin, where all the unique license detection issues are
reported in the summary together, each with a list of their occurrences.
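
A minimal sketch of this grouping, assuming each file is represented as a list of match dictionaries keyed by file path (the function name ``unique_issue_files`` and the input shape are hypothetical, not the summary plugin's API):

```python
from collections import defaultdict


def unique_issue_files(matches_by_path):
    """Keep one file per unique set of (matched_rule_identifier, match_coverage).

    Returns a dict mapping the kept file path to the list of all paths
    (including itself) that share the same set of matches.
    """
    occurrences = defaultdict(list)
    for path, matches in matches_by_path.items():
        # The set of (rule, coverage) pairs identifies one license detection issue.
        key = frozenset(
            (m["matched_rule_identifier"], m["match_coverage"]) for m in matches
        )
        occurrences[key].append(path)
    # The first file seen for each key is kept; the others are recorded as occurrences.
    return {paths[0]: paths for paths in occurrences.values()}
```

With two files that share the same rule and coverage pairs and a third that differs, the result has two entries: the first of the shared files with both paths listed, and the third file on its own.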
