aboutcode-org · AyanSinhaMahapatra · Apr 7, 2021 · Apr 2, 2021
diff --git a/CHANGELOG.rst b/CHANGELOG.rst
@@ -1,5 +1,7 @@
-Release notes
--------------
-### Version 0.0.0
+Changelog
+=========
 
-*xxxx-xx-xx* -- Initial release.
+v21.4.2
+-------
+
+Initial release.
diff --git a/INSTALL.rst b/INSTALL.rst
@@ -1,9 +1,15 @@
-Quickstart - Scancode Plugin
-----------------------------
+Installation
+============
 
-``scancode-results-analyzer`` can be installed as a scancode post-scan plugin.
+The methods below install the `scancode-analyzer` post-scan plugin alongside `scancode`,
+extending it with the `--analyze-license-results` option.
 
-1. Clone the Repository and navigate to the ``scancode-results-analyzer`` directory.
+Install Plugin from Source
+--------------------------
+
+``scancode-analyzer`` can be installed as a scancode post-scan plugin.
+
+1. Clone the Repository and navigate to the ``scancode-analyzer`` directory.
 
 2. Configure (Installs the requirements, and scancode-toolkit with the plugin)::
 
@@ -23,13 +29,24 @@ Quickstart - Scancode Plugin
 
 6. OR, import a JSON scan result and run the plugin on that scan::
 
-    scancode --json-pp results.json --from-json tests/data/results-test/selective-before-rules-added/only_errors.json --analyze-license-results
+    scancode --json-pp results.json --from-json path/to/scan_result.json --analyze-license-results
 
 .. note::
 
-    `scancode-results-analyzer` has required CLI options, as these produce attributes
+    `scancode-analyzer` has required CLI options, as these produce attributes
     essential to the analysis process. These are:
     `--license --info --license-text --is-license-text --classify`
     Even when loading from JSON, the scan that generated the JSON file must have
     been run with these options for the analysis plugin to work.
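The note above can be sketched in Python as a small pre-flight check. This is a minimal illustration, not part of the plugin: ``missing_required_options`` and ``build_from_json_command`` are hypothetical helper names, and the check only recognizes the long option names listed in the note (short aliases are not handled).

```python
# Required scan options, copied from the note above.
REQUIRED_SCAN_OPTIONS = (
    "--license",
    "--info",
    "--license-text",
    "--is-license-text",
    "--classify",
)


def missing_required_options(argv):
    """Return the required long options that are absent from argv.

    Hypothetical helper: it only looks for the long option names.
    """
    present = set(argv)
    return [opt for opt in REQUIRED_SCAN_OPTIONS if opt not in present]


def build_from_json_command(scan_json, output_json):
    """Build the argument list for the --from-json invocation shown in step 6."""
    return [
        "scancode",
        "--json-pp", output_json,
        "--from-json", scan_json,
        "--analyze-license-results",
    ]


if __name__ == "__main__":
    print(build_from_json_command("path/to/scan_result.json", "results.json"))
    print(missing_required_options(["scancode", "--license", "--info"]))
```

The resulting list could be handed to ``subprocess.run(cmd, check=True)`` in an environment where ``scancode`` and the plugin are installed.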
 
+
+Install plugin via `pip`
+------------------------
+
+1. Install all `scancode` `prerequisites`_ and create a `virtualenvironment`_.
+
+2. Run `pip install scancode-analyzer` to install the latest version of Scancode Analyzer.
+
+
+.. _virtualenvironment: https://scancode-toolkit.readthedocs.io/en/latest/getting-started/install.html#installation-as-a-library-via-pip
+.. _prerequisites: https://scancode-toolkit.readthedocs.io/en/latest/getting-started/install.html#prerequisites
diff --git a/README.rst b/README.rst
@@ -1,19 +1,22 @@
-scancode-results-analyzer
-=========================
+scancode-analyzer
+=================
 
-.. what-is-scancode-results-analyzer
+.. what-is-scancode-analyzer
 
-What is Scancode-Results-Analyzer
----------------------------------
+What is Scancode-Analyzer
+-------------------------
 
-`ScanCode`_ detects licenses, copyrights, package manifests and direct dependencies and more both in source code and
-binary files.
+`ScanCode`_ detects licenses, copyrights, package manifests and direct dependencies and more both in
+source code and binary files.
 
-ScanCode license detection is using multiple techniques to accurately detect licenses based on automatons, inverted
-indexes and multiple sequence alignments. The detection is not always accurate enough. The goal of this project is to
-improve the accuracy of license detection leveraging the ClearlyDefined and other datasets, where ScanCode is used
-to massively scan millions of packages. It would also be available as a `ScanCode`_ ``post-scan`` plugin to use it
-in scans directly, or in `scancode.io`_ pipelines.
+ScanCode license detection uses multiple techniques to accurately detect licenses, based on
+automatons, inverted indexes and multiple sequence alignments. As the detection supports approximate
+matching, there are many `unknown` detections, or multiple approximate matches.
+
+The goal of this project is to improve the accuracy of license detection by leveraging ScanCode scans.
+
+It is a `ScanCode`_ ``post-scan`` plugin that can be used in scans directly and, in the future, in
+`scancode.io`_ pipelines, with better issue review and reporting features.
 
 This project aims to:
 
@@ -22,7 +25,7 @@ This project aims to:
 - Add this as a `scancode`_ post-scan plugin
 - Add to pipelines in `scancode.io`_
 - Write reusable tools and models to assist in the semi-automated reviews of scan results.
-- It will also create new license detection rules semi-automatically to fix the detected anomalies
+- It will also suggest new license detection rules semi-automatically to fix the detected anomalies
 
 .. _ScanCode: https://github.com/nexB/scancode-toolkit
 .. _scancode.io: https://github.com/nexB/scancode.io
@@ -37,12 +40,12 @@ Refer to the installation instructions on `INSTALL.rst`_
 Documentation
 -------------
 
-Documentation: https://scancode-results-analyzer.readthedocs.io/en/latest/ [WIP]
+Documentation: https://scancode-analyzer.readthedocs.io/en/latest/
 
 Project Board
 -------------
 
-`Project Board`_ for  ``scancode-results-analyzer`` : Analysing Scancode License Detection Results.
+`Project Board`_ for  ``scancode-analyzer`` : Analysing Scancode License Detection Results.
 
-.. _INSTALL.rst: https://github.com/nexB/scancode-results-analyzer/tree/master/INSTALL.rst
-.. _Project Board: https://github.com/nexB/scancode-results-analyzer/projects/1
+.. _INSTALL.rst: https://github.com/nexB/scancode-analyzer/tree/master/INSTALL.rst
+.. _Project Board: https://github.com/nexB/scancode-analyzer/projects/1
diff --git a/docs/source/analysis-use-case/suggesting-licenses.rst b/docs/source/analysis-use-case/suggesting-licenses.rst
@@ -56,7 +56,7 @@ The steps are as follows:
 1. First, all the `license expressions` in the list are sorted by their number of
    occurrences.
 
-2. Generic `license_expressions` like `unknown`, `warranty-disclaimer` are removed fro, this sorted
+2. Generic `license_expressions` like `unknown`, `warranty-disclaimer` are removed from this sorted
    list.
 
 3. If there's only one `license_expression` with the highest number of occurrences, then that is the
@@ -73,7 +73,7 @@ The steps are as follows:
 1. The boolean value denoting the license type, i.e. license text/notice/tag/reference is determined
    from their respective class of problem, which they are already divided into.
 
-2. The ``ignorable`` attributes are added later by using scripts.
+2. The ``ignorable`` attributes could be added later by using scripts.
 
 3. The possible license id (like ``mit``) is predicted as the license ID of the match with the
    longest ``match_coverage``. This has to be manually verified in most cases.
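
The selection steps in this file can be sketched in Python as follows. This is only an illustration under stated assumptions, not the plugin's implementation: the function names are hypothetical, the set of generic expressions contains just the two examples named above, a tie for the most frequent expression returns ``None`` as a placeholder (the tie rule is not visible in this hunk), and each match is assumed to be a flat dictionary with ``key`` and ``match_coverage`` entries.

```python
from collections import Counter

# Only the two examples named in the text; the real list may be longer.
GENERIC_EXPRESSIONS = {"unknown", "warranty-disclaimer"}


def most_common_expression(license_expressions):
    """Return the single most frequent non-generic expression, else None.

    Returning None on a tie (or on an empty list) is an assumption made
    for this sketch.
    """
    counts = Counter(
        expr for expr in license_expressions if expr not in GENERIC_EXPRESSIONS
    )
    if not counts:
        return None
    ranked = counts.most_common()  # sorted by number of occurrences
    top_count = ranked[0][1]
    top = [expr for expr, count in ranked if count == top_count]
    return top[0] if len(top) == 1 else None


def predict_license_id(matches):
    """Return the license key of the match with the largest match_coverage.

    The flat {"key": ..., "match_coverage": ...} shape is hypothetical.
    """
    best = max(matches, key=lambda match: match["match_coverage"])
    return best["key"]


if __name__ == "__main__":
    print(most_common_expression(["mit", "unknown", "unknown", "mit", "apache-2.0"]))
    print(predict_license_id([
        {"key": "mit", "match_coverage": 40.0},
        {"key": "apache-2.0", "match_coverage": 95.0},
    ]))
```

As the text notes, a predicted license id still has to be verified manually in most cases.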

diff --git a/docs/source/api-and-outputs/json-output.rst b/docs/source/api-and-outputs/json-output.rst
@@ -1,13 +1,13 @@
 JSON Output Format
 ==================
 
-`scancode-results-analyzer` is meant to be used as a post-scan Plugin for Scancode, where after
+`scancode-analyzer` is meant to be used as a post-scan Plugin for Scancode, where after
 running a scan, the scan results are then analyzed for scan errors, and that information is
 added to the scancode JSON results.
 
-Command Line Argument to use ``scancode-results-analyzer``: ``--analyze-license-results``
+Command Line Argument to use ``scancode-analyzer``: ``--analyze-license-results``
 
-Here's how example result-JSONs from `scancode-results-analyzer` could look like, post-analysis.
+Here's what example result JSONs from `scancode-analyzer` could look like, post-analysis.
 
 .. _license_detection_issues_result_json:
 
@@ -23,13 +23,6 @@ for each resource in the codebase this list of dictionary will be added, where e
 is for each corresponding file-region :ref:`file_region`, having the results of the analysis for all
 the match(es) in that file-region.
 
-.. note::
-
-    [WIP]
-    There would also be a codebase-level dictionary added,
-    1. With statistics on the license_detection issues.
-    2. All the unique license detection issues and their occurrences.
-    3. Header information.
 
 .. code-block:: json
 
@@ -110,6 +103,7 @@ a file-region, and containing analysis results for all the license matches in a
                     "is_license_notice": true,
                     "is_license_tag": false,
                     "is_license_reference": false,
+                    "is_license_intro": false,
                     "analysis_confidence": "high",
                     "is_suggested_matched_text_complete": true
                 },
@@ -159,6 +153,9 @@ location.
                 "licenses": [
                   {
                     "key": "lgpl-2.0"
+                  },
+                  {
+                    "key": "gpl-3.0-plus"
                   }
                 ],
                 "licence_detection_issues": [
@@ -174,13 +171,19 @@ location.
                             "is_license_notice": true,
                             "is_license_tag": false,
                             "is_license_reference": false,
+                            "is_license_intro": false,
                             "analysis_confidence": "medium",
                             "is_suggested_matched_text_complete": true
                         },
                         "suggested_license": {
                             "license_expression": "lgpl-2.0-plus",
                             "matched_text": " *  licensed under the terms of the LGPL.... "
-                        }
+                        },
+                        "original_licenses": [
+                            {
+                                "key": "lgpl-2.0"
+                            }
+                        ]
                     },
                     {
                         "start_line": 54,
@@ -194,14 +197,19 @@ location.
                             "is_license_notice": true,
                             "is_license_tag": false,
                             "is_license_reference": false,
+                            "is_license_intro": false,
                             "analysis_confidence": "high",
                             "is_suggested_matched_text_complete": true
                         },
                         "suggested_license": {
                             "license_expression": "gpl-3.0-plus",
                             "matched_text": "\"genshellopt is free software: you can redistribute it and/or modify it under \\\nthe terms of the GNU General Public License as published by the Free Software \\\nFoundation, either version 3 of the License, or (at your option) any later \\\nversion."
                         },
-                        "original_licenses": []
+                        "original_licenses": [
+                            {
+                                "key": "gpl-3.0-plus"
+                            }
+                        ]
                     }
                 ]
             }
@@ -260,6 +268,7 @@ it is an empty list.
                     "is_license_notice": true,
                     "is_license_tag": false,
                     "is_license_reference": false,
+                    "is_license_intro": false,
                     "analysis_confidence": "medium",
                     "is_suggested_matched_text_complete": true
                 },
@@ -304,13 +313,19 @@ it is an empty list.
                     "is_license_notice": true,
                     "is_license_tag": false,
                     "is_license_reference": false,
+                    "is_license_intro": false,
                     "analysis_confidence": "medium",
                     "is_suggested_matched_text_complete": true
                 },
                 "suggested_license": {
                     "license_expression": "lgpl-2.0-plus",
                     "matched_text": " *  licensed under the terms of the LGPL. "
-                }
+                },
+                "original_licenses": [
+                    {
+                        "key": "unknown"
+                    }
+                ]
             }
         ]
     }
@@ -336,22 +351,24 @@ All Unique License Detection Issues
 
 .. code-block:: json
 
-    "unique_license_detection_issues": [
-        {
-            "unique_identifier": 1,
-            "files": [
-                {
-                    "path": "1921-socat-2.0.0-error.h",
-                    "start_line": 3,
-                    "end_line": 3
+    {
+        "unique_license_detection_issues": [
+            {
+                "unique_identifier": 1,
+                "files": [
+                    {
+                        "path": "1921-socat-2.0.0-error.h",
+                        "start_line": 3,
+                        "end_line": 3
+                    }
+                ],
+                "license_detection_issue": {
+                    "issue_category": "imperfect-match-coverage",
+                    "issue_description": "The license detection is inconclusive with high confidence, because only a small part of the rule text is matched."
                 }
-            ],
-            "license_detection_issue": {
-                "issue_category": "imperfect-match-coverage",
-                "issue_description": "The license detection is inconclusive with high confidence, because only a small part of the rule text is matched."
             }
-        }
-    ]
+        ]
+    }
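
A consumer of this structure could summarize it along these lines. This is a minimal sketch: ``summarize_unique_issues`` is a hypothetical helper, and it assumes the dictionary shown above has already been loaded on its own (where it sits in the full scan output is not assumed here). The sample data reuses the values from the example, with the description omitted.

```python
import json


def summarize_unique_issues(results):
    """Return (identifier, issue category, locations) for each unique issue."""
    summary = []
    for issue in results.get("unique_license_detection_issues", []):
        category = issue["license_detection_issue"]["issue_category"]
        locations = [
            (entry["path"], entry["start_line"], entry["end_line"])
            for entry in issue["files"]
        ]
        summary.append((issue["unique_identifier"], category, locations))
    return summary


# Sample data mirroring the JSON example above.
EXAMPLE = json.loads("""
{
    "unique_license_detection_issues": [
        {
            "unique_identifier": 1,
            "files": [
                {
                    "path": "1921-socat-2.0.0-error.h",
                    "start_line": 3,
                    "end_line": 3
                }
            ],
            "license_detection_issue": {
                "issue_category": "imperfect-match-coverage"
            }
        }
    ]
}
""")

if __name__ == "__main__":
    for identifier, category, locations in summarize_unique_issues(EXAMPLE):
        print(identifier, category, locations)
```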
 
 
 Basic Statistics
@@ -395,7 +412,7 @@ BERT model versions used.
 
     {
         "header": {
-            "tool_name": "scancode-results-analyzer",
+            "tool_name": "scancode-analyzer",
             "version": 0.1,
             "cases_version": 0.1,
             "ml_models": [
@@ -434,7 +451,7 @@ BERT model versions used.
 Related Issues
 --------------
 
-- `nexB/scancode-results-analyzer#22 <https://github.com/nexB/scancode-results-analyzer/issues/22>`_
-- `nexB/scancode-results-analyzer#20 <https://github.com/nexB/scancode-results-analyzer/issues/20>`_
-- `nexB/scancode-results-analyzer#21 <https://github.com/nexB/scancode-results-analyzer/issues/21>`_
+- `nexB/scancode-analyzer#22 <https://github.com/nexB/scancode-analyzer/issues/22>`_
+- `nexB/scancode-analyzer#20 <https://github.com/nexB/scancode-analyzer/issues/20>`_
+- `nexB/scancode-analyzer#21 <https://github.com/nexB/scancode-analyzer/issues/21>`_
 
diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -17,8 +17,8 @@
 
 # -- Project information -----------------------------------------------------
 
-project = 'scancode-results-analyzer'
-copyright = '2020, nexb'
+project = 'scancode-analyzer'
+copyright = '2021, nexb'
 author = 'nexb'
 
 # -- General configuration ---------------------------------------------------

diff --git a/docs/source/how-analysis-is-performed/cases-incorrect-scans.rst b/docs/source/how-analysis-is-performed/cases-incorrect-scans.rst
@@ -134,6 +134,10 @@ All Issue Types
       - ``reference-false-positive``
       - A piece of code/text is incorrectly detected as a license.
 
+    * - ``intro``
+      - ``intro-unknown-match``
+      - A piece of common introduction to a license text/notice/reference is detected.
+
 .. _case_lic_text:
 
 License Texts

diff --git a/docs/source/how-analysis-is-performed/selecting-incorrect-unique.rst b/docs/source/how-analysis-is-performed/selecting-incorrect-unique.rst
@@ -44,7 +44,7 @@ Why we need to divide matches in a file into file-regions:
 
 2. If there are multiple matches in a region, they need to be analyzed as a whole, as even if most
    matches have perfect ``score`` and ``match_coverage``, only one of them with an imperfect
-   `match_coverage`` would mean there is a issue with that whole file-region. For example one
+   ``match_coverage`` would mean there is an issue with that whole file-region. For example, one
    license notice can be matched to a notice rule with imperfect scores, and several small
    license reference rules.
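
The point about analyzing a file-region as a whole can be illustrated with a short Python sketch. It is not the plugin's code: ``region_has_issue`` is a hypothetical name, each match is assumed to be a dictionary with a ``match_coverage`` percentage, and a value of 100 is taken to mean a perfect match.

```python
PERFECT_COVERAGE = 100  # assumed: match_coverage is a percentage


def region_has_issue(matches):
    """Flag the whole file-region if any single match has imperfect coverage."""
    return any(match["match_coverage"] < PERFECT_COVERAGE for match in matches)


if __name__ == "__main__":
    # One notice matched imperfectly next to small, perfect reference matches.
    region = [
        {"match_coverage": 62.5},
        {"match_coverage": 100.0},
        {"match_coverage": 100.0},
    ]
    print(region_has_issue(region))
```

Checking each match in isolation would report only the first match; checking the region reports all three together, which is the behaviour described above.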
 

diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -1,18 +1,18 @@
-.. scancode-results-analyzer documentation master file, created by
+.. scancode-analyzer documentation master file, created by
    sphinx-quickstart on Fri Oct 30 21:27:08 2020.
    You can adapt this file completely to your liking, but it should at least
    contain the root `toctree` directive.
 
-Welcome to `scancode-results-analyzer` Documentation!
-=====================================================
+Welcome to `scancode-analyzer` Documentation!
+=============================================
 
 
 .. include:: ../../README.rst
-   :start-after: what-is-scancode-results-analyzer
+   :start-after: what-is-scancode-analyzer
    :end-before: from-github-links
 
-Getting Started with `scancode-results-analyzer`
-------------------------------------------------
+Getting Started with `scancode-analyzer`
+----------------------------------------
 
 .. toctree::
    :maxdepth: 3

diff --git a/scancode-analyzer.ABOUT b/scancode-analyzer.ABOUT
@@ -0,0 +1,7 @@
+about_resource: .
+name: scancode-analyzer
+license_expression: apache-2.0
+copyright: Copyright (c) nexB Inc. and others.
+homepage_url: https://github.com/nexB/scancode-analyzer
+vcs_url: git+https://github.com/nexB/scancode-analyzer
+bug_tracking_url: https://github.com/nexB/scancode-analyzer/issues