v32.0.0rc1

Pre-release
@github-actions github-actions released this 22 Jan 17:52
· 1036 commits to develop since this release
18a842e

This is a major new release with API breaking changes.
v32.0.0rc1 is the first release candidate and we expect to have a few more.

Important API changes:

This is a major release with major API and output format changes and significant
feature updates.

In particular, we changed the output format for licenses and packages, and
we changed some of the command line options.

The output format version is now 3.0.0.

Package detection:

  • Update GemfileLockParser to track the gem that the Gemfile.lock is for,
    which we assign to the new GemfileLockParser.primary_gem field. Update
    GemfileLockHandler.parse() to handle the case where a primary gem is
    detected in a Gemfile.lock. If there is a primary gem, a single Package
    is created and the gem data detected in the Gemfile.lock is assigned as its
    dependencies. If there is no primary gem, all of the dependencies are
    collected into a Package with no name, which is then yielded.

    #3072

  • Fix issue where dependencies were not reported when scanning an extracted
    Python project by modifying BaseExtractedPythonLayout.assemble() to favor
    using package data from a PKG-INFO file from an egg-info directory. Package
    data from a PKG-INFO file from an egg-info directory contains the dependency
    information collected from the requirements.txt file alongside PKG-INFO.

    #3083

  • Fix issue where we were returning an incorrect purl package type for
    cocoapods: "pods" was returned as the purl type, but it should be
    "cocoapods". See
    https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#cocoapods

    #3081

  • Code for parsing a Maven POM, npm package.json, FreeBSD manifest and haxelib
    JSON has been separated into two functions: one that creates a PackageData
    object from the parsed Resource, and another that calls the first function
    and yields the PackageData. This was done so that the package manifest
    parsing code can be used outside of the scancode-toolkit context, in other
    libraries.
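
The Gemfile.lock handling described in the first item of this list can be sketched as follows. This is only an illustration of the branching described there, not the actual scancode-toolkit code: the `Gem` and `Package` classes and the `assemble` function are hypothetical stand-ins introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Gem:
    # Hypothetical stand-in for one gem entry parsed from a Gemfile.lock.
    name: str
    version: Optional[str] = None


@dataclass
class Package:
    # Hypothetical stand-in for the yielded package record.
    name: Optional[str]
    version: Optional[str]
    dependencies: List[Gem] = field(default_factory=list)


def assemble(primary_gem: Optional[Gem], gems: List[Gem]) -> Iterator[Package]:
    """Yield a single Package that carries the detected gems as dependencies."""
    if primary_gem is not None:
        # The lockfile belongs to a known gem: that gem becomes the Package
        # and the detected gems are attached as its dependencies.
        yield Package(primary_gem.name, primary_gem.version, list(gems))
    else:
        # No primary gem: collect all dependencies into a nameless Package.
        yield Package(None, None, list(gems))
```

Either way a caller receives exactly one package record, and can tell the two cases apart by checking whether its name is set.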

License detection:

  • The SPDX license list has been updated to the latest v3.19

  • This is a major update to license detection where we now combine one or more
    license matches in a larger license detection. This approach improves the
    accuracy of license detection and removes a larger number of false positive
    or ambiguous license detections. See #2878 for details.

  • There is a new license_detections codebase level attribute with all the
    unique license detections in the whole scan, both in resources and packages.
    This has the 3 attributes also present in package/resource level license
    detections: license_expression, matches and detection_log and has
    two additional attributes:

    • identifier: the license_expression combined with a UUID created from
      the detection contents; it is the same for identical detections.

    • count: Number of times in the codebase this unique license detection
      was encountered.

  • The data structure of the JSON output has changed for licenses at file level:

    • The licenses attribute is deleted.

    • A new for_license_detections attribute is added. It is a list of
      identifier strings that reference the codebase level unique license
      detections.

    • A new license_detections attribute contains license detections in that file.
      This object has three attributes: license_expression, detection_log
      and matches. matches is a list of license matches and is roughly
      the same as licenses in the previous version with additional structure
      changes detailed below.

    • A new license_clues attribute contains license matches with the
      same data structure as the matches attribute in license_detections.
      These are license matches that are mere clues and were not considered
      to be a proper, conclusive license detection.

    • The license_expressions list of license expressions is deleted and
      replaced by a detected_license_expression single expression.
      Similarly spdx_license_expressions was removed and replaced by
      detected_license_expression_spdx.

    • See the license updates documentation for examples and details:
      https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#change-in-license-data-format-resource

  • The data structure of license attributes in package_data and the codebase
    level packages has been updated accordingly:

    • There is a new license_detections attribute for the primary, top-level
      declared licenses of a package and an other_license_detections attribute
      for the other secondary detections.

    • The license_expression is replaced by the declared_license_expression
      and other_license_expression attributes with their SPDX counterparts
      declared_license_expression_spdx and other_license_expression_spdx.
      These expressions are parallel to detections.

    • The declared_license attribute is renamed extracted_license_statement
      and is now a YAML-encoded string.

      See the license updates documentation for examples and details:
      https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#change-in-license-data-format-package

  • The license matches structure has changed: we used to report one match for each
    license key of a matched license expression. We now report a single match
    for each matched license expression and list the license keys in a
    licenses attribute. This avoids data duplication.
    Inside each match, we list the match and matched rule attributes directly,
    avoiding nesting. See the license updates documentation for examples and details:
    https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#licensematch-result-data

  • There are two new codebase level attributes, enabled by default with
    --licenses, that report reference license metadata and texts once for each
    license matched across the scan: license_references and
    license_rule_references, which list the unique detected licenses and license
    rules. This reference data is also removed from license matches at all
    levels, i.e. from codebase, package and resource level license detections and
    resource level license clues.
    See the license updates documentation for examples and details:
    https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#comparision-before-after-license-references

  • We replaced the scancode --reindex-licenses command line option with a
    new separate command named scancode-reindex-licenses.

    • The --reindex-licenses-for-all-languages CLI option is also moved to
      the scancode-reindex-licenses command as an option --all-languages.

    • We can now detect licenses using custom license texts and license rules
      stored in a directory or packaged as a plugin for consistent reuse and deployment.

    • There is an --additional-directory option with the scancode-reindex-licenses
      command to add the licenses from a directory.

    • There is also an --only-builtin option to use only builtin licenses,
      ignoring any additional license plugins.

    • See #480 for more details.

  • We combined the license data file and text file of each license into a single
    file with a .LICENSE extension. The .yml data file is now included at the
    top of each .LICENSE file as "YAML frontmatter". The same applies to license
    rules and their .RULE and .yml files. This halves the number of data files
    from about 60,000 to 30,000. Git line history is preserved for the combined
    text + yml files.

  • There is a new console script, scancode-license-data, to export
    license data in JSON, YAML and HTML, with indexes and a static website for use
    in the licensedb web site. This becomes the API for getting scancode license
    data.

    See #2738

  • The deprecated --is-license-text option has been removed.
    This is now built in with the --license-text and --info options
    and exposed with the percentage_of_license_text attribute.

Full Changelog: v31.2.4...v32.0.0rc1