Add support for tagging input files in CLI and UI #708 (#1069)
* Add support for tagging input files in CLI #708

Signed-off-by: tdruez <[email protected]>

* Add ability to update input source tag in UI #708

Signed-off-by: tdruez <[email protected]>

* Add changelog entry and unit tests #708

Signed-off-by: tdruez <[email protected]>

* Add support for tag in API #708

Signed-off-by: tdruez <[email protected]>

* Refine documentation for the tagging features #708

Signed-off-by: tdruez <[email protected]>

* Fix changelog #708

Signed-off-by: tdruez <[email protected]>

---------

Signed-off-by: tdruez <[email protected]>
tdruez authored Feb 2, 2024
1 parent 6873047 commit f72d26a
Showing 22 changed files with 302 additions and 59 deletions.
8 changes: 6 additions & 2 deletions CHANGELOG.rst
@@ -24,11 +24,15 @@ Unreleased
 - Improve the inspect_manifest pipeline to accept archives as inputs.
   https://github.com/nexB/scancode.io/issues/1034

-- Add support for "tagging" download URL inputs using the "#<fragment>" section of the
-  URL.
+- Add support for "tagging" download URL inputs using the "#<fragment>" section of URLs.
   This feature is particularly useful in the map_develop_to_deploy pipeline when
   download URLs are utilized as inputs. Tags such as "from" and "to" can be specified
   by adding "#from" or "#to" fragments at the end of the download URLs.
+  Using the CLI, the uploaded files can be tagged using the "filename:tag" syntax
+  while using the `--input-file` arguments.
+  In the UI, tags can be edited from the Project details view "Inputs" panel.
+  On the REST API, a new `upload_file_tag` field is available to use along the
+  `upload_file`.
   https://github.com/nexB/scancode.io/issues/708

 v33.0.0 (2024-01-16)
10 changes: 10 additions & 0 deletions docs/built-in-pipelines.rst
@@ -70,6 +70,16 @@ Load Inventory

 Map Deploy To Develop
 ---------------------
+
+.. warning::
+    This pipeline requires input files to be tagged with the following:
+
+    - "from": For files related to the source code (also known as "develop").
+    - "to": For files related to the build/binaries (also known as "deploy").
+
+    Tagging your input files varies based on whether you are using the REST API,
+    UI, or CLI. Refer to the :ref:`faq_tag_input_files` section for guidance.
+
 .. autoclass:: scanpipe.pipelines.deploy_to_develop.DeployToDevelop()
     :members:
     :member-order: bysource
16 changes: 16 additions & 0 deletions docs/command-line-interface.rst
@@ -87,9 +87,17 @@ Optional arguments:
 - ``--input-file INPUTS_FILES`` Input file locations to copy in the :guilabel:`input/`
   work directory.

+  .. tip::
+    Use the "filename:tag" syntax to **tag** input files:
+    ``--input-file path/filename:tag``
+
 - ``--input-url INPUT_URLS`` Input URLs to download in the :guilabel:`input/` work
   directory.

+  .. tip::
+    Use the "url#tag" syntax to tag downloaded files:
+    ``--input-url https://url.com/filename#tag``
+
 - ``--copy-codebase SOURCE_DIRECTORY`` Copy the content of the provided source directory
   into the :guilabel:`codebase/` work directory.

@@ -128,9 +136,17 @@ Adds input files in the project's work directory.
 - ``--input-file INPUTS_FILES`` Input file locations to copy in the :guilabel:`input/`
   work directory.

+  .. tip::
+    Use the "filename:tag" syntax to **tag** input files:
+    ``--input-file path/filename:tag``
+
 - ``--input-url INPUT_URLS`` Input URLs to download in the :guilabel:`input/` work
   directory.

+  .. tip::
+    Use the "url#tag" syntax to tag downloaded files:
+    ``--input-url https://url.com/filename#tag``
+
 - ``--copy-codebase SOURCE_DIRECTORY`` Copy the content of the provided source directory
   into the :guilabel:`codebase/` work directory.

35 changes: 35 additions & 0 deletions docs/faq.rst
@@ -143,3 +143,38 @@ You can refer to the :ref:`automation` to automate your projects management.
 Also, A new GitHub action is available at
 `scancode-action repository <https://github.com/nexB/scancode-action>`_
 to run ScanCode.io pipelines from your GitHub Workflows.
+
+.. _faq_tag_input_files:
+
+How to tag input files?
+-----------------------
+
+Certain pipelines, including the :ref:`pipeline_map_deploy_to_develop`, require input
+files to be tagged. This section outlines various methods to tag input files based on
+your project management context.
+
+Using download URLs as inputs
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+You can provide tags using the "#<fragment>" section of URLs. This tagging method is
+universally applicable in the User Interface, REST API, and Command Line Interface.
+
+Example:
+
+.. code-block::
+
+    https://url.com/sources.zip#from
+    https://url.com/binaries.zip#to
+
+Uploading local files
+^^^^^^^^^^^^^^^^^^^^^
+
+There are multiple ways to tag input files when uploading local files:
+
+- **User Interface:** Utilize the "Edit flag" link in the "Inputs" panel of the Project
+  details view.
+
+- **REST API:** Use the "upload_file_tag" field in addition to the "upload_file" field.
+
+- **Command Line Interface:** Tag uploaded files using the "filename:tag" syntax.
+  Example: ``--input-file path/filename:tag``.
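
Editor's illustration (not part of the commit): the "#<fragment>" convention described in the FAQ above can be sketched with the standard library alone. The helper name `split_url_tag` is hypothetical and this is not necessarily how ScanCode.io parses the fragment internally; it only shows how a tag can be separated from a download URL.

```python
from urllib.parse import urldefrag


def split_url_tag(download_url):
    """Return (url_without_fragment, tag); the tag is "" when no fragment is set.

    Hypothetical helper for illustration only, not the project's own code.
    """
    url, fragment = urldefrag(download_url)
    return url, fragment


for example in (
    "https://url.com/sources.zip#from",
    "https://url.com/binaries.zip#to",
    "https://url.com/untagged.zip",
):
    print(split_url_tag(example))
```

The fragment is never sent to the server in an HTTP request, which is what makes it a convenient place to carry a client-side label such as "from" or "to".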
6 changes: 6 additions & 0 deletions docs/rest-api.rst
@@ -121,6 +121,11 @@ Using cURL:
 To upload more than one file, you can use the :ref:`rest_api_add_input` endpoint of
 the project.

+.. tip::
+
+    To tag the ``upload_file``, you can provide the tag value using the
+    ``upload_file_tag`` field.
+
 Using Python and the **"requests"** library:

 .. code-block:: python
@@ -222,6 +227,7 @@ This action adds provided ``input_urls`` or ``upload_file`` to the ``project``.
 Data:
     - ``input_urls``: A list of URLs to download
     - ``upload_file``: A file to upload
+    - ``upload_file_tag``: An optional tag to add on the uploaded file

 Using cURL to provide download URLs:

5 changes: 4 additions & 1 deletion scanpipe/api/serializers.py
@@ -156,6 +156,7 @@ class ProjectSerializer(
         help_text="Execute pipeline now",
     )
     upload_file = serializers.FileField(write_only=True, required=False)
+    upload_file_tag = serializers.CharField(write_only=True, required=False)
     input_urls = StrListField(
         write_only=True,
         required=False,
@@ -182,6 +183,7 @@ class Meta:
             "url",
             "uuid",
             "upload_file",
+            "upload_file_tag",
             "input_urls",
             "webhook_url",
             "created_date",
@@ -265,6 +267,7 @@ def create(self, validated_data):
         This ensures the Project data integrity before running any pipelines.
         """
         upload_file = validated_data.pop("upload_file", None)
+        upload_file_tag = validated_data.pop("upload_file_tag", "")
         input_urls = validated_data.pop("input_urls", [])
         pipeline = validated_data.pop("pipeline", [])
         execute_now = validated_data.pop("execute_now", False)
@@ -273,7 +276,7 @@ def create(self, validated_data):
         project = super().create(validated_data)

         if upload_file:
-            project.add_uploads([upload_file])
+            project.add_upload(upload_file, tag=upload_file_tag)

         for url in input_urls:
             project.add_input_source(download_url=url)
8 changes: 7 additions & 1 deletion scanpipe/api/views.py
@@ -273,14 +273,20 @@ def add_input(self, request, *args, **kwargs):
             return Response(message, status=status.HTTP_400_BAD_REQUEST)

         upload_file = request.data.get("upload_file")
+        upload_file_tag = request.data.get("upload_file_tag", "")
         input_urls = request.data.get("input_urls", [])

         if not (upload_file or input_urls):
             message = {"status": "upload_file or input_urls required."}
             return Response(message, status=status.HTTP_400_BAD_REQUEST)

         if upload_file:
-            project.add_uploads([upload_file])
+            project.add_upload(upload_file, tag=upload_file_tag)
+
+        # Add support for providing multiple URLs in a single string.
+        if isinstance(input_urls, str):
+            input_urls = input_urls.split()
+        input_urls = [url for entry in input_urls for url in entry.split()]

         for url in input_urls:
             project.add_input_source(download_url=url)
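
Editor's illustration (not part of the commit): the URL normalization step in the hunk above can be tried in isolation. The function name `normalize_input_urls` is hypothetical; the body mirrors the two statements from the diff and shows that both a single whitespace-separated string and a list whose entries contain several URLs end up as a flat list.

```python
def normalize_input_urls(input_urls):
    """Flatten a string, or a list of strings, into a list of individual URLs.

    Hypothetical standalone wrapper around the logic shown in the diff.
    """
    # A single string may hold several whitespace-separated URLs.
    if isinstance(input_urls, str):
        input_urls = input_urls.split()
    # Each list entry may itself hold several whitespace-separated URLs.
    return [url for entry in input_urls for url in entry.split()]


print(normalize_input_urls("https://url.com/a.zip#from https://url.com/b.zip#to"))
print(normalize_input_urls(["https://url.com/a.zip\nhttps://url.com/b.zip"]))
```

Because `str.split()` with no argument splits on any run of whitespace, spaces and newlines are treated the same way, and the "#tag" fragments stay attached to their URLs.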
22 changes: 22 additions & 0 deletions scanpipe/forms.py
@@ -22,6 +22,7 @@

 from django import forms
 from django.apps import apps
+from django.core.exceptions import ObjectDoesNotExist
 from django.core.exceptions import ValidationError

 from taggit.forms import TagField
@@ -183,6 +184,27 @@ def save(self, project):
         return project


+class EditInputSourceTagForm(forms.Form):
+    input_source_uuid = forms.CharField(
+        max_length=50,
+        widget=forms.widgets.HiddenInput,
+        required=True,
+    )
+    tag = forms.CharField(
+        widget=forms.TextInput(attrs={"class": "input"}),
+    )
+
+    def save(self, project):
+        input_source_uuid = self.cleaned_data.get("input_source_uuid")
+        try:
+            input_source = project.inputsources.get(uuid=input_source_uuid)
+        except (ValidationError, ObjectDoesNotExist):
+            return
+
+        input_source.update(tag=self.cleaned_data.get("tag", ""))
+        return input_source
+
+
 class ArchiveProjectForm(forms.Form):
     remove_input = forms.BooleanField(
         label="Remove inputs",
52 changes: 29 additions & 23 deletions scanpipe/management/commands/__init__.py
@@ -150,7 +150,7 @@ def add_arguments(self, parser):
         parser.add_argument(
             "--input-file",
             action="append",
-            dest="inputs_files",
+            dest="input_files",
             default=list(),
             help="Input file locations to copy in the input/ work directory.",
         )
@@ -171,28 +171,45 @@ def add_arguments(self, parser):
             ),
         )

-    def handle_input_files(self, inputs_files):
-        """Copy provided `inputs_files` to the project's `input` directory."""
+    @staticmethod
+    def extract_tag_from_input_files(input_files):
+        """
+        Add support for the ":tag" suffix in file location.
+        For example: "/path/to/file.zip:tag"
+        """
+        input_files_data = {}
+        for file in input_files:
+            if ":" in file:
+                key, value = file.split(":", maxsplit=1)
+                input_files_data.update({key: value})
+            else:
+                input_files_data.update({file: ""})
+        return input_files_data
+
+    def handle_input_files(self, input_files_data):
+        """Copy provided `input_files` to the project's `input` directory."""
         copied = []

-        for file_location in inputs_files:
+        for file_location, tag in input_files_data.items():
             self.project.copy_input_from(file_location)
             filename = Path(file_location).name
             copied.append(filename)
-            self.project.add_input_source(filename=filename, is_uploaded=True)
+            self.project.add_input_source(
+                filename=filename,
+                is_uploaded=True,
+                tag=tag,
+            )

-        msg = f"File{pluralize(inputs_files)} copied to the project inputs directory:"
+        msg = f"File{pluralize(copied)} copied to the project inputs directory:"
         self.stdout.write(msg, self.style.SUCCESS)
         msg = "\n".join(["- " + filename for filename in copied])
         self.stdout.write(msg)

     @staticmethod
-    def validate_input_files(inputs_files):
-        """
-        Raise an error if one of the provided `inputs_files` is not an existing
-        file.
-        """
-        for file_location in inputs_files:
+    def validate_input_files(input_files):
+        """Raise an error if one of the provided `input_files` entry does not exist."""
+        for file_location in input_files:
             file_path = Path(file_location)
             if not file_path.is_file():
                 raise CommandError(f"{file_location} not found or not a file")
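
Editor's illustration (not part of the commit): the "filename:tag" parsing added above can be exercised as a plain function. The body below follows the `extract_tag_from_input_files` static method from the diff, lifted out of its mixin class so it runs on its own; the sample paths are made up.

```python
def extract_tag_from_input_files(input_files):
    """Map each file location to its tag ("" when no ":tag" suffix is given)."""
    input_files_data = {}
    for file in input_files:
        if ":" in file:
            # Split on the FIRST colon only: everything after it is the tag.
            key, value = file.split(":", maxsplit=1)
            input_files_data[key] = value
        else:
            input_files_data[file] = ""
    return input_files_data


parsed = extract_tag_from_input_files(
    ["/path/to/sources.zip:from", "/path/to/binaries.zip:to", "/path/to/other.zip"]
)
print(parsed)
```

Two properties follow directly from this logic and may be worth keeping in mind. Since the split happens on the first colon, a location that already contains one (for example a Windows path such as `C:\data\file.zip:to`) would yield `C` as the file location. And since the result is a dictionary keyed by location, passing the same location twice keeps only the last tag.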
@@ -224,17 +241,6 @@ def handle_copy_codebase(self, copy_from):
         shutil.copytree(src=copy_from, dst=project_codebase, dirs_exist_ok=True)


-def validate_input_files(file_locations):
-    """
-    Raise an error if one of the provided `file_locations` is not an existing
-    file.
-    """
-    for file_location in file_locations:
-        file_path = Path(file_location)
-        if not file_path.is_file():
-            raise CommandError(f"{file_location} not found or not a file")
-
-
 def validate_copy_from(copy_from):
     """Raise an error if `copy_from` is not an available directory"""
     if copy_from:
11 changes: 6 additions & 5 deletions scanpipe/management/commands/add-input.py
@@ -32,7 +32,7 @@ class Command(AddInputCommandMixin, ProjectCommand):

     def handle(self, *args, **options):
         super().handle(*args, **options)
-        inputs_files = options["inputs_files"]
+        input_files = options["input_files"]
         input_urls = options["input_urls"]
         copy_from = options["copy_codebase"]

@@ -41,14 +41,15 @@ def handle(self, *args, **options):
                 "Cannot add inputs once a pipeline has started to execute on a project."
             )

-        if not (inputs_files or input_urls or copy_from):
+        if not (input_files or input_urls or copy_from):
             raise CommandError(
                 "Provide inputs with the --input-file, --input-url, or --copy-codebase"
             )

-        if inputs_files:
-            self.validate_input_files(inputs_files)
-            self.handle_input_files(inputs_files)
+        if input_files:
+            input_files_data = self.extract_tag_from_input_files(input_files)
+            self.validate_input_files(input_files=input_files_data.keys())
+            self.handle_input_files(input_files_data)

         if input_urls:
             self.handle_input_urls(input_urls)
11 changes: 5 additions & 6 deletions scanpipe/management/commands/create-project.py
@@ -27,7 +27,6 @@

 from scanpipe.management.commands import AddInputCommandMixin
 from scanpipe.management.commands import validate_copy_from
-from scanpipe.management.commands import validate_input_files
 from scanpipe.management.commands import validate_pipelines
 from scanpipe.models import Project

@@ -70,7 +69,7 @@ def add_arguments(self, parser):
     def handle(self, *args, **options):
         name = options["name"]
         pipeline_names = options["pipelines"]
-        inputs_files = options["inputs_files"]
+        input_files = options["input_files"]
         input_urls = options["input_urls"]
         copy_from = options["copy_codebase"]
         execute = options["execute"]
@@ -86,7 +85,8 @@ def handle(self, *args, **options):

         # Run validation before creating the project in the database
         pipeline_names = validate_pipelines(pipeline_names)
-        validate_input_files(inputs_files)
+        input_files_data = self.extract_tag_from_input_files(input_files)
+        self.validate_input_files(input_files=input_files_data.keys())
         validate_copy_from(copy_from)

         if execute and not pipeline_names:
@@ -100,9 +100,8 @@ def handle(self, *args, **options):
             project.add_pipeline(pipeline_name)

         self.project = project
-        if inputs_files:
-            self.validate_input_files(inputs_files)
-            self.handle_input_files(inputs_files)
+        if input_files:
+            self.handle_input_files(input_files_data)

         if input_urls:
             self.handle_input_urls(input_urls)