Adding projects through the API runs all groups #1426

Closed
AyanSinhaMahapatra opened this issue Oct 31, 2024 · 7 comments · Fixed by #1442
Labels
bug Something isn't working

Comments

@AyanSinhaMahapatra
Member

AyanSinhaMahapatra commented Oct 31, 2024

Sending the following scan to a scancode.io instance triggers a scan with all the groups of the selected pipeline:

import requests

api_url = "http://localhost/api/projects/"
data = {
    "name": "worksbench_via_api",
    "input_urls": "https://github.com/aboutcode-org/scancode-workbench/archive/refs/tags/v4.0.3.tar.gz",
    "pipeline": "inspect_packages",
    "execute_now": True,
}
response = requests.post(api_url, data=data)
response.json()

From the log:

2024-10-31 14:48:35.585 Pipeline [inspect_packages] starting
2024-10-31 14:48:35.587 Step [download_missing_inputs] starting
2024-10-31 14:48:35.588 Fetching input from https://github.com/aboutcode-org/scancode-workbench/archive/refs/tags/v4.0.3.tar.gz
2024-10-31 14:48:40.984 Step [download_missing_inputs] completed in 5 seconds
2024-10-31 14:48:40.986 Step [copy_inputs_to_codebase_directory] starting
2024-10-31 14:48:41.013 Step [copy_inputs_to_codebase_directory] completed in 0 seconds
2024-10-31 14:48:41.016 Step [extract_archives] starting
2024-10-31 14:48:42.888 Step [extract_archives] completed in 2 seconds
2024-10-31 14:48:42.891 Step [collect_and_create_codebase_resources] starting
2024-10-31 14:48:44.034 Step [collect_and_create_codebase_resources] completed in 1 seconds
2024-10-31 14:48:44.036 Step [flag_empty_files] starting
2024-10-31 14:48:44.040 Step [flag_empty_files] completed in 0 seconds
2024-10-31 14:48:44.043 Step [flag_ignored_resources] starting
2024-10-31 14:48:44.044 Step [flag_ignored_resources] completed in 0 seconds
2024-10-31 14:48:44.046 Step [scan_for_application_packages] starting
2024-10-31 14:48:44.156 Progress: 10% (44/440) ETA: 1 seconds
2024-10-31 14:48:44.172 Progress: 20% (88/440)
2024-10-31 14:48:44.192 Progress: 30% (132/440)
2024-10-31 14:48:44.219 Progress: 40% (176/440)
2024-10-31 14:48:44.246 Progress: 50% (220/440)
2024-10-31 14:48:44.273 Progress: 60% (264/440)
2024-10-31 14:48:44.311 Progress: 70% (308/440)
2024-10-31 14:48:44.360 Progress: 80% (352/440)
2024-10-31 14:48:44.403 Progress: 90% (396/440)
2024-10-31 14:48:49.462 Progress: 100% (440/440)
2024-10-31 14:50:45.416 Step [scan_for_application_packages] completed in 121 seconds (2.0 minutes)
2024-10-31 14:50:45.418 Step [resolve_dependencies] starting
2024-10-31 14:51:56.518 Step [resolve_dependencies] completed in 71 seconds (1.2 minutes)
2024-10-31 14:51:56.521 Pipeline completed in 201 seconds (3.3 minutes)

We can see that the resolve_dependencies step is run, even though the StaticResolver group in the inspect_packages pipeline is optional.

There is also no clear way to select groups when starting a project from the API:
Groups cannot be chosen the way they are via the CLI: "pipeline": "inspect_packages:StaticResolver" is not supported. It returns: {'pipeline': ['"inspect_packages:StaticResolver" is not a valid choice.']}

We also do not see the selected groups correctly in the pipeline run modal when a scan is triggered from the API. In the above case, pipeline steps from the StaticResolver group are run, but this is only visible in the log, not in the modal.

@AyanSinhaMahapatra AyanSinhaMahapatra added the bug Something isn't working label Oct 31, 2024
@tdruez
Contributor

tdruez commented Nov 1, 2024

> Sending the following scan to a scancode.io instance triggers a scan with all the groups of the selected pipeline

Yes, this is the expected behavior. When no groups are provided, all the steps are included.
See https://github.com/aboutcode-org/scancode.io/blob/main/aboutcode/pipeline/__init__.py#L52

> There is also no clear way to select groups when starting a project from the API

I'm adding support for this in #1427

> We also do not see the selected groups correctly in the pipeline run modal when a scan is triggered from the API. In the above case, pipeline steps from the StaticResolver group are run, but this is only visible in the log, not in the modal.

This is also expected. The current "selected group" implementation allows you to limit your pipeline steps to the specified groups. If no groups are provided, all the steps are run during the pipeline execution. We could improve the group feature but this will require design and refactoring.
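The selection behavior described above can be illustrated with a minimal sketch. This is not the actual aboutcode.pipeline code: the `group` decorator body, the `select_steps` helper, and the empty step functions are hypothetical stand-ins, and only the step and group names are taken from the log and the report.

```python
def group(*names):
    """Tag a step function with group names (hypothetical stand-in)."""
    def decorator(func):
        func.groups = set(names)
        return func
    return decorator


def extract_archives(): ...


def scan_for_application_packages(): ...


@group("StaticResolver")
def resolve_dependencies(): ...


STEPS = (extract_archives, scan_for_application_packages, resolve_dependencies)


def select_steps(steps, selected_groups=None):
    """Return the steps to run for the given group selection."""
    # No groups provided: every step runs, grouped or not.
    if not selected_groups:
        return list(steps)
    selected = set(selected_groups)
    # Untagged steps always run; tagged steps run only if a group matches.
    return [
        step for step in steps
        if not getattr(step, "groups", None) or step.groups & selected
    ]


# Without a selection, the grouped step is included as well.
print([step.__name__ for step in select_steps(STEPS)])
```

Under these assumptions, an empty selection and an explicit "StaticResolver" selection both return all three steps, which matches the log in the report.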

tdruez added a commit that referenced this issue Nov 1, 2024
tdruez added a commit that referenced this issue Nov 1, 2024
)

* Add pipeline selected groups in create project API endpoint #1426

Signed-off-by: tdruez <[email protected]>

* Add proper pipeline validation in the OrderedMultiplePipelineChoiceField #1426

Signed-off-by: tdruez <[email protected]>

---------

Signed-off-by: tdruez <[email protected]>
@tdruez
Contributor

tdruez commented Nov 1, 2024

@AyanSinhaMahapatra You can now provide multiple pipelines during project creation using the API.
Added in #1427
https://scancodeio.readthedocs.io/en/latest/rest-api.html#create-a-project
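As a rough illustration, a request payload selecting a group might be built as follows. This is a sketch that assumes the endpoint now accepts the same "pipeline_name:Group" syntax as the CLI, with multiple groups separated by commas. The `build_project_payload` helper is hypothetical, and the linked documentation remains the authority on the accepted format.

```python
def build_project_payload(name, input_url, pipeline, groups=()):
    """Build form data for the create-project endpoint (hypothetical helper)."""
    # Assumption: groups are appended as "pipeline:Group1,Group2", like the CLI.
    pipeline_value = f"{pipeline}:{','.join(groups)}" if groups else pipeline
    return {
        "name": name,
        "input_urls": input_url,
        "pipeline": pipeline_value,
        "execute_now": True,
    }


payload = build_project_payload(
    name="worksbench_via_api",
    input_url="https://github.com/aboutcode-org/scancode-workbench/archive/refs/tags/v4.0.3.tar.gz",
    pipeline="inspect_packages",
    groups=["StaticResolver"],
)
print(payload["pipeline"])
```

The resulting dict can be sent with requests.post(api_url, data=payload), as in the original report.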

@AyanSinhaMahapatra
Member Author

Thanks @tdruez for adding the support!

> Yes, this is the expected behavior.

Ah right, this makes sense.

But if we have a way to select groups in a pipeline, and the default (when no groups are given) is to include all the groups, then it seems that there is no way to select a pipeline without any optional groups?

The inspect_packages pipeline, for example, was structured around the default being package scanning + assembly, with an optional group that resolves dependencies statically. Was this a wrong pipeline design then, since it is not possible to de-select the optional steps through the API?

  1. If the default is no optional steps, and anyone who wants optional steps states them explicitly, then every combination of steps can be run through the API. If the default is all optional steps enabled, there is no way to run the pipeline through the API with only the mandatory steps.
  2. In the UI, if no groups/options are selected, the pipeline runs without those steps: the default is no optional steps, and they are only enabled on explicit selection. This differs from the API behavior.

So do you think we should:

  1. Make no optional steps the default for starting a project via the API
  2. Restructure the pipelines such that users should ideally select one/multiple/all optional groups, and the default is all groups selected

> If no groups are provided, all the steps are run during the pipeline execution.

Whether the groups are provided explicitly or selected automatically by default, I think it would be consistent to show the selected groups in the modal whenever they are run. Otherwise you need to know which steps belong to which options and check the logs to determine what ran. And since the UI and API defaults differ, it makes sense to be more verbose.

What do you think?

@tdruez
Contributor

tdruez commented Nov 1, 2024

> then it seems that there is no way to select a pipeline without any optional groups?

Right, this is a flaw of the current approach (which was initially designed for the m2m pipeline).

> Make no optional steps the default for starting a project via the API

We could make this the default, but that should be the same default regardless of using the UI, API, or CLI. The default behavior should be consistent.


For now, only 3 pipelines leverage the groups:

  • inspect_packages
  • map_deploy_to_develop
  • resolve_dependencies

It's still time to review the group usage in those and see what would be the best refactoring.

@AyanSinhaMahapatra
Member Author

@tdruez upon considering the above mentioned pipelines:

I think both the inspect_packages and map_deploy_to_develop pipelines have steps that are not part of any group and are quite valuable on their own.

In the case of the inspect_packages pipeline, the dominant use case is actually without the optional group, and even for map_deploy_to_develop the basic mapping/matching/ABOUT files steps can be used in isolation too.

@pombredanne also had a suggestion some time ago (I can't find the issue/comment) to rename groups to options, which favours treating these pipeline steps as optional add-ons rather than as groups that can be optionally disabled.

It also makes less sense for example to run all the d2d steps for all groups by default in the API.

The UI/CLI rightly treat these pipeline steps or groups as optional, enabled only on explicit request, and I think this should be the case for the API as well.
This also gives us more flexibility in designing pipelines with optional steps, plus the option to run just the basic steps of a pipeline instead of being forced to select one/multiple groups or run all of them by default. With this approach it is still possible to run all the steps of a pipeline, but only by explicit choice.

What do you think?

@tdruez
Contributor

tdruez commented Nov 6, 2024

@AyanSinhaMahapatra That makes sense, I can see this transition to "optional" steps.
Renaming to options and @optional_step would help to emphasize the "excluded by default" behavior.
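The "excluded by default" behavior could look roughly like the sketch below. Only the `optional_step` name comes from this thread. The decorator body, the `options` attribute, the `select_steps` helper, and the empty step functions are assumptions made for illustration, not the code from the refactoring.

```python
def optional_step(*options):
    """Mark a step as optional under the given option names (sketch)."""
    def decorator(func):
        func.options = set(options)
        return func
    return decorator


def scan_for_application_packages(): ...


def flag_empty_files(): ...


@optional_step("StaticResolver")
def resolve_dependencies(): ...


STEPS = (flag_empty_files, scan_for_application_packages, resolve_dependencies)


def select_steps(steps, selected_options=None):
    """Keep mandatory steps, plus optional steps that were explicitly selected."""
    selected = set(selected_options or ())
    # With nothing selected, every optional step is filtered out.
    return [
        step for step in steps
        if not getattr(step, "options", None) or step.options & selected
    ]


# Default run: mandatory steps only.
print([step.__name__ for step in select_steps(STEPS)])
# Explicit selection: the optional step is added.
print([step.__name__ for step in select_steps(STEPS, ["StaticResolver"])])
```

With this default, the same rule can apply to the UI, API, and CLI: optional steps run only when requested by name.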

@tdruez
Contributor

tdruez commented Nov 7, 2024

@AyanSinhaMahapatra I've started the refactoring in #1442
Please have a look and let me know if anything else needs to be changed (I've tried to keep the changes to a minimum)
Also, the tests are failing following the behavior change, so we need to ensure that the impact is expected.

tdruez added a commit that referenced this issue Nov 21, 2024
#1442)

* Refactor the `group` decorator for pipeline steps as `optional_step` #1426

Signed-off-by: tdruez <[email protected]>

* Add optional groups explicitly in pipeline tests

Signed-off-by: Ayan Sinha Mahapatra <[email protected]>

* Update documentation references of group to option

Signed-off-by: Ayan Sinha Mahapatra <[email protected]>

---------

Signed-off-by: tdruez <[email protected]>
Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
Co-authored-by: Ayan Sinha Mahapatra <[email protected]>