Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pipeline-manager: pipeline GET selector and allow missing in POST/PUT #3161

Merged
merged 1 commit into from
Dec 18, 2024

Conversation

snkas
Copy link
Contributor

@snkas snkas commented Dec 16, 2024

The code query parameter for the retrieval of a list of pipelines is replaced because it is not descriptive enough and its boolean value is not expressive enough. It is replaced by the selector query parameter, which can be set to status for only status-related information or all (default) for all fields. The internal fields of the pipeline descriptor are no longer included in the API response.

When POST or PUT a pipeline, currently the API requires supplying a value for all fields. This commit instead allows the description, runtime_config, udf_rust, udf_toml, and program_config to not be specified (or null). Fields name and program_code remain required.

This commit includes edits to the CLI client fda and the Web Console to handle the aforementioned API changes.

@snkas snkas requested review from gz and Karakatiza666 December 16, 2024 14:05
Copy link
Collaborator

@gz gz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot of unwrap() added now which wasnt there before? I wonder why these fields are now optional? Generally its nice to know that some fields are set (which they should be, e.g. pipeline name) and unwrap shouldnt be necessary so I am not convinced its a good change?

@snkas snkas force-pushed the pipeline-api branch 2 times, most recently from 7559c03 to 654b760 Compare December 16, 2024 18:02
@snkas
Copy link
Contributor Author

snkas commented Dec 16, 2024

Fair point, I've made the 13 fields which are common across both selectors to be required again (id, name, description, created_at, version, platform_version, program_version, program_status, program_status_since, deployment_status, deployment_status_since, deployment_desired_status, deployment_error). The other 6 fields (runtime_config, program_code, udf_rust, udf_toml, program_config, program_info) are still kept optional.

I've also separated it into PipelineInfo and PipelineSelectedInfo, such that endpoints that will always give back all fields will have no unnecessary options.

Could you take another look @gz ?

@ryzhyk
Copy link
Contributor

ryzhyk commented Dec 16, 2024

Does it need changes to Python as well (cc @abhizer )

@snkas
Copy link
Contributor Author

snkas commented Dec 16, 2024

As far as I could see, the Python API could take advantage of e.g., GET of only status, but it shouldn't be affected by the changes themselves. What do you think @abhizer ?

crates/fda/src/main.rs Show resolved Hide resolved
crates/fda/src/main.rs Show resolved Hide resolved
crates/pipeline-manager/src/api/examples.rs Outdated Show resolved Hide resolved
crates/pipeline-manager/src/api/examples.rs Outdated Show resolved Hide resolved
crates/pipeline-manager/src/integration_test.rs Outdated Show resolved Hide resolved
@gz
Copy link
Collaborator

gz commented Dec 17, 2024

When POST or PUT a pipeline, currently the API requires supplying a value for all fields. This commit instead allows the description, runtime_config, udf_rust, udf_toml, and program_config to not be specified (or null). Fields name and program_code remain required.

I think this change makes sense, I'm not so sure about the next ones though

The code query parameter for the retrieval of a list of pipelines is replaced because it is not descriptive enough and its boolean value is not expressive enough. It is replaced by the selector query parameter, which can be set to status for only status-related information or all (default) for all fields.

I don't understand the problem we're trying to solve? Bandwidth of loading a big pipeline? Why do we need to separate with a selector (could we just have a different endpoint to get status information only?). Or if bandwidth was a problem why isn't just having a code attribute enough? FWIW the current selector seems to also only have two states but only getting the code or excluding the code isn't one of them?

The internal fields of the pipeline descriptor are no longer included in the API response.

why do we need to remove them? is there harm in showing it?

@abhizer
Copy link
Collaborator

abhizer commented Dec 17, 2024

Does it need changes to Python as well

No, it doesn't require any changes to python. But once this goes in, I'll create a PR to use selector=status to take advantage of this.

@snkas
Copy link
Contributor Author

snkas commented Dec 17, 2024

Selecting only some of the fields can be relatively large (10-50 KiB for each pipeline, depending on program size, really large can be up to 4 MiB). The UI displays a status list of all the pipelines, which it fetches every 500ms or 1s. As such, let's say 10 pipelines, it can be 1 MiB/s, though it can get a lot higher when we're talking about 100s of pipelines (pagination might even be necessary in future). Hence the name status.

The current code attribute/implementation is not sufficient in that (1) it should also cover the program_info, and (2) its naming (what "code" does it cover, etc.) is not descriptive enough. With the new descriptor argument, it selects a subset of the fields depending on the descriptor.

Regarding internal fields, if they are not exposed through the API, then the internal workings can still be changed without having to do a major version bump. If that is not an issue/it's important for debugging at this stage, I can keep them in.

@snkas
Copy link
Contributor Author

snkas commented Dec 17, 2024

I'm not sure making it a separate endpoint would be a good idea, it seems difficult to name consistently. E.g., GET /v0/pipelines-filtered/<name>? This seems more like a limitation of OpenAPI not allowing to specify return type based on a query parameter, rather than an issue of the API itself. See also: https://www.openapis.org/blog/2023/12/06/openapi-moonwalk-2024 and OAI/OpenAPI-Specification#164

The `code` query parameter for the retrieval of a list of pipelines is
replaced because it is not descriptive enough and its boolean value is
not expressive enough. It is replaced by the `selector` query parameter,
which can be set to `status` for only status-related information or
`all` (default) for all fields. The internal fields of the pipeline
descriptor are no longer included in the API response.

When POST or PUT a pipeline, currently the API requires supplying a
value for all fields. This commit instead allows the `description`,
`runtime_config`, `udf_rust`, `udf_toml`, and `program_config` to not be
specified (or `null`). Fields `name` and `program_code` remain required.

This commit includes edits to the CLI client `fda` and the Web Console
to handle the aforementioned API changes.

Signed-off-by: Simon Kassing <[email protected]>
@snkas snkas enabled auto-merge December 18, 2024 10:37
@snkas snkas added this pull request to the merge queue Dec 18, 2024
Merged via the queue into main with commit 7289954 Dec 18, 2024
6 checks passed
@snkas snkas deleted the pipeline-api branch December 18, 2024 14:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants