pipeline-manager: pipeline GET selector and allow missing in POST/PUT #3161

snkas · 2024-12-16T13:54:43Z

The code query parameter for the retrieval of a list of pipelines is replaced because it is not descriptive enough and its boolean value is not expressive enough. It is replaced by the selector query parameter, which can be set to status for only status-related information or all (default) for all fields. The internal fields of the pipeline descriptor are no longer included in the API response.

When POST or PUT a pipeline, currently the API requires supplying a value for all fields. This commit instead allows the description, runtime_config, udf_rust, udf_toml, and program_config to not be specified (or null). Fields name and program_code remain required.

This commit includes edits to the CLI client fda and the Web Console to handle the aforementioned API changes.

gz

There is a lot of unwrap() added now which wasnt there before? I wonder why these fields are now optional? Generally its nice to know that some fields are set (which they should be, e.g. pipeline name) and unwrap shouldnt be necessary so I am not convinced its a good change?

snkas · 2024-12-16T18:06:01Z

Fair point, I've made the 13 fields which are common across both selectors to be required again (id, name, description, created_at, version, platform_version, program_version, program_status, program_status_since, deployment_status, deployment_status_since, deployment_desired_status, deployment_error). The other 6 fields (runtime_config, program_code, udf_rust, udf_toml, program_config, program_info) are still kept optional.

I've also separated it into PipelineInfo and PipelineSelectedInfo, such that endpoints that will always give back all fields will have no unnecessary options.

Could you take another look @gz ?

ryzhyk · 2024-12-16T19:48:15Z

Does it need changes to Python as well (cc @abhizer )

snkas · 2024-12-16T20:15:53Z

As far as I could see, the Python API could take advantage of e.g., GET of only status, but it shouldn't be affected by the changes themselves. What do you think @abhizer ?

crates/fda/src/main.rs

crates/pipeline-manager/src/api/examples.rs

crates/pipeline-manager/src/integration_test.rs

gz · 2024-12-17T09:55:55Z

When POST or PUT a pipeline, currently the API requires supplying a value for all fields. This commit instead allows the description, runtime_config, udf_rust, udf_toml, and program_config to not be specified (or null). Fields name and program_code remain required.

I think this change makes sense, I'm not so sure about the next ones though

The code query parameter for the retrieval of a list of pipelines is replaced because it is not descriptive enough and its boolean value is not expressive enough. It is replaced by the selector query parameter, which can be set to status for only status-related information or all (default) for all fields.

I don't understand the problem we're trying to solve? Bandwidth of loading a big pipeline? Why do we need to separate with a selector (could we just have a different endpoint to get status information only?). Or if bandwidth was a problem why isn't just having a code attribute enough? FWIW the current selector seems to also only have two states but only getting the code or excluding the code isn't one of them?

The internal fields of the pipeline descriptor are no longer included in the API response.

why do we need to remove them? is there harm in showing it?

abhizer · 2024-12-17T11:07:54Z

Does it need changes to Python as well

No, it doesn't require any changes to python. But once this goes in, I'll create a PR to use selector=status to take advantage of this.

snkas · 2024-12-17T11:08:56Z

Selecting only some of the fields can be relatively large (10-50 KiB for each pipeline, depending on program size, really large can be up to 4 MiB). The UI displays a status list of all the pipelines, which it fetches every 500ms or 1s. As such, let's say 10 pipelines, it can be 1 MiB/s, though it can get a lot higher when we're talking about 100s of pipelines (pagination might even be necessary in future). Hence the name status.

The current code attribute/implementation is not sufficient in that (1) it should also cover the program_info, and (2) its naming (what "code" does it cover, etc.) is not descriptive enough. With the new descriptor argument, it selects a subset of the fields depending on the descriptor.

Regarding internal fields, if they are not exposed through the API, then the internal workings can still be changed without having to do a major version bump. If that is not an issue/it's important for debugging at this stage, I can keep them in.

snkas · 2024-12-17T12:07:35Z

I'm not sure making it a separate endpoint would be a good idea, it seems difficult to name consistently. E.g., GET /v0/pipelines-filtered/<name>? This seems more like a limitation of OpenAPI not allowing to specify return type based on a query parameter, rather than an issue of the API itself. See also: https://www.openapis.org/blog/2023/12/06/openapi-moonwalk-2024 and OAI/OpenAPI-Specification#164

The `code` query parameter for the retrieval of a list of pipelines is replaced because it is not descriptive enough and its boolean value is not expressive enough. It is replaced by the `selector` query parameter, which can be set to `status` for only status-related information or `all` (default) for all fields. The internal fields of the pipeline descriptor are no longer included in the API response. When POST or PUT a pipeline, currently the API requires supplying a value for all fields. This commit instead allows the `description`, `runtime_config`, `udf_rust`, `udf_toml`, and `program_config` to not be specified (or `null`). Fields `name` and `program_code` remain required. This commit includes edits to the CLI client `fda` and the Web Console to handle the aforementioned API changes. Signed-off-by: Simon Kassing <[email protected]>

snkas requested review from gz and Karakatiza666 December 16, 2024 14:05

Karakatiza666 approved these changes Dec 16, 2024

View reviewed changes

gz reviewed Dec 16, 2024

View reviewed changes

snkas force-pushed the pipeline-api branch 2 times, most recently from 7559c03 to 654b760 Compare December 16, 2024 18:02

gz reviewed Dec 17, 2024

View reviewed changes

abhizer mentioned this pull request Dec 17, 2024

[py] take advantage of GET selectors #3176

Open

gz approved these changes Dec 17, 2024

View reviewed changes

snkas force-pushed the pipeline-api branch from c564f1f to 760f902 Compare December 18, 2024 10:27

snkas enabled auto-merge December 18, 2024 10:37

snkas added this pull request to the merge queue Dec 18, 2024

Merged via the queue into main with commit 7289954 Dec 18, 2024
6 checks passed

snkas deleted the pipeline-api branch December 18, 2024 14:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pipeline-manager: pipeline GET selector and allow missing in POST/PUT #3161

pipeline-manager: pipeline GET selector and allow missing in POST/PUT #3161

snkas commented Dec 16, 2024

gz left a comment

snkas commented Dec 16, 2024

ryzhyk commented Dec 16, 2024 •

edited

Loading

snkas commented Dec 16, 2024

gz commented Dec 17, 2024

abhizer commented Dec 17, 2024

snkas commented Dec 17, 2024

snkas commented Dec 17, 2024 •

edited

Loading

pipeline-manager: pipeline GET selector and allow missing in POST/PUT #3161

pipeline-manager: pipeline GET selector and allow missing in POST/PUT #3161

Conversation

snkas commented Dec 16, 2024

gz left a comment

Choose a reason for hiding this comment

snkas commented Dec 16, 2024

ryzhyk commented Dec 16, 2024 • edited Loading

snkas commented Dec 16, 2024

gz commented Dec 17, 2024

abhizer commented Dec 17, 2024

snkas commented Dec 17, 2024

snkas commented Dec 17, 2024 • edited Loading

ryzhyk commented Dec 16, 2024 •

edited

Loading

snkas commented Dec 17, 2024 •

edited

Loading