-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SDK/Components - Simplified _create_task_factory_from_component_spec function #662
Conversation
@@ -186,156 +186,50 @@ def _make_name_unique_by_adding_index(name:str, collection, delimiter:str): | |||
return unique_name | |||
|
|||
|
|||
#Holds the transformation functions that are called each time TaskSpec instance is created from a component. If there are multiple handlers, the last one is used. | |||
_created_task_transformation_handler = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some comments about _created_task_transformation_handler? The only place this is populated seems to be the line 195 so it is not clear why it needs to exist as a list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent is to set it in the Pipeline
context. I had that code in the PR, but then I decided to split that into its own PR. The intent will be more obvious there.
Python context managers that replace some value usually make the value holder a list so that the context is re-entrant - so that the code does not break if you happen to have nested contexts. https://docs.python.org/3/library/contextlib.html#reentrant-cms
The high-level idea is as follows: Conceptually, what you get when you give arguments to a Component is a Task. But currently we need to get a ContainerOp
instance. So there needs to be a TaskSpec
-> ContainerOp
transformation that's applied automatically when the pipeline is being compiled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think would be the best wording for a comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In some new cases (the existing cases will continue to work) load_component
will need to return TaskSpec
objects instead of ContainerOp
objects. So there needs to be a way to enable/disable the TaskSpec
conversion to ContainerOp
. The _created_task_transformation_handler
holds the transformation procedure that can be changed or disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since some of the code is for graph components, would you hold it off this when we decide graph component priority? Or, is it possible to decouple graph component from container component?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code change is a foundation that is needed for multiple puzzle pieces that me and other team members are delivering this quarter. It's needed for the following efforts:
- Debug a single component locally (container components for now, graph components - later)
- Submit a run for a single component which is needed for component tests (container components for now, graph components - later)
- Intermediate YAML API in the backend.
- Passing the pipeline metadata (description) and input/output metadata (names, descriptions, types) to the UI. After discussing with @vicaire, @IronPan and @gaoning777 it was decided to skip the "pass information through Argo template metadata" short-term approach I suggested and either go with the intermediate YAML only or attach that intermediate yaml to the end of Argo YAML.
- Implementing static type checking - All 6 cases (submission from UI, submission from Python, constant arguments in python pipeline, constant arguments in graph component, passing outputs to inputs when composing pipelines in Python, passing outputs to inputs when composing pipelines in graph component YAML)
- Artifact support. Artifacts are involved when passing data between components which happens in a graph and requires the graph support.
The reason this affects that much is that many features depend on having the full information which TaskSpec
has, but that's lost in transition to ContainerOp
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Thanks @Ark-kun.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify: The actual change that will make "component task factory function" output TaskSpec
instead of ContainerOp
in some scenarios is coming in the next PR.
This PR is just a refactoring that does not change the behavior.
BTW, this was the intent behind the whole _dsl_bridge.py
file and the _task_object_factory
variable - bridge between the component structures and dsl structures like ContainerOp
https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/components/_dsl_bridge.py#L40 . It turned out the initial effort was insufficient since only limited information was passed to that handler.
this change was submitted 2 days ago and I was on it the same day or next. I want to clarify the priority of things so I have a clear view. It blocks the backend API? Is it because backend API depends on intermediate yaml? If so, that's exactly the thing I want to confirm (whether intermediate yaml is in scope). Add @paveldournov to comment or approve. |
It blocks the creation of the "submit intermediate YAML to backend" API that I was asked to prioritize this week. I already planned it for Q1, but I was asked to deliver it sooner.
I understand the possible confusion. The "Intermediate YAML" P0 CUJ was in the release planning doc, but it's not in the NEXT planning doc. That might be because it was rolled into the "Package and share reusable components and pipelines" which at some point was shortened even further. |
Why did the robot add l g t m? #662 (comment) I'm removing it since I'm not sure where it came from. |
/lgtm |
Sorry, I got confused because GitHub only updated the main discussion thread, but not the comment thread, so there wasn't any LGTM comment. |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Ark-kun The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
1 similar comment
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Ark-kun The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Simple refactoring:
Most of the code was moved to
_dsl_bridge.create_container_op_from_task
and is less indented now._dsl_bridge._create_task_object
to_create_container_op_from_resolved_task
and moved the output name sanitization there._components._created_task_transformation_handler
handler that specifies the transformation function that is called whenTaskSpec
instance is created fromComponentSpec
. Such transformation can for example convertTaskSpec
toContainerOp
.Rationale: When the pipeline author loads a component and uses it
the
train_test_task
is not always container task. Sometimes it's a graph task. In this cases it should not be converted toContainerOp
. It should remain aTaskSpec
. There are some other cases (e.g. creating graph components from a python pipeline function) whereTaskSpec
is needed.This change is