Support for Argo artifacts #336
I'm working on support for this as part of my components effort.
This is great! Would love this feature for local and on-prem pipelines
The current plan is to support
There seem to be a couple of difficulties with the Argo artifact storage feature:
A volume implementation backed by an object store (GCS/S3) would provide something equivalent to Argo artifact storage, but without the drawbacks.
Support for artifact passing in DSL is independent of the low-level storage details. Features that are needed to support artifact passing in DSL:
Full artifact passing DSL example (based on Argo's artifact-passing.yaml example):
The pipeline looks exactly the same as when the ops are passing parameters - no new constructs are introduced there. This pipeline is portable, works on-premise and does not depend on GCS.
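A minimal sketch of such an artifact-passing pipeline, written with lightweight Python components (this is an illustration, not the original example from this comment; the function and file names are made up):

```python
from kfp import components, compiler, dsl
from kfp.components import InputPath, OutputPath

# Producer: writes its result to a file whose location is chosen by the system.
def produce_text(text: str, text_artifact_path: OutputPath(str)):
    with open(text_artifact_path, 'w') as f:
        f.write(text)

# Consumer: receives the local path of the artifact produced upstream.
def consume_text(text_artifact_path: InputPath(str)):
    with open(text_artifact_path) as f:
        print(f.read())

produce_op = components.create_component_from_func(produce_text, base_image='python:3.8')
consume_op = components.create_component_from_func(consume_text, base_image='python:3.8')

@dsl.pipeline(name='artifact-passing-example')
def artifact_passing_pipeline():
    producer_task = produce_op('Hello world!')
    # Wiring an output artifact into an input looks exactly like parameter passing.
    consume_op(producer_task.outputs['text_artifact'])

if __name__ == '__main__':
    compiler.Compiler().compile(artifact_passing_pipeline, 'artifact_passing.yaml')
```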
Any updates on this one?
This one mentions a "future PR"
Does this PR exist yet?
All the PRs related to artifact passing have been merged. The preferred ways to utilize artifact passing are either:
Let me help you with your use case. Using
@kevinpauli I've updated the code in the comment: #336 (comment)
@Ark-kun thank you so much for your response! My use case is to be able to use ContainerOps directly, to do artifact passing in KFP just like Argo's artifact-passing.yaml example linked above. I wanted pretty much exactly what you had shown in the "future PR" code snippet I referenced above. But still, when I try it (using kfp 0.1.32), it fails to compile due to
You say that it is "possible" using ContainerOp to directly consume artifacts that were produced in this same pipeline in an earlier step, but despite searching for a couple of days I haven't been able to locate a working code example. In #791, when someone asks for example code using the DSL, it refers back to this issue #336. Plus, #791 seems to be focused on "raw" artifacts... in my case, I want to wire the output artifact of one component into the input of another. All with ContainerOp. So any help is much appreciated!
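For what it's worth, the ContainerOp route described above can be sketched roughly as follows, assuming the kfp v1 SDK's `dsl.InputArgumentPath` helper and `file_outputs` argument (treat the exact names as an assumption to verify against your SDK version):

```python
from kfp import compiler, dsl

@dsl.pipeline(name='containerop-artifact-passing')
def containerop_artifact_passing():
    # Producer: file_outputs marks the file at /tmp/output.txt as an output artifact.
    producer = dsl.ContainerOp(
        name='producer',
        image='alpine',
        command=['sh', '-c', 'echo "Hello world!" > /tmp/output.txt'],
        file_outputs={'text': '/tmp/output.txt'},
    )

    # Consumer: InputArgumentPath asks the system to place the upstream artifact in a
    # local file and substitutes that file's path into the arguments.
    consumer = dsl.ContainerOp(
        name='consumer',
        image='alpine',
        command=['cat'],
        arguments=[dsl.InputArgumentPath(producer.outputs['text'])],
    )

if __name__ == '__main__':
    compiler.Compiler().compile(containerop_artifact_passing, 'containerop_artifact_passing.yaml')
```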
@Ark-kun nevermind, I just re-read where you said we should use
Thanks!
Hmm. Sorry for some confusion in this area. I understand that this part in ContainerOp is overly confusing. Adding artifact passing took a very long time and some intermediate parameters were added that are not needed in the final result.
Take a look at my Creating components from command-line programs sample. Component specifications are very similar to the
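As a rough illustration of that component-specification style (the component name and command here are made up, not taken from the sample), a plain command-line program can be wrapped so that the system handles the artifact files through `{inputPath: ...}` and `{outputPath: ...}` placeholders:

```python
from kfp import components

# The placeholders are replaced with local file paths at runtime; the system downloads
# the input artifact and uploads whatever the program writes to the output path.
uppercase_component_text = '''
name: Uppercase text
inputs:
- {name: Text}
outputs:
- {name: Uppercased text}
implementation:
  container:
    image: alpine
    command:
    - sh
    - -ec
    - |
      mkdir -p "$(dirname "$1")"
      tr '[:lower:]' '[:upper:]' < "$0" > "$1"
    - {inputPath: Text}
    - {outputPath: Uppercased text}
'''

uppercase_op = components.load_component_from_text(uppercase_component_text)
```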
Hi @Ark-kun, I want to do something similar to the example given here, but instead of passing small text, local files, or GCS paths, I want to pass S3 paths. My use case is as follows: I need to download some files from a folder in an S3 bucket, do some processing on them, then upload some results to another S3 folder. Is this possible with this approach? I have read the data passing tutorial and the "Creating components from command-line programs" tutorial, but I am still quite confused about how to achieve this. In the latter tutorial, it is not clear to me, for example, how the system decides which "Repo dir", sub, or GCS path to return. Apologies if this is not the best place for this question.
@ksonbol - did you find an answer? I have the exact same use case, except I want to use the built-in "minio://" endpoint. I've created a folder in that repository and uploaded the content there; I would like the pipeline to automatically download the files.
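One way to approach the S3/MinIO use case above is to pass the S3 locations as plain string parameters and do the transfer inside the component, rather than relying on the artifact-storage layer. A sketch, assuming boto3 is acceptable in the component image; the bucket layout and endpoint are placeholders:

```python
from kfp import components

# A lightweight component that treats S3 locations as plain string parameters and
# does the download/processing/upload itself.
def process_s3_folder(input_s3_path: str, output_s3_path: str, endpoint_url: str = None):
    import os
    import boto3

    s3 = boto3.client('s3', endpoint_url=endpoint_url)  # endpoint_url e.g. a MinIO service

    def split(s3_uri):
        # 's3://bucket/prefix' -> ('bucket', 'prefix')
        bucket, _, prefix = s3_uri.split('://', 1)[1].partition('/')
        return bucket, prefix

    in_bucket, in_prefix = split(input_s3_path)
    out_bucket, out_prefix = split(output_s3_path)

    os.makedirs('/tmp/inputs', exist_ok=True)
    for obj in s3.list_objects_v2(Bucket=in_bucket, Prefix=in_prefix).get('Contents', []):
        local_file = os.path.join('/tmp/inputs', os.path.basename(obj['Key']))
        s3.download_file(in_bucket, obj['Key'], local_file)
        # ... do the actual processing on local_file here, then upload the result ...
        s3.upload_file(local_file, out_bucket, out_prefix + '/' + os.path.basename(local_file))

process_s3_op = components.create_component_from_func(
    process_s3_folder,
    base_image='python:3.8',
    packages_to_install=['boto3'],
)
```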
Hi @Ark-kun! I've been trying to recreate the pipeline above using reusable components, but it's not working. Could you please show me how? This is what I'm doing:

```python
import kfp
from kfp import components

producer_text = '''
name: producer
inputs:
- {name: text}
outputs:
- {name: text-artifact}
implementation:
  container:
    image: alpine
    command:
    - sh
    - -c
    - echo
    - {inputValue: text}
    - >
    - /tmp/output.txt
    fileOutputs:
      text-artifact: /tmp/output.txt
'''
producer_op = components.load_component_from_text(producer_text)

consumer_text = '''
name: consumer
inputs:
- {name: Text}
implementation:
  container:
    image: alpine
    command:
    - cat
    - {inputPath: Text}
'''
consumer_op = components.load_component_from_text(consumer_text)

@kfp.dsl.pipeline(
    name='artifact-passing'
)
def artifact_passing():
    producer_task = producer_op('Hello world!')
    consumer_task = consumer_op(producer_task.outputs['text-artifact'])
```
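A likely fix for the producer spec above (offered as a sketch, not a confirmed answer from the thread): with `sh -c`, only the argument immediately after `-c` is the shell script, so the later placeholders arrive as positional arguments instead of being echoed and redirected; also, a bare `>` in YAML starts a block scalar rather than a literal `>` string. Writing the redirect inside the script and using the `{outputPath: ...}` placeholder instead of `fileOutputs` avoids both problems:

```python
producer_text = '''
name: producer
inputs:
- {name: text}
outputs:
- {name: text-artifact}
implementation:
  container:
    image: alpine
    command:
    - sh
    - -ec
    - |
      # $0 is the input value, $1 is the output file path chosen by the system.
      mkdir -p "$(dirname "$1")"
      echo "$0" > "$1"
    - {inputValue: text}
    - {outputPath: text-artifact}
'''
```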
Argo has built-in support for artifacts; however, that does not seem to be currently supported in Pipelines. This is a critical feature, the lack of which adds a lot of friction. Currently the only way to pass large objects (images, etc.) is to actually copy them and read them back manually. Furthermore, any support for caching artifacts based on the run / data used requires manual development. Version control and caching for artifacts is a separate feature ask, though it ties into the overall experience, hence adding it here as well.
This is not a blocker, as there are two workarounds (as follows); however, both add friction.