[feature] Support PV/PVCs as an alternative to Object Storage for KFP #10510

Open
HumairAK opened this issue Feb 21, 2024 · 15 comments

@HumairAK
Collaborator

HumairAK commented Feb 21, 2024

Feature Area

What feature would you like to see?

Currently the launcher has a hard dependency on object storage for artifact passing. Users should have the option to configure a PV instead. This would drastically broaden the range of backing storage KFP can support by taking advantage of the Kubernetes StorageClass abstraction.

Note: Object storage is also used by the API Server for storing the Pipeline IR, but this dependency should be removed via: #10509

What is the use case or pain point?

Many end users operating in airgapped (or other) environments do not have an enterprise-ready object store solution and find it an unnecessary requirement when they already have other storage hooked into their Kubernetes platform that they would rather use instead.

Is there a workaround currently?

No


Love this idea? Give it a 👍.

@Tomcli
Member

Tomcli commented Feb 21, 2024

One thing to consider in this case is the access control list. With a pure PVC, you have no API to control who can access the artifacts. Since pipeline artifacts can be cached, not all of the artifacts will be in the same pipeline subpath, which can create extra complexity.

I would recommend supporting another client that exposes some storage API, so that it can pass the artifact files into the PVC without mounting it to the task pod.

@gregsheremeta
Contributor

I would recommend supporting another client that exposes some storage API, so that it can pass the artifact files into the PVC without mounting it to the task pod.

Hi @Tomcli, can you rephrase or elaborate? I didn't understand this part.

@Tomcli
Member

Tomcli commented Feb 22, 2024

I would recommend supporting another client that exposes some storage API, so that it can pass the artifact files into the PVC without mounting it to the task pod.

Hi @Tomcli, can you rephrase or elaborate? I didn't understand this part.

By default, file storage and PVCs have no API. That is why projects like MinIO provide the S3 API: it takes the artifact file over HTTP and stores it onto the mounted PVC, along with ACLs to manage who owns the file. Without such an API, if we volume-mount every task pod to the PVC, we will be creating an active volume connection for each task.
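
For illustration, here is a minimal sketch of that pattern, assuming a MinIO instance reachable in-cluster at minio-service:9000 whose data directory is backed by the PVC, and the minio Python client; the endpoint, bucket, credentials, and paths are placeholders. Only the MinIO pod mounts the volume, while tasks talk to it over HTTP:

# Sketch only: assumes a MinIO deployment backed by a PVC and reachable at
# minio-service:9000; the endpoint, bucket, and credentials are placeholders.
from minio import Minio

client = Minio(
    "minio-service:9000",   # only the MinIO pod mounts the PVC
    access_key="minio",
    secret_key="minio123",
    secure=False,
)

bucket = "mlpipeline"
if not client.bucket_exists(bucket):
    client.make_bucket(bucket)

# A producer task uploads its output artifact over HTTP; the bytes land on the
# PVC behind MinIO, and MinIO's policies/ACLs decide who may read them.
client.fput_object(bucket, "runs/run-1/step-a/model.pkl", "/tmp/model.pkl")

# A downstream task fetches the artifact the same way, again without any
# volume mount of its own.
client.fget_object(bucket, "runs/run-1/step-a/model.pkl", "/tmp/model.pkl")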

@Tomcli
Member

Tomcli commented Feb 22, 2024

We have discussed in the past, in the Tekton community, how this should be implemented:
tektoncd/pipeline#4012

@juliusvonkohout
Member

Actually, KFP v1 still natively supports a PVC as the data-passing method; it was just removed in v2.

@chensun
Member

chensun commented Feb 28, 2024

Actually, KFP v1 still natively supports a PVC as the data-passing method; it was just removed in v2.

Using PVC for data passing is supported in v2: https://www.kubeflow.org/docs/components/pipelines/v2/platform-specific-features/
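
For reference, a minimal sketch of what that per-pipeline PVC data passing looks like with the kfp-kubernetes extension; the storage class name, size, and paths below are placeholders:

from kfp import dsl, kubernetes


@dsl.component
def producer():
    with open('/data/out.txt', 'w') as f:
        f.write('hello')


@dsl.component
def consumer():
    with open('/data/out.txt') as f:
        print(f.read())


@dsl.pipeline
def pvc_pipeline():
    # Create a run-scoped PVC; 'standard' is a placeholder storage class.
    pvc = kubernetes.CreatePVC(
        pvc_name_suffix='-shared',
        access_modes=['ReadWriteMany'],
        size='1Gi',
        storage_class_name='standard',
    )
    t1 = producer()
    kubernetes.mount_pvc(t1, pvc_name=pvc.outputs['name'], mount_path='/data')
    t2 = consumer().after(t1)
    kubernetes.mount_pvc(t2, pvc_name=pvc.outputs['name'], mount_path='/data')
    kubernetes.DeletePVC(pvc_name=pvc.outputs['name']).after(t2)

Note that every task touching the data has to opt in with an explicit mount_pvc call.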

@juliusvonkohout
Member

juliusvonkohout commented Feb 28, 2024

Actually, KFP v1 still natively supports a PVC as the data-passing method; it was just removed in v2.

Using PVC for data passing is supported in v2: https://www.kubeflow.org/docs/components/pipelines/v2/platform-specific-features/

This is manual data passing, not comparable to the old automatic data_passing_method:
https://kubeflow-pipelines.readthedocs.io/en/1.8.22/source/kfp.dsl.html#kfp.dsl.PipelineConf.data_passing_method
It is not even documented in v1, but it is in the code and works:

    @property
    def data_passing_method(self):
        return self._data_passing_method

    @data_passing_method.setter
    def data_passing_method(self, value):
        """Sets the object representing the method used for intermediate data
        passing.

        Example:
          ::

            from kfp.dsl import PipelineConf, data_passing_methods
            from kubernetes.client.models import V1Volume, V1PersistentVolumeClaimVolumeSource

            pipeline_conf = PipelineConf()
            pipeline_conf.data_passing_method = data_passing_methods.KubernetesVolume(
                volume=V1Volume(
                    name='data',
                    persistent_volume_claim=V1PersistentVolumeClaimVolumeSource('data-volume'),
                ),
                path_prefix='artifact_data/',
            )
        """
        self._data_passing_method = value

@HumairAK
Collaborator Author

As @juliusvonkohout mentioned above, while users could do this for their individual pipelines, this issue requests a feature that does it automatically by default at the KFP level, so users don't have to worry about this abstraction.


This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions bot added the lifecycle/stale label on Apr 30, 2024
@juliusvonkohout
Member

/frozen
/lifecycle-frozen

stale bot removed the lifecycle/stale label on Apr 30, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions bot added the lifecycle/stale label on Jun 30, 2024
@juliusvonkohout
Member

juliusvonkohout commented Jun 30, 2024

/lifecycle frozen

this is still relevant and a regression from V1

google-oss-prow bot added the lifecycle/frozen label and removed the lifecycle/stale label on Jun 30, 2024
@HumairAK
Collaborator Author

HumairAK commented Jul 2, 2024

this is still relevant and a regression from V1

I might be missing something, how is this a regression from V1?

@juliusvonkohout
Member

this is still relevant and a regression from V1

I might be missing something, how is this a regression from V1?

Because it was supported in V1 and removed in V2 as mentioned above.

@HumairAK
Collaborator Author

HumairAK commented Jul 2, 2024

Once again, this issue is about blanket support for a KFP install that uses a PVC instead of an object store. What you are describing is not entirely unrelated, but it seems like it will require SDK changes as well. The use cases, while related, are slightly different:

  1. As a user I want to specify within my pipeline code a PVC to utilize for data passing
  2. As a user I want to deploy KFP so that a PVC (specified at deploy time) is utilized for data passing instead of an object store, by default, for all pipeline runs

I suppose the assignee is welcome to address both of these; however, AFAIK only (1) is a regression, while (2) is a new feature.
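
To make the difference concrete, here is a purely hypothetical sketch of what (2) could mean from the launcher's perspective; ARTIFACT_STORE_TYPE, ARTIFACT_STORE_PVC_MOUNT, and the helper below do not exist in KFP today and are only illustrative:

# Hypothetical illustration only: neither ARTIFACT_STORE_TYPE nor
# ARTIFACT_STORE_PVC_MOUNT is an existing KFP setting, and the real launcher
# is written in Go; Python is used here purely for brevity. The idea is that
# the launcher would pick its artifact backend from deploy-time configuration,
# with no changes to pipeline code.
import os
import shutil


def upload_to_object_store(local_path: str, artifact_key: str) -> None:
    """Placeholder for today's object-store (S3/MinIO/GCS) upload path."""
    raise NotImplementedError


def upload_artifact(local_path: str, artifact_key: str) -> None:
    store_type = os.environ.get("ARTIFACT_STORE_TYPE", "object_store")

    if store_type == "pvc":
        # Use case (2): copy onto a PV mounted into the pod at deploy time.
        root = os.environ["ARTIFACT_STORE_PVC_MOUNT"]  # e.g. /kfp-artifacts
        dest = os.path.join(root, artifact_key)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copy(local_path, dest)
    else:
        # Today's behavior: hand the file to the object-store client.
        upload_to_object_store(local_path, artifact_key)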
