[backend] Unable to create directory in Minio when using Artifacts: Permission denied #10397

Closed
Tracked by #2763
jmaunon opened this issue Jan 15, 2024 · 24 comments

Comments

@jmaunon

jmaunon commented Jan 15, 2024

Hi Developers

I have tried to create a simple pipeline that transfers data using the "built-in" artifacts approach, without success.
It is difficult to say what is happening, but I have found similar issues in other threads.

If you know of a manual patch, please let us know. I see artifacts as a core approach.

cc: @juliusvonkohout , @chensun

I am aware that there are some related issues, but I do not see a final solution or an alternative patch. See: #6530, kubeflow/manifests#2573, #7629

Environment

Steps to reproduce

I get a permission denied error when using Artifacts.

Snippet of code:

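import kfp
from kfp import dsl
from kfp.dsl import Dataset, Output

@dsl.component(base_image="kubeflownotebookswg/jupyter-pytorch-full:v1.8.0-rc.0")
def download_data(test_path: Output[Dataset]):
    # Imports live inside the function because the body runs in the component container.
    import torch
    from torchvision.datasets import MNIST
    from torchvision.transforms import ToTensor

    mnist_test = MNIST(".", download=True, train=False, transform=ToTensor())

    # Serialize the test split to the path assigned to the output artifact.
    with open(test_path.path, "wb") as f:
        torch.save(mnist_test, f)

@dsl.pipeline(
    name='mnist',
    description='Detect digits',
)
def run():
    step_1 = download_data()

# Assumption: `client` was not defined in the original snippet; a default kfp.Client() is used here.
client = kfp.Client()
client.create_run_from_pipeline_func(run)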

Associated logs:

failed to execute component: unable to create directory "/minio/mlpipeline/v2/artifacts/mnist/43f760f9-b638-4129-87fe-602e24076beb/download-data" for output artifact "test_path": mkdir /minio: permission denied

Expected result

The pipeline runs without issues.

Materials and Reference


Impacted by this bug? Give it a 👍.

@juliusvonkohout
Member

Please use the final 1.8 image rather than jupyter-pytorch-full:v1.8.0-rc.0, and join the biweekly KFP meeting to discuss this.

@juliusvonkohout
Member

You should also try to update from KFP 2.0.3 to 2.0.5 first.

@jmaunon
Author

jmaunon commented Jan 16, 2024

Thanks for the reply, @juliusvonkohout. Here are my findings:

  • I have already tested with both jupyter-pytorch-full:v1.8.0-rc.0 and jupyter-pytorch-full:v1.8.0, and the problem persists.
  • I have not updated to 2.0.5, so I cannot confirm whether the error is fixed there (I do not think so).

For other readers: I did not understand the explanation in #6530, but:

  • If I use base_image=python:3.10, the pipeline executes without problems, because the user of the Docker image seems to be root. See the associated Dockerfiles.
  • If I use base_image=kubeflownotebookswg/jupyter-pytorch-full:v1.8.0, the pipeline raises the permission denied error, and I see in the associated Dockerfile that the user of the Docker image is not root. See the associated Dockerfile.
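
The root versus non-root difference can be checked from inside a container. The following is a minimal sketch, not part of KFP: `nearest_existing_ancestor` and `can_create_dir` are hypothetical helper names. It approximates the question behind the `mkdir /minio` failure: can the current user write to the closest directory that already exists?

```python
import os


def nearest_existing_ancestor(path: str) -> str:
    """Walk up from `path` until a directory that exists is found."""
    current = os.path.abspath(path)
    while not os.path.isdir(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return current


def can_create_dir(path: str) -> bool:
    """Return True if the current user could create `path`, as `mkdir -p` would."""
    ancestor = nearest_existing_ancestor(path)
    # Creating a new entry needs write and search (execute) permission on the parent.
    return os.access(ancestor, os.W_OK | os.X_OK)


if __name__ == "__main__":
    # Path prefix taken from the error message in this issue.
    target = "/minio/mlpipeline/v2/artifacts"
    print(f"uid={os.getuid()} can create {target}: {can_create_dir(target)}")
```

In an image whose default user is not root, the nearest existing ancestor of `/minio/...` is normally `/`, which an unprivileged user usually cannot write to. That is consistent with the behaviour reported for the two base images.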

@juliusvonkohout
Member

juliusvonkohout commented Jan 17, 2024

@rimolive this might be something to track for 1.9

@zijianjoy
Collaborator

/assign @juliusvonkohout

@rimolive
Member

rimolive commented Mar 6, 2024

We have an open PR for that: #10538.

@rimolive
Member

rimolive commented Mar 6, 2024

/assign @gregsheremeta


@rimolive: GitHub didn't allow me to assign the following users: gregsheremeta.

Note that only kubeflow members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @gregsheremeta

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


github-actions bot commented May 6, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions bot added the lifecycle/stale label May 6, 2024

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

@majuss

majuss commented May 28, 2024

/reopen
This issue still persists


@majuss: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen
This issue still persists

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@rimolive
Member

/reopen


@rimolive: Reopened this issue.

In response to this:

/reopen


google-oss-prow bot reopened this May 28, 2024
github-actions bot removed the lifecycle/stale label May 29, 2024

@thesuperzapper
Member

This issue occurs because Kubeflow Pipelines requires component containers to run as root, and the image you have chosen, kubeflownotebookswg/jupyter-pytorch-full:v1.8.0-rc.0, runs as non-root.

There is a PR that fixes this issue by mounting emptyDir volumes at /minio and the other paths, but it still needs to be reviewed:

@chensun @HumairAK @Tomcli we definitely need to prioritize fixing this issue, because it's pretty bad to have a hard requirement on root container images.

@thesuperzapper
Member

I also want to say that the lack of securityContext support is related to this, because if we had it, it would provide a possible workaround:

That is, if users could set the Pod securityContext, they could set runAsUser: 0 to override the UID of images which don't run as root by default.
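
To make the idea concrete, here is a minimal sketch of such an override on a Pod spec, written as a plain Python dictionary transformation. It does not use any KFP API (the comment above is precisely about that support being missing); `with_run_as_root` is a hypothetical helper, and only the `securityContext.runAsUser` field itself is standard Kubernetes.

```python
import copy


def with_run_as_root(pod_spec: dict) -> dict:
    """Return a copy of `pod_spec` whose Pod-level securityContext sets runAsUser to 0."""
    patched = copy.deepcopy(pod_spec)
    # Keep any existing Pod-level settings and only override the UID.
    patched.setdefault("securityContext", {})["runAsUser"] = 0
    return patched


if __name__ == "__main__":
    spec = {
        "containers": [
            {"name": "main", "image": "kubeflownotebookswg/jupyter-pytorch-full:v1.8.0"}
        ]
    }
    print(with_run_as_root(spec))
```

Note that a container-level `securityContext.runAsUser` takes precedence over the Pod-level value, so this sketch only changes the UID for containers that do not set their own.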

@droctothorpe
Contributor

We're running into this now. All our end-user containers run as non-root to improve security. This is a pretty universal expectation at any security-sensitive company.

@droctothorpe
Contributor

For anyone else running into this, we found a short-term workaround using kyverno that's not contingent on this PR being merged. Huge shout out to @moorthy156 for implementing it lightning fast. Just update the mountPath to minio or gcs or whatever else you need it to be.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-volume-mount-pipelineroot
spec:
  background: true
  failurePolicy: Ignore
  rules:
  - match:
      any:
      - resources:
          kinds:
          - Pod
          namespaceSelector:
            matchLabels:
              app.kubernetes.io/part-of: "kubeflow-profile"
          selector:
            matchExpressions:
            - key: pipelines.kubeflow.org/v2_component
              operator: In
              values:
              - "true"
    mutate:
      patchStrategicMerge:
        spec:
          volumes:
          - name: pipelineroot
          containers:
          - (name): main | wait
            volumeMounts:
            - mountPath: /s3
              name: pipelineroot
            env:
            - name: AWS_REGION
              value: us-east-1
    name: add-volume-mount-pipelineroot
    preconditions:
      all:
      - key: '{{ request.operation }}'
        operator: Equals
        value: CREATE
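
For readers less familiar with Kyverno, the effect of the policy above on a matching Pod can be sketched in plain Python. This only illustrates the mutation and is not how Kyverno is implemented; `add_pipeline_root_mount` is a hypothetical helper. The sketch assumes the name-only `pipelineroot` volume is intended to be an `emptyDir` and spells that out, and it omits the `AWS_REGION` environment variable for brevity.

```python
import copy


def add_pipeline_root_mount(
    pod_spec: dict,
    mount_path: str = "/s3",
    volume_name: str = "pipelineroot",
) -> dict:
    """Add a scratch volume and mount it into the `main` and `wait` containers."""
    patched = copy.deepcopy(pod_spec)
    volumes = patched.setdefault("volumes", [])
    if not any(v.get("name") == volume_name for v in volumes):
        # Assumption: the policy's name-only volume is meant to be an emptyDir.
        volumes.append({"name": volume_name, "emptyDir": {}})
    for container in patched.get("containers", []):
        if container.get("name") in ("main", "wait"):
            mounts = container.setdefault("volumeMounts", [])
            if not any(m.get("mountPath") == mount_path for m in mounts):
                mounts.append({"name": volume_name, "mountPath": mount_path})
    return patched


if __name__ == "__main__":
    pod = {"containers": [{"name": "main"}, {"name": "wait"}]}
    # Use /minio here to match the path from the error message in this issue.
    print(add_pipeline_root_mount(pod, mount_path="/minio"))
```

Because the mounted directory is writable by the container user, the launcher no longer needs permission to create the top-level directory under `/`.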

@thesuperzapper
Member

Just wanted to update everyone that there is a new PR being worked on that will fix this issue:

@thesuperzapper
Member

@chensun @james-jwu @zijianjoy can we please cherry-pick #10857 into the 2.2 branch, and cut a 2.2.1 release with this fix?

This is a very important issue, as it prevents non-root containers from working in pipeline steps, which stops many people adopting Kubeflow Pipelines.

@juliusvonkohout
Member

> @chensun @james-jwu @zijianjoy can we please cherry-pick #10857 into the 2.2 branch, and cut a 2.2.1 release with this fix?
>
> This is a very important issue, as it prevents non-root containers from working in pipeline steps, which stops many people adopting Kubeflow Pipelines.

We can also do a follow-up Kubeflow 1.9.1 release, but one way or the other we need a new release of KFP.


This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions bot added the lifecycle/stale label Sep 21, 2024

@thesuperzapper
Member

This should have been resolved by #10857 in 2.3.0

/close


@thesuperzapper: Closing this issue.

In response to this:

This should have been resolved by #10857 in 2.3.0

/close


stale bot removed the lifecycle/stale label Sep 21, 2024
github-project-automation bot moved this from Needs triage to Closed in KFP Runtime Triage Sep 21, 2024