[Issue]: need a way to overwrite node_selector labels #61

Open
rptaylor opened this issue Apr 19, 2024 · 1 comment

Comments

@rptaylor
Contributor

Problem Description

There are default node_selector labels: https://github.com/ROCm/k8s-device-plugin/blob/master/helm/amd-gpu/values.yaml#L35

In my environment (maybe my NFD is configured differently, not sure) I need feature.node.kubernetes.io/pci-1002.present instead of feature.node.kubernetes.io/pci-0300_1002.present.

If I set this in my values file, Helm's normal behaviour is to merge my keys with the chart defaults, so both labels are applied and the daemonset doesn't run anywhere.

Line 35 has been there for a while, so this is not related to any recent change. Unless feature.node.kubernetes.io/pci-0300_1002.present is actually incorrect for default NFD installations (?), it should remain unchanged. What is needed is some logic that replaces node_selector with the user values rather than merging them.
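
To make the failure mode concrete, here is a minimal Python sketch of the merge described above. It is a simplified model and not Helm's actual implementation: `merge_values` and `node_matches` are hypothetical helpers, the default value `"true"` is assumed from the pod spec quoted in the comment below, and the node is assumed to carry only the pci-1002 label.

```python
# Simplified model of how chart defaults and user values are combined.
# Illustration only; this is not Helm's real code.

def merge_values(defaults, overrides):
    """Recursively merge overrides into defaults; maps are merged key by key."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = value
    return merged


def node_matches(node_labels, selector):
    """A node selector matches only if every selector label is on the node."""
    return all(node_labels.get(k) == v for k, v in selector.items())


# Assumed chart default (value taken from the pod spec quoted in this issue).
chart_defaults = {
    "node_selector": {"feature.node.kubernetes.io/pci-0300_1002.present": "true"}
}
# What the user puts in their own values file.
user_values = {
    "node_selector": {"feature.node.kubernetes.io/pci-1002.present": "true"}
}

merged = merge_values(chart_defaults, user_values)
selector = merged["node_selector"]

# Hypothetical node that only has the pci-1002 label.
node_labels = {"feature.node.kubernetes.io/pci-1002.present": "true"}

print(sorted(selector))                     # both labels end up in the selector
print(node_matches(node_labels, selector))  # the node no longer qualifies
```

Because a node selector requires every listed label to be present, adding the second label narrows the set of eligible nodes instead of widening it.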

Operating System

N/A

CPU

N/A

GPU

AMD Instinct MI210

ROCm Version

ROCm 6.0.0

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

@danieljkemp

danieljkemp commented Dec 23, 2024

I can confirm that this is still an issue, as I have just encountered it.

I was wondering why the pods weren't running, then saw

node_selector:
  feature.node.kubernetes.io/pci-1002.present: "true"
  feature.node.kubernetes.io/pci-0300_1002.present: "true"

in the pod specs.

If you set

node_selector:
  feature.node.kubernetes.io/pci-1002.present: "true"
  feature.node.kubernetes.io/pci-0300_1002.present: null

you'll get the desired result.
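
This works because Helm treats a null override as a request to delete the default key. The following Python sketch models that rule under the same simplifying assumptions: `coalesce` is a hypothetical helper, not a Helm function, and YAML `null` is represented as Python `None`.

```python
# Simplified model of merging user values where null removes a default key.
# Illustration only; this is not Helm's real code.

def coalesce(defaults, overrides):
    """Merge overrides into defaults; a None (YAML null) override drops the key."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if value is None:
            merged.pop(key, None)  # null means "remove the default"
        elif isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = coalesce(merged[key], value)
        else:
            merged[key] = value
    return merged


# Assumed chart default (value taken from the pod spec quoted above).
chart_defaults = {
    "node_selector": {"feature.node.kubernetes.io/pci-0300_1002.present": "true"}
}
# The workaround: set the wanted label and null out the unwanted default.
user_values = {
    "node_selector": {
        "feature.node.kubernetes.io/pci-1002.present": "true",
        "feature.node.kubernetes.io/pci-0300_1002.present": None,
    }
}

result = coalesce(chart_defaults, user_values)["node_selector"]
print(result)
```

Only the pci-1002 label survives, so the selector matches nodes that carry that label alone.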
