cannot init nim cache with ReconcileFailed reason #271

Closed
RandyChen1985 opened this issue Dec 25, 2024 · 4 comments

@RandyChen1985

1. Quick Debug Information

  • OS/Version(e.g. RHEL8.6, Ubuntu22.04):
  • NIM Operator Version: 1.0.0
root@bcm10-headnode:~/nim-operator-workspace# kubectl describe nimcaches.apps.nvidia.com -n nim-service nimcache-llama3-8b-v1-3-3
Name:         nimcache-llama3-8b-v1-3-3
Namespace:    nim-service
Labels:       <none>
Annotations:  <none>
API Version:  apps.nvidia.com/v1alpha1
Kind:         NIMCache
Metadata:
  Creation Timestamp:  2024-12-25T03:05:12Z
  Finalizers:
    finalizer.nimcache.apps.nvidia.com
  Generation:        2
  Resource Version:  2248066
  UID:               3830e678-3c8a-48db-bb7e-26a9b0d5f785
Spec:
  Resources:
    Cpu:     0
    Memory:  0
  Source:
    Ngc:
      Auth Secret:  ngc-api-secret
      Model:
      Model Puller:  nvcr.io/nim/meta/llama-3.1-8b-instruct:1.3.3
      Pull Secret:   ngc-secret
  Storage:
    Pvc:
      Create:              true
      Size:                200Gi
      Storage Class:       local-path
      Volume Access Mode:  ReadWriteMany
Status:
  Conditions:
    Last Transition Time:  2024-12-25T03:05:12Z
    Message:               The PVC has been created for caching NIM model
    Reason:                PVCCreated
    Status:                True
    Type:                  NIM_CACHE_PVC_CREATED
    Last Transition Time:  2024-12-25T03:27:05Z
    Message:               yaml: unmarshal errors:
  line 1: cannot unmarshal !!str `2.0` into nimparser.NIMProfile
  line 2: cannot unmarshal !!str `auto` into nimparser.NIMProfile
  line 3: cannot unmarshal !!str `meta/ll...` into nimparser.NIMProfile
  line 4: cannot unmarshal !!str `1.3.0` into nimparser.NIMProfile
  line 6: cannot unmarshal !!seq into nimparser.NIMProfile
    Reason:  ReconcileFailed
    Status:  True
    Type:    NIM_CACHE_RECONCILE_FAILED
  State:     NotReady
Events:
  Type     Reason           Age                   From                 Message
  ----     ------           ----                  ----                 -------
  Warning  ReconcileFailed  27m                   nimcache-controller  NIMCache nimcache-llama3-8b-v1-3-3 reconcile failed, msg: Pod "nimcache-llama3-8b-v1-3-3-pod" not found
  Warning  ReconcileFailed  5m11s (x20 over 27m)  nimcache-controller  NIMCache nimcache-llama3-8b-v1-3-3 reconcile failed, msg: yaml: unmarshal errors:
  line 1: cannot unmarshal !!str `2.0` into nimparser.NIMProfile
  line 2: cannot unmarshal !!str `auto` into nimparser.NIMProfile
  line 3: cannot unmarshal !!str `meta/ll...` into nimparser.NIMProfile
  line 4: cannot unmarshal !!str `1.3.0` into nimparser.NIMProfile
  line 6: cannot unmarshal !!seq into nimparser.NIMProfile
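
The shape of this error (four top-level scalars and one sequence, each rejected as a `nimparser.NIMProfile`) suggests that the parser treated every top-level entry of the model manifest as a profile. The sketch below mimics that behavior with plain Python dictionaries. It is not the operator's code (which is written in Go), the two manifest layouts are assumptions, and the key names `field_1` through `field_4` and `profiles_field` are hypothetical placeholders, since the output above shows only the values.

```python
def decode_as_profile_map(manifest):
    """Mimic a decoder that expects every top-level value to be a profile object."""
    errors = []
    for key, value in manifest.items():
        if isinstance(value, dict):
            continue  # looks like a profile object, so it is accepted
        kind = "sequence" if isinstance(value, list) else type(value).__name__
        errors.append(f"{key}: cannot decode {kind} into profile")
    return errors


# Assumed older layout: each top-level key is a profile ID mapping to a profile.
old_style = {"profile-id-1": {"model": "example"}}

# Assumed newer layout: metadata scalars followed by a list of profiles.
# Values are copied from the error above; key names are hypothetical.
new_style = {
    "field_1": "2.0",
    "field_2": "auto",
    "field_3": "meta/ll...",
    "field_4": "1.3.0",
    "profiles_field": [{"id": "profile-id-1"}],
}

print(decode_as_profile_map(old_style))
for err in decode_as_profile_map(new_style):
    print(err)
```

Under these assumptions the older layout decodes cleanly, while the newer layout produces one error per top-level entry, which matches the pattern of the five `cannot unmarshal` lines.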

@RandyChen1985
Author

I changed the image and tried again.
The image 'nvcr.io/nim/meta/llama3-8b-instruct:1.0.3' works.
The image 'nvcr.io/nim/meta/llama-3.1-8b-instruct:1.3.3' does not work with NIMCache, but if I use a PVC to cache the model, this image also works.

@mkhaas
Collaborator

mkhaas commented Dec 25, 2024

Thanks @RandyChen1985 for the detailed info. We have fixed this in a patch release that will be available in early January.

@shivamerla
Collaborator

@RandyChen1985, in the meantime please use the image ghcr.io/nvidia/k8s-nim-operator:main to try out the fix.

@shivamerla
Collaborator

@RandyChen1985 The patch release v1.0.1 is out now. Please use the latest release and verify.
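
One way to verify after upgrading is to read the NIMCache object as JSON and check that no failure condition is still active. The following is a minimal sketch, assuming the JSON keys follow the usual Kubernetes naming (`status.conditions[].type`, `status`, `reason`, `message`) that `kubectl describe` renders as the title-cased fields shown earlier, and assuming failure conditions have a type ending in `_FAILED`. The helper name `failed_conditions` is hypothetical.

```python
import json


def failed_conditions(nimcache_json):
    """Return (type, reason, message) for each failure condition that is active."""
    obj = json.loads(nimcache_json)
    conditions = obj.get("status", {}).get("conditions", [])
    return [
        (c.get("type"), c.get("reason"), c.get("message", ""))
        for c in conditions
        # Assumption: failure conditions carry a type ending in "_FAILED"
        # and are active when their status is the string "True".
        if c.get("status") == "True" and str(c.get("type", "")).endswith("_FAILED")
    ]


# Sample built from the describe output in this issue (message shortened to its
# first line); the JSON key names are assumed, not taken from the operator docs.
sample = json.dumps({
    "status": {
        "state": "NotReady",
        "conditions": [
            {"type": "NIM_CACHE_PVC_CREATED", "status": "True",
             "reason": "PVCCreated",
             "message": "The PVC has been created for caching NIM model"},
            {"type": "NIM_CACHE_RECONCILE_FAILED", "status": "True",
             "reason": "ReconcileFailed",
             "message": "yaml: unmarshal errors:"},
        ],
    }
})

for cond_type, reason, message in failed_conditions(sample):
    print(f"{cond_type} ({reason}): {message}")
```

In practice the input could come from `kubectl get nimcaches.apps.nvidia.com -n nim-service nimcache-llama3-8b-v1-3-3 -o json`. An empty result would indicate that no failure condition is currently reported.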
