StudyJob won't start and StudyJob Controller keeps crashing (invalid memory address) #358
Comments
I suspect it is a problem with template parsing. See the ConfigMap example in https://github.com/kubeflow/katib/blob/master/examples/grid-example.yaml and its usage in https://github.com/kubeflow/katib/blob/master/examples/grid-example.yaml To debug, can you also try pasting the template code directly into the RawTemplate field (instead of using TemplatePath) and running the StudyJob again? |
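To illustrate the suggestion above, here is a rough sketch of what an inline template might look like. It assumes the v1alpha1 StudyJob layout, where `workerSpec.goTemplate` accepts either `templatePath` or `rawTemplate`. The image name, command, and namespace are placeholders and do not come from this issue, and the field names should be checked against the installed Katib version.

```yaml
# Sketch only: a fragment of the StudyJob `spec`, not a complete manifest.
# Field names are assumed from the v1alpha1 examples; image and command are placeholders.
workerSpec:
  goTemplate:
    # Inline template, used instead of a templatePath that refers to a ConfigMap entry
    rawTemplate: |-
      apiVersion: batch/v1
      kind: Job
      metadata:
        name: {{.WorkerID}}
        namespace: kubeflow
      spec:
        template:
          spec:
            containers:
            - name: {{.WorkerID}}
              image: example/trainer:latest
              command:
              - "python"
              - "train.py"
              {{- with .HyperParameters}}
              {{- range .}}
              - "{{.Name}}={{.Value}}"
              {{- end}}
              {{- end}}
            restartPolicy: Never
```

If the StudyJob starts with the inline template but not with the path-based one, that would point at the ConfigMap lookup or template parsing rather than at the worker image.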
Hello, I'm using the same installation as Marcel, and the command line isn't working either. but nothing shows in the Study List. Thanks for your help, |
do you mean |
Hello,
Is there any way to check the status of these jobs? Thanks, |
For status: |
FYI, an update to KubeFlow 4.1 fixed the issue with the studyjob-controller; it runs fine now. @juan-sv Do the jobs created via YAML appear in the UI? Do jobs created by the UI work? |
/assign |
Hello,
We are experimenting with Katib but cannot get it to work. The random-example from the website works, but nothing else does. We can create the job (using the UI) and the resource appears in the cluster, but it won't start and just sits there for ages. Resources (CPU, GPU, memory) are available in the cluster. The job's image is on Docker Hub, so no authentication issues should arise.
Here is the content of the resource:
On a possibly related note, the studyjob-controller pod keeps crashing. Removing and re-adding it does not help. I tried reinstalling everything multiple times, but it still does not work. Here is the log:
What is going wrong? Is there any Katib log that we can look into? I had a look at all Katib-related pods (katib-ui, vizier, etc.), but none of them show anything useful.
Thanks
Marcel