Cannot create certain GCP GPU instances #1398
Comments
Hello, @awendel-presien! It looks like the instance maintenance behavior is already being set to |
Hi @0x2b3bfa0, thanks for having a look at this! Do you have any other ideas as to why it might return that error for the newer |
@0x2b3bfa0, unfortunately this error persists for us. As a test, I tried creating a So the problem only occurs when trying to start Any ideas? |
We managed to get this working by including the GPU type and number in the I think this is something that should at least be documented, because it is technically superfluous; i.e. It's also not necessary to specify the number and type of GPUs when using the Terraform Provider Iterative directly - it works just fine when only providing the machine type. And |
Hi @awendel-presien, Did you manage to run any |
@hopeai can you try as |
Thanks @dacbd, I was able to solve this problem by setting
|
My recommendation, if it's something that you encounter often, would be to try some kind of simple bash loop, something like this:
zones=("us-central1-a" "us-central1-b" "us-central1-c")
for zone in "${zones[@]}"; do
  # Try to launch a runner in this zone; the elided flags stay as they are.
  cml runner launch ... \
    --region="$zone" \
    ...
  if [ $? -eq 0 ]; then
    echo "Deployed runner in $zone"
    break
  else
    echo "Runner in $zone failed, trying next zone"
  fi
done
(I haven't explicitly tested the above.)
@hopeai we aren't doing much active development on CML at the moment, but if you want to add this feature yourself, I'd be happy to prioritize testing any pull requests you make and releasing any new additions. |
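For anyone who needs the zone fallback above in more than one place, here is a rough sketch of the same idea wrapped in a function. The name try_zones and its calling convention are made up for illustration, and the --region flag is simply carried over from the comment above, so check it against your CML version. Like the loop above, this has not been run against a real cml runner launch.

```shell
# try_zones ZONE... -- COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND once per zone with --region=ZONE appended
# and stops at the first zone where it succeeds.
# The body runs in a subshell, "( ... )", so its variables do not leak out.
try_zones() (
  zones=""
  # Everything before "--" is a zone; everything after it is the command.
  while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
    zones="$zones $1"   # zone names contain no whitespace
    shift
  done
  if [ "$#" -lt 2 ]; then
    echo "usage: try_zones ZONE... -- COMMAND [ARGS...]" >&2
    return 2
  fi
  shift   # drop the "--" separator
  for zone in $zones; do
    # Command output goes to stderr so stdout carries only the winning zone.
    if "$@" --region="$zone" >&2; then
      echo "$zone"
      return 0
    fi
    echo "Runner in $zone failed, trying next zone" >&2
  done
  return 1   # every zone failed
)
```

It would be called along the lines of try_zones us-central1-a us-central1-b -- cml runner launch ..., where the trailing ... stands for whatever other flags you already pass. The function prints the zone that worked on stdout and returns a non-zero status if none did, so a CI step can fail cleanly when all zones are exhausted.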
Thanks for the recommendation @dacbd. At the moment I'm using a similar bash loop, but I'd like to know if this is something that will be addressed in |
Check the quotas of your GCP account and try to provision resources accordingly via cml runner |
Any update regarding this? |
Hi everyone,
I'm getting the following error when trying to create A100 or L4 based instances on GCP using cml runner launch (a2-highgpu and g2-standard types respectively):
I have no problem creating V100 and T4 instances (both n1 types).
I have found this discussion, which suggests the maintenance policy needs to be set to TERMINATE. Am I on the right track, and if yes, is there a way to do that using cml runner launch?
Regards,
Alex.