Support for dynamic quota in azurerm_cognitive_deployment #23988

Closed
1 task done
rmoesbergen opened this issue Nov 22, 2023 · 10 comments · Fixed by #28100

Comments

@rmoesbergen

Is there an existing issue for this?

  • I have searched the existing issues

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment and review the contribution guide to help.

Description

An Azure Cognitive Services deployment now supports dynamic scaling of quota when capacity is available in the account. Please add this setting to the azurerm_cognitive_deployment Terraform resource so it can be provisioned automatically. (The setting is called "Dynamic Quota" in the UI.)

New or Affected Resource(s)/Data Source(s)

azurerm_cognitive_deployment

Potential Terraform Configuration

resource "azurerm_resource_group" "example" {
  name     = "example-resources"
  location = "West Europe"
}

resource "azurerm_cognitive_account" "example" {
  name                = "example-ca"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  kind                = "OpenAI"
  sku_name            = "S0"
}

resource "azurerm_cognitive_deployment" "example" {
  name                 = "example-cd"
  cognitive_account_id = azurerm_cognitive_account.example.id
  model {
    format  = "OpenAI"
    name    = "text-curie-001"
    version = "1"
  }

  scale {
    type     = "Standard"
    capacity = 10   # K transactions per minute
    dynamic  = true # <---- this would be the new setting
  }
}

References

https://microsoftlearning.github.io/mslearn-openai/Instructions/Labs/01-get-started-azure-openai.html#deploy-a-model

@rcskosir
Contributor

Thank you for taking the time to open this feature request!

@unique-dominik
Contributor

unique-dominik commented Jan 3, 2024

Good request 🍻

When you enable this in the portal, it sends:

{"displayName":"gpt-35-turbo-16k","sku":{"name":"Standard","capacity":240},"properties":{"model":{"format":"OpenAI","version":"0613","name":"gpt-35-turbo-16k"},"versionUpgradeOption":"NoAutoUpgrade","dynamicThrottlingEnabled":true,"raiPolicyName":"Microsoft.Nil"}}

and when you toggle it back:

{"displayName":"gpt-35-turbo-16k","sku":{"name":"Standard","capacity":240},"properties":{"model":{"format":"OpenAI","version":"0613","name":"gpt-35-turbo-16k"},"versionUpgradeOption":"NoAutoUpgrade","dynamicThrottlingEnabled":false,"raiPolicyName":"Microsoft.Nil"}}

Notice that dynamicThrottlingEnabled flips.

Interestingly, the cognitive_account resource also has a dynamic_throttling_enabled argument.

I'll test later whether the two are equivalent 👀 I suspect not, since accounts/create#dynamicThrottlingEnabled has its own dynamicThrottlingEnabled, separate from deployments/create#dynamicThrottlingEnabled.

If we are lucky, they get inherited 🤣

@illgitthat

> Interestingly, on the cognitive_account there is a dynamic_throttling_enabled.
>
> I try out later if these are equal or not 👀 I suspect no as accounts/create#dynamicThrottlingEnabled has its own dynamicThrottlingEnabled vs deployments/create#dynamicThrottlingEnabled
>
> If we are lucky, they get inherited 🤣

The dynamic_throttling_enabled argument at the cognitive account level is different from the setting on the actual model deployments, as you mentioned.

I tried it, and it returns:
DynamicThrottlingNotSupported: Thank you for your interest in Dynamic Throttling for Cognitive Services. This feature is currently not supported for the resource kind OpenAI and sku S0.

Thanks for opening this issue. I would love to know whether an update is planned; otherwise I will look into implementing this via the azapi Terraform provider.
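
For readers following along, the account-level attempt described above would look roughly like the sketch below. The resource names are illustrative and reuse the example from the issue description; the point is only that this flag lives on the account, not on the deployment, and per the error above it is rejected for kind OpenAI with sku S0.

```hcl
# Sketch only: resource names are illustrative, taken from the issue's example.
resource "azurerm_cognitive_account" "example" {
  name                = "example-ca"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  kind                = "OpenAI"
  sku_name            = "S0"

  # Account-level flag; this is NOT the per-deployment "Dynamic Quota" setting.
  # Per the DynamicThrottlingNotSupported error quoted above, the API rejects
  # it for kind OpenAI with sku S0.
  dynamic_throttling_enabled = true
}
```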

@JorisAndrade

Any news on this? According to https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/dynamic-quota, the property is indeed dynamicThrottlingEnabled:

az rest --method patch --url "https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.CognitiveServices/accounts/{accountName}/deployments/{deploymentName}?api-version=2023-10-01-preview" --body '{"properties": {"dynamicThrottlingEnabled": true} }'
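
If the deployment is already managed by azurerm_cognitive_deployment, the same property could in principle be set from Terraform with the azapi provider's azapi_update_resource, which updates selected properties on an existing resource. This is an untested sketch: the resource label and the reference to azurerm_cognitive_deployment.example are assumptions, the jsonencode body assumes an azapi 1.x provider, and a later azurerm update to the deployment may reset the flag.

```hcl
# Untested sketch: sets only dynamicThrottlingEnabled on an existing deployment.
# "dynamic_quota" and azurerm_cognitive_deployment.example are illustrative names.
resource "azapi_update_resource" "dynamic_quota" {
  type        = "Microsoft.CognitiveServices/accounts/deployments@2023-10-01-preview"
  resource_id = azurerm_cognitive_deployment.example.id

  body = jsonencode({
    properties = {
      dynamicThrottlingEnabled = true # same property as in the az rest call above
    }
  })
}
```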

@illgitthat

@JorisAndrade you can accomplish this via the azapi provider in the meantime. Hope this helps!

resource "azapi_resource" "model_deployment" {
  type                      = "Microsoft.CognitiveServices/accounts/deployments@2023-10-01-preview"
  schema_validation_enabled = false
  parent_id                 = azurerm_cognitive_account.account.id
  name                      = "gpt-4"
  body = jsonencode({
    sku = {
      name     = "Standard" # SKU name, as in the portal payload earlier in this thread; not the model name
      capacity = 1
    }
    properties = {
      model = {
        format  = "OpenAI"
        name    = "gpt-4"
        version = "1106-preview"
      }
      dynamicThrottlingEnabled = true # the "Dynamic Quota" toggle
      versionUpgradeOption     = "OnceNewDefaultVersionAvailable" # Options: NoAutoUpgrade, OnceCurrentVersionExpired, OnceNewDefaultVersionAvailable
    }
  })
  depends_on = [azurerm_cognitive_account.account]
}

@VickyWinner

@illgitthat thanks for sharing. I will try this out.
@rcskosir is there any ETA for this enhancement to be available as part of the provider?

@rcskosir
Contributor

Thanks for reaching out. Unfortunately, I do not have an ETA on this enhancement. Any future work by the team or the community should end up linked here via a PR.

guilhem added a commit to guilhem/pandora that referenced this issue Mar 24, 2024
@guilhem

guilhem commented Mar 24, 2024

I opened a PR on pandora to add the 2023-10-01-preview API version (https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/resource-manager/Microsoft.CognitiveServices/preview/2023-10-01-preview) to https://github.com/hashicorp/go-azure-sdk

https://github.com/Azure/azure-rest-api-specs/blob/82f3d9571517966992eaf97b1db73f0a821cd06b/specification/cognitiveservices/resource-manager/Microsoft.CognitiveServices/preview/2023-10-01-preview/cognitiveservices.json#L3303

Once that is done, we will be able to import it into the provider and add this feature.

@savage-alex

Hi, thanks for raising this issue. Is there any timeframe for this to be implemented? Thank you again.


github-actions bot commented Jan 2, 2025

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 2, 2025