Monitor alerts for AKS failing in the latest AzureRM version #7910

Closed
prabhakarreddy1234 opened this issue Jul 27, 2020 · 5 comments · Fixed by #7995

prabhakarreddy1234 commented Jul 27, 2020

The azurerm_monitor_metric_alert resource started failing to provision after I upgraded to AzureRM 2.20.0. I can confirm that the same configuration works with AzureRM 2.18.0.

Exception:
Error creating or updating metric alert "kubernetes-pods-failing-alert" (resource group "rg-name"): insights.MetricAlertsClient#CreateOrUpdate: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="BadRequest" Message="Alert update failed. Updating from StaticThresholdCriteria and odata.type SingleResourceMultipleMetricCriteria to StaticThresholdCriteria and odata.type MultipleResourceMultipleMetricCriteria is not supported. Activity ID: 2af1fe71-2891-47ec-99b7-6b2b2d53a58b."

I see that azurerm_monitor_metric_alert received some feature enhancements in AzureRM 2.19.0, so I'm wondering whether I need to change my configuration.

It's happening for all AKS alerts.

  • AzureRM provider: 2.20.0
resource "azurerm_monitor_metric_alert" "kubernetes-pods-failing-alert" {
  name                = "pod-failure-rate"
  resource_group_name = azurerm_resource_group.pftrans.name
  scopes              = [azurerm_kubernetes_cluster.pftrans.id]
  description         = "${title(var.environment_name)} - Pods failing in (${azurerm_kubernetes_cluster.pftrans.name})"

  criteria {
    metric_namespace = "Microsoft.ContainerService/managedClusters"
    metric_name      = "kube_pod_status_phase"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 1

    dimension {
      name     = "phase"
      operator = "Include"
      values   = ["Failed"]
    }

    dimension {
      name     = "namespace"
      operator = "Include"
      values   = [var.environment_name]
    }
  }

  action {
    action_group_id = data.azurerm_monitor_action_group.xxx.id
  }

  tags = local.common_tags
}
prabhakarreddy1234 changed the title "Monitor alerts for AKS failing in the latest AzureRM versions" to "Monitor alerts for AKS failing in the latest AzureRM version" Jul 27, 2020
aristosvo (Contributor) commented

Hi @prabhakarreddy1234! Thanks for raising this. I think I've spotted the problem:

As your error message shows, the failure occurs when an existing alert created with SingleResourceMultipleMetricCriteria is updated to MultipleResourceMultipleMetricCriteria. Since the multi-resource feature was added to azurerm_monitor_metric_alert, every alert is constructed as MultipleResourceMultipleMetricCriteria, even when it targets only one resource.

The easiest workaround is probably to recreate your alerts by tainting them, but this should also be fixed in the provider code itself.

aristosvo (Contributor) commented

@magodo Do you agree with my temporary solution?

For the definitive fix, would you propose tainting based on a caught error message, or reimplementing it with backwards compatibility?

magodo self-assigned this Aug 3, 2020
magodo (Collaborator) commented Aug 3, 2020

@prabhakarreddy1234 Apologies for the inconvenience :(

@aristosvo You are right about both the workaround and the root-cause analysis. New alerts, including those scoped to a single AKS cluster, will always use `MultipleResourceMultipleMetricCriteria`. I'll try to figure out a way to make it backwards compatible.

katbyte added this to the v2.23.0 milestone Aug 11, 2020
katbyte added the bug label Aug 11, 2020
katbyte pushed a commit that referenced this issue Aug 11, 2020
…tiMetricCriteria for legacy metric alerts (#7995)

The change in #7159 deprecates SingleResourceMultiMetricCriteria outright, replacing it with MultipleResourceMultipleMetricCriteria. Unfortunately, that breaks users whose metric alerts were created before that PR was merged and therefore use SingleResourceMultiMetricCriteria. When one of those alerts is updated, the current code triggers an update from SingleResourceMultiMetricCriteria to MultipleResourceMultipleMetricCriteria, which the service does not seem to support (as reported in #7910).

This PR keeps those legacy resources on SingleResourceMultiMetricCriteria. Users who want MultipleResourceMultipleMetricCriteria will have to recreate the resource.

(fixes #7910)
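The backwards-compatible behaviour described in the commit message can be pictured as a small decision made before each create or update. This is a minimal, hypothetical Go sketch, not the provider's actual code: `CriteriaType`, `chooseCriteriaType` and the constant names are invented for illustration, the string values are the odata.type names from the error message in the original report, and it assumes the provider can read the existing alert's criteria type before sending the update.

```go
package main

import "fmt"

// CriteriaType stands in for the odata.type discriminator of an alert's
// criteria. Hypothetical name; not the provider's or the Azure SDK's type.
type CriteriaType string

const (
	// Values as they appear in the service error message in #7910.
	SingleResourceMultipleMetric   CriteriaType = "SingleResourceMultipleMetricCriteria"
	MultipleResourceMultipleMetric CriteriaType = "MultipleResourceMultipleMetricCriteria"
)

// chooseCriteriaType returns the criteria type to send to the service.
// existing is the type currently stored on the alert in Azure, or ""
// when the alert is being created for the first time.
func chooseCriteriaType(existing CriteriaType) CriteriaType {
	// Legacy alerts keep their original type, because the service rejects
	// switching from the single-resource to the multi-resource type.
	if existing == SingleResourceMultipleMetric {
		return SingleResourceMultipleMetric
	}
	// New alerts, and alerts already on the new type, use the multi-resource type.
	return MultipleResourceMultipleMetric
}

func main() {
	fmt.Println(chooseCriteriaType(""))                             // new alert
	fmt.Println(chooseCriteriaType(SingleResourceMultipleMetric))   // legacy alert
	fmt.Println(chooseCriteriaType(MultipleResourceMultipleMetric)) // alert created after #7159
}
```

Running it prints MultipleResourceMultipleMetricCriteria, SingleResourceMultipleMetricCriteria and MultipleResourceMultipleMetricCriteria, one per line. Only an alert that already exists with the single-resource type keeps it, which is why switching a legacy alert to the new type requires recreating it.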
ghost commented Aug 13, 2020

This has been released in version 2.23.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:

provider "azurerm" {
  version = "~> 2.23.0"
  features {}
}
# ... other configuration ...

ghost commented Sep 10, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

ghost locked and limited conversation to collaborators Sep 10, 2020