
Redeploying a bicep deployment which contains an APP-GW results in deleting the existing backend pools, listeners and rules #2316

Open
dirien opened this issue Apr 19, 2021 · 35 comments

Comments

@dirien

dirien commented Apr 19, 2021

Bicep version
Bicep CLI version 0.3.255 (589f037)

Describe the bug
TL;DR: Redeploying a bicep deployment which contains an APP-GW results in deleting the existing backend pools, listeners and rules

We use the Azure Application Gateway Ingress Controller for our AKS cluster. Depending on the workload and the deployed Ingress resources, it automatically creates the backend pools, listeners and rules in the Application Gateway.

When we redeploy the bicep scripts, these existing backend pools, listeners and rules get deleted and the defaults get created again (you cannot deploy an application gateway empty...)

Is there a way to preserve existing backend pools, listeners and rules, when redeploying the bicep script?

To Reproduce
Follow https://github.com/Azure/application-gateway-kubernetes-ingress/blob/master/docs/setup/install-new.md and redeploy the bicep script, which contains the APP-GW.

@ghost ghost added the Needs: Triage 🔍 label Apr 19, 2021
@alex-frankel
Collaborator

@bmoore-msft - is this backendPools property a good example of array properties that are not dual-modelled, but are still problematic?

@bmoore-msft
Contributor

It can be any array property - though this sounds a little different... @dirien can you share the bicep source that exhibits this behavior?

@dirien
Author

dirien commented Jun 7, 2021

@bmoore-msft: that's quite a big bicep file...

var appGatewayFrontendConfigurationName = '${name}-${stage}-fe-ip-configuration'
var applicationGatewayName = '${name}-${stage}-agw'
var appGatewayFrontendPortHttpName = 'httpPort'
var appGatewayFrontendPortHttpsName = 'httpsPort'
var appGatewayHttpListenerName = '${name}-${stage}-http-port-listener'
var appGatewayHttpSettings80Name = '${name}-${stage}-be-80'
var appGatewayHttpsSettings443Name = '${name}-${stage}-be-443'
var appGatewayBackendPoolName = '${name}-${stage}-be-address-pool'

resource appGW 'Microsoft.Network/applicationGateways@2020-11-01' = {
  name: applicationGatewayName
  location: resourceGroup().location
  tags: resourceGroup().tags
  properties: {
    sku: {
      name: 'WAF_v2'
      tier: 'WAF_v2'
      capacity: 2
    }
    webApplicationFirewallConfiguration: {
      enabled: true
      firewallMode: 'Detection'
      ruleSetVersion: '3.2'
      ruleSetType: 'OWASP'
    }
    gatewayIPConfigurations: [
      {
        name: '${name}-${stage}-app-gateway-ip-config'
        properties: {
          subnet: {
            id: appGWSubnet.id
          }
        }
      }
    ]
    frontendPorts: [
      {
        name: appGatewayFrontendPortHttpName
        properties: {
          port: 80
        }
      }
      {
        name: appGatewayFrontendPortHttpsName
        properties: {
          port: 443
        }
      }
    ]
    frontendIPConfigurations: [
      {
        name: appGatewayFrontendConfigurationName
        properties: {
          publicIPAddress: {
            id: publicIPAGW.id
          }
        }
      }
    ]
    backendAddressPools: [
      {
        name: appGatewayBackendPoolName
      }
    ]
    backendHttpSettingsCollection: [
      {
        name: appGatewayHttpSettings80Name
        properties: {
          cookieBasedAffinity: 'Disabled'
          port: 80
          protocol: 'Http'
          requestTimeout: 1
        }
      }
      {
        name: appGatewayHttpsSettings443Name
        properties: {
          cookieBasedAffinity: 'Disabled'
          port: 443
          protocol: 'Https'
          requestTimeout: 1
        }
      }
    ]
    httpListeners: [
      {
        name: appGatewayHttpListenerName
        properties: {
          frontendIPConfiguration: {
            id: resourceId('Microsoft.Network/applicationGateways/frontendIPConfigurations', applicationGatewayName, appGatewayFrontendConfigurationName)
          }
          frontendPort: {
            id: resourceId('Microsoft.Network/applicationGateways/frontendPorts', applicationGatewayName, appGatewayFrontendPortHttpName)
          }
          protocol: 'Http'
        }
      }
    ]
    requestRoutingRules: [
      {
        name: 'rule1'
        properties: {
          ruleType: 'Basic'
          httpListener: {
            id: resourceId('Microsoft.Network/applicationGateways/httpListeners', applicationGatewayName, appGatewayHttpListenerName)
          }
          backendAddressPool: {
            id: resourceId('Microsoft.Network/applicationGateways/backendAddressPools', applicationGatewayName, appGatewayBackendPoolName)
          }
          backendHttpSettings: {
            id: resourceId('Microsoft.Network/applicationGateways/backendHttpSettingsCollection', applicationGatewayName, appGatewayHttpSettings80Name)
          }
        }
      }
    ]
  }
  dependsOn: [
    k8sVnet
  ]
}

But today I needed to roll out some changes to a FW rule, and the APP-GW got reset again. So I needed to restart the application-gateway-kubernetes-ingress pod again to recreate the APP-GW backends.

In my opinion, it should keep the existing backends...

@bmoore-msft
Contributor

Ah ok... so your AKS deployment has its own backendHttpSettingsCollection that you don't control. So whenever you need to update the GW, that gets removed (because it's not defined in your template)?

@dirien
Author

dirien commented Jun 9, 2021

Yes, I can't influence this. It gets dynamically generated from the Ingress resources in my AKS cluster.

But every time i deploy the bicep file, it resets it to the one defined in the template.


I mean, from an IaC point of view it makes sense, since it eliminates the drift.

@bmoore-msft
Contributor

Got it - yes that is, unfortunately, a familiar scenario. We're working on these with the RP teams but don't have a time frame just yet...

@mauve

mauve commented Nov 4, 2021

We are hitting this bug as well, and quite frankly it is a huge problem for us, because there is no way for us to guarantee we do not have infrastructure drift unless we can apply our templates.

Is there a way to loop in the Application Gateway, AKS or AGIC teams to get this fixed? Since the application gateway actually knows that the rules are supplied by AGIC, it feels like Application Gateway is the only thing that needs to be fixed.

@brwilkinson
Collaborator

@mauve the AGIC (the service controller itself) should re-populate all of the backend rules automatically if you redeploy with blank rules. It's certainly not ideal, however you should see the rules return after a short time?

@mauve

mauve commented Nov 4, 2021

Sure, but that still means my backend is unreachable for up to 30 seconds, which is unacceptable.

@brwilkinson
Collaborator

@mauve yes agree, just wanted to confirm that they were being re-published for you.

On load balancers there is a standalone child resource for inbound NAT rules (inboundNatRules).


https://docs.microsoft.com/en-us/azure/templates/microsoft.network/loadbalancers/inboundnatrules?tabs=bicep

In this case, since it's a standalone resource, it's possible to redeploy the load balancer without passing these settings in, in which case the current settings are not overwritten.

Ideally, if Application Gateway enabled this for the backend, you could continue to deploy the App Gateway without overwriting the rules.
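A minimal Bicep sketch of the standalone child-resource pattern described above, assuming a load balancer that already exists. The parameter names (`lbName`, `frontendIpConfigName`), the rule name and the port numbers are hypothetical. Because the NAT rule is declared as its own resource rather than inside the load balancer's `inboundNatRules` array, redeploying the load balancer template without that property should, per the comment above, leave the rule in place.

```bicep
// Sketch only: lbName, frontendIpConfigName, the rule name and ports are hypothetical.
param lbName string
param frontendIpConfigName string

// Reference the load balancer without redeploying (and resetting) its arrays.
resource lb 'Microsoft.Network/loadBalancers@2022-07-01' existing = {
  name: lbName
}

// Inbound NAT rule declared as a standalone child resource of the load balancer.
resource sshNatRule 'Microsoft.Network/loadBalancers/inboundNatRules@2022-07-01' = {
  parent: lb
  name: 'ssh-nat-rule'
  properties: {
    frontendIPConfiguration: {
      id: resourceId('Microsoft.Network/loadBalancers/frontendIPConfigurations', lbName, frontendIpConfigName)
    }
    protocol: 'Tcp'
    frontendPort: 50001
    backendPort: 22
    enableFloatingIP: false
    idleTimeoutInMinutes: 4
  }
}
```

Application Gateway has no equivalent child resource for backend pools, listeners or rules, which is why the same approach is not available here.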

@mauve

mauve commented Nov 11, 2021

I retract my previous statement: they are not getting republished.

After waiting for 20min I gave up and killed the AGIC pod.

Totally unacceptable.

@alex-frankel
Collaborator

Hey @mauve, we'd ask that you follow our code of conduct in your responses. I understand and empathize with your frustration, but on the bicep side, there is not anything we can do to solve this problem.

Have you tried opening a support case so this can be shared with the Network Resource Provider team that works on App Gateway?

@mauve

mauve commented Nov 19, 2021

@alex-frankel Agreed, the profanity was unprofessional on my part.

I am a little unsure how a support case can help; this is even documented (although somewhat hidden) in the AGIC documentation as a deficiency in the current design.

@mauve

mauve commented Mar 16, 2022

FWIW: I opened a support ticket

@brwilkinson
Collaborator

@mauve please confirm with support that the correct tags are being updated on the application gateway when AGIC (re)publishes the rules.
Also confirm how long it takes to re-publish after the deployment runs,
then determine what the expected behavior is for that timing. It should be a short interval; if you are not seeing it update, you may have a misconfiguration. Given this is a known issue, the timing is the only parameter that can determine whether this is working as expected or whether there is an issue.

@mohatb

mohatb commented May 24, 2022

@mauve have you tried to reference it as an existing resource ?

https://docs.microsoft.com/en-us/azure/azure-resource-manager/bicep/existing-resource

@mauve

mauve commented May 24, 2022

No, I haven't. It also doesn't fix the problem, seeing as some properties of the gateway can only be managed with Bicep in a setup that uses Bicep + AGIC.

For example SSL certificates need to be configured using Bicep before they can be referenced from the AGIC config.

Another thing is I want to be able to run my Bicep scripts continuously to ensure I do not have infrastructure drift - without having downtime.

@kamilzzz

Referencing it as an existing resource doesn't really solve the issue in my opinion.

Yes, we can split the deployment to have one Bicep template deploying the AGW and another one deploying AKS that references the existing AGW. In this case we can continuously deploy the AKS Bicep without any configuration drift, but then we can't redeploy our AGW template because it will cause downtime.

Any change to the AGW (for example, additional routes besides the AKS-generated ones, changing the autoscale policy, etc.) will have to be done manually because, again, we cannot redeploy the AGW Bicep without downtime. So we have configuration drift again.

@mauve did you find any reliable workaround?
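For the split described above, a minimal sketch of the AKS side, assuming the AGIC add-on is used in brownfield mode: the AKS template only references the App Gateway with `existing` and passes its ID to the add-on, so redeploying the AKS template does not touch the gateway's arrays. `appGWName` is hypothetical, and the rest of the `Microsoft.ContainerService/managedClusters` definition is omitted; the object is meant to go under `properties.addonProfiles`.

```bicep
// Sketch only: appGWName is hypothetical; the rest of the managedClusters resource is omitted.
param appGWName string

// The gateway is owned by a separate template; here it is only referenced.
resource appGW 'Microsoft.Network/applicationGateways@2022-07-01' existing = {
  name: appGWName
}

// Fragment intended for properties.addonProfiles of a Microsoft.ContainerService/managedClusters resource.
var aksAddonProfiles = {
  ingressApplicationGateway: {
    enabled: true
    config: {
      // Brownfield: point AGIC at the existing gateway instead of having it create one.
      applicationGatewayId: appGW.id
    }
  }
}

output addonProfiles object = aksAddonProfiles
```

As the comment points out, this only keeps the AKS template drift-free; redeploying the gateway template itself still resets the AGIC-managed entries.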

@mauve

mauve commented Jun 24, 2022

I opened a bug report, but they just said they cannot reproduce this problem. As I was busy with some other stuff and didn't have the time to create a new setup to reproduce it, we had to close the support case.

@kamilzzz pretty sure @mohatb was part of the support case, maybe you can sync?

@kamilzzz

kamilzzz commented Jun 24, 2022


Uhhh... ok, interesting.
Not much to reproduce here as it's quite obvious what's gonna happen. I've faced this issue when I tried to use AGIC for the first time 2 years ago.

Without redesigning the way the AppGw ARM resource provider works, nothing really can be done IMHO.

There is exactly the same case when using AKS' kubenet network plugin with my own subnet and route table (e.g., when I need my outbound traffic to go through a network appliance for additional filtering). Deploying it for the first time works, as my Bicep template contains a route table with my custom routes and then AKS adds its required routes for kubenet.
Then redeploying my Bicep causes downtime, as it deletes the AKS-generated routes. AKS will reconcile them after a while, but there's a short period of time where the routes required for kubenet are gone.

But I must agree this is not something directly related to Bicep. It's more about how some ARM resource providers work. There are similar discussions regarding the deployment of standalone subnets here: Azure/azure-quickstart-templates#2786. But unfortunately it doesn't look like this is going to be resolved any time soon (ever?).

@Xplz3d

Xplz3d commented Apr 20, 2023

Hi!
I'm facing the same issue, and as I'm using the AGIC add-on, there is no way to configure reconciliation...
After redeploying the infrastructure, my K8S workloads are present and the pods are running, but the backend config on the APPGW is deleted...
The workaround is to delete the ingress and redeploy it on the AKS side, but that causes downtime for the app, so this is not "end user" friendly...
Do you have any updates on this issue?

@brwilkinson
Collaborator

@Xplz3d There is an option with AKS deployment (for AGIC) which is called Greenfield.

With this deployment type, you specify the effectiveApplicationGatewayId (resourceId) and the subnetCIDR (subnet address space) for the app gateway and it will be deployed by the AGIC. You don't pre-deploy the App Gateway. By using this method, when redeploying the AKS the rules will not be overwritten and you don't need to deploy the App Gateway at all.

e.g.

{
  IngressApplicationGateway: {
    enabled: true
    config: {
      applicationGatewayName: 'AEU1-PE-CTL-D1-waf01'
      effectiveApplicationGatewayId: '/subscriptions/4185fa9b-f470-466a-b3ae-8e6c3314a543/resourceGroups/AEU1-PE-CTL-RG-D1-aks01/providers/Microsoft.Network/applicationGateways/AEU1-PE-CTL-D1-waf01'
      subnetCIDR: '10.182.241.0/24'
    }
  }
}


You do the following:

  • BYO network and subnet for the app gateway: pre-create the subnet with the desired CIDR and name it '${appGWName}-subnet'; otherwise AGIC will create a subnet with that name for you, based on the subnetCIDR you provide, which you don't want.
  • Give the user-assigned identity Contributor access, based on the doc below
  • You do need to create a WAF Policy and attach it manually to the app gateway created by AGIC; however, at least you can continuously deploy over the policy that you own after that.
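A minimal sketch of the first step, assuming the virtual network already exists and is managed elsewhere: the subnet is pre-created with the `'${appGWName}-subnet'` name so that AGIC finds it instead of creating its own. `vnetName`, `appGWName` and `appGWSubnetPrefix` are hypothetical parameters; the default prefix simply mirrors the `subnetCIDR` in the example above.

```bicep
// Sketch only: vnetName, appGWName and appGWSubnetPrefix are hypothetical parameters.
param vnetName string
param appGWName string
param appGWSubnetPrefix string = '10.182.241.0/24'

// Existing VNet, so this template does not redeploy its subnets array.
resource vnet 'Microsoft.Network/virtualNetworks@2022-07-01' existing = {
  name: vnetName
}

// Subnet named with the '${appGWName}-subnet' convention so AGIC reuses it.
resource appGWSubnet 'Microsoft.Network/virtualNetworks/subnets@2022-07-01' = {
  parent: vnet
  name: '${appGWName}-subnet'
  properties: {
    addressPrefix: appGWSubnetPrefix
  }
}

output appGWSubnetId string = appGWSubnet.id
```

If the VNet itself is also deployed elsewhere with an inline `subnets` array that omits this subnet, that deployment would try to remove it again, which is the same drift pattern discussed in this issue.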

This is not ideal, since many teams prefer to pre-deploy the App Gateway, however given the AGIC team are aware of the issue and there is currently no fix, I would say it's the current best option for this scenario.

  • The only other option is to disable redeployment of the Brownfield Application Gateway, which partially negates the benefit.

I believe the main sticking point with either of the above is that the one item AGIC doesn't manage is sslCertificates, so you still need an external method to publish these certs to the app gateway (they are used for the backend communication to pods) before ingress rules will even work. I will do some follow-up on this item and see what I can think of, or continue the conversation over on:

@Xplz3d

Xplz3d commented Apr 27, 2023

Hi @brwilkinson and thanks for your reply!
Bad news that there is no fix for now, but I hope it'll come soon!

I'll give it a try, but the aim is to be able to redeploy all of the architecture (AKS, AGIC, AppGW, VNET, ...) using IaC (Bicep) in case of an architecture update. If, after other tests/workarounds, I'm not able to make it work "simply", I would probably consider using nginx as an ingress inside AKS rather than AppGW...

@flavian-anselmo

It's 2023 and this issue has not been fixed.

What was your workaround for this issue?

@Xplz3d

Xplz3d commented Aug 7, 2023

@flavian-anselmo, I'm still waiting for a fix ASAP, as I have a customer who needs it...

@flavian-anselmo

@Xplz3d I am also in the middle of a project using AGIC, and my ingress controller needs to be deleted in the cluster and recreated for my app to come up again.

@VaclavK

VaclavK commented Aug 8, 2023

I don't need to do that. We simply use the workaround of detecting whether the app GW exists and using a new vs. existing condition in Bicep.

Even if I recreated the app GW, all definitions on the app GW would eventually be refreshed by the controller, but of course there would be a small ingress outage.

@flavian-anselmo


Could you provide a code snippet for that?

@Xplz3d

Xplz3d commented Aug 11, 2023

@flavian-anselmo of course, and the small "ingress outage" is not a good thing...
Please provide us a fix MS ...

@flavian-anselmo

flavian-anselmo commented Aug 11, 2023

@Xplz3d so I was able to solve this problem with a scalable solution. I will share a detailed solution before EOD

@Xplz3d

Xplz3d commented Aug 11, 2023

@flavian-anselmo perfect! I'm curious about your solution :)

@flavian-anselmo

flavian-anselmo commented Aug 12, 2023

@Xplz3d As I promised, here is a detailed workaround to this problem (https://github.com/flavian-anselmo/appgateway-agic-fix). I am using outputs to pass the backend pools from the existing application gateway, which prevents downtime and also allows you to modify the application gateway without downtime. You can find the detailed workaround, with a README and the code, in the GitHub repo. @alex-frankel, @bmoore-msft, @mauve and @brwilkinson, kindly review this workaround, since it worked for me and my team. Thank you!

Below are some of the code snippets showing how I solved this:

/**

-----------------------------------------
PREVENT BACKEND POOLS FROM BEING DELETED 
-----------------------------------------
This code snippet reads all the required data from Azure during deployment
and stores it in arrays. The arrays are then passed to the actual deployment to retain the backend pools, SSL certs, etc.

*/

param appGateWayName string
resource existingAppGateway 'Microsoft.Network/applicationGateways@2022-07-01' existing = {
  name:appGateWayName
}

output backendPoolsOutput array = existingAppGateway.properties.backendAddressPools
output backendHttpSettingsCollectionsOutput array = existingAppGateway.properties.backendHttpSettingsCollection
output probesOutput array = existingAppGateway.properties.probes
output httpListenersOutput array = existingAppGateway.properties.httpListeners
output urlPathsOutput array = existingAppGateway.properties.urlPathMaps
output requestRoutingRuleOutput array = existingAppGateway.properties.requestRoutingRules
output frontEndPortsOutput array = existingAppGateway.properties.frontendPorts
output frontEndIpsConfigOutput array = existingAppGateway.properties.frontendIPConfigurations
output sslCertOutput array = existingAppGateway.properties.sslCertificates

Passing the outputs into the modules:

@description('This is the actual application gateway provisioning')
module applicationGateway '../core/actualAppGateway.bicep'={
  name: 'appGateway'

  params:{
    appGateWayName: appGateWayName
    applicationGatewaySkuCapacity: applicationGatewaySkuCapacity
    applicationGatewaySkuName: applicationGatewaySkuName
    applicationGatwaySkuTier: applicationGatwaySkuTier
    location:location
    appGatewaySubnetName: appGatewaySubnetName
    vNetName: vNetName

    // USED DURING THE FIRST DEPLOYMENT WHEN THE APPLICATION GATEWAY DOES NOT EXIST
    //---------------------------------------------------------------------
    backendAddressPoolName: backendAddressPoolName
    backendHttpSettingsCollectionCookieBasedAffinity: backendHttpSettingsCollectionCookieBasedAffinity
    backendHttpSettingsCollectionName: backendHttpSettingsCollectionName
    backendHttpSettingsCollectionProtocol: backendHttpSettingsCollectionProtocol
    frontendIPConfigurationsName: frontendIPConfigurationsName
    frontendPortsName: frontendPortsName
    gatewayIPConfigurationsName: gatewayIPConfigurationsName
    httpListenersName: httpListenersName
    httpListenersProtocol: httpListenersProtocol
    requestRoutingRulesName: requestRoutingRulesName
    requestRoutingRulesRuleType:requestRoutingRulesRuleType
    publicIPAddressName: publicIPAddressName

    //--------------------------------------------------------------------

    //THE PART WITH OUTPUTS 
    //--------------------------------------------------------------------------
    backendPoolsOutputFromExisting: existingAppGateway.outputs.backendPoolsOutput
    frontEndPortOutput:existingAppGateway.outputs.frontEndPortsOutput
    httpListenersOutput:existingAppGateway.outputs.httpListenersOutput
    probeOutput:existingAppGateway.outputs.probesOutput
    requestRoutingOutput:existingAppGateway.outputs.requestRoutingRuleOutput
    sslCertOutput:existingAppGateway.outputs.sslCertOutput
    urlPathsOutput:existingAppGateway.outputs.urlPathsOutput
    backendHttpSettingsCollectionOutput:existingAppGateway.outputs.backendHttpSettingsCollectionsOutput
    frontEndIpConfigOutput:existingAppGateway.outputs.frontEndIpsConfigOutput
    //--------------------------------------------------------------------------------
  }
}



@description('This references the actual application gateway above')
module existingAppGateway '../core/existingAppGateway.bicep' = {
  name:'existingGateway'
  params:{
    appGateWayName:appGateWayName
  }
}
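The repository's module code is not reproduced in this thread, so the following is only a hypothetical sketch of how a module like `actualAppGateway.bicep` might consume one of these outputs: use the array read from the existing gateway when there is one, and fall back to a template-defined default otherwise (for example on the first deployment). The parameter names mirror the ones passed above; `defaultBackendPools`, `effectiveBackendPools` and the output name are illustrative.

```bicep
// Hypothetical excerpt, not the author's actualAppGateway.bicep.
param backendAddressPoolName string
param backendPoolsOutputFromExisting array = []

// Default pool used when no existing gateway configuration was passed in.
var defaultBackendPools = [
  {
    name: backendAddressPoolName
  }
]

// Keep whatever AGIC has already written to the gateway; otherwise use the default.
var effectiveBackendPools = empty(backendPoolsOutputFromExisting) ? defaultBackendPools : backendPoolsOutputFromExisting

// In the real module this value would be assigned to properties.backendAddressPools.
output backendAddressPoolsToDeploy array = effectiveBackendPools
```

This assumes the caller skips the `existing` lookup (or passes an empty array) when the gateway does not exist yet. Note also that the read and the write are separate steps, so anything AGIC changes between them could still be overwritten.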

@flavian-anselmo

@dirien Please also review since you opened this issue.

@VaclavK

VaclavK commented Aug 14, 2023

I was on holidays, so I'm only replying now.

So first we detect whether it already exists:

 $appGWExists = $null -ne (Get-AzApplicationGateway -Name $appGWName -ResourceGroupName $ResourceGroupName -ErrorAction SilentlyContinue)

which is then passed to Bicep as the "deploy mode":

 -appGWDeployMode ($appGWExists -eq $true ? 'existing' : 'new') `

and Bicep then uses the new vs. existing keywords:

resource applicationGateWayExisting 'Microsoft.Network/applicationGateways@2023-02-01' existing = if (appGWDeployMode == 'existing') {
  name: appGWName
}

resource applicationGateWayNew 'Microsoft.Network/applicationGateways@2023-02-01' = if (appGWDeployMode == 'new') {
  name: appGWName
.....
.....

This is valid for "static" app gateways... should you need to change an aspect of it, e.g. the SKU, you would have to delete it or make the change outside of Bicep.

It may be an easier setup for a "set up once and forget" scenario.

I will look at Flavian's approach.
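The parameter side of this pattern could look like the minimal sketch below, which restricts the deploy mode to the two values used above. Because both branches use the same gateway name, the resource ID can be computed with `resourceId()` regardless of which branch ran, so downstream resources (for example the AKS add-on configuration) do not need to know the mode. The output names are hypothetical.

```bicep
// Sketch only: mirrors the appGWDeployMode / appGWName parameters from the comment above.
@allowed([
  'new'
  'existing'
])
@description('Pass "existing" when the gateway was already found (e.g. via Get-AzApplicationGateway).')
param appGWDeployMode string = 'new'
param appGWName string

// Identical ID for both branches, since both use the same name in the same resource group.
var appGWId = resourceId('Microsoft.Network/applicationGateways', appGWName)

output applicationGatewayId string = appGWId
output deployedAsNew bool = appGWDeployMode == 'new'
```

The `@allowed` decorator makes a typo in the PowerShell-supplied value fail at validation time instead of silently skipping both conditional resources.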

@flavian-anselmo


My solution includes part of your solution; the difference is that mine supports modifying the application gateway.
