Resource Timeouts for creation / deletion #171

Closed
mooperd opened this issue Jul 13, 2017 · 48 comments · Fixed by #2744

@mooperd

mooperd commented Jul 13, 2017

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Description

Is there a way of configuring timeouts?

azurerm_subnet.subnet: Creating...
  address_prefix:            "" => "10.0.2.0/24"
  ip_configurations.#:       "" => "<computed>"
  name:                      "" => "bach"
  network_security_group_id: "" => "<computed>"
  resource_group_name:       "" => "bach"
  route_table_id:            "" => "<computed>"
  virtual_network_name:      "" => "bach"
azurerm_subnet.subnet: Still creating... (10s elapsed)
azurerm_subnet.subnet: Still creating... (20s elapsed)
azurerm_subnet.subnet: Still creating... (30s elapsed)
azurerm_subnet.subnet: Still creating... (40s elapsed)
azurerm_subnet.subnet: Still creating... (50s elapsed)
azurerm_subnet.subnet: Still creating... (1m0s elapsed)
azurerm_subnet.subnet: Still creating... (1m10s elapsed)
azurerm_subnet.subnet: Still creating... (1m20s elapsed)
azurerm_subnet.subnet: Still creating... (1m30s elapsed)
azurerm_subnet.subnet: Still creating... (1m40s elapsed)
azurerm_subnet.subnet: Still creating... (1m50s elapsed)
azurerm_subnet.subnet: Still creating... (2m0s elapsed)
azurerm_subnet.subnet: Still creating... (2m10s elapsed)
azurerm_subnet.subnet: Still creating... (2m20s elapsed)
azurerm_subnet.subnet: Still creating... (2m30s elapsed)
azurerm_subnet.subnet: Still creating... (2m40s elapsed)
azurerm_subnet.subnet: Still creating... (2m50s elapsed)
azurerm_subnet.subnet: Still creating... (3m0s elapsed)
azurerm_subnet.subnet: Still creating... (3m10s elapsed)
azurerm_subnet.subnet: Still creating... (3m20s elapsed)
azurerm_subnet.subnet: Still creating... (3m30s elapsed)
azurerm_subnet.subnet: Still creating... (3m40s elapsed)
azurerm_subnet.subnet: Still creating... (3m50s elapsed)
azurerm_subnet.subnet: Still creating... (4m0s elapsed)
azurerm_subnet.subnet: Still creating... (4m10s elapsed)
azurerm_subnet.subnet: Still creating... (4m20s elapsed)
azurerm_subnet.subnet: Still creating... (4m30s elapsed)
azurerm_subnet.subnet: Still creating... (4m40s elapsed)
azurerm_subnet.subnet: Still creating... (4m50s elapsed)
azurerm_subnet.subnet: Still creating... (5m0s elapsed)
azurerm_subnet.subnet: Still creating... (5m10s elapsed)
azurerm_subnet.subnet: Still creating... (5m20s elapsed)
azurerm_subnet.subnet: Still creating... (5m30s elapsed)
azurerm_subnet.subnet: Still creating... (5m40s elapsed)
azurerm_subnet.subnet: Still creating... (5m50s elapsed)
azurerm_subnet.subnet: Still creating... (6m0s elapsed)
azurerm_subnet.subnet: Still creating... (6m10s elapsed)
azurerm_subnet.subnet: Still creating... (6m20s elapsed)
azurerm_subnet.subnet: Still creating... (6m30s elapsed)
azurerm_subnet.subnet: Still creating... (6m40s elapsed)
azurerm_subnet.subnet: Still creating... (6m50s elapsed)
azurerm_subnet.subnet: Still creating... (7m0s elapsed)
azurerm_subnet.subnet: Still creating... (7m10s elapsed)
azurerm_subnet.subnet: Still creating... (7m20s elapsed)
azurerm_subnet.subnet: Still creating... (7m30s elapsed)
azurerm_subnet.subnet: Still creating... (7m40s elapsed)
azurerm_subnet.subnet: Still creating... (7m50s elapsed)
azurerm_subnet.subnet: Still creating... (8m0s elapsed)
azurerm_subnet.subnet: Still creating... (8m10s elapsed)
azurerm_subnet.subnet: Still creating... (8m20s elapsed)
azurerm_subnet.subnet: Still creating... (8m30s elapsed)
azurerm_subnet.subnet: Still creating... (8m40s elapsed)
azurerm_subnet.subnet: Still creating... (8m50s elapsed)
azurerm_subnet.subnet: Still creating... (9m0s elapsed)
azurerm_subnet.subnet: Still creating... (9m10s elapsed)
azurerm_subnet.subnet: Still creating... (9m20s elapsed)
azurerm_subnet.subnet: Still creating... (9m30s elapsed)
azurerm_subnet.subnet: Still creating... (9m40s elapsed)
azurerm_subnet.subnet: Still creating... (9m50s elapsed)
azurerm_subnet.subnet: Still creating... (10m0s elapsed)
azurerm_subnet.subnet: Still creating... (10m10s elapsed)
@tombuildsstuff
Contributor

Hey @mooperd

As you're seeing above, the time it takes to provision resources in Azure can vary wildly, and thus the Azure SDK keeps polling for completion until either the resource is created or an error occurs. There are several resources in Azure which can take a considerable amount of time to provision (e.g. Storage Accounts can take up to 30m, and Virtual Network Gateways up to 2 hours).

Within Terraform, it's possible to specify a custom timeout for each resource - however each resource needs to opt-in for this - and as such we've not got this hooked up for the Azure resources yet. Is this a particular problem you're seeing consistently with the Subnet resource?

Thanks!

@mooperd
Author

mooperd commented Jul 13, 2017 via email

@mooperd
Author

mooperd commented Jul 13, 2017

Some debug from my failing terraform apply: https://gist.github.com/anonymous/458beb2cf154ec20ba6d1c1430a06919

and the .tf:

provider "azurerm" {
  subscription_id = "-"
  client_id       = "-"
  client_secret   = "-"
  tenant_id       = "-"
}

# Create a resource group
resource "azurerm_resource_group" "resource_group" {
  name     = "${var.resource_group}"
  location = "West US"
}

resource "azurerm_virtual_network" "virtual_network" {
  name                = "${azurerm_resource_group.resource_group.name}"
  address_space       = ["10.0.0.0/16"]
  location            = "West US"
  resource_group_name = "${azurerm_resource_group.resource_group.name}"
}

resource "azurerm_subnet" "subnet" {
  name                 = "${azurerm_resource_group.resource_group.name}"
  resource_group_name  = "${azurerm_resource_group.resource_group.name}"
  virtual_network_name = "${azurerm_virtual_network.virtual_network.name}"
  address_prefix       = "10.0.2.0/24"
}

resource "azurerm_public_ip" "testing123" {
  name                         = "testing123"
  location                     = "${azurerm_resource_group.resource_group.location}"
  resource_group_name          = "${azurerm_resource_group.resource_group.name}"
  public_ip_address_allocation = "dynamic"
}

resource "azurerm_network_interface" "test" {
  name                = "acctni"
  location            = "West US"
  resource_group_name = "${azurerm_resource_group.resource_group.name}"

  ip_configuration {
    name                          = "testconfiguration1"
    subnet_id                     = "${azurerm_subnet.subnet.id}"
    private_ip_address_allocation = "dynamic"
    public_ip_address_id          = "${azurerm_public_ip.testing123.id}"
  }
}

resource "azurerm_virtual_machine" "test" {
  name                  = "acctvm"
  location              = "West US"
  resource_group_name   = "${azurerm_resource_group.resource_group.name}"
  network_interface_ids = ["${azurerm_network_interface.test.id}"]
  vm_size               = "Standard_A2_v2"


  storage_os_disk {
    name          = "myosdisk1"
    image_uri     = "https://factor3packer.blob.core.windows.net/system/Microsoft.Compute/Images/images/packer-osDisk.828b9190-5bc0-462f-87aa-a80367c2959e.vhd"
    vhd_uri       = "https://factor3packer.blob.core.windows.net/images/need_random_value.vhd"
    os_type       = "linux"
    caching       = "ReadWrite"
    create_option = "FromImage"
  }

  os_profile {
    computer_name  = "hostname"
    admin_username = "centos"
    admin_password = "X9deiX9dei"
  }

  os_profile_linux_config {
    disable_password_authentication = false
  }

  tags {
    environment = "staging"
  }
}

@tombuildsstuff
Contributor

Hey @mooperd

Thanks for posting your Terraform config.

I've taken a look and using your config I've been able to replicate this on Terraform 0.9.11 - and from what I can see this has been fixed in #6 which has been merged and is available in Terraform 0.10-rc1. I've also tested this config on Terraform 0.10-rc1 and can confirm the deadlock issue you're seeing is no longer present :)

The other issue regarding not being able to set timeouts on individual resources still stands however - and as such I'm going to make this issue an enhancement request for those - which we'll investigate adding in the near future :)

Thanks!

@rcarun rcarun added this to the M1 milestone Oct 11, 2017
@codyaray

codyaray commented Jan 4, 2018

I need a configurable timeout on azurerm_virtual_machine resources. I'm trying to spin up a bunch at once, and a lot of them fail due to timeout. But they're actually created in Azure so my state just gets out of sync.

Any chance this is happening soon?

EDIT: this appears to be because the virtual machine creations are queued for a while waiting on the first "batch" (parallelism) of machines to finish creation. They don't even start "Creating..." before they end up timing out.
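A hedged sketch for anyone landing here on a newer provider: from version 2.0 of the Azure Provider, per the maintainers' updates in this thread, a `timeouts` block can be set on `azurerm_virtual_machine`. The resource name `example` and the `60m` value are placeholders, and the remaining required arguments are elided. This only extends how long Terraform waits; it does not change limits enforced by Azure itself.

```hcl
# Assumes azurerm provider >= 2.0 (Terraform 0.12 syntax).
resource "azurerm_virtual_machine" "example" {
  # ... name, location, resource_group_name, network_interface_ids, vm_size,
  # storage_os_disk, os_profile etc. as in your existing configuration ...

  timeouts {
    # How long Terraform waits for the VM to finish creating before giving up.
    create = "60m"
  }
}
```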

@schmandforke

schmandforke commented Mar 17, 2018

+1
would be great for many resources ;-)

Error: Error applying plan:

28 error(s) occurred:

* azurerm_managed_disk.wf_disk_page[0] (destroy): 1 error(s) occurred:

* azurerm_managed_disk.wf_disk_page.0: azure#WaitForCompletion: context has been cancelled: StatusCode=204 -- Original Error: context deadline exceeded

...
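For destroy-time failures like the one above, a sketch assuming azurerm 2.0 or later: the `delete` key in a `timeouts` block controls how long Terraform waits for each disk to be removed. Only the `wf_disk_page` name comes from the error output; the count, disk name, size and the `azurerm_resource_group.example` reference are hypothetical.

```hcl
# Assumes azurerm provider >= 2.0 and a resource group named "example" defined elsewhere.
resource "azurerm_managed_disk" "wf_disk_page" {
  count                = 2 # hypothetical number of disks
  name                 = "wf-disk-page-${count.index}"
  location             = azurerm_resource_group.example.location
  resource_group_name  = azurerm_resource_group.example.name
  storage_account_type = "Standard_LRS"
  create_option        = "Empty"
  disk_size_gb         = 128

  timeouts {
    # Applies to each disk instance created by count.
    delete = "60m"
  }
}
```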

@pixelicous

@tombuildsstuff can you guys take another look at this? Did the timeout values change in newer Terraform versions? We have a custom script extension that runs for longer than an hour, and it started failing on us with timeouts. This isn't even the regular CSE timeout. I think Azure's default timeouts should be the minimum. I tried the timeout value, but it isn't opted in.

@miat-asowers

@tombuildsstuff I'm also seeing this trying to create an App Service Environment via template deployment. Times out after 1h (as you know, ASEs take 90-120 minutes).

[...]
module.ase-internal.azurerm_template_deployment.ase_template: Still creating... (59m51s elapsed)
module.ase-internal.azurerm_template_deployment.ase_template: Still creating... (1h0m1s elapsed)

Error: Error applying plan:

1 error(s) occurred:

* module.ase-internal.azurerm_template_deployment.ase_template: 1 error(s) occurred:

* azurerm_template_deployment.ase_template: Error creating deployment: azure#WaitForCompletion: context has been cancelled: StatusCode=200 -- Original Error: context deadline exceeded

If I try to add

  timeouts {
    create = "3h"
    delete = "3h"
  }

to my template resource, I get

Error: Error refreshing state: 2 error(s) occurred:

* module.ase-internal.azurerm_template_deployment.ase_template: 1 error(s) occurred:

* module.ase-internal.azurerm_template_deployment.ase_template: [ERR] Error decoding timeout: Timeout Key (delete) is not supported
* module.ase-external.azurerm_template_deployment.ase_template: 1 error(s) occurred:

* module.ase-external.azurerm_template_deployment.ase_template: [ERR] Error decoding timeout: Timeout Key (create) is not supported

To deploy this template, I either need a longer default timeout or the ability to specify my own timeout.

Thanks (again!)
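With azurerm 2.0 or later (which requires Terraform 0.12), a `timeouts` block like the one attempted above is accepted on `azurerm_template_deployment`. A sketch, where the deployment name and the `ase.json` template file are hypothetical; in Terraform 0.12 syntax the opening brace must sit on the same line as `timeouts`.

```hcl
# Assumes azurerm provider >= 2.0 and a resource group named "example" defined elsewhere.
resource "azurerm_template_deployment" "ase_template" {
  name                = "ase-deployment"                # hypothetical
  resource_group_name = azurerm_resource_group.example.name
  deployment_mode     = "Incremental"
  template_body       = file("${path.module}/ase.json") # hypothetical template file

  timeouts {
    # App Service Environments can take 90-120 minutes, so allow headroom.
    create = "3h"
    delete = "3h"
  }
}
```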

@pixelicous

Not sure why we aren't defaulting to the regular timeouts, or helping us out by letting us define our own.

@Neutrollized

I would like to see timeouts for azurerm_virtual_machine myself as well. For me, we're using custom Windows images and it's exceeding the default 10m timeout period to deploy even just one VM :(

@pixelicous

@Neutrollized Is there a possibility that the VM is stuck on boot? That can happen if you don't wait for the sysprep registry flag to change.
Also, we had a problem where we installed a bunch of prerequisites as silent installs; those installs apparently needed one clean boot from sysprepping.

Check the Azure portal: if it's still flagging your VM as creating for that long, that might be an Azure issue? Well, it depends how big your image is :) maybe the timeout is your problem.

@Neutrollized

@pixelicous unfortunately not. I created a VM from the same image via Azure portal and it took just over 10 minutes to deploy. My current work around for this is to make the VM size larger (4CPU/16GB mem) instead of (2CPU/8GB) and it deployed in under 6 min and was ok.

@pixelicous

@Neutrollized As suspected, we don't even get the provider API's default timeouts, only Terraform's. I think the API's timeout should be the default, with the option to change it for any resource.

@tombuildsstuff
Contributor

tombuildsstuff commented Apr 9, 2018

@pixelicous @miat-asowers this is/was a bug in the way that the Azure SDK handled polling (where it used the default polling delay returned from the service, rather than using the value specified, as previous versions did); now that #825 has been resolved we should be able to support custom timeouts on resources.

For me, we're using custom Windows images and it's exceeding the default 10m timeout period to deploy even just one VM

@Neutrollized out of interest, which timeout are you referring to? Azure (Hyper-V) has a 10 minute boot timeout, after which a hard error is raised and the machine enters the Failed state.

Thanks!

@tombuildsstuff tombuildsstuff modified the milestones: M1, Soon Apr 9, 2018
@tombuildsstuff tombuildsstuff changed the title timeouts for resource creation / deletion Resource Timeouts for creation / deletion Apr 16, 2018
@tombuildsstuff tombuildsstuff self-assigned this Apr 16, 2018
@rahulkp220

Yes, I face this issue as well. Terraform does have a tendency to complain about request timeout stuff and as a result, I have to retry provision via code.

@tombuildsstuff
Contributor

re-opening since #2744 only added the groundwork for this

@Turil

Turil commented May 2, 2019

Would love to see this feature for "azurerm_template_deployment" as well. I am trying to provision an Azure Managed SQL Instance, which runs into a timeout after one hour (initial deployment can take up to six hours at the moment):

Error: Error applying plan:

1 error(s) occurred:

* azurerm_template_deployment.mssql: 1 error(s) occurred:

* azurerm_template_deployment.mssql: Error waiting for deployment: Future#WaitForCompletion: context has been cancelled: StatusCode=200 -- Original Error: context deadline exceeded

@tombuildsstuff
Contributor

@fpytloun we're working on it, but unfortunately there's a larger dependency chain than we first thought. It turns out that to implement this the way we need to, we need to replace the Storage SDK, since we can't use the replacement (which I'm working on at the moment), but this feature is planned to ship as a part of 2.0.

@tombuildsstuff
Contributor

@BenMitchell1979 Terraform/the Azure SDK already accounts for those by polling on them; this issue is tracking support for services with extremely long provisioning times (e.g. SQL Managed Instance) by allowing users to specify a custom timeout, which should resolve this.


To give an update here: we've started working on 2.0, and as such support for this will be added to Terraform in the not-too-distant future, but unfortunately we don't have a timeline just yet. When we do, we'll post it in the meta issue for 2.0: #2807

Thanks!

@tombuildsstuff tombuildsstuff self-assigned this Feb 3, 2020
@tombuildsstuff
Contributor

👋🏼

Over the past few months we’ve been working on the functionality coming in version 2.0 of the Azure Provider (outlined in #2807). We've just released version 1.43 of the Azure Provider which allows you to opt-in to the Beta of these upcoming features, including the ability to set Custom Timeouts on Resources.

More details on how to opt-into the Beta can be found in the Beta guide - however please note that this is only supported in Version 1.43 of the Azure Provider. You can upgrade to this version by updating your Provider block like so:

provider "azurerm" {
  version = "=1.43.0"
}

and then running terraform init -upgrade which will download this version of the Azure Provider.

Once you've opted into the Beta you can specify a timeouts block on Resources (as shown below) to override the default timeouts for each resource - which can be found at the bottom of each page in the documentation (example).

resource "azurerm_resource_group" "test" {
  name     = "example-resources"
  location = "West Europe"
  
  timeouts {
    create = "60m"
    delete = "2h"
  }
}

Note: Certain Azure APIs also have hard-coded timeouts within the Azure API (for example, the Compute APIs have a hard timeout when starting a Virtual Machine, at which point the machine is considered "Failed"), which it's not possible to override.


Custom Timeouts will be going GA with Version 2.0 of the Azure Provider in the coming weeks. If you've tried the Beta and have feedback, please open a GitHub Issue using the special Beta Feedback category and we'll take a look.

Thanks!

@tombuildsstuff
Contributor

👋

Custom Timeouts have been enabled by default in #5705 which will ship in version 2.0 of the Azure Provider - as such I'm going to close this issue for the moment. If you're looking to use this in the interim you should be able to use the Beta link above to opt-into the Custom Timeouts Beta.

Thanks!

@ghost

ghost commented Feb 24, 2020

This has been released in version 2.0.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:

provider "azurerm" {
  version = "~> 2.0.0"
  features {} # the features block is required from version 2.0 of the provider
}
# ... other configuration ...

@evmimagina

Hi all,

Sorry, but timeouts don't seem to be working when creating resources like "azurerm_kubernetes_cluster".

Before upgrading to azurerm 2.0 I never needed to specify any timeout.

Since updating to azurerm 2.0, I keep receiving this error 3-4 minutes after provisioning of the resource starts:

"Error waiting for creation of Managed Kubernetes Cluster "XXXXXX" (Resource Group"XXXXXXXX"): Future#WaitForCompletion: the number of retries has been exceeded: StatusCode=404 -- Original Error: Code="NotFound" Message="The entity was not found.""

I can see that the resource gets provisioned without problems on Azure.

I removed the resource and tried to recreate it, specifying the timeouts block with values of 30m.

No luck!

The worst thing is that I can't roll back to a previous azurerm version; when I try, I receive the following error:

"rpc error: code = Unavailable desc = transport is closing"

And not even at the deployment stage, but during planning!

Any help please?

Regards,

@johnathon-b

I am also having issues with this using AzureRM >= 2.0. I am receiving the same error:

Future#WaitForCompletion: the number of retries has been exceeded: StatusCode=202

I am creating an App Service Plan on an Application Service Environment. I've added a timeouts block w/ 120m but it still times out in TFE in 26 minutes.

@ghost

ghost commented Mar 26, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost ghost locked and limited conversation to collaborators Mar 26, 2020