EC2 auto recovery #5388

Closed
toddrosner opened this issue Feb 29, 2016 · 12 comments

Comments

@toddrosner

I'm trying to set up auto recovery for an EC2 instance. I have the ARN and properties defined in the alarm, but can't see a way to tie the alarm to the instance.

resource "aws_cloudwatch_metric_alarm" "autorecover" {
  alarm_name = "ec2-autorecover"
  namespace = "AWS/EC2"
  evaluation_periods = "2"
  period = "60"
  alarm_description = "This metric auto recovers EC2 instances"
  alarm_actions = ["arn:aws:automate:${var.region}:ec2:recover"]
}

Is auto recovery possible, and if so, what am I missing?

@toddrosner
Author

Looks like this might not be possible and that the alarms are really only useful with auto scaling policies. After trying a few things, I get the following errors:

Errors:

  * aws_cloudwatch_metric_alarm.instance: "threshold": required field is not set
  * aws_cloudwatch_metric_alarm.instance: "comparison_operator": required field is not set
  * aws_cloudwatch_metric_alarm.instance: "metric_name": required field is not set
  * aws_cloudwatch_metric_alarm.instance: "statistic": required field is not set

These properties aren't available when setting up auto recovery in the console, so I assume that setting up auto recovery in Terraform is not possible at this time. Someone please correct me if I'm wrong.

@jen20
Contributor

jen20 commented Mar 3, 2016

Hi @toddrosner! Thanks for opening this issue. I think this is related to #5390 also, but since this one is (just about!) earlier, I'm going to keep this one open for future discussion. It looks like this is not supported right now in Terraform, but I'd imagine it should be possible, so I'll tag this as an enhancement.

@toddrosner
Author

@jen20 Thanks for the update. Hopefully we'll see this implemented soon.

@craigwatson

craigwatson commented May 26, 2016

I managed to create an autorecover CloudWatch alarm as follows:

resource "aws_cloudwatch_metric_alarm" "autorecover" {
  alarm_name          = "ec2-autorecover"
  namespace           = "AWS/EC2"
  evaluation_periods  = "2"
  period              = "60"
  alarm_description   = "This metric auto recovers EC2 instances"
  alarm_actions       = ["arn:aws:automate:${var.aws_region}:ec2:recover"]
  statistic           = "Minimum"
  comparison_operator = "GreaterThanThreshold"
  threshold           = "0"
  metric_name         = "StatusCheckFailed_System"
  dimensions {
    InstanceId = "${aws_instance.app.id}"
  }
}

I replicated the CloudFormation definition for an EC2 autorecover alarm from the Amazon documentation. It seems you just need to pass dummy values for the missing parameters.

I'm still unable to test that the autorecover alarm is actually working, but at least the resource is created!
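
For anyone reading this on Terraform 0.12 or later, where `dimensions` is a map argument and bare references replace the `"${...}"` interpolation, the same alarm could look roughly like the sketch below. It assumes a `var.aws_region` variable and an `aws_instance.app` resource defined elsewhere in the configuration, and it has not been verified against an instance that actually failed its system status check.

```hcl
# Sketch only, assuming Terraform 0.12+ syntax.
# var.aws_region and aws_instance.app are assumptions carried over from the
# configuration above; aws_instance.app must be defined elsewhere.
variable "aws_region" {
  description = "Region the instance runs in"
  type        = string
}

resource "aws_cloudwatch_metric_alarm" "autorecover" {
  alarm_name          = "ec2-autorecover"
  alarm_description   = "This metric auto recovers EC2 instances"
  namespace           = "AWS/EC2"
  metric_name         = "StatusCheckFailed_System"
  statistic           = "Minimum"
  comparison_operator = "GreaterThanThreshold"
  threshold           = 0
  period              = 60
  evaluation_periods  = 2
  alarm_actions       = ["arn:aws:automate:${var.aws_region}:ec2:recover"]

  # The dimension is what ties the alarm to one specific instance.
  dimensions = {
    InstanceId = aws_instance.app.id
  }
}
```

The `dimensions` entry is the piece missing from the original question: without it the alarm is not scoped to any instance.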

@br0ch0n
Contributor

br0ch0n commented Jul 18, 2016

Oddly enough, @craigwatson's solution is correct, even though I still see the same behavior from #5390. That is, I cannot create the autorecover action in the console (it is greyed out, claiming an unsupported instance type even though the instance is EBS-only, etc.), yet creating the alarm via TF seems to have succeeded. As mentioned though, there's no way to test autorecovery in AWS, so we'll all have to wait for somebody's instance to die and for them to report back :)

@stack72
Contributor

stack72 commented Oct 26, 2016

Hi folks

I am going to close this out. I was able to get this working as follows:

provider "aws" {
  region = "us-west-2"
}

resource "aws_cloudwatch_metric_alarm" "autorecover" {
  alarm_name          = "ec2-autorecover"
  namespace           = "AWS/EC2"
  evaluation_periods  = "2"
  period              = "60"
  alarm_description   = "This metric auto recovers EC2 instances"
  alarm_actions       = ["arn:aws:automate:us-west-2:ec2:recover"]
  statistic           = "Minimum"
  comparison_operator = "GreaterThanThreshold"
  threshold           = "0" # the metric is 0 or 1, so "> 1" would never fire
  metric_name         = "StatusCheckFailed_System"
  dimensions {
    InstanceId = "${aws_instance.app.id}"
  }
}

resource "aws_internet_gateway" "foo" {
  vpc_id = "${aws_vpc.foo.id}"
  tags {
    bar = "baz"
  }
}

resource "aws_vpc" "foo" {
  cidr_block = "10.50.0.0/16"
}

resource "aws_subnet" "foo" {
  cidr_block = "10.50.1.0/24"
  vpc_id     = "${aws_vpc.foo.id}"
}

resource "aws_instance" "app" {
  ami           = "ami-5fe5423f"
  instance_type = "m3.medium"
  subnet_id     = "${aws_subnet.foo.id}"
}

Hope this helps

Paul

@timonwong
Contributor

timonwong commented Dec 5, 2016

@br0ch0n
Sadly, our AWS support contact tells us it won't work :( Once #8455 is done, it should resolve this issue.

@borsboom

Even with #8455, I don't seem to be able to create the auto-recovery alarm in the AWS console for an EC2 instance created like this:

resource "aws_instance" "instance" {
  ami                    = "${var.ami}"
  instance_type          = "${var.instance_type}"
  subnet_id              = "${var.subnet_id}"
  vpc_security_group_ids = ["${var.security_group_ids}"]
  private_ip             = "${var.private_ip}"
  key_name               = "${var.key_name}"
  root_block_device {
    volume_type = "gp2"
    volume_size = 8
  }
  ephemeral_block_device {
    device_name = "/dev/sdb"
    no_device = true
  }
  tags {
    Name = "${var.name}"
  }
}

I got the error:

The EC2 'Recover' Action is not valid for the associated instance. Please remove or change to a different EC2 action.

I tried this with an m3.medium instance, and confirmed that no instance storage was mounted (whereas if I left out the ephemeral_block_device, there was instance storage mounted).

I created an identical instance in the console (running a diff between the results of aws ec2 describe-instances on the two instances shows only the expected differences, like instance ID and IP address), and I was able to create the alarm for that one.

@timonwong
Contributor

timonwong commented Feb 25, 2017

@borsboom I think you can only use it with EBS-only instance types.

Some AMIs contain more than one ephemeral block device definition, and you need to exclude them all. They are not visible in either the Console or the API. You can only query the AMI and find them in the "Block Devices" section.

@borsboom

borsboom commented Feb 25, 2017

@timonwong: thank you! Once I added no_device ephemeral_block_device entries for both devices defined by the AMI, I was able to create the auto-recovery alarm in the AWS console. That gives me much more confidence that a Terraform-created auto-recovery alarm will work as well.

For anyone finding this issue in the future, you can find the block devices for an AMI using something like this:

$ aws ec2 describe-images --image-ids=ami-a58d0dc5
{
    "Images": [
        {
            [...snip...]
            "BlockDeviceMappings": [
                [...snip...]
                {
                    "DeviceName": "/dev/sdb",
                    "VirtualName": "ephemeral0"
                },
                {
                    "DeviceName": "/dev/sdc",
                    "VirtualName": "ephemeral1"
                }
            ],
            [...snip...]
        }
    ]
}

This translates into the following ephemeral_block_device blocks:

resource "aws_instance" "instance" {
  [...snip...]
  ephemeral_block_device {
    device_name = "/dev/sdb"
    no_device = true
  }
  ephemeral_block_device {
    device_name = "/dev/sdc"
    no_device = true
  }
}
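
If an AMI defines more than a couple of ephemeral devices, repeating the block by hand gets tedious. On Terraform 0.12 or later, a `dynamic` block can generate one `ephemeral_block_device` per device name. The following is a sketch under that assumption; `ami_ephemeral_devices` is a hypothetical variable that you fill in by hand from the `describe-images` output, and `ami` and `instance_type` stand in for the rest of the instance configuration.

```hcl
# Sketch only, assuming Terraform 0.12+ (dynamic blocks are not available earlier).
# ami_ephemeral_devices is a hypothetical variable, not part of the AWS provider.
variable "ami" {
  type = string
}

variable "instance_type" {
  type = string
}

variable "ami_ephemeral_devices" {
  description = "Device names the AMI maps to instance store volumes"
  type        = list(string)
  default     = ["/dev/sdb", "/dev/sdc"]
}

resource "aws_instance" "instance" {
  ami           = var.ami
  instance_type = var.instance_type

  # Emit one no_device entry for every ephemeral mapping defined by the AMI.
  dynamic "ephemeral_block_device" {
    for_each = var.ami_ephemeral_devices
    content {
      device_name = ephemeral_block_device.value
      no_device   = true
    }
  }
}
```

Keeping the device names in a variable means a change of AMI only requires updating the list, not the resource body.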

@cxmcc

cxmcc commented Aug 18, 2017

Thanks for all the info. I ran into similar issues to those mentioned in this thread and finally came up with a full implementation here: https://github.com/cxmcc/tf_aws_ec2_auto_recovery. Please let me know if you have any feedback.

@ghost

ghost commented Apr 7, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 7, 2020