Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding uptime parameter to community.aws.ec2_instance_info #316

Closed
wants to merge 17 commits into from
Closed

Conversation

IPvSean
Copy link
Contributor

@IPvSean IPvSean commented Dec 3, 2020

SUMMARY

This parameter used to exist in the ec2_instance_find.py module that somehow got lost time written by James Laska. It is currently (before this PR) very difficult and not very Ansible freindly to determine how long an instance has been running, as the only filter provided is launch_time. It is a common cloud automation use-case to determine which instances have been online longer than a certain time period (e.g. which instances have been running for more than 2 hours?)

From experimentation and much Googling, the best I could come up with was something like this->

    - name: math task
      debug:
        msg: "{{ (ansible_date_time.iso8601[:19] | to_datetime('%Y-%m-%dT%H:%M:%S') - aws_time[:19] | to_datetime('%Y-%m-%dT%H:%M:%S')).total_seconds() / 3600 }}"

which is not easy to read, requires some Python knowledge and may be outside the skill set of an operations engineer who is writing Ansible Playbooks. Working with the infamous @Spredzy we have come up with a very simple uptime parameter based on the previous module, and merged this into the existing functionality of ec2_instance_info so we don't need to add "yet another AWS module" to the infinite queue. We also made sure this works for both python 2 and python 3 environments.

ISSUE TYPE
  • Feature Pull Request
COMPONENT NAME

ec2_instance_info

ADDITIONAL INFORMATION

Here is an example

- name: Gather information about any instance with Name beginning with RHEL and been running at least 60 minutes
  community.aws.ec2_instance_info:
    region: "{{ ec2_region }}"
    uptime: 60
    filters:
      "tag:Name": "RHEL-*"
  register: ec2_node_info

adding uptime parameter to ec2_instance_info.py based on older Ansible 2.3 module written by James Laska ec2_instance_find.py that does not appear to be in the upstream anywhere, so this feature was lost in time.
adding additional information to the task desciption int he exampel
minor mistake, I had minutes instead of hours, the int is in hours
@gravesm
Copy link
Member

gravesm commented Dec 4, 2020

Thanks for this. I agree this kind of functionality would be nice. I have a couple questions/concerns about the new parameter.

  1. Uptime may not necessarily reflect the system uptime, for example, if an instance has been stopped and then restarted at a later point. What this is filtering on is more like instance age. At the very least, this should be clearly communicated in the docs.
  2. We might want to consider a more granular time filter than hours. I'm not sure if seconds makes sense in this context, but maybe minutes?
  3. If we did provide a more granular time filter than hours, as the code is currently written, this might filter out instances when a user hasn't specified a filter. This would be the case if the local system clock differed enough from AWS. There's not a great solution here, but the best we could probably do would be to only filter the instances by uptime if the user has explicitly provided an uptime.

Maybe @jillr could jump in?

@IPvSean
Copy link
Contributor Author

IPvSean commented Dec 4, 2020

Uptime may not necessarily reflect the system uptime, for example, if an instance has been stopped and then restarted at a later point. What this is filtering on is more like instance age. At the very least, this should be clearly communicated in the docs.

So the launch time that AWS returns gets reset if you stop and restart the instance. I did a bunch of testing to see that particular parameter.

So specifically from this list of filters-> https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeInstances.html

The launch-time gets reset if you stop the instance and restart it. So i had some instances that were off for a long time (like multiple weeks) and I turned on once instance out of that group, and it would consistently pickup a new launch time, even though it was not a net-new instance.

We might want to consider a more granular time filter than hours. I'm not sure if seconds makes sense in this context, but maybe minutes?

I just copied the idea from the original module, so I don't really have an opinion here. Minutes is fine since you get more granular... but I chose hours b/c you get charged per hours, and usually workshops, QE, etc are all done in hours (it takes about an hour to test all 5 instances). So totally flexible here.

If we did provide a more granular time filter than hours, as the code is currently written, this might filter out instances when a user hasn't specified a filter. This would be the case if the local system clock differed enough from AWS. There's not a great solution here, but the best we could probably do would be to only filter the instances by uptime if the user has explicitly provided an uptime.

Yeah but its the same problem if you just push the logic to the Ansible Playbook, system clock time is always going to be a problem, but we tested it from a variety of random work stations and it is consistently seems to get us the results we need. Having this complexity in the module makes much more sense than an insanely complicated Ansible Playbook.

The use case allows really clean multi-region Ansible Job Templates in the workflow visualizer to clean up regions on specified policy->
image

* Sanity test / doc-default-does-not-match-spec fixups
* General docs cleanup
@gravesm
Copy link
Member

gravesm commented Dec 9, 2020

Ah, didn't realize the launch time gets reset. Good to know. What I'm suggesting with time difference is to retain the functionality you are providing, but shift a few things around in the code. As it stands this will opt every playbook using this module into using the uptime filter. So instead of setting the time delta to 0 if uptime hasn't been set do something like this after the initial list of instances has been created:

if uptime:
    timedelta = int(uptime)
    ...
    instances = [instance for instance in instances if ...<time filter>]

tremble and others added 3 commits December 9, 2020 20:23
* import order

* Add retry decorators

* Switch tests to using module_defaults

* module_defaults

* Add initial _info tests

* Handle Boto Errors with fail_json_aws

* Test state=absent when IGW missing

* Support not purging tags

* Support converting Tags from boto to dict

* Add tagging tests

* Use random CIDR for VPC

* Add check_mode tests

* changelog
…andle AWS rate limiting (#324)

* Add jittered_backoff to handle AWS rate limiting
* Fix for failing test
* Add changelog fragment
* Explicitly pass the subnet the instance should live on - try to avoid https://github.com/ansible-collections/community.aws/issues/329

* Make sure we delete the Instance (and free the EIP) before we try to drop the IGW

* Use IDs when cleaning up EC2 instances
@jillr
Copy link
Collaborator

jillr commented Dec 14, 2020

Also please make sure this functionality gets integration tests - thanks!

as per @gravesm suggestion (hopefully) trying a different approach here
adding uptime parameter to ec2_instance_info.py based on older Ansible 2.3 module written by James Laska ec2_instance_find.py that does not appear to be in the upstream anywhere, so this feature was lost in time.
adding additional information to the task desciption int he exampel
minor mistake, I had minutes instead of hours, the int is in hours
as per @gravesm suggestion (hopefully) trying a different approach here
@IPvSean
Copy link
Contributor Author

IPvSean commented Dec 14, 2020

@gravesm can you check if I did what you were thinking?

@jillr I will attempt!

@tremble
Copy link
Contributor

tremble commented Dec 15, 2020

@IPvSean,

It looks like your merge has done something a little weird, so you're triggering additional tests. Rather than "git merge main" I'd strongly recommend using git rebase:

git fetch origin
git rebase -i origin/main

Atlassian have a nice description of what it is and how it works:
https://www.atlassian.com/git/tutorials/rewriting-history/git-rebase

changing to minutes per @gravesm 's suggestion
@IPvSean
Copy link
Contributor Author

IPvSean commented Jan 12, 2021

ok @tremble ... I think I did this correctly now.... (maybe....)

now working on tests

adding integration test per @jillr suggestion
@IPvSean
Copy link
Contributor Author

IPvSean commented Jan 12, 2021

i messed up this pull request, going to open a clean one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants