Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] self.paginated_response does not return the aggregated list of distributions (cloudfront_info) #769

Closed
1 task done
TheOptimisticFactory opened this issue Oct 18, 2021 · 5 comments · Fixed by #780
Labels
bug This issue/PR relates to a bug module module plugins plugin (any type)

Comments

@TheOptimisticFactory
Copy link
Contributor

Summary

AWS paginates distributions by default at 100 items per page.

Trying the following can cause an error if the distribution appears in the first page of the results:

- name: "Check if distribution already exists for {{ domainName }}"
  cloudfront_info:
    <<: *AWS_CREDENTIALS
    distribution: true
    domain_name_alias: "{{ domainName }}"
  register: distribution_data

The error lies at the line 502, in the function paginated_response:

results.update(result)

For distributions, AWS returns an object as follows:

{ 
  "IsTruncated": true,
  "Items": [{ 
     "ARN": "REDACTED",
     // ... distribution data 
   }, {
     "ARN": "REDACTED",
     // ... distribution data 
  }],
  "Marker": "",
  "MaxItems": 100,
  "NextMarker": "REDACTED_MARKER",
  "Quantity": 100
}

Which means results.update(result) on the Nth page overwrites the previous results, causing the returned data to only include the distributions included in the last page -- as the items arrays are not concatenated together

In my case the 2nd page shows the following object:

{ 
  "IsTruncated": true,
  "Items": [{ 
     "ARN": "REDACTED",
     // ... distribution data 
   }, {
     "ARN": "REDACTED",
     // ... distribution data 
  }],
  "Marker": "REDACTED_MARKER",
  "MaxItems": 100,
  "NextMarker": "",
  "Quantity": 59
}

And printing the length of results returned by service_mgr.get_distribution_id_from_domain_name(domain_name_alias) yields 59, which is incorrect, and should be 159 as can be seen from the following screenshot

image

Issue Type

Bug Report

Component Name

cloudfront_info

Ansible Version

image

Collection Versions

The bug exists in both

AWS SDK versions

image

Configuration

$ ansible-config dump --only-changed

ANY_ERRORS_FATAL(~/workspace/ansible/ansible.cfg) = True
DEFAULT_STDOUT_CALLBACK(~/workspace/ansible/ansible.cfg) = debug
DUPLICATE_YAML_DICT_KEY(~/workspace/ansible/ansible.cfg) = warn
HOST_KEY_CHECKING(~/workspace/ansible/ansible.cfg) = False
INJECT_FACTS_AS_VARS(~/workspace/ansible/ansible.cfg) = False
INTERPRETER_PYTHON(~/workspace/ansible/ansible.cfg) = auto
RETRY_FILES_ENABLED(~/workspace/ansible/ansible.cfg) = False
TRANSFORM_INVALID_GROUP_CHARS(~/workspace/ansible/ansible.cfg) = never

OS / Environment

Debian 4.19.194-3 (2021-07-18) x86_64

Steps to Reproduce

1. Have over 100 distributions (or adjust the pagination of distribtions down enough to have multiple pages)
2. Try to fetch the info of a distribution that would appear on the first of multiple pages
3. Lo and behold: "Error unable to source a distribution id from domain_name_alias"

Sample fetch:

- name: "Check if distribution already exists for {{ domainName }}"
  cloudfront_info:
    <<: *AWS_CREDENTIALS
    distribution: true
    domain_name_alias: "{{ domainName }}"
  register: distribution_data

Expected Results

EXPECTATIONS:

  • Have the corresponding distribution data fetched disregarding on which page it may be

Actual Results

fatal: [REDACTED]: FAILED! => {
    "changed": false,
    "invocation": {
        "module_args": {
            "all_lists": false,
            "aws_access_key": "REDACTED",
            "aws_ca_bundle": null,
            "aws_config": null,
            "aws_secret_key": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER",
            "debug_botocore_endpoint_logs": false,
            "distribution": true,
            "distribution_config": false,
            "distribution_id": null,
            "domain_name_alias": "REDACTED",
            "ec2_url": null,
            "invalidation": false,
            "invalidation_id": null,
            "list_distributions": false,
            "list_distributions_by_web_acl_id": false,
            "list_invalidations": false,
            "list_origin_access_identities": false,
            "list_streaming_distributions": false,
            "origin_access_identity": false,
            "origin_access_identity_config": false,
            "origin_access_identity_id": null,
            "profile": null,
            "region": "eu-west-3",
            "security_token": null,
            "streaming_distribution": false,
            "streaming_distribution_config": false,
            "summary": false,
            "validate_certs": true
        }
    }
}

MSG:

Error unable to source a distribution id from domain_name_alias

Thrown by:

# get distribution id from domain name alias
if require_distribution_id and distribution_id is None:
distribution_id = service_mgr.get_distribution_id_from_domain_name(domain_name_alias)
if not distribution_id:
module.fail_json(msg='Error unable to source a distribution id from domain_name_alias')

Code of Conduct

  • I agree to follow the Ansible Code of Conduct
@ansibullbot
Copy link

Files identified in the description:

If these files are inaccurate, please update the component name section of the description or use the !component bot command.

click here for bot help

@ansibullbot
Copy link

@ansibullbot ansibullbot added bug This issue/PR relates to a bug module module needs_triage plugins plugin (any type) labels Oct 18, 2021
@TheOptimisticFactory
Copy link
Contributor Author

TheOptimisticFactory commented Oct 18, 2021

An ugly-yet-simple fix would be:

    # Only necessary until https://github.com/ansible-collections/community.aws/issues/769 gets fixed
    def paginated_response(self, func, result_key=""):
        '''
        Returns expanded response for paginated operations.
        The 'result_key' is used to define the concatenated results that are combined from each paginated response.
        '''
        args = dict()
        results = dict()
        items = []
        loop = True
        while loop:
            response = func(**args)
            if result_key == "":
                result = response
                result.pop('ResponseMetadata', None)
            else:
                result = response.get(result_key)
            results.update(result)
            if 'Items' in result.keys():
              items = items + result.get('Items')
              results['Items'] = items
            args['Marker'] = response.get('NextMarker')
            for key in response.keys():
                if key.endswith('List'):
                    args['Marker'] = response[key].get('NextMarker')
                    break
            loop = args['Marker'] is not None
        return results

@tremble
Copy link
Contributor

tremble commented Oct 18, 2021

Hi @TheOptimisticFactory,

There's actually a simpler option: boto3 has paginator helpers (only for some calls)
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/cloudfront.html#CloudFront.Paginator.ListDistributions

You can see an example of how to use them in #472 :
https://github.com/ansible-collections/community.aws/pull/472/files#diff-5b6740d7309875806709fffb540ad4344cf7621a6bcb580e28cdee673a453a95R162

Would you be interested in submitting a Pull Request?

Mark

@TheOptimisticFactory
Copy link
Contributor Author

TheOptimisticFactory commented Oct 19, 2021

While my problem is purely on the distribution listing, it may also occur on any of the calls to self.paginated_response() which covers an array of different AWS functions

  1. self.client.get_distribution
  2. self.client.get_distribution_config
  3. self.client.get_cloud_front_origin_access_identity
  4. self.client.get_cloud_front_origin_access_identity_config
  5. self.client.get_invalidation
  6. self.client.get_streaming_distribution
  7. self.client.get_streaming_distribution_config
  8. self.client.list_cloud_front_origin_access_identities
  9. self.client.list_distributions
  10. self.client.list_distributions_by_web_acl_id
  11. self.client.list_invalidations
  12. self.client.list_streaming_distributions
  13. self.client.list_distributions_by_web_acl_id

The method get_distribution_id_from_domain_name itself uses both distributions and streaming distributions (L462-L463)

def get_distribution_id_from_domain_name(self, domain_name):
try:
distribution_id = ""
distributions = self.list_distributions(False)
distributions += self.list_streaming_distributions(False)
for dist in distributions:
if 'Items' in dist['Aliases']:
for alias in dist['Aliases']['Items']:
if str(alias).lower() == domain_name.lower():
distribution_id = dist['Id']
break
return distribution_id
except (botocore.exceptions.ClientError, botocore.exceptions.BotoCoreError) as e:
self.module.fail_json_aws(e, msg="Error getting distribution id from domain name")


Switching to boto3 iterators implies adjusting all the above calls, and I would rather wait for an experienced maintainer to carry out that task safely :)

ansible-zuul bot pushed a commit that referenced this issue Oct 25, 2021
Fix cloudfront_info pagination bug

SUMMARY
Currently the cloudfront_info module is using a bespoke paginator function, which is causing some issues when over 100 distributions exist (see linked issue), when in fact it can be easily switched to a native boto3 paginator that would fix the bug reported in the linked issue.
Fixes #769
ISSUE TYPE

Bugfix Pull Request

COMPONENT NAME
cloudfront_info
ADDITIONAL INFORMATION
Pagination docs: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/cloudfront.html#paginators
Tests following steps from linked issue:
ran a ansible task:
- hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: "Check if distribution already exists"
      cloudfront_info:
        distribution: true
        domain_name_alias: "XX"
      register: distribution_data
Results:
TASK [Check if distribution already exists] ************************************************************************************************************************************************************************************************************************************
task path: /root/test/test.yml:6
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: root
<127.0.0.1> EXEC /bin/sh -c 'echo ~root && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp `"&& mkdir "` echo /root/.ansible/tmp/ansible-tmp-1635164453.7083595-389-139483031845166 `" && echo ansible-tmp-1635164453.7083595-389-139483031845166="` echo /root/.ansible/tmp/ansible-tmp-1635164453.7083595-389-139483031845166 `" ) && sleep 0'
Using module file /root/test/library/cloudfront_info.py
<127.0.0.1> PUT /root/.ansible/tmp/ansible-local-3861wn9tjv7/tmp6o0jv2s2 TO /root/.ansible/tmp/ansible-tmp-1635164453.7083595-389-139483031845166/AnsiballZ_cloudfront_info.py
<127.0.0.1> EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1635164453.7083595-389-139483031845166/ /root/.ansible/tmp/ansible-tmp-1635164453.7083595-389-139483031845166/AnsiballZ_cloudfront_info.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '/usr/bin/python3 /root/.ansible/tmp/ansible-tmp-1635164453.7083595-389-139483031845166/AnsiballZ_cloudfront_info.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1635164453.7083595-389-139483031845166/ > /dev/null 2>&1 && sleep 0'
ok: [localhost] => {
    "changed": false,
    "cloudfront": {
        "XXXXXX": {
            "Distribution": {
                "ARN": "arn:aws:cloudfront::XXXXXXXX:distribution/XXXXXX",
                "ActiveTrustedKeyGroups": {
                    "Enabled": false,
                    "Quantity": 0
                },
                "ActiveTrustedSigners": {
                    "Enabled": false,
                    "Quantity": 0
                },
                "AliasICPRecordals": [
                    {
                        "CNAME": "XXXX",
                        "ICPRecordalStatus": "APPROVED"
                    }
                ],
...

Reviewed-by: Mark Chappell <None>
Reviewed-by: Mark Woolley <[email protected]>
Reviewed-by: Markus Bergholz <[email protected]>
Reviewed-by: None <None>
alinabuzachis added a commit to alinabuzachis/community.aws that referenced this issue May 25, 2022
…ns#769)

Update changelog (main branch) with the amazon.aws 3.2.0 changes

SUMMARY

Update changelog (main branch) with the amazon.aws 3.2.0 changes

ISSUE TYPE


Docs Pull Request

Reviewed-by: Mark Chappell <None>
Reviewed-by: Markus Bergholz <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue/PR relates to a bug module module plugins plugin (any type)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants