Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ansible-galaxy doesn't handle rate limiting correctly #74191

Closed
newswangerd opened this issue Apr 8, 2021 · 6 comments · Fixed by #74240
Closed

ansible-galaxy doesn't handle rate limiting correctly #74191

newswangerd opened this issue Apr 8, 2021 · 6 comments · Fixed by #74240
Assignees
Labels
affects_2.10 This issue/PR affects Ansible v2.10 bug This issue/PR relates to a bug. python3 support:core This issue/PR relates to code supported by the Ansible Engineering Team.

Comments

@newswangerd
Copy link
Member

Summary

ansible-galaxy collection install makes a lot of requests to galaxy.ansible.com and can occasionally hit the rate limit. When that happens, galaxy.ansible.com will return a 520 http code and ansible-galaxy fails to download the collection and exits. Ideally when the client encounters a rate limiting http code (either 520, or 429), it should wait, slow down the request rate and try again rather than exiting.

More information is available here: ansible/galaxy#2429

Issue Type

Bug Report

Component Name

ansible-galaxy

Ansible Version

$ ansible --version
root@ubuntu-s-1vcpu-1gb-nyc3-01:~# ansible --version
ansible 2.10.7
  config file = None
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.8/dist-packages/ansible
  executable location = /usr/local/bin/ansible
  python version = 3.8.5 (default, Jan 27 2021, 15:41:15) [GCC 9.3.0]

Configuration

$ ansible-config dump --only-changed
root@ubuntu-s-1vcpu-1gb-nyc3-01:~# ansible-config dump --only-changed
root@ubuntu-s-1vcpu-1gb-nyc3-01:~#

OS / Environment

All

Steps to Reproduce

Run ansible-galaxy collection install amazon.aws`. If your internet is fast enough, this will occasionally fail when galaxy.ansible.com returns a 520 error code.

Expected Results

Collection should be installed.

Actual Results

`ansible-galaxy` encounters a 429 or 520 http code and exits.

Code of Conduct

I agree to follow the Ansible Code of Conduct

@ansibot
Copy link
Contributor

ansibot commented Apr 8, 2021

Files identified in the description:

  • lib/ansible/galaxy

If these files are incorrect, please update the component name section of the description or use the !component bot command.

click here for bot help

@ansibot ansibot added affects_2.10 This issue/PR affects Ansible v2.10 bug This issue/PR relates to a bug. needs_triage Needs a first human triage before being processed. python3 support:core This issue/PR relates to code supported by the Ansible Engineering Team. labels Apr 8, 2021
@awcrosby
Copy link
Contributor

awcrosby commented Apr 8, 2021

Note: as brought up earlier, a mitigation could be to use the page_size parameter on the community galaxy v2/collections/<ns>/<name>/versions/ endpoint to reduce requests

@sivel
Copy link
Member

sivel commented Apr 8, 2021

What recommendation would the galaxy team have for a balanced page_size?

@awcrosby
Copy link
Contributor

awcrosby commented Apr 8, 2021

I feel like page_size=100 could be a good balance, reducing requests, and there are only 2 keys for each result on the v2 versions/ endpoint, not sure if @newswangerd or @cutwater wants to weigh in

@sivel
Copy link
Member

sivel commented Apr 8, 2021

I believe for the page_size=100 adjustment, this is all that is required:

Click for diff...

diff --git a/lib/ansible/galaxy/api.py b/lib/ansible/galaxy/api.py
index 352082ded7..93b6df4114 100644
--- a/lib/ansible/galaxy/api.py
+++ b/lib/ansible/galaxy/api.py
@@ -21,7 +21,7 @@ from ansible.errors import AnsibleError
 from ansible.galaxy.user_agent import user_agent
 from ansible.module_utils.six import string_types
 from ansible.module_utils.six.moves.urllib.error import HTTPError
-from ansible.module_utils.six.moves.urllib.parse import quote as urlquote, urlencode, urlparse
+from ansible.module_utils.six.moves.urllib.parse import quote as urlquote, urlencode, urlparse, parse_qs
 from ansible.module_utils._text import to_bytes, to_native, to_text
 from ansible.module_utils.urls import open_url, prepare_multipart
 from ansible.utils.display import Display
@@ -312,6 +312,7 @@ class GalaxyAPI:
     def _call_galaxy(self, url, args=None, headers=None, method=None, auth_required=False, error_context_msg=None,
                      cache=False):
         url_info = urlparse(url)
+        query = parse_qs(url_info.query)
         cache_id = get_cache_id(url)
         if cache and self._cache:
             server_cache = self._cache.setdefault(cache_id, {})
@@ -342,7 +343,7 @@ class GalaxyAPI:
 
                 return res
 
-            elif not url_info.query:
+            elif 'page' not in query:
                 # The cache entry had expired or does not exist, start a new blank entry to be filled later.
                 expires = datetime.datetime.utcnow()
                 expires += datetime.timedelta(days=1)
@@ -781,7 +782,7 @@ class GalaxyAPI:
             api_path = self.available_api_versions['v2']
             pagination_path = ['next']
 
-        versions_url = _urljoin(self.api_server, api_path, 'collections', namespace, name, 'versions', '/')
+        versions_url = _urljoin(self.api_server, api_path, 'collections', namespace, name, 'versions', '/?page_size=100')
         versions_url_info = urlparse(versions_url)
 
         # We should only rely on the cache if the collection has not changed. This may slow things down but it ensures

With ansible.aws this makes 2 requests instead of 15 to /api/v2/collections/amazon/aws/versions/

Still doesn't directly address retries, but it does speedup install of ansible.aws for me from about 20-30s to 5-10s (with an empty cache).

@bcoca
Copy link
Member

bcoca commented Apr 8, 2021

we have throttling code already in module_utils, just need to start usinig that

@samdoran samdoran removed the needs_triage Needs a first human triage before being processed. label Apr 8, 2021
@s-hertel s-hertel self-assigned this Apr 8, 2021
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue May 25, 2021
v2.10.10

Bugfixes

- Correctly set template_path and template_fullpath for usage in template lookup and action plugins.
- Fix fileglob bug where it could return different results for different order of parameters (ansible/ansible#72873).
- Improve resilience of ``ansible-galaxy collection`` by increasing the page size to make fewer requests overall and retrying queries with a jittered exponential backoff when rate limiting HTTP codes (520 and 429) occur. (ansible/ansible#74191)
- ansible-test - Use documented API to retrieve build information from Azure Pipelines.
- ansible.builtin.cron - Keep non-empty crontabs, when removing cron jobs (ansible/ansible#74497).
- ansible_test - add constraint for ``MarkupSafe`` (ansible/ansible#74666)
- filter plugins - patch new versions of Jinja2 to prevent warnings/errors on renamed filter decorators (ansible/ansible#74667)
- service - compare version without LooseVersion API (ansible/ansible#74488).
@ansible ansible locked and limited conversation to collaborators Jun 7, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
affects_2.10 This issue/PR affects Ansible v2.10 bug This issue/PR relates to a bug. python3 support:core This issue/PR relates to code supported by the Ansible Engineering Team.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants