Add resource requests to default podspec #11559

Merged
merged 1 commit into from
Jan 20, 2022

Conversation

kdelee
Member

@kdelee kdelee commented Jan 18, 2022

Extend the timeout, assuming that we want to let the Kubernetes scheduler
start containers when it wants to start them. This allows us to make
resource requests knowing that when some jobs queue up waiting for
resources, they will not get reaped after such a short timeout.
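
For illustration, a pod spec carrying resource requests (and no limits) might look roughly like the sketch below. The container name, image, and the CPU/memory values are placeholders chosen for the example, not the values from this PR's diff.

```python
# Minimal sketch of a pod spec with resource *requests* only.
# All concrete values here are illustrative assumptions.
pod_spec = {
    "apiVersion": "v1",
    "kind": "Pod",
    "spec": {
        "containers": [
            {
                "name": "worker",  # assumed container name
                "image": "example.invalid/ee:latest",  # placeholder image
                "resources": {
                    # Requests tell the scheduler how much to reserve;
                    # a pod that does not fit stays Pending until it does.
                    "requests": {"cpu": "250m", "memory": "100Mi"},
                },
            }
        ]
    },
}

resources = pod_spec["spec"]["containers"][0]["resources"]
```

Because only requests are set, the scheduler can hold the pod in Pending until capacity frees up, which is why a longer timeout matters here.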

This is an alternative to #11551.
How is this different:

  • Doesn't replicate the ResourceQuota workaround, which is already a bad pattern
  • We don't get pretty messages in the AWX UI; the job shows "running" the whole time

How does this achieve similar goals:

  • DO get expected behavior that jobs queue up in kubernetes and complete eventually

Problem with this approach:
It reveals that image pull errors are not well reflected in receptor (ansible/receptor#521), which I'm addressing in ansible/receptor#522.

@kdelee
Member Author

kdelee commented Jan 19, 2022

It would be nice to have this, because if we use the setting from #11395 for items in the actual container entry, we override everything about the worker container. That said, as pointed out in https://github.com/ansible/awx/pull/10569/files, the image and args always get ignored.

Also, DEFAULT_EXECUTION_QUEUE_POD_SPEC_OVERRIDE only sets the resource requests for the default container group pod spec, and I think giving the baseline request is a good pattern to establish in the default podspec.

Alternatives I can think of are yet another setting, "DEFAULT_CONTAINER_GROUP_CONTAINER_RESOURCES", or working on the deepmerge function so that it merges the list items in containers based on whether they share names.
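
A name-aware merge along those lines could be sketched as follows. This is not AWX's existing deepmerge; `deep_merge` and `merge_containers_by_name` are hypothetical helpers, and the specs at the bottom are made-up inputs.

```python
from copy import deepcopy


def merge_containers_by_name(base, override):
    """Merge two lists of container dicts, pairing entries that share a 'name'."""
    merged = [deepcopy(c) for c in base]
    index = {c.get("name"): i for i, c in enumerate(merged)}
    for container in override:
        name = container.get("name")
        if name in index:
            # Same name: merge fields instead of replacing the whole entry.
            merged[index[name]] = deep_merge(merged[index[name]], container)
        else:
            merged.append(deepcopy(container))
            index[name] = len(merged) - 1
    return merged


def deep_merge(base, override):
    """Recursively merge dicts; 'containers' lists are merged by container name."""
    result = deepcopy(base)
    for key, value in override.items():
        current = result.get(key)
        if isinstance(value, dict) and isinstance(current, dict):
            result[key] = deep_merge(current, value)
        elif key == "containers" and isinstance(value, list) and isinstance(current, list):
            result[key] = merge_containers_by_name(current, value)
        else:
            result[key] = deepcopy(value)
    return result


# Hypothetical inputs: the override only adds resource requests.
default_spec = {"spec": {"containers": [
    {"name": "worker", "image": "example.invalid/ee:latest", "args": ["run"]},
]}}
override = {"spec": {"containers": [
    {"name": "worker", "resources": {"requests": {"cpu": "250m"}}},
]}}
merged = deep_merge(default_spec, override)
worker = merged["spec"]["containers"][0]
```

With this shape, an override that names the `worker` container adds its `resources` while leaving `image` and `args` from the default in place, rather than replacing the whole list entry.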

@shanemcd
Member

I think I am -1 on this idea; it's likely to cause more problems than it solves. I would also encourage using a LimitRange rather than putting this in the pod spec itself.
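
For reference, a LimitRange that applies default requests to containers in a namespace could look something like this sketch. The object name and the CPU/memory values are assumptions for illustration; the dict is serialized to JSON, which `kubectl apply -f` accepts.

```python
import json

# Sketch of a namespace-level LimitRange supplying default requests.
# The name and values below are illustrative assumptions.
limit_range = {
    "apiVersion": "v1",
    "kind": "LimitRange",
    "metadata": {"name": "example-default-requests"},  # hypothetical name
    "spec": {
        "limits": [
            {
                "type": "Container",
                # Applied to containers that do not declare their own requests.
                "defaultRequest": {"cpu": "250m", "memory": "100Mi"},
            }
        ]
    },
}

manifest_json = json.dumps(limit_range, indent=2)
```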

@shanemcd
Member

I spoke too soon. For whatever reason I read this and thought you were setting limits.

@kdelee kdelee merged commit faba648 into ansible:devel Jan 20, 2022
@kdelee kdelee deleted the pending_container_group_jobs_take2 branch January 20, 2022 14:54