-
Notifications
You must be signed in to change notification settings - Fork 14.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set limits for XCOM pod #28125
Set limits for XCOM pod #28125
Conversation
041ee6e
to
d4e3439
Compare
Tests are failing. |
670dc57
to
37c8156
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about fixing the tests @dolfinus ?
Is there any way to get failing pod log or something like that? I don't understand why container is starting using one executor but not other ones, and these executors are changing all the time |
Surely - you can download the logs from the job - just click on the "Summary" and you will be able to downlad the logs from failed jobs - it is an artifact that is attached to the job (this is a complete export of all logs from failed test). Also you can very easily reproduce what the tests are doing locally by following our documentation: https://github.com/apache/airflow/blob/main/TESTING.rst#typical-testing-pattern-for-kubernetes-tests is a step-by-step wizard like guide and above that you will find explanation of what happens in each step. Also breeze has |
XCOM sidecar container was killed because of OOM, even on memory limit 10Mi and 20Mi. I didn't expect that bash + sleep requires more RAM than 1Mi. Increased limit to 50Mi, fixed. |
Any updates? |
Well you removed two reviewers that likely would have been much better to review it (not sure why) so you limited your chances to get it reviewed ... |
I am not sure we want to set arbitrary limits for the pods @dstandish @jedcunningham WDYT? |
There was a "Requested changes" status, and I've clicked on it to inform that changes were made. I don't have permissions to change list of reviewers, maybe this is some kind of Github bug. |
Maybe. I've added them back. |
Now. There are some confflicts to resolve - It's been some time it was out (and there is some backlog of issues/PRs so this takes longer than usual) - can you please rebase and fix the conflicts though. I have not looked in detail and we've not heard from @jedcunningham and @dstandish, but I think the numbers here are too magical and forcing them on everyone is a bit too much to solve your case @dolfinus. You can add a feature to optionally add resource quota for those PODS with default being "not set" - that is the best way of handling it. I hardly imagine we will merge a change that we have a fixed memory used for the XCom POD (and it's backwards incompatible too). |
5f9a2f2
to
148ca36
Compare
Ok, got your point. I've added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thsi is much better. Thank you :)
We probably should not be configuring xcom settings from the hook in this way but for better or worse we've done it, first in apache#26766 and again in apache#28125. The problem is that other operators derived from KPO, e.g. GKEStartPodOperator, may use a different hook that doesn't have this xcom-specific methods. So KPO needs to tolerate that.
My k8s namespace has specified
ResourceQuota
, and thus every container should be created with bothrequests
andlimits
resources set.I've tried to run simple dag like this:
but got an exception:
Full pod spec from logs:
Container with pod command has the same resources I've set using
KubernetesPodOperator.container_resources
, but sidecar container with XCOM fetcher is created without limits, causing this exception. As a result I cannot get XCOM of any task created using this operator.Here I've set limits as
1m
for cpu and10m
for memory. IMHO,sleep 1
command does not need more resources, even10m
maybe too much.I haven't add any options to change these values (like in #26766), it seems that currently there is no way to change command executed by this container.
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rst
or{issue_number}.significant.rst
, in newsfragments.