Improve fetching logs from AWS #33231

Merged
merged 5 commits into from
Aug 10, 2023

Conversation

vincbeck
Contributor

@vincbeck vincbeck commented Aug 8, 2023

Fetching logs from AWS CloudWatch can take a long time, or even fail, when the log stream is old and no time boundary is specified in the CloudWatch query.

Example: if you try to view the logs of an old task in the Airflow UI, loading can be very slow or even fail.

Setting start_time and end_time (end_time is the most important) drastically reduces latency. The CloudWatch team recommended adding an end time to restrict the search space and improve performance. I tested it with a 10-day-old task and saw an improvement from 5 seconds to 1 second when fetching task logs from the UI.
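
For readers unfamiliar with the CloudWatch API, here is a minimal sketch of the idea, not the code in this PR. It assumes a boto3 CloudWatch Logs client (or any object exposing `get_log_events` with the same keyword arguments). The helper names `to_epoch_ms` and `fetch_events` and the 30-second padding after the end time are hypothetical choices made for illustration only.

```python
from datetime import datetime, timedelta, timezone


def to_epoch_ms(dt: datetime) -> int:
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    return int(dt.timestamp() * 1000)


def fetch_events(client, log_group, log_stream, start, end=None,
                 padding=timedelta(seconds=30)):
    """Yield log events from one stream, bounded by start (and end, if known).

    `client` is assumed to behave like boto3.client("logs").
    `padding` is an illustrative margin so late-arriving lines are not cut off.
    """
    kwargs = {
        "logGroupName": log_group,
        "logStreamName": log_stream,
        "startTime": to_epoch_ms(start),
        "startFromHead": True,
    }
    # The end bound is what restricts the search space; omit it only when
    # the task is still running and no end date exists yet.
    if end is not None:
        kwargs["endTime"] = to_epoch_ms(end + padding)

    token = None
    while True:
        if token is not None:
            kwargs["nextToken"] = token
        response = client.get_log_events(**kwargs)
        yield from response["events"]
        next_token = response.get("nextForwardToken")
        # CloudWatch returns the same forward token once the end is reached.
        if next_token is None or next_token == token:
            break
        token = next_token
```

With a real client this could be called as `fetch_events(boto3.client("logs"), group, stream, task_start, task_end)`, where `task_start` and `task_end` stand for the task's start and end datetimes.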

Resolves #32897



Contributor

@o-nikolas o-nikolas left a comment


Changes look good! But maybe some unit tests for the changes to get_cloudwatch_logs and maybe get_log_events?

@vincbeck
Contributor Author

vincbeck commented Aug 9, 2023

> Changes look good! But maybe some unit tests for the changes to get_cloudwatch_logs and maybe get_log_events?

Good call! I added some unit tests for get_cloudwatch_logs. I don't think it is worth adding unit tests for get_log_events.

@vincbeck vincbeck changed the title from "Improve fetching logs form AWS" to "Improve fetching logs from AWS" Aug 9, 2023
@o-nikolas o-nikolas merged commit c14cb85 into apache:main Aug 10, 2023
@vincbeck vincbeck deleted the vincbeck/logs branch August 10, 2023 18:04
ferruzzi pushed a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Aug 17, 2023
Make use of the start and end time inputs to the CloudWatch API to reduce the log search space and speed up log retrieval.
Successfully merging this pull request may close these issues.

Enhance Airflow Logs API to fetch logs from Amazon Cloudwatch with time range
4 participants