Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rewrite method used in ecs to fetch less logs #31786

Merged
merged 11 commits into from
Jun 26, 2023

Conversation

vandonr-amz
Copy link
Contributor

@vandonr-amz vandonr-amz commented Jun 8, 2023

The current behavior is to fetch all logs and only keep the last message(s) This is wasteful, as there is an option to fetch from the end directly, allowing to send only the minimum number of requests. Since a generator is used in get_log_events, stopping the iteration after we've collected enough logs prevents it from doing more requests.

With this change, we can expect less API calls & faster execution time for this method, especially in tasks that emit a log of logs.

cc @ferruzzi

The current behavior is to fetch all logs and only keep the last message(s)
This is wasteful, as there is an option to fetch from the end directly, allowing to send only the minimum number of requests.
Since a generator is used in get_log_events, stopping the iteration after we've collected enough logs prevents it from doing more requests.

With this change, we can expect less API calls & faster execution time for this method, especially in tasks that emit a log of logs.
@boring-cyborg boring-cyborg bot added area:providers provider:amazon AWS/Amazon - related issues labels Jun 8, 2023
return [log["message"] for log in deque(self._get_log_events(), maxlen=number_messages)]
"""
Gets the last logs messages in one single request, so restrictions apply:
- if logs are too old, the response will be empty
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this may look bad, but from some tries I did, it'd require many calls to get logs from the tail when they are old (logs that are 1 week old already require tens of calls before one will return something non-empty), and the existing code was giving up after 3 empty responses, so we're only making the problem appear a little faster.
And from the ecs hook, we can assume it's going to used to query fresh logs from tasks that just finished.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I grok this. Why should reading from the bottom of a log stream be any different depending on how old the stream is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well that's a question for cloudwatch teams, but somehow the older the last event in the log stream, the more empty requests you have to pull before you start getting results.
I guess maybe they try to answer within a certain time SLA ? And that doesn't give them enough time to get it from semi-hot storage ? Or like they start searching in chronological order, and the older it is, the more records they need to go through before getting hits ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can try it yourself, go to cloudwatch in the console, find a stream that's like 2-3 weeks old, and open it with the network pane of your browser open, and look at the number of requests it makes

@o-nikolas o-nikolas merged commit e4eb198 into apache:main Jun 26, 2023
@vandonr-amz vandonr-amz deleted the vandonr/nice branch June 27, 2023 00:19
ferruzzi pushed a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Jun 27, 2023
The current behavior is to fetch all logs and only keep the last message(s)
This is wasteful, as there is an option to fetch from the end directly, allowing to send only the minimum number of requests.
Since a generator is used in get_log_events, stopping the iteration after we've collected enough logs prevents it from doing more requests.

With this change, we can expect less API calls & faster execution time for this method, especially in tasks that emit a log of logs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:amazon AWS/Amazon - related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants