rewrite method used in ecs to fetch less logs #31786

vandonr-amz · 2023-06-08T00:23:10Z

The current behavior is to fetch all logs and keep only the last message(s). This is wasteful: there is an option to fetch from the end directly, which lets us send only the minimum number of requests. Since get_log_events uses a generator, stopping the iteration once we've collected enough logs prevents it from making further requests.

With this change, we can expect fewer API calls and faster execution for this method, especially in tasks that emit a lot of logs.
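To illustrate the generator point, here is a minimal sketch. It does not use the real hook: `FakeLogsClient`, `get_page_from_end`, `iter_events_newest_first` and `last_messages` are hypothetical names, and the in-memory `EVENTS` list stands in for a log stream. It only shows that cutting the iteration short with `itertools.islice` keeps a generator from requesting pages it does not need.

```python
from itertools import islice
from typing import Iterator, List, Optional, Tuple

# Hypothetical in-memory log stream standing in for a remote service.
EVENTS = [f"message {i}" for i in range(10)]
PAGE_SIZE = 3


class FakeLogsClient:
    """Hypothetical client that counts how many page requests it serves."""

    def __init__(self) -> None:
        self.calls = 0

    def get_page_from_end(self, end: Optional[int]) -> Tuple[List[str], Optional[int]]:
        # Return up to PAGE_SIZE events that end just before index `end`,
        # plus a token for the next (older) page, or None when exhausted.
        self.calls += 1
        end = len(EVENTS) if end is None else end
        start = max(0, end - PAGE_SIZE)
        next_token = start if start > 0 else None
        return EVENTS[start:end], next_token


def iter_events_newest_first(client: FakeLogsClient) -> Iterator[str]:
    # Lazily walk the stream backwards; a new page is requested only when
    # the consumer asks for more events than were already yielded.
    token: Optional[int] = None
    while True:
        page, token = client.get_page_from_end(token)
        yield from reversed(page)
        if token is None:
            return


def last_messages(client: FakeLogsClient, n: int) -> List[str]:
    # islice stops pulling from the generator after n items, so no
    # further request is made once enough events are collected.
    newest_first = list(islice(iter_events_newest_first(client), n))
    return newest_first[::-1]  # back to chronological order
```

With a page size of 3, asking for the last 2 messages costs one request and asking for the last 5 costs two, whereas draining the whole stream would cost four.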

cc @ferruzzi

airflow/providers/amazon/aws/hooks/logs.py

airflow/providers/amazon/aws/hooks/ecs.py

tests/providers/amazon/aws/hooks/test_ecs.py

vandonr-amz · 2023-06-14T22:07:09Z

airflow/providers/amazon/aws/hooks/ecs.py

-        return [log["message"] for log in deque(self._get_log_events(), maxlen=number_messages)]
+        """
+        Gets the last logs messages in one single request, so restrictions apply:
+         - if logs are too old, the response will be empty


This may look bad, but from the few tries I did, fetching logs from the tail takes many calls when they are old (logs that are 1 week old already require tens of calls before one returns something non-empty), and the existing code was giving up after 3 empty responses, so we're only making the problem appear a little sooner.
And from the ecs hook, we can assume it's going to be used to query fresh logs from tasks that just finished.
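As a rough sketch of the single-request approach described in the docstring above (not the merged code), the function below issues one `get_log_events` call that reads from the tail. The parameters `logGroupName`, `logStreamName`, `startFromHead` and `limit` are those of the boto3 CloudWatch Logs client; the function name `get_last_log_messages` and the `StubLogsClient` class are hypothetical and only there so the example runs without AWS access.

```python
from typing import Any, Dict, List


def get_last_log_messages(
    logs_client: Any, log_group: str, log_stream: str, number_messages: int
) -> List[str]:
    # One request only: read from the end of the stream and cap the
    # response at the number of messages wanted. If the service answers
    # with an empty page (e.g. old logs), the result is simply empty.
    response = logs_client.get_log_events(
        logGroupName=log_group,
        logStreamName=log_stream,
        startFromHead=False,
        limit=number_messages,
    )
    return [event["message"] for event in response["events"]]


class StubLogsClient:
    """Hypothetical stand-in for a CloudWatch Logs client, for illustration."""

    def __init__(self, messages: List[str]) -> None:
        self._messages = messages
        self.calls: List[Dict[str, Any]] = []

    def get_log_events(self, **kwargs: Any) -> Dict[str, Any]:
        # Record the request and return the last `limit` messages.
        self.calls.append(kwargs)
        tail = self._messages[-kwargs["limit"]:]
        return {"events": [{"message": m} for m in tail]}
```

The trade-off is the one stated in the comment: a single call is cheap and predictable, at the cost of returning nothing when the service would have needed several calls to reach the events.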

o-nikolas · 2023-06-15

I'm not sure I grok this. Why should reading from the bottom of a log stream be any different depending on how old the stream is?

vandonr-amz · 2023-06-15

Well, that's a question for the CloudWatch teams, but somehow the older the last event in the log stream, the more empty responses you have to pull before you start getting results.
I guess maybe they try to answer within a certain time SLA, and that doesn't give them enough time to get it from semi-hot storage? Or they start searching in chronological order, and the older it is, the more records they need to go through before getting hits?

vandonr-amz · 2023-06-15

You can try it yourself: go to CloudWatch in the console, find a stream that's 2-3 weeks old, open it with your browser's network pane open, and look at the number of requests it makes.
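The behavior discussed in this thread (several empty responses before the first events, and giving up after 3 of them) can be sketched as a bounded backward pagination loop. This is an illustration under assumptions, not the hook's actual code: `tail_with_empty_budget` and `SlowTailStub` are hypothetical names, and the stub merely simulates a service that answers a few empty pages first. The `nextToken` request parameter and `nextBackwardToken` response field are those of the boto3 `get_log_events` call.

```python
from typing import Any, Dict, List, Optional, Tuple


def tail_with_empty_budget(
    logs_client: Any, log_group: str, log_stream: str, wanted: int, max_empty: int = 3
) -> Tuple[List[str], int]:
    """Read backwards from the tail; stop after `max_empty` consecutive empty pages."""
    messages: List[str] = []
    token: Optional[str] = None
    empty = 0
    requests = 0
    while len(messages) < wanted and empty < max_empty:
        kwargs: Dict[str, Any] = {
            "logGroupName": log_group,
            "logStreamName": log_stream,
            "startFromHead": False,
            "limit": wanted - len(messages),
        }
        if token is not None:
            kwargs["nextToken"] = token
        response = logs_client.get_log_events(**kwargs)
        requests += 1
        events = response["events"]
        if events:
            empty = 0
            # Assumption: each page is in chronological order, so an older
            # page goes in front of what was already collected.
            messages = [e["message"] for e in events] + messages
        else:
            empty += 1
        new_token = response.get("nextBackwardToken")
        if new_token is None or new_token == token:
            break  # the same token coming back means the stream is exhausted
        token = new_token
    return messages, requests


class SlowTailStub:
    """Hypothetical stub that returns `empty_pages` empty responses before any events."""

    def __init__(self, empty_pages: int, messages: List[str]) -> None:
        self.empty_pages = empty_pages
        self.messages = messages
        self.calls = 0

    def get_log_events(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls += 1
        token = f"b/{self.calls}"
        if self.calls <= self.empty_pages:
            return {"events": [], "nextBackwardToken": token}
        tail = self.messages[-kwargs["limit"]:]
        return {"events": [{"message": m} for m in tail], "nextBackwardToken": token}
```

With two empty pages the loop still succeeds on the third request; with five it gives up empty-handed after three, which is the failure mode the thread says already existed for old streams.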


vandonr-amz requested review from eladkal and o-nikolas as code owners June 8, 2023 00:23

boring-cyborg bot added area:providers provider:amazon AWS/Amazon - related issues labels Jun 8, 2023

vandonr-amz commented Jun 8, 2023

airflow/providers/amazon/aws/hooks/logs.py (outdated)

uranusjr reviewed Jun 8, 2023

airflow/providers/amazon/aws/hooks/ecs.py (outdated)

airflow/providers/amazon/aws/hooks/ecs.py (outdated)

airflow/providers/amazon/aws/hooks/logs.py (outdated)

vincbeck reviewed Jun 8, 2023

airflow/providers/amazon/aws/hooks/ecs.py (outdated)

airflow/providers/amazon/aws/hooks/logs.py (outdated)

tests/providers/amazon/aws/hooks/test_ecs.py (outdated)

vandonr-amz added 3 commits June 8, 2023 09:22

less memory allocation

1e678bd

Merge branch 'main' into vandonr/nice

3caa148

other approach: use one request & deprecate param

33c66db

vandonr-amz commented Jun 14, 2023

vandonr-amz added 5 commits June 15, 2023 13:44

Merge remote-tracking branch 'origin/main' into vandonr/nice

c0096dd

Merge remote-tracking branch 'origin/main' into vandonr/nice

df08b3d

Merge remote-tracking branch 'origin/main' into vandonr/nice

39a8ab6

Merge remote-tracking branch 'origin/main' into vandonr/nice

290715f

t push

Merge remote-tracking branch 'origin/main' into vandonr/nice

7245a38

o-nikolas approved these changes Jun 26, 2023

vincbeck approved these changes Jun 26, 2023

airflow/providers/amazon/aws/hooks/logs.py (outdated)

vandonr-amz added 2 commits June 26, 2023 16:02

+deprecated in comment

32ba6e0

Merge remote-tracking branch 'origin/main' into vandonr/nice

0e51bb7

o-nikolas merged commit e4eb198 into apache:main Jun 26, 2023

vandonr-amz deleted the vandonr/nice branch June 27, 2023 00:19

This was referenced Jul 6, 2023

Status of testing Providers that were prepared on July 06, 2023 #32389

Closed

Status of testing Providers that were prepared on July 09, 2023 #32460

Closed
