
chore(regTests): print logs when regTests timeout #2031

Merged
11 commits merged into main on Oct 20, 2023

Conversation


@kostasrim kostasrim commented Oct 17, 2023

This PR:

  • add a python script to print the most recent log
  • if the CI job times out, print the most recent log
  • replace the global timeout with the timeout command
  • upload all logs on failure()
  • print the uid + port + log file names for each newly created df instance

The reason I replaced the global timeout with the timeout command is that cancelled() does not work when a job times out, and therefore it can't be used to print the log.

I also tried the pytest-timeout module, but that won't work either because a) it only has a global setting, which sets the per-test timeout, and b) it continues running the remaining tests.

One more thing to consider: because we now use a separate step, we no longer get nice colored output.

Another, more over-engineered but probably better solution would be to register a handler for the signal that timeout sends and have pytest handle it before it exits. That way, pytest itself could print the logs instead of the extra step (a rough sketch of the idea follows).
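
(For reference, a minimal sketch of that signal-based idea, not what this PR implements: the handler name is hypothetical, and it assumes the glog-style INFO logs end up under /tmp.)

# conftest.py -- hypothetical sketch of the signal-based alternative.
import glob
import os
import signal
import sys

def _dump_latest_log_and_exit(signum, frame):
    # `timeout` sends SIGTERM by default; print the newest INFO log before exiting.
    logs = sorted(glob.glob("/tmp/*INFO*"), key=os.path.getmtime)
    if logs:
        with open(logs[-1]) as f:
            print(f.read())
    sys.exit(1)

signal.signal(signal.SIGTERM, _dump_latest_log_and_exit)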

@kostasrim kostasrim self-assigned this Oct 17, 2023

kostasrim commented Oct 18, 2023

Also, printing only the last log might not be the best idea, since there are tests that create multiple dragonfly instances and each of those has its own log. Maybe it makes sense to print the last X logs, where X is the largest pool of DF instances we create in one go. Upon a timeout we know which test failed, so we know how many logs we need to read (obviously this will add a little bit of noise)

Another issue is the default fixture's log. It is created first, so it could be the case that we never print it. But the tests that deadlock/fail are usually the replication tests, which all create a new DF instance, so this approach should be quite accurate most of the time and we won't need to fall back to downloading the logs

I also added:

  1. Upload all logs
  2. Each time we create a new DF instance, we give it a UID and print its log file names and the port it listens on to the CI output. That way, when we download the logs we know which log belongs to which instance (a rough sketch of the idea follows).
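
(A minimal sketch of the UID idea; the class and attribute names below are illustrative, not the actual fixture code in this PR.)

import itertools

uid_gen = itertools.count()  # monotonically increasing UID, one per started instance

class DflyInstanceSketch:
    def __init__(self, port, log_dir="/tmp"):
        self.uid = next(uid_gen)
        self.port = port
        self.log_dir = log_dir

    def start(self):
        # Announce which logs belong to this instance so the uploaded
        # artifacts can be matched back to it later.
        print(f"Starting DF uid={self.uid} on port {self.port}; "
              f"logs under {self.log_dir} (dragonfly*INFO* files)")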

@kostasrim kostasrim force-pushed the print_logs_when_job_timeouts branch 3 times, most recently from 7d5772b to 9aca418 on October 18, 2023 11:55
@kostasrim kostasrim requested review from dranikpg and romange October 18, 2023 11:57
@kostasrim kostasrim changed the title from "(Do not review -- testing it) chore(regTests): print logs when regTests timeout" to "chore(regTests): print logs when regTests timeout" on Oct 18, 2023
@kostasrim kostasrim force-pushed the print_logs_when_job_timeouts branch 2 times, most recently from 579009c to c13fab2 on October 18, 2023 12:36
@kostasrim kostasrim force-pushed the print_logs_when_job_timeouts branch from c13fab2 to fa9670c on October 18, 2023 12:37
@kostasrim kostasrim commented on the workflow diff:

timeout 20m pytest -m "${{inputs.filter}}" --json-report --json-report-file=rep1_report.json dragonfly/replication_test.py --df alsologtostderr --df enable_multi_shard_sync=true || code=$?; if [[ $code -eq 124 ]]; then echo "TIMEDOUT=1">> "$GITHUB_OUTPUT"; exit 1; fi


timeout 20m pytest -m "${{inputs.filter}}" --json-report --json-report-file=rep2_report.json dragonfly/replication_test.py --df alsologtostderr --df enable_multi_shard_sync=false || code=$?; if [[ $code -eq 124 ]]; then echo "TIMEDOUT=1">> "$GITHUB_OUTPUT"; exit 1; fi

Do we need --df alsologtostderr now that we upload the logs on failures?

dranikpg previously approved these changes Oct 18, 2023

@dranikpg dranikpg left a comment

I'll approve because it works, but it can be made much simpler

Comment on lines 45 to 52
def monotonic_integer_generator():
i = 0
while i < 10000000000:
yield i
i += 1


uid_iterator = monotonic_integer_generator()

dranikpg:

uid_gen = itertools.count() 🙂


kostasrim:

removed!

Comment on lines 1 to 4
#!/usr/bin/env python3

"""Extract the most recent INFO log from a directory."""


dranikpg:

Eh... What about ls *.log -tr | head -n 1 <- most recently modified file? 😆
Or ls -a | grep log | sort | head -n 1 for sorting by name like your script


kostasrim:

I like the first one! Also, we don't need -r here, since -t already sorts newest first by default

Comment on lines 64 to 67
if [[ "${{ env.TIMEDOUT_STEP_1 }}" -eq 1 ]] || [[ "${{ env.TIMEDOUT_STEP_2 }}" -eq 1 ]]; then
echo "🪵🪵🪵🪵🪵🪵 Latest log before timeout 🪵🪵🪵🪵🪵🪵\n\n"
${GITHUB_WORKSPACE}/tools/extract_latest_log.py --path /tmp/ | xargs cat
echo "🪵🪵🪵🪵🪵🪵 Latest log before timeout end 🪵🪵🪵🪵🪵🪵\n\n"

dranikpg:

See my comment about using the shell instead; also, you don't need the xargs | cat, doesn't it already print it?

I see you're a big fan of over-engineering 🙂


kostasrim:

I could have just printed the file from the python script, but honestly I should have used bash itself -- really, the python script was overkill -- so I removed it 😄

Yes, I needed the xargs cat because otherwise only the filename would get printed and not its contents.

+1, it was over-engineered for no reason


dranikpg:

Ah yes, I forgot the script only prints the name 😅
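
(For reference, a shell-only version of that step could look roughly like this; the /tmp location and the *INFO* file-name pattern are assumptions based on the snippets in this thread, not the exact code that was merged.)

if [[ "${{ env.TIMEDOUT_STEP_1 }}" -eq 1 ]] || [[ "${{ env.TIMEDOUT_STEP_2 }}" -eq 1 ]]; then
  echo "Latest log before timeout"
  # Pick the most recently modified INFO log and print its contents directly,
  # no helper script or xargs needed.
  latest=$(ls -t /tmp/*INFO* 2>/dev/null | head -n 1)
  [[ -n "$latest" ]] && cat "$latest"
fi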

timeout 20m pytest -m "${{inputs.filter}}" --json-report --json-report-file=rep1_report.json dragonfly/replication_test.py --log-cli-level=INFO --df alsologtostderr --df enable_multi_shard_sync=true || code=$?; if [[ $code -eq 124 ]]; then echo "TIMEDOUT=1">> "$GITHUB_OUTPUT"; exit 1; fi


timeout 20m pytest -m "${{inputs.filter}}" --json-report --json-report-file=rep2_report.json dragonfly/replication_test.py --log-cli-level=INFO --df alsologtostderr --df enable_multi_shard_sync=false || code=$?; if [[ $code -eq 124 ]]; then echo "TIMEDOUT=1">> "$GITHUB_OUTPUT"; exit 1; fi

dranikpg:

Also, if you're touching it anyway, maybe we can unify these somehow... they don't differ except for the enable_multi_shard_sync option


kostasrim:

done :)
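
(One way the two invocations could be unified, sketched under the assumption that the surrounding step stays the same; the report file naming and the loop variable are illustrative, not necessarily what was merged.)

# Run the replication suite once per enable_multi_shard_sync value.
for mode in true false; do
  code=0
  timeout 20m pytest -m "${{inputs.filter}}" --json-report \
    --json-report-file=rep_${mode}_report.json dragonfly/replication_test.py \
    --log-cli-level=INFO --df alsologtostderr --df enable_multi_shard_sync=${mode} || code=$?
  if [[ $code -eq 124 ]]; then
    echo "TIMEDOUT=1" >> "$GITHUB_OUTPUT"
    exit 1
  fi
done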

@kostasrim kostasrim requested a review from dranikpg October 19, 2023 10:53

@dranikpg dranikpg left a comment


Now it's much simpler 😎

@kostasrim kostasrim merged commit 64841ef into main Oct 20, 2023
@kostasrim kostasrim deleted the print_logs_when_job_timeouts branch October 20, 2023 07:50