chore(regTests): print logs when regTests timeout #2031
Conversation
Also, printing only the last log might not be the best idea, since there are pytests that create multiple Dragonfly instances, each with its own log. Maybe it makes sense to print the last X logs, where X is the largest pool of DF instances we create in one go. Upon a timeout we know which test failed, so we know how many logs we need to read (obviously this will add a little bit of noise).
Another issue is the default fixture's log. This one is created first, so it could be the case that we never print it. But the tests that deadlock/fail are usually the replication tests, which all create a new DF instance. Therefore, this approach should be quite accurate most of the time, and we won't need to fall back to downloading the logs.
I also added:
timeout 20m pytest -m "${{inputs.filter}}" --json-report --json-report-file=rep1_report.json dragonfly/replication_test.py --df alsologtostderr --df enable_multi_shard_sync=true || code=$?; if [[ $code -eq 124 ]]; then echo "TIMEDOUT=1">> "$GITHUB_OUTPUT"; exit 1; fi

timeout 20m pytest -m "${{inputs.filter}}" --json-report --json-report-file=rep2_report.json dragonfly/replication_test.py --df alsologtostderr --df enable_multi_shard_sync=false || code=$?; if [[ $code -eq 124 ]]; then echo "TIMEDOUT=1">> "$GITHUB_OUTPUT"; exit 1; fi
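As a side note, the exit-code check in these steps relies on GNU coreutils' `timeout` returning status 124 when it kills the wrapped command. A minimal sketch of that convention, with `sleep` standing in for the pytest invocation:

```python
import subprocess

# GNU coreutils `timeout` exits with status 124 when the wrapped
# command is killed for exceeding its time limit. `sleep` stands in
# for the long-running pytest invocation used in the workflow.
result = subprocess.run(["timeout", "0.2", "sleep", "5"])
print(result.returncode)  # → 124
```

This is why the workflow compares `$code` against 124 specifically: any other non-zero status means the tests failed on their own, not that they were killed by the timeout.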
Do we need --df alsologtostderr now that we upload the logs on failures?
I'll approve because it works but it can be made much simpler
tests/dragonfly/instance.py
Outdated
def monotonic_integer_generator():
    i = 0
    while i < 10000000000:
        yield i
        i += 1

uid_iterator = monotonic_integer_generator()
uid_gen = itertools.count()
🙂
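For context on the suggestion: `itertools.count()` is an endless integer iterator from the standard library, so it drops the hand-rolled loop (and its arbitrary upper bound) entirely. A quick sketch:

```python
import itertools

# itertools.count() yields 0, 1, 2, ... with no explicit upper bound,
# replacing the hand-rolled generator above in one line.
uid_gen = itertools.count()
print(next(uid_gen), next(uid_gen), next(uid_gen))  # → 0 1 2
```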
removed!
tools/extract_latest_log.py
Outdated
#!/usr/bin/env python3

"""Extract the most recent INFO log from a directory."""
Eh... What about ls *.log -tr | head -n 1 <- most recently modified file? 😆
Or ls -a | grep log | sort | head -n 1 for sorting by name like your script
I like the first one! Also, we don't need r here, since the order is descending by default.
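For comparison, the same "most recently modified logs" pick can be written in a few lines of Python. This is only a sketch of the idea, not the script under review; the directory and pattern are illustrative:

```python
import glob
import os

# Return the n most recently modified *.log files in a directory,
# newest first -- roughly equivalent to `ls -t *.log | head -n N`.
def latest_logs(directory, n=1):
    logs = glob.glob(os.path.join(directory, "*.log"))
    return sorted(logs, key=os.path.getmtime, reverse=True)[:n]
```

Sorting by `os.path.getmtime` matches `ls -t` (modification time, newest first), while the second shell one-liner above sorts by name instead.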
if [[ "${{ env.TIMEDOUT_STEP_1 }}" -eq 1 ]] || [[ "${{ env.TIMEDOUT_STEP_2 }}" -eq 1 ]]; then
  echo "🪵🪵🪵🪵🪵🪵 Latest log before timeout 🪵🪵🪵🪵🪵🪵\n\n"
  ${GITHUB_WORKSPACE}/tools/extract_latest_log.py --path /tmp/ | xargs cat
  echo "🪵🪵🪵🪵🪵🪵 Latest log before timeout end 🪵🪵🪵🪵🪵🪵\n\n"
See my comment about using the shell instead. Also, you don't need the xargs cat; doesn't it print the log anyway?
I see you're a big fan of over-engineering 🙂
I could have just printed the file in the Python script, but honestly I should have used bash itself -- really, the Python script was overkill -- so I removed it 😄
Yes, I needed the xargs cat, because otherwise only the filename would get printed, not its contents.
+1, it was over-engineered for no reason
Ah yes, I forgot the script only prints the name 😅
timeout 20m pytest -m "${{inputs.filter}}" --json-report --json-report-file=rep1_report.json dragonfly/replication_test.py --log-cli-level=INFO --df alsologtostderr --df enable_multi_shard_sync=true || code=$?; if [[ $code -eq 124 ]]; then echo "TIMEDOUT=1">> "$GITHUB_OUTPUT"; exit 1; fi

timeout 20m pytest -m "${{inputs.filter}}" --json-report --json-report-file=rep2_report.json dragonfly/replication_test.py --log-cli-level=INFO --df alsologtostderr --df enable_multi_shard_sync=false || code=$?; if [[ $code -eq 124 ]]; then echo "TIMEDOUT=1">> "$GITHUB_OUTPUT"; exit 1; fi
Also, since you touched it, maybe we can unify it somehow... the two steps don't differ except for the enable_multi_shard_sync option.
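One way to factor out the duplication (a sketch only: the command string is abbreviated to its varying parts, and `make_step` is a made-up helper name, not something from this PR):

```python
# Sketch: generate both nearly identical workflow commands from the
# single varying flag. The real commands carry more options; only the
# parts that differ between the two steps are shown here.
def make_step(index, multi_shard):
    flag = "true" if multi_shard else "false"
    return (
        f"timeout 20m pytest --json-report-file=rep{index}_report.json "
        f"dragonfly/replication_test.py --df enable_multi_shard_sync={flag}"
    )

for i, ms in enumerate((True, False), start=1):
    print(make_step(i, ms))
```

In a GitHub Actions workflow the same effect is usually achieved with a matrix over the flag values rather than a helper function.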
done :)
Now it's much simpler 😎
This PR:
- replaces the global job timeout with the timeout command
- adds a uid + port + the log files for each newly created DF instance

The reason I replaced the global timeout with the timeout command is that cancelled() does not work when a job times out, so it can't be used to print the log. I also tried the pytest timeout module, but this won't work either, because a) there is only a global timeout, which sets the per-test timeout, and b) it continues running the tests.

One more thing to consider is that because we use a step, we no longer get a nice colored output. Another, over-engineered but probably better solution would be to register a signal which timeout would use and pytest would handle before it exits. That way, pytest could print the logs itself, instead of needing the new extra step.