Show details when a test in regression.sh fails #128

Open
wants to merge 1 commit into master
Conversation

AloisMahdal
Contributor

Don't just throw away the comparison; instead show precisely what was expected, what was produced, any stderr, and details of the difference.


When trying to enable tests in Fedora packaging, I learned that failing tests don't report any details.

This change makes sure that if a test fails, the failure is followed by (sketched below the list):

  • expected output
  • actual output
  • comparison result (from $cmp_tool, which I've altered to not be silent)
  • stderr from the python call.
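Roughly, the change looks like this; the file names match the diff below, while the show_file body and the exact headers are a simplified sketch of what the PR adds:

# show_file prints a header followed by the contents of a captured file.
function show_file()
{
    echo "==== $1 ===="
    cat "$2"
}

# In cmp_output(): capture stderr, the actual output and the comparison
# result, then dump them all if the comparison fails.
if eval $cmd 2>cmd.err | tee actual.out \
        | eval $cmp_tool $expected_out - >cmp_tool.out; then
    echo " ok"
else
    echo " failed"
    show_file "expected output" "$expected_out"
    show_file "actual output" actual.out
    show_file "comparison result ($cmp_tool)" cmp_tool.out
    show_file "stderr" cmd.err
fi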

@AloisMahdal
Contributor Author

@ymattw please see AloisMahdal#1 for a preview of how it looks.

That PR includes the same commit as this one, plus another commit whose only purpose is to break the tests, in order to demonstrate how a failing test would look in the pipeline.

What's interesting is that while injecting the failure I stumbled upon what looks like a bug: I tried to add an extra "test call" which uses -p nonexistent_pager, but the test passed anyway! Is it possible that the -p option is ignored? (I tried to read the code, but I don't know the argument-parsing code well enough to say for sure.)

Back to the main PR: there could probably be more done to make the output a bit more readable, but I see that for ydiff this will probably be challenging, since the output is basically binary, full of ANSI control chars and whatnot. If you are OK with it, I could come up with a better solution for that (as an extra commit or a separate PR).

(E.g. one strategy I used in the past for a similar scenario was to post-process the raw output so that it contains escape sequences instead of any non-ASCII bytes, effectively making it plain text again, and then use e.g. colordiff to produce a diff, which is then what describes the test failure. That gets rid of two problems: 1. the intended color codes no longer get in the way of understanding what is "correct/incorrect" in the test output, and 2. when a failure is caused precisely by missing or unexpected ANSI chars, it is possible to show that clearly. The post-processing step is not trivial, but I suppose it could be done with a few lines of Python embedded in regression.sh; see the sketch below.)
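For illustration, the post-processing could be as small as this; the normalize name and the exact escaping are just a sketch, assuming python3 is available on the test machine:

# normalize: rewrite ANSI escapes and any other non-printable bytes as
# visible \xNN sequences, so the stream becomes plain ASCII text again
# while keeping its line structure.
function normalize()
{
    python3 -c '
import sys
for line in sys.stdin.buffer:
    text = line.decode("utf-8", "replace")
    sys.stdout.write(text.encode("unicode_escape").decode("ascii") + "\n")
'
}

Both the expected and the actual output would then go through normalize before being handed to colordiff (or plain diff -u).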

@ymattw
Owner

Thank you for the PR and the other example. I left some nitpicks; let me know whether they make sense to you or not.

@@ -37,16 +46,20 @@ function cmp_output()
     printf "$cmd"
 
     if [[ $TRAVIS_OS_NAME == windows ]]; then
-        cmp_tool="diff --strip-trailing-cr -q"
+        cmp_tool="diff --strip-trailing-crq"
Owner


Please keep -q separated, otherwise it confuses readers.

Contributor Author


This is a typo, I meant to remove -q. diff with -q only says that the files differ, which is already implied in this context.

So all I really did here was break the tests for Windows. :)

     fi
 
-    if eval $cmd 2>/dev/null | eval $cmp_tool $expected_out - > /dev/null; then
+    if eval $cmd 2>cmd.err | tee actual.out | eval $cmp_tool $expected_out - >cmp_tool.out; then
Owner


I'd recommend using mktemp -d to create a temporary dir (by default in /tmp) and putting all temporary files there; cleanup can be omitted.
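Something like this (paths are illustrative):

# One throwaway directory per run; since it lives under /tmp, the script
# does not need to clean it up itself.
tmpdir=$(mktemp -d)

eval $cmd 2>"$tmpdir/cmd.err" | tee "$tmpdir/actual.out" \
    | eval $cmp_tool $expected_out - >"$tmpdir/cmp_tool.out"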

@@ -26,6 +26,15 @@ function fail()
     fi
 }
 
+function show_file()
Owner


I chose to discard the cmp results because they are not really readable - they contain ANSI escape sequences and it's hard to tell which parts differ. Maybe just print out their locations to help with further debugging, rather than cat them all?

Contributor Author


I don't think printing locations would be useful; at least in the pipeline Fedora uses, the files won't be collected.

I do agree that the result of cmp in particular is not very helpful, though. What would be a better option?

It seems tricky especially in the case of ydiff, where the ANSI codes are the subject of the test, plus the output is already a diff (sometimes a diff of a diff). But as I mentioned in my comment on #125, I do have ideas for how to crack this. I just did not want to "ambush" you with too big a change in one PR :)

How about I try to come up with a better way of presenting the result? It would mean a few more changes to this shell script, but it might be worth it. My idea is to normalize the outputs a little (basically just replace the ANSI sequences with visible escape codes, making the output true UTF-8 plain text while preserving its structure), and then simply show the diff -u of the normalized expected vs. actual output. At that point the diff itself would be descriptive enough that it would suffice to print just the diff and the stderr (perhaps with a brief message to help the reader understand the output).
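A rough sketch of that failure branch, reusing the normalize helper from my earlier comment and the mktemp -d directory suggested above (file names are illustrative):

# On failure: normalize both sides and let a unified diff describe what
# differs, followed by whatever the command wrote to stderr.
normalize < "$expected_out"      > "$tmpdir/expected.norm"
normalize < "$tmpdir/actual.out" > "$tmpdir/actual.norm"
echo "Test '$cmd' failed; diff of normalized expected vs. actual output:"
diff -u "$tmpdir/expected.norm" "$tmpdir/actual.norm"
echo "stderr from the command:"
cat "$tmpdir/cmd.err"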

Owner


Sure, if you can find a good solution.

Keep in mind that when there is a regression, it usually fails a lot of tests, so the resulting diff might be hard to read, and often you will want to debug one case at a time; that's why I prefer printing out the paths of the temporary files for further inspection. You are welcome to proceed your way if you do find a good balance.

Another thing is that such regressions are usually caught by unit tests first.
