Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spurious backslashes in the "diff" output #1347

Closed
offby1 opened this issue Jan 29, 2016 · 9 comments
Closed

Spurious backslashes in the "diff" output #1347

offby1 opened this issue Jan 29, 2016 · 9 comments
Labels
topic: reporting related to terminal output and user-facing messages and errors type: bug problem that needs to be addressed

Comments

@offby1
Copy link

offby1 commented Jan 29, 2016

How to demonstrate a bug in pytest==2.8.7

  • virtualenv venv2.Darwin
  • ./venv2.Darwin/bin/pip install --upgrade pip
  • ./venv2.Darwin/bin/pip install pytest
  • ./venv2.Darwin/bin/py.test test_repro_pytest_bug.py

Note the output:

./venv2.Darwin/bin/py.test test_repro_pytest_bug.py 
=========================================================== test session starts ============================================================
platform darwin -- Python 2.7.10, pytest-2.8.7, py-1.4.31, pluggy-0.3.1
rootdir: /private/tmp, inifile: 
>       assert original == truncated
E       assert 'Xxxxxxxx Xxx... xxx xxxxxx.\n' == 'Xxxxxxxx Xxxx... Xxxxx xxx.\n'
E         Skipping 2776 identical leading characters in diff, use -v to show
E         - xxxxx xxx 'Xxxxxxxxxxxxx-XXX'. (3 xxxx xx Xxx-3.  Xx xxx Xxx-3, xxxxxx xxxx xx xxxxxxxxxxx xx XXX.  Xx xxxxxx xx xxxxxxxx xxxxxx xxx-xxxxx xx xxxx xx xxxxxxxxxxx xx XXX xxx xxxxxxxxx xxx.)
E         + xxxxx xxx \'Xxxxxxxxxxxxx-XXX\'. (3 xxxx xx Xxx-3.  Xx xxx Xxx-3, xxxxxx xxxx xx xxxxxxxxxxx xx XXX.  Xx xxxxxx xx xxxxxxxx xxxxxx xxx-xxxxx xx xxxx xx xxxxxxxxxxx xx XXX xxx xxxxxxxxx xxx.)
E         ?           +                  +
E           x.  Xx XXXXX xxxxxxxx xxxxxxx xx 3 xxx XXX xxxxx xxxx x xxxxxx xxx xxxxxxxx xxxx xxxxxxx xxxx, xxxxxxxx xxx xxxxxxx x xxxxx xx xxxx xxxx xx xxxx.
E           3.  Xxxxxx xxx xxxxxxxxxxxxxx xx xxxx xxxxxx xxxxxxx xxx xxxx(x) xxxx xxxx xxxxxxxxx xxx xxx xxx xxxx(x): "XXX - Xxx xxxxxxxxx xxxxx xxxx xxxx xxxxxxxxx xx [xxxxxx xxx xxxx].  Xxxxxx xxxxx xx xxx xxxxxxxx xx xxxx xxxxxx xxx xxxxxxxx xxxxx 3-33"
E
E           ===Xxxxxx Xxxxxxx Xxxxxxx (XXX)===
E
E         Detailed information truncated (16 more lines), use "-vv" to show

The bug is in lines four through six of the output above. Those lines suggest that the truncated string contains backslashes whereas the original string doesn’t. But in fact, neither string contains backslashes.

repro.zip

@RonnyPfannschmidt
Copy link
Member

at first glance this looks like a difference in escaping

@mattduck
Copy link
Contributor

I haven't looked properly yet, but the backslashes are inserted here on 2.7:

if isinstance(left, py.builtin.bytes):

cc @flub

@nicoddemus
Copy link
Member

Since that fix introduced by @flub, we've implemented _escape_strings, which which attempts to escape non-ascii strings, but keeps ascii strings intact.

@flub
Copy link
Member

flub commented Sep 21, 2016

At the PyConUK sprint @mattduck was looking at this and we discussed this in detail and came to a similar approach about trying to decode it and only if it fails to escape strings. I wasn't aware of _escape_strings though, but I was suggesting something like:

try:
    unicode(the_bytes)
except UnicodeError:
    # escape non-ascii chars in the string somehow

Main difference is that it tries to use the system encoding instead of just ascii in the initial unicode conversion attempt.

Otherwise I don't really understand _escape_strings. It calls .endcode() on bytes, but AFAIK it should be calling .decode()? I think the code will probably work on py2 because of the implicit conversion semantics. Either that or I'm completely getting this wrong, which is entirely possible.

Secondly I wasn't aware of the unicode-escape codec and can't seem to find it in any docs. Any pointers to it's documentation/definition? It sounds very handy.

@flub
Copy link
Member

flub commented Sep 21, 2016

It should also be pointed out that while the bug is valid as far as pytest is concerned the test case is actually just invalid. You can't and should not compare a unicode string to a bytestring. In py3 it would rightfully blow up. If this comparison is important to you then you should first convert one of the two sides to the type of the other using an explicit encoding or decoding step with the desired codec.

@mattduck
Copy link
Contributor

@flub thanks for summarising! I hadn't had the chance yet.

I agree that it's not the best test case. In python3 any bytes vs unicode comparison returns False and pytest just displays eg. E assert b'hello' == 'hello' rather than trying to show a diff.

But in 2.7 a bytes vs unicode comparision returns True if the strings contain the same characters, and only returns False if the strings contain different values. I think pytest is correct in trying to display a useful diff in this situation, because it follows the spirit of 2.7 - it's a valid comparison. However, the current implementation adds the extra backslash when repr is called (

left = u(repr(left)[1:-1]).replace(r'\n', '\n')
).

One other thing that @flub and I discussed is whether it's worthwhile adding a warning in the case of bytes vs str in 2.7, making it clear that the diff could be misleading if the bytestring contains non-ASCII bytes. Because the pytest diff could highlight characters that were added in its attempt to represent the bytes (which is what happens in this issue with the backslash).

@Zac-HD Zac-HD added type: bug problem that needs to be addressed topic: reporting related to terminal output and user-facing messages and errors labels Oct 19, 2018
@Zac-HD
Copy link
Member

Zac-HD commented Oct 19, 2018

Just confirming that this still happens under pytest 3.9.1 on Python 2.7.15
It would be nice to fix this, but as a fallback plan we could close it as EOL in 2020!

@nicoddemus
Copy link
Member

FTR: created platform: python 2 only to track Python 2 only issues 👍

@asottile
Copy link
Member

asottile commented Jun 2, 2020

hello, first off thank you for the issue!

python 2.x support has ended for pytest core.

we've decided in #7296 to close the python-2-specific issues to free up some space in our backlog. however, in accordance to our python 2.7 and 3.4 support community patches will still be accepted to the 4.6 series to fix bugs for python 2. (so if this issue is important to you and you have a patch to fix it, feel free to make a PR targeting the 4.6.x branch despite this ticket being closed).

@asottile asottile closed this as completed Jun 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: reporting related to terminal output and user-facing messages and errors type: bug problem that needs to be addressed
Projects
None yet
Development

No branches or pull requests

7 participants