Unicode Key Causes Encoding Error with Log Statement #10

Open
jacinda opened this issue Nov 17, 2013 · 1 comment

jacinda commented Nov 17, 2013

I noticed this while using qr (which is great, btw) with Django, which uses unicode for everything. I ended up with something like q = Queue(u'my_key') without realizing it at first, because my_key was a variable and not a string I had hard-coded. It also only broke when the pickled form of the value being popped contained non-ascii bytes.

This error occurs because of the combination of using a cPickle protocol of 1 with a unicode string. There are a couple of solutions to the bug. Let me know which you prefer and I'll submit a patch.

Here is a detailed description.

Because of the way _pack is defined using protocol 1, cPickle uses a binary format for serialization:

def _pack(self, val):
    """Prepares a message to go into Redis"""
    return self.serializer.dumps(val, 1)

When a log statement is then executed on popping, a UnicodeDecodeError is raised if the string used for key lookup is unicode and the value of popped contains any byte greater than 127.

log.debug('Popped ** %s ** from key ** %s **' % (popped, self.key))

Here is an example:

>>> import cPickle
>>> x = cPickle.dumps(128, 1)
>>> x
'K\x80.'
>>> u = u'unicode string'
>>> 'Popped ** %s ** from key ** %s **' % (x, u)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 11: ordinal not in range(128)
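
Roughly speaking, once Python 2 meets the unicode argument it promotes the whole result to unicode, which means decoding the bytes formatted so far with the default ascii codec. Here is a minimal sketch of that decode step made explicit. It is written so that it also runs on Python 3, with pickle standing in for cPickle, and the variable names are illustrative only:

```python
import pickle

# Protocol 1 stores small integers as a raw byte, so 128 becomes byte 0x80.
payload = pickle.dumps(128, 1)

# The part of the log message that has already been formatted when the
# unicode key is reached.
prefix = b'Popped ** ' + payload

try:
    prefix.decode('ascii')
    error = None
except UnicodeDecodeError as exc:
    error = exc

# error.start is the offset of the first undecodable byte in the prefix.
print(error.start if error else 'decoded cleanly')  # prints 11
```

The offset 11 matches the position reported in the traceback above: the ten characters of 'Popped ** ', then 'K', then the high byte.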

This does not fail if protocol 0 is used:

>>> x = cPickle.dumps(128)
>>> 'Popped ** %s ** from key ** %s **' % (x, u)
u'Popped ** I128\n. ** from key ** unicode string **'
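
A quick way to compare the two protocols is to scan the pickled bytes for anything above 127. The sketch below uses a hypothetical helper, has_high_bytes, and pickle in place of cPickle so that it runs on Python 2 and 3. It also suggests that protocol 0 is only ASCII-safe for some values: non-ASCII text appears to be written with raw high bytes even under protocol 0, so switching protocols may not cover every case.

```python
import pickle


def has_high_bytes(data):
    """Return True if any byte in the pickled data is greater than 127."""
    return any(byte > 127 for byte in bytearray(data))


binary_int = pickle.dumps(128, 1)    # binary protocol, as used by _pack
text_int = pickle.dumps(128, 0)      # text protocol
text_str = pickle.dumps(u'\xe9', 0)  # non-ASCII text under protocol 0

print(has_high_bytes(binary_int))
print(has_high_bytes(text_int))
print(has_high_bytes(text_str))
```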

It also does not fail if the unicode string is explicitly encoded as ascii:

>>> x = cPickle.dumps(128, 1)
>>> 'Popped ** %s ** from key ** %s **' % (x, u.encode('ascii'))
'Popped ** K\x80. ** from key ** unicode string **'
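
A minimal sketch of the explicit-encoding option, assuming the key is normalized to a byte string once, when the queue is created. KeyedQueue and pop_message are hypothetical stand-ins, not qr's actual Queue API, and the byte-string format (b'...') is used so the same code runs on Python 2 and on Python 3.5+:

```python
import pickle


class KeyedQueue(object):
    """Hypothetical stand-in for a queue that logs its key."""

    def __init__(self, key):
        # Normalize text keys to bytes up front; a non-ASCII key raises
        # UnicodeEncodeError here instead of failing later inside a log call.
        if not isinstance(key, bytes):
            key = key.encode('ascii')
        self.key = key

    def pop_message(self, popped):
        # Both operands are byte strings, so no implicit decoding happens.
        return b'Popped ** %s ** from key ** %s **' % (popped, self.key)


queue = KeyedQueue(u'my_key')
message = queue.pop_message(pickle.dumps(128, 1))
```

Encoding with 'ascii' rejects non-ASCII keys outright. Whether that is acceptable, or whether a more permissive codec such as 'utf-8' is a better fit, depends on which keys the queue needs to support.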

Changing the pickling protocol and using explicit encoding are both options, and I can submit either as a patch (or something else you suggest, if both of these are considered less than ideal). Let me know which solution you prefer.

Owner

tnm commented Jan 20, 2014

I'd be cool with the explicit encoding.
