Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate Memory Report as valid CSV #97

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

joshuaflanagan
Copy link

The current memory report does not emit valid CSV if a Redis key contains a comma or quotes.

These are perfectly valid characters in a Redis key:

SET "First,Second" 1
SET 'quotes"are"valid' 2
KEYS *
#=>
1) "quotes\"are\"valid"
2) "First,Second"

The current code makes no attempt to escape these characters, so the generated file is impossible to parse from any other application if one of these characters exist in the rdb.

Use the standard library CSV module to ensure valid CSV is always generated.

When a redis key contains a comma or quote, the
output generated by the memory report is no longer
parsable as a CSV.
The csv module will make sure all values are
properly escaped so that the resulting file can be
parsed by other applications.
The csv module in Python only works with text streams. Adapt the
provided binary stream to look like a text stream .
@joshuaflanagan
Copy link
Author

I've updated the PR to add python 3 compatibility, fixing the build.

@amotzg
Copy link
Contributor

amotzg commented May 16, 2017

@joshuaflanagan great catch! It's an important corner that have never been addressed.

Small concern about your fix though, previous code used latin-1 encoding to keep byte values as is, your valid CSV solution use UTF-8. It will support Unicode characters but will change binary key names.

Library user have a way to control desired byte/text manipulation by setting the string escape parameter. MemoryCallback.emit_record() take care of this. With previous code, an attempt to use UTF-8 parsing would have fail, but the default option work for all cases.
A possible resolution for this is to add code to allow MemoryCallback.__init__() to set stream encoding, something like:
self._stream.set_encoding('utf8' if self._escape == encodehelpers.STRING_ESCAPE_UTF8 else 'latin-1')

Can you please address this issue in this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants