Generate Memory Report as valid CSV #97

joshuaflanagan · 2017-05-13T03:47:06Z

The current memory report does not emit valid CSV if a Redis key contains a comma or quotes.

These are perfectly valid characters in a Redis key:

SET "First,Second" 1
SET 'quotes"are"valid' 2
KEYS *
#=>
1) "quotes\"are\"valid"
2) "First,Second"

The current code makes no attempt to escape these characters, so if the rdb contains a key with one of them, no other application can parse the generated file correctly.
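The failure is easy to reproduce. The Python 3 sketch below formats rows by plain string interpolation and reads them back with a standard CSV parser. This approximates the unescaped output and is not the actual rdbtools writer; the four-column row layout is illustrative.

```python
import csv
import io

# Illustrative report rows: (database, type, key, size_in_bytes).
# The columns are an assumption, not the exact rdbtools report layout.
rows = [
    (0, "string", "First,Second", 72),
    (0, "string", 'quotes"are"valid', 80),
]

# Naive output: fields joined with commas and no escaping at all.
naive = "".join("%d,%s,%s,%d\n" % row for row in rows)

# Read the naive output back with a real CSV parser.
parsed = list(csv.reader(io.StringIO(naive)))
for fields in parsed:
    # The comma inside "First,Second" splits that key into two fields,
    # so the first row comes back with 5 columns instead of 4.
    print(len(fields), fields)
```

Python's reader happens to tolerate the embedded quotes in the second key. Stricter parsers may not. A key that *begins* with a quote character would be read as the start of a quoted field and would swallow the delimiters that follow it.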

Use the standard library CSV module to ensure valid CSV is always generated.

When a Redis key contains a comma or quote, the output generated by the memory report is no longer parsable as CSV.

The csv module will make sure all values are properly escaped so that the resulting file can be parsed by other applications.
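Here is a minimal sketch of the fix using the standard csv module. With its default dialect, `csv.writer` quotes any field that contains the delimiter or the quote character and doubles embedded quotes, so a standard reader gets the original keys back. The column layout is again illustrative.

```python
import csv
import io

keys = ["First,Second", 'quotes"are"valid', "plain_key"]

buf = io.StringIO()
writer = csv.writer(buf)  # default "excel" dialect, quoting=csv.QUOTE_MINIMAL
writer.writerow(["database", "type", "key", "size_in_bytes"])
for key in keys:
    writer.writerow([0, "string", key, 64])

text = buf.getvalue()
print(text)

# Round trip: the reader recovers every key exactly as written.
rows = list(csv.reader(io.StringIO(text)))
recovered = [row[2] for row in rows[1:]]
assert recovered == keys
```

With `QUOTE_MINIMAL`, keys that need no escaping, such as `plain_key`, are still written unquoted.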

The csv module in Python 3 only works with text streams. Adapt the provided binary stream to look like a text stream.
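In Python 3, one standard way to present a binary stream as a text stream is `io.TextIOWrapper`. The sketch below uses an `io.BytesIO` as a stand-in for the report's binary output stream. It encodes with UTF-8, which is the encoding this PR is described as using. It illustrates the technique and is not the PR's actual adapter.

```python
import csv
import io

# Stand-in for the binary output stream the report writer receives
# (for example a file opened with mode "wb").
binary_stream = io.BytesIO()

# Wrap the binary stream so csv.writer can write str objects to it.
# newline="" disables newline translation, as the csv docs recommend.
text_stream = io.TextIOWrapper(binary_stream, encoding="utf-8", newline="")
writer = csv.writer(text_stream)
writer.writerow(["database", "type", "key", "size_in_bytes"])
writer.writerow([0, "string", "First,Second", 72])

# Push buffered text into the binary stream, then detach so that closing or
# garbage-collecting the wrapper does not also close binary_stream.
text_stream.flush()
text_stream.detach()

print(binary_stream.getvalue())
```

Detaching matters because a `TextIOWrapper` that is closed while still attached also closes the underlying binary stream, which may belong to the caller.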

joshuaflanagan · 2017-05-16T03:51:13Z

I've updated the PR to add Python 3 compatibility, which fixes the build.

amotzg · 2017-05-16T09:20:24Z

@joshuaflanagan great catch! It's an important corner case that has never been addressed.

One small concern about your fix, though: the previous code used latin-1 encoding to keep byte values as is, while your valid-CSV solution uses UTF-8. That supports Unicode characters, but it will change the bytes of binary key names.
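The effect on binary key names can be shown directly in Python 3. The sketch assumes, as the comment describes, that keys reach the writer as text decoded with latin-1. Writing that text back out as latin-1 reproduces the original bytes. Writing it as UTF-8 does not.

```python
# latin-1 maps each byte 0x00-0xFF to the code point with the same value,
# so decoding and re-encoding is lossless for any byte string.
all_bytes = bytes(range(256))
assert all_bytes.decode("latin-1").encode("latin-1") == all_bytes

# A key as stored in the RDB; b"\xe9" on its own is not valid UTF-8.
raw_key = b"caf\xe9"
text_key = raw_key.decode("latin-1")  # 'café'

latin1_out = text_key.encode("latin-1")  # b'caf\xe9', same bytes as the key
utf8_out = text_key.encode("utf-8")      # b'caf\xc3\xa9', bytes changed

print(latin1_out, utf8_out)

# Strict UTF-8 decoding of the raw key fails outright.
try:
    raw_key.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 decode failed:", exc.reason)
```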

Library users can already control the desired byte/text handling by setting the string escape parameter; MemoryCallback.emit_record() takes care of this. With the previous code, an attempt to use UTF-8 parsing would have failed, but the default option worked in all cases.
One possible resolution is to let MemoryCallback.__init__() set the stream encoding, something like:
self._stream.set_encoding('utf8' if self._escape == encodehelpers.STRING_ESCAPE_UTF8 else 'latin-1')

Can you please address this issue in this PR?
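The following is a sketch of the suggested direction. It picks the encoding from the escape setting when the text wrapper is created. `set_encoding` is not a method of the standard `io.TextIOWrapper`, so it is presumably part of the PR's own stream adapter. The constants and the `make_csv_writer` and `write_rows` helpers below are hypothetical stand-ins and do not reproduce the real `encodehelpers` or `MemoryCallback` code.

```python
import csv
import io

# Hypothetical stand-ins for the string escape options (see encodehelpers);
# names and values are assumptions for this sketch.
STRING_ESCAPE_RAW = "raw"
STRING_ESCAPE_UTF8 = "utf8"


def make_csv_writer(binary_stream, escape):
    """Hypothetical helper: wrap a binary stream for csv output.

    UTF-8 is used only when UTF-8 escaping was requested; otherwise latin-1,
    which writes each decoded character back as the original byte value.
    """
    encoding = "utf-8" if escape == STRING_ESCAPE_UTF8 else "latin-1"
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
    return text_stream, csv.writer(text_stream)


def write_rows(escape, rows):
    """Hypothetical helper: write rows and return the resulting bytes."""
    out = io.BytesIO()
    text_stream, writer = make_csv_writer(out, escape)
    writer.writerows(rows)
    text_stream.flush()
    text_stream.detach()  # keep `out` open after the wrapper goes away
    return out.getvalue()


# A binary key, decoded with latin-1 as the default mode is assumed to do.
key = b"caf\xe9,x".decode("latin-1")
print(write_rows(STRING_ESCAPE_RAW, [[0, "string", key, 64]]))
print(write_rows(STRING_ESCAPE_UTF8, [[0, "string", "café", 64]]))
```

Because the choice is made at construction time, the encoding stays fixed for the lifetime of the report stream. The same escape setting then governs both how keys are decoded and how the CSV bytes are written.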

joshuaflanagan added 3 commits May 12, 2017 22:35

Test case demonstrating invalid CSV generation

cc156fb

Use csv module to generate memory report output

55e7828

Make csv writing work with Python 3

8389742
