Unicode strings with codepoints at or below \u00ff are encoded as \x?? rather than \u00?? #146

obi1kenobi · 2015-11-18T22:31:37Z

This is a different facet of the unicode encoding issue mentioned in #129, which is unfortunately not solved by PR #142. This is because of two issues:

Python represents \u00?? codepoints as \x??, a form that OrientDB does not support.
On Python 2, pyorient uses str values internally, which are implicitly encoded as ASCII and therefore cannot correctly represent all of Unicode.
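
As a rough illustration of both points, the snippet below uses only Python's standard codecs. It does not touch pyorient, and the variable names are made up for the example. It uses the unicode_escape codec to show the short \x?? form, and the ascii codec to show why an ASCII-only representation breaks.

```python
# Illustrative only: shows how Python's own escaping renders U+00C5.
value = u"\u00c5"

# The unicode_escape codec uses the short \xNN form for codepoints
# from U+0080 through U+00FF.
short_form = value.encode("unicode_escape").decode("ascii")

# Codepoints above U+00FF get the \uNNNN form instead.
long_form = u"\u0141".encode("unicode_escape").decode("ascii")

# Encoding with the ascii codec fails for any non-ASCII character.
try:
    value.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(short_form, long_form, ascii_ok)
```

Only the codepoints up to \u00ff get the short form, which matches the range named in the issue title.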

To reproduce, add a unicode value that contains the \u00c5 character (LATIN CAPITAL LETTER A WITH RING ABOVE, which is also the normalized form of the Angstrom sign) and attempt to write it to OrientDB.
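
For anyone who needs a stopgap, here is a minimal sketch of one possible workaround. It assumes, as the issue title suggests, that the \u00?? form is the one wanted. The helper name `escape_non_ascii` is hypothetical and is not part of pyorient, and this is not necessarily how the library handles the problem. Characters above \uffff are left untouched, since how they should be escaped is not covered here.

```python
def escape_non_ascii(text):
    """Hypothetical helper: write non-ASCII BMP characters as \\uNNNN."""
    parts = []
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            # Plain ASCII passes through unchanged.
            parts.append(ch)
        elif code <= 0xFFFF:
            # Always use the four-digit \uNNNN form, never \xNN.
            parts.append("\\u{:04x}".format(code))
        else:
            # Assumption: characters beyond the BMP are out of scope here.
            parts.append(ch)
    return "".join(parts)


escaped = escape_non_ascii(u"\u00c5ngstr\u00f6m")
print(escaped)
```

The result contains only ASCII characters, so it also avoids the implicit ASCII encoding problem on Python 2 str values.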

Ostico · 2016-04-11T00:20:26Z

Can I close?

obi1kenobi · 2016-04-11T15:38:26Z

Ah yes, I think this was fixed. Closing.

obi1kenobi mentioned this issue Nov 18, 2015

Updated date and unicode handling code. #142

Merged

obi1kenobi mentioned this issue Feb 8, 2016

Fixed unicode special character representation. #170

Merged

obi1kenobi closed this as completed Apr 11, 2016
