-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[sql lab] Fix issue around VARBINARY type in Presto #5121
Conversation
When receiving a VARBINARY field out of Presto, it shows up as type `bytes` out of the pyhive driver. Then the pre 3.15 version of simplejson attempts to convert it to utf8 by default and it craps out. I bumped to simplejson>=3.25.0 and set `encoding=None` as documented here https://simplejson.readthedocs.io/en/latest/#basic-usage so that we can handle bytes on our own.
Codecov Report
@@ Coverage Diff @@
## master #5121 +/- ##
==========================================
- Coverage 77.51% 77.48% -0.04%
==========================================
Files 44 44
Lines 8735 8740 +5
==========================================
+ Hits 6771 6772 +1
- Misses 1964 1968 +4
Continue to review full report at Codecov.
|
@@ -323,6 +322,11 @@ def base_json_conv(obj): | |||
return str(obj) | |||
elif isinstance(obj, timedelta): | |||
return str(obj) | |||
elif isinstance(obj, bytes): | |||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get a unit test here? Also don't we want to raise the error vs. swallow by sending [bytes]
or even put some type of logger.error()
?
@@ -323,6 +322,11 @@ def base_json_conv(obj): | |||
return str(obj) | |||
elif isinstance(obj, timedelta): | |||
return str(obj) | |||
elif isinstance(obj, bytes): | |||
try: | |||
return '{}'.format(obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will return different values depending on Python version. For Python 2 (where bytes are strings):
>>> print(repr('{}'.format(b'test')))
'test'
For Python 3:
>>> print(repr('{}'.format(b'test')))
"b'test'"
Why not return a list of integers here instead?
>>> list(b'test')
[116, 101, 115, 116]
What is the use case for VARBINARY
?
Reviving this trying to push it through. For context Presto/Hive has the |
Merging this as it fixes a bug. We can tweak the way bytes look in the future if needed. |
When receiving a VARBINARY field out of Presto, it shows up as type `bytes` out of the pyhive driver. Then the pre 3.15 version of simplejson attempts to convert it to utf8 by default and it craps out. I bumped to simplejson>=3.25.0 and set `encoding=None` as documented here https://simplejson.readthedocs.io/en/latest/#basic-usage so that we can handle bytes on our own.
When receiving a VARBINARY field out of Presto, it shows up as type `bytes` out of the pyhive driver. Then the pre 3.15 version of simplejson attempts to convert it to utf8 by default and it craps out. I bumped to simplejson>=3.25.0 and set `encoding=None` as documented here https://simplejson.readthedocs.io/en/latest/#basic-usage so that we can handle bytes on our own. (cherry picked from commit 409ac68)
When receiving a VARBINARY field out of Presto, it shows up as type `bytes` out of the pyhive driver. Then the pre 3.15 version of simplejson attempts to convert it to utf8 by default and it craps out. I bumped to simplejson>=3.25.0 and set `encoding=None` as documented here https://simplejson.readthedocs.io/en/latest/#basic-usage so that we can handle bytes on our own.
When receiving a VARBINARY field out of Presto, it shows up as type
bytes
out of the pyhive driver. Then the pre 3.15 version ofsimplejson attempts to convert it to utf8 by default and it craps out.
I bumped to simplejson>=3.25.0 and set
encoding=None
as documentedhere
https://simplejson.readthedocs.io/en/latest/#basic-usage so that we can
handle bytes on our own.