-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[hive-csv] Infer schema from csv #5267
[hive-csv] Infer schema from csv #5267
Conversation
7b72ba4
to
adb194b
Compare
superset/db_engine_specs.py
Outdated
filename = form.csv_file.data.filename | ||
def convert_to_hive_type(col_type): | ||
"""maps tableschema's types to hive types""" | ||
if col_type == "number": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having a dict for the mapping would be cleaner also would allow for people to easily add more types if need be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Added it to config
@@ -34,6 +34,7 @@ six==1.11.0 | |||
sqlalchemy==1.2.2 | |||
sqlalchemy-utils==0.32.21 | |||
sqlparse==0.2.4 | |||
tableschema==1.1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You’ll have to add this to setup.py as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
71b3dcf
to
ec72456
Compare
Codecov Report
@@ Coverage Diff @@
## master #5267 +/- ##
=========================================
- Coverage 61.31% 61.3% -0.02%
=========================================
Files 368 368
Lines 23449 23453 +4
Branches 2713 2713
=========================================
Hits 14378 14378
- Misses 9059 9063 +4
Partials 12 12
Continue to review full report at Codecov.
|
setup.py
Outdated
@@ -90,6 +90,7 @@ def get_git_sha(): | |||
'sqlalchemy', | |||
'sqlalchemy-utils', | |||
'sqlparse', | |||
'tableschema==1.1.0', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't pin in setup.py
unless it's strictly required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DOne
superset/config.py
Outdated
@@ -314,6 +314,14 @@ class CeleryConfig(object): | |||
# contain all the external tables | |||
CSV_TO_HIVE_UPLOAD_DIRECTORY = 'EXTERNAL_HIVE_TABLES/' | |||
|
|||
# This contains the mappings from the tableschema types to HIVE types | |||
TABLESCHEMA_TO_HIVE_TYPES = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this probably belongs in db_engine_spec
. I agree this isn't cut-and-dry as different versions of Hive support different types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
ec72456
to
dde356d
Compare
superset/db_engine_specs.py
Outdated
bucket_path = app.config['CSV_TO_HIVE_UPLOAD_S3_BUCKET'] | ||
def convert_to_hive_type(col_type): | ||
"""maps tableschema's types to hive types""" | ||
TABLESCHEMA_TO_HIVE_TYPES = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry given this no longer resides in the config this should be a lower case variable name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Hm |
dde356d
to
f6ac85b
Compare
(cherry picked from commit b0eee12)
(cherry picked from commit b0eee12)
Previously all column types were string. This PR leverages the tableschema package to get a more precise column type and uses it in the table creation.
@john-bodley @mistercrunch