-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[csv-hive] Use schema form field in upload csv #5303
[csv-hive] Use schema form field in upload csv #5303
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5303 +/- ##
==========================================
- Coverage 61.33% 61.32% -0.02%
==========================================
Files 369 369
Lines 23488 23493 +5
Branches 2713 2713
==========================================
Hits 14407 14407
- Misses 9069 9074 +5
Partials 12 12
Continue to review full report at Codecov.
|
PING |
@@ -984,14 +984,22 @@ def convert_to_hive_type(col_type): | |||
return tableschema_to_hive_types.get(col_type, 'STRING') | |||
|
|||
table_name = form.name.data | |||
schema_name = form.schema.data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this just never used? It seems like it's been in the form for several months. Per the original exception it seems like the plan was to enforce uploading to a pre-defined schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wasn't. Historically, you could only specify the namespace by making the table name namespace.tablename
.
The exception happens when the administrator forces all tables to be in one namespace. And in that case, any avenue used to specify a namespace (schema form field or namespace before table name) will lead to an exception.
superset/db_engine_specs.py
Outdated
if config.get('UPLOADED_CSV_HIVE_NAMESPACE'): | ||
if '.' in table_name: | ||
if '.' in table_name or schema_name: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're ok with not enforcing using the required namespace, isn't something like this more compact and clearer?
schema_name = form.schema.data or config.get('UPLOAD_CSV_HIVE_NAMESPACE')
if schema:
if '.' in table_name:
raise Exception("You can't specify both an explicit schema and the schema in the table name")
table_name = '{}.{}'.format(schema, table_name)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because there are two ways to specify the namespace, we need to check if either method was used before loading the table.
superset/db_engine_specs.py
Outdated
raise Exception( | ||
"You can't specify a namespace. " | ||
'All tables will be uploaded to the `{}` namespace'.format( | ||
config.get('HIVE_NAMESPACE'))) | ||
table_name = '{}.{}'.format( | ||
config.get('UPLOADED_CSV_HIVE_NAMESPACE'), table_name) | ||
if schema_name: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can just use form.schema.data
.
8cbda03
to
07395dc
Compare
Updated wrt earlier discussion. |
(cherry picked from commit 28ba5a9)
historically, the schema was specified by naming the tables
schema_name.table_name
. This PR changes things to also allow the users specify the schema name in a form field and have that be applied to the uploaded hive table.@john-bodley