Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[csv-hive] Use schema form field in upload csv #5303

Merged

Conversation

timifasubaa
Copy link
Contributor

@timifasubaa timifasubaa commented Jun 28, 2018

historically, the schema was specified by naming the tables schema_name.table_name. This PR changes things to also allow the users specify the schema name in a form field and have that be applied to the uploaded hive table.

@john-bodley

@timifasubaa timifasubaa changed the title use schema form field in upload csv [csv-hive] Use schema form field in upload csv Jun 28, 2018
@codecov-io
Copy link

codecov-io commented Jun 28, 2018

Codecov Report

Merging #5303 into master will decrease coverage by 0.01%.
The diff coverage is 20%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5303      +/-   ##
==========================================
- Coverage   61.33%   61.32%   -0.02%     
==========================================
  Files         369      369              
  Lines       23488    23493       +5     
  Branches     2713     2713              
==========================================
  Hits        14407    14407              
- Misses       9069     9074       +5     
  Partials       12       12
Impacted Files Coverage Δ
superset/db_engine_specs.py 53.83% <20%> (-0.43%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 252cba2...07395dc. Read the comment docs.

@timifasubaa
Copy link
Contributor Author

PING

@@ -984,14 +984,22 @@ def convert_to_hive_type(col_type):
return tableschema_to_hive_types.get(col_type, 'STRING')

table_name = form.name.data
schema_name = form.schema.data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this just never used? It seems like it's been in the form for several months. Per the original exception it seems like the plan was to enforce uploading to a pre-defined schema.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't. Historically, you could only specify the namespace by making the table name namespace.tablename.

The exception happens when the administrator forces all tables to be in one namespace. And in that case, any avenue used to specify a namespace (schema form field or namespace before table name) will lead to an exception.

if config.get('UPLOADED_CSV_HIVE_NAMESPACE'):
if '.' in table_name:
if '.' in table_name or schema_name:
Copy link
Member

@john-bodley john-bodley Jun 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're ok with not enforcing using the required namespace, isn't something like this more compact and clearer?

schema_name = form.schema.data or config.get('UPLOAD_CSV_HIVE_NAMESPACE')

if schema:
    if '.' in table_name:
        raise Exception("You can't specify both an explicit schema and the schema in the table name")
    
    table_name = '{}.{}'.format(schema, table_name)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because there are two ways to specify the namespace, we need to check if either method was used before loading the table.

raise Exception(
"You can't specify a namespace. "
'All tables will be uploaded to the `{}` namespace'.format(
config.get('HIVE_NAMESPACE')))
table_name = '{}.{}'.format(
config.get('UPLOADED_CSV_HIVE_NAMESPACE'), table_name)
if schema_name:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can just use form.schema.data.

@timifasubaa timifasubaa force-pushed the use_schema_field_in_import_csv branch from 8cbda03 to 07395dc Compare July 5, 2018 22:03
@timifasubaa
Copy link
Contributor Author

timifasubaa commented Jul 5, 2018

Updated wrt earlier discussion.

@timifasubaa timifasubaa merged commit 28ba5a9 into apache:master Jul 6, 2018
timifasubaa added a commit to airbnb/superset-fork that referenced this pull request Jul 25, 2018
timifasubaa added a commit to airbnb/superset-fork that referenced this pull request Sep 19, 2018
wenchma pushed a commit to wenchma/incubator-superset that referenced this pull request Nov 16, 2018
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.28.0 labels Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.28.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants