Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] Improve csv upload functionality #8457

Merged
merged 8 commits into from
Nov 7, 2019
Merged

[fix] Improve csv upload functionality #8457

merged 8 commits into from
Nov 7, 2019

Conversation

villebro
Copy link
Member

@villebro villebro commented Oct 26, 2019

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

Currently all engines except Hive try to create table metadata when uploading CSVs. However, the CSV upload feature throws an exception when uploading using either replace or append when table metadata is present, despite the upload having taken place. This PR changes the following:

  • If table metadata is present and the CSV upload is successful, the old metadata is updated to reflect the new table contents.
  • Table metadata is now created for Hive like other engines when uploading CSVs.
  • The extension tsv added to ALLOWED_EXTENSIONS and implemented in forms.py, as I feel it is nowadays a commonly accepted extension for tab separated csv files.
  • Fixes to i18n CSV messages that were previously hardcoded, i.e. didn't work.
  • New integration tests added to test all common cases.
  • Refactor: parts of the CSV uploading logic had to be moved from db_engine_specs to database/views as they would otherwise have caused circular imports.

TEST PLAN

CI + Tested all possible combinations locally.

ADDITIONAL INFORMATION

REVIEWERS

@pengyejun @willbarrett @mistercrunch @john-bodley @betodealmeida

@codecov-io
Copy link

codecov-io commented Oct 27, 2019

Codecov Report

Merging #8457 into master will increase coverage by 0.04%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8457      +/-   ##
==========================================
+ Coverage   66.68%   66.72%   +0.04%     
==========================================
  Files         449      449              
  Lines       22684    22692       +8     
  Branches     2366     2366              
==========================================
+ Hits        15126    15141      +15     
+ Misses       7420     7413       -7     
  Partials      138      138
Impacted Files Coverage Δ
superset/views/database/forms.py 91.11% <ø> (ø) ⬆️
superset/views/core.py 71.96% <ø> (ø) ⬆️
superset/config.py 89.9% <100%> (ø) ⬆️
superset/views/database/views.py 92.15% <100%> (+8.64%) ⬆️
superset/db_engine_specs/base.py 80.33% <100%> (-0.26%) ⬇️
superset/db_engine_specs/hive.py 28.18% <100%> (+0.32%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 571c474...40f1f60. Read the comment docs.

@pull-request-size pull-request-size bot added size/L and removed size/M labels Oct 27, 2019
@villebro villebro changed the title [fix] csv upload when table metadata present [fix] Improve csv upload functionality Oct 27, 2019
@villebro
Copy link
Member Author

villebro commented Nov 4, 2019

@willbarrett are you ok with the latest changes? @mistercrunch @dpgaspar comments warmly welcomed!

@@ -98,7 +98,7 @@ def fetch_data(cls, cursor, limit: int) -> List[Tuple]:
return []

@classmethod
def create_table_from_csv(cls, form, table): # pylint: disable=too-many-locals
def create_table_from_csv(cls, form): # pylint: disable=too-many-locals
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change will make the API for the Hive db_engine_spec diverge further from base. It would behoove us to keep them in sync.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my understanding of the HiveEngineSpec.create_table_from_csv, this actually aligns the functionality with that of BaseEngineSpec, as previously the Hive spec didn't create any metadata for new tables (all the others did). After this change metadata creation takes place outside of the engine spec, and works similarly for all engines, i.e. metadata is created in database/views.py, and only uploading of actual data is handled by the engine specs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Ville - the implementation is fine here, I'm only referring to the removal of the table argument from the method signature. Sorry, I should have been more specific.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@willbarrett I have tried to make sure the API remains compatible, i.e. both BaseEngineSpec ja HiveEngineSpec now only take form as additional parameters (I did forget to add the None return type to Hive, though). So there shouldn't be any discrepancy between them. Or did I misunderstand something?

Copy link
Member

@willbarrett willbarrett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple more comments on the implementation.

Copy link
Member

@dpgaspar dpgaspar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small comment, but it LGTM

@villebro
Copy link
Member Author

villebro commented Nov 6, 2019

Thanks @willbarrett and @dpgaspar for additional comments; I added a return type to the Hive upload method and replaced first() with one_or_none() in database/views.py, I think this should be ok now.

@villebro villebro merged commit 49ea232 into apache:master Nov 7, 2019
@axuew axuew mentioned this pull request Nov 12, 2019
3 tasks
graceguo-supercat pushed a commit that referenced this pull request Nov 13, 2019
* [fix] csv upload when table metadata present

* Remove table from hive spec

* Move upload before table metadata creation

* Refine upload logic, dd unit tests and fix translations

* Use ALLOWED_EXTENSIONS from config

* Address review comments

* Fix error message grammar

* Add return type to hive csv upload and replace first with one_or_none
dpgaspar pushed a commit that referenced this pull request Nov 15, 2019
* [fix] csv upload when table metadata present

* Remove table from hive spec

* Move upload before table metadata creation

* Refine upload logic, dd unit tests and fix translations

* Use ALLOWED_EXTENSIONS from config

* Address review comments

* Fix error message grammar

* Add return type to hive csv upload and replace first with one_or_none
@dpgaspar dpgaspar added the v0.35 label Dec 20, 2019
@mistercrunch mistercrunch added 🍒 0.35.1 🍒 0.35.2 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.36.0 labels Feb 28, 2024
cccs-rc pushed a commit to CybercentreCanada/superset that referenced this pull request Mar 6, 2024
* [fix] csv upload when table metadata present

* Remove table from hive spec

* Move upload before table metadata creation

* Refine upload logic, dd unit tests and fix translations

* Use ALLOWED_EXTENSIONS from config

* Address review comments

* Fix error message grammar

* Add return type to hive csv upload and replace first with one_or_none
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L v0.35 🍒 0.35.1 🍒 0.35.2 🚢 0.36.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upload CSV does not work, in replacement and append mode
5 participants