-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fix] Improve csv upload functionality #8457
Conversation
Codecov Report
@@ Coverage Diff @@
## master #8457 +/- ##
==========================================
+ Coverage 66.68% 66.72% +0.04%
==========================================
Files 449 449
Lines 22684 22692 +8
Branches 2366 2366
==========================================
+ Hits 15126 15141 +15
+ Misses 7420 7413 -7
Partials 138 138
Continue to review full report at Codecov.
|
@willbarrett are you ok with the latest changes? @mistercrunch @dpgaspar comments warmly welcomed! |
superset/db_engine_specs/hive.py
Outdated
@@ -98,7 +98,7 @@ def fetch_data(cls, cursor, limit: int) -> List[Tuple]: | |||
return [] | |||
|
|||
@classmethod | |||
def create_table_from_csv(cls, form, table): # pylint: disable=too-many-locals | |||
def create_table_from_csv(cls, form): # pylint: disable=too-many-locals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change will make the API for the Hive db_engine_spec diverge further from base
. It would behoove us to keep them in sync.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on my understanding of the HiveEngineSpec.create_table_from_csv
, this actually aligns the functionality with that of BaseEngineSpec
, as previously the Hive spec didn't create any metadata for new tables (all the others did). After this change metadata creation takes place outside of the engine spec, and works similarly for all engines, i.e. metadata is created in database/views.py
, and only uploading of actual data is handled by the engine specs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Ville - the implementation is fine here, I'm only referring to the removal of the table
argument from the method signature. Sorry, I should have been more specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@willbarrett I have tried to make sure the API remains compatible, i.e. both BaseEngineSpec
ja HiveEngineSpec
now only take form
as additional parameters (I did forget to add the None
return type to Hive, though). So there shouldn't be any discrepancy between them. Or did I misunderstand something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple more comments on the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a small comment, but it LGTM
Thanks @willbarrett and @dpgaspar for additional comments; I added a return type to the Hive upload method and replaced |
* [fix] csv upload when table metadata present * Remove table from hive spec * Move upload before table metadata creation * Refine upload logic, dd unit tests and fix translations * Use ALLOWED_EXTENSIONS from config * Address review comments * Fix error message grammar * Add return type to hive csv upload and replace first with one_or_none
* [fix] csv upload when table metadata present * Remove table from hive spec * Move upload before table metadata creation * Refine upload logic, dd unit tests and fix translations * Use ALLOWED_EXTENSIONS from config * Address review comments * Fix error message grammar * Add return type to hive csv upload and replace first with one_or_none
* [fix] csv upload when table metadata present * Remove table from hive spec * Move upload before table metadata creation * Refine upload logic, dd unit tests and fix translations * Use ALLOWED_EXTENSIONS from config * Address review comments * Fix error message grammar * Add return type to hive csv upload and replace first with one_or_none
CATEGORY
Choose one
SUMMARY
Currently all engines except Hive try to create table metadata when uploading CSVs. However, the CSV upload feature throws an exception when uploading using either
replace
orappend
when table metadata is present, despite the upload having taken place. This PR changes the following:tsv
added toALLOWED_EXTENSIONS
and implemented informs.py
, as I feel it is nowadays a commonly accepted extension for tab separated csv files.db_engine_specs
todatabase/views
as they would otherwise have caused circular imports.TEST PLAN
CI + Tested all possible combinations locally.
ADDITIONAL INFORMATION
REVIEWERS
@pengyejun @willbarrett @mistercrunch @john-bodley @betodealmeida