feat: new Columnar upload form and API #28192
Merged
dpgaspar merged 46 commits into apache:master from preset-io:danielgaspar/sc-59713/migrate-columnar-data-upload-to-database
May 6, 2024
hughhhh
reviewed
Apr 24, 2024
github-actions
bot
added
risk:db-migration
PRs that require a DB migration
dependencies:npm
labels
Apr 26, 2024
dpgaspar requested review from hughhhh, eschutho, craig-rueda, betodealmeida, geido and michael-s-molina
April 30, 2024 14:35
…a-upload-to-database
github-actions
bot
removed
i18n
Namespace | Anything related to localization
i18n:spanish
Translation related to Spanish language
i18n:italian
Translation related to Italian language
i18n:french
Translation related to French language
i18n:chinese
Translation related to Chinese language
i18n:japanese
Translation related to Japanese language
i18n:russian
Translation related to Russian language
i18n:korean
Translation related to Korean language
doc
Namespace | Anything related to documentation
plugins
github_actions
Pull requests that update GitHub Actions code
i18n:dutch
i18n:slovak
i18n:ukrainian
i18n:portuguese
i18n:brazilian
i18n:traditional-chinese
labels
May 6, 2024
dpgaspar
deleted the
danielgaspar/sc-59713/migrate-columnar-data-upload-to-database
branch
May 6, 2024 14:51
imancrsrk
pushed a commit
to imancrsrk/superset
that referenced
this pull request
May 10, 2024
jzhao62
pushed a commit
to jzhao62/superset
that referenced
this pull request
May 16, 2024
EnxDev
pushed a commit
to EnxDev/superset
that referenced
this pull request
May 31, 2024
vinothkumar66
pushed a commit
to vinothkumar66/superset
that referenced
this pull request
Nov 11, 2024
mistercrunch
added
🏷️ bot
A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels
🚢 4.1.0
labels
Nov 27, 2024
Labels
api
Related to the REST API
🏷️ bot
A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels
dependencies:npm
risk:db-migration
PRs that require a DB migration
size/XXL
🚢 4.1.0
SUMMARY
Continuing the work on deprecating/removing server side rendered pages, this adds a new React/Antd Form and backend API for Columnar uploads.
Leverages the work already done on: #28164, #28105, #27840
Columnar upload will accept ZIP files and parquet files.
This PR also refactors CSV and Excel upload. Previously, the initial parsing of the data file to get metadata such as column and sheet names was done on the frontend; now we send the file to the backend to fetch its metadata. This is more maintainable and scalable: we avoid duplicating code to parse data files, avoid adding a new frontend dependency for each file type, and avoid blocking the UI while the file is being parsed.
Although we send the entire file to the backend, by using pandas or pyarrow we can avoid having to parse the entire file, so I expect this operation to be relatively lightweight.
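To illustrate why fetching metadata does not require parsing the whole file, here is a minimal sketch that reads only the header row of an uploaded CSV. It is not the code from this PR: it swaps in the standard library `csv` module in place of pandas or pyarrow so that it runs without extra dependencies, and the function name `read_csv_column_names` is hypothetical. With pandas, limiting the read through the `nrows` parameter of `read_csv` would be the analogous approach.

```python
import csv
import io
from typing import BinaryIO, List


def read_csv_column_names(
    file_obj: BinaryIO, delimiter: str = ",", encoding: str = "utf-8"
) -> List[str]:
    """Return the column names from the first row of a CSV stream.

    Hypothetical helper for illustration only. Only the first buffered
    chunk of the stream is decoded; the remaining rows are never parsed.
    """
    text = io.TextIOWrapper(file_obj, encoding=encoding, newline="")
    try:
        reader = csv.reader(text, delimiter=delimiter)
        # An empty file has no header row, so fall back to an empty list.
        return next(reader, [])
    finally:
        # Detach so the caller's binary stream stays open and reusable.
        text.detach()


upload = io.BytesIO(b"id,name,value\n1,a,2.5\n2,b,3.5\n")
print(read_csv_column_names(upload))
```

The cost of this call depends on the size of the header row rather than the number of data rows, which is the property the paragraph above relies on.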
Three new endpoints are added to fetch data file metadata:
api/v1/database/csv_upload_metadata
api/v1/database/excel_upload_metadata
api/v1/database/columnar_upload_metadata
Permission names for these endpoints are the same as their counterparts: csv_upload on Database, excel_upload on Database and columnar_upload on Database.
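The pairing of metadata endpoints and permissions described above can be written down as a small lookup table. This is only a sketch for readers: the names `UploadType`, `UPLOAD_TYPES` and `describe_upload_type` are hypothetical, as are the `csv`, `excel` and `columnar` keys; only the endpoint paths and permission names come from this description.

```python
from typing import Dict, NamedTuple


class UploadType(NamedTuple):
    """Metadata endpoint and permission name for one file type (illustrative)."""

    metadata_endpoint: str
    permission: str


# Endpoint paths and permission names as listed in this PR description.
UPLOAD_TYPES: Dict[str, UploadType] = {
    "csv": UploadType("api/v1/database/csv_upload_metadata", "csv_upload"),
    "excel": UploadType("api/v1/database/excel_upload_metadata", "excel_upload"),
    "columnar": UploadType(
        "api/v1/database/columnar_upload_metadata", "columnar_upload"
    ),
}


def describe_upload_type(file_type: str) -> str:
    """Return a one-line summary of the endpoint and the permission it needs."""
    info = UPLOAD_TYPES[file_type]
    return f"{info.metadata_endpoint} requires '{info.permission} on Database'"


for name in UPLOAD_TYPES:
    print(describe_upload_type(name))
```

Each file type keeps its own permission, which is why the endpoints are not merged in this PR.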
Possibly a single endpoint to upload data and a single endpoint to fetch metadata would be sufficient, but that would imply a single permission for all file types, and that's a breaking change.
I think a single permission and only two REST API endpoints would be better, since granular permissions for each file type are not actually required. I'll add a v5 breaking-change item if everyone agrees.
Screen.Recording.2024-04-29.at.11.41.47.mov
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
ADDITIONAL INFORMATION