'Upload parquet' option #14020

Closed
elenamereloaxesor opened this issue Apr 8, 2021 · 4 comments · Fixed by #14449
Labels
enhancement:request Enhancement request submitted by anyone from the community

Comments

@elenamereloaxesor

Is your feature request related to a problem? Please describe.
In my data mining department we read data from Parquet files, which is quite common and widespread. However, Superset doesn't offer an option to upload a Parquet file directly, only CSV or Excel.
Describe the solution you'd like
I would like there to be an 'Upload Parquet' option alongside the existing ones.
Describe alternatives you've considered
I am already using Drill with Superset, since Drill supports many kinds of files. It would be ideal if this were available as an option in the same way that, when adding a database, you can use SQLAlchemy (as with Drill) to connect to databases beyond the default ones.

@junlincc
Member

junlincc commented Apr 9, 2021

Hi @elenamereloaxesor, thanks for the suggestion! I took a look at Apache Parquet and it seems promising. However, we haven't received many similar requests, so this is not on our roadmap. Would you be interested in adding the support?
Bringing our data expert @betodealmeida for additional idea. :)

@junlincc junlincc added the enhancement:request Enhancement request submitted by anyone from the community label Apr 9, 2021
@nytai
Member

nytai commented Apr 9, 2021

Parquet is awesome and I think it's something we should consider, along with an option to import a file directly from a remote server (e.g., S3). That said, if the data is already in Parquet format, it may make more sense to import it directly into the database and then connect that database to Superset.

For these data upload options, Superset is really just acting as a broker for the target database. @elenamereloaxesor, is the motivation here that Superset's upload UI is much friendlier than going directly to the database? Or is getting direct access to the database the challenge?

@elenamereloaxesor
Author

The motivation may well be both of the reasons you mention: ease of use, and the fact that I am having some trouble accessing Parquet files on GCS, even though I have successfully connected Drill as a database. Thanks for the quick response. I would of course love to help, but I'm afraid I'm quite new to this world and still getting a grasp of how everything works and where everything is, so I feel a bit at a loss as to how to contribute.

@0xBADBAC0N
Contributor

Hi, if Parquet support does get added, I also recommend adding support for ORC files as part of the same effort. Both formats are quite common in the Hadoop ecosystem and in analytics engineering.
