-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for Apache Drill #6610
Conversation
There is one small issue with this PR that I would really like to resolve but I'm a little stuck on. Here's where I'm stuck. In all the other functions where I could find this, I added a hook to clean up the file name, specifically, to look for Thanks!! |
@mistercrunch @kristw |
Codecov Report
@@ Coverage Diff @@
## master #6610 +/- ##
==========================================
- Coverage 65.26% 65.24% -0.03%
==========================================
Files 435 435
Lines 21485 21503 +18
Branches 2379 2379
==========================================
+ Hits 14023 14030 +7
- Misses 7342 7353 +11
Partials 120 120
Continue to review full report at Codecov.
|
@mistercrunch All unit tests passed. Superset + Drill is a really awesome combination. Do you want me to squash all commits? |
I don't think it's ok to have references to |
I am watching this PR very closely. I think this is a great addition to Superset and look forward to seeing it merged. |
Hi @mistercrunch , They are: superset/assets/src/SqlLab/components/SouthPane.jsx and superset/assets/src/SqlLab/components/SqlEditorLeftBar.jsx. I can't seem to figure out how these functions are populated or called from the Python/Flask side.
If you could point me in the right direction there, and show me where in the python code these functions are getting the data, I can wrap this up. |
@cgivre I'm happy to help push that through and providing pointers. So here's the view: I didn't dig super deep as I don't have time to dig right this moment, but wanted to give some pointers. |
@mistercrunch Thanks for your response. The PR already had a hook there to clean up the table name (see below) I've went through most of the flles looking for anything like |
@mistercrunch @kristw and There are some things which I'd like to contribute but really can't until this is committed. I'll keep working on it and hopefully get some more help as more people start using Drill+Superset. |
@cgivre I am not familiar with the backend code so I cannot fully review the code. However, regarding your request to merge as Work in Progress, Superset is used in production environment for many organizations, making However, if you feel that there are parts of this PR that are solid already and ready to be merged, please keep only that part and get rid of everything else. You can also help the reviewers understand what are the remaining issues that make you say this is still WIP. If you can clean out the incompleted parts, after receiving approval from @mistercrunch or other backend experts, we can merge. |
Hi @kristw The remaining issue is that @mistercrunch prefers to have code that is specific to one database in the As I said, I do feel this PR is solid and ready to be merged. I'm willing to keep working on it to figure out how to move this code to the |
Bump... Any update? |
@cgivre maybe I can help; @yciabaud recently ran into something similar with Presto+Pulsar which was addressed in #7297 . While slightly different there seem to be parallels. Can you summarize where you are stuck right now to help me get up to speed on your issue? By the looks of it this is quite close to an elegant solution but still needs that last finishing touch. |
Hi @villebro What was happening was that Superset was working great with Drill when it was querying a database, but if something had a file, the dot for the file extension would mess up the URLs. My approach works, however, there were two places that I couldn't figure out how to call the database specific function in and As you can see, these are not python files. I couldn't figure out how to insert the Python function to clean up the table name. Another approach which I tried but couldn't figure out was to add a function in db_engine_specs to do all the URL transformations. Any assistance would be greatly appreciated, as I'd really like to get this integrated into Superset. |
It looks like the culprit is here: https://github.com/apache/incubator-superset/blob/ce3f9c30a925330763b394903d1a16ce6a90a495/superset/assets/src/components/TableSelector.jsx#L173-L179 @mistercrunch do you have any idea or context on what's going on here? It seems like this functionality has been deprecated (choosing a table without having first chosen a schema), and the above logic should be removed. I tested creating a table with a period in the name on Postgres and got a similar error. Removing the |
@kristw can you check out my last comment; is the above code in |
@villebro , I was also working on this issue this weekend and figured that this is an issue on JS side. |
Hi @villebro, |
Thanks @villebro! I appreciate your help on this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e7691c2
to
b7e8b3e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made all the requested changes. At this point, the PR really is only overwriting the functions relating the time-grain functionality for Drill. Everything seems to work without any mods outside of the db_engine_specs.py
file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, when you feel this is complete please remove WIP from the title.
@villebro I updated the title. Thank you very much for your assistance getting this PR completed. From my perspective we are ready to go. |
Let's leave this open for a few days to see if anyone has something to add. But IMO this is ready to be merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Awesome to see. Should the README.md#database-support be updated with Apache Drill? |
@cgivre ^^^ |
Thanks @mistercrunch and @SamuelMarks |
This is the first attempt at support for Apache Drill in Superset. This uses John Omernik's SQLAlchemy dialect (https://github.com/JohnOmernik/sqlalchemy-drill).
Drill has a unique aspect in its namespace which proved problematic. Drill uses the concepts of:
When connecting to Drill to a non-file based system, the behavior is pretty much as a traditional database. However, when Drill is querying a file based system, you can have additional
.
in the namespace that are not namespace related. For example:In this case, the storage plugin is
dfs
, the workspace which is optional istest
, and the table, which in this case is a file, isdata.json
. In this case however, the last period is not part of the namespace and was consistently misinterpreted by Superset.The solution I found, which seemed like complete hackery, was to replace the last period with
@@@
in the dropdown in SQLlab. Then I modifiedview/core.py
to replace the@@@
with a period in the various functions that generate tables.UPDATE: All Hackery Removed