-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Druid Type mappings for Druid >= 0.23.0 #288
Conversation
@Usiel I think there's a conflict between this patch and your other one. Could you fix it up please? |
Druid 0.23.0 returns DATA_TYPE values such as 'Complex<thetaSketch>' which are not mapped properly to sqlalchemy.types. We fix this by making use of Druid's JDBC_TYPE return, which should ensure that this doesn't break easily again in the future with more complex types.
027976e
to
22ea292
Compare
Rebased on master, thanks for the reviews! |
pydruid/db/sqlalchemy.py
Outdated
@@ -189,7 +189,7 @@ def get_columns(self, connection, table_name, schema=None, **kwargs): | |||
return [ | |||
{ | |||
"name": row.COLUMN_NAME, | |||
"type": type_map[row.DATA_TYPE.lower()], | |||
"type": jdbc_type_map[row.JDBC_TYPE], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we introduce new types in the future this would break again. Do you know if SQLAlchemy can handle "unknown" types? If so, it'd be good to report some kind of "unknown" type instead of throwing an error here.
If it can't handle unknown types, skipping the column is better than throwing an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Thanks!
We add a fallback mapping to sqlalchemy's NullType with a warning.
Druid 0.23.0 returns more granular types in the
INFORMATION_SCHEMA.COLUMNS
DATA_TYPE
column (see PR on Druid here). This breaks the theget_columns
method in theDruidDialect
class, as it doesn't know what to do with values like"Complex<thetaSketch>"
, which previously used to be returned withDATA_TYPE="OTHER"
.With this suggested change we instead use the JDBC types Druid provides and map them to the sqlalchemy types. This has the advantage that Druid can figure out the mapping of the complex types and we just consume what Druid tells us, so hopefully this makes the mapping more resistant to future changes on the Druid side.
FYI this bug breaks Superset, it is not possible to add Datasets with complex columns when using Druid 0.23.0.
How to reproduce the issue
get_columns(...)
, which fails without this PR (KeyError: 'complex<thetasketch>'
)Expected: With the new code the reflection works and complex metrics are mapped to
BLOB
.