Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Druid Type mappings for Druid >= 0.23.0 #288

Merged
merged 3 commits into from
Aug 10, 2022

Conversation

Usiel
Copy link
Contributor

@Usiel Usiel commented Jul 12, 2022

Druid 0.23.0 returns more granular types in the INFORMATION_SCHEMA.COLUMNS DATA_TYPE column (see PR on Druid here). This breaks the the get_columns method in the DruidDialect class, as it doesn't know what to do with values like "Complex<thetaSketch>", which previously used to be returned with DATA_TYPE="OTHER".

With this suggested change we instead use the JDBC types Druid provides and map them to the sqlalchemy types. This has the advantage that Druid can figure out the mapping of the complex types and we just consume what Druid tells us, so hopefully this makes the mapping more resistant to future changes on the Druid side.

FYI this bug breaks Superset, it is not possible to add Datasets with complex columns when using Druid 0.23.0.

How to reproduce the issue

  1. Follow the https://druid.apache.org/docs/0.23.0/tutorials/index.html guide up to step 13
  2. Add a thetaSketch metric in the generated ingestion spec:
"metricsSpec": [
        {
          "type": "thetaSketch",
          "name": "complex_metric",
          "fieldName": "countryName"
        }
]
  1. Continue with the remaining steps
  2. Getting the MetaData relies on get_columns(...), which fails without this PR (KeyError: 'complex<thetasketch>')
from sqlalchemy import create_engine, MetaData
engine = create_engine('druid://druid-router.dca.tumblr.net:8082/druid/v2/sql')
meta = MetaData(schema="druid")
meta.reflect(engine)

Expected: With the new code the reflection works and complex metrics are mapped to BLOB.

@Usiel Usiel changed the title Fixes Druid Type mappings for Druid >= 0.23.0 fix: Druid Type mappings for Druid >= 0.23.0 Aug 3, 2022
@gianm
Copy link
Member

gianm commented Aug 8, 2022

@Usiel I think there's a conflict between this patch and your other one. Could you fix it up please?

Usiel added 2 commits August 9, 2022 11:19
Druid 0.23.0 returns DATA_TYPE values such as 'Complex<thetaSketch>' which are not mapped properly to sqlalchemy.types. We fix this by making use of Druid's JDBC_TYPE return, which should ensure that this doesn't break easily again in the future with more complex types.
@Usiel Usiel force-pushed the usiel/fixes-druid-type-mappings branch from 027976e to 22ea292 Compare August 9, 2022 03:23
@Usiel
Copy link
Contributor Author

Usiel commented Aug 9, 2022

@Usiel I think there's a conflict between this patch and your other one. Could you fix it up please?

Rebased on master, thanks for the reviews!

@@ -189,7 +189,7 @@ def get_columns(self, connection, table_name, schema=None, **kwargs):
return [
{
"name": row.COLUMN_NAME,
"type": type_map[row.DATA_TYPE.lower()],
"type": jdbc_type_map[row.JDBC_TYPE],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we introduce new types in the future this would break again. Do you know if SQLAlchemy can handle "unknown" types? If so, it'd be good to report some kind of "unknown" type instead of throwing an error here.

If it can't handle unknown types, skipping the column is better than throwing an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, looks like sqlalchemy offers a NullType for exactly this case.

I added the fallback and also added a warning log similar to other dialects (e.g. Hive).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Thanks!

We add a fallback mapping to sqlalchemy's NullType with a warning.
@gianm gianm merged commit a9c174f into druid-io:master Aug 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants