Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: improve engine spec discoverability #14204

Merged
merged 3 commits into from
Apr 20, 2021

Conversation

betodealmeida
Copy link
Member

@betodealmeida betodealmeida commented Apr 16, 2021

SUMMARY

This PR changes the mechanism we use to discover DB engine specs, switching to Python entry points. Using entry points is cleaner, and allows users to distribute new engine specs as packages separate from Superset.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

List of engines before:

{'awsathena': <class 'superset.db_engine_specs.athena.AthenaEngineSpec'>,
 'base': <class 'superset.db_engine_specs.base.BaseEngineSpec'>,
 'bigquery': <class 'superset.db_engine_specs.bigquery.BigQueryEngineSpec'>,
 'clickhouse': <class 'superset.db_engine_specs.clickhouse.ClickHouseEngineSpec'>,
 'cockroachdb': <class 'superset.db_engine_specs.cockroachdb.CockroachDbEngineSpec'>,
 'crate': <class 'superset.db_engine_specs.crate.CrateEngineSpec'>,
 'databricks': <class 'superset.db_engine_specs.databricks.DatabricksHiveEngineSpec'>,
 'dremio': <class 'superset.db_engine_specs.dremio.DremioBaseEngineSpec'>,
 'drill': <class 'superset.db_engine_specs.drill.DrillEngineSpec'>,
 'druid': <class 'superset.db_engine_specs.druid.DruidEngineSpec'>,
 'elasticsearch': <class 'superset.db_engine_specs.elasticsearch.ElasticSearchEngineSpec'>,
 'exa': <class 'superset.db_engine_specs.exasol.ExasolEngineSpec'>,
 'firebird': <class 'superset.db_engine_specs.firebird.FirebirdEngineSpec'>,
 'gsheets': <class 'superset.db_engine_specs.gsheets.GSheetsEngineSpec'>,
 'hana': <class 'superset.db_engine_specs.hana.HanaEngineSpec'>,
 'hive': <class 'superset.db_engine_specs.hive.HiveEngineSpec'>,
 'ibm_db_sa': <class 'superset.db_engine_specs.db2.Db2EngineSpec'>,
 'impala': <class 'superset.db_engine_specs.impala.ImpalaEngineSpec'>,
 'kylin': <class 'superset.db_engine_specs.kylin.KylinEngineSpec'>,
 'mssql': <class 'superset.db_engine_specs.mssql.MssqlEngineSpec'>,
 'mysql': <class 'superset.db_engine_specs.mysql.MySQLEngineSpec'>,
 'odelasticsearch': <class 'superset.db_engine_specs.elasticsearch.OpenDistroEngineSpec'>,
 'oracle': <class 'superset.db_engine_specs.oracle.OracleEngineSpec'>,
 'pinot': <class 'superset.db_engine_specs.pinot.PinotEngineSpec'>,
 'postgres': <class 'superset.db_engine_specs.postgres.PostgresEngineSpec'>,
 'postgresql': <class 'superset.db_engine_specs.postgres.PostgresEngineSpec'>,
 'presto': <class 'superset.db_engine_specs.presto.PrestoEngineSpec'>,
 'redshift': <class 'superset.db_engine_specs.redshift.RedshiftEngineSpec'>,
 'snowflake': <class 'superset.db_engine_specs.snowflake.SnowflakeEngineSpec'>,
 'solr': <class 'superset.db_engine_specs.solr.SolrEngineSpec'>,
 'sqlite': <class 'superset.db_engine_specs.sqlite.SqliteEngineSpec'>,
 'teradata': <class 'superset.db_engine_specs.teradata.TeradataEngineSpec'>,
 'trino': <class 'superset.db_engine_specs.trino.TrinoEngineSpec'>,
 'vertica': <class 'superset.db_engine_specs.vertica.VerticaEngineSpec'>}

And after:

{'awsathena': <class 'superset.db_engine_specs.athena.AthenaEngineSpec'>,
 'bigquery': <class 'superset.db_engine_specs.bigquery.BigQueryEngineSpec'>,
 'clickhouse': <class 'superset.db_engine_specs.clickhouse.ClickHouseEngineSpec'>,
 'cockroachdb': <class 'superset.db_engine_specs.cockroachdb.CockroachDbEngineSpec'>,
 'crate': <class 'superset.db_engine_specs.crate.CrateEngineSpec'>,
 'databricks': <class 'superset.db_engine_specs.databricks.DatabricksHiveEngineSpec'>,
 'dremio': <class 'superset.db_engine_specs.dremio.DremioBaseEngineSpec'>,
 'drill': <class 'superset.db_engine_specs.drill.DrillEngineSpec'>,
 'druid': <class 'superset.db_engine_specs.druid.DruidEngineSpec'>,
 'elasticsearch': <class 'superset.db_engine_specs.elasticsearch.ElasticSearchEngineSpec'>,
 'exa': <class 'superset.db_engine_specs.exasol.ExasolEngineSpec'>,
 'firebird': <class 'superset.db_engine_specs.firebird.FirebirdEngineSpec'>,
 'gsheets': <class 'superset.db_engine_specs.gsheets.GSheetsEngineSpec'>,
 'hana': <class 'superset.db_engine_specs.hana.HanaEngineSpec'>,
 'hive': <class 'superset.db_engine_specs.hive.HiveEngineSpec'>,
 'ibm_db_sa': <class 'superset.db_engine_specs.db2.Db2EngineSpec'>,
 'impala': <class 'superset.db_engine_specs.impala.ImpalaEngineSpec'>,
 'kylin': <class 'superset.db_engine_specs.kylin.KylinEngineSpec'>,
 'mssql': <class 'superset.db_engine_specs.mssql.MssqlEngineSpec'>,
 'mysql': <class 'superset.db_engine_specs.mysql.MySQLEngineSpec'>,
 'odelasticsearch': <class 'superset.db_engine_specs.elasticsearch.OpenDistroEngineSpec'>,
 'oracle': <class 'superset.db_engine_specs.oracle.OracleEngineSpec'>,
 'pinot': <class 'superset.db_engine_specs.pinot.PinotEngineSpec'>,
 'postgres': <class 'superset.db_engine_specs.postgres.PostgresEngineSpec'>,
 'postgresql': <class 'superset.db_engine_specs.postgres.PostgresEngineSpec'>,
 'presto': <class 'superset.db_engine_specs.presto.PrestoEngineSpec'>,
 'redshift': <class 'superset.db_engine_specs.redshift.RedshiftEngineSpec'>,
 'snowflake': <class 'superset.db_engine_specs.snowflake.SnowflakeEngineSpec'>,
 'solr': <class 'superset.db_engine_specs.solr.SolrEngineSpec'>,
 'sqlite': <class 'superset.db_engine_specs.sqlite.SqliteEngineSpec'>,
 'teradata': <class 'superset.db_engine_specs.teradata.TeradataEngineSpec'>,
 'trino': <class 'superset.db_engine_specs.trino.TrinoEngineSpec'>,
 'vertica': <class 'superset.db_engine_specs.vertica.VerticaEngineSpec'>}

The only difference is the absence of the base class: 'base': <class 'superset.db_engine_specs.base.BaseEngineSpec'>,

TEST PLAN

Tested Superset, everything works as expected.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@codecov
Copy link

codecov bot commented Apr 16, 2021

Codecov Report

Merging #14204 (3276b3b) into master (1448f78) will decrease coverage by 0.08%.
The diff coverage is 74.19%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #14204      +/-   ##
==========================================
- Coverage   76.13%   76.05%   -0.09%     
==========================================
  Files         945      945              
  Lines       47927    47946      +19     
  Branches     5950     5950              
==========================================
- Hits        36491    36464      -27     
- Misses      11230    11276      +46     
  Partials      206      206              
Flag Coverage Δ
hive 80.38% <74.19%> (-0.02%) ⬇️
mysql 80.67% <74.19%> (-0.02%) ⬇️
postgres 80.71% <74.19%> (-0.02%) ⬇️
presto ?
python 81.09% <74.19%> (-0.17%) ⬇️
sqlite 80.31% <74.19%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/models/core.py 88.85% <50.00%> (-0.49%) ⬇️
superset/db_engine_specs/__init__.py 81.81% <77.77%> (-18.19%) ⬇️
superset/db_engine_specs/presto.py 84.42% <0.00%> (-5.48%) ⬇️
superset/connectors/sqla/models.py 88.61% <0.00%> (-1.46%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1448f78...3276b3b. Read the comment docs.

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! One question about automating population of the built-in specs, other than that LGTM!

setup.py Outdated
Comment on lines 65 to 101
entry_points={
"console_scripts": ["superset=superset.cli:superset",],
"superset.db_engine_specs": [
"athena=superset.db_engine_specs.athena:AthenaEngineSpec",
"bigquery=superset.db_engine_specs.bigquery:BigQueryEngineSpec",
"clickhouse=superset.db_engine_specs.clickhouse:ClickHouseEngineSpec",
"cockroachdb=superset.db_engine_specs.cockroachdb:CockroachDbEngineSpec",
"crate=superset.db_engine_specs.crate:CrateEngineSpec",
"databricks=superset.db_engine_specs.databricks:DatabricksHiveEngineSpec",
"db2=superset.db_engine_specs.db2:Db2EngineSpec",
"dremio=superset.db_engine_specs.dremio:DremioBaseEngineSpec",
"drill=superset.db_engine_specs.drill:DrillEngineSpec",
"druid=superset.db_engine_specs.druid:DruidEngineSpec",
"elasticsearch=superset.db_engine_specs.elasticsearch:ElasticSearchEngineSpec",
"odelasticsearch=superset.db_engine_specs.elasticsearch:OpenDistroEngineSpec",
"exasol=superset.db_engine_specs.exasol:ExasolEngineSpec",
"firebird=superset.db_engine_specs.firebird:FirebirdEngineSpec",
"gsheets=superset.db_engine_specs.gsheets:GSheetsEngineSpec",
"hana=superset.db_engine_specs.hana:HanaEngineSpec",
"hive=superset.db_engine_specs.hive:HiveEngineSpec",
"impala=superset.db_engine_specs.impala:ImpalaEngineSpec",
"kylin=superset.db_engine_specs.kylin:KylinEngineSpec",
"mssql=superset.db_engine_specs.mssql:MssqlEngineSpec",
"mysql=superset.db_engine_specs.mysql:MySQLEngineSpec",
"oracle=superset.db_engine_specs.oracle:OracleEngineSpec",
"pinot=superset.db_engine_specs.pinot:PinotEngineSpec",
"postgres=superset.db_engine_specs.postgres:PostgresEngineSpec",
"presto=superset.db_engine_specs.presto:PrestoEngineSpec",
"redshift=superset.db_engine_specs.redshift:RedshiftEngineSpec",
"snowflake=superset.db_engine_specs.snowflake:SnowflakeEngineSpec",
"solr=superset.db_engine_specs.solr:SolrEngineSpec",
"sqlite=superset.db_engine_specs.sqlite:SqliteEngineSpec",
"teradata=superset.db_engine_specs.teradata:TeradataEngineSpec",
"trino=superset.db_engine_specs.trino:TrinoEngineSpec",
"vertica=superset.db_engine_specs.vertica:VerticaEngineSpec",
],
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to automate this part by iterating over modules in superset/db_engine_specs/*.py and checking that they are issubclass but not isinstance of BaseEngineSpec?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do!

@betodealmeida betodealmeida force-pushed the db_engine_specs_discover branch from e9350a0 to b90bf7e Compare April 19, 2021 14:20
@betodealmeida betodealmeida force-pushed the db_engine_specs_discover branch from b90bf7e to 09c12de Compare April 19, 2021 22:59
@betodealmeida betodealmeida force-pushed the db_engine_specs_discover branch from 09c12de to 3276b3b Compare April 19, 2021 23:12
@betodealmeida betodealmeida merged commit 13bf023 into apache:master Apr 20, 2021
etr2460 pushed a commit to airbnb/superset-fork that referenced this pull request Apr 22, 2021
try:
engine_spec = ep.load()
except Exception: # pylint: disable=broad-except
logger.warning("Unable to load engine spec: %s", engine_spec)
Copy link
Member

@john-bodley john-bodley Apr 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@betodealmeida there's a bug here. If line 68 throws an exception then engine_spec isn't defined. Also adding documentation as to how to define external modules via entry_points could be helpful.

john-bodley pushed a commit to airbnb/superset-fork that referenced this pull request Apr 26, 2021
* feat: improve engine spec discoverability

* Address comments

* Fix tests

(cherry picked from commit 13bf023)
QAlexBall pushed a commit to QAlexBall/superset that referenced this pull request Dec 29, 2021
* feat: improve engine spec discoverability

* Address comments

* Fix tests
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.2.0 labels Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/M 🚢 1.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants