-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: refactor all get_sqla_engine
to use contextmanager in codebase
#21943
Conversation
Codecov Report
@@ Coverage Diff @@
## master #21943 +/- ##
==========================================
+ Coverage 67.09% 68.06% +0.96%
==========================================
Files 1827 1827
Lines 69866 72646 +2780
Branches 7548 7548
==========================================
+ Hits 46875 49444 +2569
- Misses 21035 21246 +211
Partials 1956 1956
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
/testenv up |
@hughhhh Ephemeral environment spinning up at http://34.222.64.144:8080. Credentials are |
superset/db_engine_specs/base.py
Outdated
with database.get_sqla_engine_with_context( | ||
schema=schema, source=source | ||
) as engine: | ||
return engine |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work; the context manager will be exited before the functions returns. It's equivalent to:
with database.get_sqla_engine_with_context(...) as engine:
a = engine
return a
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so should i just return database._ get_sqla_engine()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so should i just
return database._ get_sqla_engine()
If you do that, then the SSH tunneling wouldn't be setup in places that call DBEngineSpec.get_engine()
, defeating the purpose of the changes in this PR. You need to convert this into a context manager as well (and update everything downstream), so that the setup/teardown works as expected.
You can play with a simple Python script to understand how this works:
from contextlib import contextmanager
@contextmanager
def get_sqla_engine_with_context():
print("enabling ssh tunnel")
yield 42
print("disabling ssh tunnel")
def get_engine():
with get_sqla_engine_with_context() as engine:
return engine
print("start")
engine = get_engine()
print("I have the engine, I can now work on it")
print("end")
Running the code above prints:
start
enabling ssh tunnel
disabling ssh tunnel
I have the engine, I can now work on it
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add a get_raw_connection
context manager (that uses closing
), it should simplify the code a lot. Could be done in this PR or a separate one.
We should also make the use engine.raw_connection().execute()
vs. engine.execute()
more consistent, standardizing on the latter, but we can do that in a separate PR.
with dataset.database.get_sqla_engine_with_context( | ||
schema=dataset.schema | ||
) as engine: | ||
with closing(engine.raw_connection()) as conn: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to do this now, but eventually we could have a database.get_raw_connection
context manager, to simplify things a little bit. This way we could rewrite this as:
with dataset.database.get_raw_connection(schema=dataset.schema) as conn:
cursor = conn.cursor()
...
And the implementation of get_raw_connection()
would take care of closing the connection:
@contextmanager
def get_raw_connection(...):
with get_sqla_engine_with_context(...) as engine:
with closing(engine.raw_connection()) as conn:
yield conn
int( | ||
app.config[ | ||
"TEST_DATABASE_CONNECTION_TIMEOUT" | ||
].total_seconds() | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, I'd compute this outside the with
block just to improve readability:
timeout = app.config["TEST_DATABASE_CONNECTION_TIMEOUT"].total_seconds()
with database.get_sqla_engine_with_context() as engine:
try:
alive = func_timeout(int(timeout), ping, args=(engine,))
Also, func_timeout
should take floats. :-P
if not alive: | ||
raise DatabaseOfflineError( | ||
SupersetError( | ||
message=__("Database is offline."), | ||
error_type=SupersetErrorType.GENERIC_DB_ENGINE_ERROR, | ||
level=ErrorLevel.ERROR, | ||
), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can dedent this block, no?
to_sql_kwargs["method"] = "multi" | ||
with cls.get_engine(database) as engine: | ||
if engine.dialect.supports_multivalues_insert: | ||
to_sql_kwargs["method"] = "multi" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to improve this a little bit... here we're building an engine just to check an attribute in the dialect, which means we're setting up and tearing down an SSH connection just to read an attribute. :-(
Maybe we should add a get_dialect
method to the DB engine spec, that builds the engine without the context manager:
@classmethod
def get_dialect(database, schema, source):
engine = database.get_sqla_engine(schema=schema, source=source)
return engine.dialect
Then when we only need the dialect we can call this method, which is cheaper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this case we still need the engine
though, so it makes more sense to just use get_engine
instead of just the dialect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good point, I missed that.
with cls.get_engine(database) as engine: | ||
to_gbq_kwargs = { | ||
"destination_table": str(table), | ||
"project_id": engine.url.host, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should add an attribute to DB engine specs annotating if they support SSH tunnel or not? BigQuery, eg, will probably never support it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
created a ticket for this
@@ -205,7 +203,8 @@ def df_to_sql( | |||
if table_exists: | |||
raise SupersetException("Table already exists") | |||
elif to_sql_kwargs["if_exists"] == "replace": | |||
engine.execute(f"DROP TABLE IF EXISTS {str(table)}") | |||
with cls.get_engine(database) as engine: | |||
engine.execute(f"DROP TABLE IF EXISTS {str(table)}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's interesting that sometimes we use engine.raw_connection().execute
, and others we use engine.execute
. Ideally we should standardize in the latter wherever possible, since it's more concise.
engine = database.get_sqla_engine() | ||
schema = inspect(engine).default_schema_name | ||
table_exists = database.has_table_by_name(tbl_name) | ||
with database.get_sqla_engine_with_context() as engine: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really should convert these old examples into the new format (YAML + CSV)...
superset/models/core.py
Outdated
yield self._get_sqla_engine(schema=schema, nullpool=nullpool, source=source) | ||
except Exception as ex: | ||
raise self.db_engine_spec.get_dbapi_mapped_exception(ex) | ||
raise ex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need try/except
here if you're raising all exceptions.
@@ -733,7 +733,7 @@ def test_execute_sql_statements(self, mock_execute_sql_statement, mock_get_query | |||
mock_query = mock.MagicMock() | |||
mock_query.database.allow_run_async = False | |||
mock_cursor = mock.MagicMock() | |||
mock_query.database.get_sqla_engine.return_value.raw_connection.return_value.cursor.return_value = ( | |||
mock_query.database.get_sqla_engine_with_context.return_value.__enter__.return_value.raw_connection.return_value.cursor.return_value = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can replace all intermediary .return_value
with ()
(but not the last):
mock_query.database.get_sqla_engine_with_context.return_value.__enter__.return_value.raw_connection.return_value.cursor.return_value = ( | |
mock_query.database.get_sqla_engine_with_context().__enter__().raw_connection().cursor.return_value = ( |
…erset into ref-get-sqla-engine-2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, there's just one place that needs to be fixed.
with database.get_sqla_engine_with_context() as engine: | ||
connection = engine | ||
|
||
df.to_sql( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will only run in the else
block, but it needs to run in the if database.sqlalchemy_uri == current_app.config.get("SQLALCHEMY_DATABASE_URI"):
block as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ephemeral environment shutdown and build artifacts deleted. |
SUMMARY
In this PR, we moving
get_sqla_engine
to be leveraged as a contextmanager through out the entire codebase. What that means is any time we want to instantiate an engine we'll have to use awith ___ as ___
block. For example:BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
ADDITIONAL INFORMATION