Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement on_kill() for SQLExecuteQueryOperator #27514

Closed

Conversation

kazanzhy
Copy link
Contributor

@kazanzhy kazanzhy commented Nov 5, 2022

@boring-cyborg boring-cyborg bot added area:providers provider:common-sql provider:snowflake Issues related to Snowflake provider labels Nov 5, 2022
@kazanzhy kazanzhy force-pushed the on_kill_actions_for_sql_provider branch 2 times, most recently from 23073ef to 622c37c Compare November 5, 2022 23:16
@kazanzhy
Copy link
Contributor Author

kazanzhy commented Nov 6, 2022

.kill() method is implemented only for Trino/Preso and Snowflake.
Cursors of these connectors return query_id which is used to stop running query.

For the other databases we could use something like:

select pg_terminate_backend(pid) 
from pg_stat_activity 
where query = '';

if we will save the latest queries as well as QueryId

@kazanzhy kazanzhy force-pushed the on_kill_actions_for_sql_provider branch from 622c37c to 9c0ad7c Compare November 6, 2022 21:37
@kazanzhy kazanzhy marked this pull request as ready for review November 6, 2022 21:38
Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. I like the way it is implemented now.

@potiuk potiuk requested a review from dstandish November 6, 2022 23:47
@kazanzhy kazanzhy force-pushed the on_kill_actions_for_sql_provider branch 3 times, most recently from 1bbeb06 to 186036b Compare November 10, 2022 00:30
@@ -264,6 +266,7 @@ def run(
results = []
for sql_statement in sql:
self._run_command(cur, sql_statement, parameters)
self._update_query_ids(cur)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, the query is done, so what is the utility of updating the query id list here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I checked it for Snowflake.
photo_2022-11-13_00-53-28

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have access to any Trino and Presto instances to check it also for these DBs?

@kazanzhy kazanzhy force-pushed the on_kill_actions_for_sql_provider branch from 186036b to 928a2cc Compare November 10, 2022 23:09
@eladkal
Copy link
Contributor

eladkal commented Nov 11, 2022

@kazanzhy do we have something else to resolve or this PR is ready for final review?

@dstandish
Copy link
Contributor

@kazanzhy do we have something else to resolve or this PR is ready for final review?

@eladkal why do you think we should merge this, given that it does not do what it tries to do?

The queries it says are running are not actually running. The queries it will kill in on_kill are not running. Before we add a new interface, shouldn't we start with code that does what it's supposed to do?

@dstandish dstandish requested a review from potiuk November 11, 2022 20:01
@@ -264,6 +266,7 @@ def run(
results = []
for sql_statement in sql:
self._run_command(cur, sql_statement, parameters)
self._update_query_ids(cur)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think that we need to address this issue, where the queries aren't actually running and therefore will not be killed, before adding this interface.

@eladkal
Copy link
Contributor

eladkal commented Nov 11, 2022

@eladkal why do you think we should merge this, given that it does not do what it tries to do?

I didnt say we should merge this. I asked if its ready to review. I wasnt sure if the running queries problem was adressed.

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this actually need changes in the hook to capture query IDs and kill them, couldn't that be solely on the operators?

From an architectural PoV it feels odd/out-of-place with the rest of the code base they were track what is effectively operator state (query ids) on the hook object rather than in the Operator object

@dstandish
Copy link
Contributor

dstandish commented Nov 11, 2022

@eladkal why do you think we should merge this, given that it does not do what it tries to do?

I didnt say we should merge this. I asked if its ready to review. I wasnt sure if the running queries problem was adressed.

OK cool i thought it meant like... you were preparing to merge it :)

Cus I had commented after the recent updates like saying "this is still an issue" but then no response and the next thing i saw was "ok anything else"?

Maybe it was the word "final" which I interpreted to mean you thought it was basically done. But yeah definitely more work to do.

Anyway, thank you. Looking forward to getting this right... just ... want to get it right :)

@kazanzhy
Copy link
Contributor Author

@dstandish I understand your concerns about this PR.
I must say that these changes are just returning deleted code in TrinoHook (df00436#diff-9a702240b5127432b36cea027e9846f74943e8c20c292f4301c182c6063db880) and refactor of SnowflakeHook to use DbApiHook.run() instead of SnowflakeHook.run().
So as we saw now there's no sense in using query_ids for on_kill(). But we can save them for other reasons.

I'm proposing the following steps when we will be sure that Trino cursor.execute for Trino is also synchronous.

@kazanzhy kazanzhy force-pushed the on_kill_actions_for_sql_provider branch from 928a2cc to 20538cf Compare November 13, 2022 21:52
@potiuk potiuk requested a review from dstandish November 16, 2022 20:52
@dstandish
Copy link
Contributor

I'm a little skeptical about Merge SnowflakeHook.run and TrinoHook.run to DbApiHook.run simply because there isn't clear utility apart from maybe consolidating shared code between two hooks. But we don't add to the base hook every time two subclasses share any code. And it's unclear to me anyway that this particular feature is actually useful / a good idea. But if you think there's a valuable interface to add, nothing wrong with opening the PR. Might make sense to do new PR for it though and just close this one. Then we can examine the interface from the simple perspective of... is this a feature we want to add to DbApiHook.

@kazanzhy kazanzhy force-pushed the on_kill_actions_for_sql_provider branch from 20538cf to 68602b9 Compare November 17, 2022 23:45
@kazanzhy kazanzhy force-pushed the on_kill_actions_for_sql_provider branch from 68602b9 to ec760ac Compare November 18, 2022 00:22
@kazanzhy
Copy link
Contributor Author

This PR is really overloaded.
Let's discuss #27763 and #27762.
After that, we could come back here or just close this PR.
@dstandish

@kazanzhy kazanzhy marked this pull request as draft November 18, 2022 00:33
@dstandish
Copy link
Contributor

This PR is really overloaded. Let's discuss #27763 and #27762. After that, we could come back here or just close this PR. @dstandish

great, thanks

@kazanzhy
Copy link
Contributor Author

Can I conclude that this PR should be closed as well #27314 because of the absent possibility of canceling queries after their execution using a synchronous cursor?
@eladkal @dstandish

@kazanzhy kazanzhy closed this Nov 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Generalize on_kill() for SQLExecuteQueryOperator
5 participants