Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Stop pending queries when user close dashboard #10836

Closed

Conversation

graceguo-supercat
Copy link

@graceguo-supercat graceguo-supercat commented Sep 11, 2020

SUMMARY

This is a feature requested from airbnb. When user open a large dashboard and apply filters, very possibly there is no cached query results, so the dashboard will send queries to Presto, some of them might be very slow. In airbnb, Presto resource is allocated by user. Each user can run 3 concurrent queries at a time, and extra queries are queued up. As the result new queries will not be executed until queued queries were finished first.

This limitation of using Presto resource sometime causes bad experience for Superset dashboard users. Once user's queries are queued up, there is no way to free up, and users have to wait pretty long time to back to normal (after all the previous queued queries are completed). Both users and Superset have no way to stop the queued queries, even users may want to abandon them.

To avoid this situation, this PR proposes a solution: Superset dashboard detects browser close event. When it happens, Dashboard will call a Presto API, with username and dashboard_id, then Presto will kill/stop all the pending and waiting queries from this user and dashboard.

Note:
This solution depends on an airbnb internal API call, which is not available for other corporations. This PR is to provide a hook that connects Dashboard to internal API so that we can expands Superset functionalities.

TEST PLAN

  1. Open Superset dashboard, and open presto web interface to watch for new queries
  2. Apply dashboard filter, should see new queries started in Presto
  3. Close dashboard from browser, or open other url in the same browser window.
  4. Should see pending queries are killed by Presto.

cc @etr2460 @john-bodley @ktmud

@graceguo-supercat graceguo-supercat force-pushed the gg-StopPendingQueries branch 3 times, most recently from 427c493 to ccae7f3 Compare September 11, 2020 20:54
@codecov-commenter
Copy link

codecov-commenter commented Sep 11, 2020

Codecov Report

Merging #10836 into master will decrease coverage by 4.40%.
The diff coverage is 84.44%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #10836      +/-   ##
==========================================
- Coverage   65.88%   61.47%   -4.41%     
==========================================
  Files         815      815              
  Lines       38337    38360      +23     
  Branches     3601     3601              
==========================================
- Hits        25257    23582    -1675     
- Misses      12978    14592    +1614     
- Partials      102      186      +84     
Flag Coverage Δ
#cypress ?
#javascript 61.74% <57.14%> (-0.01%) ⬇️
#python 61.31% <96.77%> (-0.16%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...et-frontend/src/dashboard/components/Dashboard.jsx 76.66% <53.84%> (-12.75%) ⬇️
superset/db_engine_specs/base.py 87.09% <66.66%> (-0.17%) ⬇️
superset-frontend/src/featureFlags.ts 87.50% <100.00%> (-12.50%) ⬇️
superset/config.py 90.42% <100.00%> (ø)
superset/views/core.py 74.73% <100.00%> (+0.26%) ⬆️
superset/views/utils.py 85.16% <100.00%> (+0.86%) ⬆️
superset-frontend/src/SqlLab/App.jsx 0.00% <0.00%> (-100.00%) ⬇️
superset-frontend/src/explore/App.jsx 0.00% <0.00%> (-100.00%) ⬇️
superset-frontend/src/dashboard/App.jsx 0.00% <0.00%> (-100.00%) ⬇️
superset-frontend/src/explore/index.jsx 0.00% <0.00%> (-100.00%) ⬇️
... and 175 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 141ef4a...00f6d93. Read the comment docs.

] = lambda database, user: [
UPLOADED_CSV_HIVE_NAMESPACE
] if UPLOADED_CSV_HIVE_NAMESPACE else []
ALLOWED_USER_CSV_SCHEMA_FUNC: Callable[["Database", "models.User"], List[str]] = (
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

black automatically formatted this section. not related with my PR.

@@ -206,6 +206,10 @@ def is_db_column_type_match(
def get_allow_cost_estimate(cls, version: Optional[str] = None) -> bool:
return False

@classmethod
Copy link
Author

@graceguo-supercat graceguo-supercat Sep 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check blocks requests to those databases that don't support stop queries function, or those companies that not implements stop queries API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of making this a get function, why not call this allows_stop_pending_queries and put it up around line 150?

Also, to be more explicit with the API, can you add the stop_queries function to this class too and make it pass?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, if stop_queries was an empty function, you might be able to remove the allows_stop_pending_queries attribute altogether. I think the only reason we might need the database level allows attribute would be if you wanted to prevent the frontend from sending the request to stop queries. But that could be messy as dashboards can be based off multiple different databases.

So maybe remove the allows check completely and just make stop_queries an empty function here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to this suggestion

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the suggestion. fixed by remove get_allow_stop_pending_queries check, and add an empty stop_queries.

@graceguo-supercat graceguo-supercat changed the title [WIP] feat: Stop pending queries when user close dashboard feat: Stop pending queries when user close dashboard Sep 14, 2020
Copy link
Member

@etr2460 etr2460 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few comments, thanks for the implementation here!

self, dashboard_id: int
) -> FlaskResponse:
if is_feature_enabled("STOP_DASHBOARD_PENDING_QUERIES"):
username = g.user.username if g.user else None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is wrapped by has_access_api, does that guarantee that the user object already exists?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree. fixed.

WTF_CSRF_EXEMPT_LIST = [
"superset.views.core.log",
"superset.charts.api.data",
"superset.views.core.stop_dashboard_queries",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this dangerous? I assume this means that anyone who steals a session cookie could constantly kill all that user's queries?

it's probably fine for internal deployments, but I wonder if there's a way we could do this without removing CSRF protections

Copy link
Author

@graceguo-supercat graceguo-supercat Sep 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know 2 APIs here that don't use CSRF token: log and stop_dashboard_queries, both of them use sendbeacon which can't send CSRF token. The advantage of sendbeacon vs regular POST is here: it doesn't need to wait response. And it's pretty common for sendbeacon call go without CSRF token.

CSRF token is used to prevent malicious site from executing some transaction, like move money from your bank account to mine :) I feel kill other ppl's queries when their dashboard is still loading, is not very dangerous.

But for superset.charts.api.data i am not sure why it is in the exempt list. But this is not related to this PR.

Copy link
Member

@villebro villebro Sep 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason superset.charts.api.data is exempt is because it is only a POST due to having a large request payload that doesn't sit well with a GET. Therefore it's not really a state changing POST, but a simulated GET.

@@ -206,6 +206,10 @@ def is_db_column_type_match(
def get_allow_cost_estimate(cls, version: Optional[str] = None) -> bool:
return False

@classmethod
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of making this a get function, why not call this allows_stop_pending_queries and put it up around line 150?

Also, to be more explicit with the API, can you add the stop_queries function to this class too and make it pass?

@@ -206,6 +206,10 @@ def is_db_column_type_match(
def get_allow_cost_estimate(cls, version: Optional[str] = None) -> bool:
return False

@classmethod
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, if stop_queries was an empty function, you might be able to remove the allows_stop_pending_queries attribute altogether. I think the only reason we might need the database level allows attribute would be if you wanted to prevent the frontend from sending the request to stop queries. But that could be messy as dashboards can be based off multiple different databases.

So maybe remove the allows check completely and just make stop_queries an empty function here

@has_access_api
@event_logger.log_this
@expose("/dashboard/<dashboard_id>/stop/", methods=["POST"])
def stop_dashboard_queries( # pylint: disable=no-self-use
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be useful to sync with @dpgaspar on using api v1.
Also please add some tests, we have presto on CI

Copy link
Author

@graceguo-supercat graceguo-supercat Sep 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in open source code base, stop_queries function is empty, nothing to test. airbnb and other companies can add their internal implementation to stop queries by dashboard_id, but this is not a standard Presto API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be nice to have stop_queries implementation in the test config.
It will help in 2 ways:

  1. be a safeguard from open source contributions not to break it
  2. will work as an example for the companies willing to try it out

Copy link
Author

@graceguo-supercat graceguo-supercat Sep 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bkyryliuk do you have an existed example, what is implementation in the test config? thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think as I mentioned above, core apis are deprecated and new ones should be added here: https://github.com/apache/incubator-superset/blob/master/superset/dashboards/api.py @dpgaspar & @villebro would probably know more on this topic


# find databases for all charts in a given dashboard
# stop pending query is only available for certain database(s)
for slc in slices:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a good candidate for the util function - get database ids from the dashboard

by the way it would be much more efficient if that logic could be done in the SQL itself

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to util function, and added unit test.

Copy link
Member

@ktmud ktmud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of thoughts on naming nits

}
}

componentWillUnmount() {
window.removeEventListener('visibilitychange', this.onVisibilityChange);
}

onStopPendingQueries() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's the beforeunload event that triggers the stopPendingQueries action. So maybe we can just call this method stopPendingQueries?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. fixed method name.

@@ -1001,3 +1001,13 @@ def get_extra_params(database: "Database") -> Dict[str, Any]:
logger.error(ex)
raise ex
return extra

@classmethod
def stop_queries(cls, username: str, dashboard_id: int) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we name this stop_dashboard_queries since it accepts dashboard_id as a required parameter? Also, would user_id be better suited for this API than username?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

airbnb's internal API accept username and dashboard_id :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious if there is an interest to have in in superset as well e.g. use smth like:
CALL system.runtime.kill_query('20151207_215727_00146_tx3nr');
prestodb/presto#1515

However a challenge here would be tracking all queries and there ids

Copy link
Author

@graceguo-supercat graceguo-supercat Sep 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct. Dashboard is running in synchronized mode, there is no query id passed from query engine to dashboard. While in SQL lab, which is running in asynchronized mode, query id is saved into database, and celery Worker will update query status.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you find another solution, i am happy to learn :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to introduce a hook here that can either link a database or a db engine spec to a function that handles the query termination? Something similar to what JINJA_CONTEXT_ADDONS or CUSTOM_TEMPLATE_PROCESSORS does in config.py.

@graceguo-supercat graceguo-supercat force-pushed the gg-StopPendingQueries branch 12 times, most recently from c31ee53 to 497b6d6 Compare September 17, 2020 00:49
@graceguo-supercat graceguo-supercat force-pushed the gg-StopPendingQueries branch 14 times, most recently from e8eb039 to 5de1714 Compare September 19, 2020 23:10
datasource = slc.datasource
if (
datasource
and datasource.type == "table"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only return database id for table type datasources.

@villebro villebro force-pushed the gg-StopPendingQueries branch 2 times, most recently from 41112f9 to 1e284c3 Compare October 1, 2020 08:02
@villebro villebro force-pushed the gg-StopPendingQueries branch from 1e284c3 to 847b740 Compare October 1, 2020 08:03
@villebro villebro requested a review from dpgaspar October 1, 2020 08:39
Copy link
Member

@dpgaspar dpgaspar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but left some suggestions

if not dashboard_id:
return self.response(400, message="dashboard_id missing in body")
hook = current_app.config["STOP_DASHBOARD_PENDING_QUERIES_HOOK"]
hook(dashboard_id, g.user.username)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@protect or @has_access does not guarantee we a have a user, since it's possible for a public/non authenticated user to access this resource. It's probably very unlikely that a Public role has access to this resource, but now it's possible to set PUBLIC_ROLE_LIKE = "Admin".

@@ -503,6 +504,50 @@ def data(self) -> Response:

return response

@expose("/data/stop", methods=["POST"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional: maybe it would be simpler to set this endpoint to /data/stop/<id> HTTP POST or PUT, no need for a payload

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why if this endpoint takes a dashboard_id value, does it live in the charts namespace?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the dashboard namespace was considered, the main use case for this endpoint is to terminate pending chart data queries, hence the chart namespace. The signature will likely be broadened in the future, making it possible to cancel queries based on other parameters (cache_key, slice_id etc). This endpoint will likely change significantly when the async framework is fully implemented, and will be revisited then.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel there could be 2 use cases: terminate pending query(s) by dashboard_id, or terminate pending query by chart id. Currently airbnb has internal API to do 1), but i feel in the future (when the async framework is fully implemented), Superset should be able to do both.

content:
application/json:
schema:
type: object
Copy link
Member

@dpgaspar dpgaspar Oct 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need content on a empty response, also see if HTTP 204 makes sense

@@ -309,6 +313,8 @@ def _try_json_readsha( # pylint: disable=unused-argument
"TAGGING_SYSTEM": False,
"SQLLAB_BACKEND_PERSISTENCE": False,
"LISTVIEWS_DEFAULT_CARD_VIEW": False,
# stop pending queries when user close/reload dashboard in browser
"STOP_DASHBOARD_PENDING_QUERIES": False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is behind a feature flag we can go even further and not register the endpoint at all. Take a look at: https://github.com/apache/incubator-superset/blob/master/superset/dashboards/api.py#L173

@stale
Copy link

stale bot commented Dec 25, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

@stale stale bot added the inactive Inactive for >= 30 days label Dec 25, 2020
@amitmiran137
Copy link
Member

closing due to inactivity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
inactive Inactive for >= 30 days size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants