-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Stop pending queries when user close dashboard #10836
Changes from 6 commits
1001523
eeb9e33
7e52c2d
8b1961b
40bb23c
0825159
dc00f15
5051558
1c302df
847b740
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -172,7 +172,11 @@ def _try_json_readsha( # pylint: disable=unused-argument | |
WTF_CSRF_ENABLED = True | ||
|
||
# Add endpoints that need to be exempt from CSRF protection | ||
WTF_CSRF_EXEMPT_LIST = ["superset.views.core.log", "superset.charts.api.data"] | ||
WTF_CSRF_EXEMPT_LIST = [ | ||
"superset.views.core.log", | ||
"superset.charts.api.data", | ||
"superset.views.core.stop_dashboard_queries", | ||
] | ||
|
||
# Whether to run the web server in debug mode or not | ||
DEBUG = os.environ.get("FLASK_ENV") == "development" | ||
|
@@ -309,6 +313,8 @@ def _try_json_readsha( # pylint: disable=unused-argument | |
"SIP_38_VIZ_REARCHITECTURE": False, | ||
"TAGGING_SYSTEM": False, | ||
"SQLLAB_BACKEND_PERSISTENCE": False, | ||
# stop pending queries when user close/reload dashboard in browser | ||
"STOP_DASHBOARD_PENDING_QUERIES": False, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If this is behind a feature flag we can go even further and not register the endpoint at all. Take a look at: https://github.com/apache/incubator-superset/blob/master/superset/dashboards/api.py#L173 |
||
} | ||
|
||
# This is merely a default. | ||
|
@@ -627,11 +633,11 @@ class CeleryConfig: # pylint: disable=too-few-public-methods | |
# db configuration and a result of this function. | ||
|
||
# mypy doesn't catch that if case ensures list content being always str | ||
ALLOWED_USER_CSV_SCHEMA_FUNC: Callable[ | ||
["Database", "models.User"], List[str] | ||
] = lambda database, user: [ | ||
UPLOADED_CSV_HIVE_NAMESPACE | ||
] if UPLOADED_CSV_HIVE_NAMESPACE else [] | ||
ALLOWED_USER_CSV_SCHEMA_FUNC: Callable[["Database", "models.User"], List[str]] = ( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
lambda database, user: [UPLOADED_CSV_HIVE_NAMESPACE] | ||
if UPLOADED_CSV_HIVE_NAMESPACE | ||
else [] | ||
) | ||
|
||
# Values that should be treated as nulls for the csv uploads. | ||
CSV_DEFAULT_NA_NAMES = list(STR_NA_VALUES) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1025,3 +1025,13 @@ def get_extra_params(database: "Database") -> Dict[str, Any]: | |
logger.error(ex) | ||
raise ex | ||
return extra | ||
|
||
@classmethod | ||
def stop_queries(cls, username: str, dashboard_id: int) -> None: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we name this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. airbnb's internal API accept username and dashboard_id :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. curious if there is an interest to have in in superset as well e.g. use smth like: However a challenge here would be tracking all queries and there ids There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. correct. Dashboard is running in synchronized mode, there is no query id passed from query engine to dashboard. While in SQL lab, which is running in asynchronized mode, query id is saved into database, and celery Worker will update query status. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if you find another solution, i am happy to learn :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would it be possible to introduce a hook here that can either link a database or a db engine spec to a function that handles the query termination? Something similar to what |
||
""" | ||
An empty function. The actual stop implementation depends on the engine | ||
|
||
:param: username: user sends out queries | ||
:param dashboard_id: dashboard has charts that waiting for queries | ||
""" | ||
return None |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -118,6 +118,7 @@ | |
check_slice_perms, | ||
get_cta_schema_name, | ||
get_dashboard_extra_filters, | ||
get_database_ids, | ||
get_datasource_info, | ||
get_form_data, | ||
get_viz, | ||
|
@@ -1421,6 +1422,25 @@ def fave_slices( # pylint: disable=no-self-use | |
payload.append(dash) | ||
return json_success(json.dumps(payload, default=utils.json_int_dttm_ser)) | ||
|
||
@api | ||
@has_access_api | ||
@event_logger.log_this | ||
@expose("/dashboard/<int:dashboard_id>/stop/", methods=["POST"]) | ||
def stop_dashboard_queries( # pylint: disable=no-self-use | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It may be useful to sync with @dpgaspar on using api v1. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. in open source code base, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. it would be nice to have stop_queries implementation in the test config.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @bkyryliuk do you have an existed example, what is There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think as I mentioned above, core apis are deprecated and new ones should be added here: https://github.com/apache/incubator-superset/blob/master/superset/dashboards/api.py @dpgaspar & @villebro would probably know more on this topic |
||
self, dashboard_id: int | ||
) -> FlaskResponse: | ||
if is_feature_enabled("STOP_DASHBOARD_PENDING_QUERIES"): | ||
username = g.user.username | ||
database_ids = get_database_ids(dashboard_id) | ||
|
||
# stop pending query is only available for certain database(s) | ||
for dbid in database_ids: | ||
mydb = db.session.query(models.Database).get(dbid) | ||
if mydb: | ||
mydb.db_engine_spec.stop_queries(username, int(dashboard_id)) | ||
|
||
return Response(status=200) | ||
|
||
@event_logger.log_this | ||
@api | ||
@has_access_api | ||
|
@@ -1778,7 +1798,7 @@ def sync_druid_source(self) -> FlaskResponse: # pylint: disable=no-self-use | |
@expose("/get_or_create_table/", methods=["POST"]) | ||
@event_logger.log_this | ||
def sqllab_table_viz(self) -> FlaskResponse: # pylint: disable=no-self-use | ||
""" Gets or creates a table object with attributes passed to the API. | ||
"""Gets or creates a table object with attributes passed to the API. | ||
|
||
It expects the json with params: | ||
* datasourceName - e.g. table name, required | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -213,6 +213,33 @@ def get_datasource_info( | |
return datasource_id, datasource_type | ||
|
||
|
||
def get_database_ids(dashboard_id: int) -> List[int]: | ||
""" | ||
Find all database ids used by a given dashboard | ||
|
||
:param dashboard_id: The dashboard id | ||
:returns: A list of database ids used by the given dashboard | ||
""" | ||
dashboard = db.session.query(Dashboard).filter_by(id=dashboard_id).one() | ||
slices = dashboard.slices | ||
datasource_ids: Set[int] = set() | ||
database_ids: Set[int] = set() | ||
|
||
for slc in slices: | ||
datasource = slc.datasource | ||
if ( | ||
datasource | ||
and datasource.type == "table" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. only return database id for |
||
and datasource.id not in datasource_ids | ||
): | ||
datasource_ids.add(datasource.id) | ||
database = datasource.database | ||
if database: | ||
database_ids.add(database.id) | ||
|
||
return list(database_ids) | ||
|
||
|
||
def apply_display_max_row_limit( | ||
sql_results: Dict[str, Any], rows: Optional[int] = None | ||
) -> Dict[str, Any]: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1237,6 +1237,22 @@ def test_get_column_names_from_metric(self): | |
"my_col" | ||
] | ||
|
||
@mock.patch.dict( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added a test for |
||
"superset.extensions.feature_flag_manager._feature_flags", | ||
{"STOP_DASHBOARD_PENDING_QUERIES": True}, | ||
clear=True, | ||
) | ||
def test_stop_dashboard_queries(self): | ||
username = "admin" | ||
self.login(username) | ||
dashboard = self.get_dash_by_slug("births") | ||
with mock.patch.object(BaseEngineSpec, "stop_queries") as mock_stop_queries: | ||
resp = self.client.post(f"/superset/dashboard/{dashboard.id}/stop/") | ||
|
||
self.assertTrue(is_feature_enabled("STOP_DASHBOARD_PENDING_QUERIES")) | ||
self.assertEqual(resp.status_code, 200) | ||
mock_stop_queries.assert_called_once_with(username, dashboard.id) | ||
|
||
|
||
if __name__ == "__main__": | ||
unittest.main() |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -23,6 +23,7 @@ | |
import json | ||
import os | ||
import re | ||
from typing import List | ||
from unittest.mock import Mock, patch | ||
|
||
import numpy | ||
|
@@ -32,6 +33,7 @@ | |
|
||
import tests.test_app | ||
from superset import app, db, security_manager | ||
from superset.connectors.base.models import BaseDatasource | ||
from superset.exceptions import CertificateException, SupersetException | ||
from superset.models.core import Database, Log | ||
from superset.models.dashboard import Dashboard | ||
|
@@ -45,6 +47,7 @@ | |
get_form_data_token, | ||
get_iterable, | ||
get_email_address_list, | ||
get_example_database, | ||
get_or_create_db, | ||
get_since_until, | ||
get_stacktrace, | ||
|
@@ -67,6 +70,7 @@ | |
from superset.utils import schema | ||
from superset.views.utils import ( | ||
build_extra_filters, | ||
get_database_ids, | ||
get_form_data, | ||
get_time_range_endpoints, | ||
) | ||
|
@@ -1134,3 +1138,80 @@ def test_get_form_data_token(self): | |
assert get_form_data_token({"token": "token_abcdefg1"}) == "token_abcdefg1" | ||
generated_token = get_form_data_token({}) | ||
assert re.match(r"^token_[a-z0-9]{8}$", generated_token) is not None | ||
|
||
def test_get_database_ids(self) -> None: | ||
world_health = db.session.query(Dashboard).filter_by(slug="world_health").one() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. let's tests some edge cases here:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There is no There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. wow nice, what about iframe chart? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
dash_id = world_health.id | ||
database_ids = get_database_ids(dash_id) | ||
assert len(database_ids) == 1 | ||
assert database_ids == [get_example_database().id] | ||
|
||
def test_get_database_ids_empty_dash(self) -> None: | ||
# test dash with no slice | ||
dashboard = Dashboard(dashboard_title="no slices", id=101, slices=[]) | ||
with patch("superset.db.session.query") as mock_query: | ||
mock_query.return_value.filter_by.return_value.one.return_value = dashboard | ||
database_ids = get_database_ids(dashboard.id) | ||
assert database_ids == [] | ||
|
||
def test_get_database_ids_multiple_databases(self) -> None: | ||
# test dash with 2 databases | ||
datasource_1 = Mock() | ||
datasource_1.type = "table" | ||
datasource_1.datasource_name = "table_datasource_1" | ||
datasource_1.database = Mock() | ||
|
||
datasource_2 = Mock() | ||
datasource_2.type = "table" | ||
datasource_2.datasource_name = "table_datasource_2" | ||
datasource_2.database = Mock() | ||
|
||
slices = [ | ||
Slice( | ||
datasource_id=datasource_1.id, | ||
datasource_type=datasource_1.type, | ||
datasource_name=datasource_1.datasource_name, | ||
slice_name="slice_name_1", | ||
), | ||
Slice( | ||
datasource_id=datasource_2.id, | ||
datasource_type=datasource_2.type, | ||
datasource_name=datasource_2.datasource_name, | ||
slice_name="slice_name_2", | ||
), | ||
] | ||
dashboard = Dashboard(dashboard_title="with 2 slices", id=102, slices=slices) | ||
with patch("superset.db.session.query") as mock_query: | ||
mock_query.return_value.filter_by.return_value.one.return_value = dashboard | ||
mock_query.return_value.filter_by.return_value.first.side_effect = [ | ||
datasource_1, | ||
datasource_2, | ||
] | ||
database_ids = get_database_ids(dashboard.id) | ||
self.assertCountEqual( | ||
database_ids, [datasource_1.database.id, datasource_2.database.id] | ||
) | ||
|
||
def test_get_database_ids_druid(self) -> None: | ||
druid_datasource = Mock() | ||
druid_datasource.type = "druid" | ||
druid_datasource.datasource_name = "druid_datasource_1" | ||
druid_datasource.cluster = Mock() | ||
|
||
slices = [ | ||
Slice( | ||
datasource_id=druid_datasource.id, | ||
datasource_type=druid_datasource.type, | ||
datasource_name=druid_datasource.datasource_name, | ||
slice_name="slice_name_1", | ||
), | ||
] | ||
dashboard = Dashboard(dashboard_title="druid dash", id=103, slices=slices) | ||
with patch("superset.db.session.query") as mock_query: | ||
mock_query.return_value.filter_by.return_value.one.return_value = dashboard | ||
mock_query.return_value.filter_by.return_value.first.return_value = ( | ||
druid_datasource | ||
) | ||
database_ids = get_database_ids(dashboard.id) | ||
# druid slice has no database id | ||
assert database_ids == [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this dangerous? I assume this means that anyone who steals a session cookie could constantly kill all that user's queries?
it's probably fine for internal deployments, but I wonder if there's a way we could do this without removing CSRF protections
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know 2 APIs here that don't use CSRF token: log and stop_dashboard_queries, both of them use
sendbeacon
which can't send CSRF token. The advantage of sendbeacon vs regular POST is here: it doesn't need to wait response. And it's pretty common for sendbeacon call go without CSRF token.CSRF token is used to prevent malicious site from executing some transaction, like move money from your bank account to mine :) I feel kill other ppl's queries when their dashboard is still loading, is not very dangerous.
But for
superset.charts.api.data
i am not sure why it is in the exempt list. But this is not related to this PR.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason
superset.charts.api.data
is exempt is because it is only a POST due to having a large request payload that doesn't sit well with a GET. Therefore it's not really a state changing POST, but a simulated GET.