feat: update timeout error UX #10274

etr2460 · 2020-07-09T18:30:06Z

SUMMARY

Implements new error designs within Superset, starting with timeout errors. This standardizes all timeout errors, frontend and backend, to the UI component. Future errors will be built following this same pattern. Also adds documentation pointers for potential root causes so users can unblock themselves

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TEST PLAN

CI + unit tests
Set the timeouts to 1 second and see the new error component across Dashboards, Explore, and SQL Lab. Ensure everything renders correctly when there are no errors.

ADDITIONAL INFORMATION

to: @graceguo-supercat @ktmud @nytai @rusackas
cc: @Steejay

superset-frontend/src/components/ErrorMessage/TimeoutErrorMessage.tsx

ktmud

Great feature! Added a couple of comments.

ktmud · 2020-07-09T18:52:04Z

superset-frontend/src/components/ErrorMessage/TimeoutErrorMessage.tsx

+      {['chart', 'dashboard'].includes(extra.location) && extra.owners && (
+        <>
+          <br />
+          <p>{t('Please reach out to the Chart Owner(s) for assistance.')}</p>


nit: why is "Chart Owners" in title case?

This is as defined in @Steejay's specs, which would you prefer here?

I can go either way if we are considering Chart Owner a proper noun for Superset and will be consistent everywhere.

@etr2460 I interpreted Chart Owners as the title of a role so I defined it w title case.

superset-frontend/src/components/ErrorMessage/TimeoutErrorMessage.tsx

ktmud · 2020-07-09T19:04:19Z

superset-frontend/src/components/ErrorMessage/TimeoutErrorMessage.tsx

+            </>
+          )}
+        </div>
+      )}


Instead of two big code chunks, how about creating two subcomponents SimpleTimeoutErrorMessage and DetailedTimeoutErrorMessage?

Let me look into simplifying this a bit more. I might pull some things out into smaller, more reusable components. It's a bit tricky, because depending on where the error is being rendered, it's supposed to look different (sometimes using a modal, sometimes expanding inline)

ktmud · 2020-07-09T19:16:38Z

superset-frontend/src/components/ErrorMessage/constants.ts

+export const ERROR_DATASOURCE_TOO_LARGE = {
+  message: t('Error 1000 - The datasource is too large to query.'),
+  link: `${ROOT_ERROR_REFERENCE_URL}error-1000`,
+};


Should this be more structural? I'm imagining a errors.json somewhere that could potentially be shared by both the frontend and doc generator:

{ "DATASOURCE_TOO_LARGE": { "code": 1001, "title": "The database is too large to query", "description": "It's likely your datasource has grown too large to run the current query, and is timing out. You can resolve this by reducing the size of your datasource or by modifying your query to only process a subset of your data." }, "DATASOURCE_OVERLOAD": { "code": 1002, .... } }

+1 this. Error messages should be defined in API level. So that API user (that use json response without js rendered chart) should get the same message when some query went wrong.
Front-end should handle how to render error by its code, not define what is this error.

So there are a couple things being conflated here I think. First off, an idea of API error codes. These are already consistently defined and described on both the backend and the frontend here:
https://github.com/apache/incubator-superset/blob/master/superset/errors.py#L24

and here:
https://github.com/apache/incubator-superset/blob/master/superset-frontend/src/components/ErrorMessage/types.ts#L21

Part of the problem is that each of these errors from a software perspective could have multiple possible root causes and mitigations from the user's perspective. In this case, a timeout error can have multiple different root causes and mitigations, and these root causes/mitigations could apply to many different software errors too. We could add another field onto the error payload coming from the backend to include root causes and mitigations, but I also want to ensure we have freedom on the frontend to render these errors in whatever rich format we see fit. Thus why the software's idea of errors exists on the backend, but the user interpretations live in the frontend.

Finally, as to making the errors more structural: I agree this can be defined better, I'll restructure the TS object. However, I'm a bit anxious about investing a bunch of time into autogenning docs from it since we just voted to change doc systems a couple days ago. It might be better to wait for the doc migration, then to build an autogenned doc page with error details

That makes sense. Didn't think it was necessary to build autogen right now either.

superset-frontend/src/components/Icon.tsx

superset/utils/core.py

graceguo-supercat · 2020-07-09T21:19:23Z

Do you have specifications on what are errors, what are warnings etc?
Do you have UI samples for warning message?

ktmud · 2020-07-09T23:52:35Z

superset/exceptions.py

+        super(SupersetTimeoutException, self).__init__(message)
+        self.error = SupersetError(
+            error_type=error_type, message=message, level=level, extra=extra
+        )


I'm wondering whether we can just extend the attributes of SupersetError directly on SupersetException and use a custom JSON encoder to serialize it. This way you remove a layer of abstraction and the need to map the arguments for any new errors your might add in the future,

Hmm, possibly. I'm not super familiar with what the plan is for exception handling on the Python side, but having the SupersetExceptions automatically contain the error properties might make sense.

The reason I shied away from it originally was so that it was very clear what attributes the clients need to render the errors and what parts are server specific. But maybe that's not a big issue. Maybe @john-bodley has thoughts?

@ktmud could you expand (possibly with an example) on your thoughts? Are you looking for more of a generic container for handling error properties?

I was thinking maybe subclasses of SupersetException itself can handle encoding of error code and messages. Here's an example:

class SupersetException(Exception, SupersetAlert): def __init__(...) -> None: SupersetAlert.__init__( self, code=code, message=message, description=description, extra=extra ) Exception.__init__(self, self.message) def json_error_response(error: SupersetException) -> FlaskResponse: return jsonify({ "error": error.to_json() })

Full gist here

codecov-commenter · 2020-07-10T17:07:05Z

Codecov Report

Merging #10274 into master will increase coverage by 4.88%.
The diff coverage is 32.40%.

@@            Coverage Diff             @@
##           master   #10274      +/-   ##
==========================================
+ Coverage   65.57%   70.46%   +4.88%     
==========================================
  Files         600      602       +2     
  Lines       32101    32177      +76     
  Branches     3244     3263      +19     
==========================================
+ Hits        21051    22672    +1621     
+ Misses      10869     9401    -1468     
+ Partials      181      104      -77

Flag	Coverage Δ
#cypress	`55.22% <14.28%> (?)`
#javascript	`59.29% <20.22%> (-0.29%)`	⬇️
#python	`69.77% <57.89%> (+0.01%)`	⬆️

Impacted Files	Coverage Δ
superset-frontend/src/chart/Chart.jsx	`59.25% <0.00%> (+48.84%)`	⬆️
superset-frontend/src/chart/chartReducer.js	`68.85% <ø> (+49.80%)`	⬆️
...frontend/src/components/ErrorMessage/ErrorCode.tsx	`0.00% <0.00%> (ø)`
...onents/ErrorMessage/ErrorMessageWithStackTrace.tsx	`26.31% <0.00%> (ø)`
...rset-frontend/src/components/ErrorMessage/types.ts	`100.00% <ø> (ø)`
.../src/dashboard/components/gridComponents/Chart.jsx	`86.02% <ø> (+20.43%)`	⬆️
...frontend/src/dashboard/reducers/getInitialState.js	`68.53% <ø> (+68.53%)`	⬆️
...uperset-frontend/src/dashboard/util/propShapes.jsx	`92.85% <ø> (ø)`
...ntend/src/explore/components/ExploreChartPanel.jsx	`72.22% <ø> (+61.11%)`	⬆️
superset/models/slice.py	`84.65% <ø> (ø)`
... and 169 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b316f72...c44b8aa. Read the comment docs.

graceguo-supercat · 2020-07-10T17:11:48Z

superset-frontend/src/components/ErrorMessage/TimeoutErrorMessage.tsx

+  const message = (
+    <>
+      <p>
+        {t('This may be triggered by:')}


I am thinking the logic for this may be triggered by, should come from a standard, general json format generated from backend.
front-end should not decide what error is caused by what reason. Front-end timeout limit is defined in config, it's not something decided by front-end.

I've introduced an error_codes mapping within extra that gets passed down from the backend. You're still free to alter it on the frontend as needed, but it consolidates potential root causes to the backend with the more developer helpful error_type

ktmud · 2020-07-14T00:46:27Z

superset/errors.py

+            "message": _("Error 1001 - The database is under an unusual load."),
+        },
+    ]
+}


Hmm, just realized a bigger question: are there cases that the backend will be able to tell which is the actual reason? If not, why can't they be the same error code?

## Error 1000 - Timeout communicating with datasource. Your datasource may be too large to query or was under an unusual load.

It's also possible to remove such mapping altogether:

class SupersetError(IntEnum): BACKEND_TIMEOUT = 1001 class SupersetException(Exception): code = None def __init__(self, message=None, description=None): if not self.code: raise RuntimeError('Must not raise SupersetException directly') error_name = SupersetError(self.code).name self.message = message or _(f'error.{error_name}.message') self.description = description or _(f'error.{error_name}.message') class SupersetTimeoutException(SupersetException): code = SupersetErrorEnum.BACKEND_TIMEOUT

Then add the detailed message and explanation in i18n.

I think this again points to the differentiation between the actual error and potential root causes. In this case, Error 1001 might be the root cause for many different errors (a timeout error, a generic DB error, a 503/504 from the db engine, etc.). That was the reason for splitting them into two different structures. However, I agree it's getting into a zone of too many different abstraction layers with both the error_type and the error_codes...

@Steejay, this is something where your thoughts would be helpful: e.g. what relationship do you see between error types and the in product user facing error codes?

The goal of the error code is to suggest a path to a solution by highlighting the potential sources. Ideally we'd like to design a message that can pinpoint the source but this can't always be achieved so instead we try to provide as much context as possible to let the technical user troubleshoot. But youre right, the same error source can lead to different errors.

I've expanded my thoughts a little and here's a more detailed example: https://gist.github.com/ktmud/59ccbdf6c96570e1ffe41ee8e5ecd479

Let me know if anything was unclear.

Even if an error may represent different root causes, we can return a generic error code and potential causes in extras if needed. If the backend is able to identify the root cause, then cool, we return a more specific error code. But there doesn't have to be another layer of abstraction with all the mappings.

Yes, I agree error codes and possible root causes shouldn't be mixed together as suggesting root causes and providing remedies are more of a documentation issue rather than something the program cares about.

I've removed the description field in my example code, but other parts of the code still feel like an improvement to me though. There really is no need to have two separate entities for exception catching and messaging.

As for the mapping between symptoms and root causes, maybe we can extend the errors.json file I suggested earlier and add a causes field?

{ "issues": { "DATASOURCE_TOO_LARGE": { "code": 1001, "title": "The database is too large to query", "description": "It's likely your datasource has grown too large to run the current query, and is timing out. You can resolve this by reducing the size of your datasource or by modifying your query to only process a subset of your data." }, "DATASOURCE_OVERLOAD": { "code": 1002, // .... }, }, "causes": { "BACKEND_TIMEOUT_ERROR": ["DATASOURCE_TOO_LARGE", "DATASOURCE_OVERLOAD"], } }

I think we might need separate entities for exception catching and error messaging. One example is for async queries in sql lab. We would catch an exception from the query failing, but then only want to write a subset of attributes to the query table for returning to the client later. Then when the client polls for query status, we're returning an error payload within that request, but no exception is occuring. They seem like 2 different entities to me, and I'm not sure how much is improved by having a complex inheritance tree consisting of both native Python Exceptions and UI specific payload classes

They are indeed two entities, one is runtime exception, another is the result payload used by the API. But it doesn't mean the payload cannot be generated by exception itself.

The improvement is that you don't have to manually generate the payload every time an exception occurs. You just store relevant information as top-level attributes on the Exception and some general error handlers will return a consistent payload whenever needed. The payload doesn't have to be UI specific. We can certainly return different payloads for storage and API by adding arguments to the to_json method, if that's what we want.

I just find it weird that you had to initialize two classes in order to raise an API-conforming exception. We already use class inheritance for Exceptions and used the pattern extensively for superset.commands.exceptions, so I'm not sure an additional mixin to serialize the exception is that much of complexity.

@villebro @dpgaspar curious to hear your thoughts on this, too.

hmm, i haven't seen the superset.commands.exceptions module yet, i'll look at that and see how other parts of the code handles errors. This has been a wild west of sorts on the backend side, and I've mainly been focused on making sure whatever happens on the backend, the frontend gets something consistent. If there's a more standard approach now (i think it was discussed in SIP-41) then i'm happy to follow it

ktmud

LGTM with a caveat of some potential future refactoring:

Create an extended SupersetErrorAlert component that renders conforming SupersetError in consistent format and make TimeoutErrorMessage reusable.
(Optional) Consolidate SupersetError and SupersetException in the backend.

john-bodley · 2020-07-14T01:13:37Z

superset/exceptions.py

+        super(SupersetTimeoutException, self).__init__(message)
+        self.error = SupersetError(
+            error_type=error_type, message=message, level=level, extra=extra
+        )


@ktmud could you expand (possibly with an example) on your thoughts? Are you looking for more of a generic container for handling error properties?

superset/utils/core.py

etr2460 · 2020-07-20T16:24:11Z

@graceguo-supercat and @ktmud, sorry for the delayed response here. I think this is in a good place for final review now. Some significant changes from before:

I've moved all issue strings from the frontend to the backend, so any client can generate appropriate error messages from the payload returned
I'm using the @handle_api_error wrapper on the backend to construct the appropriate error response when sql queries timeout. This seems to be the standard approach thus far and aligns with how SupersetSecurityException is handled in the new error format. Completely changing the handling methods on the backend seems out of scope here, but I'm going to move the conversation we had into SIP-41 as it seems quite pertinent to it
Error Codes have been renamed to Issue Codes as that wording seems more clear

Thanks for taking the time to be diligent here and get this PR into a better state than it started. Hopefully it should be good to go!

john-bodley · 2020-07-20T16:27:13Z

Are the constructs of how the errors are defined still a blocking issue? If so SIP-41 describes the somewhat current state of how errors are handled by the backend and there is clearly a number of inconsistencies which need to be addressed. I wonder whether it makes sense to flush out the architecture of error constructs in SIP-41 as the problem is far-reaching and beyond just the scope of the issues/concerns raised here.

ktmud

Yes, never intended to let the discussions regarding error handling in the backend block this PR. Sorry for not stamping earlier!

graceguo-supercat · 2020-07-20T21:26:45Z

superset-frontend/src/SqlLab/main.less

@@ -409,6 +409,10 @@ div.tablePopover {
  padding-right: 8px;
 }

+.ResultSetErrorMessage {


do you mind use "-" not case in css class?

my bad, updated!

graceguo-supercat

one lint comment for css class name. otherwise LGTM!

pull-request-size bot added the size/XL label Jul 9, 2020

ktmud reviewed Jul 9, 2020

View reviewed changes

superset-frontend/src/components/ErrorMessage/TimeoutErrorMessage.tsx Outdated Show resolved Hide resolved

ktmud reviewed Jul 9, 2020

View reviewed changes

etr2460 force-pushed the erik-ritter--timeout-error-ux branch from 1c4437d to cc26a70 Compare July 9, 2020 19:35

etr2460 force-pushed the erik-ritter--timeout-error-ux branch 4 times, most recently from 0c3ebcf to 8aa7e32 Compare July 9, 2020 23:39

ktmud reviewed Jul 9, 2020

View reviewed changes

etr2460 force-pushed the erik-ritter--timeout-error-ux branch from 8aa7e32 to 00f8ecd Compare July 10, 2020 16:59

graceguo-supercat reviewed Jul 10, 2020

View reviewed changes

etr2460 force-pushed the erik-ritter--timeout-error-ux branch 6 times, most recently from 24c85f7 to c44b8aa Compare July 14, 2020 00:23

ktmud reviewed Jul 14, 2020

View reviewed changes

etr2460 mentioned this pull request Jul 14, 2020

feat: update timeout code to return new error format apache-superset/superset-ui#667

Closed

john-bodley reviewed Jul 14, 2020

View reviewed changes

etr2460 force-pushed the erik-ritter--timeout-error-ux branch 4 times, most recently from db3d101 to d041659 Compare July 20, 2020 16:23

etr2460 mentioned this pull request Jul 20, 2020

[SIP-41] Proposal for Alignment of Backend Error Handling #9298

Open

ktmud approved these changes Jul 20, 2020

View reviewed changes

graceguo-supercat reviewed Jul 20, 2020

View reviewed changes

graceguo-supercat approved these changes Jul 20, 2020

View reviewed changes

feat: update timeout error UX

e41361f

etr2460 force-pushed the erik-ritter--timeout-error-ux branch from d041659 to e41361f Compare July 20, 2020 21:36

etr2460 merged commit 5fa4680 into apache:master Jul 20, 2020

auxten pushed a commit to auxten/incubator-superset that referenced this pull request Nov 20, 2020

feat: update timeout error UX (apache#10274)

8ef08f2

ktmud mentioned this pull request Jul 8, 2021

chore: simplify errors and issue codes #15437

Merged

8 tasks

mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.38.0 labels Mar 12, 2024

feat: update timeout error UX #10274

feat: update timeout error UX #10274

Conversation

etr2460 commented Jul 9, 2020

SUMMARY

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TEST PLAN

ADDITIONAL INFORMATION

ktmud left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ktmud Jul 9, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ktmud Jul 9, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

graceguo-supercat commented Jul 9, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov-commenter commented Jul 10, 2020 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ktmud Jul 14, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ktmud left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

etr2460 commented Jul 20, 2020

john-bodley commented Jul 20, 2020

ktmud left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

graceguo-supercat left a comment

Choose a reason for hiding this comment

ktmud Jul 9, 2020 •

edited

Loading

ktmud Jul 9, 2020 •

edited

Loading

codecov-commenter commented Jul 10, 2020 •

edited

Loading

ktmud Jul 14, 2020 •

edited

Loading