[Logs UI] Enhance the Logs UI Analysis setup flow #44407

Closed
Kerry350 opened this issue Aug 29, 2019 · 9 comments · Fixed by #44990
Assignees: Kerry350
Labels: Feature:Logs UI, Team:Infra Monitoring UI, v7.5.0, v8.0.0

Comments


Kerry350 commented Aug 29, 2019

Summary

The analysis setup page should be enhanced to use a wizard / steps style flow. This will utilise a new API to perform cleanup / reset if needed in a failure scenario.

Rationale

In #43774 some cleanup functionality was introduced to try to handle failure scenarios during setup. However, it isn't 100% robust and isn't actually visible to the user.

Acceptance criteria

  • The Analysis setup page should be enhanced to introduce a wizard / steps style flow. The UI will revolve mainly around the EUI Steps component (a rough sketch follows this list).

  • Step 1 will contain explanatory text and configurable options (as it stands, these are the start and end times), with a setup button. When the setup button is clicked, step 2 stops being disabled.

  • Step 2, once enabled, should display the setup progress / loading state.

    • If this succeeds, the step transitions to a complete state and displays a "View results" button. When clicked, this goes to the results screen.
    • If this fails, the step transitions to a danger state, displays an error message, and offers a "Try again" button.
      • If the "Try again" button is clicked, the step will display the progress of the retry operation, then loop back to the aforementioned complete or danger state depending on the result.
  • A new API endpoint will be provided to perform the cleanup / reset. The useLogAnalysisCleanup hook should be altered to make use of this new endpoint. In future work this reset API will also be extended to a dedicated reset page (which users can invoke themselves at any time post setup).
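
Purely for illustration, here is a minimal sketch of how the two steps could map onto the EUI Steps component. The `setupStatus` union, the handler props, and the exact status mapping are assumptions for the sketch, not the final implementation:

```tsx
import React from 'react';
import { EuiButton, EuiLoadingSpinner, EuiSteps } from '@elastic/eui';

// Hypothetical status union for illustration only.
type SetupStatus = 'notStarted' | 'pending' | 'failed' | 'succeeded';

interface AnalysisSetupStepsProps {
  setupStatus: SetupStatus;
  onSetup: () => void; // kicks off the ML module setup
  onRetry: () => void; // cleanup via the new endpoint, then set up again
  onViewResults: () => void;
}

export const AnalysisSetupSteps: React.FC<AnalysisSetupStepsProps> = ({
  setupStatus,
  onSetup,
  onRetry,
  onViewResults,
}) => (
  <EuiSteps
    steps={[
      {
        title: 'Configure the analysis',
        children: (
          // Explanatory text and the optional start / end time pickers
          // would live here, alongside the setup button.
          <EuiButton onClick={onSetup} isDisabled={setupStatus === 'pending'}>
            Create ML jobs
          </EuiButton>
        ),
      },
      {
        title: 'Set up the ML jobs',
        // Step 2 stays disabled until setup is started, then reflects
        // progress, failure (danger), or success (complete).
        status:
          setupStatus === 'notStarted'
            ? 'disabled'
            : setupStatus === 'failed'
            ? 'danger'
            : setupStatus === 'succeeded'
            ? 'complete'
            : 'incomplete',
        children:
          setupStatus === 'pending' ? (
            <EuiLoadingSpinner size="m" />
          ) : setupStatus === 'failed' ? (
            <EuiButton color="danger" onClick={onRetry}>
              Try again
            </EuiButton>
          ) : setupStatus === 'succeeded' ? (
            <EuiButton fill onClick={onViewResults}>
              View results
            </EuiButton>
          ) : null,
      },
    ]}
  />
);
```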

Kerry350 added the v8.0.0, Feature:Logs UI, Team:Infra Monitoring UI, v7.5.0, :Logs UI labels on Aug 29, 2019
Kerry350 self-assigned this on Aug 29, 2019
elasticmachine commented

Pinging @elastic/infra-logs-ui


Kerry350 commented Sep 3, 2019

Okay, I've got a first pass at this implemented on a branch. Everything can be refined, of course 😄 @weltenwort @jasonrhodes @hbharding @Zacqary - wondering if you have any direct feedback here before I open a PR, especially around the caveats.

There are two steps, utilising the EUI Steps component.

Visuals

Step 1:

By default: [screenshot]

When changing the start and / or end times: [screenshot]

Interaction with this step is entirely optional for now. In the future this would include extra parameters we need.

Step 2:

By default: [screenshot]

Contains the "Create resources" button. Interaction with this button then transitions into 3 states, they are:

Setting up or retrying after a failure: [screenshot]

Failure: [screenshot]

Clicking "Try again" transitions into the above state in a retry.

Success: [screenshot]

Shows the "View results" button which transitions through to the results page.

GIFs:

Success flow: [gif]

Failure flow: [gif]

Behind the scenes

When the "Create resources" (I changed this language from "jobs" as I don't think we need the user to know or understand ML terminology) button is clicked a call is made to setup the analysis ML module.

If we receive any errors in the response payload - for the jobs or datafeeds - we transition into the failure state. The important thing to note here is that the jobs and datafeeds may have been created successfully, but there might just be an error starting the datafeed.

The user may now request to "Try again"; this makes a call to our cleanup hook, which currently (totally subject to enhancement) deletes the jobs and the datafeeds. Once cleaned up, another request is made to set up the ML module.

If everything is successful, we simply allow transitioning over to the results.
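
Roughly sketched, the retry sequence looks like this. The endpoint paths, payload, and response shape below are placeholders for illustration, not the actual API; the real calls would go through the Kibana HTTP service and the useLogAnalysisCleanup hook:

```ts
// Illustrative only: paths and shapes are assumptions, not the real API.
async function cleanupAndRetrySetup(
  sourceId: string,
  start?: number,
  end?: number
): Promise<'succeeded' | 'failed'> {
  // 1. Tear down any partially created jobs and datafeeds first.
  await fetch(`/api/infra/log_analysis/cleanup/${sourceId}`, { method: 'POST' });

  // 2. Ask for the ML module to be set up again.
  const response = await fetch(`/api/infra/log_analysis/setup/${sourceId}`, {
    method: 'POST',
    body: JSON.stringify({ start, end }),
  });
  const { jobs, datafeeds } = await response.json();

  // 3. Any error on a job or a datafeed - including a datafeed that was
  // created but failed to start - counts as a setup failure.
  const hasErrors = [...jobs, ...datafeeds].some(
    (resource: { error?: unknown }) => resource.error !== undefined
  );
  return hasErrors ? 'failed' : 'succeeded';
}
```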

Caveats

There is a caveat in that, as mentioned above, the jobs and datafeeds may have been created successfully, but something went wrong starting the datafeed. During the explicit setup flow this is immediately flagged as a failure (if there was an error in the payload) and we show an error message and allow retrying.

However, after that failure, if the user were to refresh Kibana they would actually land on the results page. This is because the jobs and datafeeds were in fact created successfully; the datafeed just didn't start. Our check for "should we show setup or results" allows a merely created state to transition to results; the datafeed doesn't have to be started (sketched just below).
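
Reduced to a sketch, with illustrative state names:

```ts
// "Should we show setup or results?" - a created job is enough; the
// datafeed does not also have to be started. Names are illustrative.
type JobState = 'missing' | 'created' | 'started';

const shouldShowResults = (jobState: JobState): boolean =>
  jobState === 'created' || jobState === 'started';
```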

Now, we could of course extend the check to require everything to be both created and started, but I don't think that's the way to go. There are legitimate reasons for stopping the datafeed (saving resources etc.).

With the introduction of the new health status feature in #44414 I think this is a non-issue as the results page would flag up the fact that the datafeed isn't started.

The reason I flag this is purely because it might be disconcerting that you can go from a failure state on the setup page to a results page after the fact. Maybe we skip flagging start failures as a failure during setup? I'm not sure.

Also, with retrying moving to an explicit "Try again" action, if a user has a failed setup, then comes back at some later date and tries to set up again, it will fail at first ("you have a job with this ID" and so on). I tried to mitigate this before by having the cleanup happen directly after a failure, but I'm not sure that's right either.

Number of steps

@weltenwort I wanted to briefly touch on why I circled back to 2 steps after we discussed having 3. With 3 steps we actually end up with three problems:

  1. We end up with a "dead state" in step 2 after everything is successful. Step 3 would hold the success state with "View results", but that leaves nothing to comfortably sit in step 2 at that point, short of duplicating content.

  2. We end up disabling step 3 during the setup process in step 2.

  3. The EUI docs say step states are "used mostly as a final step when you need to make some sort of final check"; with 3 steps we really end up with two steps with a state.

And whilst I have absolutely no data to back it up, I think having 2 steps over 3 with nothing disabled will just feel like the process was "easier".


weltenwort commented Sep 3, 2019

I think that looks awesome, well done!

"Create resources" (I changed this language from "jobs" as I don't think we need the user to know or understand ML terminology)

While I agree that we don't want to burden the user with too many implementation details, "Create resources" sounds almost too abstract. By mentioning ML jobs we have a few advantages, I think:

  • the user notices that this is something that is based on the ML features she potentially purchased
  • the user has an indication of where to check if something goes wrong and "retry" doesn't fix the problem

However, after that failure if the user were to refresh Kibana they would actually land on the results page.

As you said, I don't think that will be an issue for long.

And whilst I have absolutely no data to back it up, I think having 2 steps over 3 with nothing disabled will just feel like the process was "easier".

The two steps look just fine to me. 👍

While this is not a PR review, one thing that popped out to me was how uninformative the pending state looks with just the spinner. Maybe we can add some "Creating ML jobs..." text beside it?

Overall I think this will help us scale better with more complex parameter choices and failure conditions. 🚀

weltenwort commented

ℹ️ Talking about failure conditions, I documented an edge case in #44652.


Kerry350 commented Sep 4, 2019

By mentioning ML jobs we have a few advantages, I think:

Fair points, I'll change this back.

one thing that popped out to me was how uninformative the pending state looks with just the spinner. Maybe we can add a "Creating ML jobs..." text besides that?

Thanks for the suggestion. I'll make that state a bit more informative. 👍

With regards to #44652, I'll add mitigation 1 (stopping the datafeed first) to this work, since it touches that area anyway (unless there are any objections).

jasonrhodes commented

Looks great to me! I think the success state could stand to have a bit of text too, maybe just "Your jobs have all been set up!" or "All jobs have been successfully set up." or something similar to that?


Zacqary commented Sep 6, 2019

Setup flow looks great! I'm actually just noticing a discrepancy in Step 1's wording: "by default, we'll analyze..." vs "by default, Machine Learning analyzes." Let's also use this next iteration to get those two messages consistent.

sophiec20 commented

+1 to reverting back to using familiar ML terminology, i.e. "jobs". Some users will already be familiar with ML, and it also makes it easier to relate to the ML docs if required.

From the UI mockups above, it does not seem that the ML errors are being passed back to the user. If the jobs are being started using the module setup (startDatafeed: true) Kibana API, then I believe an error object is returned containing the Elasticsearch error reasons.
Are the full errors being displayed? Perhaps they will be visible as a toast... This would greatly help the user to self-diagnose.
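
For instance, something along these lines could collect the reasons for display (e.g. in a toast); the response item shape here is an assumption based on the description above, not the documented API:

```ts
// Hypothetical shape of one job / datafeed entry in the module setup
// response; field names are assumptions for illustration.
interface SetupResponseItem {
  id: string;
  success?: boolean;
  error?: { msg?: string; body?: { error?: { reason?: string } } };
}

// Collect human-readable reasons so they can be surfaced to the user.
function collectErrorReasons(items: SetupResponseItem[]): string[] {
  return items
    .filter((item) => item.error !== undefined)
    .map((item) => {
      const reason =
        item.error?.body?.error?.reason ?? item.error?.msg ?? 'unknown error';
      return `${item.id}: ${reason}`;
    });
}
```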

I'd also like to ask if the concept of "Try again" could perhaps be revisited. In reading through some of the connected tickets I see a few reasons why a retry was proposed:

  • If the index did not exist - rather than wait for the ML setup to return failure, perhaps this could be a pre-req check, i.e. if no documents exist in the index then the button should be disabled so the user never tries to create the jobs.
  • If the ML setup fails because the job already exists - is this really a "permanent broken state"? If the job already exists, it may be running and healthy. Alternatively, the job may have been created but the job failed to open or the datafeed failed to start. In which case, jobs could just be started (subject to the underlying error) rather than deleted then started.

I know I do not have the full history here, so sorry to go over old ground, but this retry path has not been taken by other integrations and it seems to presume failures. If this is the case, then it's important for the ML team to understand why.


weltenwort commented Sep 10, 2019

If the index did not exist - rather than wait for the ML setup to return failure, perhaps this could be a pre-req check. i.e. if no documents exist in the index then the button should be disabled so the user never tries to create the jobs.

That's a good point, we should do that. 👍
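
A minimal sketch of such a pre-requisite check on the server side, assuming a helper with access to an Elasticsearch client (the client wiring and index pattern are placeholders):

```ts
import { Client } from '@elastic/elasticsearch';

// Returns true when the log indices contain at least one document, so
// the UI can disable the setup button otherwise. Illustrative only.
async function logIndicesHaveDocuments(
  client: Client,
  indexPattern: string
): Promise<boolean> {
  const response = await client.count({ index: indexPattern });
  return response.body.count > 0;
}
```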

If the ML setup fails because the job already exists - is this really a "permanent broken state"? If the job already exists, it may be running and healthy. Alternatively, the job may have been created but the job failed to open or the datafeed failed to start. In which case, jobs could just be started (subject to the underlying error) rather than deleted then started.

If the job was running and healthy the setup screen wouldn't be shown. The idea of the retry is to (a) clean up inconsistent leftovers ourselves and (b) give the user the opportunity to fix the cause of the failure somehow (in a terminal or another tab) and try again. Surfacing the ML error as you suggested above sounds particularly useful to that end. In any case, we don't want to let the user sit there without the ability to take any action.
