[Logs UI] Enhance the Logs UI Analysis setup flow #44407
Comments
Pinging @elastic/infra-logs-ui
Okay, I've got a first pass at this implemented on a branch. Everything can be refined, of course 😄 @weltenwort @jasonrhodes @hbharding @Zacqary - wondering if you have any direct feedback here before I open a PR, especially around the caveats. There are two steps, utilising the EUI Steps component.

**Visuals**

Step 1: Contains the configurable options (currently the start and end times). Interaction with this step is entirely optional for now. In the future this would include extra parameters we need.

Step 2: By default, contains the "Create resources" button. Interaction with this button then transitions into 3 states:

- Setting up (or retrying after a failure).
- Failure: clicking "Try again" transitions into the above state as a retry.
- Success: shows the "View results" button, which transitions through to the results page.

**Behind the scenes**

When the "Create resources" button is clicked (I changed this language from "jobs" as I don't think we need the user to know or understand ML terminology), a call is made to set up the analysis ML module. If we receive any errors, we transition into the failure state.

The user may now request to "Try again"; this makes a call to our cleanup hook. This currently (totally subject to enhancements at the moment) deletes the jobs and the datafeeds. Once cleaned up, another request is made to set up the ML module. If everything is successful, we simply allow transitioning over to the results.

**Caveats**

There is a caveat in that, as mentioned above, the jobs and datafeeds may have been created successfully, but something went wrong starting the datafeed. During the explicit setup flow this is immediately flagged as a failure (if there was an error). However, if the user were then to refresh Kibana after that failure, they would actually land on the results page. This is because the jobs and datafeeds were created successfully; the datafeed just didn't start.

Our check for "should we show setup or results" allows just a created job and datafeed. Now, we could of course extend the check to require everything to be started as well. With the introduction of the new health status feature in #44414 I think this is a non-issue, as the results page would flag up the fact that the datafeed isn't started. The reason I flag this is purely because it might be disconcerting to go from a failure state on the setup page to a results page after the fact. Maybe we skip flagging start failures as failures during setup? I'm not sure.

Also, with retrying moving to an explicit "Try again" action, if a user has a failed setup, then comes back at some later date and tries to set up again, it will fail at first ("you have a job with this ID" and so on). I tried to mitigate this before by having the cleanup happen directly after a failure, but I'm not sure that's right either.

**Number of steps**

@weltenwort I wanted to just briefly touch on why I circled back to 2 steps after we briefly discussed having 3. With 3 steps we actually end up with three problems. And whilst I have absolutely no data to back it up, I think having 2 steps over 3 with nothing disabled will just feel like the process was "easier".
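The "should we show setup or results" check described above could be sketched as a pure function. This is a minimal illustration, not Kibana's actual code: the type and function names are hypothetical, and the job/datafeed states are simplified.

```typescript
// Hypothetical, simplified states for an ML job and its datafeed.
type JobState = 'missing' | 'created';
type DatafeedState = 'missing' | 'stopped' | 'started';

interface AnalysisStatus {
  job: JobState;
  datafeed: DatafeedState;
}

// The lenient check described above: the results page is shown as long as
// the job and datafeed exist, even if the datafeed failed to start.
function shouldShowResults(status: AnalysisStatus): boolean {
  return status.job === 'created' && status.datafeed !== 'missing';
}

// The stricter alternative mentioned in the caveats: keep the user on the
// setup page until the datafeed is actually running.
function shouldShowResultsStrict(status: AnalysisStatus): boolean {
  return status.job === 'created' && status.datafeed === 'started';
}
```

The disconcerting transition described above falls out of the difference between these two predicates: a job with a stopped datafeed passes the lenient check but fails the strict one.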
I think that looks awesome, well done!
While I agree that we don't want to burden the user with too many implementation details, "Create resources" sounds almost too abstract. By mentioning ML jobs we have a few advantages, I think:
As you said, I don't think that will be an issue for long.
The two steps look just fine to me. 👍 While this is not a PR review, one thing that popped out to me was how uninformative the pending state looks with just the spinner. Maybe we can add a "Creating ML jobs..." text beside that? Overall I think this will help us scale better with more complex parameter choices and failure conditions. 🚀
ℹ️ Talking about failure conditions, I documented an edge case in #44652.
Fair points, I'll change this back.
Thanks for the suggestion. I'll make that state a bit more informative. 👍 With regards to #44652 I'll add mitigation 1 (stopping the datafeed first) to this work, since it touches that area anyway (unless there are any objections).
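The "stop the datafeed first" ordering mentioned here could look roughly like the following. This is an illustrative sketch only: `MlClient` and its methods are hypothetical stand-ins for whatever ML client the cleanup hook actually calls, not a real Kibana or Elasticsearch client API.

```typescript
// Hypothetical ML client interface; method names are illustrative.
interface MlClient {
  stopDatafeed(datafeedId: string): Promise<void>;
  deleteDatafeed(datafeedId: string): Promise<void>;
  deleteJob(jobId: string): Promise<void>;
}

// Cleanup in the order discussed above: stop the datafeed first, then
// delete the datafeed, then delete the job. Stopping first avoids the
// edge case where a still-running datafeed interferes with deletion.
async function cleanUpAnalysis(
  ml: MlClient,
  jobId: string,
  datafeedId: string
): Promise<void> {
  await ml.stopDatafeed(datafeedId);
  await ml.deleteDatafeed(datafeedId);
  await ml.deleteJob(jobId);
}
```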
Looks great to me! I think the success state could stand to have a bit of text too, maybe just "Your jobs have all been set up!" or "All jobs have been successfully set up." or something similar to that?
Setup flow looks great! I'm actually just noticing a discrepancy in Step 1's wording: "by default, we'll analyze..." vs "by default, Machine Learning analyzes." Let's also use this next iteration to get those two messages consistent. |
+1 to reverting back to using familiar ML terminology, i.e. "jobs". Some users will already be familiar with ML, and it also allows more familiarity with the ML docs if required.

From the UI mockups above, it does not seem that the ML errors are being passed back to the user. If the jobs are being started using the module `setup()`, any errors from ML should be available to surface there.

I'd also like to ask if the concept of "Try again" could perhaps be revisited. In reading through some of the connected tickets I see a few reasons why a retry was proposed.

I know I do not have full history here, so sorry to go over old ground, but this retry path has not been taken by other integrations and it seems to presume failures. If this is the case, then it's important for the ML team to understand why.
That's a good point, we should do that. 👍
If the job was running and healthy the setup screen wouldn't be shown. The idea of the retry is to (a) clean up inconsistent leftovers ourselves and (b) give the user the opportunity to fix the cause of the failure somehow (in a terminal or another tab) and try again. Surfacing the ML error as you suggested above sounds particularly useful to that end. In any case, we don't want to let the user sit there without the ability to take any action.
Summary
The analysis setup page should be enhanced to use a wizard / steps style flow. This will utilise a new API to perform cleanup / reset if needed in a failure scenario.
Rationale
In #43774 some cleanup functionality was introduced to try to handle failure scenarios during setup. However, it isn't 100% robust and isn't actually visible to the user.
Acceptance criteria
The Analysis setup page should be enhanced to introduce a wizard / steps style flow. The UI will revolve mainly around the EUI Steps component.
Step 1 will contain explanatory text and configurable options (as it stands this is `start` and `end` times), with a setup button. When the setup button is clicked, step 2 stops being `disabled`.

Step 2, once enabled, should display the setup progress / loading state.

- On success, the step transitions to a `complete` state and displays a "View results" button. When clicked this goes to the results screen.
- On failure, the step transitions to a `danger` state, displays an error message, and displays a "Try again" button. Clicking it re-runs the setup and ends in the `complete` or `danger` state depending on the result.

A new API endpoint will be provided to perform the cleanup / reset. The `useLogAnalysisCleanup` hook should be altered to make use of this new endpoint. In future work this reset API will also be extended to a dedicated reset page (which users can invoke themselves at any time post setup).