Improved Container "Health" Monitoring #9860

ssimic2 · 2022-04-12T20:39:48Z

Our state machine may be overly complex, and users may need to monitor a number of different individual states to determine whether the container is healthy. A couple of times now, we have run into a situation where FF was able to create/attach a new doc and get its ID back, but was unable to interact with the op stream because the op sequencing service was down. From a user-experience standpoint, it was not obvious that an error was present, since successful doc creation implied the services were up and running.

Can we create more holistic "healthy system" indicators/APIs?
Do we have the needed service-monitoring capabilities in place?
Should we allow operations on the historian/storage service if alfred (for example) is down?
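One way to picture a more holistic indicator is a single function that folds the individual states a host currently watches into one health value. The sketch below is illustrative only: `ContainerStateSnapshot`, `deriveHealth`, and the health categories are hypothetical names, and the state values are modeled loosely on the container's attach and connection states rather than copied from the actual API.

```typescript
// Hypothetical snapshot of the individual states a host app watches today.
// Names and values are illustrative assumptions, not the Fluid Framework API.
type AttachState = "Detached" | "Attaching" | "Attached";
type ConnectionState = "Disconnected" | "EstablishingConnection" | "CatchingUp" | "Connected";

interface ContainerStateSnapshot {
  attachState: AttachState;
  connectionState: ConnectionState;
  closed: boolean;
  isDirty: boolean; // local changes not yet acknowledged by the service
}

type ContainerHealth =
  | { status: "healthy" }
  | { status: "degraded"; reason: string }
  | { status: "unhealthy"; reason: string };

// Collapse several low-level states into one answer the host can display.
function deriveHealth(s: ContainerStateSnapshot): ContainerHealth {
  if (s.closed) {
    return { status: "unhealthy", reason: "container is closed" };
  }
  if (s.attachState !== "Attached") {
    // Not yet attached: ops cannot be sent to the service yet.
    return { status: "degraded", reason: `attach state is ${s.attachState}` };
  }
  if (s.connectionState !== "Connected") {
    // Attached but not connected: storage may have worked while the op stream did not.
    return s.isDirty
      ? { status: "unhealthy", reason: "attached with unsent ops but not connected to the op stream" }
      : { status: "degraded", reason: `connection state is ${s.connectionState}` };
  }
  return { status: "healthy" };
}
```

The case from the original report (attach succeeded, op stream unavailable, ops pending) maps to `"unhealthy"` here instead of looking like success.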

anthony-murphy · 2022-04-12T21:22:34Z

I think the key here is minimizing complexity and the amount of plumbing required of the client. In this particular case, the client attached a new container and then saw that it was stuck in the dirty state. It turned out the container had never connected, so all ops after attach were waiting to be sent.

When I think about this in a scenario-focused way, I can see two different but related scenarios:

Attaching a new container
Tracking saved/dirty state

For the first: should a container be considered successfully attached if it can't send ops? There are performance reasons not to wait, but ideally the defaults make it easy, and we offer ways to get better performance with more work.
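A minimal sketch of the "don't report success until ops can flow" default, assuming a hypothetical `AttachableContainer` interface whose `attach()` resolves with the document ID and which emits a `"connected"` event once the op stream is usable. `attachAndWaitForConnection` and its `timeoutMs` parameter are illustrative, not existing framework APIs.

```typescript
// Hypothetical minimal container surface (an assumption for this sketch):
// attach() resolves with the new document's ID; "connected" fires when ops can flow.
interface AttachableContainer {
  attach(): Promise<string>;
  readonly connected: boolean;
  once(event: "connected", listener: () => void): void;
  off(event: "connected", listener: () => void): void;
}

// Treat attach as successful only once the container is also connected,
// failing loudly instead of leaving ops silently queued.
async function attachAndWaitForConnection(
  container: AttachableContainer,
  timeoutMs: number,
): Promise<string> {
  const id = await container.attach();
  if (container.connected) {
    return id;
  }
  await new Promise<void>((resolve, reject) => {
    const onConnected = () => {
      clearTimeout(timer);
      resolve();
    };
    const timer = setTimeout(() => {
      // Stop listening so a late "connected" event does not resolve a settled promise.
      container.off("connected", onConnected);
      reject(new Error(`attached as ${id} but not connected within ${timeoutMs} ms`));
    }, timeoutMs);
    container.once("connected", onConnected);
  });
  return id;
}
```

A caller that wants the faster path can still call `attach()` directly; the wrapper only changes what the easy default reports.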

For the second: a container that never connects will never move to saved. Potentially we need a better model around saved/dirty, so we can express that saving is blocked or having trouble.
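As an illustration of a model richer than saved/dirty, the sketch below derives a status that separates "saving" from "blocked" (dirty with no connection) and "stalled" (dirty and connected, but still unsaved after a threshold). `SaveInputs`, `deriveSaveStatus`, and the 30-second default threshold are hypothetical choices for this example only.

```typescript
// Hypothetical save-status model; not part of any existing API.
type SaveStatus = "saved" | "saving" | "blocked" | "stalled";

interface SaveInputs {
  isDirty: boolean;      // local changes not yet acknowledged
  connected: boolean;    // the op stream is usable
  dirtySinceMs?: number; // epoch ms when the container last became dirty
}

// stalledAfterMs is an illustrative threshold, not a framework setting.
function deriveSaveStatus(s: SaveInputs, nowMs: number, stalledAfterMs = 30000): SaveStatus {
  if (!s.isDirty) {
    return "saved";
  }
  if (!s.connected) {
    // Pending ops cannot be sent at all: surface this instead of a plain "dirty".
    return "blocked";
  }
  if (s.dirtySinceMs !== undefined && nowMs - s.dirtySinceMs > stalledAfterMs) {
    // Connected, but changes have not been acknowledged for a long time.
    return "stalled";
  }
  return "saving";
}
```

With this model, the container from the original report (attached, never connected, ops pending) would show `"blocked"` rather than sitting in an indistinct dirty state.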

markfields · 2022-04-12T21:27:17Z

Note about the current state of things: the "connected" event fires once the container is connected to the delta stream and "caught up" (well, pending #9377). So maybe we want to key other things off this as well.

microsoft-github-policy-service · 2023-02-01T17:34:20Z

This PR has been automatically marked as stale because it has had no activity for 60 days. It will be closed if no further activity occurs within 8 days of this comment. Thank you for your contributions to Fluid Framework!

ssimic2 added the api and public api change (Changes to a public API) labels Apr 12, 2022

ssimic2 added this to the Future milestone Apr 12, 2022

microsoft-github-policy-service bot added the status: stale label Feb 1, 2023

microsoft-github-policy-service bot removed this from the Future milestone Apr 23, 2023

microsoft-github-policy-service bot closed this as completed Apr 23, 2023
