Improved Container "Health" Monitoring #9860

Closed
ssimic2 opened this issue Apr 12, 2022 · 3 comments

ssimic2 commented Apr 12, 2022

Our state machine may be overly complex, and users may need to monitor a number of individual states to work out whether a container is healthy. A couple of times we have run into a situation where FF was able to create/attach a new doc and get an ID back, but was unable to interact with the op stream because the op sequencing service was down. In terms of user experience, it was not obvious that an error was present, since successful doc creation implied the services were up and running.

Can we create more holistic "healthy system" indicators/API?
Do we have needed services monitoring capabilities in place?
Should we allow operations on historian/storage service if alfred is (for example) down?
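To make the first question concrete, here is a rough sketch of what a single derived indicator could look like. Everything in it is hypothetical: `ContainerSignals`, `deriveHealth`, the field names, and the 30 second acknowledgement threshold are illustrative assumptions, not existing Fluid Framework API.

```typescript
// Hypothetical snapshot of the individual states a caller has to watch today.
// These names are illustrative only; they are not Fluid Framework API.
interface ContainerSignals {
    attached: boolean;      // the doc was created/attached and an ID came back
    connected: boolean;     // the connection to the op stream is up
    dirty: boolean;         // there are local changes not yet acknowledged
    msSinceLastAck: number; // time since the service last acknowledged an op
}

type Health = "healthy" | "degraded" | "unhealthy";

interface HealthReport {
    health: Health;
    reason: string;
}

// Collapse the individual signals into one answer plus a human-readable reason.
// The 30000 ms default is an arbitrary assumption for the sketch.
function deriveHealth(s: ContainerSignals, ackTimeoutMs = 30000): HealthReport {
    if (!s.attached) {
        return { health: "unhealthy", reason: "container is not attached" };
    }
    if (!s.connected) {
        // The case described in this issue: attach succeeded, op stream unreachable.
        return s.dirty
            ? { health: "unhealthy", reason: "pending changes cannot be sent: not connected" }
            : { health: "degraded", reason: "not connected to the op stream" };
    }
    if (s.dirty && s.msSinceLastAck > ackTimeoutMs) {
        return { health: "degraded", reason: "connected, but ops are not being acknowledged" };
    }
    return { health: "healthy", reason: "attached and connected" };
}
```

With something like this, the scenario above (attached, never connected, local changes pending) would surface as a single `unhealthy` result with a reason, instead of requiring the app to correlate several states itself.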

ssimic2 added the api and public api change (Changes to a public API) labels Apr 12, 2022
ssimic2 added this to the Future milestone Apr 12, 2022
anthony-murphy (Contributor) commented Apr 12, 2022

I think the key here is minimizing complexity and the amount of plumbing the client has to do. In this particular case the client attached a new container and then saw it was stuck in the dirty state. It turned out the container had never connected, so all ops after attach were waiting to be sent.

When I think about this in a scenario-focused way, I can see two different but related scenarios:

  1. Attaching a new container
  2. Tracking saved/dirty state

For the first, should a container be considered successfully attached if it can't send ops? There are performance reasons not to wait, but ideally the defaults make it easy, and we have ways to get performance with more work.

For the second, a container that never connects will never move to saved. Potentially we need a better model around saved/dirty, so we can express that saving is blocked or having trouble.
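As a sketch of what a richer saved/dirty model might express, the following splits "dirty" into three cases. The `SaveState` names, `SaveInputs`, `deriveSaveState`, and the 10 second threshold are all hypothetical and only meant to illustrate the idea.

```typescript
// Hypothetical finer-grained replacement for a plain saved/dirty flag.
type SaveState =
    | "saved"    // nothing pending
    | "saving"   // pending changes, connected, within the expected time
    | "stalled"  // pending changes, connected, but taking too long
    | "blocked"; // pending changes and no connection to send them on

interface SaveInputs {
    dirty: boolean;     // there are local changes not yet acknowledged
    connected: boolean; // the connection to the op stream is up
    msDirty: number;    // how long the container has been continuously dirty
}

// The 10000 ms default is an arbitrary assumption for this sketch.
function deriveSaveState(i: SaveInputs, stalledAfterMs = 10000): SaveState {
    if (!i.dirty) {
        return "saved";
    }
    if (!i.connected) {
        // A container that never connects lands here instead of sitting in "dirty".
        return "blocked";
    }
    return i.msDirty > stalledAfterMs ? "stalled" : "saving";
}
```

The point of the extra states is that an app could show "can't save right now" rather than an indefinite "saving..." when the container has never connected.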

markfields (Member) commented

A note on the current state of things: the "connected" event fires once the container is connected to the delta stream and "caught up" (well, pending #9377). So maybe we want to key other things off this as well.
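One way to picture "keying other things off" the "connected" event is a small helper that defers an action until the container is connected. This is only a sketch: `ConnectableContainer` is a deliberately minimal, assumed event surface (not the real container interface), and `runWhenConnected` and `FakeContainer` are hypothetical names.

```typescript
// Minimal event surface assumed for this sketch; a real container exposes more.
interface ConnectableContainer {
    readonly isConnected: boolean;
    once(event: "connected", listener: () => void): void;
}

// Run the action now if already connected, otherwise the first time "connected" fires.
function runWhenConnected(container: ConnectableContainer, action: () => void): void {
    if (container.isConnected) {
        action();
    } else {
        container.once("connected", action);
    }
}

// Tiny in-memory stand-in, used only to exercise the helper.
class FakeContainer implements ConnectableContainer {
    private listeners: Array<() => void> = [];
    private connected = false;

    get isConnected(): boolean {
        return this.connected;
    }

    once(_event: "connected", listener: () => void): void {
        this.listeners.push(listener);
    }

    // Simulate the "connected" event: flip the flag and fire each listener once.
    connect(): void {
        this.connected = true;
        const pending = this.listeners;
        this.listeners = [];
        pending.forEach((listener) => listener());
    }
}
```

For example, an app could wrap "report the attach as complete" in `runWhenConnected`, so that success is only surfaced once ops can actually flow.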

microsoft-github-policy-service (Contributor) commented

This PR has been automatically marked as stale because it has had no activity for 60 days. It will be closed if no further activity occurs within 8 days of this comment. Thank you for your contributions to Fluid Framework!

3 participants