Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Container Attach Retriable for Non-Runtime Failures #19246

Merged
merged 49 commits into from
Jan 30, 2024

Conversation

anthony-murphy
Copy link
Contributor

@anthony-murphy anthony-murphy commented Jan 16, 2024

Background

This change is in preparation for allowing consumers of fluid to call serialize on a container while a container is in an attaching state, which will include a failed attach where the failure didn't cause the container to close. This will aid in the case of offline, network, or server outage by allowing the application to capture the un-attached state of the container, preserve it locally, and later create a new file from it.

AB#5502

Overview

In order to make attachment retriable I've made explicit types that represent each stage of the attachment state machine and called it AttachmentData. The container tracks the current attachment data, gets the AttachState, and gets the base snapshot information in the case of offline for it.

The biggest changes are to the attach method itself.

Now that is retriable, we had to remove the _attachStarted member which gated the call to a single instance. I've replaced that with the runSingle helper method which ensure only one attach promise is running at a time and ensure that all the parameters for all parallel watchers of that promise used the same arguments.

I've also encapsulated the management of the state machine in a new method runRetriableAttachProcess . The primary reason for this is testability, as the new method is easily unit testable, and I've added a number of unit test to validate its behavior. testability was achieved by limiting the number of dependencies the method needed, so it uses callbacks to perform the actual operations necessary to attach the container.

The split also helped to segment the error handling for the container. Specifically, only the callbacks that engage with the runtime will close the container on failure, as if we can't create a summary, or propagate an attachment state, we can't safely retry.

The container creates the callbacks for runRetriableAttachProcess as follows:

  • setAttachmentData manages the attachment data member, which is exposed via AttachState, and performs eventing and runtime callbacks.
  • createAttachmentSummary creates the summary from the runtime, add the protocol tree, and the closes the container if any of that fails
  • createOrGetStorageService resolves the request, calls create container, and sets up the containers document services. it only returns the storage service, as that is all that might be needed to upload blobs.

Lastly the container's attach method configures the delta connection, and logs. As was previously the case, we still error if attach is called for an invalid state, and we still limit serialize calls to only being supported in a detached state.

AB#5911

@anthony-murphy anthony-murphy requested review from a team as code owners January 16, 2024 17:54
@github-actions github-actions bot added area: loader Loader related issues area: tests Tests to add, test infrastructure improvements, etc base: main PRs targeted against main branch labels Jan 16, 2024
@anthony-murphy anthony-murphy requested a review from a team as a code owner January 19, 2024 22:20
Copy link
Contributor

@dannimad dannimad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and I don't see any outstanding comments.

Copy link
Contributor

@ChumpChief ChumpChief left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little nervous about expanding the capabilities of attach (adding retriability) given that it's a bit convoluted already, but this seems like a fine approach.

One scary comment to consider, rest are more minor.

@anthony-murphy anthony-murphy enabled auto-merge (squash) January 30, 2024 19:57
@anthony-murphy anthony-murphy merged commit 3d5dfb2 into microsoft:main Jan 30, 2024
29 checks passed
anthony-murphy added a commit that referenced this pull request Jan 31, 2024
Added some new tests in #19246 that are failing CI due to the increased
the compat matrix. This moves them to the reduced compat section of the
file to prevent failures.

Co-authored-by: Tony Murphy <[email protected]>
alexvy86 pushed a commit to alexvy86/FluidFramework that referenced this pull request Feb 13, 2024
…9246)

### Background

This change is in preparation for allowing consumers of fluid to call
serialize on a container while a container is in an attaching state,
which will include a failed attach where the failure didn't cause the
container to close. This will aid in the case of offline, network, or
server outage by allowing the application to capture the un-attached
state of the container, preserve it locally, and later create a new file
from it.

AB#5502

### Overview

In order to make attachment retriable I've made explicit types that
represent each stage of the attachment state machine and called it
`AttachmentData`. The container tracks the current attachment data, gets
the `AttachState`, and gets the base snapshot information in the case of
offline for it.

The biggest changes are to the `attach` method itself. 

Now that is retriable, we had to remove the `_attachStarted` member
which gated the call to a single instance. I've replaced that with the
`runSingle` helper method which ensure only one attach promise is
running at a time and ensure that all the parameters for all parallel
watchers of that promise used the same arguments.


I've also encapsulated the management of the state machine in a new
method `runRetriableAttachProcess `. The primary reason for this is
testability, as the new method is easily unit testable, and I've added a
number of unit test to validate its behavior. testability was achieved
by limiting the number of dependencies the method needed, so it uses
callbacks to perform the actual operations necessary to attach the
container.

The split also helped to segment the error handling for the container.
Specifically, only the callbacks that engage with the runtime will close
the container on failure, as if we can't create a summary, or propagate
an attachment state, we can't safely retry.

The container creates the callbacks for `runRetriableAttachProcess ` as
follows:

- `setAttachmentData` manages the attachment data member, which is
exposed via AttachState, and performs eventing and runtime callbacks.
- `createAttachmentSummary` creates the summary from the runtime, add
the protocol tree, and the closes the container if any of that fails
- `createOrGetStorageService` resolves the request, calls create
container, and sets up the containers document services. it only returns
the storage service, as that is all that might be needed to upload
blobs.

Lastly the container's attach method configures the delta connection,
and logs. As was previously the case, we still error if attach is called
for an invalid state, and we still limit serialize calls to only being
supported in a detached state.

AB#5911

---------

Co-authored-by: Tony Murphy <[email protected]>
Co-authored-by: jzaffiro <[email protected]>
Co-authored-by: Matt Rakow <[email protected]>
alexvy86 pushed a commit to alexvy86/FluidFramework that referenced this pull request Feb 13, 2024
…oft#19402)

Added some new tests in microsoft#19246 that are failing CI due to the increased
the compat matrix. This moves them to the reduced compat section of the
file to prevent failures.

Co-authored-by: Tony Murphy <[email protected]>
anthony-murphy added a commit that referenced this pull request Mar 1, 2024
…19893)

## Background
This change is the final change for allowing consumers of fluid to call
serialize on a container while a container is in an attaching state,
which will include a failed attach where the failure didn't cause the
container to close. This will aid in the case of offline, network, or
server outage by allowing the application to capture the un-attached
state of the container, preserve it locally, and later create a new file
from it.


[AB#5502](https://dev.azure.com/fluidframework/235294da-091d-4c29-84fc-cdfc3d90890b/_workitems/edit/5502)

## Overview
This change enables calling container.serialize while a container is in
the attaching state. It is the culmination of many smaller changes that
have gone in to enable this change.
- #18829
- #19400
 - #19246
 - #19517
 - #19518
 - #19590
 - #19634
 - #19802

---------

Co-authored-by: Tony Murphy <[email protected]>
Co-authored-by: Daniel Madrid <[email protected]>
Co-authored-by: jzaffiro <[email protected]>
Co-authored-by: Matt Rakow <[email protected]>
@anthony-murphy anthony-murphy deleted the make-attach-retyable branch September 10, 2024 00:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: loader Loader related issues area: tests Tests to add, test infrastructure improvements, etc base: main PRs targeted against main branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants