fix(DataStore): should not crash on missing version metadata #2849
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Issue #, if available:
#2151 (comment)
Description of changes:
The app should not crash when DataStore cannot retrieve the version for update and delete mutations. DataStore should attempt to send the mutation without a version and handle the response.
This PR is looking to address the following crash that customers are experiencing
The message
comes from https://github.com/aws-amplify/amplify-android/blob/main/aws-datastore/src/main/java/com/amplifyframework/datastore/syncengine/VersionRepository.kt#L148
Code paths
The unexpected behavior observed by the customer is that DataStore is deciding to crash when publishing mutations requires a version for updates and deletes, but it could not find it.
According to this customer #2151 (comment) This happens in bad network scenarios during saving and updating a model.
The cause of this issue:
DataStore.save(model)
is called for a new model. This inserts into the local database and queues up a pending create mutation.DataStore.save(model)
is called for the existing model, performing some updates to the fields. This updates the existing record in the local database, and queues up a pending update mutation. We assume that the MutationProcessor performs (1) process the create mutation, such that it is marked in flight, and the update mutation is not merged into the create mutation when it is inserted into the mutation database, and has to be sent on its own by the MutationProcessor.MutationProcessor (running in parallel):
errorHandler
and the mutation will be dropped. Subsequently there will not be any reconciliation, and no local version will be persisted.The cause of this issue is the create mutation was dropped, which could happen for a variety of reasons decided by the MutationProcessor’s logic in handling the response of the request, and the subsequent update mutation did not have a version to send with the mutation.
MutationProcessor error classification
https://github.com/aws-amplify/amplify-android/blob/main/aws-datastore/src/main/java/com/amplifyframework/datastore/syncengine/MutationProcessor.java#L353-L355
DataStoreException.GraphQLResponseException
https://github.com/aws-amplify/amplify-android/blob/main/aws-datastore/src/main/java/com/amplifyframework/datastore/syncengine/MutationProcessor.java#L387
This exception is thrown when the response is successful with a GraphQL response payload containing errors. If any of the GraphQLError’s do not contain the AppSync “conflictUnhandled” type then throw this.
This means the conditions for classification of non-retryable is as follows:
Since we are discussing the scenario with bad network conditions, it’s unlikely that it was classified as non-retyable due to this condition since (1) A successful GraphQL response would not have occurred.
ApiException.NonRetryableException
This exception is thrown when the response is 400 to 499.
This includes 408 Request Timeout. To be verified by manual testing, but this could be the response being handled as non-retryable even though it was a transient timeout error (bad network conditions).
In summary, there is an existing scenario to be verified: During bad network conditions, DataStore’s MutationProcessor should handle failures as retryable.
Consider the scenario in which Unauthorized is treated as non-retryable (as seen above in DataStoreException.GraphQLResponseException). Unauthorized is very common when the user is signed out, or their token or session is expired. The mutation is then dropped, and update mutation will attempt to get the version and crash the app.
This PR looks to address this scenario while another PR independent from this will look into verifying MutationProcessor's classification of transient network errors as retryable. The app should not crash when DataStore cannot retrieve the version for update and delete mutations. DataStore should attempt to send the mutation without a version and handle the response. The response will be handled by DataStore by sending it back to the customer through the
errorHandler
and then dropping it, moving on to the next mutation in the queue.What kind of response are we expecting?
The conflict resolution enabled backend does not enforce version to be a required field. If we cannot get a version locally, we can send the update/delete mutation without a version, AppSync will respond successfully with a GraphQL response. This is inline with Swift DataStore's implementation, as it will also send update/delete mutations without a version if it cannot retrieve it at the time it's processing the mutation event.
Example 1. Update Mutation on existing model
In the above example, version is updated, and the name is not. The conflict resolution strategy is enabled is auto-merge. DataStore will then reconcile this response data into the local database as the latest version of this model.
Example 2: Update mutation on model not yet created in AppSync
The example above closely resembles the scenario where we continue to send the update mutation without a version. When the create mutation fails, the blog id does not exist in the AppSync. When the id is used for update mutations w/ Cognito User Pool auth, and without version, the response will be unauthorized.
According to the MutationProcessor logic, this will be categorized as a non retryable error, and the error will be returned to the customer through the
errorHandler
, before the update mutation is dropped. Based on their use cases, they can decide to reconcile this themselves or not, however, new devices or cleared devices will not sync this data back down to the app since it does not exist in the remote store.How did you test these changes?
(Please add a line here how the changes were tested)
Documentation update required?
General Checklist
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.