-
Notifications
You must be signed in to change notification settings - Fork 450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modified UserFateStore.create to behave like MetaFateStore #4787
Conversation
MetaFateStore.create will retry forever when a collision happens when trying to create a Fate transaction. The probabiliy of a collision is low due to a random UUID being used. Before this change the UserFateStore.create method would retry 5 times then throw an exception. This was removed in favor of an unlimited retry so that the behavior of the two Fate stores is the same. Closes apache#4246
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good change and keeps the two stores consistent. Wasn't sure how this would conflict with the changes in #4524, but looking at this, shouldn't have any problems.
Hmm thinking about this some more. Wondering if it would be better if we had finite number of retries for both stores instead of infinite. The probability of collision is very low. I'm not sure that both stores are functioning the same under these new changes. The MFS will throw an error if any other KeeperException is seen other than NodeExistsException. The UFS will retry forever if an UNKNOWN status is received. So if there is something wrong and we keep receiving UNKNOWN, it will keep retrying. |
So, I first went down the path of creating an IT to determine what error a user would receive when something like |
As it works now, it is testing for a duplicate FateId (while not directly looking at the FateId). It will be REJECTED if there is already a TStatus set for the FateId, and a TStatus is always set on creation of the FateId and persists throughout the existence of the FateId. Maybe I'm misunderstanding you? Maybe there is no issue with both stores retrying infinitely, but what would be the problem with both stores retrying only a finite number of times? This could be a larger than 5, but still finite. The probability of collision would be, for all intents and purposes, impossible. If it fails after these attempts, then there is a bigger problem and not a problem with collisions |
I think I mis-understood your comment in relation to the code. You are saying that we should continue to retry on REJECTED, but throw an exception on UNKNOWN? |
I'm just wondering if it would be better/safer to keep a finite number of retries and maybe change MFS to be a finite number of retries as well. It should still retry on UNKNOWN or REJECTED for UFS |
My comment regarding the retry for UNKNOWN was just that if there is something wrong with the TabletServer or something else that would cause UNKNOWN to be received indefinitely, this current impl will retry forever. |
I think with the changes in 5882212, the two implementations are consistent. They continue forever when there is a dupe, otherwise they throw an exception. We can have a discussion about whether or not they should try forever and I'm wondering what they did previously. Because I think that's the answer as the calling code is already written to handle that. |
core/src/main/java/org/apache/accumulo/core/fate/user/UserFateStore.java
Outdated
Show resolved
Hide resolved
Apologies if any of my comments were confusing/misleading. |
This reverts commit 5882212.
All good, I'm just trying to make sure things are handled consistently. Based on the old ZooStore implementation, which is now the MetaFateStore class, I think we should retry indefinitely in the UserFateStore. Reason being that the code that calls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a good change to me so things are consistent. I think it makes sense to treat both stores the same and keep retrying forever for the normal create()
case where we are generating a UUID (vs supplying it).
As already noted, the meta store is pretty much guaranteed to eventually succeed because it just generates new UUIDs on a collision which is very rare anyways. It will still abort and exit if an exception is thrown due to an issue with Zookeeper (such as the client can't get to the server or something).
For the User store, it also generates a random ID in the same way so it should be fine to keep retrying and I don't really see an issue with that. I think we can keep retrying for REJECTED
for sure and probably UNKNOWN
.
REJECTED
of course just means there was a collision so we definitely want to retry with a new UUID.- For the
UNKNOWN
case that might mean there was just a transient issue with the client getting confirmation after submission so we don't know if it went through so I think we can try again without issue. If it was actually successful the FateCleaner will handle cleanup and age off. - If another type of AccumuloException is thrown (like the client can't connect at all anymore) or security exception is thrown then a runtime exception bubbles up and we will also exit the method (just like with the meta store and a ZK error).
The time when we should limit the retries would be when a FateKey is provided and we are generating the ID off of the FateKey. That still has a retry of 5 and that makes sense so you don't block forever with multiple attempts as that ID will keep trying the same one over and over and won't generate a new one.
@keith-turner - Do see you any issue with this approach?
No, that all sounds good. |
More complexity could be added around the unknown case, but its not worth the maintenance burden. I agree its better to just let the FateCleaner deal with it. |
Yeah I figured it wouldn't be worth trying to scan and check if it was successful as it should be rare and the FateCleaner would clean it up. To really handle UNKNOWN effectively you'd need something more complex like how Ample combines a retry and rejection handler API. |
MetaFateStore.create will retry forever when a collision happens when trying to create a Fate transaction. The probabiliy of a collision is low due to a random UUID being used. Before this change the UserFateStore.create method would retry 5 times then throw an exception. This was removed in favor of an unlimited retry so that the behavior of the two Fate stores is the same.
Closes #4246