-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limit the number of ephemeral nodes a session can create #118
Limit the number of ephemeral nodes a session can create #118
Conversation
…hemeral limit to 7500
zookeeper-server/src/main/java/org/apache/zookeeper/KeeperException.java
Outdated
Show resolved
Hide resolved
zookeeper-server/src/main/java/org/apache/zookeeper/server/PrepRequestProcessor.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall in right direction Grant! Thanks for picking it up.
zookeeper-server/src/main/java/org/apache/zookeeper/server/PrepRequestProcessor.java
Outdated
Show resolved
Hide resolved
...per-server/src/test/java/org/apache/zookeeper/server/quorum/EphemeralNodeThrottlingTest.java
Outdated
Show resolved
Hide resolved
...per-server/src/test/java/org/apache/zookeeper/server/quorum/EphemeralNodeThrottlingTest.java
Outdated
Show resolved
Hide resolved
...per-server/src/test/java/org/apache/zookeeper/server/quorum/EphemeralNodeThrottlingTest.java
Outdated
Show resolved
Hide resolved
zookeeper-server/src/main/java/org/apache/zookeeper/server/PrepRequestProcessor.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent job @GrantPSpencer! Let's test this in EI once it's in!
...per-server/src/test/java/org/apache/zookeeper/server/quorum/EphemeralNodeThrottlingTest.java
Outdated
Show resolved
Hide resolved
...per-server/src/test/java/org/apache/zookeeper/server/quorum/EphemeralNodeThrottlingTest.java
Outdated
Show resolved
Hide resolved
zookeeper-server/src/main/java/org/apache/zookeeper/server/DataTree.java
Outdated
Show resolved
Hide resolved
zookeeper-server/src/main/java/org/apache/zookeeper/server/DataTree.java
Outdated
Show resolved
Hide resolved
zookeeper-server/src/main/java/org/apache/zookeeper/server/DataTree.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work Grant! I left a few comments.
Also have one question. Why are we using the size of the path be the limit? Is it because we are worried about depth of tree of ephemeral nodes? If just ephemeral node count in general, counting each as 1 will be more efficient than iterating all chars in the path to get size.
We're using the sum of all the path sizes for that session as the limit because the closeSession transaction will contain all those znode paths. If the sum of those znode paths is larger than the jute max buffer, then the server will crash. I did look into ways to split up the closeSession transaction but did not come up with any viable solutions. Previously I used the # of ephemeral znodes as a proxy, but Kiran pointed out that trying estimate the sum of the path sizes would be a more accurate way to track it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, really nice work @GrantPSpencer!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You pivoted very fast from just count to using size. this is great change. just one minor comment.
zookeeper-server/src/main/java/org/apache/zookeeper/server/DataTree.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great change. Thanks @GrantPSpencer for working through this.
Description
In Zookeeper, an ephemeral node is only kept alive while the session it was created in is still active. When a session is closed by the client, all ephemeral nodes created by that session are deleted. Currently, there is no limit on how many ephemeral nodes can be created in a single session and if a large enough number of ephemeral nodes are created, then the work needed to delete these nodes and communicate the deletion to all followers can overwhelm the zookeeper server. This behavior has caused two previous Zookeeper site-up issues at LinkedIn.
My proposed solution to this problem is to throttle based on the cumulative bytes it takes to store all of the paths for the znodes created within that session. When a request comes into the PrepRequestProcessor, we check the DataTree and deny any create requests if we have already reached/exceeded the byte limit for that session.
With this approach, it is important to note that when a request is accepted, it would not immediately affect the ephemeral node count as it would need to reach the FinalRequestProcessor before the request affects the DataTree. This lag between the accepting ephemeral node requests and updating our ephemeral node count can lead to us not strictly upholding the limit. However, I believe this inaccuracy is acceptable as this should only result in minor variances of the total # of nodes allowed to be created and would still achieve the primary goal of preventing a server being overwhelmed when a session closes. If we wanted to be extra cautious, we could also fail requests at the end of the request processing pipeline when the new node is actually added to the DataTree.
4 years ago a PR (apache#1144) for a similar feature was opened against the opensource Zookeeper, but was never merged. The approach in that PR is slightly different as it attempts to keep a counter in the PrepRequestProcessor to track the # of ephemeral nodes currently in the DataTree. This solution does not have the same "lag time" problem as my proposed solution, but it does not account for the request failing at any step other than the PrepRequestProcessor. This could lead to the counter desyncing from the actual DataTree and being in an error state until reboot. Attempting to cover all the areas where a request could fail would also add quite a bit of constant work to the server for a feature that is intended to prevent an infrequent occurrence.
(Please let me know if this is not the correct repo/branch to target with these changes)
Tests
The following tests are written for this issue:
org/apache/zookeeper/server/quorum/EphemeralNodeThrottlingTest.java
The following is the result of the "mvn test" command on the appropriate module: