
Limit the number of ephemeral nodes a session can create #118

Merged (17 commits) on Nov 8, 2023

Conversation

@GrantPSpencer commented Sep 25, 2023

Description

In ZooKeeper, an ephemeral node is only kept alive while the session that created it is still active. When a client closes its session, all ephemeral nodes created by that session are deleted. Currently, there is no limit on how many ephemeral nodes a single session can create, so if a large enough number of ephemeral nodes exist, the work needed to delete those nodes and communicate the deletions to all followers can overwhelm the ZooKeeper server. This behavior has caused two previous ZooKeeper site-up issues at LinkedIn.

My proposed solution to this problem is to throttle based on the cumulative number of bytes it takes to store the paths of all the znodes created within a session. When a request comes into the PrepRequestProcessor, we check the DataTree and deny any create request if the session has already reached or exceeded its byte limit.

With this approach, it is important to note that an accepted request does not immediately affect the ephemeral node count: the request must reach the FinalRequestProcessor before it affects the DataTree. This lag between accepting an ephemeral create request and updating the count means we do not strictly uphold the limit. However, I believe this inaccuracy is acceptable, as it should only cause minor variance in the total number of nodes allowed to be created and still achieves the primary goal of preventing a server from being overwhelmed when a session closes. If we wanted to be extra cautious, we could also fail requests at the end of the request processing pipeline, when the new node is actually added to the DataTree.
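The admission check described above could be sketched roughly as follows. This is a minimal, self-contained illustration rather than the actual PR code: the class and method names (`SessionEphemeralBudget`, `tryAdmitCreate`) are hypothetical, and the real implementation would consult the DataTree from the PrepRequestProcessor instead of keeping its own map.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a per-session byte-budget check for ephemeral
// creates. Names are illustrative, not from the PR itself.
class SessionEphemeralBudget {
    private final long byteLimit; // max cumulative ephemeral path bytes per session
    private final Map<Long, Long> bytesBySession = new HashMap<>();

    SessionEphemeralBudget(long byteLimit) {
        this.byteLimit = byteLimit;
    }

    // Called when an ephemeral create is pre-processed: deny it if the
    // session has already reached or exceeded its budget, otherwise
    // account for the new path and admit the request.
    boolean tryAdmitCreate(long sessionId, String path) {
        long used = bytesBySession.getOrDefault(sessionId, 0L);
        if (used >= byteLimit) {
            return false; // budget exhausted; reject the create
        }
        long pathBytes = path.getBytes(StandardCharsets.UTF_8).length;
        bytesBySession.put(sessionId, used + pathBytes);
        return true;
    }

    // Called when the session closes and its ephemerals are deleted.
    void onSessionClosed(long sessionId) {
        bytesBySession.remove(sessionId);
    }
}
```

Note that, matching the description above, the check is reached-or-exceeded rather than would-exceed, so a session can overshoot the limit by at most one path's worth of bytes.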

Four years ago a PR (apache#1144) for a similar feature was opened against open-source ZooKeeper, but it was never merged. The approach in that PR is slightly different: it keeps a counter in the PrepRequestProcessor to track the number of ephemeral nodes currently in the DataTree. That solution does not have the lag-time problem of my proposed solution, but it does not account for a request failing at any step other than the PrepRequestProcessor. This could lead to the counter desyncing from the actual DataTree and remaining in an error state until reboot. Attempting to cover every place a request could fail would also add a fair amount of constant work to the server for a feature intended to prevent an infrequent occurrence.

(Please let me know if this is not the correct repo/branch to target with these changes)

Tests

The following tests are written for this issue:
org/apache/zookeeper/server/quorum/EphemeralNodeThrottlingTest.java

The following is the result of running "mvn test" on the appropriate module:

$ mvn test -o -Dtest=EphemeralNodeThrottlingTest -pl zookeeper-server/

[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.zookeeper.server.quorum.EphemeralNodeThrottlingTest
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 14.93 s - in org.apache.zookeeper.server.quorum.EphemeralNodeThrottlingTest
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  21.543 s
[INFO] Finished at: 2023-09-25T13:57:39-07:00
[INFO] ------------------------------------------------------------------------

@rahulrane50 (Collaborator) left a comment:

Overall in the right direction, Grant! Thanks for picking this up.

@rahulrane50 (Collaborator) left a comment:

Excellent job @GrantPSpencer! Let's test this in EI once it's in!

@zpinto (Member) left a comment:

Nice work Grant! I left a few comments.

Also, I have one question: why are we using the size of the path as the limit? Is it because we are worried about the depth of the tree of ephemeral nodes? If it is just the ephemeral node count in general, counting each node as 1 would be more efficient than iterating over all the characters in the path to get its size.

@GrantPSpencer (Author) replied:

> Nice work Grant! I left a few comments.
>
> Also, I have one question: why are we using the size of the path as the limit? Is it because we are worried about the depth of the tree of ephemeral nodes? If it is just the ephemeral node count in general, counting each node as 1 would be more efficient than iterating over all the characters in the path to get its size.

We're using the sum of all the path sizes for the session as the limit because the closeSession transaction will contain all of those znode paths. If the sum of those paths is larger than the jute max buffer, the server will crash. I did look into ways to split up the closeSession transaction but did not come up with any viable solutions.

Previously I used the number of ephemeral znodes as a proxy, but Kiran pointed out that estimating the sum of the path sizes would be a more accurate way to track it.
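The back-of-envelope arithmetic behind this reasoning can be sketched as follows. This is an illustrative, hypothetical helper, not code from the PR: it ignores jute serialization overhead and simply compares the cumulative UTF-8 byte length of a session's ephemeral paths against the default jute.maxbuffer of 0xfffff bytes (just under 1 MB).

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Rough estimate of how a session's ephemeral paths relate to the jute
// buffer limit. The real closeSession transaction also carries per-record
// serialization overhead, which this sketch deliberately ignores.
class CloseSessionSizeEstimate {
    // jute.maxbuffer defaults to 0xfffff bytes (just under 1 MB).
    static final long DEFAULT_JUTE_MAX_BUFFER = 0xfffff;

    // Sum of the UTF-8 byte lengths of all ephemeral paths in a session.
    static long estimatePathBytes(List<String> ephemeralPaths) {
        long total = 0;
        for (String p : ephemeralPaths) {
            total += p.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    // Would a closeSession txn listing these paths fit under the default
    // buffer limit (ignoring serialization overhead)?
    static boolean fitsInJuteBuffer(List<String> ephemeralPaths) {
        return estimatePathBytes(ephemeralPaths) <= DEFAULT_JUTE_MAX_BUFFER;
    }
}
```

This is why a per-path count alone is an imprecise guard: many short paths and a few very long ones can represent the same node count but very different transaction sizes.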

@zpinto (Member) left a comment:

LGTM, really nice work @GrantPSpencer!

@desaikomal (Collaborator) left a comment:

You pivoted very fast from just a count to using size. This is a great change. Just one minor comment.

@desaikomal (Collaborator) left a comment:

Great change. Thanks @GrantPSpencer for working through this.

@rahulrane50 rahulrane50 merged commit 6433cbd into linkedin:branch-3.6 Nov 8, 2023
@abhilash1in abhilash1in mentioned this pull request Apr 16, 2024