fix: switch to SHUTTING_DOWN state unconditionally #4408
Conversation
Force-pushed from 7daf643 to 402cc7b
@adiholden I also ran the full regtest suite: https://github.com/dragonflydb/dragonfly/actions/runs/12629900466
During the shutdown sequence, always switch to SHUTTING_DOWN. Make sure the rest of the code does not break if it cannot switch to the desired global state, plus some cleanups around state transitions. Finally, reduce the amount of data in test_replicaof_reject_on_load. Signed-off-by: Roman Gershman <[email protected]>
}
if (switch_state) {
  SwitchState(GlobalState::ACTIVE, GlobalState::LOADING);
  loading_state_counter_++;
How is it possible?
How is what possible?
To have 2 loading processes at the same time
replica.stop()
replica.start()
c_replica = replica.client()

@assert_eventually
Why do you use assert_eventually here? Shouldn't we see the loading state the first time we run INFO PERSISTENCE?
I ignore the implementation details and assume there can be a short period of time where the server has not started loading yet, or ServerState::g_state_ has not been updated yet. I embrace the eventually consistent nature of state transitions.
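The retry-until-success idea can be sketched as a decorator. This is a hypothetical stand-in for the test suite's actual assert_eventually helper (whose real signature may differ); the names, timeout, and polling interval here are assumptions:

```python
import functools
import time

def assert_eventually(timeout=5.0, interval=0.05):
    """Retry the wrapped check until it passes or `timeout` elapses."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            deadline = time.monotonic() + timeout
            while True:
                try:
                    return fn(*args, **kwargs)
                except AssertionError:
                    if time.monotonic() >= deadline:
                        raise  # give up: the state never converged
                    time.sleep(interval)  # the state may simply not be updated yet

        return wrapper
    return decorator

# Usage: tolerate the window before the server reports the loading state.
@assert_eventually(timeout=1.0)
def check(persistence_info):
    assert persistence_info["loading"] == 1

check({"loading": 1})  # passes immediately here; retries in the flaky window
```

This is the standard way to test eventually consistent state: assert on the converged value, not on the exact moment of the transition.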
tests/dragonfly/replication_test.py
Outdated
persistence = await c_replica.info("PERSISTENCE")
assert persistence["loading"] == 1
# If this fails adjust `keys` and the `assert dbsize >= 30000` above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can update the comment; there is no assert on dbsize anymore.
tests/dragonfly/replication_test.py
Outdated
@@ -1971,23 +1971,27 @@ async def test_replicaof_reject_on_load(df_factory, df_seeder_factory):
     df_factory.start_all([master, replica])

     c_replica = replica.client()
-    await c_replica.execute_command(f"DEBUG POPULATE 8000000")
+    await c_replica.execute_command(f"DEBUG POPULATE 4000000")
The only reason we use a large number of keys here is to generate lots of data that will later take time to load.
If you use

seeder = StaticSeeder(
    key_target=800,
    data_size=100000,
    collection_size=10000,
    types=["SET"])

the test will be extremely fast with the same data size.
Two reasons for this:
- when populating we yield every 32 keys added, I think
- we add multiple elements in one command when using collections, while for strings we add one key at a time
In that case, the variation of "debug populate ... type SET ..." could also work.
I just confirmed that `debug populate 800 key 1000 RAND type set elements 2000`
creates 800 sets with 2000 elements of size 1000 each.
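For reference, a quick back-of-envelope computation of the payload that command generates (raw element bytes only; per-key and per-entry memory overhead is ignored):

```python
# `debug populate 800 key 1000 RAND type set elements 2000` produces
# 800 sets, each holding 2000 elements of 1000 bytes.
num_sets = 800
elements_per_set = 2000
element_size = 1000  # bytes

payload = num_sets * elements_per_set * element_size
print(f"{payload:,} bytes")  # 1,600,000,000 bytes, i.e. ~1.6 GB of raw payload
```

So the collection-based variant reaches gigabyte-scale data with only 800 write commands, which is why populating it is so much faster than millions of single-key string writes.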
StaticSeeder uses DEBUG POPULATE internally, so it's similar to your fix; anyway, the change is good.
  if (schedule_done_.WaitFor(100ms)) {
    return;
  }
} while (ss->gstate() == GlobalState::LOADING);
What is the reason you moved several places in the code to read the global state from ServerState?
I believe we used service_.GetGlobalState() in these places intentionally.
We assume that ServerState::state is eventually consistent with the global state. Therefore, in places where we allow/reject running some flow based on the state, we use the global state, since two commands may run at the same time: one changing the state while the other checks it to decide whether the flow can run, and possibly changes the state based on that check.
And that's exactly why I removed it. It's misleading. There are no transactional guarantees when you test an atomic variable like this. The global state can change a millisecond later, so this check is not "better". I do not think that in all these places it's super important but I wanted to remove this false sense of correctness on purpose.
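The "no transactional guarantees" point is the classic check-then-act race: even a perfectly atomic read, followed by an action, is not atomic as a whole. A minimal Python sketch under assumed names (threads stand in for concurrent commands; this is an illustration, not the server's actual code):

```python
import threading

# Hypothetical shared "global state", read and mutated by concurrent commands.
state = {"value": "ACTIVE"}
lock = threading.Lock()

def start_loading_unsafe():
    # Check-then-act race: the check can pass, and then another thread flips
    # the state (e.g. to SHUTTING_DOWN) before the act runs, so we would
    # start loading based on stale information.
    if state["value"] == "ACTIVE":     # check
        state["value"] = "LOADING"     # act (possibly against stale state)

def switch_state(expected, new):
    # Check and act as one atomic step under a lock; refuse the transition
    # if the state is no longer what we expected. Compare-and-swap in spirit,
    # similar to a SwitchState(expected, new) style API.
    with lock:
        if state["value"] != expected:
            return False
        state["value"] = new
        return True
```

The safe variant makes the race explicit: callers must handle a refused transition instead of silently acting on a state that changed a millisecond earlier.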
Fixes this one, #4423, right?
We will see.