feat(server): memory defrag support - unit tests added to verify #448 #523

boazsade · 2022-12-01T08:41:38Z

fixes #448
feat(server): active memory defrag task #448

Signed-off-by: Boaz Sade [email protected]

This PR is a complementary PR for the high level implementation of the defrag task - PR 509.
In this PR we are covering the low level part that do the actual defragmation operation.
This can close issue #448.
Please note that we are disabling small string in this PR!.
Also note that we are changing mi_malloc memory decommit policy to wait for 0 ms from the default value.
Right now the flag that control the actual execution of the shard scan - "mem_defrag_threshold" is set to 1, this means that the task will do actual work only if we are at memory limits. To make it more reasonable we should change this to 0.05 or run with --mem_defrag_threshold=0.05

romange · 2022-12-01T11:00:14Z

src/core/compact_object.cc

+  // Supporting only for Strings - but not externals, or small string - they are using different
+  // allocations. otherwise we may crash See HasAllocated function for testing other conditions
+  const bool is_mimalloc_allocated =
+      (HasAllocated() && taglen_ != SMALL_TAG && ObjType() == OBJ_STRING);


why logically it's correct, I suggest that you structure it differently so that when we add support for the SMALL_TAG it will be easily extended.

I don't like it too, it was even more cumbersome. I think this is a classic case where you need a function.

i meant separate tag_len_ test from others. and later write

if (is_mimalloc_allocated) { if (taglen_ == ROBJ_TAG) { ... } // else TODO } return false;

yes - the next commit will have it, but since this can grow I think it would be more readable and less comments if this is in a function

romange · 2022-12-01T11:05:13Z

src/server/engine_shard_set.cc

+  CHECK(prime_table->depth() < 40)
+      << "the depth for the prime table is invalid: " << prime_table->depth();
+  PrimeTable::Cursor cur = defrag_state_.cursor;
+  const std::size_t max_iteration = slice.DbSize(kDefaultDbIndex) / kCursorCountRelation;


Do I read correctly that the task run-time is linear with the size of the database. so it will traverse for 1000000/50 iterations for 1M records ?

your whole function runs at least max_iteration loops.
DoDefrag should be very fast. It should run several buckets and exit and then continue etc.
With this code we will have unresponsive backend with it being stuck for time linear to db size. Could be many seconds.
We need microseconds.

so now I understand what you mean by const. Changing this to limit = 50 (const value), will take about 200-300 micro (this include the writing to the log, so should be even less).

I did a measurement for the value of 50 (const value), the time it sends running the function DoDefrag is ~0us (its about 400ns), so I think this value is good.

romange · 2022-12-01T11:06:33Z

src/server/engine_shard_set.cc

+  DCHECK(slice.IsDbValid(kDefaultDbIndex));
+  auto [prime_table, expire_table] = slice.GetTables(kDefaultDbIndex);
+  //  otherwise we will end with negative invalid segment id - crashed at CI
+  CHECK(prime_table->depth() < 40)


it's your assumption, right? Please add todo to remove this check.

This check is based on "dash_internal.h line 679 (function segment_id). This can be removed however since I found the reason for the crash in CI. I left it in just in case, but I will remove this.

I did not understand this part. Did you find the reason? what was the reason for the crash?

in any case, I prefer using DCHECK_LT if you need it for tests

I found the reason (and why it was not consistent) - I had a bug in the shutdown loop, that was only be visible when the DoDefrag was called. I assumed that the defrag_task value is > 0 but this wasn't true, so the task was not stopped between tests. Not sure why it happened always on CI and only when running it for a very many iterations, but I was able to reproduce this locally today, and now its not happening again, so there is no real reason to leave this check (I removed it on my local branch).

Very glad it was found!

src/redis/zmalloc.h

romange · 2022-12-01T15:59:36Z

src/server/dragonfly_test.cc

+  int kMaxNumKeysToDelete = 10'000;
+  int current_step = kFactor;
+  for (int i = 1; i < kMaxNumKeysToDelete; current_step *= kFactor) {
+    for (int j = i - 1 + current_step; i < current_step; i++, j++) {


what does this loop do? should it be i < current_step or j < current_step ?

j is just to set the string value, it is not controlling the loop number of iterations - you can write this without the j at all it would just looking at the line that set the name of the key as something like i + current_step - 1. So j is just to make it less verbose there. But you are using i for both loops.

ok, I understand now but it does break my pattern recognition.

Do you mind writing the equivalent code with inner loop like:
``for (; i < current_step; ++i) {
int j = i - 1 + current_step;
....
}

romange · 2022-12-01T16:01:58Z

src/server/dragonfly_test.cc

-    EXPECT_GT(GetMallocCurrentCommitted(), NUMBER_OF_KEYS);
+    // we are expecting to have at least one try by now
+    EXPECT_GT(shard->GetDefragStats().tries, 0);
+    EXPECT_GT(GetMallocCurrentCommitted(), kMaxNumKeysToDelete);


you are comparing GetMallocCurrentCommitted() bytes with number of keys?

should be removed - I change the test to set the max memory to be explicit, this was to make sure that the task would actually run (that is, we more memory than deleted keys, but was done when I initialy wrote the test and just want to very that didn't deleting too many keys, again before I realised that it better to just set the value for max memory).

romange · 2022-12-01T16:04:14Z

src/server/dragonfly_test.cc

+          return stats.success_count;
+        }
+        --tries;
+        this_fiber::sleep_for(220ms);


there is a lot to unpack here but I am pretty sure this line is never called. Otherwise, it would crash.

what do you want to test in words?

We need to somehow make sure that the idle task was executed and that it found a memory to defrag (it should once it was running since we deleted keys at random locations). I have to say that this didn't crashed, but it makes the test a little bit unpredictable in the sense that you cannot be sure whether the task ran or not. I though of adding a "TEST" functions that would be called explicitly to execute the DoDefarg, but didn't like the idea of adding functions that are should not be part of the production code.

romange · 2022-12-04T12:38:29Z

src/server/dragonfly_test.cc

+    //  creating a busy wait fiber here that checks the stats whether the defrag task run and
+    //  whether it did something.
+    auto stats = shard->GetDefragStats();
+    while (tries > 0 && stats.tries < 3) {


DragStats stats; for (int i = 0; i < 10; ++i) { auto stats = shard->GetDefragStats(); if (stats.success_count > 10) { break; } this_fiber::sleep_for(220ms); } EXPECT_GT(stats.success_count, kMinExpectedDefragCount);

romange · 2022-12-04T12:41:35Z

src/server/engine_shard_set.cc

 }

 void EngineShard::DefragTaskState::Init() {
-  cursor = 0u;
+  cursor = kCursorDoneState;


Init can be removed.

Signed-off-by: Boaz Sade <[email protected]>

boazsade requested a review from dranikpg December 1, 2022 08:41

boazsade force-pushed the active-mem-derag-full branch 2 times, most recently from b0836e1 to df4f7f4 Compare December 1, 2022 10:52

romange reviewed Dec 1, 2022

View reviewed changes

dranikpg reviewed Dec 1, 2022

View reviewed changes

src/redis/zmalloc.h Outdated Show resolved Hide resolved

boazsade force-pushed the active-mem-derag-full branch 2 times, most recently from 510c77f to d3b413b Compare December 1, 2022 15:49

boazsade requested a review from romange December 1, 2022 15:50

romange reviewed Dec 1, 2022

View reviewed changes

boazsade force-pushed the active-mem-derag-full branch 2 times, most recently from ce19c72 to 0460f3e Compare December 4, 2022 12:09

boazsade requested a review from romange December 4, 2022 12:15

romange reviewed Dec 4, 2022

View reviewed changes

feat(server): active memory defrag task dragonflydb#448

7f12a82

Signed-off-by: Boaz Sade <[email protected]>

boazsade force-pushed the active-mem-derag-full branch from 0460f3e to 7f12a82 Compare December 4, 2022 13:09

romange approved these changes Dec 4, 2022

View reviewed changes

romange merged commit 6ec4cf0 into dragonflydb:main Dec 4, 2022

aaronmondal mentioned this pull request Mar 10, 2024

Use mimalloc as global memory allocator TraceMachina/nativelink#749

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(server): memory defrag support - unit tests added to verify #448 #523

feat(server): memory defrag support - unit tests added to verify #448 #523

boazsade commented Dec 1, 2022

romange Dec 1, 2022

boazsade Dec 1, 2022

romange Dec 1, 2022

boazsade Dec 1, 2022

romange Dec 1, 2022

romange Dec 1, 2022

boazsade Dec 1, 2022 •

edited

Loading

boazsade Dec 4, 2022

romange Dec 1, 2022

boazsade Dec 1, 2022

romange Dec 1, 2022

romange Dec 1, 2022

boazsade Dec 1, 2022

romange Dec 1, 2022

romange Dec 1, 2022

boazsade Dec 1, 2022

romange Dec 1, 2022

boazsade Dec 1, 2022

romange Dec 1, 2022

boazsade Dec 1, 2022

romange Dec 1, 2022

romange Dec 1, 2022

boazsade Dec 1, 2022

romange Dec 4, 2022

boazsade Dec 4, 2022

romange Dec 4, 2022

boazsade Dec 4, 2022

feat(server): memory defrag support - unit tests added to verify #448 #523

feat(server): memory defrag support - unit tests added to verify #448 #523

Conversation

boazsade commented Dec 1, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

boazsade Dec 1, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

boazsade Dec 1, 2022 •

edited

Loading