
feat(rdb_load): add support for loading huge sets #3807

Merged: 3 commits into dragonflydb:main, Sep 29, 2024

Conversation

@andydunstall (Contributor) commented Sep 27, 2024

Adds support for loading huge sets (#3760)

/*
* TODO
*
* I'm quite confused by the existing logic here, where we limit

@romange (Collaborator):

If I remember correctly, it was more efficient to allocate smaller arrays than a single huge array. And I agree with you: let's start with a single "static" rule of up to kMaxBlobLen for streaming.

@andydunstall (Author):

OK, sure.
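
A minimal sketch of the chunked streaming rule agreed in this thread, assuming hypothetical read/flush callbacks; kMaxBlobLen is the real constant discussed above (the value 4096 is the one chosen later in this conversation), everything else is illustrative rather than the actual rdb_load.cc code:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: stream a set of `total` members in chunks of at most
// kMaxBlobLen instead of allocating one huge array for the whole set.
constexpr size_t kMaxBlobLen = 4096;

void LoadSetChunked(size_t total,
                    const std::function<std::string()>& read_member,
                    const std::function<void(std::vector<std::string>)>& flush) {
  size_t remaining = total;
  while (remaining > 0) {
    size_t chunk = std::min(remaining, kMaxBlobLen);
    std::vector<std::string> blob;
    blob.reserve(chunk);  // small, bounded allocation per chunk
    for (size_t i = 0; i < chunk; ++i)
      blob.push_back(read_member());
    flush(std::move(blob));  // hand the chunk off; nothing stays resident
    remaining -= chunk;
  }
}
```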

// Despite being async, this function can block if the shard queue is full.
FlushShardAsync(sid);
}
out_buf.emplace_back(item);

@romange (Collaborator):

nit: move(item)

@andydunstall (Author):

Thanks, updated.
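
For reference, the nit amounts to this one-line change (out_buf and item are the names from the diff above):

```cpp
out_buf.emplace_back(std::move(item));  // move the item instead of copying it
```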

@romange (Collaborator) left a review:

looks good :)

@romange (Collaborator) commented Sep 28, 2024

You can use rdb_test.cc to add the unit test.

@andydunstall (Author) commented Sep 28, 2024

Sure, I've added a unit test.

Testing a 5GB set manually with 5m entries (debug populate 1 test 1000 rand type set elements 5000000):

RSS is still 10GB (5GB used), since it still does only a single flush (because kMaxBufSize x kMaxBlobLen > 8m).

It's also now ~20% slower, which seems to be due to no longer reserving the full set up front. I could include the full set size in Item to reserve on first load, if you think it's worth doing.

Repeating the test with a flush whenever pending_read.remaining > 0, RSS is < 7GB (5GB used) and it's 20% faster. Not sure if you think that's worth adding, or whether there's another flushing approach we should use?

@romange (Collaborator) commented Sep 28, 2024

I think it's worth reserving capacity if it's easy to add. Also, we can now reduce kMaxBlobLen to a smaller number, since we know large sets will just occupy all the items in ItemsBuf.
Using pending_read.remaining as a flush signal is also possible and could be a good short-term heuristic.
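
A hypothetical sketch of the pending_read.remaining flush signal mentioned above; the struct and helper below are illustrative stand-ins, not the actual rdb_load.cc types:

```cpp
#include <cstddef>

// Illustrative stand-in for the loader's streaming state (an assumption).
struct PendingRead {
  size_t remaining = 0;  // members of the current huge object not yet read
};

// Flush when the items buffer is full, or immediately while mid-way through
// streaming a huge object, so its partial chunks never accumulate.
bool ShouldFlush(const PendingRead& pending_read, size_t buffered,
                 size_t max_buffered) {
  return buffered >= max_buffered || pending_read.remaining > 0;
}
```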

@andydunstall (Author):

OK, sure. I've reduced kMaxBlobLen to 4096, meaning the 5m set is split into ~10 flushes (128 × 4096 ≈ 500k entries per flush).

I've also reserved the full set size up front.

With that, loading the 5GB set takes 2.8s (compared to 3.8s before), with RSS at ~7GiB (~10GiB before).
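
A minimal sketch of reserving the full set size on the first chunk, as described above; the Item fields and the std::unordered_set target are assumptions for illustration (Dragonfly's actual set type and Item layout differ):

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical chunk payload: the first chunk carries the full cardinality.
struct Item {
  std::vector<std::string> members;  // this chunk's members
  size_t total_size = 0;             // full set size, meaningful on chunk 0
  bool append = false;               // true for every chunk after the first
};

void ApplyChunk(std::unordered_set<std::string>* set, Item item) {
  if (!item.append)
    set->reserve(item.total_size);  // one up-front reserve avoids rehashing
  for (auto& m : item.members)
    set->insert(std::move(m));
}
```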

@andydunstall changed the title from "feat(rdb_load): add support for loading huge sets (WIP)" to "feat(rdb_load): add support for loading huge sets" on Sep 28, 2024
@andydunstall marked this pull request as ready for review on Sep 28, 2024, 09:56
@@ -2505,6 +2557,11 @@ void RdbLoader::LoadItemsBuffer(DbIndex db_ind, const ItemsBuf& ib) {
stop_early_ = true;
break;
}
if (item->load_config.append) {

@romange (Collaborator):

Why did you add this code?

@andydunstall (Author) commented Sep 29, 2024:

A bit of a guess, given it already does ts->TryStash below in the non-append case. Should I remove it?

(Or, if you mean why it continues early in the append case: that's because we don't want to check expiry, add to db_slice, or update flags when appending, right?)

@romange (Collaborator):

Let's remove this code:

  1. Data tiering does not work with non-strings.
  2. We do not want to stash values that are not yet finalized.

@andydunstall (Author):

Sure, removed.
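
For context, a hypothetical sketch of the append handling discussed in this thread (illustrative assumptions throughout; the removed ts->TryStash branch is not shown): appended chunks extend an existing value and skip the expiry, db_slice, and flag handling of a first load.

```cpp
#include <vector>

// Illustrative types, not the actual rdb_load.cc definitions.
struct LoadConfig {
  bool append = false;  // true when this item extends an existing value
};
struct Item {
  LoadConfig load_config;
};

void AppendToExistingValue(Item*) { /* extend the partially loaded value */ }
void CreateNewEntry(Item*) { /* first-load path: expiry, db_slice, flags */ }

// Sketch: items flagged as appends bypass the normal first-load handling.
void ProcessItems(const std::vector<Item*>& ib) {
  for (Item* item : ib) {
    if (item->load_config.append) {
      AppendToExistingValue(item);  // just extend the value...
      continue;                     // ...and skip expiry/flags/stash logic
    }
    CreateNewEntry(item);
  }
}
```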

Resolved (outdated) review threads: src/server/rdb_load.cc, src/server/rdb_load.h (two threads)

@romange (Collaborator) left a review:

LGTM, plus minor comments.

@romange (Collaborator) commented Sep 29, 2024

Good work @andydunstall. If you want, you can follow up with a hash-map + zset PR as well.

@andydunstall (Author):

> If you want you can follow up with hash-map + zset PR as well

Thanks, will do.

@romange merged commit 520dea0 into dragonflydb:main on Sep 29, 2024. 9 checks passed.

romange pushed a commit referencing this pull request on Sep 30, 2024:
* feat(rdb_load): add support for loading huge sets