feat(server): New auto-journal types/read/write #560

dranikpg · 2022-12-12T23:17:27Z

Reduced version of #540

Contains only declaration of new journal type, reader and writer

romange · 2022-12-13T03:01:07Z

please install presubmit hooks in your git client

Signed-off-by: Vladislav Oleshko <[email protected]>

dranikpg · 2022-12-13T09:41:20Z

src/server/journal_test.cc

+}
+
+// Test serializing and de-serializing entries.
+TEST(Journal, WriteRead) {


The stat might look like much but the tests need some utils to mock dragonfly slice types

dranikpg · 2022-12-13T09:42:46Z

src/server/journal/types.h

+// This struct represents a single journal entry.
+// Those are either control instructions or commands.
+struct EntryNew {  // Called this "New" because I can't delete the old neither replace it partially


I called this New because we need both, because this PR contains only the declaration & read+write utils

dranikpg · 2022-12-13T09:43:35Z

src/server/journal/serializer.cc

+using nonstd::make_unexpected;
+
+// TODO: Stolen from rdb, unite in common utils
+
+#define SET_OR_RETURN(expr, dest)              \
+  do {                                         \
+    auto exp_val = (expr);                     \
+    if (!exp_val) {                            \
+      VLOG(1) << "Error while calling " #expr; \
+      return exp_val.error();                  \
+    }                                          \
+    dest = exp_val.value();                    \
+  } while (0)
+
+#define SET_OR_UNEXPECT(expr, dest)            \
+  {                                            \
+    auto exp_res = (expr);                     \
+    if (!exp_res)                              \
+      return make_unexpected(exp_res.error()); \
+    dest = std::move(exp_res.value());         \
+  }


I'm still in favour of moving all the serialization into one folder and share common utils

dranikpg · 2022-12-13T09:44:58Z

src/server/journal/serializer.cc

+  if (read < 1)  // TODO: Custom errc namespace? Generic opcodes?
+    return make_unexpected(std::make_error_code(std::errc::result_out_of_range));


And sharing common error codes

However this would make this PR bigger than intended 🤔

adiholden · 2022-12-14T08:18:03Z

src/server/CMakeLists.txt

@@ -18,7 +18,8 @@ add_library(dragonfly_lib  channel_slice.cc command_registry.cc
            list_family.cc main_service.cc memory_cmd.cc rdb_load.cc rdb_save.cc replica.cc
            snapshot.cc script_mgr.cc server_family.cc malloc_stats.cc
            set_family.cc stream_family.cc string_family.cc
-            zset_family.cc version.cc bitops_family.cc container_utils.cc)
+            zset_family.cc version.cc bitops_family.cc container_utils.cc
+            journal/serializer.cc)


I see that files in journal dir are under dfly_transaction lib
Maybe it makes more sense to put the journal/serializer.cc also there?

For now, its only part of the replication flow, so I'm putting it into the outermost dependency

adiholden · 2022-12-14T08:29:40Z

src/server/journal/serializer.h

+#include "io/io.h"
+#include "server/common.h"
+#include "server/journal/types.h"
+#include "server/main_service.h"


what is #include "server/main_service.h" used for?

Forgot to remove it, was from Journal executor

adiholden · 2022-12-14T09:07:35Z

src/server/journal/serializer.h

+
+// JournalWriter serializes journal entries to a sink.
+// It automatically keeps track of the current database index.
+struct JournalWriter {


struct -> class

adiholden · 2022-12-14T09:07:47Z

src/server/journal/serializer.h

+
+// JournalReader allows deserializing journal entries from a source.
+// Like the writer, it automatically keeps track of the database index.
+struct JournalReader {


struct -> class

adiholden · 2022-12-14T09:10:13Z

src/server/journal/serializer.h

+
+namespace dfly {
+
+// JournalWriter serializes journal entries to a sink.


So I know know we have an option to write journal to file. Is this flow going to be dropped?
If not and you will use the JournalWriter, it is writing directly to sink without any buffering

In what sense dropped?

Its a bit more complicated. For now we can leave it as it is so the PR doesn't grow

dropped i.e we are going to remove this flow of writing to the journal to file.
If you want to write the buffering code later maybe add TODO

Why? The source is generic enough

For now its not part of the PR. Its a split of the original, its not finished

I dont follow..
I dont see in the original PR buffering in the JournalWriter class
If you remember in the RDB serializer we had some buffering code which I removed cause it was redundant as we use it with StringSink and buffering is not needed
If this class is used with a sink which is a file (not a socket as in the replica flow) you will have multiple writes of 1/2/8 bytes and in this case we will need buffering in the JournalWriter or JournalWriter will write to a StringSink and there should be some wrapping code to write to the journal file.
Anyway I can wait to the next PRs and see how you use JournalWriter with File Sink

There is no journal file yet, I didn't cover this part at all

In my original PR we write to a string sink in the rdb saver

In stable state we write directly to the sink simply because I didn't yet develop a mechanism for doing it efficiently. Once we have it working and have full test coverage, we can start benchmarking it

adiholden · 2022-12-14T09:32:03Z

src/server/journal/serializer.cc

+  return sink_->Write(io::Bytes{buf, sizeof(buf)});
+}
+
+error_code JournalWriter::WriteU64(uint64_t v) {


how about writing something similar to SerializeLen in rdb save?
Than you dont need to assume anything about the values you write . This can prevent bugs in calling the incorrect function and also may reduce the data size

Only for reducing data size.

I want to use SerializeLen from rdb, so I'll create a commons file for them

But in the future we should move all serialization logic into one folder

src/server/journal/serializer.cc

adiholden · 2022-12-14T11:08:54Z

src/server/journal/serializer.cc

+  vec->resize(size);
+  for (auto& span : *vec) {
+    size_t len;
+    SET_OR_RETURN(ReadString(), len);


you can change ReadString to return IoBuf::Bytes and this way you dont need the second loop

I need it, because I can't keep any references to it while its growing

I see, please consider writing the cmd length in the writer so you could use reserve on the buffer at the beginning of the loop

adiholden · 2022-12-14T12:16:18Z

src/server/journal/serializer.h

+struct JournalWriter {
+  // Initialize with sink and optional start database index. If no start index is set,
+  // a SELECT will be issued before the first entry.
+  JournalWriter(io::Sink* sink, std::optional<DbIndex> dbid = std::nullopt);


do you use the dbid in any flow? I didnt see this flow in the bigger PR.
If you do initialize with db id than caller mast make sure to call the select this is a bit confusing

I do. In which case must the caller care about the id? The should absorb all select entries

adiholden · 2022-12-14T12:20:20Z

src/server/journal/types.h

+  // Payload represents a non-owning view into a command executed on the shard.
+  using Payload =
+      std::variant<std::monostate,                        // No payload.
+                   CmdArgList,                            // Parts of a full command.


can you give example when CmdArgList is set , and why only parts of the full command enter the payload
and when the std::pair<std::string_view, ArgSlice> is set
Why do we need this 2 options?

CmdArgList is for single shard commands, the pair for a combination of command name shard args.

Please look at how it's used in my original PR

Ok I believe you can simplify this by using only std::pair<std::string_view, ArgSlice> and for single shard commands the ShardArgsInShard will return all args of the command, the command name is stored in Transaction class cid_ var I think, this way you dont need full_args_ .

DF has a weird split, where multi shard commands use shard args for all values, and single shard commands use it only for determining the shard to run on.

For single shard commands it contains only the key, but not the full data. Changing this internal behavior is possible but requires revisiting transaction code and all the commands

Ok. If this will simplify the code you could still use full_args_ and only std::pair<std::string_view, ArgSlice> init with full_args_.front(), full_args.Span(1) . If you think this makes the flow simpler...

It does, but there is currently no simple way of converting a Span<Span<char>> (aka a CmdArgList) into a Span<string_view> (aka an ArgSlice) without allocating the backing container for storing the string views.

I referenced this issue in the original PR, emphasising that the differently typed spans are not easily interchangeable. But this is a difficult question on its own because it is used all over the place

adiholden · 2022-12-15T09:09:30Z

src/server/journal/types.h

+  DbIndex dbid;
+
+  Payload payload;
+  OwnedPayload owned_payload;


payload is used for writing journal entries and owned_payload used for reading entries. I would consider using 2 struct one for writer and one for reader.

Why? What does it simplify or make more coherent?

We can make it extend a common base and use two subtypes for the varying subfields

It simplifies understanding code and flow. You wrote the code so you understand that on the master and in writer you need to use payload and in replica and reader you need to use owned_payload, but for someone who reads it or want to change or add things to the flow I need to read the whole flow to understand it.
You could add a comment to the struct on this but it will be better to enforce it buy using 2 types and than when I add to the flow I cant access the incorrect filed.
Of course you can use base class.

Signed-off-by: Vladislav Oleshko <[email protected]>

feat(server): New auto-journal types/read/write

eb1cea6

Signed-off-by: Vladislav Oleshko <[email protected]>

dranikpg force-pushed the jf2-base-serde branch from 594655c to eb1cea6 Compare December 13, 2022 09:38

dranikpg requested a review from adiholden December 13, 2022 09:40

dranikpg commented Dec 13, 2022

View reviewed changes

adiholden reviewed Dec 14, 2022

View reviewed changes

src/server/journal/serializer.cc Outdated Show resolved Hide resolved

adiholden reviewed Dec 14, 2022

View reviewed changes

adiholden reviewed Dec 15, 2022

View reviewed changes

feat(server): Small journal entry fixes

1446f98

Signed-off-by: Vladislav Oleshko <[email protected]>

dranikpg force-pushed the jf2-base-serde branch from 4bbaa66 to 1446f98 Compare December 15, 2022 17:01

dranikpg requested a review from adiholden December 15, 2022 17:01

Merge branch 'main' into jf2-base-serde

ccb0005

adiholden approved these changes Dec 15, 2022

View reviewed changes

dranikpg merged commit 08803e6 into dragonflydb:main Dec 15, 2022

dranikpg deleted the jf2-base-serde branch December 25, 2022 12:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(server): New auto-journal types/read/write #560

feat(server): New auto-journal types/read/write #560

dranikpg commented Dec 12, 2022

romange commented Dec 13, 2022

dranikpg Dec 13, 2022

dranikpg Dec 13, 2022

dranikpg Dec 13, 2022

dranikpg Dec 13, 2022

adiholden Dec 14, 2022

dranikpg Dec 14, 2022

adiholden Dec 14, 2022

dranikpg Dec 14, 2022

adiholden Dec 14, 2022

adiholden Dec 14, 2022

adiholden Dec 14, 2022

dranikpg Dec 14, 2022

adiholden Dec 15, 2022

dranikpg Dec 15, 2022

adiholden Dec 15, 2022

dranikpg Dec 15, 2022

adiholden Dec 14, 2022

dranikpg Dec 14, 2022

adiholden Dec 14, 2022

dranikpg Dec 14, 2022

adiholden Dec 15, 2022

adiholden Dec 14, 2022

dranikpg Dec 15, 2022

adiholden Dec 14, 2022

dranikpg Dec 14, 2022

adiholden Dec 15, 2022

dranikpg Dec 15, 2022

adiholden Dec 15, 2022

dranikpg Dec 15, 2022

adiholden Dec 15, 2022

dranikpg Dec 15, 2022

adiholden Dec 15, 2022

		if (read < 1) // TODO: Custom errc namespace? Generic opcodes?
		return make_unexpected(std::make_error_code(std::errc::result_out_of_range));


		namespace dfly {

		// JournalWriter serializes journal entries to a sink.

feat(server): New auto-journal types/read/write #560

feat(server): New auto-journal types/read/write #560

Conversation

dranikpg commented Dec 12, 2022

romange commented Dec 13, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment