Journal record format #14

deepthidevaki · 2021-03-29T11:01:32Z

This ZEP describes the new journal record format.

npepinpe

Do we want to document just each record, or the journal in general? Or do we do a second ZEP for the journal? Eventually it would be nice to have something describing the journal as a whole.


npepinpe · 2021-04-05T16:24:19Z

ZEP-XXX-journal-record-format.md

+## Journal layout
+The journal contains one or more segments. Each segment is a file that contains a sequence of records ordered by index.
+
+
+TODO: JournalSegmentDescriptor - first 64 bytes of a segment
+


I'm confused: the title and description state this is about the journal record format, but here we talk about the journal as a whole. What's the focus of the document?

I would prefer it to document only the journal record format. But since we discussed documenting the whole journal, I added a small section here for the descriptor. If we plan to have a separate document for the journal, then I would keep only the record format here.
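A minimal sketch of what the layout in the hunk above implies for reads, assuming each segment is identified by the index of its first record. `SegmentLookup`, `addSegment`, `segmentFor` and the file names are hypothetical, not the journal's actual API. Because records are ordered by index, the segment that may hold a given index is the one with the greatest first index not exceeding it.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical index of segments, keyed by the index of each segment's first record.
final class SegmentLookup {
  private final TreeMap<Long, String> segmentsByFirstIndex = new TreeMap<>();

  // Registers a segment file whose first record has the given index.
  void addSegment(long firstIndex, String fileName) {
    segmentsByFirstIndex.put(firstIndex, fileName);
  }

  // Returns the segment whose first index is the greatest one <= index,
  // or null if the index precedes the first segment.
  // Note: this does not check whether the index has actually been written.
  String segmentFor(long index) {
    Map.Entry<Long, String> entry = segmentsByFirstIndex.floorEntry(index);
    return entry == null ? null : entry.getValue();
  }
}
```

With segments starting at 1 and 101, `segmentFor(50)` resolves to the first segment and `segmentFor(101)` to the second. The 64-byte `JournalSegmentDescriptor` is still a TODO in the hunk, so the sketch does not model it.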

npepinpe · 2021-04-05T16:29:46Z

ZEP-XXX-journal-record-format.md

+0 : End of File  (EOF)
+1 : Version 1
+
+If the value of the frame indicates EOF, then the following bytes do not contain a valid journal record.


I'm a little uneasy about having only a single byte. So 0 means there is nothing after, and anything else (1-255) is the version. Is this really resilient? I can't think of any strong arguments against it, however, so take it with a grain of salt.

Is this really resilient?

Why not?
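A sketch of how a reader might interpret the single frame byte quoted above (`FrameReader`, `writeVersion` and `readVersion` are hypothetical names). A value of 0 means EOF; any other value is read as an unsigned version number, which gives the 1-255 range mentioned in the comment. Treating the end of the buffer like EOF is an assumption of this sketch.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: interprets the frame byte that precedes each journal record.
final class FrameReader {
  static final int EOF_FRAME = 0; // no valid record follows
  static final int VERSION_1 = 1;

  // Writes the frame byte for the given version (1-255).
  static void writeVersion(ByteBuffer buffer, int version) {
    if (version < 1 || version > 255) {
      throw new IllegalArgumentException("version must be in [1, 255]: " + version);
    }
    buffer.put((byte) version);
  }

  // Returns the record version, or -1 if no valid record follows.
  static int readVersion(ByteBuffer buffer) {
    if (!buffer.hasRemaining()) {
      return -1; // assumption: running out of bytes is treated like EOF
    }
    // Read the byte as unsigned so that values 128-255 are valid versions too.
    int frame = Byte.toUnsignedInt(buffer.get());
    return frame == EOF_FRAME ? -1 : frame;
  }
}
```

One possible argument for 0 as the EOF marker, assuming segment files are zero-filled before records are written: a reader that reaches unwritten space sees a 0 frame and stops, without needing a separate end marker.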


npepinpe · 2021-04-05T16:30:22Z

ZEP-XXX-journal-record-format.md

+
+```
+ --------------------------------
+| checksum<int64> | length<int32> |


Isn't length unsigned?

An unsigned int (uint32) is interpreted as a long in the generated code, so we would always have to cast it to int, because everywhere else the length is expected to be an int. Besides, there is not much advantage in making it unsigned: 2^31 bytes is already far too big for a record.
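To illustrate the trade-off discussed in this thread, here is a sketch of reading and writing the `checksum<int64> | length<int32>` header with `java.nio.ByteBuffer`. `RecordHeader` is a hypothetical name, and byte order is left to whatever the caller's buffer is configured with. Keeping the length signed lets it be used directly as an `int`; the only extra cost is rejecting negative values, which a corrupted header could produce.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the record header: checksum<int64> followed by length<int32>.
final class RecordHeader {
  static final int SIZE = Long.BYTES + Integer.BYTES; // 8 + 4 = 12 bytes

  final long checksum;
  final int length;

  RecordHeader(long checksum, int length) {
    if (length < 0) {
      throw new IllegalArgumentException("length must be non-negative: " + length);
    }
    this.checksum = checksum;
    this.length = length;
  }

  void write(ByteBuffer buffer) {
    buffer.putLong(checksum).putInt(length);
  }

  static RecordHeader read(ByteBuffer buffer) {
    long checksum = buffer.getLong();
    // Signed length: usable as an int offset right away. With an unsigned (uint32)
    // length, the value would surface as a long (e.g. Integer.toUnsignedLong(...))
    // and would need a range check and a cast back to int at every use site.
    int length = buffer.getInt();
    return new RecordHeader(checksum, length); // rejects negative (corrupted) lengths
  }
}
```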


npepinpe · 2021-04-06T10:49:46Z

ZEP-XXX-journal-record-format.md

+Currently the transport layer uses Kryo for serialization.
+So even if we replicate the journal record as it is, we are not getting much benefits from it.
+We would still need to copy the record before it is send.
+However, if we can make use of operating systems' feature to transfer bytes directly from the file to the network it would make sense to transfer the entire journal record as it is.


I don't think we will get this any time soon (even with gRPC), but it's still worthwhile to reduce the number of copies when possible. With gRPC we will be able to cut the copying down to 2: once from the file to the Netty send buffer (which is direct memory, so the copy still happens outside the JVM heap), then once to the socket. That is still preferable to the current 4: once to serializedRaftRecord, once to Kryo, once to the Netty send buffer, and once to the socket.

You can read more about the issue with FileRegion and gRPC here: grpc/grpc-java#1054

We can revisit it when we move to gRPC. Changing the protocol is a breaking change anyway, so we can change this format at the same time.
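For reference, the operating-system feature mentioned in the hunk is exposed in Java as `FileChannel.transferTo`, which on some platforms (for example Linux with a socket target) may be implemented with `sendfile(2)`, so the bytes need not pass through the JVM heap. A minimal sketch, assuming the record's file position and length are already known; `ZeroCopySender` is a hypothetical name. Whether this is usable in practice depends on the transport, as the gRPC issue linked above discusses.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper: sends a byte range of a segment file to a channel.
final class ZeroCopySender {
  // Sends `count` bytes of the file starting at `position` to `target`.
  // transferTo may transfer fewer bytes than requested, so it is called in a loop.
  static long send(FileChannel file, long position, long count, WritableByteChannel target)
      throws IOException {
    long sent = 0;
    while (sent < count) {
      long n = file.transferTo(position + sent, count - sent, target);
      if (n <= 0) {
        break; // no progress: end of file reached, or a non-blocking target is full
      }
      sent += n;
    }
    return sent;
  }
}
```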

npepinpe · 2021-04-06T10:51:33Z

ZEP-XXX-journal-record-format.md

+The raft thread that receives this record uses the term and index to verify the preconditions before writing it to the journal. It should then construct a journal record using `index`, `asqn`, `checksum` and `serializedRaftRecord` which can be appended to the journal. Journal is expected to verify if the index and checksum is correct.
+
+# Rationale and alternatives
+[rationale-and-alternatives]: #rationale-and-alternatives


We might want to add a section about alternatives in terms of serialization format: why zero copy? Why SBE and not FlatBuffers or Cap'n Proto?

I can do that, as I have already looked at these.

Thanks, that would be good.
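A sketch of the receiving side described in the hunk above (journal verifies index and checksum before appending). It assumes the leader's checksum is a CRC32C over `serializedRaftRecord` only, which is an assumption: the ZEP's actual checksum algorithm and coverage may differ. It also assumes the term check has already been done by the Raft thread. `VerifyingJournal` and its methods are hypothetical.

```java
import java.util.zip.CRC32C;

// Hypothetical journal that validates replicated records before appending them.
final class VerifyingJournal {
  private long lastIndex; // index of the last appended record, 0 if empty

  VerifyingJournal(long lastIndex) {
    this.lastIndex = lastIndex;
  }

  // Assumption: checksum = CRC32C over the serialized Raft record bytes.
  static long checksum(byte[] data) {
    CRC32C crc = new CRC32C();
    crc.update(data, 0, data.length);
    return crc.getValue();
  }

  // Appends a record received from the leader after checking that it has the
  // next expected index and that its checksum matches the serialized data.
  void append(long index, long asqn, long checksum, byte[] serializedRaftRecord) {
    if (index != lastIndex + 1) {
      throw new IllegalStateException(
          "Expected index " + (lastIndex + 1) + " but got " + index);
    }
    if (checksum(serializedRaftRecord) != checksum) {
      throw new IllegalStateException("Checksum mismatch at index " + index);
    }
    // A real journal would now write index, asqn, checksum and the data to the
    // current segment; asqn is stored as-is and not validated in this sketch.
    lastIndex = index;
  }

  long getLastIndex() {
    return lastIndex;
  }
}
```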

ZEP-XXX-journal-record-format.md

Co-authored-by: Nicolas Pepin-Perreault <[email protected]>

deepthidevaki · 2021-05-28T12:26:46Z

@miguelpires @npepinpe Please check. We can merge it imo.

npepinpe

LGTM

miguelpires

Pointed out some minor typos but nothing blocking.

ZEP-XXX-journal-record-format.md

Co-authored-by: Miguel Pires <[email protected]>

deepthidevaki added 3 commits March 4, 2021 11:53

wip: add record schemas

4699146

wip: update raft schema

012875e

update record format

ec511d0

deepthidevaki requested review from npepinpe and miguelpires March 29, 2021 11:01

npepinpe reviewed Apr 6, 2021


deepthidevaki and others added 5 commits April 6, 2021 14:51

Update ZEP-XXX-journal-record-format.md

7b27e41

Co-authored-by: Nicolas Pepin-Perreault <[email protected]>

Update ZEP-XXX-journal-record-format.md

c0ade4a

Co-authored-by: Nicolas Pepin-Perreault <[email protected]>

Update ZEP-XXX-journal-record-format.md

d5a96de

Co-authored-by: Nicolas Pepin-Perreault <[email protected]>

docs: add a short blurb about the format choice

949d96f

minor improvements

0a106d1

npepinpe approved these changes May 28, 2021


miguelpires approved these changes Jun 1, 2021


deepthidevaki and others added 6 commits June 2, 2021 08:14

Update ZEP-XXX-journal-record-format.md

bf6ed1e

Co-authored-by: Miguel Pires <[email protected]>

Update ZEP-XXX-journal-record-format.md

5db432b

Co-authored-by: Miguel Pires <[email protected]>

Update ZEP-XXX-journal-record-format.md

f3f2fcd

Co-authored-by: Miguel Pires <[email protected]>

Update ZEP-XXX-journal-record-format.md

9e822b0

Co-authored-by: Miguel Pires <[email protected]>

Update ZEP-XXX-journal-record-format.md

aa8d2ff

Co-authored-by: Miguel Pires <[email protected]>

Update ZEP-XXX-journal-record-format.md

3f8f170

Co-authored-by: Miguel Pires <[email protected]>

deepthidevaki marked this pull request as ready for review June 2, 2021 06:17

rename file

5015ff4

deepthidevaki force-pushed the dd-journal-record-format branch from edc5041 to 5015ff4 June 2, 2021 06:23

deepthidevaki merged commit 7a5aeb8 into master Jun 2, 2021

deepthidevaki deleted the dd-journal-record-format branch June 2, 2021 06:24

