Skip to content
This repository has been archived by the owner on Nov 28, 2024. It is now read-only.

Journal record format #14

Merged
merged 15 commits into from
Jun 2, 2021
Merged

Journal record format #14

merged 15 commits into from
Jun 2, 2021

Conversation

deepthidevaki
Copy link
Member

This ZEP describes the new journal record format.

Copy link
Member

@npepinpe npepinpe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to document just each record, or the journal in general? Or do we do a second ZEP for the journal? Eventually it would be nice to have something describing the journal as a whole.

ZEP-XXX-journal-record-format.md Outdated Show resolved Hide resolved
Comment on lines 32 to 37
## Journal layout
The journal contains one or more segments. Each segment is a file that contains a sequence of records ordered by index.


TODO: JournalSegmentDescriptor - first 64 bytes of a segment

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused, the title and description state this is about the journal record format, but here we talk about the journal as a whole. What's the focus on the document?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer it to document journal record format. But since we discussed about documenting the whole journal, I added a small section here for the descriptor. If we are planning to have another document, then I would say only record format here.

0 : End of File (EOF)
1 : Version 1

If the value of the frame indicates EOF, then the following bytes do not contain a valid journal record.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little uneasy about having only a single byte. So 0 means there is nothing after, and anything else (1-255) is the version. Is this really resilient? I can't think of any strong arguments against it, however, so take it with a grain of salt.

Copy link
Member Author

@deepthidevaki deepthidevaki Apr 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really resilient?

Why not?

ZEP-XXX-journal-record-format.md Outdated Show resolved Hide resolved

```
+ --------------------------------
| checksum<int64> | length<int32> |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't length unsigned?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unsigned int is interpreted as long in the generated code. Then we have to cast to int always because everywhere length is expected to be int. Besides, there is not much advantage making it unsigned. 2^31 bytes is already too big for a record.

ZEP-XXX-journal-record-format.md Outdated Show resolved Hide resolved
Currently the transport layer uses Kryo for serialization.
So even if we replicate the journal record as it is, we are not getting much benefits from it.
We would still need to copy the record before it is send.
However, if we can make use of operating systems' feature to transfer bytes directly from the file to the network it would make sense to transfer the entire journal record as it is.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we will get this any time soon (even with gRPC), but I think it's still laudable to reduce the number of copies when possible. So with gRPC we will be able to cut it down copying to 2 - once from file to the netty send buffer (which is direct memory, so still happening out-of-JVM), then once to the socket. Which in the end is still preferable to the current 4 - once to serializedRaftRecord, once to Kryo, once to Netty send buffer, once to socket.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can read more about the issue with FileRegion and gRPC here: grpc/grpc-java#1054

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can revisit it when we move to grpc. It is a breaking change anyway when we change the protocol, so we can also change this format.

The raft thread that receives this record uses the term and index to verify the preconditions before writing it to the journal. It should then construct a journal record using `index`, `asqn`, `checksum` and `serializedRaftRecord` which can be appended to the journal. Journal is expected to verify if the index and checksum is correct.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to add a section about alternatives in terms of serialization format - why zero copy? Why SBE and not Flatbuffers or Cap'n'Proto?

I can do that as I already had a look at these.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. That will be good.

ZEP-XXX-journal-record-format.md Outdated Show resolved Hide resolved
ZEP-XXX-journal-record-format.md Outdated Show resolved Hide resolved
@deepthidevaki
Copy link
Member Author

@miguelpires @npepinpe Please check. We can merge it imo.

Copy link
Member

@npepinpe npepinpe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@miguelpires miguelpires left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pointed out some minor typos but nothing blocking.

ZEP-XXX-journal-record-format.md Outdated Show resolved Hide resolved
ZEP-XXX-journal-record-format.md Outdated Show resolved Hide resolved
ZEP-XXX-journal-record-format.md Outdated Show resolved Hide resolved
ZEP-XXX-journal-record-format.md Outdated Show resolved Hide resolved
ZEP-XXX-journal-record-format.md Outdated Show resolved Hide resolved
ZEP-XXX-journal-record-format.md Outdated Show resolved Hide resolved
@deepthidevaki deepthidevaki marked this pull request as ready for review June 2, 2021 06:17
@deepthidevaki deepthidevaki force-pushed the dd-journal-record-format branch from edc5041 to 5015ff4 Compare June 2, 2021 06:23
@deepthidevaki deepthidevaki merged commit 7a5aeb8 into master Jun 2, 2021
@deepthidevaki deepthidevaki deleted the dd-journal-record-format branch June 2, 2021 06:24
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants