feat(replication): Use a ring buffer with messages to serve replication. #1835
Conversation
Please see my comments; they are mostly questions due to my lack of context.
src/server/journal/journal_slice.cc (outdated)
#include "base/logging.h" | ||
#include "server/journal/serializer.h" | ||
|
||
ABSL_FLAG(int, shard_changes_log_size, 1 << 16, "The size of the circular changes log per shard"); |
Maybe add the word "replication" to the flag name, or at least to the description?
Happy to bike-shed here :)
For reference, the equivalent Redis configuration is called repl-backlog-size, and it's measured in bytes (not entries).
I don't think of it as bike-shedding; it's actually something users might look for, so having that word show up when they Ctrl-F through the output of --helpfull (or whatever) can be useful.
Changed to shard_repl_backlog_len
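For reference, a sketch of the renamed flag, keeping the original 1 << 16 default; the exact description string here is an assumption, not the committed wording:

#include "absl/flags/flag.h"

ABSL_FLAG(int, shard_repl_backlog_len, 1 << 16,
          "The length of the circular replication backlog per shard, in entries");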
Just a few cleanup comments.
A thought I had: instead of using RingBuffer<std::string> we could use two rotating byte buffers (like the streamer does) to control the offset not only in entries but also in bytes (for example, allowing a max lag of 5mb). Also, we wouldn't need separate allocations for all entries (which are likely fairly cheap either way). The downside is that memory usage is twice as high and you drop half of the entries when filling up (or you keep track with an external vector, which makes it difficult).
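A rough standalone sketch of that two-buffer idea, assuming a byte-bounded backlog; the class and member names are illustrative and do not come from this PR:

#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>

class RotatingBacklog {
 public:
  explicit RotatingBacklog(size_t max_bytes) : max_bytes_(max_bytes) {}

  // Appends one serialized journal entry. When the active buffer would exceed
  // max_bytes_, the buffers rotate: the older buffer is discarded wholesale,
  // so up to half of the retained backlog disappears at once (the downside
  // mentioned above), but memory stays bounded in bytes rather than entries.
  void Append(std::string_view serialized, uint64_t lsn) {
    if (active_.size() + serialized.size() > max_bytes_) {
      previous_ = std::move(active_);
      active_.clear();
      active_start_lsn_ = lsn;
    }
    active_.append(serialized);
  }

 private:
  size_t max_bytes_;
  std::string active_, previous_;     // two rotating byte buffers
  uint64_t active_start_lsn_ = 0;     // first LSN stored in the active buffer
};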
Why did we punt on the design review? I am curious to know if we still maintain backpressure when replicating via ring buffer.
  item->opcode = entry.opcode;
  item->data = "";
} else {
  item = ring_buffer_->GetTail(true);
- Please add a comment saying that it provides a pointer to the next tail, overriding the head if the buffer is full.
- I would like to understand how you envision the product behavior in this case. Are you saying there is no backpressure whatsoever, and if this item has not been consumed by a replica, then c'est la vie?
- How will the replica know that its stable sync channel was overridden and that it is missing some entries?
As I wrote below: currently there is no change. Backpressure is applied through the callbacks, so we can't override an entry that hasn't been sent.
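For illustration, a minimal generic ring buffer with the tail-override semantics being discussed; GetTail(bool) below is a sketch of the behavior, not Dragonfly's actual implementation:

#include <cstddef>
#include <vector>

template <typename T> class RingBuffer {
 public:
  explicit RingBuffer(size_t capacity) : buf_(capacity) {}

  // Returns a pointer to the next tail slot. If the buffer is full and
  // override_if_full is true, the oldest entry (the head) is dropped to make
  // room; otherwise nullptr is returned so the caller can apply backpressure.
  // Slot objects are reused in place, which is what lets std::string slots
  // keep their capacity across writes (relevant to the allocation thread below).
  T* GetTail(bool override_if_full) {
    if (size_ == buf_.size()) {
      if (!override_if_full)
        return nullptr;
      head_ = (head_ + 1) % buf_.size();  // drop the oldest entry
      --size_;
    }
    T* slot = &buf_[(head_ + size_) % buf_.size()];
    ++size_;
    return slot;
  }

 private:
  std::vector<T> buf_;
  size_t head_ = 0, size_ = 0;
};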
item.txid = entry.txid;
VLOG(1) << "Writing item [" << item.lsn << "]: " << entry.ToString();
ring_buffer_->EmplaceOrOverride(move(item));
io::BufSink buf_sink{&ring_serialize_buf_};
I think using StringSink is better here. You serialize into a string and then just move it into item->data, saving a copy at line 155.
Keeping a big backing buffer will prevent grow allocations
I can also just serialize directly into item->data without moving, no?
nvm, StringSink owns its string
So, if I always move out of a StringSink, I always have to allocate a new string, whereas currently I'm reusing the strings from the ring buffer. I think fewer allocations are better than fewer copies in this case.
Why? item is a pointer to an object in the ring buffer, and I'm calling std::string::operator=(std::string_view).
It's currently allocating a single string per entry, but exactly once. A moved-from string sink can cause multiple allocations because the string might need to grow over multiple write calls.
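To make the trade-off concrete, a minimal standalone sketch; the function names are invented for illustration and are not the PR's code:

#include <string>
#include <string_view>
#include <utility>

std::string slot;  // stands in for item->data, a reused ring-buffer slot

// Move-based storage: each call hands the slot a freshly built string, so the
// slot's previously grown capacity is discarded and the next serialization
// has to allocate again.
void StoreByMove(std::string serialized) {
  slot = std::move(serialized);
}

// Copy-based storage: assigning a string_view copies into the slot's existing
// buffer, so once the ring buffer has warmed up, steady-state writes allocate
// nothing (as long as an entry doesn't outgrow the slot's capacity).
void StoreByCopy(std::string_view serialized) {
  slot = serialized;  // std::string::operator=(std::string_view)
}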
I don't understand where any allocation happens here (except for the first 16k entries)
I did not remember we reuse items in the ring buffer. It's worth adding a comment because it can bite us later by hiding allocated memory.
So, I thought about this again. I agree that memory usage can be an issue. I like @dranikpg's suggestion about two buffers (one for entries, one for bytes), but it's a bit trickier to handle.
So I suggest for now lowering the default size (which is currently 64k, I didn't remember correctly) to 1k; later we can tune this better and add visibility into the memory consumption.
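As a sketch, that proposal would just lower the flag's default, mirroring the earlier flag sketch (description string assumed):

#include "absl/flags/flag.h"

ABSL_FLAG(int, shard_repl_backlog_len, 1 << 10,
          "The length of the circular replication backlog per shard, in entries");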
I left it as an open question. We still have backpressure and I don't plan to change that right now, but it would be pretty easy to do so.
I agree. I think it's a pretty simple change but I'd rather do it in another PR if we decide we want to.
LGTM. I don't have any further comments and agree that we can try out different ring buffer variations later
These are infrastructure changes needed for partial sync; no actual replication protocol changes are implemented here.
The journal ring buffer is enabled, some supporting functionality is introduced, and the existing replication infrastructure is migrated to use the ring buffer instead of serializing in place.
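A heavily simplified sketch of what that migration implies, based only on this description; every name below is an assumption rather than the PR's actual interface:

#include <cstdint>
#include <functional>
#include <string>
#include <string_view>

struct JournalItem {
  uint64_t lsn;      // log sequence number of the entry
  std::string data;  // entry serialized once, at write time, into the ring buffer
};

// Serving a replica becomes forwarding pre-serialized bytes from the ring
// buffer, rather than serializing the entry separately for each connection.
void SendToReplica(const JournalItem& item,
                   const std::function<void(std::string_view)>& write_to_socket) {
  write_to_socket(item.data);
}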