Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raft: fix bug in unbounded log growth prevention mechanism #10199

Merged
merged 1 commit into from
Oct 22, 2018

Conversation

tbg
Copy link
Contributor

@tbg tbg commented Oct 19, 2018

The previous code was using the proto-generated Size() method to
track the size of an incoming proposal at the leader. This includes
the Index and Term, which were mutated after the call to Size()
when appending to the log. Additionally, it was not taking into
account that an ignored configuration change would ignore the
original proposal and append an empty entry instead.

As a result, a fully committed Raft group could end up with a non-
zero tracked uncommitted Raft log counter that would eventually hit
the ceiling and drop all future proposals indiscriminately. It would
also immediately imply that proposals exceeding the threshold alone
would get refused (as the "first uncommitted proposal" gets special
treatment and is always allowed in).

Track only the size of the payload actually appended to the Raft log
instead.

For context, see:
cockroachdb/cockroach#31618 (comment)

@gyuho-bot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approvers:

If they are not already assigned, you can assign the PR to them by writing /assign in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gyuho
Copy link
Contributor

gyuho commented Oct 19, 2018

Please ignore above comments, while we are experimenting Prow integration.

/ok-to-test

@gyuho
Copy link
Contributor

gyuho commented Oct 19, 2018

Looks good to me. Defer to @xiang90 and @bdarnell.

@nvanbenschoten
Copy link
Contributor

Looks good to me as well, although it' kind of strange that the size limit imposed by raftLog.maxSize and Storage.Entries use the proto's size, while this size limit uses the entry.Data size.

An alternative you could explore is pushing this uncommitted size limit into appendEntry, after Term and Index are assigned, and use the proto size. This is what I did at first in #10167, but changed my approach because I didn't want the empty proposal from becomeLeader to be counted towards the limit in case the very first real proposal was above the uncommitted size limit. In hindsight, that would have avoided the issue we saw in cockroachdb/cockroach#31618.

@codecov-io
Copy link

codecov-io commented Oct 19, 2018

Codecov Report

Merging #10199 into master will increase coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #10199      +/-   ##
==========================================
+ Coverage   71.85%   71.87%   +0.01%     
==========================================
  Files         390      390              
  Lines       36346    36348       +2     
==========================================
+ Hits        26116    26124       +8     
+ Misses       8424     8417       -7     
- Partials     1806     1807       +1
Impacted Files Coverage Δ
raft/util.go 56.52% <100%> (+1.29%) ⬆️
raft/raft.go 92.29% <100%> (ø) ⬆️
proxy/grpcproxy/register.go 80.55% <0%> (-13.89%) ⬇️
pkg/adt/interval_tree.go 86.48% <0%> (-6.91%) ⬇️
pkg/tlsutil/tlsutil.go 86.2% <0%> (-6.9%) ⬇️
etcdctl/ctlv3/command/printer_simple.go 72.48% <0%> (-1.35%) ⬇️
etcdserver/api/rafthttp/peer.go 78.77% <0%> (-1.12%) ⬇️
etcdserver/cluster_util.go 58.74% <0%> (-0.9%) ⬇️
etcdserver/api/rafthttp/stream.go 79.39% <0%> (-0.86%) ⬇️
etcdserver/api/membership/cluster.go 77.44% <0%> (-0.46%) ⬇️
... and 16 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8c80efb...ea5ff8a. Read the comment docs.

@tbg tbg force-pushed the fix-max-uncommitted-size branch from ea5ff8a to 3c480ce Compare October 22, 2018 08:28
@tbg tbg requested a review from bdarnell October 22, 2018 08:32
@tbg
Copy link
Contributor Author

tbg commented Oct 22, 2018

I pushed the computation down into appendEntries and had changed it to use .Size(), but I actually ended up reintroducing PayloadSize. It's pretty awkward to have to guess all the assigned terms and indexes during the tests, to the point where the tests become a mess (or you have to introduce additional tools).

The previous code was using the proto-generated `Size()` method to
track the size of an incoming proposal at the leader. This includes
the Index and Term, which were mutated after the call to `Size()`
when appending to the log. Additionally, it was not taking into
account that an ignored configuration change would ignore the
original proposal and append an empty entry instead.

As a result, a fully committed Raft group could end up with a non-
zero tracked uncommitted Raft log counter that would eventually hit
the ceiling and drop all future proposals indiscriminately. It would
also immediately imply that proposals exceeding the threshold alone
would get refused (as the "first uncommitted proposal" gets special
treatment and is always allowed in).

Track only the size of the payload actually appended to the Raft log
instead.

For context, see:
cockroachdb/cockroach#31618 (comment)
@tbg
Copy link
Contributor Author

tbg commented Oct 22, 2018

Various tests failed, but I'm not sure that's really caused by this PR. @gyuho what do you think?

@tbg
Copy link
Contributor Author

tbg commented Oct 22, 2018

(I'm going to re-run just to get an idea of whether they're flakes or not)

@tbg
Copy link
Contributor Author

tbg commented Oct 22, 2018

@gyuho I re-triggered the build but it's still red. master looks pretty red as well. Could you help me push this over the finish line?

@gyuho
Copy link
Contributor

gyuho commented Oct 22, 2018

@tschottdorf

https://status.github.com/messages

11:26 PDT
We have resumed webhook delivery and will continue to monitor as we process the backlog of events.

Seems like github is just back. Will retrigger manually.

@tbg
Copy link
Contributor Author

tbg commented Oct 22, 2018

@gyuho I re-triggered the build but it's still red

I actually meant that it failed again :-) I triggered directly on travis-ci. Let's see what the next run brings.

Copy link
Contributor

@gyuho gyuho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. etcd is experiencing some flaky tests not related to this change. Should be good to merge. /cc @xiang90

@tbg
Copy link
Contributor Author

tbg commented Oct 22, 2018

Thanks @gyuho, appreciate the heads up.

@tbg tbg merged commit b42b394 into etcd-io:master Oct 22, 2018
@tbg tbg deleted the fix-max-uncommitted-size branch October 22, 2018 19:30
tbg added a commit to cockroachdb/vendored that referenced this pull request Oct 22, 2018
craig bot pushed a commit to cockroachdb/cockroach that referenced this pull request Oct 22, 2018
31554: exec: initial commit of execgen tool r=solongordon a=solongordon

Execgen will be our tool for generating templated code necessary for
columnarized execution. So far it only generates the
EncDatumRowsToColVec function, which is used by the columnarizer to
convert a RowSource into a columnarized Operator.

Release note: None

31610: sql: fix pg_catalog.pg_constraint's confkey column r=BramGruneir a=BramGruneir

Prior to this patch, all columns in the index were included instead of only the
ones being used in the foreign key reference.

Fixes #31545.

Release note (bug fix): Fix pg_catalog.pg_constraint's confkey column from
including columns that were not involved in the foreign key reference.

31689: storage: pick up fix for Raft uncommitted entry size tracking r=benesch a=tschottdorf

Waiting for the upstream PR

etcd-io/etcd#10199

to merge, but this is going to be what the result will look like.

----

The tracking of the uncommitted portion of the log had a bug where
it wasn't releasing everything as it should've. As a result, over
time, all proposals would be dropped. We're hitting this way earlier
in our import tests, which propose large proposals. As an intentional
implementation detail, a proposal that itself exceeds the max
uncommitted log size is allowed only if the uncommitted log is empty.
Due to the leak, we weren't ever hitting this case and so AddSSTable
commands were often dropped indefinitely.

Fixes #31184.
Fixes #28693.
Fixes #31642.

Optimistically:
Fixes #31675.
Fixes #31654.
Fixes #31446.

Release note: None

Co-authored-by: Solon Gordon <[email protected]>
Co-authored-by: Bram Gruneir <[email protected]>
Co-authored-by: Tobias Schottdorf <[email protected]>
tbg added a commit to cockroachdb/vendored that referenced this pull request Dec 18, 2018
tbg added a commit to tbg/cockroach that referenced this pull request Dec 18, 2018
Picks up the bug fix etcd-io/etcd#10199.
It also picks up these further small fixes (which is intentional,
I authored most of them):

5c209d6 raft: ensure leader is in ProgressStateReplicate
1569f48 raft: print RejectHint of zero on MsgAppResp
9668536 raft: add a test case in TestStorageAppend
e4af2be raft: separate MaxCommittedSizePerReady config from MaxSizePerMsg
aa4313a *: fix github links
4aa72ca raft: Explain ReportSnapshot and Propose behavior
b7ed416 raft: fix godoc in tests
10255cf raft: Fix comment on TestLeaderBcastBeat
c561f83 OWNERS: experiment
ad49c8f raft: fix bug in unbounded log growth prevention mechanism
de47099 raft: fix description in UT
73c20cc raft: Fix comment on sendHeartbeat
7be7ac5 raft: Fix spelling in doc.go

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

6 participants