Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pump client: increase retry time, and refine some code #158

Merged
merged 31 commits into from
Jan 10, 2019

Conversation

WangXiangUSTC
Copy link
Contributor

@WangXiangUSTC WangXiangUSTC commented Jan 2, 2019

What problem does this PR solve?

the retry time interval is too small, may cause no available pump error when update the tidb cluster.
issue: https://internal.pingcap.net/jira/browse/TOOL-772

What is changed and how it works?

  1. use write binlog timeout config set in tidb
  2. when all pumps write binlog failed, use all the unavailable pump to try again
  3. handle error for watch etcd
  4. refine log and some code

Check List

Tests

  • Unit test

@WangXiangUSTC
Copy link
Contributor Author

/run-all-tests

@zhouqiang-cl
Copy link

/rebuild

@IANTHEREAL
Copy link
Collaborator

@july2993 PTAL

}

// setPumpAvaliable set pump's isAvaliable, and modify UnAvaliablePumps or AvaliablePumps.
func (c *PumpsClient) setPumpAvaliable(pump *PumpStatus, avaliable bool) {
pump.IsAvaliable = avaliable
if pump.IsAvaliable {
err := pump.createGrpcClient(c.Security)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice modification

@WangXiangUSTC
Copy link
Contributor Author

/run-all-tests

@WangXiangUSTC
Copy link
Contributor Author

/run-all-tests

Copy link
Contributor

@july2993 july2993 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you show some of of tidb while all pump is temporary unavailable

pkg/etcd/etcd_test.go Outdated Show resolved Hide resolved
pkg/etcd/etcd_test.go Outdated Show resolved Hide resolved
if err != nil {
return nil, errors.Trace(err)
}

if len(newPumpsClient.Pumps.Pumps) == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*.Pumps.Pumps not good name, hard to know what 's it while reading code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's hard to modify the name now, because tools relay on tidb, and tidb relay on tools too, this variable is used in tidb's unit test, if change this name, will build failed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it's not right, we can spend time to fix it. I suggest to create an issue and fix it later

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I will create a issue for it

tidb-binlog/pump_client/client.go Show resolved Hide resolved
tidb-binlog/pump_client/client.go Show resolved Hide resolved
tidb-binlog/pump_client/client.go Outdated Show resolved Hide resolved
c.setPumpAvaliable(pump, false)
pump = c.Selector.Next(binlog, retryTime/5+1)
Logger.Debugf("[pumps client] avaliable pumps: %v, write binlog choose pump %v", c.Pumps.AvaliablePumps, pump)
if time.Since(startTime) > c.BinlogWriteTimeout {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BinlogWriteTimeout will be the config of tidb ?

[binlog]
# enable to write binlog.
enable = false

# WriteTimeout specifies how long it will wait for writing binlog to pump.
write-timeout = "15s"

# If IgnoreError is true, when writting binlog meets error, TiDB would stop writting binlog,
# but still provide service.
ignore-error = false

# use socket file to write binlog, for compatible with kafka version tidb-binlog.
binlog-socket = ""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what't the timeout of write rpc? what if just one pump return slowly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rpc timeout also use this config, how about use

if time.Since(startTime) > 2 * c.BinlogWriteTimeout {

if len(h.Pumps) == 0 {
h.TsMap[binlog.StartTs] = nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why set this and in RangeSelector too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if no available pump, and then write prewrite binlog, will return nil pump, and tidb will get error, and then write a rollback binlog. the rollback binlog should also choose nil pump.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why write a rollback binlog if you fail to write prewrite binlog

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tidb write rollback binlog

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's meaningless to write rollback if you did't write p-binlog.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's meaningless to write rollback if you did't write p-binlog and better to handle unexpect case ASAP, add some comment to WriteBinlog never return err != nil for rollback&commit, i can't make sure this at first time.

}

// NewPumpStatus returns a new PumpStatus according to node's status.
func NewPumpStatus(status *node.Status, security *tls.Config) *PumpStatus {
pumpStatus := &PumpStatus{}
pumpStatus.Status = *status
pumpStatus.IsAvaliable = (status.State == node.Online)
pumpStatus.security = security

if status.State != node.Online {
return pumpStatus
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to add ErrNum?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand definition of pump available, is pump unavailable after pump fail to write binlog?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all use pump.IsUsable() now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errNum default is 0

@WangXiangUSTC
Copy link
Contributor Author

@GregoryIan @july2993 PTAL again

p.Lock()
defer p.Unlock()

atomic.StoreInt64(&p.ErrNum, 0)
Copy link
Collaborator

@IANTHEREAL IANTHEREAL Jan 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also put it into create grpc client?

@july2993
Copy link
Contributor

july2993 commented Jan 9, 2019

note, this case may be easy to happen
1, stop all pumps, all pump is update to paused
2, receive WriteBinlog, no pumps available( fall to the so called backoffWriteBinlog case)
3, pump start again, or it have started but you update pumps in Selector before receive WriteBinlog

} else {
err := p.createGrpcClient()
if err != nil {
atomic.AddInt64(&p.ErrNum, 1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put it into createGrpcClient

@july2993
Copy link
Contributor

july2993 commented Jan 9, 2019

add some log at unexpect case like in WriteBinlog fall back to backoffWriteBinlog case

@@ -117,6 +114,8 @@ type PumpsClient struct {

// binlog socket file path, for compatible with kafka version pump.
binlogSocket string

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RetryTime of PumpsClient is useless

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't remove now, because tools relay on tidb, and tidb relay on tools, this variable is used in tidb's test, if remove this, will build failed. I will add a comment for this variable.

@IANTHEREAL
Copy link
Collaborator

LGTM

Copy link
Contributor

@july2993 july2993 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@WangXiangUSTC WangXiangUSTC merged commit 15b677a into pingcap:master Jan 10, 2019
@WangXiangUSTC WangXiangUSTC deleted the xiang/refine_pump_client branch January 10, 2019 04:58
WangXiangUSTC added a commit to WangXiangUSTC/tidb-tools that referenced this pull request Mar 25, 2019
IANTHEREAL pushed a commit that referenced this pull request Mar 29, 2019
… strategy config (#223)

* pump client: compatible with kafka version tidb-binlog && add unit test (#139)

* pump client: write commit binlog will never return error (#148)

* pkg watcher: move watcher from tidb-enterprise-tools (#146)

* pump client: increase retry time, and refine some code (#158)

* pump client: add initial log function (#165)

* pump client: support change select pump's strategy (#221)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants