-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pump client: increase retry time, and refine some code #158
pump client: increase retry time, and refine some code #158
Conversation
/run-all-tests |
/rebuild |
@july2993 PTAL |
} | ||
|
||
// setPumpAvaliable set pump's isAvaliable, and modify UnAvaliablePumps or AvaliablePumps. | ||
func (c *PumpsClient) setPumpAvaliable(pump *PumpStatus, avaliable bool) { | ||
pump.IsAvaliable = avaliable | ||
if pump.IsAvaliable { | ||
err := pump.createGrpcClient(c.Security) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice modification
/run-all-tests |
/run-all-tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you show some of of tidb while all pump is temporary unavailable
if err != nil { | ||
return nil, errors.Trace(err) | ||
} | ||
|
||
if len(newPumpsClient.Pumps.Pumps) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*.Pumps.Pumps
not good name, hard to know what 's it while reading code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's hard to modify the name now, because tools relay on tidb, and tidb relay on tools too, this variable is used in tidb's unit test, if change this name, will build failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it's not right, we can spend time to fix it. I suggest to create an issue and fix it later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I will create a issue for it
tidb-binlog/pump_client/client.go
Outdated
c.setPumpAvaliable(pump, false) | ||
pump = c.Selector.Next(binlog, retryTime/5+1) | ||
Logger.Debugf("[pumps client] avaliable pumps: %v, write binlog choose pump %v", c.Pumps.AvaliablePumps, pump) | ||
if time.Since(startTime) > c.BinlogWriteTimeout { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BinlogWriteTimeout will be the config of tidb ?
[binlog]
# enable to write binlog.
enable = false
# WriteTimeout specifies how long it will wait for writing binlog to pump.
write-timeout = "15s"
# If IgnoreError is true, when writting binlog meets error, TiDB would stop writting binlog,
# but still provide service.
ignore-error = false
# use socket file to write binlog, for compatible with kafka version tidb-binlog.
binlog-socket = ""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what't the timeout of write rpc? what if just one pump return slowly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rpc timeout also use this config, how about use
if time.Since(startTime) > 2 * c.BinlogWriteTimeout {
tidb-binlog/pump_client/selector.go
Outdated
if len(h.Pumps) == 0 { | ||
h.TsMap[binlog.StartTs] = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why set this and in RangeSelector too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if no available pump, and then write prewrite binlog, will return nil pump, and tidb will get error, and then write a rollback binlog. the rollback binlog should also choose nil pump.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why write a rollback binlog if you fail to write prewrite binlog
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tidb write rollback binlog
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's meaningless to write rollback if you did't write p-binlog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's meaningless to write rollback if you did't write p-binlog and better to handle unexpect case ASAP, add some comment to WriteBinlog never return err != nil for rollback&commit, i can't make sure this at first time.
tidb-binlog/pump_client/pump.go
Outdated
} | ||
|
||
// NewPumpStatus returns a new PumpStatus according to node's status. | ||
func NewPumpStatus(status *node.Status, security *tls.Config) *PumpStatus { | ||
pumpStatus := &PumpStatus{} | ||
pumpStatus.Status = *status | ||
pumpStatus.IsAvaliable = (status.State == node.Online) | ||
pumpStatus.security = security | ||
|
||
if status.State != node.Online { | ||
return pumpStatus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to add ErrNum
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand definition of pump available
, is pump unavailable after pump fail to write binlog?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all use pump.IsUsable() now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
errNum default is 0
@GregoryIan @july2993 PTAL again |
tidb-binlog/pump_client/pump.go
Outdated
p.Lock() | ||
defer p.Unlock() | ||
|
||
atomic.StoreInt64(&p.ErrNum, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also put it into create grpc client
?
note, this case may be easy to happen |
tidb-binlog/pump_client/pump.go
Outdated
} else { | ||
err := p.createGrpcClient() | ||
if err != nil { | ||
atomic.AddInt64(&p.ErrNum, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put it into createGrpcClient
add some log at unexpect case like in WriteBinlog fall back to backoffWriteBinlog case |
@@ -117,6 +114,8 @@ type PumpsClient struct { | |||
|
|||
// binlog socket file path, for compatible with kafka version pump. | |||
binlogSocket string | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RetryTime
of PumpsClient
is useless
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't remove now, because tools relay on tidb, and tidb relay on tools, this variable is used in tidb's test, if remove this, will build failed. I will add a comment for this variable.
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
… strategy config (#223) * pump client: compatible with kafka version tidb-binlog && add unit test (#139) * pump client: write commit binlog will never return error (#148) * pkg watcher: move watcher from tidb-enterprise-tools (#146) * pump client: increase retry time, and refine some code (#158) * pump client: add initial log function (#165) * pump client: support change select pump's strategy (#221)
What problem does this PR solve?
the retry time interval is too small, may cause
no available pump
error when update the tidb cluster.issue: https://internal.pingcap.net/jira/browse/TOOL-772
What is changed and how it works?
Check List
Tests