run server failed: couldn't find local name "tidb-default-pd-0" in the initial cluster configuration #1520

LinuxGit · 2019-04-25T06:01:03Z

Please answer these questions before submitting your issue. Thanks!

What did you do?
If possible, provide a recipe for reproducing the error.
I created a TiDB cluster with three PD nodes via tidb-operator. tidb-default-pd-0 can't start, while the other PD pods are ready.
After I removed the /var/lib/pd/snap/db file, the PD pod started normally.

tidb-operator version:

./tidb-controller-manager -V
TiDB Operator Version: version.Info{TiDBVersion:"2.1.0", GitVersion:"v1.0.0-beta.1-p2-71-g617546b792be61-dirty", GitCommit:"617546b792be61e253eb3cc0152e953069120365", GitTreeState:"dirty", BuildDate:"2019-04-24T03:03:30Z", GoVersion:"go1.12", Compiler:"gc", Platform:"linux/amd64"}

The issue can't be reproduced every time, but I hit it twice today.

What did you expect to see?
PD starts normally.
What did you see instead?

 kubectl logs -f tidb-default-pd-0 -n c2e207e4-607d-41c6-b646-cf6cdd091a5d
2019/04/25 03:14:19.911 server.go:136: [info] start embed etcd
2019/04/25 03:14:19.912 log.go:88: [info] embed: [pprof is enabled under /debug/pprof]
2019/04/25 03:14:19.912 systime_mon.go:24: [info] start system time monitor
2019/04/25 03:14:19.917 main.go:101: [fatal] run server failed: couldn't find local name "tidb-default-pd-0" in the initial cluster configuration
github.com/pingcap/pd/server.(*Server).startEtcd
        /home/jenkins/workspace/release_tidb_2.1-ga/go/src/github.com/pingcap/pd/server/server.go:142
github.com/pingcap/pd/server.(*Server).Run
        /home/jenkins/workspace/release_tidb_2.1-ga/go/src/github.com/pingcap/pd/server/server.go:285
main.main
        /home/jenkins/workspace/release_tidb_2.1-ga/go/src/github.com/pingcap/pd/cmd/pd-server/main.go:100
runtime.main
        /usr/local/go/src/runtime/proc.go:200
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1337

What version of PD are you using (pd-server -V)?

Release Version: v2.1.8
Git Commit Hash: 1961ce0
Git Branch: HEAD
UTC Build Time: 2019-04-12 07:46:09

nolouch · 2019-04-25T12:17:11Z

It seems that old data still exists.

shafreeck · 2019-07-04T16:24:04Z

Is it possible that the name tidb-default-pd-0 has not been created by k8s when the pd-server starts?

weiqiang333 · 2019-08-18T15:29:18Z

Adding -name=xxx made it work for me.
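
For context on why the name matters: the fatal message says the server's own name was not found among the entries of the initial cluster configuration. A minimal Python sketch of that kind of check follows. It is an illustration only, not etcd's or PD's actual code; the function names and the sample peer URLs are made up, and the `name=url,name=url` layout is assumed from the usual etcd-style initial cluster string.

```python
from typing import Dict, List


def parse_initial_cluster(spec: str) -> Dict[str, List[str]]:
    """Parse an etcd-style 'name1=url1,name2=url2' string into {name: [urls]}."""
    members: Dict[str, List[str]] = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, url = item.partition("=")
        if not sep:
            raise ValueError(f"malformed initial cluster entry: {item!r}")
        members.setdefault(name, []).append(url)
    return members


def verify_local_member(local_name: str, initial_cluster: str) -> None:
    """Raise if local_name is not one of the names in initial_cluster."""
    members = parse_initial_cluster(initial_cluster)
    if local_name not in members:
        raise ValueError(
            f'couldn\'t find local name "{local_name}" '
            "in the initial cluster configuration"
        )


if __name__ == "__main__":
    # Hypothetical peer URLs: pd-0 is missing from the list, so the check fails.
    cluster = "pd-1=http://pd-1:2380,pd-2=http://pd-2:2380"
    try:
        verify_local_member("pd-0", cluster)
    except ValueError as err:
        print(err)
```

Under this model, any mismatch between the name the server starts with and the names in the list triggers the error, which would be consistent with an explicit name flag changing the outcome.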

cofyc · 2020-01-20T09:14:35Z

We encountered this issue several times in tidb-operator CI. In a recent failure, we found that an entry in the pd-ctl member output does not have the name field.

    {
      "member_id": 7699799069801548718,
      "peer_urls": [
        "http://basic-v2-pd-3.basic-v2-pd-peer.tidb-cluster-1861.svc:2380"
      ]
    },

full output is here

Do you know in which case this might happen?
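
A quick way to spot such entries in saved pd-ctl member output is sketched below in Python. This is a sketch under assumptions: it assumes the output is a JSON document with a top-level `members` array, the helper name `nameless_members` is made up, and the second member in the sample is a hypothetical named entry added only for contrast.

```python
import json
from typing import Any, Dict, List


def nameless_members(member_json: str) -> List[Dict[str, Any]]:
    """Return the member entries whose "name" field is missing or empty."""
    doc = json.loads(member_json)
    # Assumption: members are listed under a top-level "members" key.
    return [m for m in doc.get("members", []) if not m.get("name")]


# The first entry is the one quoted in this thread; the second is hypothetical.
SAMPLE = """
{
  "members": [
    {
      "member_id": 7699799069801548718,
      "peer_urls": [
        "http://basic-v2-pd-3.basic-v2-pd-peer.tidb-cluster-1861.svc:2380"
      ]
    },
    {
      "name": "hypothetical-pd-0",
      "member_id": 1,
      "peer_urls": ["http://hypothetical-pd-0:2380"]
    }
  ]
}
"""

if __name__ == "__main__":
    for member in nameless_members(SAMPLE):
        print(member["member_id"], member["peer_urls"])
```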

nolouch · 2020-01-20T10:30:34Z

@cofyc Join has two steps:

1. Prepare join: use the etcd API to add a member. At this point the member has no name, just a member_id.
2. Publish name: if the new server starts successfully, the name is published.

The problem is that after prepare join, the PD does not start successfully. Does this problem also occur in 3.x?
There is a retry fix (see #1643) that was not picked to release-2.1.
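
The two steps described above can be pictured with a toy Python model. It only illustrates the ordering and how a failed start can leave a nameless member behind; it is not PD's join code, and every class, function, and URL in it is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Member:
    member_id: int
    peer_urls: List[str]
    name: str = ""  # stays empty until the server publishes its name


@dataclass
class ToyCluster:
    members: Dict[int, Member] = field(default_factory=dict)
    next_id: int = 1

    def prepare_join(self, peer_url: str) -> int:
        """Step 1: register a member with an ID and peer URL, but no name."""
        member = Member(self.next_id, [peer_url])
        self.members[member.member_id] = member
        self.next_id += 1
        return member.member_id

    def publish_name(self, member_id: int, name: str) -> None:
        """Step 2: the started server publishes its name."""
        self.members[member_id].name = name


def join(cluster: ToyCluster, name: str, peer_url: str, start_ok: bool) -> bool:
    """Run both steps; if the server fails to start, step 2 never happens."""
    member_id = cluster.prepare_join(peer_url)
    if not start_ok:
        return False  # the member stays registered without a name
    cluster.publish_name(member_id, name)
    return True


if __name__ == "__main__":
    cluster = ToyCluster()
    join(cluster, "pd-0", "http://pd-0:2380", start_ok=True)
    join(cluster, "pd-1", "http://pd-1:2380", start_ok=False)
    for m in cluster.members.values():
        print(m.member_id, repr(m.name), m.peer_urls)
```

In this model the second member is left with an empty name, which matches the shape of the entry quoted earlier in the thread.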

cofyc · 2020-01-20T13:18:44Z

Thanks! We encountered this issue only in 2.x CI job which we run to verify backward compatibility with TiDB 2 (pingcap/tidb-operator#1592). Can this be picked into 2.x?

rleungx · 2020-05-12T03:44:34Z

@cofyc Can this issue be closed now?

rleungx · 2021-10-19T08:23:03Z

This issue seems to be stale. I'm going to close it for now.

cofyc · 2021-10-19T08:32:40Z

> This issue seems to be stale. I'm going to close it for now.

It can be closed. Sorry, I missed the previous message.

LinuxGit assigned nolouch Apr 25, 2019

nolouch added the type/question label Apr 25, 2019

Yisaer mentioned this issue Jan 20, 2020

PD sometimes fails to scale out pingcap/tidb-operator#1592

Closed

Yisaer mentioned this issue Mar 2, 2020

Encounter PD join error again in version 2 during e2e test pingcap/tidb-operator#1837

Closed

rleungx closed this as completed Oct 19, 2021
