run server failed: couldn't find local name "tidb-default-pd-0" in the initial cluster configuration #1520

Closed
LinuxGit opened this issue Apr 25, 2019 · 9 comments
Labels
type/question The issue belongs to a question.

Comments

@LinuxGit

LinuxGit commented Apr 25, 2019

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
    If possible, provide a recipe for reproducing the error.
    I created a TiDB cluster with three PD nodes via tidb-operator. tidb-default-pd-0 fails to start, while the other PD pods are ready.
    After I removed the /var/lib/pd/snap/db file, the PD pod started normally.

tidb-operator version:

./tidb-controller-manager -V
TiDB Operator Version: version.Info{TiDBVersion:"2.1.0", GitVersion:"v1.0.0-beta.1-p2-71-g617546b792be61-dirty", GitCommit:"617546b792be61e253eb3cc0152e953069120365", GitTreeState:"dirty", BuildDate:"2019-04-24T03:03:30Z", GoVersion:"go1.12", Compiler:"gc", Platform:"linux/amd64"}

The issue does not reproduce every time, but I hit it twice today.

  2. What did you expect to see?
    PD starts normally.

  3. What did you see instead?

 kubectl logs -f tidb-default-pd-0 -n c2e207e4-607d-41c6-b646-cf6cdd091a5d
2019/04/25 03:14:19.911 server.go:136: [info] start embed etcd
2019/04/25 03:14:19.912 log.go:88: [info] embed: [pprof is enabled under /debug/pprof]
2019/04/25 03:14:19.912 systime_mon.go:24: [info] start system time monitor
2019/04/25 03:14:19.917 main.go:101: [fatal] run server failed: couldn't find local name "tidb-default-pd-0" in the initial cluster configuration
github.com/pingcap/pd/server.(*Server).startEtcd
        /home/jenkins/workspace/release_tidb_2.1-ga/go/src/github.com/pingcap/pd/server/server.go:142
github.com/pingcap/pd/server.(*Server).Run
        /home/jenkins/workspace/release_tidb_2.1-ga/go/src/github.com/pingcap/pd/server/server.go:285
main.main
        /home/jenkins/workspace/release_tidb_2.1-ga/go/src/github.com/pingcap/pd/cmd/pd-server/main.go:100
runtime.main
        /usr/local/go/src/runtime/proc.go:200
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1337

  4. What version of PD are you using (pd-server -V)?

Release Version: v2.1.8
Git Commit Hash: 1961ce0
Git Branch: HEAD
UTC Build Time: 2019-04-12 07:46:09
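
For readers unfamiliar with the error: as far as I can tell, the message comes from the embedded etcd's bootstrap check, which fails when the member's own name is not a key in the initial-cluster map. The sketch below is a simplified Python illustration of that kind of check, not PD's or etcd's actual code. The function names and the example peer URLs are hypothetical, and the `name=url,name=url` format is assumed from the usual initial-cluster flag syntax.

```python
def parse_initial_cluster(initial_cluster):
    """Parse 'name1=url1,name2=url2' into {name: [urls]} (assumed format)."""
    members = {}
    for entry in initial_cluster.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, sep, url = entry.partition("=")
        if not sep:
            raise ValueError("malformed initial-cluster entry: %r" % entry)
        members.setdefault(name, []).append(url)
    return members


def check_local_name(local_name, initial_cluster):
    """Raise if local_name is not one of the names in initial_cluster."""
    if local_name not in parse_initial_cluster(initial_cluster):
        raise ValueError(
            'couldn\'t find local name "%s" in the initial cluster configuration'
            % local_name
        )


if __name__ == "__main__":
    # Hypothetical configuration: pd-0 is missing from the list.
    cluster = "tidb-default-pd-1=http://pd-1:2380,tidb-default-pd-2=http://pd-2:2380"
    try:
        check_local_name("tidb-default-pd-0", cluster)
    except ValueError as err:
        print(err)
```

If this reading is right, any state that leaves the local name out of that map (stale data on disk, or a member registered without a name) would trigger the same fatal error at startup.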

@nolouch
Contributor

nolouch commented Apr 25, 2019

It seems that old data still exists.

nolouch added the type/question label Apr 25, 2019
@shafreeck
Contributor

Is it possible that the name tidb-default-pd-0 has not yet been created by k8s when the pd-server starts?

@weiqiang333

Adding -name=xxx made it work for me.

@cofyc

cofyc commented Jan 20, 2020

We encountered this issue several times in tidb-operator CI. In a recent failure, we found that one entry in the pd-ctl member output does not have the name field.

    {
      "member_id": 7699799069801548718,
      "peer_urls": [
        "http://basic-v2-pd-3.basic-v2-pd-peer.tidb-cluster-1861.svc:2380"
      ]
    },

Full output is here

Do you know in which cases this might happen?
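
A quick way to spot such entries is to filter the member list for objects with a missing or empty name. The following is a minimal sketch. It assumes the pd-ctl member output is a JSON object with a top-level "members" array, which is an assumption about the layout since only one entry is quoted above. The helper name is hypothetical, and the named entry in the sample is made up for contrast.

```python
import json

# Sample input: the nameless entry is the one quoted in this thread;
# the named entry and the "members" wrapper are assumptions for illustration.
SAMPLE = """
{
  "members": [
    {
      "name": "basic-v2-pd-0",
      "member_id": 1,
      "peer_urls": ["http://basic-v2-pd-0.example:2380"]
    },
    {
      "member_id": 7699799069801548718,
      "peer_urls": [
        "http://basic-v2-pd-3.basic-v2-pd-peer.tidb-cluster-1861.svc:2380"
      ]
    }
  ]
}
"""


def members_without_name(member_json):
    """Return the member entries whose "name" field is missing or empty."""
    data = json.loads(member_json)
    return [m for m in data.get("members", []) if not m.get("name")]


if __name__ == "__main__":
    for member in members_without_name(SAMPLE):
        print(member["member_id"], member.get("peer_urls"))
```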

@nolouch
Contributor

nolouch commented Jan 20, 2020

@cofyc Join has two steps:

  1. Prepare join: use the etcd API to add a member. At this point the member has no name, just a member_id.
  2. Publish name: if the new server starts successfully, the name will be pushed.

The problem is that after prepare join, the PD does not start successfully. Does this problem occur in 3.x?
There is a retry fix related to #1643 that was not picked to release-2.1.
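
The two steps above can be illustrated with a toy model. This is only a sketch of the described sequence, not PD's implementation: the class, the function names, and the `start_server` callback are all hypothetical, and the membership store is a plain in-memory dict standing in for etcd.

```python
class ToyCluster:
    """In-memory stand-in for the etcd membership list (hypothetical)."""

    def __init__(self):
        self.members = {}  # member_id -> {"name": str or None, "peer_urls": [...]}
        self._next_id = 1

    def prepare_join(self, peer_url):
        # Step 1: register the member. Only an ID and peer URL exist so far.
        member_id = self._next_id
        self._next_id += 1
        self.members[member_id] = {"name": None, "peer_urls": [peer_url]}
        return member_id

    def publish_name(self, member_id, name):
        # Step 2: the started server publishes its name.
        self.members[member_id]["name"] = name


def join(cluster, name, peer_url, start_server):
    """Run both steps; the name is published only if the server starts."""
    member_id = cluster.prepare_join(peer_url)
    if start_server():
        cluster.publish_name(member_id, name)
    return member_id


if __name__ == "__main__":
    cluster = ToyCluster()
    # Simulate a server that fails to start after prepare join.
    failed_id = join(cluster, "pd-3", "http://pd-3:2380", lambda: False)
    print(cluster.members[failed_id])
```

In this model, a failed start between the two steps leaves behind a member with an ID and peer URL but no name, which matches the shape of the entry quoted earlier in the thread.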

@cofyc

cofyc commented Jan 20, 2020

Thanks! We encountered this issue only in the 2.x CI job, which we run to verify backward compatibility with TiDB 2 (pingcap/tidb-operator#1592). Can this be picked into 2.x?

@rleungx
Member

rleungx commented May 12, 2020

@cofyc Can this issue be closed now?

@rleungx
Member

rleungx commented Oct 19, 2021

This issue seems to be stale. I'm going to close it for now.

rleungx closed this as completed Oct 19, 2021
@cofyc

cofyc commented Oct 19, 2021

> This issue seems to be stale. I'm going to close it for now.

It can be closed. Sorry, I missed the previous message.
