add index status was always running, tidb panic and can not start when tidb rolling restart #47902
Labels
affects-7.5
This bug affects the 7.5.x(LTS) versions.
component/ddl
This issue is related to DDL of TiDB.
severity/critical
type/bug
The issue is confirmed as a bug.
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
tidb_enable_dist_task='on'
1、run sysbench
2、add index
3、tidb rolling restart
2. What did you expect to see? (Required)
add index can success
3. What did you see instead (Required)
1、add index status was always running
"the status of ddl job is not synced after 1h0m0s (now: 2023-10-19 17:35:56, jobId: 561, job type: add index /* ingest /, state: running)
operatorLogs:
[2023-10-19 16:35:48] ###### start adding index
alter table sbtest1 add index index_test_1697704548745 (c)
[2023-10-19 16:35:48] ###### wait for ddl job finish
[2023-10-19 17:35:56] ###### wait for ddl job finish timeout(1h0m0s)
select job_id, job_type, state from information_schema.ddl_jobs where query = 'alter table sbtest1 add index index_test_1697704548745 (c)'
jobId: 561, job type: add index / ingest */, state: running"
tidb logs:
[2023/10/19 08:39:57.482 +00:00] [INFO] [dispatcher.go:222] ["schedule task meet err, reschedule it"] [task-id=180041] [task-type=backfill] [error="no available TiDB node to dispatch subtasks"]
[2023/10/19 08:39:57.963 +00:00] [INFO] [dispatcher.go:503] [onNextStage] [task-id=180041] [task-type=backfill] [current-step=1] [next-step=3]
2、tidb panic
2023-10-19 16:36:49 log="2023-10-19T16:36:49.461997791+08:00 stderr F \t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:72 +0x96"
2023-10-19 16:36:49 log="2023-10-19T16:36:49.461995984+08:00 stderr F created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 720992478"
2023-10-19 16:36:49 log="2023-10-19T16:36:49.461994155+08:00 stderr F \t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75 +0x56"
2023-10-19 16:36:49 log="2023-10-19T16:36:49.461992056+08:00 stderr F golang.org/x/sync/errgroup.(*Group).Go.func1()"
2023-10-19 16:36:49 log="2023-10-19T16:36:49.461985137+08:00 stderr F \t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/ddl/index.go:2127 +0xf1"
2023-10-19 16:36:49 log="2023-10-19T16:36:49.461982488+08:00 stderr F github.com/pingcap/tidb/pkg/ddl.(*worker).executeDistGlobalTask.func2()"
2023-10-19 16:36:49 log="2023-10-19T16:36:49.461979599+08:00 stderr F \t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/ddl/reorg.go:401"
2023-10-19 16:36:49 log="2023-10-19T16:36:49.4619776+08:00 stderr F github.com/pingcap/tidb/pkg/ddl.(*ddlCtx).isReorgPaused(...)"
2023-10-19 16:36:49 log="2023-10-19T16:36:49.461975479+08:00 stderr F \t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/ddl/reorg.go:104"
2023-10-19 16:36:49 log="2023-10-19T16:36:49.461973037+08:00 stderr F github.com/pingcap/tidb/pkg/ddl.(*reorgCtx).isReorgPaused(...)"
2023-10-19 16:36:49 log="2023-10-19T16:36:49.46197055+08:00 stderr F goroutine 720992623 [running]:"
2023-10-19 16:36:49 log="2023-10-19T16:36:49.461967768+08:00 stderr F "
2023-10-19 16:36:49 log="2023-10-19T16:36:49.461964954+08:00 stderr F [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x3e358f1]"
2023-10-19 16:36:49 log="2023-10-19T16:36:49.461936693+08:00 stderr F panic: runtime error: invalid memory address or nil pointer dereference"
2023-10-19 16:36:49 log="[ddl_workerpool.go:83] ["closing workerPool"] [category=ddl]"
2023-10-19 16:36:49 log="[manager.go:311] ["revoke session"] ["owner info"="[ddl] /tidb/ddl/fg/owner ownerManager 7967c229-7b89-45de-a781-0afaf6f1d3c7"] []"
2023-10-19 16:36:49 log="[manager.go:267] ["break campaign loop, context is done"] ["owner info"="[ddl] /tidb/ddl/fg/owner ownerManager 7967c229-7b89-45de-a781-0afaf6f1d3c7"]"
2023-10-19 16:36:49 log="[manager.go:300] ["is not the owner"] ["owner info"="[ddl] /tidb/ddl/fg/owner ownerManager 7967c229-7b89-45de-a781-0afaf6f1d3c7"]"
2023-10-19 16:36:49 log="[ddl_worker.go:1206] ["schema version doesn't change"] [category=ddl]"
2023-10-19 16:36:49 log="[terror.go:324] ["encountered error"] [error="write tcp 10.200.47.170:4000->10.200.52.64:38182: use of closed network connection"] [stack="github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:1109\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/server.go:700"]"
2023-10-19 16:36:49 log="[conn.go:1098] ["command dispatched failed"] [conn=2717918466] [session_alias=] [connInfo="id:0, addr:10.200.52.64:38182 status:10, collation:utf8mb4_general_ci, user:root"] [command=Query] [status="inTxn:0, autocommit:1"] [sql="alter table sbtest1 add index index_test_1697704548745 (c)"] [txn_mode=PESSIMISTIC] [timestamp=445043061226471491] [err="context canceled\ngithub.com/pingcap/errors.AddStack\n\t/go/pkg/mod/github.com/pingcap/[email protected]/errors.go:174\ngithub.com/pingcap/errors.Trace\n\t/go/pkg/mod/github.com/pingcap/[email protected]/juju_adaptor.go:15\ngithub.com/pingcap/tidb/pkg/ddl.(*ddl).callHookOnChanged\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/ddl/ddl.go:1211\ngithub.com/pingcap/tidb/pkg/ddl.(*ddl).createIndex\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/ddl/ddl_api.go:7342\ngithub.com/pingcap/tidb/pkg/ddl.(*ddl).AlterTable\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/ddl/ddl_api.go:3711\ngithub.com/pingcap/tidb/pkg/executor.(*DDLExec).executeAlterTable\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/ddl.go:385\ngithub.com/pingcap/tidb/pkg/executor.(*DDLExec).Next\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/ddl.go:149\ngithub.com/pingcap/tidb/pkg/executor/internal/exec.Next\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/internal/exec/executor.go:283\ngithub.com/pingcap/tidb/pkg/executor.(*ExecStmt).next\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/adapter.go:1222\ngithub.com/pingcap/tidb/pkg/executor.(*ExecStmt).handleNoDelayExecutor\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/adapter.go:967\ngithub.com/pingcap/tidb/pkg/executor.(*ExecStmt).handleNoDelay\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/adapter.go:793\ngithub.com/pingcap/tidb/pkg/executor.(*ExecStmt).Exec\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/adapter.go:574\ngithub.com/pingcap/tidb/pkg/session.runStmt\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/session/session.go:2421\ngithub.com/pingcap/tidb/pkg/session.(*session).ExecuteStmt\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/session/session.go:2271\ngithub.com/pingcap/tidb/pkg/server.(*TiDBContext).ExecuteStmt\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/driver_tidb.go:292\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).handleStmt\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:2010\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).handleQuery\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:1801\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).dispatch\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:1288\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:1067\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/server.go:700\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650"]"
2023-10-19 16:36:49 log="[session.go:2283] ["run statement failed"] [conn=2717918466] [session_alias=] [schemaVersion=784] [error="context canceled"] [session="{\n \"currDBName\": \"sysbench_64_7000w\",\n \"id\": 2717918466,\n \"status\": 2,\n \"strictMode\": true,\n \"user\": {\n \"Username\": \"root\",\n \"Hostname\": \"10.200.52.64\",\n \"CurrentUser\": false,\n \"AuthUsername\": \"root\",\n \"AuthHostname\": \"%\",\n \"AuthPlugin\": \"mysql_native_password\"\n }\n}"]"
2023-10-19 16:36:49 log="[tidb.go:285] ["rollbackTxn called due to ddl/autocommit failure"]"
2023-10-19 16:36:49 log="[tso_dispatcher.go:240] ["exit tso dispatcher loop"]"
2023-10-19 16:36:49 log="[resource_manager_client.go:314] ["[resource manager] exit resource token dispatcher"]"
2023-10-19 16:36:49 log="[tso_dispatcher.go:186] ["exit tso requests cancel loop"]"
2023-10-19 16:36:49 log="[tso_dispatcher.go:339] ["[tso] exit tso dispatcher"] [dc-location=global]"
2023-10-19 16:36:49 log="[tso_dispatcher.go:405] ["[tso] stop fetching the pending tso requests due to context canceled"] [dc-location=global]"
2023-10-19 16:36:49 log="[ddl.go:1122] ["DoDDLJob will quit because context done"] [category=ddl]"
2023-10-19 16:36:49 log="[reorg.go:249] ["run reorg job quit"] [category=ddl]"
2023-10-19 16:36:49 log="[pd_service_discovery.go:248] ["[pd] exit member loop due to context canceled"]"
2023-10-19 16:36:49 log="[server.go:905] ["error setting read deadline for kill."] [error="set tcp 10.200.47.170:4000: use of closed network connection"]"
2023-10-19 16:36:49 log="[server.go:905] ["error setting read deadline for kill."] [error="set tcp 10.200.47.170:4000: use of closed network connection"]"
2023-10-19 16:36:49 log="[server.go:924] ["kill all connections."] [category=server]"
2023-10-19 16:36:49 log="[server.go:972] ["all sessions quit in drain wait time"]"
2023-10-19 16:36:49 log="[server.go:943] ["start drain clients"]"
2023-10-19 16:36:49 log="[manager.go:311] ["revoke session"] ["owner info"="[autoid] tidb/autoid/leader ownerManager tc-tidb-1.tc-tidb-peer.endless-ha-test-add-index-tps-3180180-1-267.svc:10080"] []"
2023-10-19 16:36:49 log="[manager.go:267] ["break campaign loop, context is done"] ["owner info"="[autoid] tidb/autoid/leader ownerManager tc-tidb-1.tc-tidb-peer.endless-ha-test-add-index-tps-3180180-1-267.svc:10080"]"
2023-10-19 16:36:49 log="[manager.go:300] ["is not the owner"] ["owner info"="[autoid] tidb/autoid/leader ownerManager tc-tidb-1.tc-tidb-peer.endless-ha-test-add-index-tps-3180180-1-267.svc:10080"]"
2023-10-19 16:36:49 log="[http_status.go:520] ["http server error"] [error="http: Server closed"]"
3、tidb can not start
4. What is your TiDB version? (Required)
./tidb-server -V
Release Version: v7.5.0-alpha
Edition: Community
Git Commit Hash: 4cd4d5f
Git Branch: heads/refs/tags/v7.5.0-alpha
UTC Build Time: 2023-10-18 11:42:36
GoVersion: go1.21.3
Race Enabled: false
Check Table Before Drop: false
Store: unistore
2023-10-19T16:35:41.666+0800
The text was updated successfully, but these errors were encountered: