tiflash core dump when enabling async grpc from sync grpc #4527

Closed
fzhedu opened this issue Mar 31, 2022 · 2 comments
fzhedu commented Mar 31, 2022

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

log and core @

[fzh@h99 test]$ tiup cluster display tpch
tiup is checking updates for component cluster ...
Starting component `cluster`: /data2/fzh/.tiup/components/cluster/v1.9.3/tiup-cluster /data2/fzh/.tiup/components/cluster/v1.9.3/tiup-cluster display tpch
Cluster type:       tidb
Cluster name:       tpch
Cluster version:    nightly
Deploy user:        root
SSH type:           builtin
Dashboard URL:      http://172.16.4.97:41322/dashboard
ID                 Role          Host         Ports                                OS/Arch       Status   Data Dir                                    Deploy Dir
--                 ----          ----         -----                                -------       ------   --------                                    ----------
172.16.4.97:43033  alertmanager  172.16.4.97  43033/49094                          linux/x86_64  Up       /data2/tpch123/tidb-data/alertmanager-9093  /data2/tpch123/tidb-deploy/alertmanager-9093
172.16.4.97:43003  grafana       172.16.4.97  43003                                linux/x86_64  Up       -                                           /data2/tpch123/tidb-deploy/grafana-43003
172.16.4.97:41322  pd            172.16.4.97  41322/41323                          linux/x86_64  Up|L|UI  /data2/tpch123/tidb-data/pd-2379            /data2/tpch123/tidb-deploy/pd-2379
172.16.4.97:9736   prometheus    172.16.4.97  9736/12020                           linux/x86_64  Up       /data2/tpch123/tidb-data/prometheus-9736    /data2/tpch123/tidb-deploy/prometheus-9736
172.16.4.97:41324  tidb          172.16.4.97  41324/41325                          linux/x86_64  Up       -                                           /data2/tpch123/tidb-deploy/tidb-41324
172.16.4.99:41324  tidb          172.16.4.99  41324/41325                          linux/x86_64  Up       -                                           /data2/tpch123/tidb-deploy/tidb-41324
172.16.4.39:41334  tiflash       172.16.4.39  41334/41335/41337/41338/41339/41336  linux/x86_64  Up       /data2/tpch123/tidb-data/tiflash-9000       /data2/tpch123/tidb-deploy/tiflash-9000
172.16.4.42:41434  tiflash       172.16.4.42  41434/41435/41437/41438/41439/41436  linux/x86_64  Up       /data2/tpch123/tidb-data/tiflash-9000       /data2/tpch123/tidb-deploy/tiflash-9000
172.16.4.79:41334  tiflash       172.16.4.79  41334/41335/41337/41338/41339/41336  linux/x86_64  Up       /data2/tpch123/tidb-data/tiflash-9000       /data2/tpch123/tidb-deploy/tiflash-9000
172.16.4.97:41326  tikv          172.16.4.97  41326/41327                          linux/x86_64  Up       /data2/tpch123/tidb-data/tikv-20160         /data2/tpch123/tidb-deploy/tikv-20160
Total nodes: 10

2. What did you expect to see? (Required)

3. What did you see instead? (Required)

[2022/03/30 14:44:44.770 +08:00] [ERROR] [<unknown>] [BaseDaemon:########################################] [thread_id=29932458]
[2022/03/30 14:44:44.770 +08:00] [ERROR] [<unknown>] ["BaseDaemon:(from thread 29934141) Received signal Segmentation fault (11)."] [thread_id=29932458]
[2022/03/30 14:44:44.770 +08:00] [ERROR] [<unknown>] ["BaseDaemon:Address: NULL pointer."] [thread_id=29932458]
[2022/03/30 14:44:44.770 +08:00] [ERROR] [<unknown>] ["BaseDaemon:Access: write."] [thread_id=29932458]
[2022/03/30 14:44:44.770 +08:00] [ERROR] [<unknown>] ["BaseDaemon:Address not mapped to object."] [thread_id=29932458]
[2022/03/30 14:44:44.770 +08:00] [ERROR] [<unknown>] ["BaseDaemon:\n       0x5a77811\tfaultSignalHandler(int, siginfo_t*, void*) [tiflash+94861329]\n                \tlibs/libdaemon/src/BaseDaemon.cpp:221\n  0x7f9fa30d0630\t<unknown symbol> [libpthread.so.0+63024]\n       0xb752459\tDB::GRPCCompletionQueuePool::pickQueue() [tiflash+192226393]\n                \tdbms/src/Flash/Mpp/GRPCCompletionQueuePool.cpp:39\n       0xb7543bf\tDB::(anonymous namespace)::AsyncGrpcExchangePacketReader::init(DB::UnaryCallback<bool>*) [tiflash+192234431]\n                \tdbms/src/Flash/Mpp/GRPCReceiverContext.cpp:100\n       0xb753ab7\tDB::GRPCReceiverContext::makeAsyncReader(DB::ExchangeRecvRequest const&, std::__1::shared_ptr<DB::AsyncExchangePacketReader>&, DB::UnaryCallback<bool>*) const [tiflash+192232119]\n                \tdbms/src/Flash/Mpp/GRPCReceiverContext.cpp:244\n       0xb748317\tvoid std::__1::allocator_traits<std::__1::allocator<DB::(anonymous namespace)::AsyncRequestHandler<DB::GRPCReceiverContext> > >::construct<DB::(anonymous namespace)::AsyncRequestHandler<DB::GRPCReceiverContext>, DB::MPMCQueue<DB::(anonymous namespace)::AsyncRequestHandler<DB::GRPCReceiverContext>*>*, DB::MPMCQueue<std::__1::shared_ptr<DB::ReceivedMessage> >*, std::__1::shared_ptr<DB::GRPCReceiverContext>&, DB::ExchangeRecvRequest const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, void>(std::__1::allocator<DB::(anonymous namespace)::AsyncRequestHandler<DB::GRPCReceiverContext> >&, DB::(anonymous namespace)::AsyncRequestHandler<DB::GRPCReceiverContext>*, DB::MPMCQueue<DB::(anonymous namespace)::AsyncRequestHandler<DB::GRPCReceiverContext>*>*&&, DB::MPMCQueue<std::__1::shared_ptr<DB::ReceivedMessage> >*&&, std::__1::shared_ptr<DB::GRPCReceiverContext>&, DB::ExchangeRecvRequest const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) [tiflash+192185111]\n                
\t/usr/local/bin/../include/c++/v1/__memory/allocator_traits.h:290\n       0xb74f045\tDB::ExchangeReceiverBase<DB::GRPCReceiverContext>::reactor(std::__1::vector<DB::ExchangeRecvRequest, std::__1::allocator<DB::ExchangeRecvRequest> > const&) [tiflash+192213061]\n                \tdbms/src/Flash/Mpp/ExchangeReceiver.cpp:370\n       0x59565ce\tauto std::__1::thread DB::ThreadFactory::newThread<std::__1::function<void ()> >(bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::function<void ()>&&)::'lambda'(auto&&...)::operator()<>(auto&&...) const [tiflash+93677006]\n                \tdbms/src/Common/ThreadFactory.h:44\n       0x59563a1\tvoid* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, std::__1::thread DB::ThreadFactory::newThread<std::__1::function<void ()> >(bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::function<void ()>&&)::'lambda'(auto&&...)> >(void*) [tiflash+93676449]\n                \t/usr/local/bin/../include/c++/v1/thread:291\n  0x7f9fa30c8ea5\tstart_thread [libpthread.so.0+32421]"] [thread_id=29932458]
[2022/03/30 14:44:44.770 +08:00] [ERROR] [<unknown>] [BaseDaemon:########################################] [thread_id=29932458]
[2022/03/30 14:44:44.771 +08:00] [ERROR] [<unknown>] ["BaseDaemon:(from thread 29934144) Received signal Segmentation fault (11)."] [thread_id=29932458]
[2022/03/30 14:44:44.771 +08:00] [ERROR] [<unknown>] ["BaseDaemon:Address: NULL pointer."] [thread_id=29932458]
[2022/03/30 14:44:44.771 +08:00] [ERROR] [<unknown>] ["BaseDaemon:Access: write."] [thread_id=29932458]
[2022/03/30 14:44:44.771 +08:00] [ERROR] [<unknown>] ["BaseDaemon:Address not mapped to object."] [thread_id=29932458]
(The backtrace reported from thread 29934144 is identical to the one from thread 29934141 above.)

4. What is your TiFlash version? (Required)

Maybe last Friday's nightly build.

@fzhedu fzhedu added the type/bug The issue is confirmed as a bug. label Mar 31, 2022
@fuzhe1989
Contributor

Expected to have been fixed by #4485.

@zanmato1984
Contributor

This issue can only be observed when all of the following conditions are met:

  1. Using TiDB 6.0;
  2. The user manually changed the enable-async setting in the configuration file;
  3. The user didn't restart the TiFlash server.

Since point 2 is very unlikely to happen, I'm changing the severity to minor.

Also, 6.1 has fixed this issue in #4485. Closing.
