Fix broken url to "Better Bloom Filter" paper #1
Closed
Merged manually (see cloudera/kudu#8). Thanks!
asfgit pushed a commit that referenced this pull request on May 19, 2017

Currently, Rowset::EstimateOnDiskSize() serves two purposes:
1. An estimate of the total size of the rowset, which is exposed when rolled into the tablet's on-disk size metric.
2. An estimate of the benefit of compaction.

These two purposes conflicted-- the compaction size counts only base data and redo deltas that are relevant for compaction, so e.g. undo deltas are omitted from the estimate.

This patch separates these two purposes. EstimateOnDiskSize() remains the method for purpose #1, while a new method EstimateCompactionSize() is introduced for purpose #2. EstimateOnDiskSize now includes undo deltas, and so is more accurate than before (however, there's more work to do: see KUDU-1755).

There should be no changes to compaction policy as a result of this patch.

Change-Id: I59001adadb9a768a464e7b2cf0f0a5df0ef5393a
Reviewed-on: http://gerrit.cloudera.org:8080/6850
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <[email protected]>
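To make the split between the two estimates concrete, here is a minimal sketch. It is not the Kudu implementation: the `RowsetSizes` struct, its field names, and the free-function form are assumptions made for illustration, and it treats all base data and redo deltas as relevant for compaction, which the real code may not.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-rowset byte counts. The struct and its field names are
// illustrative only and do not come from the Kudu source.
struct RowsetSizes {
  uint64_t base_data_bytes;
  uint64_t redo_delta_bytes;
  uint64_t undo_delta_bytes;
};

// Purpose #1: total footprint of the rowset, suitable for rolling up into an
// on-disk size metric. Undo deltas are included.
inline uint64_t EstimateOnDiskSize(const RowsetSizes& s) {
  return s.base_data_bytes + s.redo_delta_bytes + s.undo_delta_bytes;
}

// Purpose #2: size of the data a compaction would consider. Undo deltas are
// left out. Simplification: every base and redo byte counts as relevant.
inline uint64_t EstimateCompactionSize(const RowsetSizes& s) {
  return s.base_data_bytes + s.redo_delta_bytes;
}
```

With separate functions, adding undo deltas to the metric cannot change the number the compaction policy sees, which matches the commit's claim that compaction policy is unaffected.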
asfgit pushed a commit that referenced this pull request on May 24, 2017

…n config

This patch enhances ksck to gather consensus info from every tablet. It compares this info with master and outputs the master's config and every conflicting config, if there are any conflicts. To do this efficiently it reimplements the GetAllConsensusState RPC so that it gathers info about every replica's consensus state.

This will catch at least the two problems identified in KUDU-1860:
1. The leader has a pending config to remove a tablet, but it is not committed so the master does not see this config. This can hide an unhealthy tablet if, e.g., one pending config member is down and the pending-to-be-kicked-out member is up, so 1/2 replicas are alive in the leader's active config but the master thinks 2/3 are alive.
2. No replica is leader but the master believes there is a leader because its cache is old and hasn't been updated.

Sample output showing #1: https://gist.github.com/wdberkeley/d2606698e4f2e8ca3ef70d4dcef7ba9a

Change-Id: I16e4de09821b372c3773b4ade3fd9e37ab818808
Reviewed-on: http://gerrit.cloudera.org:8080/6772
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy <[email protected]>
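The comparison described above can be sketched roughly as follows. This is an illustration under stated assumptions, not ksck's code: `ConsensusView` and `FindConflicts` are hypothetical names, and a real consensus state carries more than a leader UUID and a voter set.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified view of a tablet's consensus config as seen by
// the master or by one replica.
struct ConsensusView {
  std::string leader_uuid;        // empty when no leader is known
  std::set<std::string> voters;   // UUIDs of the voting replicas

  bool operator==(const ConsensusView& other) const {
    return leader_uuid == other.leader_uuid && voters == other.voters;
  }
};

// Returns the UUIDs of the replicas whose view differs from the master's,
// in sorted order (std::map iterates by key).
std::vector<std::string> FindConflicts(
    const ConsensusView& master_view,
    const std::map<std::string, ConsensusView>& replica_views) {
  std::vector<std::string> conflicts;
  for (const auto& entry : replica_views) {
    if (!(entry.second == master_view)) {
      conflicts.push_back(entry.first);
    }
  }
  return conflicts;
}
```

In problem 1, the leader's active config has one voter fewer than the master's cached config, so the leader's UUID would be reported as a conflict. In problem 2, every replica reports an empty leader while the master still names one, so all replicas would be reported.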
asfgit pushed a commit that referenced this pull request on Jun 1, 2017

TSAN reports warnings on races on writing/reading the QueueState::last_idx_appended_to_leader field:

CatalogManagerTskITest.LeadershipChangeOnTskGeneration:

  WARNING: ThreadSanitizer: data race (pid=2710)
    Write of size 8 at 0x7d500003e480 by thread T33 (mutexes: write M1863, write M1821):
      #0 kudu::consensus::PeerMessageQueue::UpdateLastIndexAppendedToLeader(long) consensus/consensus_queue.cc:607:44
      #1 kudu::consensus::RaftConsensus::UpdateReplica(kudu::consensus::ConsensusRequestPB const*, kudu::consensus::ConsensusResponsePB*) consensus/raft_consensus.cc:1155:13
      #2 kudu::consensus::RaftConsensus::Update(kudu::consensus::ConsensusRequestPB const*, kudu::consensus::ConsensusResponsePB*) consensus/raft_consensus.cc:752:14
      #3 kudu::tserver::ConsensusServiceImpl::UpdateConsensus(kudu::consensus::ConsensusRequestPB const*, kudu::consensus::ConsensusResponsePB*, kudu::rpc::RpcContext*) tserver/tablet_service.cc:861:25
      #4 kudu::consensus::ConsensusServiceIf::ConsensusServiceIf(scoped_refptr<kudu::MetricEntity> const&, scoped_refptr<kudu::rpc::ResultTracker> const&)::$_1::operator()(google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*) const consensus/consensus.service.cc:100:13
      ... skipped ...

    Previous read of size 8 at 0x7d500003e480 by thread T79 (mutexes: write M1822):
      #0 kudu::consensus::PeerMessageQueue::UpdateLagMetrics() consensus/consensus_queue.cc:602:20
      #1 kudu::consensus::PeerMessageQueue::UpdateMetrics() consensus/consensus_queue.cc:875:3
      #2 kudu::consensus::PeerMessageQueue::ResponseFromPeer(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, kudu::consensus::ConsensusResponsePB const&, bool*) consensus/consensus_queue.cc:828:5
      ... skipped ...

For more details, see http://dist-test.cloudera.org:8080/diagnose?key=0784c33a-41b5-11e7-9f6b-0242ac11000f

This patch fixes the above race.

Also, it contains some extras:
* The PeerMessageQueue::UpdateLagMetrics() method has been renamed to PeerMessageQueue::UpdateLagMetricsUnlocked() and made private.
* The PeerMessageQueue::UpdateMetrics() method has been renamed to PeerMessageQueue::UpdateMetricsUnlocked().
* Added DCHECK(queue_lock_.is_locked()) to all PeerMessageQueue::XxxUnlocked() methods.

Change-Id: I25feb676619cc1f3a94fb8e631bffd8ca02ead49
Reviewed-on: http://gerrit.cloudera.org:8080/7032
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <[email protected]>
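The "Unlocked" naming convention with a lock-held check can be sketched as below. This is a standalone illustration, not the PeerMessageQueue code: `CheckedLock`, `LagTracker`, and `SetLastAppendedLocally` are hypothetical, plain `assert` stands in for DCHECK, and `std::mutex` stands in for Kudu's own lock type.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical lock wrapper that records whether it is currently held, so
// that *Unlocked() helpers can assert their precondition. Like the
// DCHECK(queue_lock_.is_locked()) in the commit, it only checks that some
// thread holds the lock, not that the calling thread does.
class CheckedLock {
 public:
  void lock() {
    mu_.lock();
    held_.store(true, std::memory_order_relaxed);
  }
  void unlock() {
    held_.store(false, std::memory_order_relaxed);
    mu_.unlock();
  }
  bool is_locked() const { return held_.load(std::memory_order_relaxed); }

 private:
  std::mutex mu_;
  std::atomic<bool> held_{false};
};

// Hypothetical stand-in for the queue state involved in the race.
class LagTracker {
 public:
  // Public entry points take the lock, then call the Unlocked helper.
  void UpdateLastIndexAppendedToLeader(int64_t idx) {
    std::lock_guard<CheckedLock> guard(lock_);
    last_idx_appended_to_leader_ = idx;
    UpdateLagMetricsUnlocked();
  }
  void SetLastAppendedLocally(int64_t idx) {
    std::lock_guard<CheckedLock> guard(lock_);
    last_appended_locally_ = idx;
    UpdateLagMetricsUnlocked();
  }
  int64_t lag() const {
    std::lock_guard<CheckedLock> guard(lock_);
    return lag_;
  }

 private:
  // "Unlocked" means the caller must already hold lock_.
  void UpdateLagMetricsUnlocked() {
    assert(lock_.is_locked());
    lag_ = last_idx_appended_to_leader_ - last_appended_locally_;
  }

  mutable CheckedLock lock_;
  int64_t last_idx_appended_to_leader_ = 0;
  int64_t last_appended_locally_ = 0;
  int64_t lag_ = 0;
};
```

Because the helper is private and asserts the lock is held, both the read and the write of `last_idx_appended_to_leader_` happen under the same lock, which is the property the TSAN report showed was missing.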
asfgit pushed a commit that referenced this pull request on Jun 30, 2017

This access produced the occasional race, which I've attached to the bottom of the commit message.

I also revoked friendship for RaftConsensus. The friend declaration existed solely to give RaftConsensus access to GetSerialTimestamp; making the function public seems like the lesser of two evils to me.

WARNING: ThreadSanitizer: data race (pid=15727) Read of size 8 at 0x7b1c00013818 by thread T126 (mutexes: write M2582, write M2285): #0 kudu::operator<(kudu::Timestamp const&, kudu::Timestamp const&) /data/jenkins-workspace/kudu-workspace/src/kudu/common/timestamp.h:105:22 (libtserver.so+0xe1768) #1 kudu::operator>(kudu::Timestamp const&, kudu::Timestamp const&) /data/jenkins-workspace/kudu-workspace/src/kudu/common/timestamp.h:109:14 (libtserver.so+0xd9170) #2 kudu::operator<=(kudu::Timestamp const&, kudu::Timestamp const&) /data/jenkins-workspace/kudu-workspace/src/kudu/common/timestamp.h:113:16 (libtablet.so+0x19e990) #3 kudu::consensus::TimeManager::GetSafeTimeUnlocked() /data/jenkins-workspace/kudu-workspace/src/kudu/consensus/time_manager.cc:291:11 (libconsensus.so+0xbddad) #4 kudu::consensus::TimeManager::GetSafeTime() /data/jenkins-workspace/kudu-workspace/src/kudu/consensus/time_manager.cc:267:10 (libconsensus.so+0xbdeb9) #5 kudu::consensus::PeerMessageQueue::RequestForPeer(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, kudu::consensus::ConsensusRequestPB*, std::__1::vector<scoped_refptr<kudu::consensus::RefCountedReplicate>, std::__1::allocator<scoped_refptr<kudu::consensus::RefCountedReplicate> > >*, bool*) /data/jenkins-workspace/kudu-workspace/src/kudu/consensus/consensus_queue.cc:469:50 (libconsensus.so+0x70a3a) #6 kudu::consensus::Peer::SendNextRequest(bool) /data/jenkins-workspace/kudu-workspace/src/kudu/consensus/consensus_peers.cc:177:22 (libconsensus.so+0x649c2) #7 kudu::consensus::Peer::SignalRequest(bool)::$_0::operator()() const /data/jenkins-workspace/kudu-workspace/src/kudu/consensus/consensus_peers.cc:134:3
(libconsensus.so+0x67a02) #8 boost::detail::function::void_function_obj_invoker0<kudu::consensus::Peer::SignalRequest(bool)::$_0, void>::invoke(boost::detail::function::function_buffer&) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libconsensus.so+0x67809) #9 boost::function0<void>::operator()() const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libkrpc.so+0xb0111) #10 kudu::FunctionRunnable::Run() /data/jenkins-workspace/kudu-workspace/src/kudu/util/threadpool.cc:56:5 (libkudu_util.so+0x1c4b1d) #11 kudu::ThreadPool::DispatchThread(bool) /data/jenkins-workspace/kudu-workspace/src/kudu/util/threadpool.cc:621:22 (libkudu_util.so+0x1c1e04) #12 boost::_mfi::mf1<void, kudu::ThreadPool, bool>::operator()(kudu::ThreadPool*, bool) const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:165:29 (libkudu_util.so+0x1ca38e) #13 void boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> >::operator()<boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf1<void, kudu::ThreadPool, bool>&, boost::_bi::list0&, int) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:319:9 (libkudu_util.so+0x1ca2cd) #14 boost::_bi::bind_t<void, boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> > >::operator()() /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16 (libkudu_util.so+0x1ca233) #15 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> > >, void>::invoke(boost::detail::function::function_buffer&) 
/data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libkudu_util.so+0x1c9fd1) #16 boost::function0<void>::operator()() const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libkrpc.so+0xb0111) #17 kudu::Thread::SuperviseThread(void*) /data/jenkins-workspace/kudu-workspace/src/kudu/util/thread.cc:591:3 (libkudu_util.so+0x1b975e) Previous write of size 8 at 0x7b1c00013818 by thread T76 (mutexes: write M2279): #0 kudu::consensus::TimeManager::GetSerialTimestamp() /data/jenkins-workspace/kudu-workspace/src/kudu/consensus/time_manager.cc:310:28 (libconsensus.so+0xbcbbb) #1 kudu::consensus::RaftConsensus::UnsafeChangeConfig(kudu::consensus::UnsafeChangeConfigRequestPB const&, kudu::tserver::TabletServerErrorPB_Code*) /data/jenkins-workspace/kudu-workspace/src/kudu/consensus/raft_consensus.cc:1668:36 (libconsensus.so+0xacabd) #2 kudu::tserver::ConsensusServiceImpl::UnsafeChangeConfig(kudu::consensus::UnsafeChangeConfigRequestPB const*, kudu::consensus::UnsafeChangeConfigResponsePB*, kudu::rpc::RpcContext*) /data/jenkins-workspace/kudu-workspace/src/kudu/tserver/tablet_service.cc:941:25 (libtserver.so+0xc98cf) #3 kudu::consensus::ConsensusServiceIf::ConsensusServiceIf(scoped_refptr<kudu::MetricEntity> const&, scoped_refptr<kudu::rpc::ResultTracker> const&)::$_7::operator()(google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*) const /data/jenkins-workspace/kudu-workspace/build/tsan/src/kudu/consensus/consensus.service.cc:160:13 (libconsensus_proto.so+0x825b4) #4 _ZNSt3__18__invokeIRZN4kudu9consensus18ConsensusServiceIfC1ERK13scoped_refptrINS1_12MetricEntityEERKS4_INS1_3rpc13ResultTrackerEEE3$_7JPKN6google8protobuf7MessageEPSI_PNS9_10RpcContextEEEEDTclclsr3std3__1E7forwardIT_Efp_Espclsr3std3__1E7forwardIT0_Efp0_EEEOSO_DpOSP_ 
/data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/c++/v1/type_traits:4301:1 (libconsensus_proto.so+0x82541) #5 _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN4kudu9consensus18ConsensusServiceIfC1ERK13scoped_refptrINS3_12MetricEntityEERKS6_INS3_3rpc13ResultTrackerEEE3$_7PKN6google8protobuf7MessageEPSK_PNSB_10RpcContextEEEEvDpOT_ /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/c++/v1/__functional_base:359 (libconsensus_proto.so+0x82541) #6 std::__1::__function::__func<kudu::consensus::ConsensusServiceIf::ConsensusServiceIf(scoped_refptr<kudu::MetricEntity> const&, scoped_refptr<kudu::rpc::ResultTracker> const&)::$_7, std::__1::allocator<kudu::consensus::ConsensusServiceIf::ConsensusServiceIf(scoped_refptr<kudu::MetricEntity> const&, scoped_refptr<kudu::rpc::ResultTracker> const&)::$_7>, void ()(google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*)>::operator()(google::protobuf::Message const*&&, google::protobuf::Message*&&, kudu::rpc::RpcContext*&&) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/c++/v1/functional:1552:12 (libconsensus_proto.so+0x82454) #7 std::__1::function<void ()(google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*)>::operator()(google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*) const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/c++/v1/functional:1914:12 (libkrpc.so+0xe72e9) #8 kudu::rpc::GeneratedServiceIf::Handle(kudu::rpc::InboundCall*) /data/jenkins-workspace/kudu-workspace/src/kudu/rpc/service_if.cc:134:3 (libkrpc.so+0xe6bf4) #9 kudu::rpc::ServicePool::RunThread() /data/jenkins-workspace/kudu-workspace/src/kudu/rpc/service_pool.cc:210:15 (libkrpc.so+0xe82cd) #10 boost::_mfi::mf0<void, kudu::rpc::ServicePool>::operator()(kudu::rpc::ServicePool*) const 
/data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:49:29 (libkrpc.so+0xea2c6) #11 void boost::_bi::list1<boost::_bi::value<kudu::rpc::ServicePool*> >::operator()<boost::_mfi::mf0<void, kudu::rpc::ServicePool>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void, kudu::rpc::ServicePool>&, boost::_bi::list0&, int) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:259:9 (libkrpc.so+0xea21a) #12 boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::rpc::ServicePool>, boost::_bi::list1<boost::_bi::value<kudu::rpc::ServicePool*> > >::operator()() /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16 (libkrpc.so+0xea1a3) #13 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::rpc::ServicePool>, boost::_bi::list1<boost::_bi::value<kudu::rpc::ServicePool*> > >, void>::invoke(boost::detail::function::function_buffer&) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libkrpc.so+0xe9fa9) #14 boost::function0<void>::operator()() const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libkrpc.so+0xb0111) #15 kudu::Thread::SuperviseThread(void*) /data/jenkins-workspace/kudu-workspace/src/kudu/util/thread.cc:591:3 (libkudu_util.so+0x1b975e) Location is heap block of size 104 at 0x7b1c000137f0 allocated by thread T123: #0 operator new(unsigned long) /data/jenkins-workspace/kudu-workspace/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_new_delete.cc:41 (kudu-tserver+0x4c0d63) #1 kudu::tablet::TabletReplica::Init(std::__1::shared_ptr<kudu::tablet::Tablet> const&, scoped_refptr<kudu::server::Clock> const&, std::__1::shared_ptr<kudu::rpc::Messenger> const&, scoped_refptr<kudu::rpc::ResultTracker> const&, 
scoped_refptr<kudu::log::Log> const&, scoped_refptr<kudu::MetricEntity> const&) /data/jenkins-workspace/kudu-workspace/src/kudu/tablet/tablet_replica.cc:162:45 (libtablet.so+0x139f6a) #2 kudu::tserver::TSTabletManager::OpenTablet(scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&) /data/jenkins-workspace/kudu-workspace/src/kudu/tserver/ts_tablet_manager.cc:770:19 (libtserver.so+0xe6991) #3 boost::_mfi::mf2<void, kudu::tserver::TSTabletManager, scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&>::operator()(kudu::tserver::TSTabletManager*, scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&) const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:280:29 (libtserver.so+0xf1bc7) #4 void boost::_bi::list3<boost::_bi::value<kudu::tserver::TSTabletManager*>, boost::_bi::value<scoped_refptr<kudu::tablet::TabletMetadata> >, boost::_bi::value<scoped_refptr<kudu::tserver::TransitionInProgressDeleter> > >::operator()<boost::_mfi::mf2<void, kudu::tserver::TSTabletManager, scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf2<void, kudu::tserver::TSTabletManager, scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&>&, boost::_bi::list0&, int) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:398:9 (libtserver.so+0xf1b06) #5 boost::_bi::bind_t<void, boost::_mfi::mf2<void, kudu::tserver::TSTabletManager, scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&>, boost::_bi::list3<boost::_bi::value<kudu::tserver::TSTabletManager*>, 
boost::_bi::value<scoped_refptr<kudu::tablet::TabletMetadata> >, boost::_bi::value<scoped_refptr<kudu::tserver::TransitionInProgressDeleter> > > >::operator()() /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16 (libtserver.so+0xf1a63) #6 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf2<void, kudu::tserver::TSTabletManager, scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&>, boost::_bi::list3<boost::_bi::value<kudu::tserver::TSTabletManager*>, boost::_bi::value<scoped_refptr<kudu::tablet::TabletMetadata> >, boost::_bi::value<scoped_refptr<kudu::tserver::TransitionInProgressDeleter> > > >, void>::invoke(boost::detail::function::function_buffer&) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libtserver.so+0xf17d1) #7 boost::function0<void>::operator()() const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libkrpc.so+0xb0111) #8 kudu::FunctionRunnable::Run() /data/jenkins-workspace/kudu-workspace/src/kudu/util/threadpool.cc:56:5 (libkudu_util.so+0x1c4b1d) #9 kudu::ThreadPool::DispatchThread(bool) /data/jenkins-workspace/kudu-workspace/src/kudu/util/threadpool.cc:621:22 (libkudu_util.so+0x1c1e04) #10 boost::_mfi::mf1<void, kudu::ThreadPool, bool>::operator()(kudu::ThreadPool*, bool) const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:165:29 (libkudu_util.so+0x1ca38e) #11 void boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> >::operator()<boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf1<void, kudu::ThreadPool, bool>&, boost::_bi::list0&, int) 
/data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:319:9 (libkudu_util.so+0x1ca2cd) #12 boost::_bi::bind_t<void, boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> > >::operator()() /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16 (libkudu_util.so+0x1ca233) #13 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> > >, void>::invoke(boost::detail::function::function_buffer&) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libkudu_util.so+0x1c9fd1) #14 boost::function0<void>::operator()() const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libkrpc.so+0xb0111) #15 kudu::Thread::SuperviseThread(void*) /data/jenkins-workspace/kudu-workspace/src/kudu/util/thread.cc:591:3 (libkudu_util.so+0x1b975e) Mutex M2582 (0x7b54000b0130) created at: #0 __tsan_atomic32_compare_exchange_strong /data/jenkins-workspace/kudu-workspace/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cc:756 (kudu-tserver+0x47d998) #1 base::subtle::Acquire_CompareAndSwap(int volatile*, int, int) /data/jenkins-workspace/kudu-workspace/src/kudu/gutil/atomicops-internals-tsan.h:83:3 (libtserver.so+0x91fb7) #2 base::SpinLock::Lock() /data/jenkins-workspace/kudu-workspace/src/kudu/gutil/spinlock.h:73:9 (libtserver.so+0x91f20) #3 kudu::simple_spinlock::lock() /data/jenkins-workspace/kudu-workspace/src/kudu/util/locks.h:45:8 (libtserver.so+0x91ed9) #4 std::__1::lock_guard<kudu::simple_spinlock>::lock_guard(kudu::simple_spinlock&) 
/data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/c++/v1/__mutex_base:108:27 (libconsensus.so+0x6427c) #5 kudu::consensus::Peer::Init() /data/jenkins-workspace/kudu-workspace/src/kudu/consensus/consensus_peers.cc:119 (libconsensus.so+0x6427c) #6 kudu::consensus::Peer::NewRemotePeer(kudu::consensus::RaftPeerPB const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, kudu::consensus::PeerMessageQueue*, kudu::ThreadPool*, gscoped_ptr<kudu::consensus::PeerProxy, kudu::DefaultDeleter<kudu::consensus::PeerProxy> >, std::__1::shared_ptr<kudu::consensus::Peer>*) /data/jenkins-workspace/kudu-workspace/src/kudu/consensus/consensus_peers.cc:97:3 (libconsensus.so+0x64101) #7 kudu::consensus::PeerManager::UpdateRaftConfig(kudu::consensus::RaftConfigPB const&) /data/jenkins-workspace/kudu-workspace/src/kudu/consensus/peer_manager.cc:73:5 (libconsensus.so+0x908e1) #8 kudu::consensus::RaftConsensus::RefreshConsensusQueueAndPeersUnlocked() /data/jenkins-workspace/kudu-workspace/src/kudu/consensus/raft_consensus.cc:2044:3 (libconsensus.so+0xa254f) #9 kudu::consensus::RaftConsensus::AddPendingOperationUnlocked(scoped_refptr<kudu::consensus::ConsensusRound> const&) /data/jenkins-workspace/kudu-workspace/src/kudu/consensus/raft_consensus.cc:668:9 (libconsensus.so+0xa3787) #10 kudu::consensus::RaftConsensus::AppendNewRoundToQueueUnlocked(scoped_refptr<kudu::consensus::ConsensusRound> const&) /data/jenkins-workspace/kudu-workspace/src/kudu/consensus/raft_consensus.cc:626:3 (libconsensus.so+0xa2d76) #11 kudu::consensus::RaftConsensus::ReplicateConfigChangeUnlocked(kudu::consensus::RaftConfigPB const&, kudu::consensus::RaftConfigPB const&, kudu::Callback<void ()(kudu::Status const&)> const&) /data/jenkins-workspace/kudu-workspace/src/kudu/consensus/raft_consensus.cc:2025:3 (libconsensus.so+0xac5f3) #12 
kudu::consensus::RaftConsensus::ChangeConfig(kudu::consensus::ChangeConfigRequestPB const&, kudu::Callback<void ()(kudu::Status const&)> const&, boost::optional<kudu::tserver::TabletServerErrorPB_Code>*) /data/jenkins-workspace/kudu-workspace/src/kudu/consensus/raft_consensus.cc:1621:5 (libconsensus.so+0xa5c0b) #13 kudu::tserver::ConsensusServiceImpl::ChangeConfig(kudu::consensus::ChangeConfigRequestPB const*, kudu::consensus::ChangeConfigResponsePB*, kudu::rpc::RpcContext*) /data/jenkins-workspace/kudu-workspace/src/kudu/tserver/tablet_service.cc:917:25 (libtserver.so+0xc8d55) #14 kudu::consensus::ConsensusServiceIf::ConsensusServiceIf(scoped_refptr<kudu::MetricEntity> const&, scoped_refptr<kudu::rpc::ResultTracker> const&)::$_5::operator()(google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*) const /data/jenkins-workspace/kudu-workspace/build/tsan/src/kudu/consensus/consensus.service.cc:140:13 (libconsensus_proto.so+0x81e94) #15 _ZNSt3__18__invokeIRZN4kudu9consensus18ConsensusServiceIfC1ERK13scoped_refptrINS1_12MetricEntityEERKS4_INS1_3rpc13ResultTrackerEEE3$_5JPKN6google8protobuf7MessageEPSI_PNS9_10RpcContextEEEEDTclclsr3std3__1E7forwardIT_Efp_Espclsr3std3__1E7forwardIT0_Efp0_EEEOSO_DpOSP_ /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/c++/v1/type_traits:4301:1 (libconsensus_proto.so+0x81e21) #16 _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN4kudu9consensus18ConsensusServiceIfC1ERK13scoped_refptrINS3_12MetricEntityEERKS6_INS3_3rpc13ResultTrackerEEE3$_5PKN6google8protobuf7MessageEPSK_PNSB_10RpcContextEEEEvDpOT_ /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/c++/v1/__functional_base:359 (libconsensus_proto.so+0x81e21) #17 std::__1::__function::__func<kudu::consensus::ConsensusServiceIf::ConsensusServiceIf(scoped_refptr<kudu::MetricEntity> const&, scoped_refptr<kudu::rpc::ResultTracker> const&)::$_5, 
std::__1::allocator<kudu::consensus::ConsensusServiceIf::ConsensusServiceIf(scoped_refptr<kudu::MetricEntity> const&, scoped_refptr<kudu::rpc::ResultTracker> const&)::$_5>, void ()(google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*)>::operator()(google::protobuf::Message const*&&, google::protobuf::Message*&&, kudu::rpc::RpcContext*&&) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/c++/v1/functional:1552:12 (libconsensus_proto.so+0x81d34) #18 std::__1::function<void ()(google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*)>::operator()(google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*) const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/c++/v1/functional:1914:12 (libkrpc.so+0xe72e9) #19 kudu::rpc::GeneratedServiceIf::Handle(kudu::rpc::InboundCall*) /data/jenkins-workspace/kudu-workspace/src/kudu/rpc/service_if.cc:134:3 (libkrpc.so+0xe6bf4) #20 kudu::rpc::ServicePool::RunThread() /data/jenkins-workspace/kudu-workspace/src/kudu/rpc/service_pool.cc:210:15 (libkrpc.so+0xe82cd) #21 boost::_mfi::mf0<void, kudu::rpc::ServicePool>::operator()(kudu::rpc::ServicePool*) const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:49:29 (libkrpc.so+0xea2c6) #22 void boost::_bi::list1<boost::_bi::value<kudu::rpc::ServicePool*> >::operator()<boost::_mfi::mf0<void, kudu::rpc::ServicePool>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void, kudu::rpc::ServicePool>&, boost::_bi::list0&, int) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:259:9 (libkrpc.so+0xea21a) #23 boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::rpc::ServicePool>, boost::_bi::list1<boost::_bi::value<kudu::rpc::ServicePool*> > >::operator()() /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16 
(libkrpc.so+0xea1a3) #24 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::rpc::ServicePool>, boost::_bi::list1<boost::_bi::value<kudu::rpc::ServicePool*> > >, void>::invoke(boost::detail::function::function_buffer&) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libkrpc.so+0xe9fa9) #25 boost::function0<void>::operator()() const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libkrpc.so+0xb0111) #26 kudu::Thread::SuperviseThread(void*) /data/jenkins-workspace/kudu-workspace/src/kudu/util/thread.cc:591:3 (libkudu_util.so+0x1b975e) Mutex M2285 (0x7b1c000137f8) created at: #0 __tsan_atomic32_compare_exchange_strong /data/jenkins-workspace/kudu-workspace/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cc:756 (kudu-tserver+0x47d998) #1 base::subtle::Acquire_CompareAndSwap(int volatile*, int, int) /data/jenkins-workspace/kudu-workspace/src/kudu/gutil/atomicops-internals-tsan.h:83:3 (libtserver.so+0x91fb7) #2 base::SpinLock::Lock() /data/jenkins-workspace/kudu-workspace/src/kudu/gutil/spinlock.h:73:9 (libtserver.so+0x91f20) #3 kudu::simple_spinlock::lock() /data/jenkins-workspace/kudu-workspace/src/kudu/util/locks.h:45:8 (libtserver.so+0x91ed9) #4 std::__1::lock_guard<kudu::simple_spinlock>::lock_guard(kudu::simple_spinlock&) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/c++/v1/__mutex_base:108:27 (libconsensus.so+0xbc8ee) #5 kudu::consensus::TimeManager::SetNonLeaderMode() /data/jenkins-workspace/kudu-workspace/src/kudu/consensus/time_manager.cc:78 (libconsensus.so+0xbc8ee) #6 kudu::consensus::PeerMessageQueue::SetNonLeaderMode() /data/jenkins-workspace/kudu-workspace/src/kudu/consensus/consensus_queue.cc:186:18 (libconsensus.so+0x6dbe9) #7 kudu::consensus::RaftConsensus::BecomeReplicaUnlocked() 
/data/jenkins-workspace/kudu-workspace/src/kudu/consensus/raft_consensus.cc:593:11 (libconsensus.so+0x9e9fa) #8 kudu::consensus::RaftConsensus::Start(kudu::consensus::ConsensusBootstrapInfo const&) /data/jenkins-workspace/kudu-workspace/src/kudu/consensus/raft_consensus.cc:340:5 (libconsensus.so+0x9d41f) #9 kudu::tablet::TabletReplica::Start(kudu::consensus::ConsensusBootstrapInfo const&) /data/jenkins-workspace/kudu-workspace/src/kudu/tablet/tablet_replica.cc:196:3 (libtablet.so+0x13a974) #10 kudu::tserver::TSTabletManager::OpenTablet(scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&) /data/jenkins-workspace/kudu-workspace/src/kudu/tserver/ts_tablet_manager.cc:785:18 (libtserver.so+0xe6a4b) #11 boost::_mfi::mf2<void, kudu::tserver::TSTabletManager, scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&>::operator()(kudu::tserver::TSTabletManager*, scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&) const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:280:29 (libtserver.so+0xf1bc7) #12 void boost::_bi::list3<boost::_bi::value<kudu::tserver::TSTabletManager*>, boost::_bi::value<scoped_refptr<kudu::tablet::TabletMetadata> >, boost::_bi::value<scoped_refptr<kudu::tserver::TransitionInProgressDeleter> > >::operator()<boost::_mfi::mf2<void, kudu::tserver::TSTabletManager, scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf2<void, kudu::tserver::TSTabletManager, scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&>&, boost::_bi::list0&, int) 
/data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:398:9 (libtserver.so+0xf1b06) #13 boost::_bi::bind_t<void, boost::_mfi::mf2<void, kudu::tserver::TSTabletManager, scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&>, boost::_bi::list3<boost::_bi::value<kudu::tserver::TSTabletManager*>, boost::_bi::value<scoped_refptr<kudu::tablet::TabletMetadata> >, boost::_bi::value<scoped_refptr<kudu::tserver::TransitionInProgressDeleter> > > >::operator()() /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16 (libtserver.so+0xf1a63) #14 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf2<void, kudu::tserver::TSTabletManager, scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&>, boost::_bi::list3<boost::_bi::value<kudu::tserver::TSTabletManager*>, boost::_bi::value<scoped_refptr<kudu::tablet::TabletMetadata> >, boost::_bi::value<scoped_refptr<kudu::tserver::TransitionInProgressDeleter> > > >, void>::invoke(boost::detail::function::function_buffer&) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libtserver.so+0xf17d1) #15 boost::function0<void>::operator()() const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libkrpc.so+0xb0111) #16 kudu::FunctionRunnable::Run() /data/jenkins-workspace/kudu-workspace/src/kudu/util/threadpool.cc:56:5 (libkudu_util.so+0x1c4b1d) #17 kudu::ThreadPool::DispatchThread(bool) /data/jenkins-workspace/kudu-workspace/src/kudu/util/threadpool.cc:621:22 (libkudu_util.so+0x1c1e04) #18 boost::_mfi::mf1<void, kudu::ThreadPool, bool>::operator()(kudu::ThreadPool*, bool) const 
/data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:165:29 (libkudu_util.so+0x1ca38e) #19 void boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> >::operator()<boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf1<void, kudu::ThreadPool, bool>&, boost::_bi::list0&, int) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:319:9 (libkudu_util.so+0x1ca2cd) #20 boost::_bi::bind_t<void, boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> > >::operator()() /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16 (libkudu_util.so+0x1ca233) #21 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> > >, void>::invoke(boost::detail::function::function_buffer&) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libkudu_util.so+0x1c9fd1) #22 boost::function0<void>::operator()() const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libkrpc.so+0xb0111) #23 kudu::Thread::SuperviseThread(void*) /data/jenkins-workspace/kudu-workspace/src/kudu/util/thread.cc:591:3 (libkudu_util.so+0x1b975e) Mutex M2279 (0x7b540006fe44) created at: #0 __tsan_atomic32_compare_exchange_strong /data/jenkins-workspace/kudu-workspace/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cc:756 (kudu-tserver+0x47d998) #1 base::subtle::Acquire_CompareAndSwap(int volatile*, int, int) /data/jenkins-workspace/kudu-workspace/src/kudu/gutil/atomicops-internals-tsan.h:83:3 (libtserver.so+0x91fb7) #2 
base::SpinLock::Lock() /data/jenkins-workspace/kudu-workspace/src/kudu/gutil/spinlock.h:73:9 (libtserver.so+0x91f20) #3 kudu::simple_spinlock::lock() /data/jenkins-workspace/kudu-workspace/src/kudu/util/locks.h:45:8 (libtserver.so+0x91ed9) #4 std::__1::lock_guard<kudu::simple_spinlock>::lock_guard(kudu::simple_spinlock&) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/c++/v1/__mutex_base:108:27 (libconsensus.so+0x9cf38) #5 kudu::consensus::RaftConsensus::Start(kudu::consensus::ConsensusBootstrapInfo const&) /data/jenkins-workspace/kudu-workspace/src/kudu/consensus/raft_consensus.cc:282 (libconsensus.so+0x9cf38) #6 kudu::tablet::TabletReplica::Start(kudu::consensus::ConsensusBootstrapInfo const&) /data/jenkins-workspace/kudu-workspace/src/kudu/tablet/tablet_replica.cc:196:3 (libtablet.so+0x13a974) #7 kudu::tserver::TSTabletManager::OpenTablet(scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&) /data/jenkins-workspace/kudu-workspace/src/kudu/tserver/ts_tablet_manager.cc:785:18 (libtserver.so+0xe6a4b) #8 boost::_mfi::mf2<void, kudu::tserver::TSTabletManager, scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&>::operator()(kudu::tserver::TSTabletManager*, scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&) const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:280:29 (libtserver.so+0xf1bc7) #9 void boost::_bi::list3<boost::_bi::value<kudu::tserver::TSTabletManager*>, boost::_bi::value<scoped_refptr<kudu::tablet::TabletMetadata> >, boost::_bi::value<scoped_refptr<kudu::tserver::TransitionInProgressDeleter> > >::operator()<boost::_mfi::mf2<void, kudu::tserver::TSTabletManager, scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&>, 
boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf2<void, kudu::tserver::TSTabletManager, scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&>&, boost::_bi::list0&, int) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:398:9 (libtserver.so+0xf1b06) #10 boost::_bi::bind_t<void, boost::_mfi::mf2<void, kudu::tserver::TSTabletManager, scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&>, boost::_bi::list3<boost::_bi::value<kudu::tserver::TSTabletManager*>, boost::_bi::value<scoped_refptr<kudu::tablet::TabletMetadata> >, boost::_bi::value<scoped_refptr<kudu::tserver::TransitionInProgressDeleter> > > >::operator()() /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16 (libtserver.so+0xf1a63) #11 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf2<void, kudu::tserver::TSTabletManager, scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&>, boost::_bi::list3<boost::_bi::value<kudu::tserver::TSTabletManager*>, boost::_bi::value<scoped_refptr<kudu::tablet::TabletMetadata> >, boost::_bi::value<scoped_refptr<kudu::tserver::TransitionInProgressDeleter> > > >, void>::invoke(boost::detail::function::function_buffer&) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libtserver.so+0xf17d1) #12 boost::function0<void>::operator()() const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libkrpc.so+0xb0111) #13 kudu::FunctionRunnable::Run() /data/jenkins-workspace/kudu-workspace/src/kudu/util/threadpool.cc:56:5 (libkudu_util.so+0x1c4b1d) #14 kudu::ThreadPool::DispatchThread(bool) 
/data/jenkins-workspace/kudu-workspace/src/kudu/util/threadpool.cc:621:22 (libkudu_util.so+0x1c1e04) #15 boost::_mfi::mf1<void, kudu::ThreadPool, bool>::operator()(kudu::ThreadPool*, bool) const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:165:29 (libkudu_util.so+0x1ca38e) #16 void boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> >::operator()<boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf1<void, kudu::ThreadPool, bool>&, boost::_bi::list0&, int) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:319:9 (libkudu_util.so+0x1ca2cd) #17 boost::_bi::bind_t<void, boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> > >::operator()() /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16 (libkudu_util.so+0x1ca233) #18 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> > >, void>::invoke(boost::detail::function::function_buffer&) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libkudu_util.so+0x1c9fd1) #19 boost::function0<void>::operator()() const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libkrpc.so+0xb0111) #20 kudu::Thread::SuperviseThread(void*) /data/jenkins-workspace/kudu-workspace/src/kudu/util/thread.cc:591:3 (libkudu_util.so+0x1b975e) Thread T126 'cc8abe-raft [wo' (tid=17191, running) created by thread T123 at: #0 pthread_create /data/jenkins-workspace/kudu-workspace/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:897 
(kudu-tserver+0x4549db) #1 kudu::Thread::StartThread(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, boost::function<void ()()> const&, unsigned long, scoped_refptr<kudu::Thread>*) /data/jenkins-workspace/kudu-workspace/src/kudu/util/thread.cc:514:15 (libkudu_util.so+0x1b8f57) #2 kudu::Status kudu::Thread::Create<void (kudu::ThreadPool::*)(bool), kudu::ThreadPool*, bool>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, void (kudu::ThreadPool::* const&)(bool), kudu::ThreadPool* const&, bool const&, scoped_refptr<kudu::Thread>*) /data/jenkins-workspace/kudu-workspace/src/kudu/util/thread.h:164:12 (libkudu_util.so+0x1c38b6) #3 kudu::ThreadPool::CreateThreadUnlocked() /data/jenkins-workspace/kudu-workspace/src/kudu/util/threadpool.cc:683:14 (libkudu_util.so+0x1c1491) #4 kudu::ThreadPool::DoSubmit(std::__1::shared_ptr<kudu::Runnable>, kudu::ThreadPoolToken*) /data/jenkins-workspace/kudu-workspace/src/kudu/util/threadpool.cc:477:21 (libkudu_util.so+0x1bf8e1) #5 kudu::ThreadPool::Submit(std::__1::shared_ptr<kudu::Runnable>) /data/jenkins-workspace/kudu-workspace/src/kudu/util/threadpool.cc:438:10 (libkudu_util.so+0x1c168f) #6 kudu::ThreadPool::SubmitClosure(kudu::Callback<void ()()>) /data/jenkins-workspace/kudu-workspace/src/kudu/util/threadpool.cc:430:10 (libkudu_util.so+0x1c15c9) #7 kudu::consensus::RaftConsensus::MarkDirty(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) /data/jenkins-workspace/kudu-workspace/src/kudu/consensus/raft_consensus.cc:2243:3 (libconsensus.so+0x9f83a) #8 kudu::consensus::RaftConsensus::Start(kudu::consensus::ConsensusBootstrapInfo const&) /data/jenkins-workspace/kudu-workspace/src/kudu/consensus/raft_consensus.cc:351:3 
(libconsensus.so+0x9d5f2) #9 kudu::tablet::TabletReplica::Start(kudu::consensus::ConsensusBootstrapInfo const&) /data/jenkins-workspace/kudu-workspace/src/kudu/tablet/tablet_replica.cc:196:3 (libtablet.so+0x13a974) #10 kudu::tserver::TSTabletManager::OpenTablet(scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&) /data/jenkins-workspace/kudu-workspace/src/kudu/tserver/ts_tablet_manager.cc:785:18 (libtserver.so+0xe6a4b) #11 boost::_mfi::mf2<void, kudu::tserver::TSTabletManager, scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&>::operator()(kudu::tserver::TSTabletManager*, scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&) const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:280:29 (libtserver.so+0xf1bc7) #12 void boost::_bi::list3<boost::_bi::value<kudu::tserver::TSTabletManager*>, boost::_bi::value<scoped_refptr<kudu::tablet::TabletMetadata> >, boost::_bi::value<scoped_refptr<kudu::tserver::TransitionInProgressDeleter> > >::operator()<boost::_mfi::mf2<void, kudu::tserver::TSTabletManager, scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf2<void, kudu::tserver::TSTabletManager, scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&>&, boost::_bi::list0&, int) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:398:9 (libtserver.so+0xf1b06) #13 boost::_bi::bind_t<void, boost::_mfi::mf2<void, kudu::tserver::TSTabletManager, scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&>, 
boost::_bi::list3<boost::_bi::value<kudu::tserver::TSTabletManager*>, boost::_bi::value<scoped_refptr<kudu::tablet::TabletMetadata> >, boost::_bi::value<scoped_refptr<kudu::tserver::TransitionInProgressDeleter> > > >::operator()() /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16 (libtserver.so+0xf1a63) #14 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf2<void, kudu::tserver::TSTabletManager, scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::tserver::TransitionInProgressDeleter> const&>, boost::_bi::list3<boost::_bi::value<kudu::tserver::TSTabletManager*>, boost::_bi::value<scoped_refptr<kudu::tablet::TabletMetadata> >, boost::_bi::value<scoped_refptr<kudu::tserver::TransitionInProgressDeleter> > > >, void>::invoke(boost::detail::function::function_buffer&) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libtserver.so+0xf17d1) #15 boost::function0<void>::operator()() const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libkrpc.so+0xb0111) #16 kudu::FunctionRunnable::Run() /data/jenkins-workspace/kudu-workspace/src/kudu/util/threadpool.cc:56:5 (libkudu_util.so+0x1c4b1d) #17 kudu::ThreadPool::DispatchThread(bool) /data/jenkins-workspace/kudu-workspace/src/kudu/util/threadpool.cc:621:22 (libkudu_util.so+0x1c1e04) #18 boost::_mfi::mf1<void, kudu::ThreadPool, bool>::operator()(kudu::ThreadPool*, bool) const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:165:29 (libkudu_util.so+0x1ca38e) #19 void boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> >::operator()<boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf1<void, kudu::ThreadPool, bool>&, boost::_bi::list0&, int) 
/data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:319:9 (libkudu_util.so+0x1ca2cd) #20 boost::_bi::bind_t<void, boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> > >::operator()() /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16 (libkudu_util.so+0x1ca233) #21 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> > >, void>::invoke(boost::detail::function::function_buffer&) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libkudu_util.so+0x1c9fd1) #22 boost::function0<void>::operator()() const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libkrpc.so+0xb0111) #23 kudu::Thread::SuperviseThread(void*) /data/jenkins-workspace/kudu-workspace/src/kudu/util/thread.cc:591:3 (libkudu_util.so+0x1b975e) Thread T76 'rpc worker-1602' (tid=16028, running) created by main thread at: #0 pthread_create /data/jenkins-workspace/kudu-workspace/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:897 (kudu-tserver+0x4549db) #1 kudu::Thread::StartThread(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, boost::function<void ()()> const&, unsigned long, scoped_refptr<kudu::Thread>*) /data/jenkins-workspace/kudu-workspace/src/kudu/util/thread.cc:514:15 (libkudu_util.so+0x1b8f57) #2 kudu::Status kudu::Thread::Create<void (kudu::rpc::ServicePool::*)(), kudu::rpc::ServicePool*>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, 
std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, void (kudu::rpc::ServicePool::* const&)(), kudu::rpc::ServicePool* const&, scoped_refptr<kudu::Thread>*) /data/jenkins-workspace/kudu-workspace/src/kudu/util/thread.h:158:12 (libkrpc.so+0xe9615) #3 kudu::rpc::ServicePool::Init(int) /data/jenkins-workspace/kudu-workspace/src/kudu/rpc/service_pool.cc:81:5 (libkrpc.so+0xe7ea5) #4 kudu::RpcServer::RegisterService(gscoped_ptr<kudu::rpc::ServiceIf, kudu::DefaultDeleter<kudu::rpc::ServiceIf> >) /data/jenkins-workspace/kudu-workspace/src/kudu/server/rpc_server.cc:144:3 (libserver_process.so+0x561d1) #5 kudu::server::ServerBase::RegisterService(gscoped_ptr<kudu::rpc::ServiceIf, kudu::DefaultDeleter<kudu::rpc::ServiceIf> >) /data/jenkins-workspace/kudu-workspace/src/kudu/server/server_base.cc:364:23 (libserver_process.so+0x5c1cf) #6 kudu::tserver::TabletServer::Start() /data/jenkins-workspace/kudu-workspace/src/kudu/tserver/tablet_server.cc:117:3 (libtserver.so+0xc1441) #7 kudu::tserver::TabletServerMain(int, char**) /data/jenkins-workspace/kudu-workspace/src/kudu/tserver/tablet_server_main.cc:77:3 (kudu-tserver+0x4c3778) #8 main /data/jenkins-workspace/kudu-workspace/src/kudu/tserver/tablet_server_main.cc:91:10 (kudu-tserver+0x4c33ee) Thread T123 'tablet-open [wo' (tid=17160, finished) created by thread T56 at: #0 pthread_create /data/jenkins-workspace/kudu-workspace/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:897 (kudu-tserver+0x4549db) #1 kudu::Thread::StartThread(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, boost::function<void ()()> const&, unsigned long, scoped_refptr<kudu::Thread>*) /data/jenkins-workspace/kudu-workspace/src/kudu/util/thread.cc:514:15 (libkudu_util.so+0x1b8f57) #2 kudu::Status kudu::Thread::Create<void 
(kudu::ThreadPool::*)(bool), kudu::ThreadPool*, bool>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, void (kudu::ThreadPool::* const&)(bool), kudu::ThreadPool* const&, bool const&, scoped_refptr<kudu::Thread>*) /data/jenkins-workspace/kudu-workspace/src/kudu/util/thread.h:164:12 (libkudu_util.so+0x1c38b6) #3 kudu::ThreadPool::CreateThreadUnlocked() /data/jenkins-workspace/kudu-workspace/src/kudu/util/threadpool.cc:683:14 (libkudu_util.so+0x1c1491) #4 kudu::ThreadPool::DoSubmit(std::__1::shared_ptr<kudu::Runnable>, kudu::ThreadPoolToken*) /data/jenkins-workspace/kudu-workspace/src/kudu/util/threadpool.cc:477:21 (libkudu_util.so+0x1bf8e1) #5 kudu::ThreadPool::Submit(std::__1::shared_ptr<kudu::Runnable>) /data/jenkins-workspace/kudu-workspace/src/kudu/util/threadpool.cc:438:10 (libkudu_util.so+0x1c168f) #6 kudu::ThreadPool::SubmitFunc(boost::function<void ()()>) /data/jenkins-workspace/kudu-workspace/src/kudu/util/threadpool.cc:434:10 (libkudu_util.so+0x1c1729) #7 kudu::tserver::TSTabletManager::CreateNewTablet(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, kudu::Partition const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, kudu::Schema const&, kudu::PartitionSchema const&, kudu::consensus::RaftConfigPB, scoped_refptr<kudu::tablet::TabletReplica>*) /data/jenkins-workspace/kudu-workspace/src/kudu/tserver/ts_tablet_manager.cc:281:3 (libtserver.so+0xe79a1) #8 kudu::tserver::TabletServiceAdminImpl::CreateTablet(kudu::tserver::CreateTabletRequestPB const*, kudu::tserver::CreateTabletResponsePB*, kudu::rpc::RpcContext*) /data/jenkins-workspace/kudu-workspace/src/kudu/tserver/tablet_service.cc:678:34 (libtserver.so+0xc51ab) #9 
kudu::tserver::TabletServerAdminServiceIf::TabletServerAdminServiceIf(scoped_refptr<kudu::MetricEntity> const&, scoped_refptr<kudu::rpc::ResultTracker> const&)::$_1::operator()(google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*) const /data/jenkins-workspace/kudu-workspace/build/tsan/src/kudu/tserver/tserver_admin.service.cc:58:13 (libtserver_admin_proto.so+0x27ec4) #10 _ZNSt3__18__invokeIRZN4kudu7tserver26TabletServerAdminServiceIfC1ERK13scoped_refptrINS1_12MetricEntityEERKS4_INS1_3rpc13ResultTrackerEEE3$_1JPKN6google8protobuf7MessageEPSI_PNS9_10RpcContextEEEEDTclclsr3std3__1E7forwardIT_Efp_Espclsr3std3__1E7forwardIT0_Efp0_EEEOSO_DpOSP_ /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/c++/v1/type_traits:4301:1 (libtserver_admin_proto.so+0x27e51) #11 _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN4kudu7tserver26TabletServerAdminServiceIfC1ERK13scoped_refptrINS3_12MetricEntityEERKS6_INS3_3rpc13ResultTrackerEEE3$_1PKN6google8protobuf7MessageEPSK_PNSB_10RpcContextEEEEvDpOT_ /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/c++/v1/__functional_base:359 (libtserver_admin_proto.so+0x27e51) #12 std::__1::__function::__func<kudu::tserver::TabletServerAdminServiceIf::TabletServerAdminServiceIf(scoped_refptr<kudu::MetricEntity> const&, scoped_refptr<kudu::rpc::ResultTracker> const&)::$_1, std::__1::allocator<kudu::tserver::TabletServerAdminServiceIf::TabletServerAdminServiceIf(scoped_refptr<kudu::MetricEntity> const&, scoped_refptr<kudu::rpc::ResultTracker> const&)::$_1>, void ()(google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*)>::operator()(google::protobuf::Message const*&&, google::protobuf::Message*&&, kudu::rpc::RpcContext*&&) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/c++/v1/functional:1552:12 (libtserver_admin_proto.so+0x27d64) #13 std::__1::function<void ()(google::protobuf::Message const*, 
google::protobuf::Message*, kudu::rpc::RpcContext*)>::operator()(google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*) const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/c++/v1/functional:1914:12 (libkrpc.so+0xe72e9) #14 kudu::rpc::GeneratedServiceIf::Handle(kudu::rpc::InboundCall*) /data/jenkins-workspace/kudu-workspace/src/kudu/rpc/service_if.cc:134:3 (libkrpc.so+0xe6bf4) #15 kudu::rpc::ServicePool::RunThread() /data/jenkins-workspace/kudu-workspace/src/kudu/rpc/service_pool.cc:210:15 (libkrpc.so+0xe82cd) #16 boost::_mfi::mf0<void, kudu::rpc::ServicePool>::operator()(kudu::rpc::ServicePool*) const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:49:29 (libkrpc.so+0xea2c6) #17 void boost::_bi::list1<boost::_bi::value<kudu::rpc::ServicePool*> >::operator()<boost::_mfi::mf0<void, kudu::rpc::ServicePool>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void, kudu::rpc::ServicePool>&, boost::_bi::list0&, int) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:259:9 (libkrpc.so+0xea21a) #18 boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::rpc::ServicePool>, boost::_bi::list1<boost::_bi::value<kudu::rpc::ServicePool*> > >::operator()() /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16 (libkrpc.so+0xea1a3) #19 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::rpc::ServicePool>, boost::_bi::list1<boost::_bi::value<kudu::rpc::ServicePool*> > >, void>::invoke(boost::detail::function::function_buffer&) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libkrpc.so+0xe9fa9) #20 boost::function0<void>::operator()() const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 
(libkrpc.so+0xb0111)
#21 kudu::Thread::SuperviseThread(void*) /data/jenkins-workspace/kudu-workspace/src/kudu/util/thread.cc:591:3 (libkudu_util.so+0x1b975e)

Change-Id: I581eaa49ed3bf705121bb9c00b58499482ed2f39
Reviewed-on: http://gerrit.cloudera.org:8080/7328
Reviewed-by: Mike Percy <[email protected]>
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <[email protected]>
asfgit
pushed a commit
that referenced
this pull request
Jul 28, 2017
This fixes a TSAN race which showed up on the flaky dashboard:

LinkedListTest.TestLoadAndVerify:

WARNING: ThreadSanitizer: data race (pid=25635)
  Write of size 8 at 0x7f54913d69a8 by thread T126 (mutexes: write M1231):
    #0 kudu::process_memory::(anonymous namespace)::DoInitLimits() /data/jenkins-workspace/kudu-workspace/src/kudu/util/process_memory.cc:166:16 (libkudu_util.so+0x1a7279)
    #1 GoogleOnceInternalInit(int*, void (*)(), void (*)(void*), void*) /data/jenkins-workspace/kudu-workspace/src/kudu/gutil/once.cc:38:7 (libgutil.so+0x35507)
    #2 GoogleOnceInit(GoogleOnceType*, void (*)()) /data/jenkins-workspace/kudu-workspace/src/kudu/gutil/once.h:55:5 (libtserver.so+0xc5773)
    #3 kudu::process_memory::(anonymous namespace)::InitLimits() /data/jenkins-workspace/kudu-workspace/src/kudu/util/process_memory.cc:184:3 (libkudu_util.so+0x1a7071)
    #4 kudu::process_memory::UnderMemoryPressure(double*) /data/jenkins-workspace/kudu-workspace/src/kudu/util/process_memory.cc:221:3 (libkudu_util.so+0x1a6faa)
    #5 _ZNSt3__18__invokeIRPFbPdEJS1_EEEDTclclsr3std3__1E7forwardIT_Efp_Espclsr3std3__1E7forwardIT0_Efp0_EEEOS5_DpOS6_ /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/c++/v1/type_traits:4301:1 (libkudu_util.so+0x16ac0d)
    #6 _ZNSt3__128__invoke_void_return_wrapperIbE6__callIJRPFbPdES3_EEEbDpOT_ /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/c++/v1/__functional_base:328 (libkudu_util.so+0x16ac0d)
    #7 std::__1::__function::__func<bool (*)(double*), std::__1::allocator<bool (*)(double*)>, bool ()(double*)>::operator()(double*&&) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/c++/v1/functional:1552:12 (libkudu_util.so+0x16ab14)
    #8 std::__1::function<bool ()(double*)>::operator()(double*) const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/c++/v1/functional:1914:12 (libkudu_util.so+0x168cad)
    #9 kudu::MaintenanceManager::FindBestOp() /data/jenkins-workspace/kudu-workspace/src/kudu/util/maintenance_manager.cc:383:7 (libkudu_util.so+0x165a56)
    #10 kudu::MaintenanceManager::RunSchedulerThread() /data/jenkins-workspace/kudu-workspace/src/kudu/util/maintenance_manager.cc:245:25 (libkudu_util.so+0x164240)
    #11 boost::_mfi::mf0<void, kudu::MaintenanceManager>::operator()(kudu::MaintenanceManager*) const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:49:29 (libkudu_util.so+0x16ba16)
    #12 void boost::_bi::list1<boost::_bi::value<kudu::MaintenanceManager*> >::operator()<boost::_mfi::mf0<void, kudu::MaintenanceManager>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void, kudu::MaintenanceManager>&, boost::_bi::list0&, int) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:259:9 (libkudu_util.so+0x16b96a)
    #13 boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::MaintenanceManager>, boost::_bi::list1<boost::_bi::value<kudu::MaintenanceManager*> > >::operator()() /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16 (libkudu_util.so+0x16b8f3)
    #14 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::MaintenanceManager>, boost::_bi::list1<boost::_bi::value<kudu::MaintenanceManager*> > >, void>::invoke(boost::detail::function::function_buffer&) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libkudu_util.so+0x16b6f9)
    #15 boost::function0<void>::operator()() const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libkrpc.so+0xb01c1)
    #16 kudu::Thread::SuperviseThread(void*) /data/jenkins-workspace/kudu-workspace/src/kudu/util/thread.cc:591:3 (libkudu_util.so+0x1bf37e)

  Previous read of size 8 at 0x7f54913d69a8 by thread T124:
    #0 kudu::process_memory::HardLimit() /data/jenkins-workspace/kudu-workspace/src/kudu/util/process_memory.cc:217:10 (libkudu_util.so+0x1a6f6a)
    #1 kudu::MemTrackersHandler(kudu::WebCallbackRegistry::WebRequest const&, std::__1::basic_ostringstream<char, std::__1::char_traits<char>, std::__1::allocator<char> >*) /data/jenkins-workspace/kudu-workspace/src/kudu/server/default-path-handlers.cc:151:24 (libserver_process.so+0x487dd)
    #2 boost::detail::function::void_function_invoker2<void (*)(kudu::WebCallbackRegistry::WebRequest const&, std::__1::basic_ostringstream<char, std::__1::char_traits<char>, std::__1::allocator<char> >*), void, kudu::WebCallbackRegistry::WebRequest const&, std::__1::basic_ostringstream<char, std::__1::char_traits<char>, std::__1::allocator<char> >*>::invoke(boost::detail::function::function_buffer&, kudu::WebCallbackRegistry::WebRequest const&, std::__1::basic_ostringstream<char, std::__1::char_traits<char>, std::__1::allocator<char> >*) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:118:11 (libserver_process.so+0x4d672)
    #3 boost::function2<void, kudu::WebCallbackRegistry::WebRequest const&, std::__1::basic_ostringstream<char, std::__1::char_traits<char>, std::__1::allocator<char> >*>::operator()(kudu::WebCallbackRegistry::WebRequest const&, std::__1::basic_ostringstream<char, std::__1::char_traits<char>, std::__1::allocator<char> >*) const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libserver_process.so+0x74d40)
    #4 kudu::Webserver::RunPathHandler(kudu::Webserver::PathHandler const&, sq_connection*, sq_request_info*) /data/jenkins-workspace/kudu-workspace/src/kudu/server/webserver.cc:421:5 (libserver_process.so+0x735ed)
    #5 kudu::Webserver::BeginRequestCallback(sq_connection*, sq_request_info*) /data/jenkins-workspace/kudu-workspace/src/kudu/server/webserver.cc:365:10 (libserver_process.so+0x73150)
    #6 kudu::Webserver::BeginRequestCallbackStatic(sq_connection*) /data/jenkins-workspace/kudu-workspace/src/kudu/server/webserver.cc:340:20 (libserver_process.so+0x72b98)
    #7 handle_request /data/jenkins-workspace/kudu-workspace/thirdparty/src/squeasel-c304d3f3481b07bf153979155f02e0aab24d01de/squeasel.c:3854:7 (libserver_process.so+0x883f0)
    #8 process_new_connection /data/jenkins-workspace/kudu-workspace/thirdparty/src/squeasel-c304d3f3481b07bf153979155f02e0aab24d01de/squeasel.c:4464:7 (libserver_process.so+0x8645b)
    #9 worker_thread /data/jenkins-workspace/kudu-workspace/thirdparty/src/squeasel-c304d3f3481b07bf153979155f02e0aab24d01de/squeasel.c:4596 (libserver_process.so+0x8645b)

Change-Id: Ia9fc135ee2b6bb7fc7d3501750123d0b556526e0
Reviewed-on: http://gerrit.cloudera.org:8080/7534
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <[email protected]>
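The trace above shows HardLimit() reading a value that DoInitLimits() writes later under the once-initializer reached from UnderMemoryPressure(). The commit message does not show the patch itself, so the following is only a minimal sketch of one common way to close this kind of race: route every accessor through the same one-time initialization. It swaps in a C++11 function-local static for GoogleOnceInit, and the `Limits` struct, the `ComputeLimits()` helper, and the numeric values are hypothetical, not Kudu's code.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Hypothetical container for the limits computed at startup.
struct Limits {
  int64_t hard_limit;
  int64_t soft_limit;
};

// Hypothetical stand-in for DoInitLimits(): the values are made up.
inline Limits ComputeLimits() {
  const int64_t hard = 4LL * 1024 * 1024 * 1024;  // assumed 4 GiB
  return Limits{hard, hard / 5 * 4};              // assumed soft limit at 80%
}

// Single gate for all readers. Initialization of a function-local static
// is thread-safe since C++11, so the write in ComputeLimits() is ordered
// before every read made through GetLimits().
inline const Limits& GetLimits() {
  static const Limits limits = ComputeLimits();
  return limits;
}

// Every accessor goes through GetLimits(), so none can observe the
// limits before they have been initialized.
inline int64_t HardLimit() {
  const int64_t limit = GetLimits().hard_limit;
  assert(limit > 0);
  return limit;
}

inline bool UnderMemoryPressure(int64_t current_usage) {
  return current_usage > GetLimits().soft_limit;
}

}  // namespace sketch
```

With this shape it no longer matters whether a web handler or the maintenance scheduler touches the limits first: whichever thread arrives first performs the initialization, and later readers see the completed value.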
asfgit
pushed a commit
that referenced
this pull request
Aug 22, 2017
It's important that we differentiate between when we have a known last-logged op and when we don't actually know what it is or whether we have ever appended something to the local WAL. This applies both to the TABLET_DATA_READY case, where this information is stored in the WAL, and the TABLET_DATA_TOMBSTONED case, where this information is stored in the superblock.

Cases where we are unable to determine the last-logged OpId from the WAL when a replica is in TABLET_DATA_READY state:
* Early in the tablet replica lifecycle (before Raft is started).
* When a replica encounters an error during initialization.

Cases where we are unable to determine the last-logged OpId from the TabletMetadata when a replica is in TABLET_DATA_TOMBSTONED state:
* If the replica was tombstoned while in a failed state.

Included in this patch are the following API improvements:

1. Delete Log::GetLatestEntryOpId(). Previously, this method would only return something other than MinimumOpId() if a log entry had been appended during the object's lifetime. It is abandoned in favor of RaftConsensus::GetLastOpId(RECEIVED_OPID) which delegates to PeerMessageQueue::GetLastOpIdInLog().
2. Merge PeerMessageQueue::Init() into the PeerMessageQueue constructor. This allows us to remove one lifecycle state and allows us to guarantee that, once the queue is constructed, we can always get a valid last-logged opid from it (see #1).
3. Make TabletMetadata::tombstone_last_logged_opid() return a boost::optional<OpId>. We need to clearly differentiate between when we know the last-logged opid and when we don't. We also consider MinimumOpId() to be equal to boost::none at superblock load time, since previous versions of Kudu may have written (0,0) into the TabletMetadata 'tombstone_last_logged_opid' field.

Change-Id: Ia4e4501a61cd40fdee0dc918b77675a0bc2515e7
Reviewed-on: http://gerrit.cloudera.org:8080/7717
Reviewed-by: Todd Lipcon <[email protected]>
Tested-by: Kudu Jenkins
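As a rough illustration of item 3, the sketch below shows how an optional return type separates "unknown" from a real OpId, and how a legacy (0,0) value could be folded into "unknown" at load time. It is not Kudu's code: it assumes C++17 and uses std::optional in place of boost::optional, the `OpId` struct is a simplified stand-in for the real protobuf, and `NormalizeTombstoneOpId`, `KnownOpId`, and `DescribeLastLogged` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

namespace sketch {

// Simplified stand-in for Kudu's OpId: a (term, index) pair.
struct OpId {
  int64_t term;
  int64_t index;
};

inline bool operator==(const OpId& a, const OpId& b) {
  return a.term == b.term && a.index == b.index;
}

// Stand-in for MinimumOpId(): the (0,0) value older versions may have stored.
inline OpId MinimumOpId() { return OpId{0, 0}; }

// Hypothetical load-time normalization: a missing field and a stored (0,0)
// both mean "we do not know the last-logged OpId".
inline std::optional<OpId> NormalizeTombstoneOpId(
    const std::optional<OpId>& stored) {
  if (!stored || *stored == MinimumOpId()) return std::nullopt;
  return stored;
}

// Callers that require a known OpId must check for it explicitly instead
// of comparing against a sentinel value.
inline OpId KnownOpId(const std::optional<OpId>& opid) {
  assert(opid.has_value());
  return *opid;
}

// Renders "term.index" for a known OpId and "unknown" otherwise.
inline std::string DescribeLastLogged(const std::optional<OpId>& opid) {
  if (!opid) return "unknown";
  return std::to_string(opid->term) + "." + std::to_string(opid->index);
}

}  // namespace sketch
```

The design point is that the type forces each caller to handle the unknown case, whereas a sentinel such as (0,0) can be silently mistaken for a legitimate position in the log.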
asfgit
pushed a commit
that referenced
this pull request
Sep 19, 2017
WARNING: ThreadSanitizer: data race (pid=14563) Read of size 4 at 0x7fff3db0f228 by thread T59: #0 kudu::master::Master::ToString() const /home/jenkins-slave/workspace/kudu-master/3/src/kudu/master/master.cc:109 (libmaster.so+0xe76bc) #1 kudu::master::Master::InitCatalogManagerTask() /home/jenkins-slave/workspace/kudu-master/3/src/kudu/master/master.cc:180 (discriminator 2) (libmaster.so+0xe87b5) #2 kudu::internal::RunnableAdapter<void (kudu::master::Master::*)()>::Run(kudu::master::Master*) /home/jenkins-slave/workspace/kudu-master/3/src/kudu/gutil/bind_internal.h:136 (discriminator 3) (libmaster.so+0xec6a6) #3 kudu::internal::InvokeHelper<false, void, kudu::internal::RunnableAdapter<void (kudu::master::Master::*)()>, void (kudu::master::Master*)>::MakeItSo(kudu::internal::RunnableAdapter<void (kudu::master::Master::*)()>, kudu::master::Master*) /home/jenkins-slave/workspace/kudu-master/3/src/kudu/gutil/bind_internal.h:873 (discriminator 1) (libmaster.so+0xec5c5) #4 kudu::internal::Invoker<1, kudu::internal::BindState<kudu::internal::RunnableAdapter<void (kudu::master::Master::*)()>, void (kudu::master::Master*), void (kudu::internal::UnretainedWrapper<kudu::master::Master>)>, void (kudu::master::Master*)>::Run(kudu::internal::BindStateBase*) /home/jenkins-slave/workspace/kudu-master/3/src/kudu/gutil/bind_internal.h:1065 (libmaster.so+0xec50a) #5 kudu::Callback<void ()>::Run() const /home/jenkins-slave/workspace/kudu-master/3/src/kudu/gutil/callback.h:396 (discriminator 1) (libconsensus.so+0x9395d) #6 kudu::ClosureRunnable::Run() /home/jenkins-slave/workspace/kudu-master/3/src/kudu/util/threadpool.cc:74 (libkudu_util.so+0x1be0ad) #7 kudu::ThreadPool::DispatchThread(bool) /home/jenkins-slave/workspace/kudu-master/3/src/kudu/util/threadpool.cc:631 (libkudu_util.so+0x1bb3e1) #8 boost::_mfi::mf1<void, kudu::ThreadPool, bool>::operator()(kudu::ThreadPool*, bool) const 
/home/jenkins-slave/workspace/kudu-master/3/thirdparty/installed/common/include/boost/bind/mem_fn_template.hpp:165 (discriminator 3) (libkudu_util.so+0x1c3c9e) #9 void boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> >::operator()<boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf1<void, kudu::ThreadPool, bool>&, boost::_bi::list0&, int) /home/jenkins-slave/workspace/kudu-master/3/thirdparty/installed/common/include/boost/bind/bind.hpp:319 (discriminator 3) (libkudu_util.so+0x1c3bdd) #10 boost::_bi::bind_t<void, boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> > >::operator()() /home/jenkins-slave/workspace/kudu-master/3/thirdparty/installed/common/include/boost/bind/bind.hpp:1222 (libkudu_util.so+0x1c3b43) #11 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> > >, void>::invoke(boost::detail::function::function_buffer&) /home/jenkins-slave/workspace/kudu-master/3/thirdparty/installed/common/include/boost/function/function_template.hpp:159 (libkudu_util.so+0x1c38e1) #12 boost::function0<void>::operator()() const /home/jenkins-slave/workspace/kudu-master/3/thirdparty/installed/common/include/boost/function/function_template.hpp:770 (discriminator 1) (libkrpc.so+0xb74e1) #13 kudu::Thread::SuperviseThread(void*) /home/jenkins-slave/workspace/kudu-master/3/src/kudu/util/thread.cc:602 (libkudu_util.so+0x1b268e) Previous write of size 4 at 0x7fff3db0f228 by main thread: #0 kudu::master::Master::StartAsync() /home/jenkins-slave/workspace/kudu-master/3/src/kudu/master/master.cc:172 (libmaster.so+0xe80b6) #1 kudu::master::Master::Start() /home/jenkins-slave/workspace/kudu-master/3/src/kudu/master/master.cc:143 (discriminator 1) (libmaster.so+0xe7ce5) #2 
kudu::master::MasterMain(int, char**) /home/jenkins-slave/workspace/kudu-master/3/src/kudu/master/master_main.cc:77 (discriminator 1) (kudu-master+0x4b4d66) #3 main /home/jenkins-slave/workspace/kudu-master/3/src/kudu/master/master_main.cc:91 (kudu-master+0x4b49fe) Location is stack of main thread. Thread T59 'init [worker]-1' (tid=14675, running) created by main thread at: #0 pthread_create /home/jenkins-slave/workspace/kudu-master/3/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:897 (kudu-master+0x452053) #1 kudu::Thread::StartThread(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, boost::function<void ()> const&, unsigned long, scoped_refptr<kudu::Thread>*) /home/jenkins-slave/workspace/kudu-master/3/src/kudu/util/thread.cc:525 (discriminator 2) (libkudu_util.so+0x1b1e87) #2 kudu::Status kudu::Thread::Create<void (kudu::ThreadPool::*)(bool), kudu::ThreadPool*, bool>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, void (kudu::ThreadPool::* const&)(bool), kudu::ThreadPool* const&, bool const&, scoped_refptr<kudu::Thread>*) /home/jenkins-slave/workspace/kudu-master/3/src/kudu/util/thread.h:171 (libkudu_util.so+0x1bd026) #3 kudu::ThreadPool::CreateThreadUnlocked() /home/jenkins-slave/workspace/kudu-master/3/src/kudu/util/threadpool.cc:696 (discriminator 4) (libkudu_util.so+0x1baa01) #4 kudu::ThreadPool::DoSubmit(std::__1::shared_ptr<kudu::Runnable>, kudu::ThreadPoolToken*) /home/jenkins-slave/workspace/kudu-master/3/src/kudu/util/threadpool.cc:493 (libkudu_util.so+0x1b8ec1) #5 kudu::ThreadPool::Submit(std::__1::shared_ptr<kudu::Runnable>) /home/jenkins-slave/workspace/kudu-master/3/src/kudu/util/threadpool.cc:454 (libkudu_util.so+0x1bad1f) #6 
kudu::ThreadPool::SubmitClosure(kudu::Callback<void ()>) /home/jenkins-slave/workspace/kudu-master/3/src/kudu/util/threadpool.cc:446 (libkudu_util.so+0x1bac59) #7 kudu::master::Master::StartAsync() /home/jenkins-slave/workspace/kudu-master/3/src/kudu/master/master.cc:169 (discriminator 4) (libmaster.so+0xe8088) #8 kudu::master::Master::Start() /home/jenkins-slave/workspace/kudu-master/3/src/kudu/master/master.cc:143 (discriminator 1) (libmaster.so+0xe7ce5) #9 kudu::master::MasterMain(int, char**) /home/jenkins-slave/workspace/kudu-master/3/src/kudu/master/master_main.cc:77 (discriminator 1) (kudu-master+0x4b4d66) #10 main /home/jenkins-slave/workspace/kudu-master/3/src/kudu/master/master_main.cc:91 (kudu-master+0x4b49fe) SUMMARY: ThreadSanitizer: data race /home/jenkins-slave/workspace/kudu-master/3/src/kudu/master/master.cc:109 in kudu::master::Master::ToString() const Change-Id: Ie6bb7c01afcb0219b93eddec79631768ee352516 Reviewed-on: http://gerrit.cloudera.org:8080/8092 Tested-by: Kudu Jenkins Reviewed-by: David Ribeiro Alves <[email protected]>
asfgit
pushed a commit
that referenced
this pull request
Sep 28, 2017
MasterTest.TestRegisterAndHeartbeat: WARNING: ThreadSanitizer: data race (pid=23366) Write of size 8 at 0x7b4c00000d60 by main thread: #0 kudu::master::CatalogManager::ScopedLeaderDisablerForTests::ScopedLeaderDisablerForTests(kudu::master::CatalogManager*) /data/jenkins-workspace/kudu-workspace/src/kudu/master/catalog_manager.h:448:36 (master-test+0x50d9b7) #1 kudu::master::MasterTest_TestRegisterAndHeartbeat_Test::TestBody() /data/jenkins-workspace/kudu-workspace/src/kudu/master/master-test.cc:238:50 (master-test+0x4f07b2) #2 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /data/jenkins-workspace/kudu-workspace/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2402:10 (libgmock.so+0x52ac9) #3 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /data/jenkins-workspace/kudu-workspace/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2438 (libgmock.so+0x52ac9) #4 testing::Test::Run() /data/jenkins-workspace/kudu-workspace/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2474:5 (libgmock.so+0x32cb7) #5 testing::TestInfo::Run() /data/jenkins-workspace/kudu-workspace/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2656:11 (libgmock.so+0x34156) #6 testing::TestCase::Run() /data/jenkins-workspace/kudu-workspace/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2774:28 (libgmock.so+0x34ec6) #7 testing::internal::UnitTestImpl::RunAllTests() /data/jenkins-workspace/kudu-workspace/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:4649:43 (libgmock.so+0x40916) #8 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) 
/data/jenkins-workspace/kudu-workspace/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2402:10 (libgmock.so+0x539a9) #9 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /data/jenkins-workspace/kudu-workspace/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2438 (libgmock.so+0x539a9) #10 testing::UnitTest::Run() /data/jenkins-workspace/kudu-workspace/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:4257:10 (libgmock.so+0x40361) #11 RUN_ALL_TESTS() /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/gtest/gtest.h:2233:46 (libkudu_test_main.so+0x338b) #12 main /data/jenkins-workspace/kudu-workspace/src/kudu/util/test_main.cc:106:13 (libkudu_test_main.so+0x2a46) Previous read of size 8 at 0x7b4c00000d60 by thread T79 (mutexes: read M3335, write M3813): #0 kudu::master::CatalogManager::ScopedLeaderSharedLock::ScopedLeaderSharedLock(kudu::master::CatalogManager*) /data/jenkins-workspace/kudu-workspace/src/kudu/master/catalog_manager.cc:4218:7 (libmaster.so+0xabc37) #1 kudu::master::CatalogManagerBgTasks::Run() /data/jenkins-workspace/kudu-workspace/src/kudu/master/catalog_manager.cc:490:46 (libmaster.so+0x90e26) #2 boost::_mfi::mf0<void, kudu::master::CatalogManagerBgTasks>::operator()(kudu::master::CatalogManagerBgTasks*) const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:49:29 (libmaster.so+0xd4676) #3 void boost::_bi::list1<boost::_bi::value<kudu::master::CatalogManagerBgTasks*> >::operator()<boost::_mfi::mf0<void, kudu::master::CatalogManagerBgTasks>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void, kudu::master::CatalogManagerBgTasks>&, boost::_bi::list0&, int) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:259:9 (libmaster.so+0xd45ca) #4 
boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::master::CatalogManagerBgTasks>, boost::_bi::list1<boost::_bi::value<kudu::master::CatalogManagerBgTasks*> > >::operator()() /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16 (libmaster.so+0xd4553) #5 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::master::CatalogManagerBgTasks>, boost::_bi::list1<boost::_bi::value<kudu::master::CatalogManagerBgTasks*> > >, void>::invoke(boost::detail::function::function_buffer&) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libmaster.so+0xd4359) #6 boost::function0<void>::operator()() const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libkrpc.so+0xb5d81) #7 kudu::Thread::SuperviseThread(void*) /data/jenkins-workspace/kudu-workspace/src/kudu/util/thread.cc:601:3 (libkudu_util.so+0x1b279e) Location is heap block of size 408 at 0x7b4c00000c40 allocated by main thread: #0 operator new(unsigned long) /data/jenkins-workspace/kudu-workspace/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_new_delete.cc:41 (master-test+0x4ec0d3) #1 kudu::master::Master::Master(kudu::master::MasterOptions const&) /data/jenkins-workspace/kudu-workspace/src/kudu/master/master.cc:97:22 (libmaster.so+0xe306f) #2 kudu::master::MiniMaster::Start() /data/jenkins-workspace/kudu-workspace/src/kudu/master/mini_master.cc:91:33 (libmaster.so+0xfeac9) #3 kudu::master::MasterTest::SetUp() /data/jenkins-workspace/kudu-workspace/src/kudu/master/master-test.cc:108:5 (master-test+0x512c95) #4 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /data/jenkins-workspace/kudu-workspace/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2402:10 
(libgmock.so+0x52ac9) #5 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /data/jenkins-workspace/kudu-workspace/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2438 (libgmock.so+0x52ac9) #6 testing::Test::Run() /data/jenkins-workspace/kudu-workspace/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2470:3 (libgmock.so+0x32c46) #7 testing::TestInfo::Run() /data/jenkins-workspace/kudu-workspace/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2656:11 (libgmock.so+0x34156) #8 testing::TestCase::Run() /data/jenkins-workspace/kudu-workspace/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2774:28 (libgmock.so+0x34ec6) #9 testing::internal::UnitTestImpl::RunAllTests() /data/jenkins-workspace/kudu-workspace/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:4649:43 (libgmock.so+0x40916) #10 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /data/jenkins-workspace/kudu-workspace/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2402:10 (libgmock.so+0x539a9) #11 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /data/jenkins-workspace/kudu-workspace/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2438 (libgmock.so+0x539a9) #12 testing::UnitTest::Run() /data/jenkins-workspace/kudu-workspace/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:4257:10 (libgmock.so+0x40361) #13 RUN_ALL_TESTS() /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/gtest/gtest.h:2233:46 (libkudu_test_main.so+0x338b) #14 main /data/jenkins-workspace/kudu-workspace/src/kudu/util/test_main.cc:106:13 
(libkudu_test_main.so+0x2a46) Mutex M3335 (0x7b4c00000d68) created at: #0 pthread_rwlock_init /data/jenkins-workspace/kudu-workspace/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1205 (master-test+0x47e40c) #1 kudu::RWMutex::Init(kudu::RWMutex::Priority) /data/jenkins-workspace/kudu-workspace/src/kudu/util/rw_mutex.cc:78:8 (libkudu_util.so+0x19ecc8) #2 kudu::RWMutex::RWMutex(kudu::RWMutex::Priority) /data/jenkins-workspace/kudu-workspace/src/kudu/util/rw_mutex.cc:56:3 (libkudu_util.so+0x19ef23) #3 kudu::master::CatalogManager::CatalogManager(kudu::master::Master*) /data/jenkins-workspace/kudu-workspace/src/kudu/master/catalog_manager.cc:733:5 (libmaster.so+0x92a4a) #4 kudu::master::Master::Master(kudu::master::MasterOptions const&) /data/jenkins-workspace/kudu-workspace/src/kudu/master/master.cc:97:26 (libmaster.so+0xe307d) #5 kudu::master::MiniMaster::Start() /data/jenkins-workspace/kudu-workspace/src/kudu/master/mini_master.cc:91:33 (libmaster.so+0xfeac9) #6 kudu::master::MasterTest::SetUp() /data/jenkins-workspace/kudu-workspace/src/kudu/master/master-test.cc:108:5 (master-test+0x512c95) #7 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /data/jenkins-workspace/kudu-workspace/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2402:10 (libgmock.so+0x52ac9) #8 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /data/jenkins-workspace/kudu-workspace/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2438 (libgmock.so+0x52ac9) #9 testing::Test::Run() /data/jenkins-workspace/kudu-workspace/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2470:3 (libgmock.so+0x32c46) #10 testing::TestInfo::Run() /data/jenkins-workspace/kudu-workspace/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2656:11 
(libgmock.so+0x34156) #11 testing::TestCase::Run() /data/jenkins-workspace/kudu-workspace/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2774:28 (libgmock.so+0x34ec6) #12 testing::internal::UnitTestImpl::RunAllTests() /data/jenkins-workspace/kudu-workspace/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:4649:43 (libgmock.so+0x40916) #13 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /data/jenkins-workspace/kudu-workspace/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2402:10 (libgmock.so+0x539a9) #14 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /data/jenkins-workspace/kudu-workspace/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2438 (libgmock.so+0x539a9) #15 testing::UnitTest::Run() /data/jenkins-workspace/kudu-workspace/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:4257:10 (libgmock.so+0x40361) #16 RUN_ALL_TESTS() /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/gtest/gtest.h:2233:46 (libkudu_test_main.so+0x338b) #17 main /data/jenkins-workspace/kudu-workspace/src/kudu/util/test_main.cc:106:13 (libkudu_test_main.so+0x2a46) Mutex M3813 (0x7b4c00000d50) created at: #0 __tsan_atomic32_compare_exchange_strong /data/jenkins-workspace/kudu-workspace/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cc:756 (master-test+0x4a8d08) #1 base::subtle::Acquire_CompareAndSwap(int volatile*, int, int) /data/jenkins-workspace/kudu-workspace/src/kudu/gutil/atomicops-internals-tsan.h:84:3 (libkudu_client.so+0xb4087) #2 base::SpinLock::Lock() /data/jenkins-workspace/kudu-workspace/src/kudu/gutil/spinlock.h:74:9 (libkudu_client.so+0xb3ff0) #3 
kudu::simple_spinlock::lock() /data/jenkins-workspace/kudu-workspace/src/kudu/util/locks.h:44:8 (libkudu_client.so+0xb3fa9) #4 std::__1::lock_guard<kudu::simple_spinlock>::lock_guard(kudu::simple_spinlock&) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/c++/v1/__mutex_base:108:27 (libmaster.so+0x96321) #5 kudu::master::CatalogManager::IsInitialized() const /data/jenkins-workspace/kudu-workspace/src/kudu/master/catalog_manager.cc:1134 (libmaster.so+0x96321) #6 kudu::master::Master::InitCatalogManager() /data/jenkins-workspace/kudu-workspace/src/kudu/master/master.cc:186:25 (libmaster.so+0xe47e2) #7 kudu::master::Master::InitCatalogManagerTask() /data/jenkins-workspace/kudu-workspace/src/kudu/master/master.cc:178:14 (libmaster.so+0xe4692) #8 kudu::internal::RunnableAdapter<void (kudu::master::Master::*)()>::Run(kudu::master::Master*) /data/jenkins-workspace/kudu-workspace/src/kudu/gutil/bind_internal.h:136:12 (libmaster.so+0xe8876) #9 kudu::internal::InvokeHelper<false, void, kudu::internal::RunnableAdapter<void (kudu::master::Master::*)()>, void ()(kudu::master::Master*)>::MakeItSo(kudu::internal::RunnableAdapter<void (kudu::master::Master::*)()>, kudu::master::Master*) /data/jenkins-workspace/kudu-workspace/src/kudu/gutil/bind_internal.h:873:14 (libmaster.so+0xe8795) #10 kudu::internal::Invoker<1, kudu::internal::BindState<kudu::internal::RunnableAdapter<void (kudu::master::Master::*)()>, void ()(kudu::master::Master*), void ()(kudu::internal::UnretainedWrapper<kudu::master::Master>)>, void ()(kudu::master::Master*)>::Run(kudu::internal::BindStateBase*) /data/jenkins-workspace/kudu-workspace/src/kudu/gutil/bind_internal.h:1065:12 (libmaster.so+0xe86da) #11 kudu::Callback<void ()()>::Run() const /data/jenkins-workspace/kudu-workspace/src/kudu/gutil/callback.h:396:12 (libconsensus.so+0x9259d) #12 kudu::ClosureRunnable::Run() /data/jenkins-workspace/kudu-workspace/src/kudu/util/threadpool.cc:74:9 (libkudu_util.so+0x1be53d) #13 
kudu::ThreadPool::DispatchThread(bool) /data/jenkins-workspace/kudu-workspace/src/kudu/util/threadpool.cc:631:22 (libkudu_util.so+0x1bb871) #14 boost::_mfi::mf1<void, kudu::ThreadPool, bool>::operator()(kudu::ThreadPool*, bool) const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:165:29 (libkudu_util.so+0x1c412e) #15 void boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> >::operator()<boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf1<void, kudu::ThreadPool, bool>&, boost::_bi::list0&, int) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:319:9 (libkudu_util.so+0x1c406d) #16 boost::_bi::bind_t<void, boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> > >::operator()() /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16 (libkudu_util.so+0x1c3fd3) #17 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> > >, void>::invoke(boost::detail::function::function_buffer&) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libkudu_util.so+0x1c3d71) #18 boost::function0<void>::operator()() const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libkrpc.so+0xb5d81) #19 kudu::Thread::SuperviseThread(void*) /data/jenkins-workspace/kudu-workspace/src/kudu/util/thread.cc:601:3 (libkudu_util.so+0x1b279e) Thread T79 'bgtasks-24777' (tid=24777, running) created by thread T47 at: #0 pthread_create 
/data/jenkins-workspace/kudu-workspace/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:897 (master-test+0x47fd4b) #1 kudu::Thread::StartThread(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, boost::function<void ()()> const&, unsigned long, scoped_refptr<kudu::Thread>*) /data/jenkins-workspace/kudu-workspace/src/kudu/util/thread.cc:524:15 (libkudu_util.so+0x1b1f97) #2 kudu::Status kudu::Thread::Create<void (kudu::master::CatalogManagerBgTasks::*)(), kudu::master::CatalogManagerBgTasks*>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, void (kudu::master::CatalogManagerBgTasks::* const&)(), kudu::master::CatalogManagerBgTasks* const&, scoped_refptr<kudu::Thread>*) /data/jenkins-workspace/kudu-workspace/src/kudu/util/thread.h:165:12 (libmaster.so+0xafcb5) #3 kudu::master::CatalogManagerBgTasks::Init() /data/jenkins-workspace/kudu-workspace/src/kudu/master/catalog_manager.cc:470:3 (libmaster.so+0x90cff) #4 kudu::master::CatalogManager::Init(bool) /data/jenkins-workspace/kudu-workspace/src/kudu/master/catalog_manager.cc:766:3 (libmaster.so+0x931de) #5 kudu::master::Master::InitCatalogManager() /data/jenkins-workspace/kudu-workspace/src/kudu/master/master.cc:189:3 (libmaster.so+0xe483f) #6 kudu::master::Master::InitCatalogManagerTask() /data/jenkins-workspace/kudu-workspace/src/kudu/master/master.cc:178:14 (libmaster.so+0xe4692) #7 kudu::internal::RunnableAdapter<void (kudu::master::Master::*)()>::Run(kudu::master::Master*) /data/jenkins-workspace/kudu-workspace/src/kudu/gutil/bind_internal.h:136:12 (libmaster.so+0xe8876) #8 kudu::internal::InvokeHelper<false, void, kudu::internal::RunnableAdapter<void (kudu::master::Master::*)()>, void 
()(kudu::master::Master*)>::MakeItSo(kudu::internal::RunnableAdapter<void (kudu::master::Master::*)()>, kudu::master::Master*) /data/jenkins-workspace/kudu-workspace/src/kudu/gutil/bind_internal.h:873:14 (libmaster.so+0xe8795) #9 kudu::internal::Invoker<1, kudu::internal::BindState<kudu::internal::RunnableAdapter<void (kudu::master::Master::*)()>, void ()(kudu::master::Master*), void ()(kudu::internal::UnretainedWrapper<kudu::master::Master>)>, void ()(kudu::master::Master*)>::Run(kudu::internal::BindStateBase*) /data/jenkins-workspace/kudu-workspace/src/kudu/gutil/bind_internal.h:1065:12 (libmaster.so+0xe86da) #10 kudu::Callback<void ()()>::Run() const /data/jenkins-workspace/kudu-workspace/src/kudu/gutil/callback.h:396:12 (libconsensus.so+0x9259d) #11 kudu::ClosureRunnable::Run() /data/jenkins-workspace/kudu-workspace/src/kudu/util/threadpool.cc:74:9 (libkudu_util.so+0x1be53d) #12 kudu::ThreadPool::DispatchThread(bool) /data/jenkins-workspace/kudu-workspace/src/kudu/util/threadpool.cc:631:22 (libkudu_util.so+0x1bb871) #13 boost::_mfi::mf1<void, kudu::ThreadPool, bool>::operator()(kudu::ThreadPool*, bool) const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:165:29 (libkudu_util.so+0x1c412e) #14 void boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> >::operator()<boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf1<void, kudu::ThreadPool, bool>&, boost::_bi::list0&, int) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:319:9 (libkudu_util.so+0x1c406d) #15 boost::_bi::bind_t<void, boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> > >::operator()() /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16 (libkudu_util.so+0x1c3fd3) #16 
boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> > >, void>::invoke(boost::detail::function::function_buffer&) /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libkudu_util.so+0x1c3d71) #17 boost::function0<void>::operator()() const /data/jenkins-workspace/kudu-workspace/thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libkrpc.so+0xb5d81) #18 kudu::Thread::SuperviseThread(void*) /data/jenkins-workspace/kudu-workspace/src/kudu/util/thread.cc:601:3 (libkudu_util.so+0x1b279e) Change-Id: I8f0363eb963e55a9ecf02bb68616e2925d8cc6cc Reviewed-on: http://gerrit.cloudera.org:8080/8159 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin <[email protected]>
asfgit
pushed a commit
that referenced
this pull request
Nov 21, 2017
TSAN builds were significantly flaky with the following error: ==8581==ERROR: AddressSanitizer: heap-use-after-free on address 0x615000003fe8 at pc 0x7fde311d3229 bp 0x7fde16b56a10 sp 0x7fde16b56a08 READ of size 8 at 0x615000003fe8 thread T12 (test-raft-pool ) #0 0x7fde311d3228 in std::_Hashtable<std::string, std::pair<std::string const, kudu::consensus::PeerMessageQueue::TrackedPeer*>, std::allocator<std::pair<std::string const, kudu::consensus::PeerMessageQueue::TrackedPeer*> >, std::__detail::_Se #1 0x7fde311d4c68 in std::_Hashtable<std::string, std::pair<std::string const, kudu::consensus::PeerMessageQueue::TrackedPeer*>, std::allocator<std::pair<std::string const, kudu::consensus::PeerMessageQueue::TrackedPeer*> >, std::__detail::_Se #2 0x7fde311c45bc in std::unordered_map<std::string, kudu::consensus::PeerMessageQueue::TrackedPeer*, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, kudu::consensus::PeerMessageQueue::TrackedPee #3 0x7fde311b55e4 in kudu::consensus::PeerMessageQueue::UntrackPeer(std::string const&) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/consensus_queue.cc:241:23 #4 0x7fde311a580c in kudu::consensus::Peer::Close() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/consensus_peers.cc:446:11 #5 0x7fde311a599a in kudu::consensus::Peer::~Peer() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/consensus_peers.cc:450:3 #6 0x7fde311af50d in std::_Sp_counted_ptr<kudu::consensus::Peer*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() /opt/rh/devtoolset-3/root/usr/lib/gcc/x86_64-redhat-linux/4.9.2/../../../../include/c++/4.9.2/bits/shared_ptr_base.h:373:9 #7 0x560f75 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() /opt/rh/devtoolset-3/root/usr/lib/gcc/x86_64-redhat-linux/4.9.2/../../../../include/c++/4.9.2/bits/shared_ptr_base.h:149:6 #9 0x569231 in boost::function0<void>::~function0() 
/data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/uninstrumented/include/boost/function/function_template.hpp:763:34 #10 0x568777 in kudu::consensus::TestPeerProxy::Respond(kudu::consensus::TestPeerProxy::Method) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/consensus-test-util.h:157:3 #11 0x56f740 in kudu::consensus::DelayablePeerProxy<kudu::consensus::NoOpTestPeerProxy>::RespondUnlessDelayed(kudu::consensus::TestPeerProxy::Method) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/consensus-test-util.h:1 #17 0x7fde2689fb3e in kudu::Thread::SuperviseThread(void*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/thread.cc:624:3 0x615000003fe8 is located 232 bytes inside of 488-byte region [0x615000003f00,0x6150000040e8) freed by thread T0 here: #0 0x551210 in operator delete(void*) /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/asan/asan_new_delete.cc:126 #1 0x55c522 in kudu::consensus::ConsensusPeersTest::~ConsensusPeersTest() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/consensus_peers-test.cc:74:7 #2 0x55ad89 in kudu::consensus::ConsensusPeersTest_TestRemotePeer_Test::~ConsensusPeersTest_TestRemotePeer_Test() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/consensus_peers-test.cc:174:1 previously allocated by thread T0 here: #0 0x5504d0 in operator new(unsigned long) /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/asan/asan_new_delete.cc:82 #1 0x55b3e8 in kudu::consensus::ConsensusPeersTest::SetUp() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/consensus_peers-test.cc:100:26 Specifically, the thread destructor was destructing the ConsensusQueue before letting tasks drain from the "raft_pool_". One of those tasks might have been associated with one of the peers that is owned by the queue. 
This just adds the appropriate Wait() call to let the tasks drain before destruction.

To test I ran:

  ./build-support/dist_test.py loop -n 100 build/latest/bin/consensus_peers-test --gtest_filter=\*RemotePeer\* --stress-cpu-threads=8 --gtest_repeat=100

Prior to this fix, 100/100 failed[1]. After the fix, 100/100 succeeded[2].

[1] http://dist-test.cloudera.org/job?job_id=todd.1511292638.95033
[2] http://dist-test.cloudera.org/job?job_id=todd.1511292986.102150

Change-Id: I33f5e2da3d2d6c275ece20265b6a455e2c4b967d
Reviewed-on: http://gerrit.cloudera.org:8080/8625
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy <[email protected]>
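The ordering problem described in this commit (an object destroyed while pool tasks that touch it may still run) can be sketched as follows. This is not the Kudu ThreadPool or the actual test fixture: TinyPool, FakeQueue, and DrainThenDestroy() are hypothetical names, and the pool is a deliberately naive thread-per-task design. The sketch only shows the "Wait() before destruction" ordering.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool: one thread per task, with a count of
// tasks that have been submitted but have not finished yet.
class TinyPool {
 public:
  ~TinyPool() {
    for (std::thread& t : threads_) t.join();
    assert(pending_ == 0);  // every task has finished once all threads are joined
  }

  void Submit(std::function<void()> task) {
    std::lock_guard<std::mutex> l(mu_);
    ++pending_;
    threads_.emplace_back([this, task = std::move(task)] {
      task();
      std::lock_guard<std::mutex> l(mu_);
      if (--pending_ == 0) idle_.notify_all();
    });
  }

  // Blocks until every submitted task has finished running.
  void Wait() {
    std::unique_lock<std::mutex> l(mu_);
    idle_.wait(l, [this] { return pending_ == 0; });
  }

 private:
  std::mutex mu_;
  std::condition_variable idle_;
  int pending_ = 0;
  std::vector<std::thread> threads_;
};

// Stand-in for the queue whose state is touched by tasks in the pool.
class FakeQueue {
 public:
  void UntrackPeer() {
    std::lock_guard<std::mutex> l(mu_);
    ++untracked_;
  }
  int untracked() {
    std::lock_guard<std::mutex> l(mu_);
    return untracked_;
  }

 private:
  std::mutex mu_;
  int untracked_ = 0;
};

// Submits num_tasks tasks that use the queue, drains the pool, and only
// then destroys the queue. Returns how many tasks ran against the queue.
int DrainThenDestroy(int num_tasks) {
  auto queue = std::make_unique<FakeQueue>();
  TinyPool pool;
  for (int i = 0; i < num_tasks; ++i) {
    pool.Submit([q = queue.get()] { q->UntrackPeer(); });
  }
  pool.Wait();  // without this, reset() below could free the queue under a running task
  int seen = queue->untracked();
  queue.reset();
  return seen;
}
```

If the pool.Wait() line is removed, queue.reset() may run while a task is still inside UntrackPeer(), which is the same shape as the heap-use-after-free reported above.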
asfgit
pushed a commit
that referenced
this pull request
Nov 28, 2017
This patch protects pre_flush_callback_ with flush_lock_ to address concurrent calls to TabletMetadata::DeleteTabletData() and Tablet::Shutdown(). The call to Tablet::Shutdown() originated from TabletReplica::OnDiskSize() when a local shared_ptr variable ended up holding the last reference to the object corresponding to the tablet being deleted.

The issue contributed to the flakiness of at least the following tests:
* CreateTableStressTest.CreateAndDeleteBigTable
* DeleteTableWhileScanInProgressParamTest

Prior to this patch, TSAN reported a read/write race on the callback, with traces like the following:

Read of size 8 at 0x7b50000703d0 by thread T56 (mutexes: write M2161):
#0 kudu::Callback<kudu::Status ()()>::Run() const /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/gutil/callback.h:394:45 (libtablet.so+0x1caa81)
#1 kudu::tablet::TabletMetadata::Flush() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tablet/tablet_metadata.cc:549:23 (libtablet.so+0x1c30b8)
#2 kudu::tablet::TabletMetadata::DeleteTabletData(kudu::tablet::TabletDataState, boost::optional<kudu::consensus::OpId> const&) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tablet/tablet_metadata.cc:232:3 (libtablet.so+0x1c411a)
#3 kudu::tserver::TSTabletManager::DeleteTabletData(scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::consensus::ConsensusMetadataManager> const&, kudu::tablet::TabletDataState, boost::optional<kudu::consensus::OpId>) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tserver/ts_tablet_manager.cc:1281:3 (libtserver.so+0xfad26)
#4 kudu::tserver::TSTabletManager::DeleteTablet(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, kudu::tablet::TabletDataState, boost::optional<long> const&, kudu::tserver::TabletServerErrorPB_Code*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tserver/ts_tablet_manager.cc:826:14 (libtserver.so+0xfc365)
#5 kudu::tserver::TabletServiceAdminImpl::DeleteTablet(kudu::tserver::DeleteTabletRequestPB const*, kudu::tserver::DeleteTabletResponsePB*, kudu::rpc::RpcContext*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tserver/tablet_service.cc:804:41 (libtserver.so+0xd5dfd)
...

Previous write of size 8 at 0x7b50000703d0 by thread T12 (mutexes: write M90770251150680224, write M623321216325058272):
#0 kudu::internal::CallbackBase::operator=(kudu::internal::CallbackBase const&) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/gutil/callback_internal.h:34:7 (libtserver.so+0xb0596)
#1 kudu::Callback<kudu::Status ()()>::operator=(kudu::Callback<kudu::Status ()()>&&) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/gutil/callback.h:358:7 (libtablet.so+0x115cb0)
#2 kudu::tablet::TabletMetadata::SetPreFlushCallback(kudu::Callback<kudu::Status ()()>) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tablet/tablet_metadata.h:229:74 (libtablet.so+0x10e437)
#3 kudu::tablet::Tablet::Shutdown() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tablet/tablet.cc:352:14 (libtablet.so+0xf7fdc)
#4 kudu::tablet::Tablet::~Tablet() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tablet/tablet.cc:248:3 (libtablet.so+0xf7c9e)
...
#10 kudu::tablet::TabletReplica::OnDiskSize() const /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tablet/tablet_replica.cc:742:1 (libtablet.so+0x143af7)
...

Change-Id: I21d3195183584d1a51aeec64b049ac49994f69be
Reviewed-on: http://gerrit.cloudera.org:8080/8649
Reviewed-by: Mike Percy <[email protected]>
Reviewed-by: Andrew Wong <[email protected]>
Tested-by: Alexey Serbin <[email protected]>
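The race in the traces above is a callback object being assigned by one thread while another thread invokes it. A minimal sketch of guarding both sides with one lock follows. The Metadata class, its std::function callback type, and flush_count() are simplifications invented for illustration; only the member names flush_lock_ and pre_flush_callback_ come from the commit message, and whether the real code runs the callback while holding the lock is an assumption here.

```cpp
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical, simplified stand-in for a metadata object with a pre-flush hook.
class Metadata {
 public:
  // Writer side: replaces the callback while holding the flush lock.
  void SetPreFlushCallback(std::function<bool()> cb) {
    std::lock_guard<std::mutex> l(flush_lock_);
    pre_flush_callback_ = std::move(cb);
  }

  // Reader side: reads and runs the callback under the same lock, so a
  // concurrent SetPreFlushCallback() cannot modify it mid-call.
  // Returns false if the callback vetoes the flush.
  bool Flush() {
    std::lock_guard<std::mutex> l(flush_lock_);
    if (pre_flush_callback_ && !pre_flush_callback_()) {
      return false;
    }
    ++flush_count_;
    return true;
  }

  int flush_count() const {
    std::lock_guard<std::mutex> l(flush_lock_);
    return flush_count_;
  }

 private:
  mutable std::mutex flush_lock_;
  std::function<bool()> pre_flush_callback_;  // guarded by flush_lock_
  int flush_count_ = 0;                       // guarded by flush_lock_
};
```

One consequence of this design is that the callback must not call back into SetPreFlushCallback() or Flush() on the same object, since std::mutex is not recursive.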
asfgit
pushed a commit
that referenced
this pull request
Feb 7, 2018
In a future patch, after removing the thread safety of the ConsensusMetadata class, unlocked access to cmeta triggered a warning from the collision warner in RaftConsensus::Start():

F0206 20:31:24.554322 30972 thread_collision_warner.cc:23] Thread Collision! Previous thread id: 30962, current thread id: 30972

Thread 4331 (Thread 0x7ffe5f964700 (LWP 30962)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1 0x00007ffff331966f in kudu::ConditionVariable::TimedWait (this=0x8c10b0, max_time=...) at ../../src/kudu/util/condition_variable.cc:123
#2 0x00007ffff74e6c71 in kudu::tserver::Heartbeater::Thread::RunThread (this=0x8c0f80) at ../../src/kudu/tserver/heartbeater.cc:538
#3 0x00007ffff74efec9 in boost::_mfi::mf0<void, kudu::tserver::Heartbeater::Thread>::operator() (this=0xd015d0, p=0x8c0f80) at ../../thirdparty/installed/uninstrumented/include/boost/bind/mem_fn_template.hpp:49

Thread 4341 (Thread 0x7ffe5a95a700 (LWP 30972)):
#0 0x00007ffff156c428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007ffff156e02a in __GI_abort () at abort.c:89
#2 0x00007ffff2ff1c67 in google::DumpStackTraceAndExit () at /home/mpercy/src/kudu/thirdparty/src/glog-0.3.5/src/utilities.cc:152
#3 0x00007ffff2fe8b1d in google::LogMessage::Fail () at /home/mpercy/src/kudu/thirdparty/src/glog-0.3.5/src/logging.cc:1488
#4 0x00007ffff2feaa03 in google::LogMessage::SendToLog (this=0x7ffe5a957d68) at /home/mpercy/src/kudu/thirdparty/src/glog-0.3.5/src/logging.cc:1442
#5 0x00007ffff2fe867a in google::LogMessage::Flush (this=this@entry=0x7ffe5a957d68) at /home/mpercy/src/kudu/thirdparty/src/glog-0.3.5/src/logging.cc:1311
#6 0x00007ffff2feb3cf in google::LogMessageFatal::~LogMessageFatal (this=0x7ffe5a957d68, __in_chrg=<optimized out>) at /home/mpercy/src/kudu/thirdparty/src/glog-0.3.5/src/logging.cc:2023
#7 0x00007ffff30bc20b in base::DCheckAsserter::warn (this=0x7fffbc016a90, previous_thread_id=30962, current_thread_id=30972) at ../../src/kudu/gutil/threading/thread_collision_warner.cc:23
#8 0x00007ffff30bc34d in base::ThreadCollisionWarner::Enter (this=0x7fffbc00bde0) at ../../src/kudu/gutil/threading/thread_collision_warner.cc:81
#9 0x00007ffff6dd3933 in base::ThreadCollisionWarner::ScopedCheck::ScopedCheck (this=0x7ffe5a957e30, warner=0x7fffbc00bde0) at ../../src/kudu/gutil/threading/thread_collision_warner.h:184
#10 0x00007ffff6a95fca in kudu::consensus::ConsensusMetadata::current_term (this=0x7fffbc00bd90) at ../../src/kudu/consensus/consensus_meta.cc:56
#11 0x00007ffff6afa4a4 in kudu::consensus::RaftConsensus::Start(kudu::consensus::ConsensusBootstrapInfo const&, gscoped_ptr<kudu::consensus::PeerProxyFactory, kudu::DefaultDeleter<kudu::consensus::PeerProxyFactory> >, scoped_refptr<kudu::log::Log>, scoped_refptr<kudu::consensus::TimeManager>, kudu::consensus::ReplicaTransactionFactory*, scoped_refptr<kudu::MetricEntity>, kudu::Callback<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>) (this=0x7fffbc0008d0, info=..., peer_proxy_factory=..., log=..., time_manager=..., txn_factory=0x7fffbc00bff0, metric_entity=..., mark_dirty_clbk=...) at ../../src/kudu/consensus/raft_consensus.cc:228
#12 0x00007ffff6e231b5 in kudu::tablet::TabletReplica::Start (this=0x7fffbc00bff0, bootstrap_info=..., tablet=std::shared_ptr (empty) 0x0, clock=..., messenger=std::shared_ptr (empty) 0x0, result_tracker=..., log=..., prepare_pool=0xd07140) at ../../src/kudu/tablet/tablet_replica.cc:220

Change-Id: I661c603a57b9ecaeee926ce7cd86c9ecf2ad58a8
Reviewed-on: http://gerrit.cloudera.org:8080/9245
Reviewed-by: Alexey Serbin <[email protected]>
Tested-by: Kudu Jenkins
smukil
pushed a commit
to smukil/kudu
that referenced
this pull request
Feb 12, 2018
This addresses an ASAN crash I've seen a couple times now in test runs during static destruction. We previously used a vector<Mutex*> for the OpenSSL locks, but the order of static destruction isn't very easy to predict, and the destructor of things like SSL sockets may end up needing to acquire these locks. This patch switches to a C-style dynamic array.

An example ASAN trace follows.

=================================================================
==28629==ERROR: AddressSanitizer: heap-use-after-free on address 0x61300000b6c0 at pc 0x7f25c7d03339 bp 0x7f25bbdc3af0 sp 0x7f25bbdc3ae8
READ of size 8 at 0x61300000b6c0 thread T4 (rpc reactor-286)
#0 0x7f25c7d03338 in kudu::security::(anonymous namespace)::LockingCB(int, int, char const*, int) /home/jenkins-slave/workspace/kudu-1/src/kudu/security/openssl_util.cc:54:14
#1 0x7f25c4c41836 in CRYPTO_add_lock (/lib/x86_64-linux-gnu/libcrypto.so.1.0.0+0x5f836)
#2 0x7f25c49bcedb in SSL_free (/lib/x86_64-linux-gnu/libssl.so.1.0.0+0x39edb)
#3 0x7f25c7d10f20 in std::_Function_handler<void (ssl_st*), void (*)(ssl_st*)>::_M_invoke(std::_Any_data const&, ssl_st*) /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/functional:2071:2
#4 0x7f25c8764ed4 in std::function<void (ssl_st*)>::operator()(ssl_st*) const /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/functional:2471:14
#5 0x7f25c7d0daf7 in std::unique_ptr<ssl_st, std::function<void (ssl_st*)> >::reset(ssl_st*) /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/bits/unique_ptr.h:262:4
#6 0x7f25c7d14f78 in kudu::security::TlsSocket::Close() /home/jenkins-slave/workspace/kudu-1/src/kudu/security/tls_socket.cc:115:8
#7 0x7f25c8773072 in kudu::rpc::Connection::Shutdown(kudu::Status const&) /home/jenkins-slave/workspace/kudu-1/src/kudu/rpc/connection.cc:174:5
#8 0x7f25c87bfb25 in kudu::rpc::ReactorThread::ShutdownInternal() /home/jenkins-slave/workspace/kudu-1/src/kudu/rpc/reactor.cc:134:11
#9 0x7f25c87c0f3f in kudu::rpc::ReactorThread::AsyncHandler(ev::async&, int) /home/jenkins-slave/workspace/kudu-1/src/kudu/rpc/reactor.cc:198:5
...

0x61300000b6c0 is located 128 bytes inside of 328-byte region [0x61300000b640,0x61300000b788)
freed by thread T0 here:
#0 0x5c1f30 in operator delete(void*) /home/jenkins-slave/workspace/kudu-1/thirdparty/src/llvm-3.9.1.src/projects/compiler-rt/lib/asan/asan_new_delete.cc:110
#1 0x7f25c7d05235 in std::_Vector_base<kudu::Mutex*, std::allocator<kudu::Mutex*> >::~_Vector_base() /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/bits/stl_vector.h:160:9
#2 0x7f25c2b30539 in __cxa_finalize /build/eglibc-oGUzwX/eglibc-2.19/stdlib/cxa_finalize.c:56

previously allocated by thread T0 here:
#0 0x5c1870 in operator new(unsigned long) /home/jenkins-slave/workspace/kudu-1/thirdparty/src/llvm-3.9.1.src/projects/compiler-rt/lib/asan/asan_new_delete.cc:78
#1 0x7f25c7d03cfc in kudu::Mutex** std::vector<kudu::Mutex*, std::allocator<kudu::Mutex*> >::_M_allocate_and_copy<std::move_iterator<kudu::Mutex**> >(unsigned long, std::move_iterator<kudu::Mutex**>, std::move_iterator<kudu::Mutex**>) /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../
#2 0x7f25c7d038e6 in std::vector<kudu::Mutex*, std::allocator<kudu::Mutex*> >::reserve(unsigned long) /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/bits/vector.tcc:73:20
#3 0x7f25c7d02420 in kudu::security::(anonymous namespace)::DoInitializeOpenSSL() /home/jenkins-slave/workspace/kudu-1/src/kudu/security/openssl_util.cc:78:16
...

Change-Id: Id6fdd1162eb39114c67f3c46073345829530434f
Reviewed-on: http://gerrit.cloudera.org:8080/5853
Tested-by: Kudu Jenkins
Reviewed-by: Dan Burkert <[email protected]>
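The trace shows the vector's buffer being freed from __cxa_finalize while a reactor thread still needs the locks. A rough sketch of the alternative named in the message, a C-style dynamic array that is never freed, is below. It uses std::mutex instead of kudu::Mutex, and InitLocks, LockingCallback and NumLocks are hypothetical names with a simplified signature, not OpenSSL's or Kudu's.

```cpp
#include <cassert>
#include <mutex>

namespace {
// Raw pointers with static storage have no destructor, so nothing is torn
// down during static destruction and late callers still find live mutexes.
std::mutex** g_locks = nullptr;
int g_num_locks = 0;
}  // namespace

// Allocates the lock table once. The memory is intentionally never freed.
void InitLocks(int num_locks) {
  assert(g_locks == nullptr && num_locks > 0);
  g_locks = new std::mutex*[num_locks];
  for (int i = 0; i < num_locks; ++i) {
    g_locks[i] = new std::mutex();
  }
  g_num_locks = num_locks;
}

// Simplified locking callback: acquires or releases the lock at `index`.
void LockingCallback(bool acquire, int index) {
  assert(index >= 0 && index < g_num_locks);
  if (acquire) {
    g_locks[index]->lock();
  } else {
    g_locks[index]->unlock();
  }
}

int NumLocks() { return g_num_locks; }
```

The trade-off is a deliberate one-time leak at process exit in exchange for not depending on static destruction order.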
asfgit
pushed a commit
that referenced
this pull request
Feb 28, 2018
The wrapping of libdl functions added by commit d9e7037 has a problem if any other dynamic initializer calls dlopen or dlclose. It turns out that OpenSSL in FIPS mode does indeed do that, leading to a crash with a stack like:

#0 0x0000000000000000 in ?? ()
#1 0x0000000001b45d23 in dlopen ()
#2 0x00007f1f444967ba in ?? () from /lib64/libcrypto.so.1.0.0
#3 0x00007f1f44496857 in ?? () from /lib64/libcrypto.so.1.0.0
#4 0x00007f1f44496bfe in FIPS_module_mode_set () from /lib64/libcrypto.so.1.0.0
#5 0x00007f1f4437216c in FIPS_mode_set () from /lib64/libcrypto.so.1.0.0
#6 0x00007f1f4436eb60 in OPENSSL_init_library () from /lib64/libcrypto.so.1.0.0
#7 0x00007f1f450a2c0a in call_init.part () from /lib64/ld-linux-x86-64.so.2
#8 0x00007f1f450a2cf3 in _dl_init () from /lib64/ld-linux-x86-64.so.2
#9 0x00007f1f4509518a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2

The fix takes the same approach we already used to work around a similar issue with the ASAN runtime, but generalizes it to all of our wrapped functions.

Change-Id: I10a04126411f51b4d8e290a6b061aa585aad0769
Reviewed-on: http://gerrit.cloudera.org:8080/9460
Tested-by: Todd Lipcon <[email protected]>
Reviewed-by: Alexey Serbin <[email protected]>
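Frame #0 at address 0 called from dlopen suggests the wrapper jumped through a function pointer that had not been filled in yet. The commit message does not spell out the mechanism of the fix, so the following is only one plausible shape, resolving the real function lazily on first call rather than in an initializer that may run too late. All names here (RealLen, ResolveRealLen, WrappedLen) are hypothetical, and a plain function stands in for the symbol lookup a real wrapper would perform.

```cpp
#include <cstddef>
#include <cstring>

namespace {

using LenFn = std::size_t (*)(const char*);

// Constant-initialized to null, so it is valid to test before any dynamic
// initializer in the program has run.
LenFn g_real_len = nullptr;

// Stand-in for the wrapped library function.
std::size_t RealLen(const char* s) { return std::strlen(s); }

// Stand-in for the lookup step (a real wrapper might use dlsym() here).
LenFn ResolveRealLen() { return &RealLen; }

// Resolves the target on demand. This sketch is not thread-safe; it assumes
// the first call happens before other threads exist.
void InitIfNecessary() {
  if (g_real_len == nullptr) {
    g_real_len = ResolveRealLen();
  }
}

}  // namespace

// The wrapper never assumes an earlier initializer filled in the pointer.
std::size_t WrappedLen(const char* s) {
  InitIfNecessary();
  return g_real_len(s);
}
```

Because every wrapper checks the pointer itself, the call order of initializers across shared libraries stops mattering.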
asfgit
pushed a commit
that referenced
this pull request
Mar 15, 2018
In real environments there are various types of failures that are "expected". In particular, it's not unlikely for the client to occasionally connect to a server which has gone down or which is not the leader for a given tablet anymore. In that case, we should make sure that the logging output is easy to read and doesn't include irrelevant stack traces stemming from Netty internals. This patch addresses several of those cases. In particular, it gets rid of "unexpected exception" messages (and associated stacks) in the case of "connection refused" errors. It also helpfully includes the underlying error when invalidating the tablet cache for a location. There's a relatively simple new unit test, and I also ran some before/after comparisons using the "loadgen" sample from the kudu-examples repo against a local cluster with one TS: Case 1: Master down when we start the client: ============================================================ Before the patch -------------- [New I/O worker #12] WARN org.apache.kudu.client.ConnectToCluster - Error receiving response from localhost:7051 org.apache.kudu.client.RecoverableException: connection closed at org.apache.kudu.client.Connection.channelClosed(Connection.java:254) at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) at org.apache.kudu.client.Connection.handleUpstream(Connection.java:238) at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.channelClosed(ReadTimeoutHandler.java:176) at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) at 
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) at org.apache.kudu.shaded.org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:60) at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) at org.apache.kudu.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493) at org.apache.kudu.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371) at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) at org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468) at org.apache.kudu.shaded.org.jboss.netty.channel.Channels$6.run(Channels.java:457) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) at 
org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) at org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) [New I/O boss #17] ERROR org.apache.kudu.client.Connection - [peer master-localhost:7051(localhost:7051)] unexpected exception from downstream on [id: 0xf9759051] java.net.ConnectException: Connection refused: localhost/127.0.0.1:7051 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42) at org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) at org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) [New I/O worker #12] WARN org.apache.kudu.client.ConnectToCluster - Unable to find the leader master localhost:7051; will retry ... 
repeats with exponential backoff After the patch -------------- [New I/O boss #17] INFO org.apache.kudu.client.Connection - Failed to connect to peer master-localhost:7051(localhost:7051): Connection refused: localhost/127.0.0.1:7051 [New I/O worker #1] INFO org.apache.kudu.client.ConnectToCluster - Unable to connect to master localhost:7051: Connection refused: localhost/127.0.0.1:7051 [New I/O worker #1] WARN org.apache.kudu.client.ConnectToCluster - Unable to find the leader master localhost:7051; will retry ... repeats with exponential backoff Case 2: Tserver is down ============================================================ Before the patch ---------------- [New I/O worker #11] INFO org.apache.kudu.client.AsyncKuduClient - Removing server fc9f6ab955cc47a8ab3653d0170305b0 from this tablet's cache 9da164fbcd93404ea8e38e9491eb3fa8 [New I/O boss #17] ERROR org.apache.kudu.client.Connection - [peer fc9f6ab955cc47a8ab3653d0170305b0(todd-laptop:7050)] unexpected exception from downstream on [id: 0x270e3a82] java.net.ConnectException: Connection refused: todd-laptop/127.0.1.1:7050 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42) at org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) at org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) ... repeats with exponential backoff After the patch --------------- [New I/O boss #17] INFO org.apache.kudu.client.Connection - Failed to connect to peer fc9f6ab955cc47a8ab3653d0170305b0(todd-laptop:7050): Connection refused: todd-laptop/127.0.1.1:7050 [New I/O worker #15] INFO org.apache.kudu.client.AsyncKuduClient - Invalidating location fc9f6ab955cc47a8ab3653d0170305b0(todd-laptop:7050) for tablet 478ddbaafb5b494ea0de8e2d0ed45a00: Connection refused: todd-laptop/127.0.1.1:7050 ... repeats with exponential backoff Case 3: A tablet is bad ============================================================ In this case I manually delete a non-replicated tablet that is being written to using 'kudu remote_replica delete'. This is unrecoverable since there is no other replica. Before the patch ----------------- [New I/O worker #10] INFO org.apache.kudu.client.AsyncKuduClient - Removing server fc9f6ab955cc47a8ab3653d0170305b0 from this tablet's cache a633f4a0c14b4404a8ae1f825b378867 ... repeats with exponential backoff After the patch ---------------- [New I/O worker #9] INFO org.apache.kudu.client.AsyncKuduClient - Invalidating location 36622c46b25f4da9ada43f8591728053(todd-laptop:7050) for tablet c0b9488597e8447595ad0ee0a3378f95: Tablet not RUNNING: STOPPED ... repeats with exponential backoff Change-Id: I4b8be871693fecc1ee46a4e238dd2ed8f0730d4b Reviewed-on: http://gerrit.cloudera.org:8080/9644 Reviewed-by: Grant Henke <[email protected]> Tested-by: Grant Henke <[email protected]>
asfgit
pushed a commit
that referenced
this pull request
Mar 15, 2018
Change-Id: I4b8be871693fecc1ee46a4e238dd2ed8f0730d4b Reviewed-on: http://gerrit.cloudera.org:8080/9644 Reviewed-by: Grant Henke <[email protected]> Tested-by: Grant Henke <[email protected]> (cherry picked from commit ead7568) Reviewed-on: http://gerrit.cloudera.org:8080/9655
asfgit pushed a commit that referenced this pull request on Mar 28, 2018
In real environments there are various types of failures that are "expected". In particular, it's not unlikely for the client to occasionally connect to a server which has gone down or which is not the leader for a given tablet anymore. In that case, we should make sure that the logging output is easy to read and doesn't include irrelevant stack traces stemming from Netty internals.

This patch addresses several of those cases. In particular, it gets rid of "unexpected exception" messages (and associated stacks) in the case of "connection refused" errors. It also helpfully includes the underlying error when invalidating the tablet cache for a location.

There's a relatively simple new unit test, and I also ran some before/after comparisons using the "loadgen" sample from the kudu-examples repo against a local cluster with one TS:

Case 1: Master down when we start the client:
============================================================

Before the patch
--------------
[New I/O worker #12] WARN org.apache.kudu.client.ConnectToCluster - Error receiving response from localhost:7051
org.apache.kudu.client.RecoverableException: connection closed
    at org.apache.kudu.client.Connection.channelClosed(Connection.java:254)
    at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
    at org.apache.kudu.client.Connection.handleUpstream(Connection.java:238)
    at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.channelClosed(ReadTimeoutHandler.java:176)
    at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
    at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.apache.kudu.shaded.org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:60)
    at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.apache.kudu.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493)
    at org.apache.kudu.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371)
    at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
    at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468)
    at org.apache.kudu.shaded.org.jboss.netty.channel.Channels$6.run(Channels.java:457)
    at org.apache.kudu.shaded.org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40)
    at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
    at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
    at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[New I/O boss #17] ERROR org.apache.kudu.client.Connection - [peer master-localhost:7051(localhost:7051)] unexpected exception from downstream on [id: 0xf9759051]
java.net.ConnectException: Connection refused: localhost/127.0.0.1:7051
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
    at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
    at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
    at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    at org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[New I/O worker #12] WARN org.apache.kudu.client.ConnectToCluster - Unable to find the leader master localhost:7051; will retry
... repeats with exponential backoff

After the patch
--------------
[New I/O boss #17] INFO org.apache.kudu.client.Connection - Failed to connect to peer master-localhost:7051(localhost:7051): Connection refused: localhost/127.0.0.1:7051
[New I/O worker #1] INFO org.apache.kudu.client.ConnectToCluster - Unable to connect to master localhost:7051: Connection refused: localhost/127.0.0.1:7051
[New I/O worker #1] WARN org.apache.kudu.client.ConnectToCluster - Unable to find the leader master localhost:7051; will retry
... repeats with exponential backoff

Case 2: Tserver is down
============================================================

Before the patch
----------------
[New I/O worker #11] INFO org.apache.kudu.client.AsyncKuduClient - Removing server fc9f6ab955cc47a8ab3653d0170305b0 from this tablet's cache 9da164fbcd93404ea8e38e9491eb3fa8
[New I/O boss #17] ERROR org.apache.kudu.client.Connection - [peer fc9f6ab955cc47a8ab3653d0170305b0(todd-laptop:7050)] unexpected exception from downstream on [id: 0x270e3a82]
java.net.ConnectException: Connection refused: todd-laptop/127.0.1.1:7050
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
    at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
    at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
    at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    at org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
... repeats with exponential backoff

After the patch
---------------
[New I/O boss #17] INFO org.apache.kudu.client.Connection - Failed to connect to peer fc9f6ab955cc47a8ab3653d0170305b0(todd-laptop:7050): Connection refused: todd-laptop/127.0.1.1:7050
[New I/O worker #15] INFO org.apache.kudu.client.AsyncKuduClient - Invalidating location fc9f6ab955cc47a8ab3653d0170305b0(todd-laptop:7050) for tablet 478ddbaafb5b494ea0de8e2d0ed45a00: Connection refused: todd-laptop/127.0.1.1:7050
... repeats with exponential backoff

Case 3: A tablet is bad
============================================================

In this case I manually delete a non-replicated tablet that is being written to using 'kudu remote_replica delete'. This is unrecoverable since there is no other replica.

Before the patch
-----------------
[New I/O worker #10] INFO org.apache.kudu.client.AsyncKuduClient - Removing server fc9f6ab955cc47a8ab3653d0170305b0 from this tablet's cache a633f4a0c14b4404a8ae1f825b378867
... repeats with exponential backoff

After the patch
----------------
[New I/O worker #9] INFO org.apache.kudu.client.AsyncKuduClient - Invalidating location 36622c46b25f4da9ada43f8591728053(todd-laptop:7050) for tablet c0b9488597e8447595ad0ee0a3378f95: Tablet not RUNNING: STOPPED
... repeats with exponential backoff

Change-Id: I4b8be871693fecc1ee46a4e238dd2ed8f0730d4b
Reviewed-on: http://gerrit.cloudera.org:8080/9644
Reviewed-by: Grant Henke <[email protected]>
Tested-by: Grant Henke <[email protected]>
Reviewed-on: http://gerrit.cloudera.org:8080/9827
Reviewed-by: Grant Henke <[email protected]>
Tested-by: Will Berkeley <[email protected]>
asfgit pushed a commit that referenced this pull request on Mar 30, 2018
This fixes the following TSAN race:

WARNING: ThreadSanitizer: data race (pid=17822)
  Read of size 1 at 0x7b4c000054e8 by thread T59 (mutexes: write M1750):
    ...
    #3 strings::internal::SubstituteArg::SubstituteArg(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/gutil/strings/substitute.h:76 (libtserver.so+0x9edb0)
    #4 kudu::MaintenanceManager::LogPrefix() const /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/maintenance_manager.cc:545:31 (libkudu_util.so+0x167791)
    #5 kudu::MaintenanceManager::UnregisterOp(kudu::MaintenanceOp*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/maintenance_manager.cc:235:3 (libkudu_util.so+0x165963)
    #6 kudu::MaintenanceOp::Unregister() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/maintenance_manager.cc:123:13 (libkudu_util.so+0x1654fe)
    #7 kudu::tablet::Tablet::UnregisterMaintenanceOps() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tablet/tablet.cc:1405:9 (libtablet.so+0xfb5af)
    #8 kudu::tablet::TabletReplica::Stop() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tablet/tablet_replica.cc:271:25 (libtablet.so+0x146e66)
    #9 kudu::tserver::TSTabletManager::DeleteTablet(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > c

  Previous write of size 8 at 0x7b4c000054e8 by main thread:
    #0 memcpy /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/tsan/../sanitizer_common/sanitizer_common_interceptors.inc:655 (kudu-tserver+0x449e4c)
    #1 std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::__move_assign(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, std::__1::integral_constant<bool, true>) /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/c++/v1/string:2044:18 (libkudu_util.so+0x16664d)
    #2 std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::operator=(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&&) /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/c++/v1/string:2055 (libkudu_util.so+0x16664d)
    #3 kudu::MaintenanceManager::Init(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/maintenance_manager.cc:169 (libkudu_util.so+0x16664d)
    ...

The race is on the 'server_uuid_' field in the MaintenanceManager. This is set during startup, but was being set _after_ calls such as UnregisterOp could be made as seen above. That means the UnregisterOp call could either see an empty UUID or even crash due to the above race.

This simply rejiggers the MaintenanceManager startup to take the UUID in as a constructor parameter instead, and to instantiate the object slightly later during startup.

Change-Id: Id06731f56eb98146f7b88541b936c0026b781b16
Reviewed-on: http://gerrit.cloudera.org:8080/9866
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <[email protected]>
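The approach the message describes, taking the UUID as a constructor parameter rather than assigning it in a later Init() call, can be sketched roughly as follows. This is an illustrative sketch and not Kudu's code: the `Manager` class, its `LogPrefix()` format, and the `ReadPrefixFromWorker` helper are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for a manager whose log prefix embeds a server UUID.
class Manager {
 public:
  // The UUID is fixed at construction, so the field is fully initialized
  // before any other thread can be handed a reference to this object.
  explicit Manager(std::string server_uuid)
      : server_uuid_(std::move(server_uuid)) {
    assert(!server_uuid_.empty());
  }

  // Safe to call from any thread: server_uuid_ is never written again
  // after the constructor returns.
  std::string LogPrefix() const { return "P " + server_uuid_ + ": "; }

 private:
  const std::string server_uuid_;
};

// Reads the prefix from a second thread, the way an UnregisterOp-style call
// on a worker thread might. Starting the thread happens after construction,
// and join() orders the write to 'out' before the return.
std::string ReadPrefixFromWorker(const Manager& m) {
  std::string out;
  std::thread worker([&] { out = m.LogPrefix(); });
  worker.join();
  return out;
}
```

Making the field `const` is a design choice of this sketch: it lets the compiler reject any later assignment, which is the pattern that produced the reported write in Init().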
asfgit pushed a commit that referenced this pull request on Oct 30, 2018
Noticed this on the flaky test dashboard for alter_table-randomized-test; see the end of the commit message for the complete output. I also removed an unrelated and unnecessary lock acquisition. To test, I looped alter_table_randomized-test in slow mode with TSAN and the two failures I saw did not report any data races. WARNING: ThreadSanitizer: data race (pid=17016) Read of size 8 at 0x7b4c000010d0 by thread T68 (mutexes: write M1500): #0 std::__1::unique_ptr<kudu::hms::HmsCatalog, std::__1::default_delete<kudu::hms::HmsCatalog> >::operator bool() const /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/c++/v1/memory:2583:19 (libmaster.so+0xb99b1) #1 kudu::master::CatalogManager::PrepareForLeadershipTask() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/catalog_manager.cc:1055 (libmaster.so+0xb99b1) #2 kudu::internal::RunnableAdapter<void (kudu::master::CatalogManager::*)()>::Run(kudu::master::CatalogManager*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/gutil/bind_internal.h:136:12 (libmaster.so+0x102fa9) #3 kudu::internal::InvokeHelper<false, void, kudu::internal::RunnableAdapter<void (kudu::master::CatalogManager::*)()>, void ()(kudu::master::CatalogManager*)>::MakeItSo(kudu::internal::RunnableAdapter<void (kudu::master::CatalogManager::*)()>, kudu::master::CatalogManager*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/gutil/bind_internal.h:873:14 (libmaster.so+0x102ec5) #4 kudu::internal::Invoker<1, kudu::internal::BindState<kudu::internal::RunnableAdapter<void (kudu::master::CatalogManager::*)()>, void ()(kudu::master::CatalogManager*), void ()(kudu::internal::UnretainedWrapper<kudu::master::CatalogManager>)>, void ()(kudu::master::CatalogManager*)>::Run(kudu::internal::BindStateBase*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/gutil/bind_internal.h:1065:12 (libmaster.so+0x102e0a) #5 kudu::Callback<void ()()>::Run() const 
/data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/gutil/callback.h:396:12 (libconsensus.so+0xa6dfd) #6 kudu::ClosureRunnable::Run() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/threadpool.cc:76:9 (libkudu_util.so+0x1cc9ad) #7 kudu::ThreadPool::DispatchThread() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/threadpool.cc:686:22 (libkudu_util.so+0x1c86d8) #8 boost::_mfi::mf0<void, kudu::ThreadPool>::operator()(kudu::ThreadPool*) const /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:49:29 (libkudu_util.so+0x1d3649) #9 void boost::_bi::list1<boost::_bi::value<kudu::ThreadPool*> >::operator()<boost::_mfi::mf0<void, kudu::ThreadPool>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void, kudu::ThreadPool>&, boost::_bi::list0&, int) /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/boost/bind/bind.hpp:259:9 (libkudu_util.so+0x1d359a) #10 boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::ThreadPool>, boost::_bi::list1<boost::_bi::value<kudu::ThreadPool*> > >::operator()() /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16 (libkudu_util.so+0x1d3523) #11 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::ThreadPool>, boost::_bi::list1<boost::_bi::value<kudu::ThreadPool*> > >, void>::invoke(boost::detail::function::function_buffer&) /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libkudu_util.so+0x1d3319) #12 boost::function0<void>::operator()() const /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libkrpc.so+0xb6651) #13 kudu::Thread::SuperviseThread(void*) 
/data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/thread.cc:615:3 (libkudu_util.so+0x1bfe34) Previous write of size 8 at 0x7b4c000010d0 by thread T59: #0 std::__1::unique_ptr<kudu::hms::HmsCatalog, std::__1::default_delete<kudu::hms::HmsCatalog> >::reset(kudu::hms::HmsCatalog*) /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/c++/v1/memory:2596:20 (libmaster.so+0xb8b6f) #1 kudu::master::CatalogManager::Init(bool) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/catalog_manager.cc:730 (libmaster.so+0xb8b6f) #2 kudu::master::Master::InitCatalogManager() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/master.cc:216:3 (libmaster.so+0x11fa5f) #3 kudu::master::Master::InitCatalogManagerTask() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/master.cc:205:14 (libmaster.so+0x11f8b2) #4 kudu::internal::RunnableAdapter<void (kudu::master::Master::*)()>::Run(kudu::master::Master*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/gutil/bind_internal.h:136:12 (libmaster.so+0x124449) #5 kudu::internal::InvokeHelper<false, void, kudu::internal::RunnableAdapter<void (kudu::master::Master::*)()>, void ()(kudu::master::Master*)>::MakeItSo(kudu::internal::RunnableAdapter<void (kudu::master::Master::*)()>, kudu::master::Master*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/gutil/bind_internal.h:873:14 (libmaster.so+0x124365) #6 kudu::internal::Invoker<1, kudu::internal::BindState<kudu::internal::RunnableAdapter<void (kudu::master::Master::*)()>, void ()(kudu::master::Master*), void ()(kudu::internal::UnretainedWrapper<kudu::master::Master>)>, void ()(kudu::master::Master*)>::Run(kudu::internal::BindStateBase*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/gutil/bind_internal.h:1065:12 (libmaster.so+0x1242aa) #7 kudu::Callback<void ()()>::Run() const /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/gutil/callback.h:396:12 
(libconsensus.so+0xa6dfd) #8 kudu::ClosureRunnable::Run() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/threadpool.cc:76:9 (libkudu_util.so+0x1cc9ad) #9 kudu::ThreadPool::DispatchThread() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/threadpool.cc:686:22 (libkudu_util.so+0x1c86d8) #10 boost::_mfi::mf0<void, kudu::ThreadPool>::operator()(kudu::ThreadPool*) const /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:49:29 (libkudu_util.so+0x1d3649) #11 void boost::_bi::list1<boost::_bi::value<kudu::ThreadPool*> >::operator()<boost::_mfi::mf0<void, kudu::ThreadPool>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void, kudu::ThreadPool>&, boost::_bi::list0&, int) /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/boost/bind/bind.hpp:259:9 (libkudu_util.so+0x1d359a) #12 boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::ThreadPool>, boost::_bi::list1<boost::_bi::value<kudu::ThreadPool*> > >::operator()() /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16 (libkudu_util.so+0x1d3523) #13 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::ThreadPool>, boost::_bi::list1<boost::_bi::value<kudu::ThreadPool*> > >, void>::invoke(boost::detail::function::function_buffer&) /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libkudu_util.so+0x1d3319) #14 boost::function0<void>::operator()() const /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libkrpc.so+0xb6651) #15 kudu::Thread::SuperviseThread(void*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/thread.cc:615:3 (libkudu_util.so+0x1bfe34) Location is heap block of size 432 at 0x7b4c00000fc0 
allocated by main thread: #0 operator new(unsigned long) /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_new_delete.cc:57 (kudu-master+0x4c84a3) #1 kudu::master::Master::Master(kudu::master::MasterOptions const&) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/master.cc:122:22 (libmaster.so+0x11e3d5) #2 kudu::master::MasterMain(int, char**) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/master_main.cc:79:10 (kudu-master+0x4cb4de) #3 main /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/master_main.cc:98:10 (kudu-master+0x4cb1be) Mutex M1500 (0x7b4c00001100) created at: #0 pthread_rwlock_init /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1304 (kudu-master+0x4593b4) #1 kudu::RWMutex::Init(kudu::RWMutex::Priority) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/rw_mutex.cc:78:8 (libkudu_util.so+0x1acad8) #2 kudu::RWMutex::RWMutex(kudu::RWMutex::Priority) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/rw_mutex.cc:56:3 (libkudu_util.so+0x1acd13) #3 kudu::master::CatalogManager::CatalogManager(kudu::master::Master*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/catalog_manager.cc:688:7 (libmaster.so+0xb81d7) #4 kudu::master::Master::Master(kudu::master::MasterOptions const&) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/master.cc:122:26 (libmaster.so+0x11e3e3) #5 kudu::master::MasterMain(int, char**) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/master_main.cc:79:10 (kudu-master+0x4cb4de) #6 main /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/master_main.cc:98:10 (kudu-master+0x4cb1be) Thread T68 'leader-initiali' (tid=17094, running) created by thread T65 at: #0 pthread_create 
/data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:992 (kudu-master+0x45af0b) #1 kudu::Thread::StartThread(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, boost::function<void ()()> const&, unsigned long, scoped_refptr<kudu::Thread>*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/thread.cc:559:15 (libkudu_util.so+0x1bf61b) #2 kudu::Status kudu::Thread::Create<void (kudu::ThreadPool::*)(), kudu::ThreadPool*>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, void (kudu::ThreadPool::* const&)(), kudu::ThreadPool* const&, scoped_refptr<kudu::Thread>*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/thread.h:164:12 (libkudu_util.so+0x1ca9f5) #3 kudu::ThreadPool::CreateThread() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/threadpool.cc:749:10 (libkudu_util.so+0x1c7ce2) #4 kudu::ThreadPool::DoSubmit(std::__1::shared_ptr<kudu::Runnable>, kudu::ThreadPoolToken*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/threadpool.cc:556:21 (libkudu_util.so+0x1c64af) #5 kudu::ThreadPool::Submit(std::__1::shared_ptr<kudu::Runnable>) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/threadpool.cc:458:10 (libkudu_util.so+0x1c7f4f) #6 kudu::ThreadPool::SubmitClosure(kudu::Callback<void ()()>) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/threadpool.cc:450:10 (libkudu_util.so+0x1c7e91) #7 kudu::master::CatalogManager::ElectedAsLeaderCb() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/catalog_manager.cc:754:33 (libmaster.so+0xb936b) #8 kudu::internal::RunnableAdapter<kudu::Status 
(kudu::master::CatalogManager::*)()>::Run(kudu::master::CatalogManager*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/gutil/bind_internal.h:136:12 (libmaster.so+0x104130) #9 kudu::internal::InvokeHelper<false, kudu::Status, kudu::internal::RunnableAdapter<kudu::Status (kudu::master::CatalogManager::*)()>, void ()(kudu::master::CatalogManager*)>::MakeItSo(kudu::internal::RunnableAdapter<kudu::Status (kudu::master::CatalogManager::*)()>, kudu::master::CatalogManager*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/gutil/bind_internal.h:865:21 (libmaster.so+0x10409d) #10 kudu::internal::Invoker<1, kudu::internal::BindState<kudu::internal::RunnableAdapter<kudu::Status (kudu::master::CatalogManager::*)()>, kudu::Status ()(kudu::master::CatalogManager*), void ()(kudu::internal::UnretainedWrapper<kudu::master::CatalogManager>)>, kudu::Status ()(kudu::master::CatalogManager*)>::Run(kudu::internal::BindStateBase*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/gutil/bind_internal.h:1065:12 (libmaster.so+0x10400f) #11 kudu::Callback<kudu::Status ()()>::Run() const /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/gutil/callback.h:396:12 (libmaster.so+0x14bed6) #12 kudu::master::SysCatalogTable::SysCatalogStateChanged(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/sys_catalog.cc:343:27 (libmaster.so+0x145539) #13 kudu::internal::RunnableAdapter<void (kudu::master::SysCatalogTable::*)(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)>::Run(kudu::master::SysCatalogTable*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, 
std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/gutil/bind_internal.h:250:12 (libmaster.so+0x15269d) #14 kudu::internal::InvokeHelper<false, void, kudu::internal::RunnableAdapter<void (kudu::master::SysCatalogTable::*)(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)>, void ()(kudu::master::SysCatalogTable*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)>::MakeItSo(kudu::internal::RunnableAdapter<void (kudu::master::SysCatalogTable::*)(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)>, kudu::master::SysCatalogTable*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/gutil/bind_internal.h:907:14 (libmaster.so+0x15256b) #15 kudu::internal::Invoker<2, kudu::internal::BindState<kudu::internal::RunnableAdapter<void (kudu::master::SysCatalogTable::*)(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)>, void ()(kudu::master::SysCatalogTable*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&), void ()(kudu::internal::UnretainedWrapper<kudu::master::SysCatalogTable>, std::__1::basic_string<char, 
std::__1::char_traits<char>, std::__1::allocator<char> >)>, void ()(kudu::master::SysCatalogTable*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)>::Run(kudu::internal::BindStateBase*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/gutil/bind_internal.h:1242:12 (libmaster.so+0x152459) #16 kudu::Callback<void ()(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)>::Run(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) const /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/gutil/callback.h:436:12 (libtablet.so+0x151681) #17 kudu::internal::InvokeHelper<false, void, kudu::Callback<void ()(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)>, void ()(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)>::MakeItSo(kudu::Callback<void ()(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/gutil/bind_internal.h:873:14 (libconsensus.so+0xea428) #18 kudu::internal::Invoker<1, kudu::internal::BindState<kudu::Callback<void ()(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)>, void ()(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&), void ()(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >)>, void ()(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)>::Run(kudu::internal::BindStateBase*) 
/data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/gutil/bind_internal.h:1065:12 (libconsensus.so+0xea3c3) #19 kudu::Callback<void ()()>::Run() const /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/gutil/callback.h:396:12 (libconsensus.so+0xa6dfd) #20 kudu::ClosureRunnable::Run() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/threadpool.cc:76:9 (libkudu_util.so+0x1cc9ad) #21 kudu::ThreadPool::DispatchThread() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/threadpool.cc:686:22 (libkudu_util.so+0x1c86d8) #22 boost::_mfi::mf0<void, kudu::ThreadPool>::operator()(kudu::ThreadPool*) const /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:49:29 (libkudu_util.so+0x1d3649) #23 void boost::_bi::list1<boost::_bi::value<kudu::ThreadPool*> >::operator()<boost::_mfi::mf0<void, kudu::ThreadPool>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void, kudu::ThreadPool>&, boost::_bi::list0&, int) /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/boost/bind/bind.hpp:259:9 (libkudu_util.so+0x1d359a) #24 boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::ThreadPool>, boost::_bi::list1<boost::_bi::value<kudu::ThreadPool*> > >::operator()() /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16 (libkudu_util.so+0x1d3523) #25 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::ThreadPool>, boost::_bi::list1<boost::_bi::value<kudu::ThreadPool*> > >, void>::invoke(boost::detail::function::function_buffer&) /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libkudu_util.so+0x1d3319) #26 boost::function0<void>::operator()() const 
/data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libkrpc.so+0xb6651) #27 kudu::Thread::SuperviseThread(void*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/thread.cc:615:3 (libkudu_util.so+0x1bfe34) Thread T59 'init [worker]-1' (tid=17081, running) created by main thread at: #0 pthread_create /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:992 (kudu-master+0x45af0b) #1 kudu::Thread::StartThread(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, boost::function<void ()()> const&, unsigned long, scoped_refptr<kudu::Thread>*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/thread.cc:559:15 (libkudu_util.so+0x1bf61b) #2 kudu::Status kudu::Thread::Create<void (kudu::ThreadPool::*)(), kudu::ThreadPool*>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, void (kudu::ThreadPool::* const&)(), kudu::ThreadPool* const&, scoped_refptr<kudu::Thread>*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/thread.h:164:12 (libkudu_util.so+0x1ca9f5) #3 kudu::ThreadPool::CreateThread() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/threadpool.cc:749:10 (libkudu_util.so+0x1c7ce2) #4 kudu::ThreadPool::DoSubmit(std::__1::shared_ptr<kudu::Runnable>, kudu::ThreadPoolToken*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/threadpool.cc:556:21 (libkudu_util.so+0x1c64af) #5 kudu::ThreadPool::Submit(std::__1::shared_ptr<kudu::Runnable>) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/threadpool.cc:458:10 (libkudu_util.so+0x1c7f4f) #6 
kudu::ThreadPool::SubmitClosure(kudu::Callback<void ()()>) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/threadpool.cc:450:10 (libkudu_util.so+0x1c7e91) #7 kudu::master::Master::StartAsync() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/master.cc:196:3 (libmaster.so+0x11f260) #8 kudu::master::Master::Start() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/master.cc:170:3 (libmaster.so+0x11ef25) #9 kudu::master::MasterMain(int, char**) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/master_main.cc:84:3 (kudu-master+0x4cb584) #10 main /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/master_main.cc:98:10 (kudu-master+0x4cb1be) Change-Id: I090a832b7fb25d8cb1e770c025048f73ac997eac Reviewed-on: http://gerrit.cloudera.org:8080/11818 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin <[email protected]> Reviewed-by: Hao Hao <[email protected]>
asfgit pushed a commit that referenced this pull request on Nov 8, 2018
LeakSanitizer reports a leak when allocating a string in SuperviseThread. It's unclear why this is the case, but upon inspecting the code, it appears to be a false positive. The stack trace is as follows:

=================================================================
==93677==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 58 byte(s) in 1 object(s) allocated from:
    #0 0x5318c8 in operator new(unsigned long) /data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/asan/asan_new_delete.cc:92
    #1 0x3ae3a9c3c8 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (/usr/lib64/libstdc++.so.6+0x3ae3a9c3c8)
    #2 0x3ae3a9d19a in std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) (/usr/lib64/libstdc++.so.6+0x3ae3a9d19a)
    #3 0x3ae3a9d5eb in std::string::reserve(unsigned long) (/usr/lib64/libstdc++.so.6+0x3ae3a9d5eb)
    #4 0x3ae3a9d770 in std::string::append(unsigned long, char) (/usr/lib64/libstdc++.so.6+0x3ae3a9d770)
    #5 0x7f518f799c12 in strings::SubstituteAndAppend(std::string*, StringPiece, strings::internal::SubstituteArg const&, strings::internal::SubstituteArg const&, strings::internal::SubstituteArg const&, strings::internal::SubstituteArg const&, strings::internal::SubstituteArg const&, strings::internal::SubstituteArg const&, strings::internal::SubstituteArg const&, strings::internal::SubstituteArg const&, strings::internal::SubstituteArg const&, strings::internal::SubstituteArg const&) ../src/kudu/gutil/strings/substitute.cc:110:3
    #6 0x536e76 in strings::Substitute(StringPiece, strings::internal::SubstituteArg const&, strings::internal::SubstituteArg const&, strings::internal::SubstituteArg const&, strings::internal::SubstituteArg const&, strings::internal::SubstituteArg const&, strings::internal::SubstituteArg const&, strings::internal::SubstituteArg const&, strings::internal::SubstituteArg const&, strings::internal::SubstituteArg const&, strings::internal::SubstituteArg const&) ../src/kudu/gutil/strings/substitute.h:188:3
    #7 0x7f5190590860 in kudu::Thread::SuperviseThread(void*) ../src/kudu/util/thread.cc:607:17
    #8 0x3ae0e079d0 in start_thread (/lib64/libpthread.so.0+0x3ae0e079d0)
    #9 0x3ae0ae88fc in clone (/lib64/libc.so.6+0x3ae0ae88fc)

This appears to be affecting a number of tests, but generally only frames #0 and #1 are present in the logs, making them difficult to debug (a developer would have to rerun the test with specific ASAN_OPTIONS to unwind more of the stack trace). Namely, exactly_once_writes-itest (KUDU-2517), kudu-admin-test (KUDU-2583), and rebalancer-tool-test (untracked via Jira) all show the top of the above stack trace, and based on the full stack trace, it seems these are all false positives.

The presence of issues like google/sanitizers#757 confirms that LeakSanitizer can report false positives in workloads with high concurrency.

Generally, the test binary will return an error in the face of real leaks, but in tests like the ones mentioned, the test may log messages reporting leaks without actually returning an error because the "leak" was transient (e.g. see GenericServiceImpl::CheckLeaks).

We currently inject errors into the JUnit XML report if any leaks are reported, even for false positives, since the leak messages still find their way into the logs. This patch changes that behavior to inject these errors only if the test also returned an error. For clarity, I also added a log statement to GenericServiceImpl::CheckLeaks denoting false positives.

Change-Id: I1f199795c48bd9b6106110aae132ec165eb0f647
Reviewed-on: http://gerrit.cloudera.org:8080/11886
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <[email protected]>
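The reporting rule described in the commit message above can be sketched as a small predicate. This is only an illustration under assumptions: the function names `LogMentionsLeak` and `ShouldInjectLeakError` are hypothetical, and the real change lives in Kudu's test-reporting tooling, which is not shown in this thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of Kudu): returns true if the captured test
// log contains the banner that LeakSanitizer prints when it reports leaks.
bool LogMentionsLeak(const std::string& log_text) {
  return log_text.find("LeakSanitizer: detected memory leaks") !=
         std::string::npos;
}

// Hypothetical helper (not part of Kudu): decides whether a leak error should
// be injected into the test report. Before the patch the decision depended
// only on the log contents; after the patch the test must also have exited
// with a non-zero status, so transient "leaks" no longer fail the report.
bool ShouldInjectLeakError(int test_exit_code, const std::string& log_text) {
  return test_exit_code != 0 && LogMentionsLeak(log_text);
}
```

With this rule, a test that logs a leak banner but exits with status 0 is treated as a false positive and produces no injected error.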
Closing this as it's already been merged manually.
asfgit pushed a commit that referenced this pull request on Sep 18, 2019
The Messenger's lock is only intended to protect closing_, acceptor_pools_, and rpc_services_. This change adjusts its usage to reflect that:

1. There's no need to take the lock in the destructor.
2. It was held for longer than necessary in QueueInboundCall.
3. It wasn't needed at all in DumpConnections.

The motivation for this was a TSAN lock inversion warning I saw in a precommit job, between the Messenger lock and glog's vmodule lock. The warning seems wrong (the vmodule lock is released after a VLOG statement ends), but one way to avoid it altogether is to not take the Messenger lock in its destructor.

WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=5867) Cycle in lock order graph: M1870 (0x7b14000172f8) => M37857528269694952 (0x000000000000) => M1870 Mutex M37857528269694952 acquired here while holding mutex M1870 in main thread: #0 pthread_rwlock_wrlock /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1352 (kudu+0x4a360f) #1 glog_internal_namespace_::Mutex::Lock() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/glog-0.3.5/src/base/mutex.h:250:30 (libglog.so.0+0x1abe7) #2 glog_internal_namespace_::MutexLock::MutexLock(glog_internal_namespace_::Mutex*) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/glog-0.3.5/src/base/mutex.h:290 (libglog.so.0+0x1abe7) #3 google::InitVLOG3__(int**, int*, char const*, int) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/glog-0.3.5/src/vlog_is_on.cc:199 (libglog.so.0+0x1abe7) #4 kudu::rpc::Messenger::ShutdownInternal(kudu::rpc::Messenger::ShutdownMode) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/rpc/messenger.cc:283:5 (libkrpc.so+0xab101) #5 kudu::rpc::Messenger::AllExternalReferencesDropped() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/rpc/messenger.cc:249:3 (libkrpc.so+0xaaeb7) #6 std::__1::mem_fun_t<void, kudu::rpc::Messenger>::operator()(kudu::rpc::Messenger*) const
/home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/functional:1120:17 (libkrpc.so+0xaf3a5) #7 std::__1::__shared_ptr_pointer<kudu::rpc::Messenger*, std::__1::mem_fun_t<void, kudu::rpc::Messenger>, std::__1::allocator<kudu::rpc::Messenger> >::__on_zero_shared() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:3586 (libkrpc.so+0xaf3a5) #8 std::__1::__shared_count::__release_shared() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:3490:9 (kudu+0x56affe) #9 std::__1::__shared_weak_count::__release_shared() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:3532 (kudu+0x56affe) #10 std::__1::shared_ptr<kudu::rpc::Messenger>::~shared_ptr() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:4468 (kudu+0x56affe) #11 kudu::client::KuduClient::Data::~Data() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/client/client-internal.cc:179:1 (libkudu_client.so+0x136260) #12 kudu::client::KuduClient::~KuduClient() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/client/client.cc:394:3 (libkudu_client.so+0x1130cc) #13 std::__1::default_delete<kudu::client::KuduClient>::operator()(kudu::client::KuduClient*) const /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:2285:5 (libkudu_client.so+0x12be1b) #14 std::__1::__shared_ptr_pointer<kudu::client::KuduClient*, std::__1::default_delete<kudu::client::KuduClient>, std::__1::allocator<kudu::client::KuduClient> >::__on_zero_shared() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:3586 (libkudu_client.so+0x12be1b) #15 std::__1::__shared_count::__release_shared() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:3490:9 (kudu+0x550d1e) #16 std::__1::__shared_weak_count::__release_shared() 
/home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:3532 (kudu+0x550d1e) #17 std::__1::shared_ptr<kudu::client::KuduClient>::~shared_ptr() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:4468 (kudu+0x550d1e) #18 kudu::tools::LeaderMasterProxy::~LeaderMasterProxy() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/tools/tool_action_common.h:233:7 (kudu+0x576cf9) #19 kudu::tools::(anonymous namespace)::ListMasters(kudu::tools::RunnerContext const&) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/tools/tool_action_master.cc:180:1 (kudu+0x572d0b) #20 _ZNSt3__18__invokeIRPFN4kudu6StatusERKNS1_5tools13RunnerContextEEJS6_EEEDTclclsr3std3__1E7forwardIT_Efp_Espclsr3std3__1E7forwardIT0_Efp0_EEEOSA_DpOSB_ /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/type_traits:4482:1 (kudu+0x52e48b) #21 kudu::Status std::__1::__invoke_void_return_wrapper<kudu::Status>::__call<kudu::Status (*&)(kudu::tools::RunnerContext const&), kudu::tools::RunnerContext const&>(kudu::Status (*&)(kudu::tools::RunnerContext const&), kudu::tools::RunnerContext const&) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/__functional_base:318 (kudu+0x52e48b) #22 std::__1::__function::__func<kudu::Status (*)(kudu::tools::RunnerContext const&), std::__1::allocator<kudu::Status (*)(kudu::tools::RunnerContext const&)>, kudu::Status (kudu::tools::RunnerContext const&)>::operator()(kudu::tools::RunnerContext const&) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/functional:1562:12 (kudu+0x52e3bd) #23 std::__1::function<kudu::Status (kudu::tools::RunnerContext const&)>::operator()(kudu::tools::RunnerContext const&) const /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/functional:1916:12 (libkudu_tools_util.so+0x6c1c4) #24 kudu::tools::Action::Run(std::__1::vector<kudu::tools::Mode*, 
std::__1::allocator<kudu::tools::Mode*> > const&, std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > > const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) const /home/jenkins-slave/workspace/kudu-master/2/src/kudu/tools/tool_action.cc:258:10 (libkudu_tools_util.so+0x6a8d4) #25 kudu::tools::DispatchCommand(std::__1::vector<kudu::tools::Mode*, std::__1::allocator<kudu::tools::Mode*> > const&, kudu::tools::Action*, std::__1::deque<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/tools/tool_main.cc:132:15 (kudu+0x5b42b6) #26 kudu::tools::RunTool(int, char**, bool) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/tools/tool_main.cc:204:16 (kudu+0x5b5211) #27 main /home/jenkins-slave/workspace/kudu-master/2/src/kudu/tools/tool_main.cc:265:10 (kudu+0x5b557e) Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative warning message Mutex M1870 acquired here while holding mutex M37857528269694952 in thread T8: #0 AnnotateRWLockAcquired 
/home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interface_ann.cc:271 (kudu+0x4d53ff) #1 kudu::rw_spinlock::lock() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/locks.h:112:5 (libkudu_client.so+0x177762) #2 kudu::percpu_rwlock::lock() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/locks.h:222:22 (libkudu_client.so+0x1776f2) #3 std::__1::lock_guard<kudu::percpu_rwlock>::lock_guard(kudu::percpu_rwlock&) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/__mutex_base:104:27 (libkrpc.so+0xac9c9) #4 kudu::rpc::Messenger::~Messenger() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/rpc/messenger.cc:430 (libkrpc.so+0xac9c9) #5 std::__1::default_delete<kudu::rpc::Messenger>::operator()(kudu::rpc::Messenger*) const /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:2285:5 (libkrpc.so+0xb246b) #6 std::__1::__shared_ptr_pointer<kudu::rpc::Messenger*, std::__1::default_delete<kudu::rpc::Messenger>, std::__1::allocator<kudu::rpc::Messenger> >::__on_zero_shared() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:3586 (libkrpc.so+0xb246b) #7 std::__1::__shared_count::__release_shared() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:3490:9 (kudu+0x56affe) #8 std::__1::__shared_weak_count::__release_shared() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:3532 (kudu+0x56affe) #9 std::__1::shared_ptr<kudu::rpc::Messenger>::~shared_ptr() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:4468 (kudu+0x56affe) #10 std::__1::shared_ptr<kudu::rpc::Messenger>::reset() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:4603:5 (libkrpc.so+0xc0771) #11 kudu::rpc::ReactorThread::RunThread() 
/home/jenkins-slave/workspace/kudu-master/2/src/kudu/rpc/reactor.cc:499 (libkrpc.so+0xc0771) #12 boost::_mfi::mf0<void, kudu::rpc::ReactorThread>::operator()(kudu::rpc::ReactorThread*) const /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:49:29 (libkrpc.so+0xca669) #13 void boost::_bi::list1<boost::_bi::value<kudu::rpc::ReactorThread*> >::operator()<boost::_mfi::mf0<void, kudu::rpc::ReactorThread>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void, kudu::rpc::ReactorThread>&, boost::_bi::list0&, int) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/boost/bind/bind.hpp:259:9 (libkrpc.so+0xca5ba) #14 boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::rpc::ReactorThread>, boost::_bi::list1<boost::_bi::value<kudu::rpc::ReactorThread*> > >::operator()() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16 (libkrpc.so+0xca543) #15 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::rpc::ReactorThread>, boost::_bi::list1<boost::_bi::value<kudu::rpc::ReactorThread*> > >, void>::invoke(boost::detail::function::function_buffer&) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libkrpc.so+0xca339) #16 boost::function0<void>::operator()() const /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libkrpc.so+0xba0b1) #17 kudu::Thread::SuperviseThread(void*) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/thread.cc:657:3 (libkudu_util.so+0x1ee174) Thread T8 'rpc reactor-588' (tid=5886, running) created by main thread at: #0 pthread_create /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:992 (kudu+0x490e36) #1 
kudu::Thread::StartThread(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, boost::function<void ()> const&, unsigned long, scoped_refptr<kudu::Thread>*) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/thread.cc:601:15 (libkudu_util.so+0x1ed95b) #2 kudu::Status kudu::Thread::Create<void (kudu::rpc::ReactorThread::*)(), kudu::rpc::ReactorThread*>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, void (kudu::rpc::ReactorThread::* const&)(), kudu::rpc::ReactorThread* const&, scoped_refptr<kudu::Thread>*) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/thread.h:164:12 (libkrpc.so+0xc5a15) #3 kudu::rpc::ReactorThread::Init() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/rpc/reactor.cc:185:10 (libkrpc.so+0xc026e) #4 kudu::rpc::Reactor::Init() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/rpc/reactor.cc:759:18 (libkrpc.so+0xc4911) #5 kudu::rpc::Messenger::Init() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/rpc/messenger.cc:446:5 (libkrpc.so+0xaad72) #6 kudu::rpc::MessengerBuilder::Build(std::__1::shared_ptr<kudu::rpc::Messenger>*) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/rpc/messenger.cc:205:3 (libkrpc.so+0xaa7cd) #7 kudu::client::KuduClientBuilder::Build(std::__1::shared_ptr<kudu::client::KuduClient>*) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/client/client.cc:349:3 (libkudu_client.so+0x112561) #8 kudu::tools::LeaderMasterProxy::Init(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, kudu::MonoDelta const&) 
/home/jenkins-slave/workspace/kudu-master/2/src/kudu/tools/tool_action_common.cc:786:30 (libkudu_tools_util.so+0x7740c) #9 kudu::tools::LeaderMasterProxy::Init(kudu::tools::RunnerContext const&) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/tools/tool_action_common.cc:792:10 (libkudu_tools_util.so+0x774d6) #10 kudu::tools::(anonymous namespace)::ListMasters(kudu::tools::RunnerContext const&) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/tools/tool_action_master.cc:109:3 (kudu+0x572be3) #11 _ZNSt3__18__invokeIRPFN4kudu6StatusERKNS1_5tools13RunnerContextEEJS6_EEEDTclclsr3std3__1E7forwardIT_Efp_Espclsr3std3__1E7forwardIT0_Efp0_EEEOSA_DpOSB_ /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/type_traits:4482:1 (kudu+0x52e48b) #12 kudu::Status std::__1::__invoke_void_return_wrapper<kudu::Status>::__call<kudu::Status (*&)(kudu::tools::RunnerContext const&), kudu::tools::RunnerContext const&>(kudu::Status (*&)(kudu::tools::RunnerContext const&), kudu::tools::RunnerContext const&) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/__functional_base:318 (kudu+0x52e48b) #13 std::__1::__function::__func<kudu::Status (*)(kudu::tools::RunnerContext const&), std::__1::allocator<kudu::Status (*)(kudu::tools::RunnerContext const&)>, kudu::Status (kudu::tools::RunnerContext const&)>::operator()(kudu::tools::RunnerContext const&) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/functional:1562:12 (kudu+0x52e3bd) #14 std::__1::function<kudu::Status (kudu::tools::RunnerContext const&)>::operator()(kudu::tools::RunnerContext const&) const /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/functional:1916:12 (libkudu_tools_util.so+0x6c1c4) #15 kudu::tools::Action::Run(std::__1::vector<kudu::tools::Mode*, std::__1::allocator<kudu::tools::Mode*> > const&, std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, 
std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > > const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) const /home/jenkins-slave/workspace/kudu-master/2/src/kudu/tools/tool_action.cc:258:10 (libkudu_tools_util.so+0x6a8d4) #16 kudu::tools::DispatchCommand(std::__1::vector<kudu::tools::Mode*, std::__1::allocator<kudu::tools::Mode*> > const&, kudu::tools::Action*, std::__1::deque<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/tools/tool_main.cc:132:15 (kudu+0x5b42b6) #17 kudu::tools::RunTool(int, char**, bool) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/tools/tool_main.cc:204:16 (kudu+0x5b5211) #18 main /home/jenkins-slave/workspace/kudu-master/2/src/kudu/tools/tool_main.cc:265:10 (kudu+0x5b557e) Change-Id: I1fd93c06b14bc97a9ac4a37a5b6ca55ffa38f544 Reviewed-on: http://gerrit.cloudera.org:8080/14250 Tested-by: Kudu Jenkins Reviewed-by: Andrew Wong <[email protected]> Reviewed-by: Alexey Serbin <[email protected]>
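The locking discipline described in the Messenger commit message above (no lock in the destructor, and a narrower critical section in QueueInboundCall) can be sketched in isolation. This is a minimal illustration under assumptions, not Kudu's code: `MiniMessenger` and `Service` are hypothetical stand-ins, and a plain `std::mutex` is swapped in for the `percpu_rwlock` that appears in the stack trace.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an RPC service; not Kudu's RpcService.
struct Service {
  std::atomic<int> calls{0};
  void Handle() { calls.fetch_add(1); }
};

// Hypothetical, simplified messenger. lock_ protects only closing_ and
// services_, mirroring the intent stated in the commit message.
class MiniMessenger {
 public:
  // The destructor takes no lock: once it runs, no other thread may still
  // hold a reference, so there is nothing left to synchronize with.
  ~MiniMessenger() = default;

  void RegisterService(const std::string& name, std::shared_ptr<Service> svc) {
    std::lock_guard<std::mutex> l(lock_);
    services_[name] = std::move(svc);
  }

  // Returns false if the messenger is closing or the service is unknown.
  bool QueueInboundCall(const std::string& name) {
    std::shared_ptr<Service> svc;
    {
      // Hold the lock only long enough to look up the service.
      std::lock_guard<std::mutex> l(lock_);
      if (closing_) return false;
      auto it = services_.find(name);
      if (it == services_.end()) return false;
      svc = it->second;
    }
    // The call itself runs outside the critical section.
    svc->Handle();
    return true;
  }

  void Shutdown() {
    std::lock_guard<std::mutex> l(lock_);
    closing_ = true;
    services_.clear();
  }

 private:
  std::mutex lock_;
  bool closing_ = false;
  std::unordered_map<std::string, std::shared_ptr<Service>> services_;
};
```

Copying the `shared_ptr` out under the lock keeps the service alive for the duration of the call even if `Shutdown()` clears the map concurrently, which is what makes it safe to release the lock early.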
asfgit pushed a commit that referenced this pull request on Oct 1, 2019
The KernelStackWatchdog thread runs independently of the test thread, and by calling IsBeingDebugged, it winds up creating a trace event of its own. This is problematic given that trace-test sets up event callbacks to write to test fixture members, which go out of scope in between tests.

The only solution I could find was to avoid starting the KernelStackWatchdog in trace-test by passing Thread::NO_STACK_WATCHDOG into thread creation. I also had to do this when creating the trace sampling thread, but given that's not on by default, I don't think it's so bad that we lose watchdog monitoring for it.

To test, I ran trace-test under gdb and set a breakpoint in KernelStackWatchdog::RunThread. With the fix, gdb no longer hit that breakpoint.

WARNING: ThreadSanitizer: data race (pid=4111) Read of size 8 at 0x0000015ba5c8 by thread T2: #0 kudu::TraceEventCallbackTest::Callback(long, char, unsigned char const*, char const*, unsigned long, int, char const* const*, unsigned char const*, unsigned long const*, unsigned char) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/trace-test.cc:463:5 (trace-test+0x4f107f) #1 kudu::debug::TraceLog::AddTraceEventWithThreadIdAndTimestamp(char, unsigned char const*, char const*, unsigned long, int, long const&, int, char const**, unsigned char const*, unsigned long const*, scoped_refptr<kudu::debug::ConvertableToTraceFormat> const*, unsigned char) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/debug/trace_event_impl.cc:1911:7 (libkudu_util.so+0x1208b3) #2 kudu::debug::TraceEventHandle trace_event_internal::AddTraceEventWithThreadIdAndTimestamp<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >(char, unsigned char const*, char const*, unsigned long, int, long const&, unsigned char, char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/debug/trace_event.h:1314:10
(libkudu_util.so+0x146f58) #3 kudu::debug::TraceEventHandle trace_event_internal::AddTraceEvent<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >(char, unsigned char const*, char const*, unsigned long, unsigned char, char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/debug/trace_event.h:1330:10 (libkudu_util.so+0x146bef) #4 kudu::(anonymous namespace)::PosixEnv::NewSequentialFile(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::unique_ptr<kudu::SequentialFile, std::__1::default_delete<kudu::SequentialFile> >*) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/env_posix.cc:1077:5 (libkudu_util.so+0x140905) #5 kudu::ReadFileToString(kudu::Env*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, kudu::faststring*) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/env.cc:73:19 (libkudu_util.so+0x140054) #6 kudu::IsBeingDebugged() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/os-util.cc:154:14 (libkudu_util.so+0x1c9687) #7 kudu::KernelStackWatchdog::RunThread() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/kernel_stack_watchdog.cc:141:9 (libkudu_util.so+0x17de59) #8 boost::_mfi::mf0<void, kudu::KernelStackWatchdog>::operator()(kudu::KernelStackWatchdog*) const /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:49:29 (libkudu_util.so+0x17fd89) #9 void boost::_bi::list1<boost::_bi::value<kudu::KernelStackWatchdog*> >::operator()<boost::_mfi::mf0<void, kudu::KernelStackWatchdog>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void, kudu::KernelStackWatchdog>&, boost::_bi::list0&, int) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/boost/bind/bind.hpp:259:9 (libkudu_util.so+0x17fcda) #10 
boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::KernelStackWatchdog>, boost::_bi::list1<boost::_bi::value<kudu::KernelStackWatchdog*> > >::operator()() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16 (libkudu_util.so+0x17fc63) #11 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::KernelStackWatchdog>, boost::_bi::list1<boost::_bi::value<kudu::KernelStackWatchdog*> > >, void>::invoke(boost::detail::function::function_buffer&) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libkudu_util.so+0x17fa59) #12 boost::function0<void>::operator()() const /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libkudu_util.so+0x1f1dd1) #13 kudu::Thread::SuperviseThread(void*) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/thread.cc:657:3 (libkudu_util.so+0x1ef3f4) Previous write of size 8 at 0x0000015ba5c8 by main thread: #0 kudu::TraceEventCallbackTest::SetUp() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/trace-test.cc:340:16 (trace-test+0x4f3a17) #1 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2402:10 (libgmock.so+0x552ef) #2 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2438 (libgmock.so+0x552ef) #3 testing::Test::Run() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2470:3 (libgmock.so+0x343c1) #4 testing::TestInfo::Run() 
/home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2656:11 (libgmock.so+0x3574c) #5 testing::TestCase::Run() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2774:28 (libgmock.so+0x36226) #6 testing::internal::UnitTestImpl::RunAllTests() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:4649:43 (libgmock.so+0x425fa) #7 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2402:10 (libgmock.so+0x5625f) #8 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2438 (libgmock.so+0x5625f) #9 testing::UnitTest::Run() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:4257:10 (libgmock.so+0x41ee2) #10 RUN_ALL_TESTS() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/gtest/gtest.h:2233:46 (libkudu_test_main.so+0x351b) #11 main /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/test_main.cc:106:13 (libkudu_test_main.so+0x2cc6) Location is global 'kudu::TraceEventCallbackTest::s_instance' of size 8 at 0x0000015ba5c8 (trace-test+0x0000015ba5c8) Thread T2 'kernel-watcher-' (tid=4116, running) created by main thread at: #0 pthread_create /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:992 (trace-test+0x453c86) #1 
kudu::Thread::StartThread(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, boost::function<void ()> const&, unsigned long, scoped_refptr<kudu::Thread>*) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/thread.cc:601:15 (libkudu_util.so+0x1eebdb) #2 kudu::Status kudu::Thread::CreateWithFlags<boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::KernelStackWatchdog>, boost::_bi::list1<boost::_bi::value<kudu::KernelStackWatchdog*> > > >(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::KernelStackWatchdog>, boost::_bi::list1<boost::_bi::value<kudu::KernelStackWatchdog*> > > const&, unsigned long, scoped_refptr<kudu::Thread>*) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/thread.h:152:12 (libkudu_util.so+0x17eed1) #3 kudu::KernelStackWatchdog::KernelStackWatchdog() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/kernel_stack_watchdog.cc:71:3 (libkudu_util.so+0x17dc36) #4 Singleton<kudu::KernelStackWatchdog>::CreateInstance() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/gutil/singleton.h:124:18 (libkudu_util.so+0x17f664) #5 Singleton<kudu::KernelStackWatchdog>::Init() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/gutil/singleton.h:117:17 (libkudu_util.so+0x17f604) #6 GoogleOnceInternalInit(int*, void (*)(), void (*)(void*), void*) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/gutil/once.cc:43:7 (libgutil.so+0x2d7b3) #7 GoogleOnceInit(GoogleOnceType*, void (*)()) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/gutil/once.h:53:5 (libkudu_util.so+0x113e4d) #8 Singleton<kudu::KernelStackWatchdog>::get() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/gutil/singleton.h:79:5 
(libkudu_util.so+0x17f5b1) #9 kudu::KernelStackWatchdog::GetInstance() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/kernel_stack_watchdog.h:87:12 (libkudu_util.so+0x17f423) #10 kudu::KernelStackWatchdog::CreateAndRegisterTLS() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/kernel_stack_watchdog.cc:219:3 (libkudu_util.so+0x17ed17) #11 kudu::KernelStackWatchdog::GetTLS() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/kernel_stack_watchdog.h:170:7 (libkudu_util.so+0x1f2901) #12 kudu::ScopedWatchKernelStack::ScopedWatchKernelStack(char const*, int) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/kernel_stack_watchdog.h:248:13 (libkudu_util.so+0x1f1b70) #13 kudu::Thread::StartThread(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, boost::function<void ()> const&, unsigned long, scoped_refptr<kudu::Thread>*) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/thread.cc:600:5 (libkudu_util.so+0x1eebaf) #14 kudu::Status kudu::Thread::Create<void (*)(int, int), int, int>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, void (* const&)(int, int), int const&, int const&, scoped_refptr<kudu::Thread>*) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/thread.h:170:12 (trace-test+0x4f03ef) #15 kudu::TraceTest_TestChromeTracing_Test::TestBody() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/trace-test.cc:172:5 (trace-test+0x4e750b) #16 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2402:10 (libgmock.so+0x552ef) #17 void 
testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2438 (libgmock.so+0x552ef) #18 testing::Test::Run() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2474:5 (libgmock.so+0x344b8) #19 testing::TestInfo::Run() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2656:11 (libgmock.so+0x3574c) #20 testing::TestCase::Run() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2774:28 (libgmock.so+0x36226) #21 testing::internal::UnitTestImpl::RunAllTests() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:4649:43 (libgmock.so+0x425fa) #22 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2402:10 (libgmock.so+0x5625f) #23 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2438 (libgmock.so+0x5625f) #24 testing::UnitTest::Run() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:4257:10 (libgmock.so+0x41ee2) #25 RUN_ALL_TESTS() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/gtest/gtest.h:2233:46 (libkudu_test_main.so+0x351b) #26 main 
/home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/test_main.cc:106:13 (libkudu_test_main.so+0x2cc6) Change-Id: I5dc974be22ff101dcb8091be1fe692ab61376bc2 SUMMARY: ThreadSanitizer: data race /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/trace-test.cc:463:5 in kudu::TraceEventCallbackTest::Callback(long, char, unsigned char const*, char const*, unsigned long, int, char const* const*, unsigned char const*, unsigned long const*, unsigned char) Reviewed-on: http://gerrit.cloudera.org:8080/14256 Reviewed-by: Alexey Serbin <[email protected]> Tested-by: Adar Dembo <[email protected]>
asfgit
pushed a commit
that referenced
this pull request
Nov 7, 2019
…time

This patch removes memkind and libnuma from the thirdparty tree and changes the NVM cache implementation to dynamically link memkind at runtime using dlopen() and friends. KUDU-2990 explains why we need to do this in great detail so I won't repeat that here.

The alternatives I considered:
1. Patch memkind to dlopen() libnuma. This is probably the most user-friendly approach because libnuma is found in most systems (or at least in most package repositories), but a new enough memkind is not, and having Kudu distribute memkind eases adoption of the NVM cache. However, I was nervous about performing deep surgery in memkind as it's a codebase with which I'm not familiar, and I wanted to minimize risk because we'll be backporting this patch to several older branches.
2. Patch memkind to define weak libnuma symbols. If done correctly, behavior is unchanged when libnuma is present on the host system, but if it's not there, calls from memkind to libnuma will crash. Again, I didn't feel comfortable hacking into memkind, plus I've found weak symbols difficult to use in the past.
3. Remove the NVM cache from Kudu altogether. This is obviously the safest and simplest option, but it punishes Kudu users who actually use the NVM cache.
4. Gate the build of the NVM cache behind a CMake option. This ought to satisfy the ASF's definition of an "optional" feature, but does nothing for binary redistributors who wish to offer the NVM cache and who build Kudu as a statically linked "fat" binary.
5. Build as we do today, but forbid static linkage of libnuma. Binary redistributors will need to choose between including libnuma in their distributions, or forcing Kudu to look for libnuma at runtime. The former still violates ASF policy, and the latter means Kudu won't start on a system lacking libnuma, regardless of whether the NVM cache is actually in use.

So what are the ramifications of the chosen approach?
- Kudu no longer distributes memkind or libnuma. To use the NVM cache, the host system must provide both, and memkind must be version 1.6.0 or newer. CentOS 6 and Ubuntu 18.04 repositories all carried memkind 1.1.0. CentOS 7 has memkind 1.7.0. Persistent memory hardware itself also has a pretty steep kernel version requirement, so it's unlikely to be found outside of a new distro in the first place.
- Tests that exercise the NVM cache will be skipped if they can't find a conformant memkind (and libnuma).
- When starting Kudu, if you don't set --block_cache_type=NVM, you shouldn't notice any change.
- If you do, Kudu will crash at startup if it can't find a conformant memkind. This affects upgrades: if you were already an NVM cache user but you didn't have memkind installed, your Kudu will crash post-upgrade.

Note: this doesn't preclude implementing alternative #1 (the one I think is ideal) in the future; we'll just have to revert the bulk of this patch when we do so.

To test, I ran cfile-test and cache-test as follows:
- Without memkind installed: DRAM tests passed, NVM tests were skipped
- With an old memkind installed: DRAM tests passed, NVM tests were skipped
- With LD_LIBRARY_PATH=/path/to/memkind-1.9.0: All tests passed

I also manually ran a Kudu master with --block_cache_type=NVM and without memkind to verify the crashing behavior.

Change-Id: I4f474196aa98b5fa6e5966b9a3aea9a7e466805c
Reviewed-on: http://gerrit.cloudera.org:8080/14620
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <[email protected]>
(cherry picked from commit ba908ef)
Reviewed-on: http://gerrit.cloudera.org:8080/14647
Reviewed-by: Grant Henke <[email protected]>
asfgit
pushed a commit
that referenced
this pull request
Nov 7, 2019
…time

This patch removes memkind and libnuma from the thirdparty tree and changes the NVM cache implementation to dynamically link memkind at runtime using dlopen() and friends. KUDU-2990 explains why we need to do this in great detail so I won't repeat that here.

The alternatives I considered:
1. Patch memkind to dlopen() libnuma. This is probably the most user-friendly approach because libnuma is found in most systems (or at least in most package repositories), but a new enough memkind is not, and having Kudu distribute memkind eases adoption of the NVM cache. However, I was nervous about performing deep surgery in memkind as it's a codebase with which I'm not familiar, and I wanted to minimize risk because we'll be backporting this patch to several older branches.
2. Patch memkind to define weak libnuma symbols. If done correctly, behavior is unchanged when libnuma is present on the host system, but if it's not there, calls from memkind to libnuma will crash. Again, I didn't feel comfortable hacking into memkind, plus I've found weak symbols difficult to use in the past.
3. Remove the NVM cache from Kudu altogether. This is obviously the safest and simplest option, but it punishes Kudu users who actually use the NVM cache.
4. Gate the build of the NVM cache behind a CMake option. This ought to satisfy the ASF's definition of an "optional" feature, but does nothing for binary redistributors who wish to offer the NVM cache and who build Kudu as a statically linked "fat" binary.
5. Build as we do today, but forbid static linkage of libnuma. Binary redistributors will need to choose between including libnuma in their distributions, or forcing Kudu to look for libnuma at runtime. The former still violates ASF policy, and the latter means Kudu won't start on a system lacking libnuma, regardless of whether the NVM cache is actually in use.

So what are the ramifications of the chosen approach?
- Kudu no longer distributes memkind or libnuma. To use the NVM cache, the host system must provide both, and memkind must be version 1.6.0 or newer. CentOS 6 and Ubuntu 18.04 repositories all carried memkind 1.1.0. CentOS 7 has memkind 1.7.0. Persistent memory hardware itself also has a pretty steep kernel version requirement, so it's unlikely to be found outside of a new distro in the first place.
- Tests that exercise the NVM cache will be skipped if they can't find a conformant memkind (and libnuma).
- When starting Kudu, if you don't set --block_cache_type=NVM, you shouldn't notice any change.
- If you do, Kudu will crash at startup if it can't find a conformant memkind. This affects upgrades: if you were already an NVM cache user but you didn't have memkind installed, your Kudu will crash post-upgrade.

Note: this doesn't preclude implementing alternative #1 (the one I think is ideal) in the future; we'll just have to revert the bulk of this patch when we do so.

To test, I ran cfile-test and cache-test as follows:
- Without memkind installed: DRAM tests passed, NVM tests were skipped
- With an old memkind installed: DRAM tests passed, NVM tests were skipped
- With LD_LIBRARY_PATH=/path/to/memkind-1.9.0: All tests passed

I also manually ran a Kudu master with --block_cache_type=NVM and without memkind to verify the crashing behavior.

Change-Id: I4f474196aa98b5fa6e5966b9a3aea9a7e466805c
Reviewed-on: http://gerrit.cloudera.org:8080/14620
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <[email protected]>
asfgit
pushed a commit
that referenced
this pull request
Nov 7, 2019
…time

This patch removes memkind and libnuma from the thirdparty tree and changes the NVM cache implementation to dynamically link memkind at runtime using dlopen() and friends. KUDU-2990 explains why we need to do this in great detail so I won't repeat that here.

The alternatives I considered:
1. Patch memkind to dlopen() libnuma. This is probably the most user-friendly approach because libnuma is found in most systems (or at least in most package repositories), but a new enough memkind is not, and having Kudu distribute memkind eases adoption of the NVM cache. However, I was nervous about performing deep surgery in memkind as it's a codebase with which I'm not familiar, and I wanted to minimize risk because we'll be backporting this patch to several older branches.
2. Patch memkind to define weak libnuma symbols. If done correctly, behavior is unchanged when libnuma is present on the host system, but if it's not there, calls from memkind to libnuma will crash. Again, I didn't feel comfortable hacking into memkind, plus I've found weak symbols difficult to use in the past.
3. Remove the NVM cache from Kudu altogether. This is obviously the safest and simplest option, but it punishes Kudu users who actually use the NVM cache.
4. Gate the build of the NVM cache behind a CMake option. This ought to satisfy the ASF's definition of an "optional" feature, but does nothing for binary redistributors who wish to offer the NVM cache and who build Kudu as a statically linked "fat" binary.
5. Build as we do today, but forbid static linkage of libnuma. Binary redistributors will need to choose between including libnuma in their distributions, or forcing Kudu to look for libnuma at runtime. The former still violates ASF policy, and the latter means Kudu won't start on a system lacking libnuma, regardless of whether the NVM cache is actually in use.

So what are the ramifications of the chosen approach?
- Kudu no longer distributes memkind or libnuma. To use the NVM cache, the host system must provide both, and memkind must be version 1.6.0 or newer. CentOS 6 and Ubuntu 18.04 repositories all carried memkind 1.1.0. CentOS 7 has memkind 1.7.0. Persistent memory hardware itself also has a pretty steep kernel version requirement, so it's unlikely to be found outside of a new distro in the first place.
- Tests that exercise the NVM cache will be skipped if they can't find a conformant memkind (and libnuma).
- When starting Kudu, if you don't set --block_cache_type=NVM, you shouldn't notice any change.
- If you do, Kudu will crash at startup if it can't find a conformant memkind. This affects upgrades: if you were already an NVM cache user but you didn't have memkind installed, your Kudu will crash post-upgrade.

Note: this doesn't preclude implementing alternative #1 (the one I think is ideal) in the future; we'll just have to revert the bulk of this patch when we do so.

To test, I ran cfile-test and cache-test as follows:
- Without memkind installed: DRAM tests passed, NVM tests were skipped
- With an old memkind installed: DRAM tests passed, NVM tests were skipped
- With LD_LIBRARY_PATH=/path/to/memkind-1.9.0: All tests passed

I also manually ran a Kudu master with --block_cache_type=NVM and without memkind to verify the crashing behavior.

I had to resolve conflicts in src/kudu/cfile/cfile-test.cc.

Change-Id: I4f474196aa98b5fa6e5966b9a3aea9a7e466805c
Reviewed-on: http://gerrit.cloudera.org:8080/14620
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <[email protected]>
(cherry picked from commit ba908ef)
Reviewed-on: http://gerrit.cloudera.org:8080/14648
Reviewed-by: Grant Henke <[email protected]>
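The runtime-linking approach described in the commit message above (dlopen() and friends) can be sketched roughly as follows. This is not the code from the patch: the `BoundSymbol` struct and the `BindOptionalSymbol` helper are hypothetical names introduced only for illustration, and the sketch only shows the general pattern of opening a library that may be absent and reporting the failure instead of requiring it at link time.

```cpp
#include <dlfcn.h>

#include <cassert>
#include <string>

// Hypothetical result type (not from Kudu): the outcome of trying to bind
// one symbol from a shared library that may or may not be installed.
struct BoundSymbol {
  void* handle = nullptr;  // dlopen() handle, or nullptr on failure
  void* symbol = nullptr;  // dlsym() result, or nullptr on failure
  std::string error;       // human-readable reason when binding failed
};

// Hypothetical helper: opens 'lib_name' at runtime and resolves
// 'symbol_name' from it. On any failure the error string is filled in and
// both pointers stay null, so the caller can skip the feature or abort.
BoundSymbol BindOptionalSymbol(const char* lib_name, const char* symbol_name) {
  assert(lib_name != nullptr && symbol_name != nullptr);
  BoundSymbol out;
  out.handle = dlopen(lib_name, RTLD_NOW | RTLD_LOCAL);
  if (out.handle == nullptr) {
    const char* err = dlerror();
    out.error = (err != nullptr) ? err : "unknown dlopen() error";
    return out;
  }
  dlerror();  // clear any stale error state before calling dlsym()
  out.symbol = dlsym(out.handle, symbol_name);
  const char* err = dlerror();
  if (err != nullptr) {
    out.error = err;
    dlclose(out.handle);
    out.handle = nullptr;
    out.symbol = nullptr;
  }
  return out;
}
```

A caller would pass the library and symbol it needs (for memkind, presumably the library's soname and one of its allocation functions; the exact names are not given in the commit message) and treat a non-empty `error` as "feature unavailable". On older glibc versions the program may need to be linked with `-ldl`.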
asfgit
pushed a commit
that referenced
this pull request
Dec 18, 2019
Prior to this patch, Kudu masters and tablet servers would crash if {Master,TabletServer}::{Init,Start}() returned non-OK status. There is not much advantage in that behavior compared with returning a non-zero code from main():
* Since those calls are in the main() function context, there is an easy way to properly handle non-OK return codes from Init() and Start() without sacrificing the consistency of the processes' behavior and their address space: just return non-zero from the main() function.
* From the monitoring and reporting perspectives, it's possible to detect a failure based on the exit status of a Kudu process.
* In most cases in production, core dumps are disabled, and only minidumps were available from processes crashed in such cases. However, given a minidump, there isn't much information available for troubleshooting because of the stripped heap. As for the stack trace provided with a minidump, it is barely useful at all, providing not even the information that's available from the logs:

    #0 0x00007f2445c691f7 in raise () from ./lib64/libc.so.6
    #1 0x00007f2445c6a8e8 in abort () from ./lib64/libc.so.6
    #2 0x0000000001bcf1e9 in kudu::AbortFailureFunction () at src/kudu/util/minidump.cc:190
    #3 0x0000000000902fad in google::LogMessage::Fail () at thirdparty/src/glog-0.3.5/src/logging.cc:1488
    #4 0x0000000000904f03 in google::LogMessage::SendToLog (this=0x7ffc44ffb3c0) at thirdparty/src/glog-0.3.5/src/logging.cc:1442
    #5 0x0000000000902b09 in google::LogMessage::Flush (this=this@entry=0x7ffc44ffb3c0) at thirdparty/src/glog-0.3.5/src/logging.cc:1311
    #6 0x000000000090588f in google::LogMessageFatal::~LogMessageFatal (this=0x7ffc44ffb3c0, __in_chrg=<optimized out>) at thirdparty/src/glog-0.3.5/src/logging.cc:2023
    #7 0x000000000089c9c3 in kudu::master::MasterMain (argc=1, argv=0x7ffc44ffbb60) at src/kudu/master/master_main.cc:74
    #8 0x00007f2445c55c05 in __libc_start_main () from ./lib64/libc.so.6
    #9 0x000000000089c3c5 in _start ()

This patch changes the described behavior. I also updated the handling of non-OK return status from CheckCPUFlags() during the earliest init when a non-SSE4.2/non-SSSE3 CPU is detected.

With this patch, if they fail to init or start, Kudu masters and tablet servers write an error message into the log and exit with non-zero status instead of crashing.

Change-Id: Id06646e2211eb24db28c582455d4a34af7501b26
Reviewed-on: http://gerrit.cloudera.org:8080/14908
Reviewed-by: Andrew Wong <[email protected]>
Reviewed-by: Adar Dembo <[email protected]>
Tested-by: Kudu Jenkins
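The behavior change in the commit message above (log an error and exit with non-zero status rather than abort) can be illustrated with a minimal sketch. `SimpleStatus`, `FakeServer` and `RunServer` are hypothetical stand-ins, not Kudu's Status, Master or MasterMain; the point is only that a failed Init() or Start() becomes a return code that main() can pass on.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical stand-in for a status object such as kudu::Status.
struct SimpleStatus {
  bool ok;
  std::string message;
};

// Hypothetical server whose Init() can be made to fail for demonstration.
struct FakeServer {
  bool fail_init = false;
  SimpleStatus Init() const {
    return fail_init ? SimpleStatus{false, "init failed"}
                     : SimpleStatus{true, ""};
  }
  SimpleStatus Start() const { return SimpleStatus{true, ""}; }
};

// Returns the process exit code: 0 on success, 1 if Init() or Start()
// reports an error. The error is logged instead of triggering an abort,
// so no core dump or minidump is produced for an ordinary startup failure.
int RunServer(const FakeServer& server) {
  SimpleStatus s = server.Init();
  if (!s.ok) {
    std::cerr << "failed to init: " << s.message << std::endl;
    return 1;
  }
  s = server.Start();
  if (!s.ok) {
    std::cerr << "failed to start: " << s.message << std::endl;
    return 1;
  }
  return 0;
}
```

A `main()` built on this would simply `return RunServer(server);`, which lets monitoring tools tell a failed startup apart from a crash by the exit status alone.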
asfgit
pushed a commit
that referenced
this pull request
Jan 7, 2020
Previously the FunctionGaugeDetacher in a Tablet could be destructed after some of the state in the Tablet had already been destructed. Specifically, the metrics would be detached after destructing 'last_rw_time_lock_' et al. This led to some TSAN warnings[1]. Without this patch, RaftConsensusNumLeadersMetricTest TestNumLeadersMetric, which deletes tablets and then immediately fetches metrics, would fail 29/1000 times in TSAN mode. With it, it only failed 7/1000, those due to an unrelated timeout (not addressed in this patch). I went ahead and updated other misordered instances of FunctionGaugeDetachers. As far as I can tell, there aren't any other affected tests. [1]: ================== WARNING: ThreadSanitizer: data race (pid=30286) Write of size 1 at 0x7b58000521c8 by thread T156 (mutexes: write M3961, write M2995): #0 AnnotateRWLockDestroy /data/8/awong/kudu/thirdparty/src/llvm-9.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interface_ann.cc:264 (kudu-tserver+0x4ab83e) #1 kudu::rw_spinlock::~rw_spinlock() ../src/kudu/util/locks.h:89:5 (libtserver.so+0x262d4b) #2 kudu::tablet::Tablet::~Tablet() ../src/kudu/tablet/tablet.cc:270:1 (libtablet.so+0x174886) #3 std::__1::default_delete<kudu::tablet::Tablet>::operator()(kudu::tablet::Tablet*) const /data/8/awong/kudu/thirdparty/installed/tsan/include/c++/v1/memory:2338:5 (libtablet.so+0x233026) #4 std::__1::__shared_ptr_pointer<kudu::tablet::Tablet*, std::__1::default_delete<kudu::tablet::Tablet>, std::__1::allocator<kudu::tablet::Tablet> >::__on_zero_shared() /data/8/awong/kudu/thirdparty/installed/tsan/include/c++/v1/memory:3511:5 (libtablet.so+0x2343bf) #5 std::__1::__shared_count::__release_shared() /data/8/awong/kudu/thirdparty/installed/tsan/include/c++/v1/memory:3415:9 (libtserver.so+0x125c9c) #6 std::__1::__shared_weak_count::__release_shared() /data/8/awong/kudu/thirdparty/installed/tsan/include/c++/v1/memory:3457:27 (libtserver.so+0x125c12) #7 std::__1::shared_ptr<kudu::tablet::Tablet>::~shared_ptr() 
/data/8/awong/kudu/thirdparty/installed/tsan/include/c++/v1/memory:4393:19 (libtserver.so+0x1478b7) #8 std::__1::shared_ptr<kudu::tablet::Tablet>::reset() /data/8/awong/kudu/thirdparty/installed/tsan/include/c++/v1/memory:4528:5 (libtablet.so+0x2622c6) #9 kudu::tablet::TabletReplica::Stop() ../src/kudu/tablet/tablet_replica.cc:318:13 (libtablet.so+0x2578bd) #10 kudu::tserver::TSTabletManager::DeleteTablet(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, kudu::tablet::TabletDataState, boost::optional<long> const&, kudu::tserver::TabletServerErrorPB_Code*) ../src/kudu/tserver/ts_tablet_manager.cc:972:12 (libtserver.so+0x25a568) #11 kudu::tserver::DeleteTabletRunnable::Run() ../src/kudu/tserver/ts_tablet_manager.cc:882:36 (libtserver.so+0x27caa8) #12 kudu::ThreadPool::DispatchThread() ../src/kudu/util/threadpool.cc:685:22 (libkudu_util.so+0x40e2f6) #13 boost::_mfi::mf0<void, kudu::ThreadPool>::operator()(kudu::ThreadPool*) const ../thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:49:29 (libkudu_util.so+0x4357bc) #14 void boost::_bi::list1<boost::_bi::value<kudu::ThreadPool*> >::operator()<boost::_mfi::mf0<void, kudu::ThreadPool>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void, kudu::ThreadPool>&, boost::_bi::list0&, int) ../thirdparty/installed/tsan/include/boost/bind/bind.hpp:259:9 (libkudu_util.so+0x43566d) #15 boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::ThreadPool>, boost::_bi::list1<boost::_bi::value<kudu::ThreadPool*> > >::operator()() ../thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16 (libkudu_util.so+0x4355b1) #16 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::ThreadPool>, boost::_bi::list1<boost::_bi::value<kudu::ThreadPool*> > >, void>::invoke(boost::detail::function::function_buffer&) ../thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libkudu_util.so+0x4351e0) #17 
boost::function0<void>::operator()() const ../thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libkrpc.so+0x13b9c1) #18 kudu::Thread::SuperviseThread(void*) ../src/kudu/util/thread.cc:675:3 (libkudu_util.so+0x3eed69) Previous atomic write of size 4 at 0x7b58000521c8 by thread T12 (mutexes: write M261907054171829296): #0 __tsan_atomic32_compare_exchange_strong /data/8/awong/kudu/thirdparty/src/llvm-9.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cc:780 (kudu-tserver+0x4b475e) #1 base::subtle::Release_CompareAndSwap(int volatile*, int, int) ../src/kudu/gutil/atomicops-internals-tsan.h:93:3 (libtablet.so+0x1d4842) #2 kudu::rw_semaphore::unlock_shared() ../src/kudu/util/rw_semaphore.h:91:19 (libtablet.so+0x1d4766) #3 kudu::rw_spinlock::unlock_shared() ../src/kudu/util/locks.h:99:10 (libtablet.so+0x1d45da) #4 kudu::shared_lock<kudu::rw_spinlock>::~shared_lock() ../src/kudu/util/locks.h:283:11 (libtablet.so+0x1a3bf5) #5 kudu::tablet::Tablet::LastReadElapsedSeconds() const ../src/kudu/tablet/tablet.cc:1989:1 (libtablet.so+0x17457c) #6 kudu::internal::RunnableAdapter<unsigned long (kudu::tablet::Tablet::*)() const>::Run(kudu::tablet::Tablet const*) ../src/kudu/gutil/bind_internal.h:155:12 (libtablet.so+0x1e5c1c) #7 kudu::internal::InvokeHelper<false, unsigned long, kudu::internal::RunnableAdapter<unsigned long (kudu::tablet::Tablet::*)() const>, void (kudu::tablet::Tablet*)>::MakeItSo(kudu::internal::RunnableAdapter<unsigned long (kudu::tablet::Tablet::*)() const>, kudu::tablet::Tablet*) ../src/kudu/gutil/bind_internal.h:865:21 (libtablet.so+0x1e5a8c) #8 kudu::internal::Invoker<1, kudu::internal::BindState<kudu::internal::RunnableAdapter<unsigned long (kudu::tablet::Tablet::*)() const>, unsigned long (kudu::tablet::Tablet const*), void (kudu::internal::UnretainedWrapper<kudu::tablet::Tablet>)>, unsigned long (kudu::tablet::Tablet const*)>::Run(kudu::internal::BindStateBase*) ../src/kudu/gutil/bind_internal.h:1065:12 
(libtablet.so+0x1e59a7) #9 kudu::Callback<unsigned long ()>::Run() const ../src/kudu/gutil/callback.h:396:12 (libtserver.so+0x15acfb) #10 kudu::FunctionGauge<unsigned long>::value() const ../src/kudu/util/metrics.h:1239:22 (libtserver.so+0x15ab1f) #11 kudu::FunctionGauge<unsigned long>::WriteValue(kudu::JsonWriter*) const ../src/kudu/util/metrics.h:1243:19 (libtserver.so+0x15a7bc) #12 kudu::Gauge::WriteAsJson(kudu::JsonWriter*, kudu::MetricJsonOptions const&) const ../src/kudu/util/metrics.cc:716:3 (libkudu_util.so+0x31da17) #13 void kudu::WriteMetricsToJson<std::__1::unordered_map<kudu::MetricPrototype const*, scoped_refptr<kudu::Metric>, std::__1::hash<kudu::MetricPrototype const*>, std::__1::equal_to<kudu::MetricPrototype const*>, std::__1::allocator<std::__1::pair<kudu::MetricPrototype const* const, scoped_refptr<kudu::Metric> > > > >(kudu::JsonWriter*, std::__1::unordered_map<kudu::MetricPrototype const*, scoped_refptr<kudu::Metric>, std::__1::hash<kudu::MetricPrototype const*>, std::__1::equal_to<kudu::MetricPrototype const*>, std::__1::allocator<std::__1::pair<kudu::MetricPrototype const* const, scoped_refptr<kudu::Metric> > > > const&, kudu::MetricJsonOptions const&) ../src/kudu/util/metrics.cc:64:7 (libkudu_util.so+0x321ff2) #14 kudu::MetricEntity::WriteAsJson(kudu::JsonWriter*, kudu::MetricJsonOptions const&) const ../src/kudu/util/metrics.cc:388:3 (libkudu_util.so+0x31a9f4) #15 kudu::MetricRegistry::WriteAsJson(kudu::JsonWriter*, kudu::MetricJsonOptions const&) const ../src/kudu/util/metrics.cc:517:7 (libkudu_util.so+0x31c34a) #16 kudu::server::DiagnosticsLog::LogMetrics() ../src/kudu/server/diagnostics_log.cc:345:3 (libserver_process.so+0xac51a) #17 kudu::server::DiagnosticsLog::RunThread() ../src/kudu/server/diagnostics_log.cc:225:7 (libserver_process.so+0xabc50) #18 boost::_mfi::mf0<void, kudu::server::DiagnosticsLog>::operator()(kudu::server::DiagnosticsLog*) const ../thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:49:29 
(libserver_process.so+0xb88ac) #19 void boost::_bi::list1<boost::_bi::value<kudu::server::DiagnosticsLog*> >::operator()<boost::_mfi::mf0<void, kudu::server::DiagnosticsLog>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void, kudu::server::DiagnosticsLog>&, boost::_bi::list0&, int) ../thirdparty/installed/tsan/include/boost/bind/bind.hpp:259:9 (libserver_process.so+0xb875d) #20 boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::server::DiagnosticsLog>, boost::_bi::list1<boost::_bi::value<kudu::server::DiagnosticsLog*> > >::operator()() ../thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16 (libserver_process.so+0xb8671) #21 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::server::DiagnosticsLog>, boost::_bi::list1<boost::_bi::value<kudu::server::DiagnosticsLog*> > >, void>::invoke(boost::detail::function::function_buffer&) ../thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libserver_process.so+0xb82a0) #22 boost::function0<void>::operator()() const ../thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libkrpc.so+0x13b9c1) #23 kudu::Thread::SuperviseThread(void*) ../src/kudu/util/thread.cc:675:3 (libkudu_util.so+0x3eed69) Change-Id: Ib32120178a68b5389e167643e9bb8b89f8c625b9 Reviewed-on: http://gerrit.cloudera.org:8080/14979 Reviewed-by: Alexey Serbin <[email protected]> Tested-by: Kudu Jenkins
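The ordering problem described in the commit message above comes down to a C++ rule: non-static data members are destroyed in reverse order of declaration. A minimal sketch follows, assuming the fix relies on declaration order (the message only says the detachers were "misordered"). `Recorder` and `TabletLike` are hypothetical types, not Kudu classes.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper that records its own destruction in a shared log.
struct Recorder {
  Recorder(std::vector<std::string>* log, std::string name)
      : log_(log), name_(std::move(name)) {}
  ~Recorder() { log_->push_back(name_); }
  std::vector<std::string>* log_;
  std::string name_;
};

// Hypothetical class shaped like the case in the commit message: a lock
// that metric callbacks read, and a detacher that unregisters those
// callbacks. Declaring the detacher last means it is destroyed first, so
// no callback can run against an already-destroyed lock.
struct TabletLike {
  explicit TabletLike(std::vector<std::string>* log)
      : lock_(log, "lock"), detacher_(log, "detacher") {}
  Recorder lock_;      // state used by metric callbacks
  Recorder detacher_;  // declared last, therefore destroyed first
};

// Builds and destroys one TabletLike, returning the destruction order.
std::vector<std::string> DestructionOrder() {
  std::vector<std::string> log;
  {
    TabletLike t(&log);
  }
  return log;
}
```

Calling `DestructionOrder()` yields the detacher entry before the lock entry; swapping the two member declarations reverses that and recreates the window in which a metrics reader could touch destroyed state.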
asfgit
pushed a commit
that referenced
this pull request
Jan 29, 2020
I saw a warning from the ThreadSanitizer while running the TsTabletManagerITest.TestTableStats scenario (TSAN build). It turned out to be a test-only race in scenarios where MiniMaster is involved in a single-master test mini-cluster, if MiniMaster::Shutdown() is called after MiniMaster::Start() but before the catalog manager has completed its initialization.

The relevant snippet from the warning:

WARNING: ThreadSanitizer: data race (pid=6972)
  Write of size 8 at 0x7b7000000c70 by main thread:
    #0 pthread_cond_destroy
    #1 ConditionVariable::~ConditionVariable()
    #2 CountDownLatch::~CountDownLatch()
    #3 Promise<Status>::~Promise()
    #4 Master::~Master() src/kudu/master/master.cc:129
    #5 Master::~Master() src/kudu/master/master.cc:127
    #6 std::__1::default_delete<Master>::operator(Master*)
    #7 std::__1::unique_ptr<Master, std::__1::default_delete<Master>>::reset(Master*)
    #8 MiniMaster::Shutdown() src/kudu/master/mini_master.cc:118
    ...
  Previous read of size 8 at 0x7b7000000c70 by thread T20 (mutexes: write M50614):
    #0 pthread_cond_broadcast
    #1 ConditionVariable::Broadcast()
    #2 CountDownLatch::CountDown(int)
    #3 CountDownLatch::CountDown()
    #4 Promise<Status>::Set(Status const&)
    #5 Master::InitCatalogManagerTask() src/kudu/master/master.cc:206
    ...

Change-Id: If5122a16bb04f089fe1ec4a0d0ff157164ebfdc4
Reviewed-on: http://gerrit.cloudera.org:8080/15125
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <[email protected]>
asfgit
pushed a commit
that referenced
this pull request
Feb 20, 2020
I saw an ASAN test failure that occurred when there was a failure to connect to the Hive Metastore. This may not fix the connection issue, but it fixes the undefined-behavior failure reported under ASAN and allows the test to continue.

Below is a sample of the log:

W0220 18:46:15.548344 18002 client.h:351] Failed to connect to Hive Metastore (127.0.0.1:45269): Network error: failed to open Hive Metastore connection: socket open() error: Connection refused
I0220 18:46:16.549294 18002 client.cc:56] TSocket::open() error on socket (after THRIFT_POLL) <Host: 127.0.0.1 Port: 45269>Connection refused
W0220 18:46:16.549479 18002 client.h:351] Failed to connect to Hive Metastore (127.0.0.1:45269): Network error: failed to open Hive Metastore connection: socket open() error: Connection refused
/home/jenkins-slave/workspace/kudu-master/0/src/kudu/thrift/client.h:204:3: runtime error: left shift of 100 by 26 places cannot be represented in type 'int'
#0 0x7f527299d77b in kudu::thrift::HaClient<kudu::hms::HmsClient>::Execute(std::function<kudu::Status (kudu::hms::HmsClient*)>)::'lambda'()::operator()() const /home/jenkins-slave/workspace/kudu-master/0/src/kudu/thrift/client.h:204:3
#1 0x7f526e44ead7 in boost::function0<void>::operator()() const /home/jenkins-slave/workspace/kudu-master/0/thirdparty/installed/uninstrumented/include/boost/function/function_template.hpp:770:14
#2 0x7f526b6f21f4 in kudu::ThreadPool::DispatchThread() /home/jenkins-slave/workspace/kudu-master/0/src/kudu/util/threadpool.cc:685:22
#3 0x7f526b70c992 in boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::ThreadPool>, boost::_bi::list1<boost::_bi::value<kudu::ThreadPool*> > >::operator()() /home/jenkins-slave/workspace/kudu-master/0/thirdparty/installed/uninstrumented/include/boost/bind/bind.hpp:1222:16
#4 0x7f526e44ead7 in boost::function0<void>::operator()() const /home/jenkins-slave/workspace/kudu-master/0/thirdparty/installed/uninstrumented/include/boost/function/function_template.hpp:770:14
#5 0x7f526b6d812a in kudu::Thread::SuperviseThread(void*) /home/jenkins-slave/workspace/kudu-master/0/src/kudu/util/thread.cc:675:3
#6 0x7f5267917183 in start_thread /build/eglibc-SvCtMH/eglibc-2.19/nptl/pthread_create.c:312
#7 0x7f526742dffc in clone sysdeps/unix/sysv/linux/x86_64/clone.S:111

Change-Id: I1282ad36027b314d090e5a2dffdc3854002af761
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /home/jenkins-slave/workspace/kudu-master/0/src/kudu/thrift/client.h:204:3 in
Reviewed-on: http://gerrit.cloudera.org:8080/15256
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <[email protected]>
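The runtime error in the log above says that `100 << 26` does not fit in a 32-bit `int`: the mathematical result is 6,710,886,400, which is larger than the maximum `int` value of 2,147,483,647, so the shift is undefined behavior. The commit message does not show the code at client.h:204 or the actual fix, so the following is only a minimal sketch of one common way to avoid this class of overflow. It assumes the shifted value is a delay that doubles on each retry; `BackoffMs` and its parameters are hypothetical names, not Kudu's.

```cpp
#include <cstdint>

// Hypothetical helper, not taken from kudu/thrift/client.h.
// Returns base_ms * 2^attempt, saturating at max_ms instead of overflowing.
int64_t BackoffMs(int64_t base_ms, int attempt, int64_t max_ms) {
  if (base_ms <= 0 || max_ms <= 0) return 0;
  if (base_ms >= max_ms) return max_ms;
  int64_t value = base_ms;
  for (int i = 0; i < attempt; ++i) {
    // Stop before doubling would exceed the cap; this also rules out
    // signed overflow, because value never grows past max_ms.
    if (value > max_ms / 2) return max_ms;
    value *= 2;
  }
  return value;
}
```

Doubling in a loop with a cap check is slower than a single shift, but the number of iterations is bounded by the bit width, and no intermediate value can exceed `max_ms`.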
asfgit pushed a commit that referenced this pull request Mar 10, 2020
After adding a couple of member variables and methods to the BlockBloomFilter class to import the Or() function, predicate-test would sometimes crash at random with SIGSEGV in AVX operations. Debugging showed that the crash only happens when the "directory_" structure in BlockBloomFilter is not 32-byte aligned. Surprisingly, before those methods were added, "directory_" had always been 32-byte aligned even though the Arena allocator does not support alignment values greater than 16 bytes.

With input from Todd, explored 3 options to fix the alignment problem:
1) Update the HeapBufferAllocator in util/memory to align allocations to 64 bytes. See the AllocateInternal() implementation. Surprisingly, FLAGS_allocator_aligned_mode is OFF by default, so we appear to be relying on the allocator to always allocate 16-byte-aligned buffers. This option would therefore require turning ON the FLAGS_allocator_aligned_mode flag by default.
2) Update the Arena allocator so that, when needed, extra bytes are allocated to allow aligning to 32/64 bytes, given that a new component will always be 16-byte aligned. This requires updating some address/alignment logic involving offset_ and the new component allocation.
3) Don't touch the Arena allocator and simply add padding in the ArenaBlockBloomFilterBufferAllocator to minimize any risk to other parts of the codebase.

Opted for option #2 since it broadly adds support for 32/64-byte alignment, in contrast to the limited scope of option #3. Option #1 is tempting, but it is unclear what unknowns turning on allocator_aligned_mode would bring. Although only 32-byte alignment is needed for AVX operations, support for 64 bytes was also added to better align with the cache line size.

Additionally this change:
- Adds a simple BlockBloomFilter unit test that reproduces the alignment problem, since the end-to-end predicate-test was turning out to be difficult to debug.
- Fixes and enhances the arena-test with a bunch of variations.
Change-Id: Ib665115fa0fc262a8b76c48f52947dedb84be2a7 Reviewed-on: http://gerrit.cloudera.org:8080/15372 Tested-by: Alexey Serbin <[email protected]> Reviewed-by: Alexey Serbin <[email protected]> Reviewed-by: Adar Dembo <[email protected]>
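The idea behind option #2 (allocate extra bytes, then round the start address up to the requested alignment) can be sketched as follows. This is not the Arena code from the patch; `AlignUp` and `CarveAligned` are hypothetical helpers that only illustrate the address arithmetic, assuming the alignment is a power of two.

```cpp
#include <cstddef>
#include <cstdint>

// Rounds 'addr' up to the next multiple of 'alignment' (a power of two).
inline uintptr_t AlignUp(uintptr_t addr, size_t alignment) {
  return (addr + alignment - 1) & ~(static_cast<uintptr_t>(alignment) - 1);
}

// Hypothetical helper: finds an aligned region of 'size' bytes inside a
// buffer of 'buf_size' bytes starting at 'buf'. Returns nullptr if the
// padding needed for alignment leaves too little room.
inline void* CarveAligned(void* buf, size_t buf_size, size_t size,
                          size_t alignment) {
  uintptr_t start = reinterpret_cast<uintptr_t>(buf);
  uintptr_t aligned = AlignUp(start, alignment);
  size_t padding = aligned - start;
  if (padding > buf_size || size > buf_size - padding) return nullptr;
  return reinterpret_cast<void*>(aligned);
}
```

With a buffer that is only guaranteed to be 16-byte aligned, reserving `size + alignment - 16` bytes is enough for the aligned region to always fit, because the padding is then at most `alignment - 16` bytes.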
asfgit pushed a commit that referenced this pull request Dec 9, 2020
I noticed that the newly added TxnManagerTest.BeginManyTransactions test scenario started failing with ASAN heap-use-after-free warnings. After looking at that, it turned out that the original code assumed the cache would never be reset before the MetaCache's destructor was called. However, changelist 232474a introduced a new method, MetaCache::ClearCache(), and since then the method has been called upon altering a table if the partitioning scheme has been updated. This patch resolves the issue by introducing a so-called tablet server registry that is never reset; entries in the tablet server cache are just references to the entries in the registry (they are raw pointers, actually). The newly added test scenario reliably produced AddressSanitizer's heap-use-after-free warnings every time I ran it using an ASAN build. Below is a snapshot of the relevant traces captured when running the new test scenario without the changes in the client metacache. AddressSanitizer: heap-use-after-free on address 0x608000129e20 at pc 0x00000078bd54 bp 0x7fa731d0b240 sp 0x7fa731d0b238 READ of size 4 at 0x608000129e20 thread T149 (rpc reactor-146) #0 0x78bd53 in base::subtle::NoBarrier_Load(int const volatile*) src/kudu/gutil/atomicops-internals-x86.h:200:10 #1 0x7fa79520e227 in base::SpinLock::SpinLoop(long, int*) src/kudu/gutil/spinlock.cc:86:10 #2 0x7fa79520e38b in base::SpinLock::SlowLock() src/kudu/gutil/spinlock.cc:104:25 #3 0x7fa7a099aab0 in std::unique_lock<kudu::simple_spinlock>::lock() ../../../include/c++/8/bits/std_mutex.h:267:17 #4 0x7fa7a0991e3e in std::unique_lock<kudu::simple_spinlock>::unique_lock(kudu::simple_spinlock&) ../../../include/c++/8/bits/std_mutex.h:197:2 #5 0x7fa7a0abfda1 in kudu::client::internal::RemoteTabletServer::InitProxy(kudu::client::KuduClient*, std::function<void (kudu::Status const&)> const&) src/kudu/client/meta_cache.cc:145:39 #6 0x7fa7a0ac60f5 in kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void
(kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&, kudu::MonoTime const&) src/kudu/client/meta_cache.cc:524:11 #7 0x7fa7a09b2dcf in kudu::rpc::RetriableRpc<kudu::client::internal::RemoteTabletServer, kudu::tserver::WriteRequestPB, kudu::tserver::WriteResponsePB>::SendRpc() src/kudu/rpc/retriable_rpc.h:163:19 #8 0x7fa7a09ac6cc in kudu::client::internal::Batcher::FlushBuffer(kudu::client::internal::RemoteTablet*, std::vector<kudu::client::internal::InFlightOp*, std::allocator<kudu::client::internal::InFlightOp*> > const&) src/kudu/client/batcher.cc:911:8 #9 0x7fa7a09a9e38 in kudu::client::internal::Batcher::FlushBuffersIfReady() src/kudu/client/batcher.cc:884:5 #10 0x7fa7a09abd2d in kudu::client::internal::Batcher::TabletLookupFinished(kudu::client::internal::InFlightOp*, kudu::Status const&) src/kudu/client/batcher.cc:851:3 #11 0x7fa7a0acec66 in kudu::client::internal::LookupRpc::SendRpcCb(kudu::Status const&) src/kudu/client/meta_cache.cc:923:3 #12 0x7fa7a0ab8800 in kudu::client::internal::AsyncLeaderMasterRpc<kudu::master::GetTableLocationsRequestPB, kudu::master::GetTableLocationsResponsePB>::SendRpc()::'lambda'()::operator()() const src/kudu/client/master_proxy_rpc.cc:130:26 0x608000129e20 is located 0 bytes inside of 96-byte region [0x608000129e20,0x608000129e80) freed by thread T163 here: #0 0x65c650 in operator delete(void*) thirdparty/src/llvm-9.0.0.src/projects/compiler-rt/lib/asan/asan_new_delete.cc:160:399 #1 0x7fa7a0aec859 in void STLDeleteContainerPairSecondPointers<std::__detail::_Node_iterator<std::pair<std::string const, kudu::client::internal::RemoteTabletServer*>, false, true> >(std::__detail::_Node_iterator<std::pair<std::string const, kudu::client::internal::RemoteTabletServer*>, false, true>, std::__detail::_Node_iterator<std::pair<std::string const, kudu::client::internal::RemoteTabletServer*>, false, true>) src/kudu/gutil/stl_util.h:199:5 #2 0x7fa7a0ad96d1 in void STLDeleteValues<std::unordered_map<std::string, 
kudu::client::internal::RemoteTabletServer*, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, kudu::client::internal::RemoteTabletServer*> > > >(std::unordered_map<std::string, kudu::client::internal::RemoteTabletServer*, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, kudu::client::internal::RemoteTabletServer*> > >*) src/kudu/gutil/stl_util.h:400:3 #3 0x7fa7a0ad4188 in kudu::client::internal::MetaCache::ClearCache() src/kudu/client/meta_cache.cc:1257:3 previously allocated by thread T149 (rpc reactor-146) here: #0 0x65bc98 in operator new(unsigned long) thirdparty/src/llvm-9.0.0.src/projects/compiler-rt/lib/asan/asan_new_delete.cc:99:386 #1 0x7fa7a0ac7bcb in kudu::client::internal::MetaCache::UpdateTabletServerUnlocked(kudu::master::TSInfoPB const&) src/kudu/client/meta_cache.cc:596:48 #2 0x7fa7a0ad0802 in kudu::client::internal::MetaCache::ProcessGetTableLocationsResponse(kudu::client::KuduTable const*, std::string const&, bool, kudu::master::GetTableLocationsResponsePB const&, kudu::client::internal::MetaCacheEntry*, int) src/kudu/client/meta_cache.cc:1030:7 #3 0x7fa7a0acf9c0 in kudu::client::internal::MetaCache::ProcessLookupResponse(kudu::client::internal::LookupRpc const&, kudu::client::internal::MetaCacheEntry*, int) src/kudu/client/meta_cache.cc:941:10 #4 0x7fa7a0ace64e in kudu::client::internal::LookupRpc::SendRpcCb(kudu::Status const&) src/kudu/client/meta_cache.cc:911:31 #5 0x7fa7a0ab8800 in kudu::client::internal::AsyncLeaderMasterRpc<kudu::master::GetTableLocationsRequestPB, kudu::master::GetTableLocationsResponsePB>::SendRpc()::'lambda'()::operator()() const src/kudu/client/master_proxy_rpc.cc:130:26 #6 0x7fa79b2c1620 in kudu::rpc::OutboundCall::CallCallback() src/kudu/rpc/outbound_call.cc:274:5 #7 0x7fa79b2c1af0 in kudu::rpc::OutboundCall::SetResponse(std::unique_ptr<kudu::rpc::CallResponse, std::default_delete<kudu::rpc::CallResponse> >) 
src/kudu/rpc/outbound_call.cc:306:5 #8 0x7fa79b26ef5e in kudu::rpc::Connection::HandleCallResponse(std::unique_ptr<kudu::rpc::InboundTransfer, std::default_delete<kudu::rpc::InboundTransfer> >) src/kudu/rpc/connection.cc:735:14 #9 0x7fa79b26e0d6 in kudu::rpc::Connection::ReadHandler(ev::io&, int) src/kudu/rpc/connection.cc:673:7 SUMMARY: AddressSanitizer: heap-use-after-free src/kudu/gutil/atomicops-internals-x86.h:200:10 in base::subtle::NoBarrier_Load(int const volatile*) Shadow bytes around the buggy address: 0x0c108001d370: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa 0x0c108001d380: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd 0x0c108001d390: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa 0x0c108001d3a0: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd 0x0c108001d3b0: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa =>0x0c108001d3c0: fa fa fa fa[fd]fd fd fd fd fd fd fd fd fd fd fd 0x0c108001d3d0: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa 0x0c108001d3e0: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd 0x0c108001d3f0: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa 0x0c108001d400: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa 0x0c108001d410: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa Change-Id: I03ec9318526fbfc2da9b068eb3bbd9cd996efbca Reviewed-on: http://gerrit.cloudera.org:8080/16839 Tested-by: Alexey Serbin <[email protected]> Reviewed-by: Andrew Wong <[email protected]>
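The ownership split described in this commit message (a registry that owns the per-server objects for the lifetime of the cache, and a resettable cache that holds only raw pointers into it) can be sketched as below. This is a simplified illustration, not the MetaCache implementation: `ServerInfo` and `ServerCache` are hypothetical names, and the locking that a real client would need is omitted.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a per-tablet-server record.
struct ServerInfo {
  explicit ServerInfo(std::string id) : uuid(std::move(id)) {}
  std::string uuid;
};

class ServerCache {
 public:
  // Registers the server if it is new and (re)adds it to the cache.
  // The returned pointer stays valid until the ServerCache is destroyed.
  ServerInfo* Update(const std::string& uuid) {
    std::unique_ptr<ServerInfo>& owned = registry_[uuid];
    if (!owned) owned = std::make_unique<ServerInfo>(uuid);
    cache_[uuid] = owned.get();
    return owned.get();
  }

  // Returns the cached entry, or nullptr if the cache was cleared.
  ServerInfo* Lookup(const std::string& uuid) const {
    auto it = cache_.find(uuid);
    return it == cache_.end() ? nullptr : it->second;
  }

  // Drops the cached pointers only; the registry keeps the objects
  // alive, so pointers handed out earlier do not dangle.
  void ClearCache() { cache_.clear(); }

 private:
  std::unordered_map<std::string, std::unique_ptr<ServerInfo>> registry_;
  std::unordered_map<std::string, ServerInfo*> cache_;
};
```

The trade-off is that the registry only grows: memory for a server that is no longer referenced is reclaimed when the whole object is destroyed, not when the cache is cleared.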
asfgit pushed a commit that referenced this pull request Dec 11, 2020
I noticed that the newly added TxnManagerTest.BeginManyTransactions test scenario started failing with ASAN heap-use-after-free warnings. After looking at that, it turned out that the original code assumed the cache would never be reset before the MetaCache's destructor was called. However, changelist 232474a introduced a new method, MetaCache::ClearCache(), and since then the method has been called upon altering a table if the partitioning scheme has been updated. This patch resolves the issue by introducing a so-called tablet server registry that is never reset; entries in the tablet server cache are just references to the entries in the registry (they are raw pointers, actually). The newly added test scenario reliably produced AddressSanitizer's heap-use-after-free warnings every time I ran it using an ASAN build. Below is a snapshot of the relevant traces captured when running the new test scenario without the changes in the client metacache. AddressSanitizer: heap-use-after-free on address 0x608000129e20 at pc 0x00000078bd54 bp 0x7fa731d0b240 sp 0x7fa731d0b238 READ of size 4 at 0x608000129e20 thread T149 (rpc reactor-146) #0 0x78bd53 in base::subtle::NoBarrier_Load(int const volatile*) src/kudu/gutil/atomicops-internals-x86.h:200:10 #1 0x7fa79520e227 in base::SpinLock::SpinLoop(long, int*) src/kudu/gutil/spinlock.cc:86:10 #2 0x7fa79520e38b in base::SpinLock::SlowLock() src/kudu/gutil/spinlock.cc:104:25 #3 0x7fa7a099aab0 in std::unique_lock<kudu::simple_spinlock>::lock() ../../../include/c++/8/bits/std_mutex.h:267:17 #4 0x7fa7a0991e3e in std::unique_lock<kudu::simple_spinlock>::unique_lock(kudu::simple_spinlock&) ../../../include/c++/8/bits/std_mutex.h:197:2 #5 0x7fa7a0abfda1 in kudu::client::internal::RemoteTabletServer::InitProxy(kudu::client::KuduClient*, std::function<void (kudu::Status const&)> const&) src/kudu/client/meta_cache.cc:145:39 #6 0x7fa7a0ac60f5 in kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void
(kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&, kudu::MonoTime const&) src/kudu/client/meta_cache.cc:524:11 #7 0x7fa7a09b2dcf in kudu::rpc::RetriableRpc<kudu::client::internal::RemoteTabletServer, kudu::tserver::WriteRequestPB, kudu::tserver::WriteResponsePB>::SendRpc() src/kudu/rpc/retriable_rpc.h:163:19 #8 0x7fa7a09ac6cc in kudu::client::internal::Batcher::FlushBuffer(kudu::client::internal::RemoteTablet*, std::vector<kudu::client::internal::InFlightOp*, std::allocator<kudu::client::internal::InFlightOp*> > const&) src/kudu/client/batcher.cc:911:8 #9 0x7fa7a09a9e38 in kudu::client::internal::Batcher::FlushBuffersIfReady() src/kudu/client/batcher.cc:884:5 #10 0x7fa7a09abd2d in kudu::client::internal::Batcher::TabletLookupFinished(kudu::client::internal::InFlightOp*, kudu::Status const&) src/kudu/client/batcher.cc:851:3 #11 0x7fa7a0acec66 in kudu::client::internal::LookupRpc::SendRpcCb(kudu::Status const&) src/kudu/client/meta_cache.cc:923:3 #12 0x7fa7a0ab8800 in kudu::client::internal::AsyncLeaderMasterRpc<kudu::master::GetTableLocationsRequestPB, kudu::master::GetTableLocationsResponsePB>::SendRpc()::'lambda'()::operator()() const src/kudu/client/master_proxy_rpc.cc:130:26 0x608000129e20 is located 0 bytes inside of 96-byte region [0x608000129e20,0x608000129e80) freed by thread T163 here: #0 0x65c650 in operator delete(void*) thirdparty/src/llvm-9.0.0.src/projects/compiler-rt/lib/asan/asan_new_delete.cc:160:399 #1 0x7fa7a0aec859 in void STLDeleteContainerPairSecondPointers<std::__detail::_Node_iterator<std::pair<std::string const, kudu::client::internal::RemoteTabletServer*>, false, true> >(std::__detail::_Node_iterator<std::pair<std::string const, kudu::client::internal::RemoteTabletServer*>, false, true>, std::__detail::_Node_iterator<std::pair<std::string const, kudu::client::internal::RemoteTabletServer*>, false, true>) src/kudu/gutil/stl_util.h:199:5 #2 0x7fa7a0ad96d1 in void STLDeleteValues<std::unordered_map<std::string, 
kudu::client::internal::RemoteTabletServer*, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, kudu::client::internal::RemoteTabletServer*> > > >(std::unordered_map<std::string, kudu::client::internal::RemoteTabletServer*, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, kudu::client::internal::RemoteTabletServer*> > >*) src/kudu/gutil/stl_util.h:400:3 #3 0x7fa7a0ad4188 in kudu::client::internal::MetaCache::ClearCache() src/kudu/client/meta_cache.cc:1257:3 previously allocated by thread T149 (rpc reactor-146) here: #0 0x65bc98 in operator new(unsigned long) thirdparty/src/llvm-9.0.0.src/projects/compiler-rt/lib/asan/asan_new_delete.cc:99:386 #1 0x7fa7a0ac7bcb in kudu::client::internal::MetaCache::UpdateTabletServerUnlocked(kudu::master::TSInfoPB const&) src/kudu/client/meta_cache.cc:596:48 #2 0x7fa7a0ad0802 in kudu::client::internal::MetaCache::ProcessGetTableLocationsResponse(kudu::client::KuduTable const*, std::string const&, bool, kudu::master::GetTableLocationsResponsePB const&, kudu::client::internal::MetaCacheEntry*, int) src/kudu/client/meta_cache.cc:1030:7 #3 0x7fa7a0acf9c0 in kudu::client::internal::MetaCache::ProcessLookupResponse(kudu::client::internal::LookupRpc const&, kudu::client::internal::MetaCacheEntry*, int) src/kudu/client/meta_cache.cc:941:10 #4 0x7fa7a0ace64e in kudu::client::internal::LookupRpc::SendRpcCb(kudu::Status const&) src/kudu/client/meta_cache.cc:911:31 #5 0x7fa7a0ab8800 in kudu::client::internal::AsyncLeaderMasterRpc<kudu::master::GetTableLocationsRequestPB, kudu::master::GetTableLocationsResponsePB>::SendRpc()::'lambda'()::operator()() const src/kudu/client/master_proxy_rpc.cc:130:26 #6 0x7fa79b2c1620 in kudu::rpc::OutboundCall::CallCallback() src/kudu/rpc/outbound_call.cc:274:5 #7 0x7fa79b2c1af0 in kudu::rpc::OutboundCall::SetResponse(std::unique_ptr<kudu::rpc::CallResponse, std::default_delete<kudu::rpc::CallResponse> >) 
src/kudu/rpc/outbound_call.cc:306:5 #8 0x7fa79b26ef5e in kudu::rpc::Connection::HandleCallResponse(std::unique_ptr<kudu::rpc::InboundTransfer, std::default_delete<kudu::rpc::InboundTransfer> >) src/kudu/rpc/connection.cc:735:14 #9 0x7fa79b26e0d6 in kudu::rpc::Connection::ReadHandler(ev::io&, int) src/kudu/rpc/connection.cc:673:7 SUMMARY: AddressSanitizer: heap-use-after-free src/kudu/gutil/atomicops-internals-x86.h:200:10 in base::subtle::NoBarrier_Load(int const volatile*) Shadow bytes around the buggy address: 0x0c108001d370: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa 0x0c108001d380: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd 0x0c108001d390: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa 0x0c108001d3a0: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd 0x0c108001d3b0: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa =>0x0c108001d3c0: fa fa fa fa[fd]fd fd fd fd fd fd fd fd fd fd fd 0x0c108001d3d0: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa 0x0c108001d3e0: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd 0x0c108001d3f0: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa 0x0c108001d400: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa 0x0c108001d410: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa Change-Id: I03ec9318526fbfc2da9b068eb3bbd9cd996efbca Reviewed-on: http://gerrit.cloudera.org:8080/16839 Tested-by: Alexey Serbin <[email protected]> Reviewed-by: Andrew Wong <[email protected]> (cherry picked from commit ae45cd1) Conflicts: src/kudu/client/client-test.cc src/kudu/client/client.h src/kudu/client/meta_cache.cc src/kudu/client/meta_cache.h Reviewed-on: http://gerrit.cloudera.org:8080/16862 Tested-by: Kudu Jenkins
asfgit pushed a commit that referenced this pull request Dec 12, 2020
I noticed that the newly added TxnManagerTest.BeginManyTransactions test scenario started failing with ASAN heap-use-after-free warnings. After looking at that, it turned out that the original code assumed the cache would never be reset before the MetaCache's destructor was called. However, changelist 232474a introduced a new method, MetaCache::ClearCache(), and since then the method has been called upon altering a table if the partitioning scheme has been updated. This patch resolves the issue by introducing a so-called tablet server registry that is never reset; entries in the tablet server cache are just references to the entries in the registry (they are raw pointers, actually). The newly added test scenario reliably produced AddressSanitizer's heap-use-after-free warnings every time I ran it using an ASAN build. Below is a snapshot of the relevant traces captured when running the new test scenario without the changes in the client metacache. AddressSanitizer: heap-use-after-free on address 0x608000129e20 at pc 0x00000078bd54 bp 0x7fa731d0b240 sp 0x7fa731d0b238 READ of size 4 at 0x608000129e20 thread T149 (rpc reactor-146) #0 0x78bd53 in base::subtle::NoBarrier_Load(int const volatile*) src/kudu/gutil/atomicops-internals-x86.h:200:10 #1 0x7fa79520e227 in base::SpinLock::SpinLoop(long, int*) src/kudu/gutil/spinlock.cc:86:10 #2 0x7fa79520e38b in base::SpinLock::SlowLock() src/kudu/gutil/spinlock.cc:104:25 #3 0x7fa7a099aab0 in std::unique_lock<kudu::simple_spinlock>::lock() ../../../include/c++/8/bits/std_mutex.h:267:17 #4 0x7fa7a0991e3e in std::unique_lock<kudu::simple_spinlock>::unique_lock(kudu::simple_spinlock&) ../../../include/c++/8/bits/std_mutex.h:197:2 #5 0x7fa7a0abfda1 in kudu::client::internal::RemoteTabletServer::InitProxy(kudu::client::KuduClient*, std::function<void (kudu::Status const&)> const&) src/kudu/client/meta_cache.cc:145:39 #6 0x7fa7a0ac60f5 in kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void
(kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&, kudu::MonoTime const&) src/kudu/client/meta_cache.cc:524:11 #7 0x7fa7a09b2dcf in kudu::rpc::RetriableRpc<kudu::client::internal::RemoteTabletServer, kudu::tserver::WriteRequestPB, kudu::tserver::WriteResponsePB>::SendRpc() src/kudu/rpc/retriable_rpc.h:163:19 #8 0x7fa7a09ac6cc in kudu::client::internal::Batcher::FlushBuffer(kudu::client::internal::RemoteTablet*, std::vector<kudu::client::internal::InFlightOp*, std::allocator<kudu::client::internal::InFlightOp*> > const&) src/kudu/client/batcher.cc:911:8 #9 0x7fa7a09a9e38 in kudu::client::internal::Batcher::FlushBuffersIfReady() src/kudu/client/batcher.cc:884:5 #10 0x7fa7a09abd2d in kudu::client::internal::Batcher::TabletLookupFinished(kudu::client::internal::InFlightOp*, kudu::Status const&) src/kudu/client/batcher.cc:851:3 #11 0x7fa7a0acec66 in kudu::client::internal::LookupRpc::SendRpcCb(kudu::Status const&) src/kudu/client/meta_cache.cc:923:3 #12 0x7fa7a0ab8800 in kudu::client::internal::AsyncLeaderMasterRpc<kudu::master::GetTableLocationsRequestPB, kudu::master::GetTableLocationsResponsePB>::SendRpc()::'lambda'()::operator()() const src/kudu/client/master_proxy_rpc.cc:130:26 0x608000129e20 is located 0 bytes inside of 96-byte region [0x608000129e20,0x608000129e80) freed by thread T163 here: #0 0x65c650 in operator delete(void*) thirdparty/src/llvm-9.0.0.src/projects/compiler-rt/lib/asan/asan_new_delete.cc:160:399 #1 0x7fa7a0aec859 in void STLDeleteContainerPairSecondPointers<std::__detail::_Node_iterator<std::pair<std::string const, kudu::client::internal::RemoteTabletServer*>, false, true> >(std::__detail::_Node_iterator<std::pair<std::string const, kudu::client::internal::RemoteTabletServer*>, false, true>, std::__detail::_Node_iterator<std::pair<std::string const, kudu::client::internal::RemoteTabletServer*>, false, true>) src/kudu/gutil/stl_util.h:199:5 #2 0x7fa7a0ad96d1 in void STLDeleteValues<std::unordered_map<std::string, 
kudu::client::internal::RemoteTabletServer*, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, kudu::client::internal::RemoteTabletServer*> > > >(std::unordered_map<std::string, kudu::client::internal::RemoteTabletServer*, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, kudu::client::internal::RemoteTabletServer*> > >*) src/kudu/gutil/stl_util.h:400:3 #3 0x7fa7a0ad4188 in kudu::client::internal::MetaCache::ClearCache() src/kudu/client/meta_cache.cc:1257:3 previously allocated by thread T149 (rpc reactor-146) here: #0 0x65bc98 in operator new(unsigned long) thirdparty/src/llvm-9.0.0.src/projects/compiler-rt/lib/asan/asan_new_delete.cc:99:386 #1 0x7fa7a0ac7bcb in kudu::client::internal::MetaCache::UpdateTabletServerUnlocked(kudu::master::TSInfoPB const&) src/kudu/client/meta_cache.cc:596:48 #2 0x7fa7a0ad0802 in kudu::client::internal::MetaCache::ProcessGetTableLocationsResponse(kudu::client::KuduTable const*, std::string const&, bool, kudu::master::GetTableLocationsResponsePB const&, kudu::client::internal::MetaCacheEntry*, int) src/kudu/client/meta_cache.cc:1030:7 #3 0x7fa7a0acf9c0 in kudu::client::internal::MetaCache::ProcessLookupResponse(kudu::client::internal::LookupRpc const&, kudu::client::internal::MetaCacheEntry*, int) src/kudu/client/meta_cache.cc:941:10 #4 0x7fa7a0ace64e in kudu::client::internal::LookupRpc::SendRpcCb(kudu::Status const&) src/kudu/client/meta_cache.cc:911:31 #5 0x7fa7a0ab8800 in kudu::client::internal::AsyncLeaderMasterRpc<kudu::master::GetTableLocationsRequestPB, kudu::master::GetTableLocationsResponsePB>::SendRpc()::'lambda'()::operator()() const src/kudu/client/master_proxy_rpc.cc:130:26 #6 0x7fa79b2c1620 in kudu::rpc::OutboundCall::CallCallback() src/kudu/rpc/outbound_call.cc:274:5 #7 0x7fa79b2c1af0 in kudu::rpc::OutboundCall::SetResponse(std::unique_ptr<kudu::rpc::CallResponse, std::default_delete<kudu::rpc::CallResponse> >) 
src/kudu/rpc/outbound_call.cc:306:5 #8 0x7fa79b26ef5e in kudu::rpc::Connection::HandleCallResponse(std::unique_ptr<kudu::rpc::InboundTransfer, std::default_delete<kudu::rpc::InboundTransfer> >) src/kudu/rpc/connection.cc:735:14 #9 0x7fa79b26e0d6 in kudu::rpc::Connection::ReadHandler(ev::io&, int) src/kudu/rpc/connection.cc:673:7 SUMMARY: AddressSanitizer: heap-use-after-free src/kudu/gutil/atomicops-internals-x86.h:200:10 in base::subtle::NoBarrier_Load(int const volatile*) Shadow bytes around the buggy address: 0x0c108001d370: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa 0x0c108001d380: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd 0x0c108001d390: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa 0x0c108001d3a0: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd 0x0c108001d3b0: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa =>0x0c108001d3c0: fa fa fa fa[fd]fd fd fd fd fd fd fd fd fd fd fd 0x0c108001d3d0: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa 0x0c108001d3e0: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd 0x0c108001d3f0: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa 0x0c108001d400: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa 0x0c108001d410: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa Change-Id: I03ec9318526fbfc2da9b068eb3bbd9cd996efbca Reviewed-on: http://gerrit.cloudera.org:8080/16839 Tested-by: Alexey Serbin <[email protected]> Reviewed-by: Andrew Wong <[email protected]> (cherry picked from commit ae45cd1) Conflicts: src/kudu/client/client-test.cc src/kudu/client/client.h src/kudu/client/meta_cache.cc src/kudu/client/meta_cache.h (cherry picked from commit b845d2fcbfd3e2c7ac1ae22d5d7fe75ebefb63dc) Reviewed-on: http://gerrit.cloudera.org:8080/16863 Tested-by: Kudu Jenkins
asfgit pushed a commit that referenced this pull request Feb 10, 2021
When run in FIPS mode CryptoTest.RsaPrivateKeyInputOutputPEM test fails as described in KUDU-3207 due to the use of PKCS #8 instead of the expected PKCS #1. This patch disables the test when run in FIPS mode until we can standardize the RSA private key format. Change-Id: I2cf4a9286d1e3e9000c359fa69e27ef42d91ae88 Reviewed-on: http://gerrit.cloudera.org:8080/17051 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin <[email protected]>
asfgit pushed a commit that referenced this pull request Feb 17, 2021
I saw the following TSAN report while running one of the recently introduced tests (TxnOpDispatcherITest.LifecycleBasic). As far as I can see, the race isn't related to the TxnOpDispatcher itself, but rather to how the Master::state_ field is used in the master's code.

WARNING: ThreadSanitizer: data race (pid=8313)
Read of size 4 at 0x7b78000006b0 by thread T165:
#0 kudu::master::Master::InitTxnManagerTask() src/kudu/master/master.cc:312:9 (libmaster.so+0x26f28b)
#1 kudu::master::Master::ScheduleTxnManagerInit()::$_1::operator()() const src/kudu/master/master.cc:299:46 (libmaster.so+0x2741a1)
Previous write of size 4 at 0x7b78000006b0 by main thread:
#0 kudu::master::Master::ShutdownImpl() src/kudu/master/master.cc:406:12 (libmaster.so+0x26da08)
#1 kudu::master::Master::Shutdown() src/kudu/master/master.h:74:5 (libmaster.so+0x275d49)
#2 kudu::master::MiniMaster::Shutdown() src/kudu/master/mini_master.cc:117:14 (libmaster.so+0x2a9c38)
#3 kudu::cluster::InternalMiniCluster::ShutdownNodes(kudu::cluster::ClusterNodes) src/kudu/mini-cluster/internal_mini_cluster.cc:230:22 (libmini_cluster.so+0x84037)
#4 kudu::cluster::MiniCluster::Shutdown() src/kudu/mini-cluster/mini_cluster.h:79:5 (libitest_util.so+0xa59d4)
#5 kudu::cluster::InternalMiniCluster::~InternalMiniCluster() src/kudu/mini-cluster/internal_mini_cluster.cc:94:3 (libmini_cluster.so+0x82a03)

This patch fixes the issue by making Master::state_ atomic.

Change-Id: Ifa7541aa7dc7dbdb8e6af5c1f40edf23b850fc92
Reviewed-on: http://gerrit.cloudera.org:8080/17076
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <[email protected]>
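The fix named in this commit message, making a state field atomic so that one thread can write it while another reads it, follows a standard C++ pattern. The sketch below is not the Master class; `Server`, `State`, and the enumerator names are hypothetical and only show the shape of the change.

```cpp
#include <atomic>

// Hypothetical lifecycle states, not the ones defined in kudu::master::Master.
enum class State { kInitialized, kRunning, kStopped };

class Server {
 public:
  void Start() { state_.store(State::kRunning); }
  void Shutdown() { state_.store(State::kStopped); }

  // Safe to call from a background task while another thread calls
  // Shutdown(): loads and stores on std::atomic do not constitute a data race.
  bool IsRunning() const { return state_.load() == State::kRunning; }

 private:
  std::atomic<State> state_{State::kInitialized};
};
```

An atomic field removes the data race on the field itself. It does not make a sequence such as "check the state, then act on it" atomic as a whole; code that needs that guarantee still requires a lock.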
asfgit pushed a commit that referenced this pull request Mar 8, 2021
This patch fixes an issue resulting in a SIGABRT crash in the Kudu client when working with stale scan tokens which contain information about tablet locations for a table (see KUDU-1802) whose range partition was dropped. The patch also adds a test scenario reproducing the crash; now it passes and can catch future regressions. This patch is a follow-up to d23ee5d. Before the change in src/kudu/client/meta_cache.cc was back-ported from Kudu 1.14 as part of this fix, the scenario crashed with SIGABRT, with a stack trace similar to the following (this one was captured on macOS): * frame #0: 0x00007fff7035833a libsystem_kernel.dylib`__pthread_kill + 10 frame #1: 0x00007fff70414e60 libsystem_pthread.dylib`pthread_kill + 430 frame #2: 0x00007fff702df808 libsystem_c.dylib`abort + 120 frame #3: 0x000000010ca1a259 libglog.0.dylib`google::logging_fail() at logging.cc:1474:3 frame #4: 0x000000010ca19121 libglog.0.dylib`google::LogMessage::SendToLog() [inlined] google::LogMessage::Fail() at logging.cc:1488:3 frame #5: 0x000000010ca1911b libglog.0.dylib`google::LogMessage::SendToLog() at logging.cc:1442 frame #6: 0x000000010ca19815 libglog.0.dylib`google::LogMessage::Flush() at logging.cc:1311:5 frame #7: 0x000000010ca1d76f libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at logging.cc:2023:5 frame #8: 0x000000010ca1a5f9 libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at logging.cc:2022:37 frame #9: 0x0000000103e365e3 libkudu_client.dylib`std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, kudu::client::internal::MetaCacheEntry, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, kudu::client::internal::MetaCacheEntry> > >::mapped_type& FindOrDie<std::__1::map<std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char> >, kudu::client::internal::MetaCacheEntry, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, kudu::client::internal::MetaCacheEntry> > > >() at map-util.h:109:3 frame #10: 0x0000000103e34cbb libkudu_client.dylib`kudu::client::internal::MetaCache::ProcessGetTableLocationsResponse() at meta_cache.cc:943:23 frame #11: 0x0000000103e86166 libkudu_client.dylib`kudu::client::KuduScanToken::Data::PBIntoScanner() at scan_token-internal.cc:192:35 frame #12: 0x0000000103e88051 libkudu_client.dylib`kudu::client::KuduScanToken::Data::DeserializeIntoScanner() at scan_token-internal.cc:111:10 frame #13: 0x0000000103d55d3c libkudu_client.dylib`kudu::client::KuduScanToken::DeserializeIntoScanner() at client.cc:1879:10 Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19 Reviewed-on: http://gerrit.cloudera.org:8080/17152 Tested-by: Alexey Serbin <[email protected]> Reviewed-by: Andrew Wong <[email protected]>
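The trace above ends in a FindOrDie() call at map-util.h:109, that is, a map lookup that aborts the process when the key is missing. The commit message does not show how meta_cache.cc was changed, so the following is only a sketch of the general alternative: a lookup that reports a missing key to the caller so that stale input can be handled as an error. The `FindOrNull` helper here is defined locally for illustration.

```cpp
#include <map>
#include <string>

// Local illustrative helper: returns a pointer to the mapped value, or
// nullptr when 'key' is absent, instead of aborting the process.
template <typename Map>
const typename Map::mapped_type* FindOrNull(
    const Map& m, const typename Map::key_type& key) {
  auto it = m.find(key);
  return it == m.end() ? nullptr : &it->second;
}
```

A caller processing data that may be out of date would check the result for `nullptr` and return an error status, rather than treating a missing entry as an invariant violation.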
asfgit pushed a commit that referenced this pull request Mar 8, 2021
This patch fixes an issue resulting in a SIGABRT crash in the Kudu client when working with stale scan tokens which contain information about tablet locations for a table (see KUDU-1802) whose range partition was dropped. The patch also adds a test scenario reproducing the crash; now it passes and can catch future regressions. This patch is a follow-up to d23ee5d. Before the change in src/kudu/client/meta_cache.cc was back-ported from Kudu 1.14 as part of this fix, the scenario crashed with SIGABRT, with a stack trace similar to the following (this one was captured on macOS): * frame #0: 0x00007fff7035833a libsystem_kernel.dylib`__pthread_kill + 10 frame #1: 0x00007fff70414e60 libsystem_pthread.dylib`pthread_kill + 430 frame #2: 0x00007fff702df808 libsystem_c.dylib`abort + 120 frame #3: 0x000000010ca1a259 libglog.0.dylib`google::logging_fail() at logging.cc:1474:3 frame #4: 0x000000010ca19121 libglog.0.dylib`google::LogMessage::SendToLog() [inlined] google::LogMessage::Fail() at logging.cc:1488:3 frame #5: 0x000000010ca1911b libglog.0.dylib`google::LogMessage::SendToLog() at logging.cc:1442 frame #6: 0x000000010ca19815 libglog.0.dylib`google::LogMessage::Flush() at logging.cc:1311:5 frame #7: 0x000000010ca1d76f libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at logging.cc:2023:5 frame #8: 0x000000010ca1a5f9 libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at logging.cc:2022:37 frame #9: 0x0000000103e365e3 libkudu_client.dylib`std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, kudu::client::internal::MetaCacheEntry, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, kudu::client::internal::MetaCacheEntry> > >::mapped_type& FindOrDie<std::__1::map<std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char> >, kudu::client::internal::MetaCacheEntry, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, kudu::client::internal::MetaCacheEntry> > > >() at map-util.h:109:3 frame #10: 0x0000000103e34cbb libkudu_client.dylib`kudu::client::internal::MetaCache::ProcessGetTableLocationsResponse() at meta_cache.cc:943:23 frame #11: 0x0000000103e86166 libkudu_client.dylib`kudu::client::KuduScanToken::Data::PBIntoScanner() at scan_token-internal.cc:192:35 frame #12: 0x0000000103e88051 libkudu_client.dylib`kudu::client::KuduScanToken::Data::DeserializeIntoScanner() at scan_token-internal.cc:111:10 frame #13: 0x0000000103d55d3c libkudu_client.dylib`kudu::client::KuduScanToken::DeserializeIntoScanner() at client.cc:1879:10 Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19 Reviewed-on: http://gerrit.cloudera.org:8080/17152 Tested-by: Alexey Serbin <[email protected]> Reviewed-by: Andrew Wong <[email protected]> (cherry picked from commit 7c8dca6) Conflicts: src/kudu/client/meta_cache.cc src/kudu/client/scan_token-test.cc Reviewed-on: http://gerrit.cloudera.org:8080/17158 Tested-by: Kudu Jenkins Reviewed-by: Grant Henke <[email protected]>
asfgit pushed a commit that referenced this pull request on Mar 19, 2021
This patch fixes a race between calling ThreadPoolToken::Submit() and destructing the token concurrently from another thread. The patch moves the update of the thread pool token's queue length metric under the lock used in the ThreadPool::DoSubmit() method. The motivation for this patch was seeing the following TSAN report while running Params/ScanYourWritesParamTest.Test/1: WARNING: ThreadSanitizer: data race (pid=18290) Write of size 8 at 0x7b2c0002b0e8 by main thread: #0 operator delete(void*) #1 std::__1::default_delete<kudu::ThreadPoolToken>::operator()(kudu::ThreadPoolToken*) #2 std::__1::unique_ptr<kudu::ThreadPoolToken, std::__1::default_delete<kudu::ThreadPoolToken> >::reset(kudu::ThreadPoolToken*) #3 std::__1::unique_ptr<kudu::ThreadPoolToken, std::__1::default_delete<kudu::ThreadPoolToken> >::~unique_ptr() #4 kudu::consensus::RaftConsensus::~RaftConsensus() src/kudu/consensus/raft_consensus.cc:210:1 #5 ... #6 std::__1::__shared_count::__release_shared() #7 std::__1::__shared_weak_count::__release_shared() #8 std::__1::shared_ptr<kudu::consensus::RaftConsensus>::~shared_ptr() #9 kudu::tablet::TabletReplica::~TabletReplica() src/kudu/tablet/tablet_replica.cc:195:1 ..... Previous read of size 8 at 0x7b2c0002b0e8 by thread T125: #0 scoped_refptr<kudu::Histogram>::operator kudu::Histogram* scoped_refptr<kudu::Histogram>::*() const #1 kudu::ThreadPool::DoSubmit(std::__1::function<void ()>, kudu::ThreadPoolToken*) src/kudu/util/threadpool.cc:523:7 #2 kudu::ThreadPoolToken::Submit(std::__1::function<void ()>) src/kudu/util/threadpool.cc:124:17 #3 kudu::consensus::Peer::SignalRequest(bool) src/kudu/consensus/consensus_peers.cc:188:28 #4 kudu::consensus::Peer::Init()::$_0::operator()() src/kudu/consensus/consensus_peers.cc:161:14 #5 ... #6 ... #7 ... #8 ... #9 ... #10 ... #11 kudu::rpc::PeriodicTimer::Callback(long) ..... 
Change-Id: I2b17e4b2b634624fbc51e8ee05749a56f6609f62 Reviewed-on: http://gerrit.cloudera.org:8080/17205 Tested-by: Alexey Serbin <[email protected]> Reviewed-by: Grant Henke <[email protected]>
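The idea described in the commit message above can be sketched as follows. This is a minimal illustration, not Kudu's code: `MiniPool`, `MiniToken`, and `queue_length_samples_` are hypothetical names, and a plain vector stands in for the queue length histogram. The point is that the per-token metric is updated while the pool lock is still held, and that tearing down the token has to acquire the same lock.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

class MiniPool;

// Hypothetical stand-in for a thread pool token; not Kudu's ThreadPoolToken.
class MiniToken {
 public:
  explicit MiniToken(MiniPool* pool) : pool_(pool) {}
  ~MiniToken();  // Acquires the pool lock before tearing down token state.
  void Submit(std::function<void()> task);
  std::size_t samples() const { return queue_length_samples_.size(); }

 private:
  friend class MiniPool;
  MiniPool* pool_;
  std::deque<std::function<void()>> queue_;
  std::vector<std::size_t> queue_length_samples_;  // Stand-in for a histogram.
};

// Hypothetical stand-in for the pool that owns the lock.
class MiniPool {
 public:
  void DoSubmit(std::function<void()> task, MiniToken* token) {
    std::lock_guard<std::mutex> l(lock_);
    token->queue_.push_back(std::move(task));
    // The metric is updated inside the critical section, so it cannot
    // interleave with code that takes the same lock to release the token.
    token->queue_length_samples_.push_back(token->queue_.size());
  }

  void Release(MiniToken* token) {
    std::lock_guard<std::mutex> l(lock_);
    token->queue_.clear();
  }

 private:
  std::mutex lock_;
};

inline MiniToken::~MiniToken() { pool_->Release(this); }

inline void MiniToken::Submit(std::function<void()> task) {
  pool_->DoSubmit(std::move(task), this);
}
```

The sketch only shows the locking discipline; it does not run the queued tasks, and it assumes that whatever eventually destroys the token does so only after work that needs the same lock has been picked up.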
asfgit pushed a commit that referenced this pull request on Apr 8, 2021
It was previously possible for a CommitTask to be destructed before completing the loop of scheduling all asynchronous tasks. This led to a race as seen below: WARNING: ThreadSanitizer: data race (pid=32435) Write of size 8 at 0x7b1c000ce2d8 by thread T105 (mutexes: write M424881254664896540): #0 std::__1::__vector_base<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >::__destruct_at_end(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*) /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/vector:427:12 (txn_commit-itest+0x576cb1) #1 std::__1::__vector_base<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >::clear() /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/vector:369:29 (txn_commit-itest+0x5770d1) #2 std::__1::__vector_base<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >::~__vector_base() /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/vector:463:9 (txn_commit-itest+0x59caf9) #3 std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >::~vector() /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/vector:555:5 (libtransactions.so+0x8c2a0) #4 kudu::transactions::CommitTasks::~CommitTasks() ../src/kudu/transactions/txn_status_manager.h:177:26 (libtransactions.so+0xcce8b) #5 kudu::RefCountedThreadSafe<kudu::transactions::CommitTasks, 
kudu::DefaultRefCountedThreadSafeTraits<kudu::transactions::CommitTasks> >::DeleteInternal(kudu::transactions::CommitTasks const*) ../src/kudu/gutil/ref_counted.h:153:44 (libtransactions.so+0xcce1a) #6 kudu::DefaultRefCountedThreadSafeTraits<kudu::transactions::CommitTasks>::Destruct(kudu::transactions::CommitTasks const*) ../src/kudu/gutil/ref_counted.h:116:5 (libtransactions.so+0xccdc8) #7 kudu::RefCountedThreadSafe<kudu::transactions::CommitTasks, kudu::DefaultRefCountedThreadSafeTraits<kudu::transactions::CommitTasks> >::Release() const ../src/kudu/gutil/ref_counted.h:144:7 (libtransactions.so+0xccd70) #8 scoped_refptr<kudu::transactions::CommitTasks>::~scoped_refptr() ../src/kudu/gutil/ref_counted.h:266:13 (libtransactions.so+0xbf785) #9 std::__1::pair<long const, scoped_refptr<kudu::transactions::CommitTasks> >::~pair() /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/utility:315:29 (libtransactions.so+0xc7652) #10 void std::__1::allocator_traits<std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, void*> > >::__destroy<std::__1::pair<long const, scoped_refptr<kudu::transactions::CommitTasks> > >(std::__1::integral_constant<bool, false>, std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, void*> >&, std::__1::pair<long const, scoped_refptr<kudu::transactions::CommitTasks> >*) /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/memory:1747:23 (libtransactions.so+0xc7614) #11 void std::__1::allocator_traits<std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, void*> > >::destroy<std::__1::pair<long const, scoped_refptr<kudu::transactions::CommitTasks> > >(std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, void*> >&, 
std::__1::pair<long const, scoped_refptr<kudu::transactions::CommitTasks> >*) /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/memory:1595:14 (libtransactions.so+0xc7518) #12 std::__1::__hash_node_destructor<std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, void*> > >::operator()(std::__1::__hash_node<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, void*>*) /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/__hash_table:844:13 (libtransactions.so+0xc740d) #13 std::__1::unique_ptr<std::__1::__hash_node<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, void*>, std::__1::__hash_node_destructor<std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, void*> > > >::reset(std::__1::__hash_node<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, void*>*) /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/memory:2593:7 (libtransactions.so+0xc72e0) #14 std::__1::unique_ptr<std::__1::__hash_node<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, void*>, std::__1::__hash_node_destructor<std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, void*> > > >::~unique_ptr() /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/memory:2547:19 (libtransactions.so+0xc6cbc) #15 std::__1::__hash_table<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, std::__1::__unordered_map_hasher<long, std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, std::__1::hash<long>, true>, std::__1::__unordered_map_equal<long, std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, 
std::__1::equal_to<long>, true>, std::__1::allocator<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> > > >::erase(std::__1::__hash_const_iterator<std::__1::__hash_node<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, void*>*>) /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/__hash_table:2598:5 (libtransactions.so+0xc676e) #16 std::__1::unordered_map<long, scoped_refptr<kudu::transactions::CommitTasks>, std::__1::hash<long>, std::__1::equal_to<long>, std::__1::allocator<std::__1::pair<long const, scoped_refptr<kudu::transactions::CommitTasks> > > >::erase(std::__1::__hash_map_iterator<std::__1::__hash_iterator<std::__1::__hash_node<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, void*>*> >) /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/unordered_map:1193:57 (libtransactions.so+0xc5b40) #17 kudu::transactions::TxnStatusManager::RemoveCommitTask(long, kudu::transactions::CommitTasks const*) ../src/kudu/transactions/txn_status_manager.h:433:26 (libtransactions.so+0xbefc6) #18 kudu::transactions::CommitTasks::IsShuttingDownCleanupIfLastOp() ../src/kudu/transactions/txn_status_manager.cc:181:28 (libtransactions.so+0x97dea) #19 kudu::transactions::CommitTasks::AbortTxnAsyncTask(int)::$_2::operator()(kudu::Status const&) const ../src/kudu/transactions/txn_status_manager.cc:319:9 (libtransactions.so+0xaefd6) #20 decltype(std::__1::forward<kudu::transactions::CommitTasks::AbortTxnAsyncTask(int)::$_2&>(fp)(std::__1::forward<kudu::Status const&>(fp0))) std::__1::__invoke<kudu::transactions::CommitTasks::AbortTxnAsyncTask(int)::$_2&, kudu::Status const&>(kudu::transactions::CommitTasks::AbortTxnAsyncTask(int)::$_2&, kudu::Status const&) /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/type_traits:3530:1 (libtransactions.so+0xaeefd) #21 void 
std::__1::__invoke_void_return_wrapper<void>::__call<kudu::transactions::CommitTasks::AbortTxnAsyncTask(int)::$_2&, kudu::Status const&>(kudu::transactions::CommitTasks::AbortTxnAsyncTask(int)::$_2&, kudu::Status const&) /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/__functional_base:348:9 (libtransactions.so+0xaee3d) #22 std::__1::__function::__alloc_func<kudu::transactions::CommitTasks::AbortTxnAsyncTask(int)::$_2, std::__1::allocator<kudu::transactions::CommitTasks::AbortTxnAsyncTask(int)::$_2>, void (kudu::Status const&)>::operator()(kudu::Status const&) /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/functional:1533:16 (libtransactions.so+0xaedbd) #23 std::__1::__function::__func<kudu::transactions::CommitTasks::AbortTxnAsyncTask(int)::$_2, std::__1::allocator<kudu::transactions::CommitTasks::AbortTxnAsyncTask(int)::$_2>, void (kudu::Status const&)>::operator()(kudu::Status const&) /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/functional:1707:12 (libtransactions.so+0xad06c) #24 std::__1::__function::__value_func<void (kudu::Status const&)>::operator()(kudu::Status const&) const /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/functional:1860:16 (libmaster.so+0x32ca24) #25 std::__1::function<void (kudu::Status const&)>::operator()(kudu::Status const&) const /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/functional:2419:12 (libmaster.so+0x31d80b) #26 kudu::transactions::ParticipantRpc::Finish(kudu::Status const&) ../src/kudu/transactions/participant_rpc.cc:227:3 (libtransactions.so+0x7f3e7) ... 
Previous read of size 8 at 0x7b1c000ce2d8 by thread T186 (mutexes: read M322424363142217872): #0 std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >::size() const /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/vector:656:46 (libtransactions.so+0x8d2f9) #1 kudu::transactions::CommitTasks::AbortTxnAsync() ../src/kudu/transactions/txn_status_manager.cc:365:42 (libtransactions.so+0x989d2) #2 kudu::transactions::TxnStatusManager::BeginAbortTransaction(long, boost::optional<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > const&, kudu::tserver::TabletServerErrorPB*) ../src/kudu/transactions/txn_status_manager.cc:1219:25 (libtransactions.so+0xa3cc6) #3 kudu::transactions::CommitTasks::ScheduleBeginAbortTxnWrite()::$_3::operator()() const ../src/kudu/transactions/txn_status_manager.cc:378:3 (libtransactions.so+0xb245d) #4 decltype(std::__1::forward<kudu::transactions::CommitTasks::ScheduleBeginAbortTxnWrite()::$_3&>(fp)()) std::__1::__invoke<kudu::transactions::CommitTasks::ScheduleBeginAbortTxnWrite()::$_3&>(kudu::transactions::CommitTasks::ScheduleBeginAbortTxnWrite()::$_3&) /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/type_traits:3530:1 (libtransactions.so+0xb2180) #5 void std::__1::__invoke_void_return_wrapper<void>::__call<kudu::transactions::CommitTasks::ScheduleBeginAbortTxnWrite()::$_3&>(kudu::transactions::CommitTasks::ScheduleBeginAbortTxnWrite()::$_3&) /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/__functional_base:348:9 (libtransactions.so+0xb20e0) #6 std::__1::__function::__alloc_func<kudu::transactions::CommitTasks::ScheduleBeginAbortTxnWrite()::$_3, std::__1::allocator<kudu::transactions::CommitTasks::ScheduleBeginAbortTxnWrite()::$_3>, void ()>::operator()() 
/data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/functional:1533:16 (libtransactions.so+0xb2080) #7 std::__1::__function::__func<kudu::transactions::CommitTasks::ScheduleBeginAbortTxnWrite()::$_3, std::__1::allocator<kudu::transactions::CommitTasks::ScheduleBeginAbortTxnWrite()::$_3>, void ()>::operator()() /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/functional:1707:12 (libtransactions.so+0xb042f) #8 std::__1::__function::__value_func<void ()>::operator()() const /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/functional:1860:16 (libtserver_test_util.so+0x58396) #9 std::__1::function<void ()>::operator()() const /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/functional:2419:12 (libtserver_test_util.so+0x58098) ... This patch fixes this by caching the size before iterating. Prior to this patch, the test failed in TSAN mode 3/100 times. With this patch, it passed 1000/1000 times. Change-Id: Ic974354b300f2a6c1b04505e740249273f33b80c Reviewed-on: http://gerrit.cloudera.org:8080/17283 Reviewed-by: Alexey Serbin <[email protected]> Tested-by: Kudu Jenkins
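"Caching the size before iterating" can be illustrated with a small sketch. It is not Kudu's CommitTasks: `FanOut` and `ScheduleAll` are hypothetical names, and the scheduling callback is assumed to be owned by the caller. The loop bound is copied into a local variable, so once the last task has been scheduled (and may already have destroyed the object), the loop no longer reads any member.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an object that schedules one asynchronous task
// per participant; not Kudu's CommitTasks.
class FanOut {
 public:
  explicit FanOut(std::vector<std::string> ids) : ids_(std::move(ids)) {}

  // `schedule` may run the task for index i to completion, and completing the
  // last task may destroy this object. No member is read after that point.
  void ScheduleAll(const std::function<void(std::size_t)>& schedule) {
    const std::size_t n = ids_.size();  // Read the member once, up front.
    for (std::size_t i = 0; i < n; ++i) {
      schedule(i);  // After the call with i == n - 1, `this` may be gone.
    }
  }

 private:
  std::vector<std::string> ids_;
};
```

Writing the loop condition as `i < ids_.size()` instead would re-read the vector after the final callback returned, which is the kind of access the ThreadSanitizer report above flags.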
asfgit pushed a commit that referenced this pull request on Apr 15, 2021
We recently added a few test cases where the client negotiation fails with this error (which is what we expect): GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Server kudu/[email protected] not found in Kerberos database) Apparently SASL doesn't allocate enough memory for this error message in some cases which causes these tests to be flaky with a ~20% error rate with AddressSanitizer enabled: ==9298==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60e00003e2d6 at pc 0x000000530bf4 bp 0x7f8eb50ad0f0 sp 0x7f8eb50ac8a0 READ of size 151 at 0x60e00003e2d6 thread T88 (client-negotiat) #0 0x530bf3 in __interceptor_strlen.part.35 sanitizer_common/sanitizer_common_interceptors.inc:365:5 #1 0x7f8ee6ad9ee8 in std::basic_ostream<char, std::char_traits<char> >& std::operator<<<std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x113ee8) #2 0x7f8eeb7c9c9b in kudu::rpc::SaslLogCallback(void*, int, char const*) ../src/kudu/rpc/sasl_common.cc:102:29 #3 0x7f8eeb30241c in sasl_seterror (/tmp/dist-test-taskexUtyr/build/dist-test-system-libs/libsasl2.so.3+0x1441c) #4 0x7f8edd8f143d in _init (/tmp/dist-test-taskexUtyr/build/dist-test-system-libs/sasl2/libgssapiv2.so+0x243d) #5 0x7f8edd8f2452 in _init (/tmp/dist-test-taskexUtyr/build/dist-test-system-libs/sasl2/libgssapiv2.so+0x3452) #6 0x7f8eeb2f7844 in sasl_client_step (/tmp/dist-test-taskexUtyr/build/dist-test-system-libs/libsasl2.so.3+0x9844) #7 0x7f8eeb2f7bc5 in sasl_client_start (/tmp/dist-test-taskexUtyr/build/dist-test-system-libs/libsasl2.so.3+0x9bc5) #8 0x7f8eeb678679 in kudu::rpc::ClientNegotiation::SendSaslInitiate()::$_1::operator()() const ../src/kudu/rpc/client_negotiation.cc:594:14 #9 0x7f8eeb67831c in std::_Function_handler<int (), kudu::rpc::ClientNegotiation::SendSaslInitiate()::$_1>::_M_invoke(std::_Any_data const&) ../../../include/c++/8/bits/std_function.h:282:9 #10 0x7f8ef3b28220 in 
std::function<int ()>::operator()() const ../../../include/c++/8/bits/std_function.h:687:14 #11 0x7f8eeb7c5840 in kudu::rpc::WrapSaslCall(sasl_conn*, std::function<int ()> const&, char const*) ../src/kudu/rpc/sasl_common.cc:341:12 #12 0x7f8eeb67363b in kudu::rpc::ClientNegotiation::SendSaslInitiate() ../src/kudu/rpc/client_negotiation.cc:593:20 #13 0x7f8eeb66e0c7 in kudu::rpc::ClientNegotiation::AuthenticateBySasl(kudu::faststring*, std::unique_ptr<kudu::rpc::ErrorStatusPB, std::default_delete<kudu::rpc::ErrorStatusPB> >*) ../src/kudu/rpc/client_negotiation.cc:523:14 #14 0x7f8eeb667b99 in kudu::rpc::ClientNegotiation::Negotiate(std::unique_ptr<kudu::rpc::ErrorStatusPB, std::default_delete<kudu::rpc::ErrorStatusPB> >*) ../src/kudu/rpc/client_negotiation.cc:220:7 #15 0x7f8eeb715027 in kudu::rpc::DoClientNegotiation(kudu::rpc::Connection*, kudu::TriStateFlag, kudu::TriStateFlag, kudu::MonoTime, std::unique_ptr<kudu::rpc::ErrorStatusPB, std::default_delete<kudu::rpc::ErrorStatusPB> >*) ../src/kudu/rpc/negotiation.cc:218:3 #16 0x7f8eeb712095 in kudu::rpc::Negotiation::RunNegotiation(scoped_refptr<kudu::rpc::Connection> const&, kudu::TriStateFlag, kudu::TriStateFlag, kudu::MonoTime) ../src/kudu/rpc/negotiation.cc:295:9 #17 0x7f8eeb74d4ad in kudu::rpc::ReactorThread::StartConnectionNegotiation(scoped_refptr<kudu::rpc::Connection> const&)::$_1::operator()() const ../src/kudu/rpc/reactor.cc:614:3 #18 0x7f8eeb74d06c in std::_Function_handler<void (), kudu::rpc::ReactorThread::StartConnectionNegotiation(scoped_refptr<kudu::rpc::Connection> const&)::$_1>::_M_invoke(std::_Any_data const&) ../../../include/c++/8/bits/std_function.h:297:2 #19 0x71b760 in std::function<void ()>::operator()() const ../../../include/c++/8/bits/std_function.h:687:14 #20 0x7f8ee917d03d in kudu::ThreadPool::DispatchThread() ../src/kudu/util/threadpool.cc:669:7 #21 0x7f8ee91817dc in kudu::ThreadPool::CreateThread()::$_1::operator()() const ../src/kudu/util/threadpool.cc:742:48 #22 0x7f8ee918162c in 
std::_Function_handler<void (), kudu::ThreadPool::CreateThread()::$_1>::_M_invoke(std::_Any_data const&) ../../../include/c++/8/bits/std_function.h:297:2 #23 0x71b760 in std::function<void ()>::operator()() const ../../../include/c++/8/bits/std_function.h:687:14 #24 0x7f8ee915660a in kudu::Thread::SuperviseThread(void*) ../src/kudu/util/thread.cc:674:3 #25 0x7f8eec6106da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da) #26 0x7f8ee64de71e in clone (/lib/x86_64-linux-gnu/libc.so.6+0x12171e) 0x60e00003e2d6 is located 0 bytes to the right of 150-byte region [0x60e00003e240,0x60e00003e2d6) allocated by thread T88 (client-negotiat) here: #0 0x5a4bb8 in malloc /home/abukor/src/kudu/thirdparty/src/llvm-9.0.0.src/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:145:3 #1 0x7f8eeb2fa1df in _buf_alloc (/tmp/dist-test-taskexUtyr/build/dist-test-system-libs/libsasl2.so.3+0xc1df) This patch suppresses address sanitizer errors in sasl_seterror(). Change-Id: Ie66e1f14c9750b13676c7e28e6439057a5e73341 Reviewed-on: http://gerrit.cloudera.org:8080/17317 Tested-by: Attila Bukor <[email protected]> Reviewed-by: Alexey Serbin <[email protected]> Reviewed-by: Grant Henke <[email protected]>
asfgit pushed a commit that referenced this pull request on Jun 25, 2021
This patch updates the signature of the TxnSystemClient::CoordinateTransactionAsync() method to pass MonoDelta by value. In addition, I added an extra DCHECK() into MonoTime::AddDelta(). The motivation for this change was seeing the following UBSAN warning in one of the pre-commit builds [1]: #0 0x7f48d11e8f3c in kudu::MonoTime::AddDelta(kudu::MonoDelta const&) src/kudu/util/monotime.cc:218:10 #1 0x7f48d11e9eee in kudu::operator+(kudu::MonoTime const&, kudu::MonoDelta const&) src/kudu/util/monotime.cc:333:7 #2 0x7f48e0d846c1 in kudu::transactions::TxnSystemClient::CoordinateTransactionAsync(kudu::tserver::CoordinatorOpPB, kudu::MonoDelta const&, std::function<void (kudu::Status const&)> const&, kudu::tserver::CoordinatorOpResultPB*) src/kudu/transactions/txn_system_client.cc:331:45 #3 0x7f48e0d86f76 in kudu::transactions::TxnSystemClient::KeepTransactionAlive(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, kudu::MonoDelta) src/kudu/transactions/txn_system_client.cc:320:3 #4 0x7f48e24c62b9 in kudu::transactions::TxnManager::KeepTransactionAlive(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, kudu::MonoTime const&) src/kudu/master/txn_manager.cc:238:27 #5 0x7f48e24ca36f in kudu::transactions::TxnManagerServiceImpl::KeepTransactionAlive(kudu::transactions::KeepTransactionAliveRequestPB const*, kudu::transactions::KeepTransactionAliveResponsePB*, kudu::rpc::RpcContext*) src/kudu/master/txn_manager_service.cc:159:42 #6 0x7f48d7224b0e in std::function<void (google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*)>::operator()(google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*) const ../../../include/c++/7.5.0/bits/std_function.h:706:14 #7 0x7f48d7223afc in kudu::rpc::GeneratedServiceIf::Handle(kudu::rpc::InboundCall*) src/kudu/rpc/service_if.cc:137:3 #8 0x7f48d7229a9d in kudu::rpc::ServicePool::RunThread() 
src/kudu/rpc/service_pool.cc:232:15 #9 0x7f48d1290c3a in kudu::Thread::SuperviseThread(void*) src/kudu/util/thread.cc:674:3 #10 0x7f48d3a046da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da) #11 0x7f48cd55771e in clone (/lib/x86_64-linux-gnu/libc.so.6+0x12171e) SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior src/kudu/util/monotime.cc:218:10 [1] http://dist-test.cloudera.org/job?job_id=jenkins-slave.1623914260.1110749 Change-Id: I36ba521a3bb7a4ca42a5dc8d383f5d8b6309154d Reviewed-on: http://gerrit.cloudera.org:8080/17611 Tested-by: Kudu Jenkins Reviewed-by: Abhishek Chennaka <[email protected]> Reviewed-by: Andrew Wong <[email protected]>
asfgit pushed a commit that referenced this pull request on Nov 3, 2021
This patch addresses the following UB found in a pre-commit: /home/jenkins-slave/workspace/kudu-master/1/src/kudu/util/monotime.cc:220:10: runtime error: signed integer overflow: 271833850110 + 9223372036854775807 cannot be represented in type 'long' #0 0x7f225fca9b31 in kudu::MonoTime::AddDelta(kudu::MonoDelta const&) /home/jenkins-slave/workspace/kudu-master/1/src/kudu/util/monotime.cc:220:10 #1 0x7f225fcaaafe in kudu::operator+(kudu::MonoTime const&, kudu::MonoDelta const&) /home/jenkins-slave/workspace/kudu-master/1/src/kudu/util/monotime.cc:335:7 #2 0x7f226fa3d6ff in kudu::transactions::TxnSystemClient::CoordinateTransactionAsync(kudu::tserver::CoordinatorOpPB, kudu::MonoDelta, std::function<void (kudu::Status const&)> const&, kudu::tserver::CoordinatorOpResultPB*) /home/jenkins-slave/workspace/kudu-master/1/src/kudu/transactions/txn_system_client.cc:331:45 #3 0x7f226fa3feca in kudu::transactions::TxnSystemClient::KeepTransactionAlive(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, kudu::MonoDelta) /home/jenkins-slave/workspace/kudu-master/1/src/kudu/transactions/txn_system_client.cc:320:3 #4 0x7f2271211629 in kudu::transactions::TxnManager::KeepTransactionAlive(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, kudu::MonoTime const&) /home/jenkins-slave/workspace/kudu-master/1/src/kudu/master/txn_manager.cc:238:27 #5 0x7f227121535f in kudu::transactions::TxnManagerServiceImpl::KeepTransactionAlive(kudu::transactions::KeepTransactionAliveRequestPB const*, kudu::transactions::KeepTransactionAliveResponsePB*, kudu::rpc::RpcContext*) /home/jenkins-slave/workspace/kudu-master/1/src/kudu/master/txn_manager_service.cc:159:42 #6 0x7f2265d5749e in std::function<void (google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*)>::operator()(google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*) const 
../../../include/c++/7.5.0/bits/std_function.h:706:14 #7 0x7f2265d5648c in kudu::rpc::GeneratedServiceIf::Handle(kudu::rpc::InboundCall*) /home/jenkins-slave/workspace/kudu-master/1/src/kudu/rpc/service_if.cc:137:3 #8 0x7f2265d5c42d in kudu::rpc::ServicePool::RunThread() /home/jenkins-slave/workspace/kudu-master/1/src/kudu/rpc/service_pool.cc:232:15 #9 0x7f225fd596ba in kudu::Thread::SuperviseThread(void*) /home/jenkins-slave/workspace/kudu-master/1/src/kudu/util/thread.cc:674:3 #10 0x7f22625026da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da) #11 0x7f225bfea71e in clone (/lib/x86_64-linux-gnu/libc.so.6+0x12171e) SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /home/jenkins-slave/workspace/kudu-master/1/src/kudu/util/monotime.cc:220:10 in Previously, we converted an initial deadline to a timeout, potentially rejiggering the value in case of the maximal timeout, and then recomputed the deadline. This patch addresses the UB by addressing a TODO to pass deadlines in the context of the TxnSystemClient instead of timeouts. Change-Id: I1e5d4d06e8c0801c7f6b2399f7622e6f039f988e Reviewed-on: http://gerrit.cloudera.org:8080/17993 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin <[email protected]> Reviewed-by: Abhishek Chennaka <[email protected]>
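The arithmetic behind the report above is easy to reproduce: adding the maximal 64-bit delta (9223372036854775807) to any positive timestamp cannot be represented in a signed 64-bit integer. The commit avoids the round trip by passing deadlines instead of timeouts. The sketch below shows a different, generic guard, a saturating add, purely to illustrate the overflow condition; `SaturatingAdd` is a hypothetical helper and is not part of Kudu's MonoTime.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: adds a signed delta to a timestamp (both in the same
// integer time unit), clamping to the representable range instead of
// overflowing.
inline std::int64_t SaturatingAdd(std::int64_t now, std::int64_t delta) {
  constexpr std::int64_t kMax = std::numeric_limits<std::int64_t>::max();
  constexpr std::int64_t kMin = std::numeric_limits<std::int64_t>::min();
  if (delta > 0 && now > kMax - delta) return kMax;  // Would overflow upward.
  if (delta < 0 && now < kMin - delta) return kMin;  // Would overflow downward.
  return now + delta;
}
```

With the operands from the report, `SaturatingAdd(271833850110, 9223372036854775807)` clamps to the maximum value rather than invoking undefined behavior.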
asfgit pushed a commit that referenced this pull request on May 2, 2023
Since the original implementation stored the random integer used for replica selection in a variable that was initialized statically, the corresponding calls to the libstdc++/libc++ runtime were issued before the process called the main() function. That means some SSE4.2-specific instructions might be executed, since libkudu_client is unconditionally compiled with the -msse4.2 flag, and there was no chance to call KuduClientBuilder::Build(), which verifies that the required features are present by calling CheckCPUFlags(). As a result, an attempt to run an application linked with the kudu_client library on a machine lacking SSE4.2 support would result in a crash with the SIGILL signal and a stack trace like the one below: #0 0x00007fc4b1b58162 in std::mersenne_twister_engine<...>::_M_gen_rand at include/c++/7.5.0/bits/random.tcc:408 #1 std::mersenne_twister_engine<...>::operator() at include/c++/7.5.0/bits/random.tcc:459 #2 0x00007fc4b1b1d65d in kudu::client::(anonymous namespace)::InitRandomSelectionInt at ../../../../../src/kudu/client/client-internal.cc:196 #3 0x00007fc4b1b1d6ef in __static_initialization_and_destruction_0 at ../../../../../src/kudu/client/client-internal.cc:198 #4 _GLOBAL__sub_I_client_internal.cc(void) at ../../../../../src/kudu/client/client-internal.cc:871 This patch addresses that deficiency, so now instead of unexpectedly crashing, the application would return an error upon an attempt to create an instance of the KuduClient object. This is a follow-up to ccbbfb3. Change-Id: I11c2a29ef69a8c97c68330d261fdff64accebb0b Reviewed-on: http://gerrit.cloudera.org:8080/19828 Reviewed-by: Abhishek Chennaka <[email protected]> Reviewed-by: Wenzhe Zhou <[email protected]> Tested-by: Alexey Serbin <[email protected]>
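The initialization-order problem described above can be sketched in a few lines. This is an illustration under stated assumptions, not the actual patch: `CpuFeaturesOk`, `RandomSelection`, and `CreateClient` are hypothetical stand-ins, and the real CPU check is replaced by a stub. A function-local static is initialized on first use rather than before main(), so the random number generator only runs after the feature check has passed.

```cpp
#include <random>

namespace sketch {

// Problematic pattern, shown only as a comment: a namespace-scope variable
// with a dynamic initializer runs before main(), before any CPU check.
//   static const int g_selection = InitRandomSelection();

// Hypothetical stub for a CPU feature check done when a client is created.
inline bool CpuFeaturesOk() { return true; }

// Lazy alternative: the generator runs the first time this is called, and
// the result is reused on later calls.
inline int RandomSelection() {
  static const int value = [] {
    std::mt19937 gen(std::random_device{}());
    return static_cast<int>(gen() & 0x7fffffffu);
  }();
  return value;
}

// Hypothetical factory: check CPU features first, then use the lazy value.
inline bool CreateClient(int* selection_out) {
  if (!CpuFeaturesOk()) return false;  // Report an error instead of crashing.
  *selection_out = RandomSelection();
  return true;
}

}  // namespace sketch
```

Since C++11, initialization of a function-local static is thread-safe, so this form needs no extra locking for the one-time setup.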
asfgit pushed a commit that referenced this pull request on May 31, 2023
Since the original implementation stored the random integer used for replica selection in a variable that was initialized statically, the corresponding calls to the libstdc++/libc++ runtime were issued before the process called the main() function. That means some SSE4.2-specific instructions might be executed, since libkudu_client is unconditionally compiled with the -msse4.2 flag, and there was no chance to call KuduClientBuilder::Build(), which verifies that the required features are present by calling CheckCPUFlags(). As a result, an attempt to run an application linked with the kudu_client library on a machine lacking SSE4.2 support would result in a crash with the SIGILL signal and a stack trace like the one below: #0 0x00007fc4b1b58162 in std::mersenne_twister_engine<...>::_M_gen_rand at include/c++/7.5.0/bits/random.tcc:408 #1 std::mersenne_twister_engine<...>::operator() at include/c++/7.5.0/bits/random.tcc:459 #2 0x00007fc4b1b1d65d in kudu::client::(anonymous namespace)::InitRandomSelectionInt at ../../../../../src/kudu/client/client-internal.cc:196 #3 0x00007fc4b1b1d6ef in __static_initialization_and_destruction_0 at ../../../../../src/kudu/client/client-internal.cc:198 #4 _GLOBAL__sub_I_client_internal.cc(void) at ../../../../../src/kudu/client/client-internal.cc:871 This patch addresses that deficiency, so now instead of unexpectedly crashing, the application would return an error upon an attempt to create an instance of the KuduClient object. This is a follow-up to ccbbfb3. Change-Id: I11c2a29ef69a8c97c68330d261fdff64accebb0b Reviewed-on: http://gerrit.cloudera.org:8080/19828 Reviewed-by: Abhishek Chennaka <[email protected]> Reviewed-by: Wenzhe Zhou <[email protected]> Tested-by: Alexey Serbin <[email protected]> Reviewed-on: http://gerrit.cloudera.org:8080/19948 Reviewed-by: Yingchun Lai <[email protected]> Tested-by: Kudu Jenkins Reviewed-by: Yuqi Du <[email protected]> Reviewed-by: Yifan Zhang <[email protected]>
asfgit pushed a commit that referenced this pull request on Oct 6, 2023
This update helps to prevent SIGSEGV in libunwind when running Kudu on aarch64 (in particular, Graviton3 instances in EC2). An example stack trace is shown below; it is similar to the stack mentioned in [1]: #0 access_mem (as=0x3304418 <local_addr_space>, addr=7745970402396146688, val=0xfffff325ca18, write=0, arg=0xfffff325ce70) at thirdparty/src/libunwind-1.6.2/src/aarch64/Ginit.c:337 #1 0x0000000000a97ac0 in is_plt_entry (c=0xfffff325ce70) at thirdparty/src/libunwind-1.6.2/src/aarch64/Gstep.c:43 #2 0x0000000000a97fdc in _ULaarch64_step (cursor=0xfffff325ce70) at thirdparty/src/libunwind-1.6.2/src/aarch64/Gstep.c:171 #3 0x00000000025050c8 in kudu::StackTrace::Collect ( this=this@entry=0xfffff325d7d8, skip_frames=skip_frames@entry=0) at src/kudu/util/debug-util.cc:612 #4 0x0000000002507f64 in kudu::StackTrace::Collect ( this=this@entry=0xfffff325d7d8, skip_frames=skip_frames@entry=0) at src/kudu/util/debug-util.cc:579 [1] libunwind/libunwind#260 Change-Id: Ie34dc56f78abba537aa15dd3d9c0540157d9afa3 Reviewed-on: http://gerrit.cloudera.org:8080/20540 Tested-by: Kudu Jenkins Reviewed-by: Michael Smith <[email protected]> Reviewed-by: Mahesh Reddy <[email protected]> Reviewed-by: Abhishek Chennaka <[email protected]>
asfgit pushed a commit that referenced this pull request on Oct 7, 2023

This update helps to prevent SIGSEGV in libunwind when running Kudu on aarch64 (in particular, Graviton3 instances in EC2). An example of the stack trace is shown below, and it's similar to the stack mentioned in [1]:

  #0 access_mem (as=0x3304418 <local_addr_space>, addr=7745970402396146688, val=0xfffff325ca18, write=0, arg=0xfffff325ce70) at thirdparty/src/libunwind-1.6.2/src/aarch64/Ginit.c:337
  #1 0x0000000000a97ac0 in is_plt_entry (c=0xfffff325ce70) at thirdparty/src/libunwind-1.6.2/src/aarch64/Gstep.c:43
  #2 0x0000000000a97fdc in _ULaarch64_step (cursor=0xfffff325ce70) at thirdparty/src/libunwind-1.6.2/src/aarch64/Gstep.c:171
  #3 0x00000000025050c8 in kudu::StackTrace::Collect (this=this@entry=0xfffff325d7d8, skip_frames=skip_frames@entry=0) at src/kudu/util/debug-util.cc:612
  #4 0x0000000002507f64 in kudu::StackTrace::Collect (this=this@entry=0xfffff325d7d8, skip_frames=skip_frames@entry=0) at src/kudu/util/debug-util.cc:579

[1] libunwind/libunwind#260

Change-Id: Ie34dc56f78abba537aa15dd3d9c0540157d9afa3
Reviewed-on: http://gerrit.cloudera.org:8080/20540
Tested-by: Kudu Jenkins
Reviewed-by: Michael Smith <[email protected]>
Reviewed-by: Mahesh Reddy <[email protected]>
Reviewed-by: Abhishek Chennaka <[email protected]>
(cherry picked from commit dd5fd45)
Reviewed-on: http://gerrit.cloudera.org:8080/20542
asfgit pushed a commit that referenced this pull request on Oct 11, 2023

Running various tests on aarch64 (Graviton3) under ASAN produced warnings like below:

  src/kudu/gutil/bits.h:19:42: runtime error: unsigned integer overflow: 134678536 * 16843009 cannot be represented in type 'unsigned int'
    #0 0xffffa1ebd8d4 in Bits::CountOnes(unsigned int) src/kudu/gutil/bits.h:19:42
    #1 0xffffa1ebd830 in Bits::CountOnes64(unsigned long) src/kudu/gutil/bits.h:30:12
    #2 0xffffa1ebd7f8 in Bits::CountOnes64withPopcount(unsigned long) src/kudu/gutil/bits.h:43:12
  SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior src/kudu/gutil/bits.h:19:42

This patch addresses the issue.

Change-Id: I47bff62676ee57706d6b5ef841e3891bba5a62fa
Reviewed-on: http://gerrit.cloudera.org:8080/20558
Reviewed-by: Marton Greber <[email protected]>
Tested-by: Alexey Serbin <[email protected]>
Reviewed-by: Abhishek Chennaka <[email protected]>
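The two operands in the report above are 0x08070808 (per-byte bit counts) and 0x01010101, so the warning comes from the final multiply step of a SWAR population count. The sketch below shows one way to keep that step from wrapping in 32 bits, by widening to 64 bits before multiplying. It is an illustration only and may differ from what the patch actually does; `CountOnes32` and `CountOnes64` are hypothetical names, not the functions in bits.h.

```cpp
#include <cassert>
#include <cstdint>

// SWAR population count whose final multiply is done in 64 bits, so the
// product never wraps around in a 32-bit unsigned type.
inline int CountOnes32(std::uint32_t n) {
  n -= (n >> 1) & 0x55555555u;                       // 2-bit partial counts
  n = ((n >> 2) & 0x33333333u) + (n & 0x33333333u);  // 4-bit partial counts
  n = (n + (n >> 4)) & 0x0F0F0F0Fu;                  // per-byte counts
  assert(n <= 0x08080808u);                          // each byte holds 0..8
  // Multiplying by 0x01010101 accumulates the four byte counts into
  // bits 24..31 of the product; the widened product fits in 64 bits.
  const std::uint64_t product = static_cast<std::uint64_t>(n) * 0x01010101u;
  return static_cast<int>((product >> 24) & 0xFFu);
}

// 64-bit variant built from the two 32-bit halves.
inline int CountOnes64(std::uint64_t n) {
  return CountOnes32(static_cast<std::uint32_t>(n >> 32)) +
         CountOnes32(static_cast<std::uint32_t>(n));
}
```

Unsigned wraparound is well defined in C++, so the original expression computes the right answer; the point of the widening is only to keep the sanitizer's unsigned-overflow check quiet without suppressing it.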
asfgit pushed a commit that referenced this pull request on Oct 15, 2023

Running various tests on aarch64 (Graviton3) under ASAN produced warnings like below:

  src/kudu/gutil/bits.h:19:42: runtime error: unsigned integer overflow: 134678536 * 16843009 cannot be represented in type 'unsigned int'
    #0 0xffffa1ebd8d4 in Bits::CountOnes(unsigned int) src/kudu/gutil/bits.h:19:42
    #1 0xffffa1ebd830 in Bits::CountOnes64(unsigned long) src/kudu/gutil/bits.h:30:12
    #2 0xffffa1ebd7f8 in Bits::CountOnes64withPopcount(unsigned long) src/kudu/gutil/bits.h:43:12
  SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior src/kudu/gutil/bits.h:19:42

This patch addresses the issue.

Change-Id: I47bff62676ee57706d6b5ef841e3891bba5a62fa
Reviewed-on: http://gerrit.cloudera.org:8080/20558
Reviewed-by: Marton Greber <[email protected]>
Tested-by: Alexey Serbin <[email protected]>
Reviewed-by: Abhishek Chennaka <[email protected]>
(cherry picked from commit 8aab39e)
Reviewed-on: http://gerrit.cloudera.org:8080/20573
Tested-by: Kudu Jenkins
Reviewed-by: Yingchun Lai <[email protected]>
asfgit pushed a commit that referenced this pull request on Mar 29, 2024

A Kudu server might start its shutdown sequence while another thread is collecting the server's metrics. If that happens, a data race might manifest itself while fetching the 'rpc_pending_connections' metric. Running one of the tests under TSAN reproduced such a race with the report below.

This patch addresses the data race issue. In addition, I took the liberty of optimizing the instantiation and initialization of DiagnosticSocket instances used to retrieve the information on the number of pending RPC connections, so now the diagnostic sockets are instantiated and initialized once per AcceptorPool instance.

This is a follow-up to c0c44a8.

  WARNING: ThreadSanitizer: data race
    Read of size 8 at 0x7b4c00002f78 by thread T63 (mutexes: write M558018781209703984):
      #0 std::__1::vector<std::__1::shared_ptr<kudu::rpc::AcceptorPool>, std::__1::allocator<std::__1::shared_ptr<kudu::rpc::AcceptorPool> > >::begin() thirdparty/installed/tsan/include/c++/v1/vector:1520:30 (libkrpc.so+0x1642b9)
      #1 kudu::rpc::Messenger::GetPendingConnectionsNum() src/kudu/rpc/messenger.cc:171:22 (libkrpc.so+0x15f6fb)
      ...
      #14 kudu::MetricRegistry::WriteAsJson(kudu::JsonWriter*, kudu::MetricJsonOptions const&) const src/kudu/util/metrics.cc:566:7 (libkudu_util.so+0x3ab82c)
      ...
      #17 kudu::server::DiagnosticsLog::Start()::$_0::operator()() const src/kudu/server/diagnostics_log.cc:145:46 (libserver_process.so+0x118361)
      ...

    Previous write of size 8 at 0x7b4c00002f78 by main thread (mutexes: write M4638925457023032):
      #0 memset sanitizer_common/sanitizer_common_interceptors.inc:780:3 (kudu+0x454d16)
      #1 memset sanitizer_common/sanitizer_common_interceptors.inc:778:1 (kudu+0x454d16)
      #2 std::__1::vector<std::__1::shared_ptr<kudu::rpc::AcceptorPool>, std::__1::allocator<std::__1::shared_ptr<kudu::rpc::AcceptorPool> > >::__move_assign(std::__1::vector<std::__1::shared_ptr<kudu::rpc::AcceptorPool>, std::__1::allocator<std::__1::shared_ptr<kudu::rpc::AcceptorPool> > >&, std::__1::integral_constant<bool, true>) thirdparty/installed/tsan/include/c++/v1/vector:1392:18 (libkrpc.so+0x16a840)
      ...
      #4 kudu::rpc::Messenger::ShutdownInternal(kudu::rpc::Messenger::ShutdownMode) src/kudu/rpc/messenger.cc:213:23 (libkrpc.so+0x15f509)
      ...

Change-Id: I6aaf3373944eac86664ac62db3b7e6151c874539
Reviewed-on: http://gerrit.cloudera.org:8080/21224
Tested-by: Alexey Serbin <[email protected]>
Reviewed-by: Abhishek Chennaka <[email protected]>
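The report above pairs a reader iterating a vector of acceptor pools with a writer that moves the vector away during shutdown. A common way to make that pattern safe is to take the same lock on both paths and to hand the elements off under the lock. The sketch below illustrates the idea with hypothetical `FakeAcceptorPool` and `FakeMessenger` types; it is not the Messenger code, and the actual patch may synchronize differently.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for an acceptor pool that knows how many
// connections are pending on its sockets.
struct FakeAcceptorPool {
  int pending = 0;
};

class FakeMessenger {
 public:
  void AddPool(int pending) {
    assert(pending >= 0);
    auto pool = std::make_shared<FakeAcceptorPool>();
    pool->pending = pending;
    std::lock_guard<std::mutex> l(lock_);
    pools_.push_back(std::move(pool));
  }

  // Metric getter: holds the lock while iterating, so it can never observe
  // the vector while Shutdown() is swapping its contents out.
  int GetPendingConnectionsNum() const {
    std::lock_guard<std::mutex> l(lock_);
    int total = 0;
    for (const auto& pool : pools_) {
      total += pool->pending;
    }
    return total;
  }

  // Shutdown: swap the pools out under the lock, then release them after
  // the lock is dropped so that slow destructors do not block readers.
  void Shutdown() {
    std::vector<std::shared_ptr<FakeAcceptorPool>> to_release;
    {
      std::lock_guard<std::mutex> l(lock_);
      to_release.swap(pools_);
    }
    to_release.clear();
  }

 private:
  mutable std::mutex lock_;
  std::vector<std::shared_ptr<FakeAcceptorPool>> pools_;
};
```

After `Shutdown()` a concurrent metrics reader simply sees an empty list and reports zero pending connections.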
asfgit pushed a commit that referenced this pull request on May 10, 2024

It turned out that the auto leader rebalancing task wasn't explicitly shut down upon shutting down the catalog manager. That led to race conditions as reported by TSAN, at least in test scenarios (see below). This patch addresses the issue.

  WARNING: ThreadSanitizer: data race (pid=23827)
    Write of size 1 at 0x7b4000008208 by main thread:
      #0 AnnotateRWLockDestroy thirdparty/src/llvm-11.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interface_ann.cpp:264 (auto_rebalancer-test+0x33575e)
      #1 kudu::rw_spinlock::~rw_spinlock() src/kudu/util/locks.h:89:5 (libmaster.so+0x359376)
      #2 kudu::master::TSManager::~TSManager() src/kudu/master/ts_manager.cc:108:1 (libmaster.so+0x4ad201)
      #3 kudu::master::TSManager::~TSManager() src/kudu/master/ts_manager.cc:107:25 (libmaster.so+0x4ad229)
      #4 std::__1::default_delete<kudu::master::TSManager>::operator()(kudu::master::TSManager*) const thirdparty/installed/tsan/include/c++/v1/memory:2262:5 (libmaster.so+0x407ce7)
      #5 std::__1::unique_ptr<kudu::master::TSManager, std::__1::default_delete<kudu::master::TSManager> >::reset(kudu::master::TSManager*) thirdparty/installed/tsan/include/c++/v1/memory:2517:7 (libmaster.so+0x40157d)
      #6 std::__1::unique_ptr<kudu::master::TSManager, std::__1::default_delete<kudu::master::TSManager> >::~unique_ptr() thirdparty/installed/tsan/include/c++/v1/memory:2471:19 (libmaster.so+0x4015eb)
      #7 kudu::master::Master::~Master() src/kudu/master/master.cc:263:1 (libmaster.so+0x3f7a4a)
      #8 kudu::master::Master::~Master() src/kudu/master/master.cc:261:19 (libmaster.so+0x3f7dc9)
      #9 std::__1::default_delete<kudu::master::Master>::operator()(kudu::master::Master*) const thirdparty/installed/tsan/include/c++/v1/memory:2262:5 (libmaster.so+0x435627)
      #10 std::__1::unique_ptr<kudu::master::Master, std::__1::default_delete<kudu::master::Master> >::reset(kudu::master::Master*) thirdparty/installed/tsan/include/c++/v1/memory:2517:7 (libmaster.so+0x42e6ed)
      #11 kudu::master::MiniMaster::Shutdown() src/kudu/master/mini_master.cc:120:13 (libmaster.so+0x4c2612)
      ...

    Previous atomic write of size 4 at 0x7b4000008208 by thread T439 (mutexes: write M1141235379631443968):
      #0 __tsan_atomic32_compare_exchange_strong thirdparty/src/llvm-11.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp:780 (auto_rebalancer-test+0x33eb60)
      #1 base::subtle::Release_CompareAndSwap(int volatile*, int, int) /src/kudu/gutil/atomicops-internals-tsan.h:88:3 (libmaster.so+0x2e2b34)
      #2 kudu::rw_semaphore::unlock_shared() src/kudu/util/rw_semaphore.h:91:19 (libmaster.so+0x2e29c8)
      #3 kudu::rw_spinlock::unlock_shared() src/kudu/util/locks.h:99:10 (libmaster.so+0x2e28ef)
      #4 std::__1::shared_lock<kudu::rw_spinlock>::~shared_lock() /thirdparty/installed/tsan/include/c++/v1/shared_mutex:369:19 (libmaster.so+0x2e23e0)
      #5 kudu::master::TSManager::GetAllDescriptors(std::__1::vector<std::__1::shared_ptr<kudu::master::TSDescriptor>, std::__1::allocator<std::__1::shared_ptr<kudu::master::TSDescriptor> > >*) const src/kudu/master/ts_manager.cc:206:1 (libmaster.so+0x4adeb6)
      #6 kudu::master::AutoLeaderRebalancerTask::RunLeaderRebalancer() src/kudu/master/auto_leader_rebalancer.cc:405:16 (libmaster.so+0x2fb51b)
      #7 kudu::master::AutoLeaderRebalancerTask::RunLoop() src/kudu/master/auto_leader_rebalancer.cc:445:7 (libmaster.so+0x2fbaa9)

This is a follow-up to 10efaf2.

Change-Id: Iccd66d00280d22b37386230874937e5260f07f3b
Reviewed-on: http://gerrit.cloudera.org:8080/21417
Reviewed-by: Wang Xixu <[email protected]>
Tested-by: Alexey Serbin <[email protected]>
Reviewed-by: Yifan Zhang <[email protected]>
asfgit pushed a commit that referenced this pull request on May 14, 2024

This patch addresses a race reported by TSAN with traces like below:

  WARNING: ThreadSanitizer: data race (pid=11024)
    Write of size 8 at 0x7b580011f260 by thread T174:
      #0 kudu::tablet::OpState::set_start_time(kudu::MonoTime) src/kudu/tablet/ops/op.h:274:58
      #1 kudu::tablet::WriteOp::Start() src/kudu/tablet/ops/write_op.cc:273:11
      #2 kudu::tablet::OpDriver::Prepare() src/kudu/tablet/ops/op_driver.cc:329:7
      #3 kudu::tablet::OpDriver::PrepareTask() src/kudu/tablet/ops/op_driver.cc:249:31
      ...

    Previous read of size 8 at 0x7b580011f260 by thread T5 (mutexes: write M835553159786377312):
      #0 kudu::tablet::OpState::start_time() const src/kudu/tablet/ops/op.h:272:40
      #1 kudu::tablet::WriteOp::ToString() const src/kudu/tablet/ops/write_op.cc:378:36
      #2 kudu::tablet::OpDriver::ToStringUnlocked() const src/kudu/tablet/ops/op_driver.cc:209:23
      #3 kudu::tablet::OpDriver::ToString() const src/kudu/tablet/ops/op_driver.cc:203:10
      #4 kudu::tablet::TabletReplica::GetInFlightOps(...) const src/kudu/tablet/tablet_replica.cc:728:41
      #5 kudu::tserver::TabletServerPathHandlers::HandleTransactionsPage(...) src/kudu/tserver/tserver_path_handlers.cc:286:14
      ...

Change-Id: I52de0840aa20f64cf15c7a9da2d553257c7e85e7
Reviewed-on: http://gerrit.cloudera.org:8080/21427
Tested-by: Kudu Jenkins
Reviewed-by: Abhishek Chennaka <[email protected]>
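The trace above shows one thread setting an operation's start time while another reads it to render a status page. One generic way to make such a single-word field safe to read concurrently is to store it in a `std::atomic`. The sketch below assumes a hypothetical `OpStateSketch` class and uses `std::chrono::steady_clock` in place of Kudu's MonoTime; it is not the change made by the patch, which may rely on locking instead.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Sentinel meaning "the operation has not started yet". This assumes the
// clock never reports a negative time since its epoch.
constexpr std::int64_t kUnsetNanos = -1;

// Hypothetical per-operation state whose start time may be written by a
// worker thread and read concurrently by a status page handler.
class OpStateSketch {
 public:
  using Clock = std::chrono::steady_clock;

  void set_start_time(Clock::time_point t) {
    const auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
        t.time_since_epoch());
    start_ns_.store(ns.count(), std::memory_order_release);
  }

  // Returns false if the start time has not been set yet.
  bool start_time(Clock::time_point* out) const {
    assert(out != nullptr);
    const std::int64_t ns = start_ns_.load(std::memory_order_acquire);
    if (ns == kUnsetNanos) {
      return false;
    }
    *out = Clock::time_point(std::chrono::duration_cast<Clock::duration>(
        std::chrono::nanoseconds(ns)));
    return true;
  }

 private:
  std::atomic<std::int64_t> start_ns_{kUnsetNanos};
};
```

A reader that runs before the writer gets `false` rather than a torn or uninitialized timestamp, which is usually what a diagnostics page wants.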
asfgit pushed a commit that referenced this pull request on Sep 14, 2024

The race condition was reported by TSAN as follows (with some information omitted):

  WARNING: ThreadSanitizer: data race (pid=1924273)
    Write of size 8 at 0x7b30002fe7c0 by thread T6 (mutexes: write M247597861, write M247597860, write M247597300):
      #0 std::__1::enable_if<(...), void>::type std::__1::swap<kudu::BlockId*>(...) thirdparty/installed/tsan/include/c++/v1/type_traits:4076:9
      ...
      #4 kudu::tablet::RowSetMetadata::CommitRedoDeltaDataBlock(...) src/kudu/tablet/rowset_metadata.cc:197:22
      #5 kudu::tablet::DeltaTracker::FlushDMS(...) src/kudu/tablet/delta_tracker.cc:826:23
      #6 kudu::tablet::DeltaTracker::Flush(...) src/kudu/tablet/delta_tracker.cc:877:14
      #7 kudu::tablet::DiskRowSet::FlushDeltas(...) src/kudu/tablet/diskrowset.cc:552:26
      ...

    Previous read of size 8 at 0x7b30002fe7c0 by thread T34 (mutexes: write M247598319, write M919714229363433616, write M303002710007881612):
      #0 std::__1::vector<...>::size() const thirdparty/installed/tsan/include/c++/v1/vector:658:61
      #1 kudu::tablet::RowSetMetadata::GetAllBlocks() const src/kudu/tablet/rowset_metadata.cc:306:37
      #2 kudu::tablet::TabletMetadata::UpdateUnlocked(...) src/kudu/tablet/tablet_metadata.cc:677:40
      #3 kudu::tablet::TabletMetadata::UpdateAndFlush(...) src/kudu/tablet/tablet_metadata.cc:549:5
      #4 kudu::tablet::Tablet::FlushMetadata(...) src/kudu/tablet/tablet.cc:1992:21
      #5 kudu::tablet::Tablet::HandleEmptyCompactionOrFlush() src/kudu/tablet/tablet.cc:2308:3
      #6 kudu::tablet::Tablet::DeleteAncientDeletedRowsets() src/kudu/tablet/tablet.cc:3084:3
      ...

Change-Id: I07103269526d0ee98b0bb19e76e11f7d47a5b217
Reviewed-on: http://gerrit.cloudera.org:8080/21799
Reviewed-by: Abhishek Chennaka <[email protected]>
Tested-by: Alexey Serbin <[email protected]>
asfgit pushed a commit that referenced this pull request on Sep 19, 2024

This patch fixes a race in access to the RowSetMetadata::id_ field in the rollback scenario in the MajorCompactDeltaStoresWithColumnIds() method of the DiskRowSet class.

Before this patch, TSAN would report warnings like below when running the MultiThreadedHybridClockTabletTest.UpdateNoMergeCompaction scenario of the mt-tablet-test:

  Read of size 8 at 0x7b3400014780 by thread T30 (mutexes: write M762932787594459152, write M7098002):
    #0 kudu::tablet::RowSetMetadata::id() const src/kudu/tablet/rowset_metadata.h:100:31 (libtablet.so+0x346faa)
    #1 kudu::tablet::RowSetTree::Reset(...) src/kudu/tablet/rowset_tree.cc:190:48 (libtablet.so+0x4bf666)
    #2 kudu::tablet::Tablet::ModifyRowSetTree(...) src/kudu/tablet/tablet.cc:1490:3 (libtablet.so+0x323755)
    #3 kudu::tablet::Tablet::AtomicSwapRowSetsUnlocked(...) src/kudu/tablet/tablet.cc:1504:3 (libtablet.so+0x3239bc)
    #4 kudu::tablet::Tablet::AtomicSwapRowSets(...) src/kudu/tablet/tablet.cc:1496:3 (libtablet.so+0x3238f9)
    ...

  Previous write of size 8 at 0x7b3400014780 by thread T12 (mutexes: write M625572878699880144, write M530715863088620288, write M525367769810683784):
    #0 kudu::tablet::RowSetMetadata::LoadFromPB(...) src/kudu/tablet/rowset_metadata.cc:77:7 (libtablet.so+0x4f9f03)
    #1 kudu::tablet::DiskRowSet::MajorCompactDeltaStoresWithColumnIds(...)::$_0::operator()() const src/kudu/tablet/diskrowset.cc:603:23 (libtablet.so+0x46eddf)
    #2 kudu::ScopedCleanup<kudu::tablet::DiskRowSet::MajorCompactDeltaStoresWithColumnIds(...)::$_0>::~ScopedCleanup() src/kudu/util/scoped_cleanup.h:51:7 (libtablet.so+0x46cc5a)
    #3 kudu::tablet::DiskRowSet::MajorCompactDeltaStoresWithColumnIds(...) src/kudu/tablet/diskrowset.cc:636:1 (libtablet.so+0x46c5c9)
    #4 kudu::tablet::DiskRowSet::MajorCompactDeltaStores(...) src/kudu/tablet/diskrowset.cc:570:10 (libtablet.so+0x46c013)
    ...

  SUMMARY: ThreadSanitizer: data race src/kudu/tablet/rowset_metadata.h:100:31 in kudu::tablet::RowSetMetadata::id() const

Change-Id: I4b09575616e754b7dbb24586293f128e361b9360
Reviewed-on: http://gerrit.cloudera.org:8080/21779
Reviewed-by: Mahesh Reddy <[email protected]>
Tested-by: Alexey Serbin <[email protected]>
Reviewed-by: Yingchun Lai <[email protected]>
asfgit pushed a commit that referenced this pull request on Feb 4, 2025

The thread pool of the DNS resolver should be shut down along with the messenger in ServerBase to prevent retrying of RPCs that failed as collateral of the shutdown process in progress. Those RPCs might be retried by invoking rpc::Proxy::RefreshDnsAndEnqueueRequest(), etc.

On a related note, I also added a guard to protect ThreadPool::tokens_ in the destructor of the ThreadPool class, as elsewhere. I also snuck in an update to call DCHECK() in a loop only when the DCHECK_IS_ON() macro evaluates to 'true'.

This addresses flakiness reported at least in one of the RemoteKsckTest scenarios (e.g., TestFilterOnNotabletTable in [1]). One of the related TSAN reports looked like below:

  RemoteKsckTest.TestFilterOnNotabletTable: WARNING: ThreadSanitizer: data race
    Read of size 8 at 0x7b54001e5118 by main thread:
      #0 std::__1::__hash_table<kudu::ThreadPoolToken*, ...>::size() const
      #1 std::__1::unordered_set<kudu::ThreadPoolToken*, ...>::size() const
      #2 kudu::ThreadPool::~ThreadPool()
      ...
      #6 kudu::kserver::KuduServer::~KuduServer()
      #7 kudu::tserver::TabletServer::~TabletServer()
      ...

    Previous write of size 8 at 0x7b54001e5118 by thread T262 ...:
      #0 std::__1::__hash_table<kudu::ThreadPoolToken*, ...>::remove(...)
      ...
      #4 kudu::ThreadPool::ReleaseToken(...)
      #5 kudu::ThreadPoolToken::~ThreadPoolToken()
      ...
      #24 kudu::consensus::LeaderElection::~LeaderElection()
      ...
      #35 kudu::rpc::Proxy::RefreshDnsAndEnqueueRequest(...)
      ...
      #41 kudu::DnsResolver::RefreshAddressesAsync()
      ...

    Thread T262 'dns-resolver [w' (tid=29102, running) created by thread T182 at:
      #0 pthread_create
      #1 kudu::Thread::StartThread(...)
      #2 kudu::Thread::Create(...)
      #3 kudu::ThreadPool::CreateThread()
      #4 kudu::ThreadPool::DoSubmit(..., kudu::ThreadPoolToken*)
      #5 kudu::ThreadPool::Submit(...)
      #6 kudu::DnsResolver::RefreshAddressesAsync(..)
      #7 kudu::rpc::Proxy::RefreshDnsAndEnqueueRequest(...)
      #8 kudu::rpc::Proxy::AsyncRequest(...)
      ...
      #15 kudu::rpc::OutboundCall::CallCallback()
      #16 kudu::rpc::OutboundCall::SetFailed()
      #17 kudu::rpc::Connection::Shutdown()
      #18 kudu::rpc::ReactorThread::ShutdownInternal()
      ...
      #25 kudu::rpc::ReactorThread::RunThread()
      ...

[1] http://dist-test.cloudera.org:8080/test_drilldown?test_name=ksck_remote-test

Change-Id: I525f1078a349dbd2926938bb4fcc3e80888dfbb4
Reviewed-on: http://gerrit.cloudera.org:8080/22434
Tested-by: Alexey Serbin <[email protected]>
Reviewed-by: Abhishek Chennaka <[email protected]>
I noticed that this link was broken so I replaced it with a link to a copy provided by the other author