Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DocDB] Lock inversion in master/snapshot codepath #14677

Closed
amitanandaiyer opened this issue Oct 26, 2022 · 1 comment
Closed

[DocDB] Lock inversion in master/snapshot codepath #14677

amitanandaiyer opened this issue Oct 26, 2022 · 1 comment
Assignees
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/medium Medium priority issue

Comments

@amitanandaiyer
Copy link
Contributor

amitanandaiyer commented Oct 26, 2022

Jira Link: DB-4037

Description

Extracted from SnapshotScheduleTest_RemoveNewTablets.log
in
https://detective-gcp.dev.yugabyte.com/D20426

==================
  Cycle in lock order graph: M0 (0x7b7800099f08) => M1 (0x7b4c000621f0) => M0

  Mutex M1 acquired here while holding mutex M0 in thread T8:

    #4 yb::consensus::RaftConsensus::ConsensusState(yb::consensus::ConsensusConfigType, yb::consensus::LeaderLeaseStatus*) const ${BUILD_ROOT}/../../src/yb/consensus/raft_consensus.cc:3127:23 (libconsensus.so+0x10d239)
    #5 yb::master::CatalogManager::CheckIsLeaderAndReady() const ${BUILD_ROOT}/../../src/yb/master/catalog_manager.cc:1897:40 (libmaster.so+0x35ecdc)
    #6 yb::master::enterprise::CatalogManager::CollectEntries(google::protobuf::RepeatedPtrField<yb::master::TableIdentifierPB> const&, yb::EnumBitSet<yb::master::CollectFlag>) ${BUILD_ROOT}/../../ent/src/yb/master/catalog_manager_ent.cc:784:3 (libmaster.so+0x6aeff3)


  Mutex M0 previously acquired by the same thread here:
    #4 yb::master::CatalogManager::CheckIsLeaderAndReady() const ${BUILD_ROOT}/../../src/yb/master/catalog_manager.cc:1882:36 (libmaster.so+0x35eb46)
    #5 yb::master::enterprise::CatalogManager::CollectEntries(google::protobuf::RepeatedPtrField<yb::master::TableIdentifierPB> const&, yb::EnumBitSet<yb::master::CollectFlag>) ${BUILD_ROOT}/../../ent/src/yb/master/catalog_manager_ent.cc:784:3 (libmaster.so+0x6aeff3)
    #6 yb::master::enterprise::CatalogManager::CollectEntriesForSnapshot(google::protobuf::RepeatedPtrField<yb::master::TableIdentifierPB> const&) ${BUILD_ROOT}/../../ent/src/yb/master/catalog_manager_ent.cc:814:27 (libmaster.so+0x6af82d)


  Mutex M0 acquired here while holding mutex M1 in thread T285:
    #4 yb::master::enterprise::CatalogManager::PrepareRestore() ${BUILD_ROOT}/../../ent/src/yb/master/catalog_manager_ent.cc:7956:36 (libmaster.so+0x72b234)
    #5 non-virtual thunk to yb::master::enterprise::CatalogManager::PrepareRestore() ${BUILD_ROOT}/../../ent/src/yb/master/catalog_manager_ent.cc (libmaster.so+0x72b300)
    #6 yb::master::MasterSnapshotCoordinator::Impl::RestoreSysCatalogReplicated(long, yb::tablet::SnapshotOperation const&, yb::Status*) ${BUILD_ROOT}/../../src/yb/master/master_snapshot_coordinator.cc:501:16 (libmaster.so+0x5a0872)


  Mutex M1 previously acquired by the same thread here:

    #4 yb::consensus::RaftConsensus::UpdateMajorityReplicated(yb::consensus::MajorityReplicatedData const&, yb::OpId*, yb::OpId*) ${BUILD_ROOT}/../../src/yb/consensus/raft_consensus.cc:1344:22 (libconsensus.so+0xf6b71)
    #5 non-virtual thunk to yb::consensus::RaftConsensus::UpdateMajorityReplicated(yb::consensus::MajorityReplicatedData const&, yb::OpId*, yb::OpId*) ${BUILD_ROOT}/../../src/yb/consensus/raft_consensus.cc (libconsensus.so+0xf8144)
    #6 yb::consensus::PeerMessageQueue::NotifyObserversOfMajorityReplOpChangeTask(yb::consensus::MajorityReplicatedData const&) ${BUILD_ROOT}/../../src/yb/consensus/consensus_queue.cc:1610:15 (libconsensus.so+0xaf8d7)
    #7 yb::internal::RunnableAdapter<void (yb::consensus::PeerMessageQueue::*)(yb::consensus::MajorityReplicatedData const&)>::Run(yb::consensus::PeerMessageQueue*, yb::consensus::MajorityReplicatedData const&) ${BUILD_ROOT}/../../src/yb/gutil/bind_internal.h:205:12 (libconsensus.so+0xc086f)


==================
  Cycle in lock order graph: M0 (0x7b7800099f08) => M1 (0x7b4c000621f0) => M2 (0x7b5400000290) => M0

  Mutex M1 acquired here while holding mutex M0 in thread T8:

    #4 yb::consensus::RaftConsensus::ConsensusState(yb::consensus::ConsensusConfigType, yb::consensus::LeaderLeaseStatus*) const ${BUILD_ROOT}/../../src/yb/consensus/raft_consensus.cc:3127:23 (libconsensus.so+0x10d239)
    #5 yb::master::CatalogManager::CheckIsLeaderAndReady() const ${BUILD_ROOT}/../../src/yb/master/catalog_manager.cc:1897:40 (libmaster.so+0x35ecdc)
    #6 yb::master::enterprise::CatalogManager::CollectEntries(google::protobuf::RepeatedPtrField<yb::master::TableIdentifierPB> const&, yb::EnumBitSet<yb::master::CollectFlag>) ${BUILD_ROOT}/../../ent/src/yb/master/catalog_manager_ent.cc:784:3 (libmaster.so+0x6aeff3)


  Mutex M0 previously acquired by the same thread here:
    #4 yb::master::CatalogManager::CheckIsLeaderAndReady() const ${BUILD_ROOT}/../../src/yb/master/catalog_manager.cc:1882:36 (libmaster.so+0x35eb46)
    #5 yb::master::enterprise::CatalogManager::CollectEntries(google::protobuf::RepeatedPtrField<yb::master::TableIdentifierPB> const&, yb::EnumBitSet<yb::master::CollectFlag>) ${BUILD_ROOT}/../../ent/src/yb/master/catalog_manager_ent.cc:784:3 (libmaster.so+0x6aeff3)
    #6 yb::master::enterprise::CatalogManager::CollectEntriesForSnapshot(google::protobuf::RepeatedPtrField<yb::master::TableIdentifierPB> const&) ${BUILD_ROOT}/../../ent/src/yb/master/catalog_manager_ent.cc:814:27 (libmaster.so+0x6af82d)


  Mutex M2 acquired here while holding mutex M1 in thread T274:

    #3 yb::master::MasterSnapshotCoordinator::Impl::CreateReplicated(long, yb::tablet::SnapshotOperation const&) ${BUILD_ROOT}/../../src/yb/master/master_snapshot_coordinator.cc:249:35 (libmaster.so+0x59ec15)
    #4 yb::master::MasterSnapshotCoordinator::CreateReplicated(long, yb::tablet::SnapshotOperation const&) ${BUILD_ROOT}/../../src/yb/master/master_snapshot_coordinator.cc:1944:17 (libmaster.so+0x59be5a)
    #5 yb::tablet::SnapshotOperation::Apply(long, yb::Status*) ${BUILD_ROOT}/../../src/yb/tablet/operations/snapshot_operation.cc:122:64 (libtablet.so+0x28a4ea)
    #6 yb::tablet::SnapshotOperation::DoReplicated(long, yb::Status*) ${BUILD_ROOT}/../../src/yb/tablet/operations/snapshot_operation.cc:214:3 (libtablet.so+0x28b24a)


  Mutex M1 previously acquired by the same thread here:

    #4 yb::consensus::RaftConsensus::UpdateMajorityReplicated(yb::consensus::MajorityReplicatedData const&, yb::OpId*, yb::OpId*) ${BUILD_ROOT}/../../src/yb/consensus/raft_consensus.cc:1344:22 (libconsensus.so+0xf6b71)
    #5 non-virtual thunk to yb::consensus::RaftConsensus::UpdateMajorityReplicated(yb::consensus::MajorityReplicatedData const&, yb::OpId*, yb::OpId*) ${BUILD_ROOT}/../../src/yb/consensus/raft_consensus.cc (libconsensus.so+0xf8144)
    #6 yb::consensus::PeerMessageQueue::NotifyObserversOfMajorityReplOpChangeTask(yb::consensus::MajorityReplicatedData const&) ${BUILD_ROOT}/../../src/yb/consensus/consensus_queue.cc:1610:15 (libconsensus.so+0xaf8d7)
    #7 yb::internal::RunnableAdapter<void (yb::consensus::PeerMessageQueue::*)(yb::consensus::MajorityReplicatedData const&)>::Run(yb::consensus::PeerMessageQueue*, yb::consensus::MajorityReplicatedData const&) ${BUILD_ROOT}/../../src/yb/gutil/bind_internal.h:205:12 (libconsensus.so+0xc086f)




  Mutex M0 acquired here while holding mutex M2 in thread T286:
    #4 yb::master::CatalogManager::CheckIsLeaderAndReady() const ${BUILD_ROOT}/../../src/yb/master/catalog_manager.cc:1882:36 (libmaster.so+0x35eb46)
    #5 yb::master::enterprise::CatalogManager::CollectEntries(google::protobuf::RepeatedPtrField<yb::master::TableIdentifierPB> const&, yb::EnumBitSet<yb::master::CollectFlag>) ${BUILD_ROOT}/../../ent/src/yb/master/catalog_manager_ent.cc:784:3 (libmaster.so+0x6aeff3)
    #6 yb::master::enterprise::CatalogManager::CollectEntriesForSnapshot(google::protobuf::RepeatedPtrField<yb::master::TableIdentifierPB> const&) ${BUILD_ROOT}/../../ent/src/yb/master/catalog_manager_ent.cc:814:27 (libmaster.so+0x6af82d)




  Mutex M2 previously acquired by the same thread here:

    #3 yb::master::MasterSnapshotCoordinator::Impl::SysCatalogLoaded(long) ${BUILD_ROOT}/../../src/yb/master/master_snapshot_coordinator.cc:845:41 (libmaster.so+0x5a5fef)
    #4 yb::master::MasterSnapshotCoordinator::SysCatalogLoaded(long) ${BUILD_ROOT}/../../src/yb/master/master_snapshot_coordinator.cc:2044:10 (libmaster.so+0x59c75a)
    #5 yb::master::enterprise::CatalogManager::SysCatalogLoaded(long) ${BUILD_ROOT}/../../ent/src/yb/master/catalog_manager_ent.cc:7579:25 (libmaster.so+0x724fdf)
    #6 yb::master::CatalogManager::LoadSysCatalogDataTask() ${BUILD_ROOT}/../../src/yb/master/catalog_manager.cc:1048:3 (libmaster.so+0x34cbfe)


==================
==================
  Cycle in lock order graph: M0 (0x7b4c000621f0) => M1 (0x7b7800099f08) => M0

  Mutex M1 acquired here while holding mutex M0 in thread T285:
    #4 yb::master::enterprise::CatalogManager::PrepareRestore() ${BUILD_ROOT}/../../ent/src/yb/master/catalog_manager_ent.cc:7956:36 (libmaster.so+0x72b234)
    #5 non-virtual thunk to yb::master::enterprise::CatalogManager::PrepareRestore() ${BUILD_ROOT}/../../ent/src/yb/master/catalog_manager_ent.cc (libmaster.so+0x72b300)
    #6 yb::master::MasterSnapshotCoordinator::Impl::RestoreSysCatalogReplicated(long, yb::tablet::SnapshotOperation const&, yb::Status*) ${BUILD_ROOT}/../../src/yb/master/master_snapshot_coordinator.cc:501:16 (libmaster.so+0x5a0872)


  Mutex M0 previously acquired by the same thread here:

    #4 yb::consensus::RaftConsensus::UpdateMajorityReplicated(yb::consensus::MajorityReplicatedData const&, yb::OpId*, yb::OpId*) ${BUILD_ROOT}/../../src/yb/consensus/raft_consensus.cc:1344:22 (libconsensus.so+0xf6b71)
    #5 non-virtual thunk to yb::consensus::RaftConsensus::UpdateMajorityReplicated(yb::consensus::MajorityReplicatedData const&, yb::OpId*, yb::OpId*) ${BUILD_ROOT}/../../src/yb/consensus/raft_consensus.cc (libconsensus.so+0xf8144)
    #6 yb::consensus::PeerMessageQueue::NotifyObserversOfMajorityReplOpChangeTask(yb::consensus::MajorityReplicatedData const&) ${BUILD_ROOT}/../../src/yb/consensus/consensus_queue.cc:1610:15 (libconsensus.so+0xaf8d7)
    #7 yb::internal::RunnableAdapter<void (yb::consensus::PeerMessageQueue::*)(yb::consensus::MajorityReplicatedData const&)>::Run(yb::consensus::PeerMessageQueue*, yb::consensus::MajorityReplicatedData const&) ${BUILD_ROOT}/../../src/yb/gutil/bind_internal.h:205:12 (libconsensus.so+0xc086f)




  Mutex M0 acquired here while holding mutex M1 in thread T8:

    #4 yb::consensus::RaftConsensus::ConsensusState(yb::consensus::ConsensusConfigType, yb::consensus::LeaderLeaseStatus*) const ${BUILD_ROOT}/../../src/yb/consensus/raft_consensus.cc:3127:23 (libconsensus.so+0x10d239)
    #5 yb::master::CatalogManager::CheckIsLeaderAndReady() const ${BUILD_ROOT}/../../src/yb/master/catalog_manager.cc:1897:40 (libmaster.so+0x35ecdc)
    #6 yb::master::enterprise::CatalogManager::CollectEntries(google::protobuf::RepeatedPtrField<yb::master::TableIdentifierPB> const&, yb::EnumBitSet<yb::master::CollectFlag>) ${BUILD_ROOT}/../../ent/src/yb/master/catalog_manager_ent.cc:784:3 (libmaster.so+0x6aeff3)


  Mutex M1 previously acquired by the same thread here:
    #4 yb::master::CatalogManager::CheckIsLeaderAndReady() const ${BUILD_ROOT}/../../src/yb/master/catalog_manager.cc:1882:36 (libmaster.so+0x35eb46)
    #5 yb::master::enterprise::CatalogManager::CollectEntries(google::protobuf::RepeatedPtrField<yb::master::TableIdentifierPB> const&, yb::EnumBitSet<yb::master::CollectFlag>) ${BUILD_ROOT}/../../ent/src/yb/master/catalog_manager_ent.cc:784:3 (libmaster.so+0x6aeff3)
    #6 yb::master::enterprise::CatalogManager::CollectEntriesForSnapshot(google::protobuf::RepeatedPtrField<yb::master::TableIdentifierPB> const&) ${BUILD_ROOT}/../../ent/src/yb/master/catalog_manager_ent.cc:814:27 (libmaster.so+0x6af82d)

@amitanandaiyer amitanandaiyer added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage labels Oct 26, 2022
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Oct 26, 2022
@amitanandaiyer
Copy link
Contributor Author

May also be affecting

15:34 ~/debug/tsan_failures/processed $ grep -l yb::master::enterprise::CatalogManager::PrepareRestore *.log
SnapshotScheduleTest_RemoveNewTablets.log
SnapshotScheduleTest_RestoreSchema.log
YbAdminRestoreDuringSplit_RestoreAfterOneChildRegistered.log
YbAdminRestoreDuringSplit_RestoreBeforeGetSplitKey.log
YbAdminRestoreDuringSplit_RestoreBeforeParentHidden.log
YbAdminRestoreDuringSplit_RestoreBeforeSplitOpIsApplied.log
YbAdminRestoreDuringSplit_VerifyParentNotHiddenPostRestore.log
YbAdminSnapshotScheduleFailoverTests_ClusterRestartDuringRestore.log
YbAdminSnapshotScheduleFailoverTests_ClusterRestartDuringRestoreWithReplayUncommitted.log
YbAdminSnapshotScheduleFailoverTests_LeaderFailoverDuringRestorationRpcs.log
YbAdminSnapshotScheduleFailoverTests_LeaderFailoverDuringSysCatalogRestorationPhase.log
YbAdminSnapshotScheduleTest_AlterTable.log
YbAdminSnapshotScheduleTest_Basic.log
YbAdminSnapshotScheduleTest_CatalogLoadRace.log
YbAdminSnapshotScheduleTest_ConsistentRestore.log
YbAdminSnapshotScheduleTest_ConsistentRestoreFailover.log
YbAdminSnapshotScheduleTest_ConsistentTxnRestore.log
YbAdminSnapshotScheduleTest_DDLsDuringRestore.log
YbAdminSnapshotScheduleTest_DeleteIndexOnRestore.log
YbAdminSnapshotScheduleTest_DisallowForwardRestore.log
YbAdminSnapshotScheduleTest_DropKeyspaceAndSchedule.log
YbAdminSnapshotScheduleTest_RestoreAfterSplit.log
YbAdminSnapshotScheduleTest_RestoreToAfterPreviousCompleteAt.log
YbAdminSnapshotScheduleTest_RestoreToBeforePreviousRestoreAt.log
YbAdminSnapshotScheduleTest_TestGCHiddenTables.log
YbAdminSnapshotScheduleTest_TestVerifyRestorationLogic.log
YbAdminSnapshotScheduleTest_UndeleteIndex.log
YbAdminSnapshotScheduleTest_UndeleteIndexToBackfillTime.log
YbAdminSnapshotScheduleTest_UndeleteTable.log
YbAdminSnapshotScheduleTest_UndeleteTableWithRestart.log
YbAdminSnapshotScheduleTest_VerifyRestoreWithDeletedTablets.log

@yugabyte-ci yugabyte-ci removed the status/awaiting-triage Issue awaiting triage label Oct 26, 2022
@yugabyte-ci yugabyte-ci assigned msun07 and unassigned lingamsandeep Oct 26, 2022
msun07 added a commit that referenced this issue Jan 17, 2023
Summary:
Case 1:
M0 == CatalogManager::state_lock_

M1 == ReplicaState::update_lock_ (of sys catalog's tablet peer)

Thread A:
CatalogManager::CollectEntriesForSnapshot()
  CatalogManager::CollectEntries()
    CatalogManager::CheckIsLeaderAndReady()
      std::lock_guard<simple_spinlock> l(state_lock_);  <---- M0 acquired
      RaftConsensus::ConsensusState()
        auto lock = state_->LockForRead();  <---- try to acquire M1

Thread B:
PeerMessageQueue::ResponseFromPeer()
  PeerMessageQueue::NotifyObserversOfMajorityReplOpChange()
    PeerMessageQueue::NotifyObserversOfMajorityReplOpChangeTask()
      RaftConsensus::UpdateMajorityReplicated()
        Status s = state_->LockForMajorityReplicatedIndexUpdate(&lock);  <---- M1 acquired
        ReplicaState::UpdateMajorityReplicatedUnlocked()
          ReplicaState::AdvanceCommittedOpIdUnlocked()
            ReplicaState::ApplyPendingOperationsUnlocked()
              ReplicaState::NotifyReplicationFinishedUnlocked()
                ConsensusRound::NotifyReplicationFinished()

                  OperationDriver::ReplicationFinished()
                    OperationDriver::ApplyTask()
                      Operation::Replicated()
                        SnapshotOperation::DoReplicated()
                          SnapshotOperation::Apply()
                            MasterSnapshotCoordinator::Impl::RestoreSysCatalogReplicated()  <---- TabletSnapshotOpRequestPB::RESTORE_SYS_CATALOG
                              CatalogManager::PrepareRestore()
                                std::lock_guard<simple_spinlock> l(state_lock_);  <---- try to acquire M0

Case 2:
M0 == CatalogManager::state_lock_

M1 == ReplicaState::update_lock_ (of sys catalog's tablet peer)

M2 == MasterSnapshotCoordinator::Impl::mutex_

Thread A:
CatalogManager::CollectEntriesForSnapshot()
  CatalogManager::CollectEntries()
    CatalogManager::CheckIsLeaderAndReady()
      std::lock_guard<simple_spinlock> l(state_lock_);  <---- M0 acquired
      RaftConsensus::ConsensusState()
        auto lock = state_->LockForRead();  <---- try to acquire M1

Thread B:
PeerMessageQueue::ResponseFromPeer()
  PeerMessageQueue::NotifyObserversOfMajorityReplOpChange()
    PeerMessageQueue::NotifyObserversOfMajorityReplOpChangeTask()
      RaftConsensus::UpdateMajorityReplicated()
        Status s = state_->LockForMajorityReplicatedIndexUpdate(&lock);  <---- M1 acquired
        ReplicaState::UpdateMajorityReplicatedUnlocked()
          ReplicaState::AdvanceCommittedOpIdUnlocked()
            ReplicaState::ApplyPendingOperationsUnlocked()
              ReplicaState::NotifyReplicationFinishedUnlocked()
                ConsensusRound::NotifyReplicationFinished()

                  OperationDriver::ReplicationFinished()
                    OperationDriver::ApplyTask()
                      OperationDriver::Replicated()
                        SnapshotOperation::DoReplicated()
                          SnapshotOperation::Apply()
                            MasterSnapshotCoordinator::Impl::CreateReplicated()  <---- TabletSnapshotOpRequestPB::CREATE_ON_MASTER
                              std::lock_guard<std::mutex> lock(mutex_);  <---- try to acquire M2

Thread C:
CatalogManager::LoadSysCatalogDataTask()
  MasterSnapshotCoordinator::Impl::SysCatalogLoaded()
    std::lock_guard<decltype(mutex_)> lock(mutex_);  <---- M2 acquired
    MasterSnapshotCoordinator::Impl::VerifyRestoration()
      CatalogManager::VerifyRestoredObjects()
        CatalogManager::CollectEntriesForSnapshot()
          CatalogManager::CollectEntries()
            CatalogManager::CheckIsLeaderAndReady()
              std::lock_guard<simple_spinlock> l(state_lock_);  <---- try to acquire M0

Above two deadlock cases are observed. The CheckIsLeaderAndReady function holds the state_lock_ in
the entire function scope and then tries to acquire update_lock_ to read the ConsensusStatePB, which
is problematic. Holding state_lock_ is not needed for getting the ConsensusStatePB, so this diff
narrows the lock scope for state_lock_.

Test Plan: N/A

Reviewers: amitanand, skedia, zdrudi

Reviewed By: zdrudi

Subscribers: mbautin, bogdan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D21432
amitanandaiyer added a commit that referenced this issue Feb 3, 2023
Summary:
# Enable deadlock detection in TSAN_OPTIONS
# Some minor fixes for issues found
# Issues related to cassandra-cpp-driver will be fixed with https://phabricator.dev.yugabyte.com/D20480. Until it lands and third-party is updated, we suppress errors from libcassandra
# For MasterSnapshot related issues. opened #14677. Until it lands we suppress MasterSnapshotCoordinator related errors

Test Plan: jenkins

Reviewers: mbautin, bogdan, rthallam

Reviewed By: rthallam

Subscribers: asrivastava, rsami

Differential Revision: https://phabricator.dev.yugabyte.com/D20426
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/medium Medium priority issue
Projects
None yet
Development

No branches or pull requests

4 participants