[DocDB] Crash Observed in TransactionParticipant::Impl::Abort #25689

Closed
1 task done
shamanthchandra-yb opened this issue Jan 20, 2025 · 2 comments

shamanthchandra-yb commented Jan 20, 2025

Jira Link: DB-14948

Description

Version: 2.25.1.0-b203

The packed toggle off/on stress test case failed with the following crash:

* thread #1, name = 'yb-tserver', stop reason = signal SIGABRT
  * frame #0: 0x00007f4ecfd66acf libc.so.6`raise + 271
    frame #1: 0x00007f4ecfd39ea5 libc.so.6`abort + 295
    frame #2: 0x00005606b0891403 yb-server`abort_message + 195
    frame #3: 0x00005606b0890f9c yb-server`demangling_terminate_handler() + 268
    frame #4: 0x00005606b0890c66 yb-server`std::__terminate(void (*)()) + 6
    frame #5: 0x00005606b0892bab yb-server`__cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*) + 27
    frame #6: 0x00005606b0892b3f yb-server`__cxa_throw + 111
    frame #7: 0x00005606ae47cd1e yb-server`std::__1::__throw_bad_weak_ptr[abi:ue170006]() at shared_ptr.h:137:5
    frame #8: 0x00005606af954382 yb-server`yb::tablet::TransactionParticipant::Impl::Abort(yb::StronglyTypedUuid<yb::TransactionId_Tag> const&, std::__1::function<void (yb::Result<yb::TransactionStatusResult>)>) [inlined] std::__1::shared_ptr<yb::tablet::RunningTransaction>::shared_ptr[abi:ue170006]<yb::tablet::RunningTransaction, void>(this=<unavailable>, __r=std::__1::weak_ptr<yb::tablet::RunningTransaction>::element_type @ 0x0000162400000001) at shared_ptr.h:704:13
    frame #9: 0x00005606af954334 yb-server`yb::tablet::TransactionParticipant::Impl::Abort(yb::StronglyTypedUuid<yb::TransactionId_Tag> const&, std::__1::function<void (yb::Result<yb::TransactionStatusResult>)>) [inlined] std::__1::enable_shared_from_this<yb::tablet::RunningTransaction>::shared_from_this[abi:ue170006](this=0x00001624cd5da018) at shared_ptr.h:1954:17
    frame #10: 0x00005606af954334 yb-server`yb::tablet::TransactionParticipant::Impl::Abort(yb::StronglyTypedUuid<yb::TransactionId_Tag> const&, std::__1::function<void (yb::Result<yb::TransactionStatusResult>)>) [inlined] yb::tablet::RunningTransaction::Abort(this=0x00001624cd5da018, client=0x00001624fd7d7f10, callback=yb::TransactionStatusCallback @ 0x00007f4babc39700, lock=0x00007f4babc396e0)>, std::__1::unique_lock<std::__1::mutex>*) at running_transaction.cc:200:34
    frame #11: 0x00005606af953ccf yb-server`yb::tablet::TransactionParticipant::Impl::Abort(this=<unavailable>, id=<unavailable>, callback=<unavailable>)>) at transaction_participant.cc:707:45
    frame #12: 0x00005606af95d7d4 yb-server`yb::tablet::TransactionParticipant::StopActiveTxnsPriorTo(yb::HybridTime, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, yb::StronglyTypedUuid<yb::TransactionId_Tag>*) at transaction_participant.cc:1355:7
    frame #13: 0x00005606af95d3e5 yb-server`yb::tablet::TransactionParticipant::StopActiveTxnsPriorTo(this=<unavailable>, cutoff=<unavailable>, deadline=yb::CoarseTimePoint @ 0x00007f4babc398b8, exclude_txn_id=<unavailable>) at transaction_participant.cc:2700:17
    frame #14: 0x00005606afc07a35 yb-server`yb::tserver::TabletServiceAdminImpl::AlterSchema(this=0x00001624fb36c020, req=0x00001624fc146320, resp=0x00001624fc1463d0, context=<unavailable>) at tablet_service.cc:1022:65
    frame #15: 0x00005606afd264b2 yb-server`std::__1::__function::__func<yb::tserver::TabletServerAdminServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3, std::__1::allocator<yb::tserver::TabletServerAdminServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] yb::tserver::TabletServerAdminServiceIf::InitMethods(this=<unavailable>, req=<unavailable>, resp=<unavailable>, rpc_context=RpcContext @ 0x00007f4babc39c80)::$_3::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>) const::'lambda'(yb::tablet::ChangeMetadataRequestPB const*, yb::tserver::ChangeMetadataResponsePB*, yb::rpc::RpcContext)::operator()(yb::tablet::ChangeMetadataRequestPB const*, yb::tserver::ChangeMetadataResponsePB*, yb::rpc::RpcContext) const at tserver_admin.service.cc:473:9
    frame #16: 0x00005606afd2647a yb-server`std::__1::__function::__func<yb::tserver::TabletServerAdminServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3, std::__1::allocator<yb::tserver::TabletServerAdminServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) at local_call.h:126:7
    frame #17: 0x00005606afd26054 yb-server`std::__1::__function::__func<yb::tserver::TabletServerAdminServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3, std::__1::allocator<yb::tserver::TabletServerAdminServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] yb::tserver::TabletServerAdminServiceIf::InitMethods(this=<unavailable>, call=<unavailable>)::$_3::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>) const at tserver_admin.service.cc:471:7
    frame #18: 0x00005606afd25fc5 yb-server`std::__1::__function::__func<yb::tserver::TabletServerAdminServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3, std::__1::allocator<yb::tserver::TabletServerAdminServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] decltype(__f=<unavailable>, __args=<unavailable>)::$_3&>()(std::declval<std::__1::shared_ptr<yb::rpc::InboundCall>>())) std::__1::__invoke[abi:ue170006]<yb::tserver::TabletServerAdminServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3&, std::__1::shared_ptr<yb::rpc::InboundCall>>(yb::tserver::TabletServerAdminServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3&, std::__1::shared_ptr<yb::rpc::InboundCall>&&) at invoke.h:340:25
    frame #19: 0x00005606afd25fa4 yb-server`std::__1::__function::__func<yb::tserver::TabletServerAdminServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3, std::__1::allocator<yb::tserver::TabletServerAdminServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] void std::__1::__invoke_void_return_wrapper<void, true>::__call[abi:ue170006]<yb::tserver::TabletServerAdminServiceIf::InitMethods(__args=<unavailable>, __args=<unavailable>)::$_3&, std::__1::shared_ptr<yb::rpc::InboundCall>>(yb::tserver::TabletServerAdminServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3&, std::__1::shared_ptr<yb::rpc::InboundCall>&&) at invoke.h:415:5
    frame #20: 0x00005606afd25fa4 yb-server`std::__1::__function::__func<yb::tserver::TabletServerAdminServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3, std::__1::allocator<yb::tserver::TabletServerAdminServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] std::__1::__function::__alloc_func<yb::tserver::TabletServerAdminServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3, std::__1::allocator<yb::tserver::TabletServerAdminServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator(this=<unavailable>, __arg=<unavailable>)[abi:ue170006](std::__1::shared_ptr<yb::rpc::InboundCall>&&) at function.h:192:16
    frame #21: 0x00005606afd25fa4 yb-server`std::__1::__function::__func<yb::tserver::TabletServerAdminServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3, std::__1::allocator<yb::tserver::TabletServerAdminServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator(this=<unavailable>, __arg=<unavailable>)(std::__1::shared_ptr<yb::rpc::InboundCall>&&) at function.h:363:12
    frame #22: 0x00005606afd28f6f yb-server`yb::tserver::TabletServerAdminServiceIf::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) [inlined] std::__1::__function::__value_func<void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator(this=<unavailable>, __args=nullptr)[abi:ue170006](std::__1::shared_ptr<yb::rpc::InboundCall>&&) const at function.h:517:16
    frame #23: 0x00005606afd28f50 yb-server`yb::tserver::TabletServerAdminServiceIf::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) [inlined] std::__1::function<void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator(this=<unavailable>, __arg=nullptr)(std::__1::shared_ptr<yb::rpc::InboundCall>) const at function.h:1168:12
    frame #24: 0x00005606afd28f50 yb-server`yb::tserver::TabletServerAdminServiceIf::Handle(this=<unavailable>, call=<unavailable>) at tserver_admin.service.cc:411:3
    frame #25: 0x00005606af79e140 yb-server`yb::rpc::ServicePoolImpl::Handle(this=0x00001624fb924240, incoming=<unavailable>) at service_pool.cc:269:19
    frame #26: 0x00005606af6b7e7f yb-server`yb::rpc::InboundCall::InboundCallTask::Run(this=<unavailable>) at inbound_call.cc:317:13
    frame #27: 0x00005606af7ad9b3 yb-server`yb::rpc::(anonymous namespace)::Worker::Execute(this=0x000016255f97d340) at thread_pool.cc:115:15
    frame #28: 0x00005606b0094193 yb-server`yb::Thread::SuperviseThread(void*) [inlined] std::__1::__function::__value_func<void ()>::operator(this=0x00001625bc4ed6e0)[abi:ue170006]() const at function.h:517:16
    frame #29: 0x00005606b009417d yb-server`yb::Thread::SuperviseThread(void*) [inlined] std::__1::function<void ()>::operator(this=0x00001625bc4ed6e0)() const at function.h:1168:12
    frame #30: 0x00005606b009417d yb-server`yb::Thread::SuperviseThread(arg=0x00001625bc4ed680) at thread.cc:895:3
    frame #31: 0x00007f4ecfb001ca libpthread.so.0`start_thread + 234
    frame #32: 0x00007f4ecfd51e73 libc.so.6`__clone + 67

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
shamanthchandra-yb added the area/docdb (YugabyteDB core features) and status/awaiting-triage (Issue awaiting triage) labels Jan 20, 2025
yugabyte-ci added the kind/bug (This issue is a bug), priority/medium (Medium priority issue), and priority/high (High Priority) labels and removed the priority/medium (Medium priority issue) label Jan 20, 2025
@rthallamko3
Contributor

Per @basavaraj29, the root cause appears to be the following:

void RunningTransaction::Abort(client::YBClient* client,
                               TransactionStatusCallback callback,
                               std::unique_lock<std::mutex>* lock) {
  ...
  lock->unlock();    // The lock is released here; the object may now be destroyed, since we hold no shared ref.
  ...
  context_.rpcs_.RegisterAndStart(
      client::AbortTransaction(
          ..., 
          [status_tablet, self = shared_from_this(), weak_context = context_.RetainWeak()](  // Too late: the object may already be gone, so shared_from_this() can throw.
              const Status& status, const tserver::AbortTransactionResponsePB& response) {
            auto context_lock = weak_context.lock();
            if (!context_lock) {
              return;
            }
            self->AbortReceived(status_tablet, status, response);
          }),
      &abort_handle_);
}

This should instead have been:

auto shared_ref = shared_from_this();
lock->unlock();
// can safely use shared_ref below.
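
To make the ordering concrete, here is a minimal, self-contained sketch of the "take the shared ref, then unlock" pattern. It is not the YugabyteDB code: `Txn`, `Registry`, and `DemoAbortAfterRemoval` are hypothetical stand-ins for `RunningTransaction` and the transaction participant, and the concurrent removal is simulated on the same thread so the example stays deterministic.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>

// Hypothetical stand-in for RunningTransaction; not the YugabyteDB class.
struct Txn : std::enable_shared_from_this<Txn> {
  int id;
  explicit Txn(int id_) : id(id_) {}
};

// Hypothetical registry playing the role of the transaction participant.
struct Registry {
  std::mutex mutex;
  std::map<int, std::shared_ptr<Txn>> txns;

  void Remove(int id) {
    std::lock_guard<std::mutex> guard(mutex);
    txns.erase(id);
  }

  // Returns the id read through the retained reference after unlocking,
  // or -1 if the transaction is not registered.
  int Abort(int id) {
    std::unique_lock<std::mutex> lock(mutex);
    auto it = txns.find(id);
    if (it == txns.end()) return -1;
    // Take the owning reference while the mutex still protects the map.
    std::shared_ptr<Txn> self = it->second->shared_from_this();
    lock.unlock();
    // Simulate a removal that races in once the lock is released.
    Remove(id);
    // `self` keeps the object alive even though the registry dropped it.
    return self->id;
  }
};

// Runs the scenario once and returns the id observed after the removal.
int DemoAbortAfterRemoval() {
  Registry registry;
  registry.txns.emplace(7, std::make_shared<Txn>(7));
  int seen = registry.Abort(7);
  assert(registry.txns.empty());
  return seen;
}
```

Calling `DemoAbortAfterRemoval()` returns 7: the registry entry is gone by the time the id is read, but the reference taken under the lock keeps the object valid. Moving the `shared_from_this()` call below `lock.unlock()` and `Remove(id)` would leave it operating on a destroyed object.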

yugabyte-ci removed the status/awaiting-triage (Issue awaiting triage) label Jan 21, 2025
basavaraj29 added a commit that referenced this issue Jan 23, 2025
… transactions

Summary:
One of the stress tests crashed with the following trace:
```
* thread #1, name = 'yb-tserver', stop reason = signal SIGABRT
  * frame #0: 0x00007f4ecfd66acf libc.so.6`raise + 271
    frame #1: 0x00007f4ecfd39ea5 libc.so.6`abort + 295
    frame #2: 0x00005606b0891403 yb-server`abort_message + 195
    frame #3: 0x00005606b0890f9c yb-server`demangling_terminate_handler() + 268
    frame #4: 0x00005606b0890c66 yb-server`std::__terminate(void (*)()) + 6
    frame #5: 0x00005606b0892bab yb-server`__cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*) + 27
    frame #6: 0x00005606b0892b3f yb-server`__cxa_throw + 111
    frame #7: 0x00005606ae47cd1e yb-server`std::__1::__throw_bad_weak_ptr[abi:ue170006]() at shared_ptr.h:137:5
    frame #8: 0x00005606af954382 yb-server`yb::tablet::TransactionParticipant::Impl::Abort(yb::StronglyTypedUuid<yb::TransactionId_Tag> const&, std::__1::function<void (yb::Result<yb::TransactionStatusResult>)>) [inlined] std::__1::shared_ptr<yb::tablet::RunningTransaction>::shared_ptr[abi:ue170006]<yb::tablet::RunningTransaction, void>(this=<unavailable>, __r=std::__1::weak_ptr<yb::tablet::RunningTransaction>::element_type @ 0x0000162400000001) at shared_ptr.h:704:13
    frame #9: 0x00005606af954334 yb-server`yb::tablet::TransactionParticipant::Impl::Abort(yb::StronglyTypedUuid<yb::TransactionId_Tag> const&, std::__1::function<void (yb::Result<yb::TransactionStatusResult>)>) [inlined] std::__1::enable_shared_from_this<yb::tablet::RunningTransaction>::shared_from_this[abi:ue170006](this=0x00001624cd5da018) at shared_ptr.h:1954:17
    frame #10: 0x00005606af954334 yb-server`yb::tablet::TransactionParticipant::Impl::Abort(yb::StronglyTypedUuid<yb::TransactionId_Tag> const&, std::__1::function<void (yb::Result<yb::TransactionStatusResult>)>) [inlined] yb::tablet::RunningTransaction::Abort(this=0x00001624cd5da018, client=0x00001624fd7d7f10, callback=yb::TransactionStatusCallback @ 0x00007f4babc39700, lock=0x00007f4babc396e0)>, std::__1::unique_lock<std::__1::mutex>*) at running_transaction.cc:200:34
    frame #11: 0x00005606af953ccf yb-server`yb::tablet::TransactionParticipant::Impl::Abort(this=<unavailable>, id=<unavailable>, callback=<unavailable>)>) at transaction_participant.cc:707:45
    frame #12: 0x00005606af95d7d4 yb-server`yb::tablet::TransactionParticipant::StopActiveTxnsPriorTo(yb::HybridTime, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, yb::StronglyTypedUuid<yb::TransactionId_Tag>*) at transaction_participant.cc:1355:7
    frame #13: 0x00005606af95d3e5 yb-server`yb::tablet::TransactionParticipant::StopActiveTxnsPriorTo(this=<unavailable>, cutoff=<unavailable>, deadline=yb::CoarseTimePoint @ 0x00007f4babc398b8, exclude_txn_id=<unavailable>) at transaction_participant.cc:2700:17
```

The trace shows `shared_from_this()` throwing `bad_weak_ptr`, which suggests the underlying `RunningTransaction` was destroyed before the call. This can happen because we release the transaction participant's lock before creating a shared ref to the `RunningTransaction` instance we are trying to abort.
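
As a rough illustration of why `shared_from_this()` can end in `bad_weak_ptr`, the sketch below calls it at a point where no `shared_ptr` owns the object any more. It is a well-defined analogue only (C++17 or later assumed): in the actual crash the call was presumably made on an already destroyed object, which is undefined behavior and merely happened to surface as the same exception. `Node` and `SharedFromThisThrowsOnceUnowned` are hypothetical names, not part of YugabyteDB.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type for illustration; not the YugabyteDB RunningTransaction.
struct Node : std::enable_shared_from_this<Node> {
  bool* threw;
  explicit Node(bool* threw_) : threw(threw_) {}

  ~Node() {
    // When the destructor runs, the last shared_ptr owner is already gone,
    // so shared_from_this() throws std::bad_weak_ptr (C++17 and later).
    try {
      auto self = shared_from_this();
      (void)self;
    } catch (const std::bad_weak_ptr&) {
      *threw = true;
    }
  }
};

// Returns true if shared_from_this() threw once the object had no owners.
bool SharedFromThisThrowsOnceUnowned() {
  bool threw = false;
  auto node = std::make_shared<Node>(&threw);
  auto alive = node->shared_from_this();  // Fine: the object is still owned.
  assert(alive.use_count() == 2);
  alive.reset();
  node.reset();  // Last owner released; the destructor runs here.
  return threw;
}
```

The first `shared_from_this()` call succeeds because an owner exists; the one made after the last owner is released throws, which matches frames #7 to #9 of the trace.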

This diff fixes the issue by creating the shared ref before releasing the participant's mutex and using that ref afterwards.
Jira: DB-14948

Test Plan: Jenkins

Reviewers: esheng

Reviewed By: esheng

Subscribers: rthallam, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D41384
basavaraj29 added a commit that referenced this issue Jan 25, 2025
…tempting to abort transactions

Summary:
Original commit: 44a67f1 / D41384
Jira: DB-14948

Test Plan: Jenkins

Reviewers: esheng, rthallam

Reviewed By: rthallam

Subscribers: ybase, rthallam

Differential Revision: https://phorge.dev.yugabyte.com/D41455
@rthallamko3
Contributor

Reopening for the 2024.1 and 2.20 backports.

@rthallamko3 rthallamko3 reopened this Jan 28, 2025
basavaraj29 added a commit that referenced this issue Jan 30, 2025
…tempting to abort transactions

Summary:
Original commit: 44a67f1 / D41384
Jira: DB-14948

Test Plan: Jenkins

Reviewers: esheng, rthallam

Reviewed By: esheng

Subscribers: ybase, rthallam

Differential Revision: https://phorge.dev.yugabyte.com/D41530
basavaraj29 added a commit that referenced this issue Jan 30, 2025
…mpting to abort transactions

Summary:
Original commit: 44a67f1 / D41384
Jira: DB-14948

Test Plan: Jenkins

Reviewers: esheng, rthallam

Reviewed By: esheng

Subscribers: ybase, rthallam

Differential Revision: https://phorge.dev.yugabyte.com/D41531