[DocDB] Crash Observed in TransactionParticipant::Impl::Abort #25689
Labels
2.20 Backport Required
2024.1 Backport Required
2024.2 Backport Required
area/docdb (YugabyteDB core features)
kind/bug (This issue is a bug)
priority/high (High Priority)
Comments
shamanthchandra-yb added the area/docdb and status/awaiting-triage labels on Jan 20, 2025.
yugabyte-ci added the kind/bug, priority/medium, and priority/high labels and removed the priority/medium label on Jan 20, 2025.
Per @basavaraj29, it looks like the root cause is the below
This should instead have been
basavaraj29 added a commit that referenced this issue on Jan 23, 2025:

… transactions

Summary:
One of the stress tests faced a crash with the following trace

```
* thread #1, name = 'yb-tserver', stop reason = signal SIGABRT
  * frame #0: 0x00007f4ecfd66acf libc.so.6`raise + 271
    frame #1: 0x00007f4ecfd39ea5 libc.so.6`abort + 295
    frame #2: 0x00005606b0891403 yb-server`abort_message + 195
    frame #3: 0x00005606b0890f9c yb-server`demangling_terminate_handler() + 268
    frame #4: 0x00005606b0890c66 yb-server`std::__terminate(void (*)()) + 6
    frame #5: 0x00005606b0892bab yb-server`__cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*) + 27
    frame #6: 0x00005606b0892b3f yb-server`__cxa_throw + 111
    frame #7: 0x00005606ae47cd1e yb-server`std::__1::__throw_bad_weak_ptr[abi:ue170006]() at shared_ptr.h:137:5
    frame #8: 0x00005606af954382 yb-server`yb::tablet::TransactionParticipant::Impl::Abort(yb::StronglyTypedUuid<yb::TransactionId_Tag> const&, std::__1::function<void (yb::Result<yb::TransactionStatusResult>)>) [inlined] std::__1::shared_ptr<yb::tablet::RunningTransaction>::shared_ptr[abi:ue170006]<yb::tablet::RunningTransaction, void>(this=<unavailable>, __r=std::__1::weak_ptr<yb::tablet::RunningTransaction>::element_type @ 0x0000162400000001) at shared_ptr.h:704:13
    frame #9: 0x00005606af954334 yb-server`yb::tablet::TransactionParticipant::Impl::Abort(yb::StronglyTypedUuid<yb::TransactionId_Tag> const&, std::__1::function<void (yb::Result<yb::TransactionStatusResult>)>) [inlined] std::__1::enable_shared_from_this<yb::tablet::RunningTransaction>::shared_from_this[abi:ue170006](this=0x00001624cd5da018) at shared_ptr.h:1954:17
    frame #10: 0x00005606af954334 yb-server`yb::tablet::TransactionParticipant::Impl::Abort(yb::StronglyTypedUuid<yb::TransactionId_Tag> const&, std::__1::function<void (yb::Result<yb::TransactionStatusResult>)>) [inlined] yb::tablet::RunningTransaction::Abort(this=0x00001624cd5da018, client=0x00001624fd7d7f10, callback=yb::TransactionStatusCallback @ 0x00007f4babc39700, lock=0x00007f4babc396e0)>, std::__1::unique_lock<std::__1::mutex>*) at running_transaction.cc:200:34
    frame #11: 0x00005606af953ccf yb-server`yb::tablet::TransactionParticipant::Impl::Abort(this=<unavailable>, id=<unavailable>, callback=<unavailable>)>) at transaction_participant.cc:707:45
    frame #12: 0x00005606af95d7d4 yb-server`yb::tablet::TransactionParticipant::StopActiveTxnsPriorTo(yb::HybridTime, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>, yb::StronglyTypedUuid<yb::TransactionId_Tag>*) at transaction_participant.cc:1355:7
    frame #13: 0x00005606af95d3e5 yb-server`yb::tablet::TransactionParticipant::StopActiveTxnsPriorTo(this=<unavailable>, cutoff=<unavailable>, deadline=yb::CoarseTimePoint @ 0x00007f4babc398b8, exclude_txn_id=<unavailable>) at transaction_participant.cc:2700:17
```

This suggests an issue where the underlying `RunningTransaction` is being destroyed and we are trying to call `shared_from_this()` post that. This happens as we release the transaction participant's lock before creating a shared ref for the `RunningTransaction` instance we are trying to abort.

This diff fixes the issue by creating the shared_ref first before releasing the participant's mutex, and then using it later.

Jira: DB-14948
Test Plan: Jenkins
Reviewers: esheng
Reviewed By: esheng
Subscribers: rthallam, ybase
Differential Revision: https://phorge.dev.yugabyte.com/D41384
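The ordering described in the commit summary can be illustrated with a minimal, self-contained sketch. This is not the YugabyteDB code: `RunningTxn`, `Participant`, `Pin`, and `AbortSurvivesRemoval` are hypothetical stand-ins invented for illustration, and the real fix may be structured differently. The sketch only shows the general idea of taking a strong reference while the owning container's mutex is still held, so that a concurrent removal cannot destroy the object before the caller uses it.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a running transaction (not the real class).
class RunningTxn : public std::enable_shared_from_this<RunningTxn> {
 public:
  explicit RunningTxn(std::string id) : id_(std::move(id)) {}
  const std::string& id() const { return id_; }

 private:
  std::string id_;
};

// Hypothetical stand-in for the participant that owns the transactions.
class Participant {
 public:
  void Add(const std::string& id) {
    std::lock_guard<std::mutex> guard(mutex_);
    txns_[id] = std::make_shared<RunningTxn>(id);
  }

  void Remove(const std::string& id) {
    std::lock_guard<std::mutex> guard(mutex_);
    txns_.erase(id);
  }

  // Safe ordering: create the strong reference first, while the mutex is
  // held, and only then release the mutex. Returns nullptr if not found.
  std::shared_ptr<RunningTxn> Pin(const std::string& id) {
    std::unique_lock<std::mutex> lock(mutex_);
    auto it = txns_.find(id);
    if (it == txns_.end()) {
      return nullptr;
    }
    std::shared_ptr<RunningTxn> pinned = it->second->shared_from_this();
    lock.unlock();  // The object stays alive through `pinned` from here on.
    return pinned;
  }

 private:
  std::mutex mutex_;
  std::unordered_map<std::string, std::shared_ptr<RunningTxn>> txns_;
};

// Simulates a removal that happens after the lock is released but before
// the caller finishes using the transaction.
bool AbortSurvivesRemoval() {
  Participant participant;
  participant.Add("txn-1");
  std::shared_ptr<RunningTxn> pinned = participant.Pin("txn-1");
  participant.Remove("txn-1");  // The container drops its reference.
  return pinned != nullptr && pinned->id() == "txn-1" &&
         pinned.use_count() == 1;
}
```

In this sketch, the unsafe variant would release the mutex first and call `shared_from_this()` on a raw pointer afterwards; if the container's reference were dropped in between, the object could already be destroyed, which is consistent with the `__throw_bad_weak_ptr` frame in the trace above.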
basavaraj29 added a commit that referenced this issue on Jan 25, 2025:

…tempting to abort transactions

Summary:
Original commit: 44a67f1 / D41384
The crash description, stack trace, and fix are identical to those in the Jan 23, 2025 commit above.

Jira: DB-14948
Test Plan: Jenkins
Reviewers: esheng, rthallam
Reviewed By: rthallam
Subscribers: ybase, rthallam
Differential Revision: https://phorge.dev.yugabyte.com/D41455
Reactivating for 2024.1 and 2.20 backports
basavaraj29 added a commit that referenced this issue on Jan 30, 2025:

…tempting to abort transactions

Summary:
Original commit: 44a67f1 / D41384
The crash description, stack trace, and fix are identical to those in the Jan 23, 2025 commit above.

Jira: DB-14948
Test Plan: Jenkins
Reviewers: esheng, rthallam
Reviewed By: esheng
Subscribers: ybase, rthallam
Differential Revision: https://phorge.dev.yugabyte.com/D41530
basavaraj29 added a commit that referenced this issue on Jan 30, 2025:

…mpting to abort transactions

Summary:
Original commit: 44a67f1 / D41384
The crash description, stack trace, and fix are identical to those in the Jan 23, 2025 commit above.

Jira: DB-14948
Test Plan: Jenkins
Reviewers: esheng, rthallam
Reviewed By: esheng
Subscribers: ybase, rthallam
Differential Revision: https://phorge.dev.yugabyte.com/D41531
Jira Link: DB-14948
Description
Version: 2.25.1.0-b203
Packed toggle off/on stress testcase failed because of:
Issue Type
kind/bug