Replication with TLS is broken because TlsSocket preempts within AsyncWrite #4347

Open
romange opened this issue Dec 20, 2024 · 2 comments
Labels
bug Something isn't working

Comments

Collaborator

romange commented Dec 20, 2024

See the stack trace below:

* Logs will be written to the first available of the following paths:
/var/log/dragonfly/dragonfly.*
* For the available flags type dragonfly [--help | --helpfull]
* Documentation can be found at: https://www.dragonflydb.io/docs
F20241220 08:09:02.501420 1867835 scheduler.cc:204] Check failed: !IsFiberAtomicSection() Preempting inside of atomic section
*** Check failure stack trace: ***
    @     0x5fa8f623eb05  google::LogMessage::Fail()
    @     0x5fa8f623ea4b  google::LogMessage::SendToLog()
    @     0x5fa8f623e1fe  google::LogMessage::Flush()
    @     0x5fa8f62422f6  google::LogMessageFatal::~LogMessageFatal()
    @     0x5fa8f5e06c5c  util::fb2::detail::Scheduler::Preempt()
    @     0x5fa8f4fc4888  util::fb2::detail::FiberInterface::Suspend()
    @     0x5fa8f5e88edb  util::fb2::FiberCall::Get()
    @     0x5fa8f5e95625  util::fb2::UringSocket::WriteSome()
    @     0x5fa8f5deb704  io::Sink::WriteSome()
    @     0x5fa8f5dea65b  util::tls::TlsSocket::HandleUpstreamWrite()
    @     0x5fa8f5de9b75  util::tls::TlsSocket::MaybeSendOutput()
    @     0x5fa8f5de981e  util::tls::TlsSocket::SendBuffer()
    @     0x5fa8f5de8eac  util::tls::TlsSocket::WriteSome()
    @     0x5fa8f5de99f0  util::tls::TlsSocket::AsyncWriteSome()
    @     0x5fa8f5eb5ab2  io::AsyncSink::AsyncWrite()
    @     0x5fa8f57bd749  dfly::JournalStreamer::AsyncWrite()
    @     0x5fa8f57bdaa0  dfly::JournalStreamer::Write()
    @     0x5fa8f57bce40  _ZZN4dfly15JournalStreamer5StartEPN4util15FiberSocketBaseEbENKUlRKNS_7journal11JournalItemEbE_clES7_b
    @     0x5fa8f57c16fd  _ZSt13__invoke_implIvRZN4dfly15JournalStreamer5StartEPN4util15FiberSocketBaseEbEUlRKNS0_7journal11JournalItemEbE_JS8_bEET_St14__invoke_otherOT0_DpOT1_
    @     0x5fa8f57c1109  _ZSt10__invoke_rIvRZN4dfly15JournalStreamer5StartEPN4util15FiberSocketBaseEbEUlRKNS0_7journal11JournalItemEbE_JS8_bEENSt9enable_ifIX16is_invocable_r_vIT_T0_DpT1_EESC_E4typeEOSD_DpOSE_
    @     0x5fa8f57c0bcd  _ZNSt17_Function_handlerIFvRKN4dfly7journal11JournalItemEbEZNS0_15JournalStreamer5StartEPN4util15FiberSocketBaseEbEUlS4_bE_E9_M_invokeERKSt9_Any_dataS4_Ob
    @     0x5fa8f577c05e  std::function<>::operator()()
    @     0x5fa8f577a4e2  dfly::journal::JournalSlice::AddLogRecord()
    @     0x5fa8f5772401  dfly::journal::Journal::RecordEntry()
    @     0x5fa8f57af3e5  dfly::RecordExpiry()
    @     0x5fa8f5701ced  dfly::DbSlice::ExpireIfNeeded()
    @     0x5fa8f5702e14  _ZZN4dfly7DbSlice17DeleteExpiredStepERKNS_9DbContextEjENKUlNS_9DashTableINS_10CompactObjENS_12ExpirePeriodENS_6detail17ExpireTablePolicyEE8IteratorILb0ELb0EEEE_clESB_
    @     0x5fa8f5707105  _ZZN4dfly9DashTableINS_10CompactObjENS_12ExpirePeriodENS_6detail17ExpireTablePolicyEE8TraverseIRZNS_7DbSlice17DeleteExpiredStepERKNS_9DbContextEjEUlNS5_8IteratorILb0ELb0EEEE_EENS3_10DashCursorESF_OT_ENKUlRKNS3_7SegmentIS1_S2_S4_E8IteratorEE_clESM_
    @     0x5fa8f570b3a9  _ZZNK4dfly6detail7SegmentINS_10CompactObjENS_12ExpirePeriodENS0_17ExpireTablePolicyEE21TraverseLogicalBucketIZNS_9DashTableIS2_S3_S4_E8TraverseIRZNS_7DbSlice17DeleteExpiredStepERKNS_9DbContextEjEUlNS8_8IteratorILb0ELb0EEEE_EENS0_10DashCursorESI_OT_EUlRKNS5_8IteratorEE_RZNS9_ISH_EESI_SI_SK_EUlRKSJ_E_EEbhOT0_SK_ENKUlPSJ_hbE0_clIKNS5_6BucketEEEDaSV_hb
    @     0x5fa8f570b433  _ZNK4dfly6detail7SegmentINS_10CompactObjENS_12ExpirePeriodENS0_17ExpireTablePolicyEE6Bucket15ForEachSlotImplIPKS6_ZNKS5_21TraverseLogicalBucketIZNS_9DashTableIS2_S3_S4_E8TraverseIRZNS_7DbSlice17DeleteExpiredStepERKNS_9DbContextEjEUlNSC_8IteratorILb0ELb0EEEE_EENS0_10DashCursorESM_OT_EUlRKNS5_8IteratorEE_RZNSD_ISL_EESM_SM_SO_EUlRKSN_E_EEbhOT0_SO_EUlPSN_hbE0_EEvSN_SY_
    @     0x5fa8f5709b4a  _ZNK4dfly6detail7SegmentINS_10CompactObjENS_12ExpirePeriodENS0_17ExpireTablePolicyEE6Bucket11ForEachSlotIZNKS5_21TraverseLogicalBucketIZNS_9DashTableIS2_S3_S4_E8TraverseIRZNS_7DbSlice17DeleteExpiredStepERKNS_9DbContextEjEUlNSA_8IteratorILb0ELb0EEEE_EENS0_10DashCursorESK_OT_EUlRKNS5_8IteratorEE_RZNSB_ISJ_EESK_SK_SM_EUlRKSL_E_EEbhOT0_SM_EUlPSL_hbE0_EEvSM_
    @     0x5fa8f5708b0d  _ZNK4dfly6detail7SegmentINS_10CompactObjENS_12ExpirePeriodENS0_17ExpireTablePolicyEE21TraverseLogicalBucketIZNS_9DashTableIS2_S3_S4_E8TraverseIRZNS_7DbSlice17DeleteExpiredStepERKNS_9DbContextEjEUlNS8_8IteratorILb0ELb0EEEE_EENS0_10DashCursorESI_OT_EUlRKNS5_8IteratorEE_RZNS9_ISH_EESI_SI_SK_EUlRKSJ_E_EEbhOT0_SK_
*** SIGABRT received at time=1734682142 on cpu 15 ***
PC: @     0x7428bf89eb1c  (unknown)  pthread_kill
    @     0x5fa8f62c8280         64  absl::lts_20240116::WriteFailureInfo()
    @     0x5fa8f62c84fc         96  absl::lts_20240116::AbslFailureSignalHandler()
    @     0x7428bf845320       3360  (unknown)
    @     0x7428bf84526e         32  raise
    @     0x7428bf8288ff        192  abort
    @     0x5fa8f6247c38        176  google::DumpStackTraceAndExit()
    @     0x5fa8f623eb05         16  google::LogMessage::Fail()
    @     0x5fa8f623ea4b        160  google::LogMessage::SendToLog()
    @     0x5fa8f623e1fe         80  google::LogMessage::Flush()
    @     0x5fa8f62422f6         32  google::LogMessageFatal::~LogMessageFatal()
    @     0x5fa8f5e06c5c        384  util::fb2::detail::Scheduler::Preempt()
    @     0x5fa8f4fc4888         48  util::fb2::detail::FiberInterface::Suspend()
    @     0x5fa8f5e88edb        176  util::fb2::FiberCall::Get()
    @     0x5fa8f5e95625        384  util::fb2::UringSocket::WriteSome()
    @     0x5fa8f5deb704         80  io::Sink::WriteSome()
    @     0x5fa8f5dea65b        256  util::tls::TlsSocket::HandleUpstreamWrite()
    @     0x5fa8f5de9b75         80  util::tls::TlsSocket::MaybeSendOutput()
    @     0x5fa8f5de981e        304  util::tls::TlsSocket::SendBuffer()
    @     0x5fa8f5de8eac       1680  util::tls::TlsSocket::WriteSome()
    @     0x5fa8f5de99f0        128  util::tls::TlsSocket::AsyncWriteSome()
    @     0x5fa8f5eb5ab2        144  io::AsyncSink::AsyncWrite()
    @     0x5fa8f57bd749        384  dfly::JournalStreamer::AsyncWrite()
    @     0x5fa8f57bdaa0        208  dfly::JournalStreamer::Write()
    @     0x5fa8f57bce40        272  dfly::JournalStreamer::Start()::{lambda()#1}::operator()()
    @     0x5fa8f57c16fd         64  std::__invoke_impl<>()
    @     0x5fa8f57c1109         64  std::__invoke_r<>()
    @     0x5fa8f57c0bcd         64  std::_Function_handler<>::_M_invoke()
    @     0x5fa8f577c05e         64  std::function<>::operator()()
    @     0x5fa8f577a4e2        368  dfly::journal::JournalSlice::AddLogRecord()
    @     0x5fa8f5772401        240  dfly::journal::Journal::RecordEntry()
    @     0x5fa8f57af3e5        336  dfly::RecordExpiry()
    @     0x5fa8f5701ced        544  dfly::DbSlice::ExpireIfNeeded()

@romange romange added the bug Something isn't working label Dec 20, 2024
Collaborator Author

romange commented Dec 20, 2024

The bug: helio's TLS socket implements AsyncWrite as a regular, blocking write. I did not know that we assume it does not block. In any case, we lack test coverage for this flow.
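
To illustrate the invariant that the check in the trace enforces, here is a minimal model of it. None of this is helio code: `AtomicGuard`, `Preempt`, `BlockingSink` and `RecordInsideAtomicSection` are hypothetical stand-ins. The model throws instead of aborting, and it assumes every blocking write suspends the fiber, whereas a real write only suspends some of the time.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical stand-ins for illustration only; not helio's real types.
namespace preempt_sketch {

static int atomic_depth = 0;  // > 0 while the fiber is inside an atomic section

// RAII marker for a region in which the fiber must not be preempted.
struct AtomicGuard {
  AtomicGuard() { ++atomic_depth; }
  ~AtomicGuard() { --atomic_depth; }
};

// Models the scheduler check from the trace; throws instead of aborting.
static void Preempt() {
  if (atomic_depth > 0)
    throw std::logic_error("Preempting inside of atomic section");
}

// Models a sink whose "async" write is really a blocking write.
struct BlockingSink {
  std::string wire;  // bytes that reached the (pretend) socket

  void AsyncWrite(std::string_view data) {
    Preempt();  // assumption: the blocking write always suspends the fiber
    wire.append(data);
  }
};

// Journal-like caller that records an entry while preemption is forbidden.
// Returns false where the real server would hit the fatal check.
template <typename Sink>
bool RecordInsideAtomicSection(Sink& sink, std::string_view entry) {
  AtomicGuard guard;
  try {
    sink.AsyncWrite(entry);
    return true;
  } catch (const std::logic_error&) {
    return false;
  }
}

}  // namespace preempt_sketch
```

Calling `RecordInsideAtomicSection` with a `BlockingSink` returns false and writes nothing, which mirrors the crash path in the trace: the journal callback runs in a section where suspension is not allowed, and the write underneath it tries to suspend.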

@BorysTheDev
Contributor

I think we can do this in two steps:

  1. Create a quick fix that copies the bucket and performs the write with the copied data.
  2. Implement a non-blocking AsyncWrite method.
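
Both steps rest on the same idea: capture the data while preemption is forbidden, and do the write that may suspend somewhere else. A rough sketch of that idea follows. It is not the actual fix and does not use helio's API: `DeferredWriter`, `Enqueue`, `Flush`, `BlockingSink` and the atomic-section model are all hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stand-ins for illustration only; not helio's real types.
namespace deferred_sketch {

static int atomic_depth = 0;  // > 0 while the fiber is inside an atomic section

struct AtomicGuard {
  AtomicGuard() { ++atomic_depth; }
  ~AtomicGuard() { --atomic_depth; }
};

// Models the scheduler check; throws instead of aborting.
static void Preempt() {
  if (atomic_depth > 0)
    throw std::logic_error("Preempting inside of atomic section");
}

// Models a socket write that may suspend the calling fiber.
struct BlockingSink {
  std::string wire;

  void Write(std::string_view data) {
    Preempt();  // assumption: the write always suspends
    wire.append(data);
  }
};

// Copies data when asked to write and performs the real write later.
class DeferredWriter {
 public:
  explicit DeferredWriter(BlockingSink* sink) : sink_(sink) {}

  // Safe inside an atomic section: only copies bytes into an owned buffer.
  void Enqueue(std::string_view data) { pending_.append(data); }

  // Must run outside any atomic section because the write may suspend.
  void Flush() {
    if (pending_.empty()) return;
    // Detach the batch first so data enqueued during the write is kept
    // for the next flush instead of being mixed into this one.
    std::string batch = std::move(pending_);
    pending_.clear();
    sink_->Write(batch);
  }

  std::size_t pending_bytes() const { return pending_.size(); }

 private:
  BlockingSink* sink_;
  std::string pending_;
};

}  // namespace deferred_sketch
```

In this model the caller enqueues from inside the atomic section and something else, for example a dedicated flushing fiber, calls `Flush` afterwards. The cost is one extra copy per record and a buffer that can grow without bound if the sink is slow, so a real implementation would presumably also need backpressure.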
