Hi, all
I have discovered code that can cause a deadlock:
void
BatchWriter::store(std::shared_ptr<NodeObject> const& object)
{
    std::unique_lock<decltype(mWriteMutex)> sl(mWriteMutex);

    // If the batch has reached its limit, we wait
    // until the batch writer is finished
    while (mWriteSet.size() >= batchWriteLimitSize)
        mWriteCondition.wait(sl);

    mWriteSet.push_back(object);

    if (!mWritePending)
    {
        mWritePending = true;
        m_scheduler.scheduleTask(*this);
    }
}
void
NodeStoreScheduler::scheduleTask(NodeStore::Task& task)
{
    if (jobQueue_.isStopped())
        return;

    if (!jobQueue_.addJob(jtWRITE, "NodeObject::store", [&task](Job&) {
            task.performScheduledTask();
        }))
    {
        // Job not added, presumably because we're shutting down.
        // Recover by executing the task synchronously.
        task.performScheduledTask();
    }
}
If the mWriteSet.size() >= batchWriteLimitSize condition is met, every job in the job queue may end up waiting on this condition variable: the last successful call to m_scheduler.scheduleTask only added a job to the queue, and there is no guarantee that the job will execute promptly.
If all worker threads are blocked (that is the scenario I encountered: every job was either waiting for the InboundLedger::update lock or waiting on the mWriteCondition condition variable), then performScheduledTask can never run, and the server deadlocks.
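To make the interaction concrete, here is a minimal, self-contained sketch of the same pattern (this is not rippled code; the tiny pool, addJob, flush, and batchLimit are made-up stand-ins for the job queue, scheduleTask, the write task, and batchWriteLimitSize). Producers block on a condition variable when the batch is full, while the flush task that would wake them can only run on one of those same worker threads; once every worker is blocked inside a producer, the queued flush can never be picked up.

#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

int main()
{
    constexpr std::size_t workers = 2;     // like [workers] = 9, just smaller
    constexpr std::size_t batchLimit = 4;  // like batchWriteLimitSize

    std::mutex m;
    std::condition_variable cv;
    std::vector<int> batch;
    bool flushQueued = false;

    std::deque<std::function<void()>> jobs;  // the "job queue"
    std::mutex jm;
    std::condition_variable jcv;

    auto addJob = [&](std::function<void()> f) {
        std::lock_guard<std::mutex> lk(jm);
        jobs.push_back(std::move(f));
        jcv.notify_one();
    };

    // The flush task: drains the batch and wakes blocked producers.
    std::function<void()> flush = [&] {
        std::lock_guard<std::mutex> lk(m);
        batch.clear();
        flushQueued = false;
        cv.notify_all();
    };

    // A producer job, shaped like BatchWriter::store().
    auto store = [&](int v) {
        std::unique_lock<std::mutex> lk(m);
        while (batch.size() >= batchLimit)  // blocks the worker thread itself
            cv.wait(lk);
        batch.push_back(v);
        if (!flushQueued)
        {
            flushQueued = true;
            addJob(flush);  // flush still needs a *free* worker to run
        }
    };

    // Queue more producers than there are workers. Once `workers` producers
    // are blocked in cv.wait(), the queued flush job can never be picked up:
    // every worker is stuck inside store(). That is the deadlock.
    for (int i = 0; i < 16; ++i)
        addJob([&, i] { store(i); });

    std::vector<std::thread> pool;
    for (std::size_t i = 0; i < workers; ++i)
        pool.emplace_back([&] {
            for (;;)
            {
                std::function<void()> job;
                {
                    std::unique_lock<std::mutex> lk(jm);
                    jcv.wait(lk, [&] { return !jobs.empty(); });
                    job = std::move(jobs.front());
                    jobs.pop_front();
                }
                job();  // hangs once all workers are blocked in store()
            }
        });

    for (auto& t : pool)
        t.join();  // never returns once the pool deadlocks
}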
I am working on rippled 1.6, and I have reviewed the related code on rippled:develop; the problem seems to still be there.
Has this problem been resolved? I think the condition variable usage (mWriteCondition) could be removed; is there a better solution?
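One direction I was considering (just an untested sketch, not a patch): stop blocking job-queue threads on mWriteCondition entirely, and when the batch is full, flush it synchronously on the calling thread instead of waiting for a queued job. Here writeBatch() is a hypothetical stand-in for whatever performScheduledTask ultimately calls to persist a batch, and I have not checked this against the rest of BatchWriter:

void
BatchWriter::store(std::shared_ptr<NodeObject> const& object)
{
    std::unique_lock<decltype(mWriteMutex)> sl(mWriteMutex);

    // Instead of waiting on mWriteCondition for a queued job to drain the
    // batch, take the full batch and write it from the calling thread, so no
    // worker ever sleeps waiting for another job-queue job to run.
    if (mWriteSet.size() >= batchWriteLimitSize)
    {
        decltype(mWriteSet) full;
        std::swap(full, mWriteSet);
        sl.unlock();
        writeBatch(full);  // hypothetical synchronous write path
        sl.lock();
    }

    mWriteSet.push_back(object);

    if (!mWritePending)
    {
        mWritePending = true;
        m_scheduler.scheduleTask(*this);
    }
}

This would keep the backpressure (a producer that overruns the limit pays the write cost itself) without any job-queue thread depending on another queued job to make progress.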
[workers] is configured to 9, and there were exactly 9 jobs running when the deadlock occurred.
Some details on the setup: the server has a 16-core CPU at 3.0 GHz, with [workers] set to only 9. I think the deadlock occurred because the CPU is far faster than the disk I/O, so many nodes could not be written to disk immediately and the write batch kept filling up.