
[A code causes deadlock] (Version: [rippled 1.6]) #4023

Open
luleigreat opened this issue Dec 10, 2021 · 2 comments
luleigreat commented Dec 10, 2021

Hi all,
I have discovered code that can cause a deadlock:

void
BatchWriter::store(std::shared_ptr<NodeObject> const& object)
{
    std::unique_lock<decltype(mWriteMutex)> sl(mWriteMutex);

    // If the batch has reached its limit, we wait
    // until the batch writer is finished
    while (mWriteSet.size() >= batchWriteLimitSize)
        mWriteCondition.wait(sl);

    mWriteSet.push_back(object);

    if (!mWritePending)
    {
        mWritePending = true;

        m_scheduler.scheduleTask(*this);
    }
}

void
NodeStoreScheduler::scheduleTask(NodeStore::Task& task)
{
    if (jobQueue_.isStopped())
        return;

    if (!jobQueue_.addJob(jtWRITE, "NodeObject::store", [&task](Job&) {
            task.performScheduledTask();
        }))
    {
        // Job not added, presumably because we're shutting down.
        // Recover by executing the task synchronously.
        task.performScheduledTask();
    }
}

If the mWriteSet.size() >= batchWriteLimitSize condition is met, all jobs in the job queue may end up waiting on this condition variable: the last successful call to m_scheduler.scheduleTask merely added a job to the queue, which does not guarantee the job will execute immediately.
If all job-queue threads are blocked (that is the scene I encountered: every job was either waiting for the InboundLedger::update lock or waiting on the mWriteCondition condition variable), then performScheduledTask can never execute, and the server deadlocks!
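
To make the failure mode concrete, here is a minimal standalone reduction (my own sketch, not rippled code; batchLimit and the worker count are arbitrary). Every pool thread is a producer blocked in store(), and the drain task that would signal the condition variable could only run on one of those same threads, so the program never terminates:

// Standalone reduction of the pattern: all workers block on the condition
// variable, and the only code that would notify it can never be scheduled.
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

int main()
{
    constexpr std::size_t batchLimit = 1;  // stand-in for batchWriteLimitSize
    std::mutex m;
    std::condition_variable cv;
    std::deque<int> batch;

    // Like BatchWriter::store: block while the batch is full.
    auto store = [&](int v) {
        std::unique_lock<std::mutex> lk(m);
        while (batch.size() >= batchLimit)
            cv.wait(lk);  // waits for a notify that no thread can deliver
        batch.push_back(v);
    };

    // All "job queue" workers are producers; the task that would drain the
    // batch and call cv.notify_all() sits behind them and never executes.
    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i)
        workers.emplace_back(store, i);

    for (auto& t : workers)
        t.join();  // never returns once all workers are waiting: deadlock
}

The first worker succeeds in storing its object; every later worker sleeps on the condition variable forever, which mirrors what I saw in the job queue.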

I am working on rippled 1.6, and I have reviewed the related code on rippled:develop; the problem seems to still be there.

Has this problem been resolved? I think the condition variable (mWriteCondition) could be removed entirely; is there a better resolution?
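
To illustrate what I mean, one possible direction (a sketch of the idea only, not a tested patch; flushBatchLocked is a hypothetical helper that writes out the current batch, releasing the lock around the disk I/O) is to have the calling thread flush the batch itself when the limit is reached, so forward progress never depends on a queued job:

void
BatchWriter::store(std::shared_ptr<NodeObject> const& object)
{
    std::unique_lock<decltype(mWriteMutex)> sl(mWriteMutex);

    // Instead of waiting on mWriteCondition, drain the batch on the
    // calling thread; progress no longer depends on a queued job running.
    if (mWriteSet.size() >= batchWriteLimitSize)
        flushBatchLocked(sl);  // hypothetical helper: writes the current
                               // batch, dropping the lock around disk I/O

    mWriteSet.push_back(object);

    if (!mWritePending)
    {
        mWritePending = true;

        m_scheduler.scheduleTask(*this);
    }
}

The trade-off is that an occasional store() caller pays the write latency itself, but no thread ever sleeps waiting for another pool job to run.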

HowardHinnant self-assigned this Dec 13, 2021

HowardHinnant (Contributor) commented
Can you reliably reproduce this deadlocked state? If so, can you include directions for doing so? Thanks.

luleigreat (Author) commented Dec 14, 2021

[workers] is configured to 9, and there were exactly 9 jobs running when the deadlock occurred.
As for directions to reproduce: the server has 16 CPU cores at 3.0 GHz, with [workers] set to only 9. I think the deadlock occurred because the CPU is far faster than the disk I/O, so many nodes could not be written to disk immediately and the write batch filled up.
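
For reference, the setting in question in my rippled.cfg (the value 9 is just my setup):

[workers]
9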
