Simplified DiskWriterQueue with blocking concurrency #2411

Merged: 2 commits, Feb 13, 2024


ltetak
Contributor

@ltetak ltetak commented Jan 25, 2024

It is relatively easy to put the DiskWriterQueue into a state where it does nothing. This is caused by the logic failing to track which _task is the current one, which leads to several problems:

  • Wait() waits on the wrong _task
  • _task is never started

For example, see #2307.

My repro steps were to run many Inserts and Deletes in parallel (to fill up the disk queue) and, every couple of seconds, call _db.Checkpoint() to force a full database lock and a Wait() invocation.

The fix is to use a much simpler blocking approach (one thread is dedicated to this). IMO it is a good tradeoff for now, and it can later be replaced with an awaitable mutex version.
Edit: I added an async version of the semaphore, which does not block the thread.
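
To illustrate the idea (this is not the code from the PR), here is a minimal sketch of a writer queue guarded by a single SemaphoreSlim. The class name SketchWriterQueue, the int "pages", and the in-memory list standing in for the disk are all assumptions made for the example. The point it shows is that Wait() no longer depends on tracking which task is current: it drains the queue itself under the gate.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch, not LiteDB's DiskWriterQueue: all names are illustrative.
public sealed class SketchWriterQueue
{
    private readonly ConcurrentQueue<int> _pending = new ConcurrentQueue<int>();
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1); // one drainer at a time
    private readonly List<int> _written = new List<int>();           // stands in for the disk

    // Only read this after Wait()/FlushAsync() has returned and producers are idle.
    public IReadOnlyList<int> Written => _written;

    // Producers only enqueue; they never decide which task is "current".
    public void Enqueue(int page) => _pending.Enqueue(page);

    // Drains the queue under the gate. Awaiting the semaphore does not block a thread.
    public async Task FlushAsync()
    {
        await _gate.WaitAsync().ConfigureAwait(false);
        try
        {
            while (_pending.TryDequeue(out var page))
            {
                _written.Add(page); // a real queue would write the page to the stream here
            }
        }
        finally
        {
            _gate.Release();
        }
    }

    // Blocking variant, as a checkpoint might call it: returns only after a full drain.
    public void Wait() => FlushAsync().GetAwaiter().GetResult();
}
```

With this shape, a caller that enqueues from many threads and then calls Wait() can rely on everything enqueued before the call having been written by the time it returns.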

@ltetak ltetak mentioned this pull request Jan 25, 2024
@mbdavid
Collaborator

mbdavid commented Feb 13, 2024

Thanks! This is old code that must be updated.

@mbdavid mbdavid merged commit 6d2a165 into litedb-org:master Feb 13, 2024
1 check failed
@jdtkw

jdtkw commented Mar 6, 2024

Thanks @ltetak - this indeed resolved our issue (#2307 - I work with @dgodwin1175), but v5.0.18 and v5.0.19 cause us to hit #2435 before we can validate this with an official build. A custom build of #2436 on top of v5.0.19 (which includes #2411) seems to indicate that we can have a stable solution.

@ltetak
Contributor Author

ltetak commented Mar 6, 2024

Hi @jdtkw, transactions (and especially the AutoTransaction class) were the next thing I wanted to look at. I know about a couple of problems there.

  1. AutoTransaction can fail when reverting the transaction. This is bad by itself, but doubly bad because it hides the original exception.
  2. Error handling in transactions is wrong, causing wrong counts. PR #2436 ("Fix #2435 Transactions are not removed in LiteDB 5.0.18") may fix it, but we need to be sure the DB is in a good state. There are a lot of "ENSURE" errors. My guess is that some transaction does not return the DB to a valid state, which breaks it.
    We run the database in single-threaded mode (we serialize every access to the db with locks), so it must be either a problem somewhere in the algorithm or some external exception. I have some evidence that external exceptions make this problem much worse, so I would start there. It means that an unstable storage medium causing random exceptions may lead to a corrupted database (which should not happen, thanks to the journal approach).
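
As a sketch of the first point (not LiteDB's AutoTransaction; SketchTransaction and its Run method are hypothetical names), one way to keep a failing rollback from hiding the original exception is to report both, with the original first:

```csharp
using System;

// Hypothetical helper, not LiteDB code: shows one way to keep the original error visible.
public static class SketchTransaction
{
    // Runs `work`; on failure runs `rollback` and rethrows without losing the original error.
    public static void Run(Action work, Action rollback)
    {
        try
        {
            work();
        }
        catch (Exception original)
        {
            try
            {
                rollback();
            }
            catch (Exception rollbackError)
            {
                // Both failures are reported; the original one comes first.
                throw new AggregateException(original, rollbackError);
            }

            throw; // rollback succeeded: rethrow the original with its stack trace
        }
    }
}
```

If the rollback succeeds, the caller sees exactly the original exception. If it also fails, the caller gets an AggregateException whose first inner exception is the original failure and whose second is the rollback failure.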
