Workers cannot share data directory (second worker crashes with RocksIOError) #299
For reference, the first worker had this log output:
Currently we are using one data directory per worker:
$ faust -A example --datadir=data/worker1 worker -l info --without-web
$ faust -A example --datadir=data/worker2 worker -l info --without-web
We use one RocksDB file for every changelog partition, and RocksDB does not allow reads/writes from multiple processes, so the second worker crashes with this error.
An optimization that we could do is to skip starting a standby for partitions that we already have a local database file for. It's not that easy to implement in practice: it would have to take into account how far the file is behind the actual changelog, so that recovery is not slow.
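To make the one-directory-per-worker setup harder to get wrong, the two commands above can be generated from a worker number. This is only a minimal sketch: `worker_command` and its `base_dir` parameter are hypothetical helper names, not part of Faust, and the only CLI flags used are the ones quoted in the comment above.

```python
import shlex
from pathlib import Path
from typing import List


def worker_command(app: str, worker_id: int, base_dir: str = "data") -> List[str]:
    """Build the faust CLI invocation for one worker.

    Hypothetical helper: each worker gets its own directory
    <base_dir>/worker<worker_id>, so no two workers open the same
    RocksDB files.
    """
    datadir = Path(base_dir) / f"worker{worker_id}"
    return [
        "faust", "-A", app,
        f"--datadir={datadir.as_posix()}",
        "worker", "-l", "info", "--without-web",
    ]


if __name__ == "__main__":
    # Print one command line per worker; run each in its own terminal.
    for worker_id in (1, 2):
        print(shlex.join(worker_command("example", worker_id)))
```

The returned list can also be passed directly to `subprocess.Popen` if the workers should be started from a script rather than by hand.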
I see. Thanks for the quick response! With better documentation, or with a unique data directory per worker as the default, I'd regard my issue as resolved, though I'd certainly welcome the optimization hinted at in your comment. Naively, I thought that this would be the case already; maybe a sentence or two in the documentation could clear that up. Thanks again!
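For readers wondering what the optimization mentioned above might look like, the decision it describes can be sketched as a small predicate. Everything here is an assumption for illustration: `should_skip_standby`, its parameters, and the `max_lag` threshold are hypothetical and do not correspond to any existing Faust API.

```python
from typing import Optional


def should_skip_standby(
    local_offset: Optional[int],
    changelog_end_offset: int,
    max_lag: int = 1000,
) -> bool:
    """Decide whether a standby for a partition could be skipped.

    local_offset: last changelog offset persisted in the local database
        file, or None if there is no local file for this partition.
    changelog_end_offset: current end offset of the changelog partition.
    max_lag: hypothetical threshold; beyond it, recovery from the local
        file would be considered too slow.
    """
    if local_offset is None:
        # No local database file: nothing to recover from.
        return False
    lag = changelog_end_offset - local_offset
    # Skip only when the local file is close enough to the changelog.
    return 0 <= lag <= max_lag


if __name__ == "__main__":
    print(should_skip_standby(9_900, 10_000))   # local file is nearly current
    print(should_skip_standby(100, 10_000))     # local file is far behind
    print(should_skip_standby(None, 10_000))    # no local file at all
```

The hard part noted in the comment above, namely choosing the threshold so that recovery stays fast, is exactly what this sketch leaves open.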
Checklist
master branch of Faust.
Steps to reproduce
The following errors can be reproduced both with the current stable version (1.4.6) and with the development version (1.5.0b1). The worker code is:
Starting the first worker with
faust -A example worker -l info --without-web
works fine. As soon as I start a second worker, I get the traceback below and the worker exits.
This is the same error as #184; however, my setup is even simpler than that. Also, there seem to be databases for both partitions:
Expected behavior
The second worker assumes responsibility for one partition and the corresponding table.
Actual behavior
The second worker cannot get the lock for the table it is supposed to be responsible for and crashes.
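The lock failure described above can be imitated without Faust or RocksDB. The sketch below is an analogy only, assuming a POSIX system: it uses Python's `fcntl.flock` on a file named `LOCK` to show that a second exclusive, non-blocking lock attempt on the same file is refused. It is not RocksDB's actual locking code, and `try_lock` and `demo` are hypothetical names.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Open path and try to take an exclusive, non-blocking lock.

    Returns the open file object on success (keep it open to hold the
    lock), or None if the lock is already held elsewhere.
    """
    handle = open(path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle


def demo():
    """Return (first_got_lock, second_got_lock) for one shared directory."""
    with tempfile.TemporaryDirectory() as datadir:
        lock_path = os.path.join(datadir, "LOCK")
        first = try_lock(lock_path)    # plays the role of the first worker
        second = try_lock(lock_path)   # plays the role of the second worker
        result = (first is not None, second is not None)
        for handle in (first, second):
            if handle is not None:
                handle.close()
        return result


if __name__ == "__main__":
    print(demo())
```

With one directory per worker, each attempt would target a different file and both would succeed.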
Full traceback
Versions