-
-
Notifications
You must be signed in to change notification settings - Fork 365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[bug/v0.17.0-rc1] invalid _pragma: sqlite3: disk I/O error #3360
Comments
Interesting, thanks for investigating. We switched default SQLite driver from ModernC SQLite to WASM SQLite for this release, so it's possibly a kink related to that, though we haven't seen it ourselves yet. What operating system / architecture is your server running on? |
Just for clarity, does it work if you set |
@tsmethurst
If is set I did some more digging, I'm seeing some permission issues when sqlite is committing the shm and wal to the database. I think when I run Even weirder, It seems that if I keep a manual sqlite connection open in my terminal, everything works normally and no more issues. I'll probably patch out the WASM driver and see if the issue is still happening, but I'm almost certain the problem is from there. EDIT: |
Alright thanks for the extra info. You can do a build of GtS with the modernc driver instead of the wasm drive by calling |
So I'm really starting to think the WASM driver doesn't play as nicely with NFS as the modernc driver. I don't fully understand everything written in there but I know that Sqlite on NFS is notoriously NOT recommended specifically because NFS has such a hard time with locking.
I think this PR #2025 enabled pooling. Maybe we need to add an ENV var to turn on If I may ask, what was the reason to switch to the WASM driver in the first place? |
i think nolock is probably something we'd consider as another SQLite configuration parameter so we can set it in the dsn appropriately, and then limit the connection pool. WASM was chosen over the modernc library as the WASM toolchain is much more battle tested as a toolchain than modernc's transpile toolchain, the codebase is much nicer to work with, and the modernc implementation still suffers with a concurrency issue on connection interrupt which after many many hours i could not hope to fix without learning their specific C->Go transpiler :p |
if it's still going to be a build target some use i'll update my branch with our concurrency bug workaround we were using: gitlab.com/NyaaaWhatsUpDoc/sqlite |
I can confirm that the issue doesn't happen when using a build with: |
Do you think it would be a possibility to ship a WASM docker image and a modernc image as a temporary solution? |
just bumped the modernc version in #3367 |
The SQLite FAQ on WAL-mode states:
https://www.sqlite.org/wal.html Irrespective of the driver, using SQLite in WAL-mode, which is what we do for GtS, is generally considered unsafe. It's possibly fine with a single process like we do, but nobody is really clear about that. |
I feel like using WAL in this case it absolutely fine because of this statement:
I think the real issue with running sqlite on NFS is due to poor/unreliable locking implementations. I 100% agree that running it on NFS is definitely not ideal, but at the moment it's my only option besides using postgres. @daenney I'm open to using something other than WAL mode, it was my understanding that it was the best safe/performant journal_mode but reading up on it more, using Synchronous FULL with some other journal_mode is probably very safe too? Do you think using TRUNCATE is a viable alternative? |
So,
Seems to be working with no errors using the WASM driver. You will notice having the open_conns greater than 1 also works. I'll keep monitoring the logs to see if anything pops up. |
As far as the options you've selected goes, FULL with TRUNCATE is fine. I'm not sure if DELETE would be faster with NFS, and there is also PERSIST in case DELETE or TRUNCATE are meaningfully slower on NFS than overriding some bits in the file. It's probably not something to worry about until it becomes measurable. It's a bit tricky because in theory WAL over network, as long as every process is the same host, is fine. But it's not something that's either recommended or really supported by the authors. SQLite is meant to be run colocated with your application on the same host, so anything over the network can introduce issues it's not designed to cope with. If you're going to introduce the network anyway, a client-server database that's designed to operate in this manner is probably the best bet. Litestream provides an alternative that doesn't break the operating model SQLite assumes. The DB and app are in the same place but the DB is continuously replicated to object storage instead. You can restore from it on startup. It's also really easy to break this invariant. For example, if you're doing backups ( |
Indeed, us officially supporting a usecase of SQLite that even the authors don't officially support seems like either responsibility-wise, or maintenance-wise, this can only backfire for us. |
We should add a warning to our docs about using SQLite in such a way, probably, and mention the possibility of configuring it as mentioned here: #3360 (comment). Thanks for the detailed testing and reports @S0yKaf, that's very helpful! |
@daenney Thank you for the detailed explanation! |
Sometimes the real bugfix is the friends you make along the way 🤠 |
I think my driver on NFS is probably be even less safe that the original. Tbh, I don't know if Linux OFD locks even work over NFS (assuming this is Linux). If fact, by design, it's much easier for a networked file system to implement the POSIX locking semantics. |
I'll close this now since we've figured out what the issue was and updated our docs :) |
I opened an issue in my repo to look into this. One thing you might be able to get away with is use PS: about |
Gotosocial Version: 0.17.0-rc1+git-de72855
env: Docker
relevant config:
Upgrading from
0.16.0
to0.17.0-rc1
gotosocial is unable to connect to the database giving the errors above.After some investigation, it seems that the sqlite default is to use Synchronous 2 (FULL) . Gotosocial is able to start properly once the database is manually set to
PRAGMA Synchronous = 1;
Using
GTS_DB_SQLITE_SYNCHRONOUS: "FULL"
does not work and will throwinvalid _pragma: sqlite3: disk I/O error
However the database seem to reset itself to synchronous FULL after some time, causing the error to appear again.
I'm not really sure what's causing this, I know the sqlite3 driver was updated for this release. Seems to be related to #3067 ?
Everything was running fine in
0.16.0
without even needing to setGTS_DB_SQLITE_SYNCHRONOUS: "NORMAL"
andGTS_DB_SQLITE_JOURNAL_MODE: "WAL"
The text was updated successfully, but these errors were encountered: