Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Database disk image is malformed error sometimes #30

Open
quolpr opened this issue Sep 21, 2021 · 15 comments · May be fixed by #56
Open

Database disk image is malformed error sometimes #30

quolpr opened this issue Sep 21, 2021 · 15 comments · May be fixed by #56

Comments

@quolpr
Copy link
Contributor

quolpr commented Sep 21, 2021

I use Microsoft edge(chrome-based browser), and it supports the Atomics & SharedBuffer, and sometimes I get database disk image is malformed error. However, I didn't check if it happens when multiple tabs are opened, and I am unsure under which conditions it is happening.

Here is screenshot from sentry:

image

The stack trace:

Error: database disk image is malformed
  at n.handleError(/assets/VaultDb.worker.d94830ba.js:15:55024)
  at e.step(/assets/VaultDb.worker.d94830ba.js:15:50388)
  at n.exec(/assets/VaultDb.worker.d94830ba.js:15:53846)
  at Ri.sqlExec(/assets/VaultDb.worker.d94830ba.js:15:148562)
  at Ri.getRecords(/assets/VaultDb.worker.d94830ba.js:15:148929)
  at Qo.getLinksOfNoteId(/assets/VaultDb.worker.d94830ba.js:15:127812)
  at MessagePort.n(/assets/VaultDb.worker.d94830ba.js:1:1863)
@quolpr quolpr changed the title Database disk image is malformed Database disk image is malformed error sometime Sep 21, 2021
@quolpr quolpr changed the title Database disk image is malformed error sometime Database disk image is malformed error sometimes Sep 21, 2021
@quolpr
Copy link
Contributor Author

quolpr commented Sep 21, 2021

The curious thing — after page refresh, the DB is ok and I can access to read and write 🤔

@rshigg
Copy link

rshigg commented Sep 21, 2021

I also test on Edge and I've noticed that the database doesn't persist on reload and keeps throwing the same error after the initial write. After refreshing the error goes away but the initial query to get all the rows in the table returns an empty array. I'll add a screenshot of the error soon but I don't think it's an image malformed error.

I'll try to write some example code to hopefully make it easy to reproduce.

@quolpr
Copy link
Contributor Author

quolpr commented Sep 21, 2021

@rshigg I think your issue is not connected with this. I had the same error, and here is the solution #14 (comment) . You need to set up the correct CORS

@rshigg
Copy link

rshigg commented Sep 21, 2021

@quolpr thanks for the heads up, that was exactly it! If I come across this issue I'll update here :)

@quolpr
Copy link
Contributor Author

quolpr commented Oct 12, 2021

I think the issue is connected with this #24 . Cause after an increase of page size the frequency of such error got lower

@quolpr
Copy link
Contributor Author

quolpr commented Oct 12, 2021

I also noticed that such error happens when I have intensive writing in one tab, and intensive reading in a separate tab (everything happens at the same time) 🤔

@jlongster
Copy link
Owner

It's hard to know exactly. See my comment on #24, I don't think that's a problem.

What we really need to debug these kinds of issues is detailed logging. This error will happen whenever there the bytes read back to sqlite are in an unexpected format. Which means the reads are wrong. It's a critical issue and should never happen (while the "database is locked" may happen currently)

We could also leverage property-based tests (we already have some in here) to try to simulate these situations. We need a way to record the path to this error to so we can easily debug it.

It would help to at least know which read is the one that makes this happen. Is it when it tried to upgrade a lock? Opening a db? Or just reading data for a query? etc

@cathykc
Copy link

cathykc commented Feb 25, 2022

@quolpr Getting the same error intermittently - I'm 85% sure it's when we're doing large reads. Wondering if you found a solution for this. Refreshes also fix the error for us!

@jlongster First of all thanks for this library - it's been a huge lifesaver :) The queries that we're running are pretty simple selects (screenshot below). We have a debug mode that logs the query that's running and results returning. Do you have any advice on what else we could log (and where) to track this issue down?

image

@steida
Copy link

steida commented Feb 25, 2022

@cathykc How many tabs did you have? I encountered this error only with two or more tabs.

@cathykc
Copy link

cathykc commented Feb 25, 2022

We had two tabs open!

@quolpr
Copy link
Contributor Author

quolpr commented Jun 23, 2022

Ok, I found, at least when it happens — when IndexedDB is resizing. So, if you put 100,000 records in db, and then delete them — this error will never happen till more than 100,000 records will be created again (so absurd-sql will need to create new block in IndexedDB)

@quolpr quolpr linked a pull request Jul 11, 2022 that will close this issue
@rhashimoto
Copy link

rhashimoto commented Jul 15, 2022

I was describing alternatives to the Web Locks API to someone, and when I got to the absurd-sql approach I happened on something suspicious that seems worth investigating as a cause for this bug. It's in Transaction.upgradeExclusive() which upgrades the lock from SHARED to RESERVED/EXCLUSIVE:

async upgradeExclusive() {
this.commit();
// console.log('updating transaction readwrite');
perf.count('transactions');
this.trans = this.db.transaction(['data'], 'readwrite');
this.store = this.trans.objectStore('data');
this.lockType = LOCK_TYPES.EXCLUSIVE;
let cached0 = this.cachedFirstBlock;
// Do a read
let block = await this.prefetchFirstBlock(500);
// TODO: when timeouts are implemented, detect timeout and return BUSY
return isSafeToWrite(block, cached0);
}

This method abandons its read lock by discarding the existing "readonly" IDBTransaction and attempts the upgrade by creating a new "readwrite" IDBTransaction. It then checks that no other connection modified the database in between those operations by verifying that certain bytes in the first block are unchanged.

Let's assume that some other connection did in fact modify the database and that first block is different. What happens?

What happens is the VFS is going to return a status code that it failed to obtain the lock. But what state is it leaving behind? The Transaction instance is going to have LOCK_TYPE.EXCLUSIVE set, it will contain a "readwrite" IDBTransaction, and it has cached the new first block data. Meanwhile, SQLite thinks the lock state is still SHARED (because the upgrade failed) and it still has the original first block data in its cache. These discrepancies make me very nervous.

For one thing, the SQLite locking model is now broken. As far as SQLite knows, it is still in SHARED mode and can continue to read from the database. But the IDBTransaction it was using to read (and counting on for consistency) is now gone. If SQLite does happen to make calls that go to a new IDBTransaction, eventually it will get very confused.

For another thing, if that Transaction instance state isn't reset and remains active, then the next attempt to upgrade the lock will succeed because it thinks it is already in the upgraded state. I don't know if this retry actually happens, but if it does then that opens the door to corruption.

Now whether or not this is in fact the cause of this particular bug, I don't think this upgrade approach of dropping read-only, acquiring read-write, and checking for changes can be made to work with the SQLite locking model. This is because (1) I don't think there is a way to preserve the SHARED lock and (2) there is no protocol to tell SQLite that not only did it not upgrade the lock, it also lost its original lock.

Anyway, I'm just posting this in case it's helpful to someone. I don't plan on pursuing it further myself (not currently an absurd-sql user). If upgrade-or-sustain semantics (as opposed to upgrade-or-unlock) can't be implemented immediately (which I believe will require another Worker and SharedArrayBuffer), patching the code to allow only non-shared access with a "readwrite" IDBTransaction should be relatively easy.

@quolpr
Copy link
Contributor Author

quolpr commented Jul 16, 2022

@rhashimoto good discover 👍

@sampbarrow
Copy link

Getting this too. Used to happen occasionally but would be fixed by a refresh. Lately though, it seems to permanently corrupt the database in question (can't open it ever again without getting the error - have to delete the IDB database and make a new one). Only happens with multiple tabs.

@mrmurphy
Copy link

mrmurphy commented Aug 25, 2022

We just ran into this with our pre-release product. While one tab is doing a lot of writes, the other reads for a while just fine, then something changes, and all reads after that point throw this error until the page is refreshed.

It would help to at least know which read is the one that makes this happen. Is it when it tried to upgrade a lock? Opening a db? Or just reading data for a query? etc

In my case, I think it's just reading data for a query. It's possible that a write operation happened, though, that triggered a bad state, and that the read queries are throwing this error after some silent corrupting action.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants