-
-
Notifications
You must be signed in to change notification settings - Fork 399
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convince me to use better-sqlite3 #262
Comments
There's a restaurant down the street from me called Yaso Tangbao. They have delicious beef soup and dumplings with a fantastic ginger vinegar sauce. Everyone that I've brought there has said it was some of the best food they've ever had. Right next to it, there's a McDonalds. Millions of people go to McDonalds every day, but Yaso Tangbao attracts a much more modest following. McDonalds is certainly more popular. Does this mean it's better? The obvious answer is "no", it just has greater brand recognition. Brand recognition, whether you like it or not, controls all of our lives, even in the open source marketplace. As another example, the popular test framework Even the In open source, being first is more important than being better. I can't think of a single example of a Node.js package that is more popular than the one it's trying to replace, even if it's clearly superior. In any case, here are the reasons to choose 1. It's bug-freeDespite 2. It's fasterSee the benchmarks results, where the performance of 3. It's simpler to useLet's say we'd like to execute an atomic transaction that performs 2 steps:
The proper implementation in const { promisify } = require('util');
const { Database } = require('node-sqlite3');
// First, we have to promisify everything, since node-sqlite3 is callback-based
const open = promisify((path, cb) => new Database(path, function (err) { cb(err, this); }));
const get = promisify(Database.prototype.get);
const run = promisify(Database.prototype.run);
const close = promisify(Database.prototype.close);
// To safely handle a transaction, we must open a separate database connection per transaction
// See: https://github.com/mapbox/node-sqlite3/issues/304#issuecomment-45280758
async function myTransaction() {
const db = await open('data.db');
let result;
try {
await run.call(db, 'BEGIN');
try {
await run.call(db, 'UPDATE ...');
result = await get.call(db, 'SELECT ...');
await run.call(db, 'COMMIT');
} catch (err) {
try { await run.call(db, 'ROLLBACK'); }
catch (_) { /* manually ignore cases where the sqlite3 automatically rolled back the transaction */ }
throw err;
}
} finally {
await close.call(db);
}
return result;
} The proper implementation in const db = require('better-sqlite3')('data.db');
const myTransaction = db.transaction(() => {
db.prepare('UPDATE ...').run();
return db.prepare('SELECT ...').get();
}); Not to mention, transactions in 4. FeaturesIn In In In None of the features listed above are possible in Conclusion
|
I did some comparison to node-sqlite3 by myself for 10.000 selects better-sqlite3 won but most of the time you are not using sqlite for selects (I guess) the problem I had with node-sqlite3 is that it uses too much memory and doesn't release enough; I don't think is a memory leak but it bothers me and this is the reason I was looking into better-sqlite3 to solve my problem; the sync only architecture is a huge trade-off for me JoshuaWise should implement async or he should contribute to node-sqlite3 to share his improvements |
@alin1771 With You should test how fast your queries are (e.g., 0.1 milliseconds), and decide how much server latency is an unacceptable amount for your users (let's say, 200 milliseconds). In this example, you'd be able to handle 2000 concurrent requests before your server latency becomes unacceptable. You could switch to Because of SQLite3's serialized nature, it's better to optimize for speed, not concurrency (which is limited to 1). @alin1771 Without seeing your benchmark code, I cannot asses why you're not getting the performance observed by the benchmark in this repo, which is publicly reviewable by all. I'd start by making sure you turn on WAL mode, which is a necessity for any web server using SQLite3. db.pragma('journal_mode = WAL'); |
with fs.writeFile not only you can write multiple files but you are also not blocking the main thread and this is the most important; my tests looks something like this: for me, both are fast enough I could say but better-sqlite3 blocks the main thread; I like your code, is very clean and light. |
@alin1771 If keeping the main thread unblocked is very important to you, you may be interested in this PR which enables |
One issue I see with your benchmark is that in So a proper usage would actually be: const stmt = db.prepare('UPDATE [test] SET value=? WHERE id=?');
for (i = 0; i < 1000; i++) stmt.run(Math.random(), i); |
you can do the same with node-sqlite3, they support prepare as well; I will take a look into that PR but from what I know worker_threads are not as performant as "native" async; but sounds good anyway thank you @JoshuaWise |
You can still reuse prepared statements by keeping a global object of all prepared statements used throughout the application. If you don't like globals, you could move it into its own module.
Oops, I was wrong, it looks like you can reuse prepared statements in Something else you might be interested in is that |
Not to argue, but hoping to better grasp the design/intent:
Seems like this philosophy pays off when most HTTP requests need to hit the database and the database I/O is the lion's share of the time required to process each request. Seems like this philosophy would not pay off if, say, half your HTTP requests need to do no disk I/O at all. In this latter case, presumably, all the time spent by sqlite3 waiting on the disk could have been spent moving along the HTTP requests that do not access the disk. So probably WAL mode tilts the equation in favor of ignoring concurrency (if there are enough updates in the mix) and SSD instead of spinning disk likewise raises the odds that concurrency will not help enough to care. Both factors that reduce the time wasted making V8 wait for disk I/O.
I think here you're saying you can reuse an sqlite connection instead. I think I understand that sqlite3 does not separate transaction context from connection. I mean, I don't get any kind of separate transaction object that would allow me to be conducting two separate transactions at the same time over the same connection. So if someone did not follow your advice to avoid letting a transaction spread across one event loop tick, they would either need to serialize access to one sqlite3 connection, or else open a new connection to handle the next transaction that comes along before the first connection is available to be reused.
But the at-most-one writer does not block readers in WAL mode, right? In which case, it seems like not being able to overlap a read-only connection with another connection reserved for updates might leave significant performance gains on the table, depending on the mix of requests (recalling the UW research that suggests server throughput under load can be dramatically improved by giving priority to shorter-running requests). Ultimately, the disk itself is serialized, but we typically try to keep it loaded up with requests so it's never waiting on us. Did I get any of that right? |
That's correct.
Again, correct.
This is correct. However, it's important to note that while node-sqlite3 (the popular async library) does provide a way to serialize access to a connection, it's not sufficient for writing transactions because it doesn't provide a way to catch errors and rollback the transaction. That's essentially the topic of this thread.
I've tried running benchmarks where async connections (node-sqlite3) were trying to perform read requests concurrently with an overlapping writer connection, but I couldn't get it to perform as well as this library. Perhaps you could try to replicate the UW research by implementing some kind of priority queue, where each prepared statement was ranked with a performance cost. Maybe it'll work, maybe it won't. Either way, no async library I'm aware of (e.g., node-sqlite3) is implemented like that currently. |
I want to chime in to this conversation, because better-sqlite3 has done a lot to make the lives of many beginner nodejs developers simpler, even if they might not know at face value that it does. I'm the developer of a small-ish database wrapper called Enmap, which has evolved a lot since I started writing it years ago. It started with level-db, then I moved to having multiple plugins for the backend (sqlite being only one of them). For the last year, however, I've locked myself into using better-sqlite3 because Enmap's main focus is to be simple to use for beginners, and I've found that having the ability to make synchronous requests to the database has helped tremendously in this endeavour. Every other database connector required the uses of promises, or the loading of the entire data set in memory, in order for me to offer the features that Enmap offers. For example, I could not synchronously auto-fetch data when it's required if I was using the better-sqlite3 being the only sync connector has helped Enmap grow to the point where I'm expecting to reach a million total downloads by the end of the summer. And when I'm not using my own module, I'll automatically default to better-sqlite3 for rapid prototyping because it's faster, simpler, and requires less setup. To counter the shameless self-promotion, I'll add that as far as I'm concerned, the fact that my main "competitor" quick.db has switched to better-sqlite3 in order to remove promises was probably due in part to my own switch, but it's undeniably popular, having itself reached a million downloads this very week. Both of us have a great deal of gratitude owed to Joshua for our popularity and continued success. |
2 Cents, since I stumbled across this: I'm using better-sqlite3 professionally since 3 years and never been disappointed. The blocking was never a problem to us, but that's because SQLite is very fast (if handled properly; PRAGMA page_size / temp_store / journal_mode / synchronous anyone? 😀 ) and the task at hand was focused build and reading those gb's of sqlite files. I can see why it's still common to use sync calls for simplicity and I didn't dig deep enough to get an idea of the current involved overhead for async writes. I can also see why for other tasks, blocking the main thread can be an issue. Node.js already handles async writes in its own thread and the best CPU is the one with 100% load all the time. The performance overhead regarding async dropped alot since the initial choice for sync, so with async/await, which increases simplicity regarding asynchroniousity and the performance increases for promises, there might be a chance to go this async route nowadays without losing too much performance or even adding some. After all, not blocking the main thread might parallel the preparation of upcoming queries while writing the current one. Have there been PR's for direct asynchroniousity or only for worker threads? Either way, better-sqlite3 never failed us. We have some complex things going on and this package has never been an issue, but a huge benefit regarding performance. |
I am looking back into
However, I still have problem with Electron, where |
|
I'd like to provide an esoteric, but possibly relevant to some, reason to use better-sqlite3: when I tried to use it and sqlite3 on a FreeBSD system, better-sqlite3 successfully set itself up and compiled the C-language sqlite3 stuff for me, where sqlite3 failed to compile and thus couldn't install. YMMV, of course! |
can we do sqlite migration calls with this? |
I did a migration script ( import fs from 'fs'
import { Database } from 'better-sqlite3'
import { join } from 'path'
type Migration = ReturnType<typeof parseMigrationFile>
type MigrateParams = Partial<ReturnType<typeof getDefaultParams>>
/**
* Returns a parsed migration file if the file corresponds to the filename
* schema of xxx.yyyyy.sql, where `x` is a number, and `y` anything.
* @param root root directory
* @param filename potential sql file
*/
const parseMigrationFile = (root: string, filename: string) => {
const [, id, name] = filename.match(/^(\d+).(.*?)\.sql$/) || []
if (!name) {
return null
}
const [up, down] = fs
.readFileSync(join(root, filename), 'utf8')
.split(/^--\s+?down\b/im)
.map((part) => part.replace(/^--.*?$/gm, '').trim())
return { id: Number(id), name, up, down }
}
/**
* Loops through files in a directory and extracts the migration SQL files
* @param migrations_directory a directory containing SQL files
*/
const readMigrations = (migrations_directory: string) =>
fs
.readdirSync(migrations_directory)
.reduce((acc, file) => {
const props = parseMigrationFile(migrations_directory, file)
return props ? [...acc, props] : acc
}, [] as ReturnType<typeof parseMigrationFile>[])
.sort((a, b) => a.id - b.id)
/**
* Create a database table for migrations meta data if it doesn't exist
* @param db the database connection
* @param table the table name
*/
const createMigrationsTable = (db: Database, table: string) => {
db.transaction(() => {
db.prepare(
`CREATE TABLE IF NOT EXISTS "${table}" (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
up TEXT NOT NULL,
down TEXT NOT NULL
)`
).run()
})()
}
/**
* Reads the migrations metatable to see if it is valid.
* Undoes migrations that exist only in the database but not in files.
* Starts with the latest migration and climbs up. Assumes files are
* sequential, so if it only removes superfluous migrations, leaving the
* rest as they are
* @param db database connection
* @param table table name
* @param is_valid a function that determines if a migration is valid. If it is,
* then the database is now in sync, and the function ends
*/
const syncDatabaseMigrations = (
db: Database,
table: string,
is_valid: (id: number) => boolean
) => {
// Get the list of already applied migrations
let dbMigrations: Migration[] = db
.prepare(`SELECT id, name, up, down FROM "${table}" ORDER BY id ASC`)
.all()
const remove_dbMigration = (id: number) =>
(dbMigrations = dbMigrations.filter((migration) => migration.id !== id))
const remove_migration = db.prepare(`DELETE FROM "${table}" WHERE id = ?`)
const reversedDbMigrations = dbMigrations.slice().reverse()
for (const { id, down } of reversedDbMigrations) {
if (is_valid(id)) {
break
}
db.transaction(() => {
db.exec(down)
remove_migration.run(id)
remove_dbMigration(id)
})()
}
const lastMigrationId = dbMigrations.length
? dbMigrations[dbMigrations.length - 1].id
: 0
return lastMigrationId
}
/**
* Returns default parameters for the migrate function
*/
const getDefaultParams = () => ({
table: '_meta_migrations',
migrationsDirectory: 'migrations',
reapplyLast: true
})
/**
*
* @param db a database connection
* @param config optional configuration to specify migrations directory and/or
* metadata migration table name
*/
export const migrate = (db: Database, config: MigrateParams = {}) => {
const { table, migrationsDirectory, reapplyLast } = {
...getDefaultParams(),
...config
}
const migrations = readMigrations(migrationsDirectory)
const lastFileMigrationId = migrations[migrations.length - 1]?.id
const migration_exists = (id: number) => {
return (
migrations.some((m) => m.id === id) &&
(!reapplyLast || id !== lastFileMigrationId)
)
}
createMigrationsTable(db, table)
const lastDbMigrationId = syncDatabaseMigrations(db, table, migration_exists)
// Apply pending migrations
const add_migration = db.prepare(
`INSERT INTO "${table}" (id, name, up, down) VALUES (?, ?, ?, ?)`
)
for (const { id, name, up, down } of migrations) {
if (id > lastDbMigrationId) {
db.transaction(() => {
db.exec(up)
add_migration.run(id, name, up, down)
})()
}
}
} |
Can better-sqlite3 open, read and write to a .db file created by sqlite3? |
It should absolutely be able to do that, yes, since sqlite3 is a standard file format and better-sqlite3 is just the "client" to that file :) |
when will it offer encryption option? |
- Which provides transaction / rollback wrapper (I heared talk on IRC that we do everyting transactional... we didn't do anything transactional) - Is synchronous, so we can test the state of the DB beforehand (instead of after we already told everyone it's ok, like wtf why did we even check for schema version when we don't do anything about missmatchs??!) - much much faster - provides more functionality we might use in the future, you never know ¯\_(ツ)_/¯ - less bugs (according to better-sqlite3 team) Also read: https://github.com/JoshuaWise/better-sqlite3#why-should-i-use-this-instead-of-node-sqlite3 WiseLibs/better-sqlite3#262 (comment)
- Which provides transaction / rollback wrapper (I heared talk on IRC that we do everyting transactional... we didn't do anything transactional) - Is synchronous, so we can test the state of the DB beforehand (instead of after we already told everyone it's ok, like wtf why did we even check for schema version when we don't do anything about missmatchs??!) - much much faster - provides more functionality we might use in the future, you never know ¯\_(ツ)_/¯ - less bugs (according to better-sqlite3 team) Also read: https://github.com/JoshuaWise/better-sqlite3#why-should-i-use-this-instead-of-node-sqlite3 WiseLibs/better-sqlite3#262 (comment)
- Which provides transaction/rollback wrapper (I heard talk on IRC that we do everything transactional... we didn't do anything transactional) - Is synchronous, so we can test the state of the DB beforehand (instead of after we already told everyone it's ok, like wtf why did we even check for schema version when we don't do anything about mismatches??!) - much much faster - provides more functionality we might use in the future, you never know ¯\_(ツ)_/¯ - fewer bugs (according to better-sqlite3 team) Also read: https://github.com/JoshuaWise/better-sqlite3#why-should-i-use-this-instead-of-node-sqlite3 WiseLibs/better-sqlite3#262 (comment)
I did a quick test for my use case comparing both. Not sure if it's relevant but I have the query ( |
Hello @JoshuaWise, When you made the initial design decision that you decided to do it synchronously rather than asynchronously, what were your reasons? I particularly can't understand why to do something synchronous if at the time you started doing the project (correct me if I'm wrong) we already had ES6 with async await. The only sensible and rational reason that seems to exist, in my opinion, is to avoid callback problems, right? Is there a reason for this design decision? I saw some comments here in the thread above that if blocking the main thread is a problem, we have this or that option. So, how blocking the main thread might not be a problem for someone in an application that is continually receiving events and might start processing a new event while waiting for an intermediate OI from a previous event. |
I imagine that all the benchmarks don't take into account the time lost by blocking the main thread, right? |
Already discussed here: #181 (comment) Summary: SQLite can only do one transaction at a time, so doing it in another thread doesn't enable parallelism or anything like that, which you would expect from a normal file API. So doing that thread management is just wasted and it actually slows you down. Also, keeping a transaction open across async (
The benchmarks measure throughput of SQLite operations. The concept of "blocking" being bad is only relevant when the same thread is doing many different things, which it isn't in a benchmark. Anyways, you can use SQLite in a worker thread if blocking becomes an issue. However, in many cases it won't even be an issue because, with proper indexes, most transactions can complete in microseconds (except for full table scans). |
Asynchronous database I/O is hell even for the biggest companies like Facebook. I discovered a bug by coincidence when I was signing up on Facebook. I had a broken mouse that double clicks instead of clicking in a very small interval. When I clicked sign up, double Facebook accounts with the same email were created. Facebook doesn't allow to create a duplicate account with same email but that's what happened. If you want asynchronous database I/O fine go ahead but don't be surprised by the amount of bugs that could occur for you. |
Wonderful explanation, @JoshuaWise !! I'd forward it to the many when they wonder 🚀 |
Sharing some thoughts, in case it's useful or also in case I am not understanding something :)
Mistakes just happen... if for some reason a query takes a few seconds to complete, it means that the problem has to get caught first and then the query fixed and moved to a worker. I am evaluating drizzle-orm which seems a nice option. I see that also in those cases "async/await" is present but better-sqlite can't be used. |
Prisma uses node-sqlite3 as their SQLite engine and people are complaining about it already. prisma/prisma#2825 |
Thank you @JoshuaWise for this package. Apart from what was said already regarding sync vs. async and performance, the API design is just great. It really feels like an extension of the SQLite design/philosophy into the JS/Node world. The documentation is also very good. |
You could have one thread per SQLite database, it would make sense. It would enable parallelism with other I/O, and with JS processing on the main thread. It is really a bad practice to implement an I/O on the main thread. |
i built a pos system with it and i was lazy so i didn't have pagination and said i will add it in an update. that was 5 month ago and i may not need to add pagination for two more years after finding that the slowest pc can fetch 10M orders in 500ms |
I'm sorry, but this just doesn't make any sense to me. I have operations that take several seconds due to the volume of some tables. This is running on the client side. With everything being synchronous, the process is blocked for that time while it has a million other things to do, including responding to user events, since this data modification is not crucial for the application startup. Sure, there are a million ways to solve this, but what absolutely makes no sense to me is why this wasn't solved exactly the way JS does it, without blocking the process and delivering everything out of the box? Why leave the overhead of having to solve optimization details to the developer while the runtime delivers the ready solution? Excuse me, but it makes absolutely no sense to me. If we were to spin up workers or anything that "emulates" threads in Node, I'd rather use Python, which has much better quality libs. What's the point of using JS if we need to program in a blocking manner? I don't understand the reasoning behind this choice. This would only deliver more performance, more efficiency, more ease. For those who don't want concurrency issues, the language delivers everything ready to avoid race conditions. Anyway, just venting. |
For me, the answer to the question in this thread is, no, I'm not convinced to use better-sqlite3. For my case, the fact that it's blocking inevitably makes it absurdly less performant, and to make it performant again, I have to waste time messing with things I shouldn't need to. So, my recommendation is, use a non-blocking library; it'll certainly perform better if you're not willing to spend time dealing with these details to fine-tune things. |
That's the architecture choice of JavaScript, non-blocking. If you need to do something blocking, in my view, unless you have an absurdly clear and robust reason, you're probably doing something wrong. If you're worried about race conditions, just use await and you're good to go (for same context, for other contexts is not a race conditions, is only other access to other parts of DB); none of the arguments above make any sense. |
I used it in electron js for a simple pos system so blocking the thread for 100ms isn't a problem considering the performance of my competition. There is no right tool for every case, i was just saying that it's very fast. |
There are plenty of plug-and-play Worker Thread libraries if you need to run queries outside the main thread. A solution like that will always perform faster than using the non-blocking Your only argument seems to be "Node.js does it this way and therefore it's the best and only way", which is simply not true. If your only use-case is to run very slow read-only queries at a low throughput, then yeah, the |
There's really no merit in venting on the internet about the design choices of libraries given to the public for free, which you're not paying for, and you're not forced to use. If you don't like it, you can go make your own. Constructive criticism backed by real-world data is always welcome. |
Don't get me wrong, @JoshuaWise , I'm not a native speaker of the language. When I searched for the term, I thought in my native language. The idea I wanted to convey here was actually the opposite. Of course, the library is yours, and you've done and continue to do an excellent service to the community by making it available to everyone. What I meant was precisely in that sense, saying that I didn't want to criticize you or your design decisions here because, after all, as you rightly said, it's your project, and no one is obligated to use it. I've been using another one for over a year because for me, being non-blocking means real performance. I don't make so many queries that any performance gain with this or that library means anything; hardly an application running on the client side makes enough queries to justify it, but if there are heavy queries, being blocking will always cost a lot. I think you've done and continue to do good work here, including following this thread and being willing to respond even when you wouldn't have to. I and everyone here are grateful for that. I just came here and wanted to express this design decision of mine for my projects, so I can better inform others, not to suggest that you change anything or criticize anything. If for your project and so many other projects, being blocking doesn't cause problems, then great, but it's not the case for everyone. I think using a solution like the one you recommended to make it non-blocking goes against the whole purpose of using a library, which is to save time and keep complexity low. If it's about making workarounds, I don't think it's worth using a lib. Simply an Again, I just wanted to leave this recorded here for indexing purposes and maybe help someone in the future with these points. I'm sorry if you felt attacked or insulted by anything; I can tell you absolutely that it wasn't my intention, and thank you for the excellent work in the library! |
Maybe a good idea to re-evaluate the benchmark from 2020 with todays result: https://github.com/WiseLibs/better-sqlite3/blob/master/docs/benchmark.md#results. I'm just curious. I understand that doing a benchmark is not easy. And doing a good benchmark is harder. And executing a fair benchmark is even more difficult. But yeah... |
For my current situation, I am about to switch to better-sqlite3. Here's why: I am updating an old personal Electron application that uses SQLite3. I was strongly tempted to convert from an ipcRenderer.send request/ipcMain.send response model to ipcRenderer.invoke/ipcMain.handle - it fits the logic better. BUT, that is problematic with node-sqlite3. Its database operations (get, all, etc.) are asynchronous and do not return promises. I found no obvious way to have the handler await the completion of the node-sqlite3 operation rather than return prematurely. (The send/send model doesn't have this issue, since the ipcMain.send responses are in the completion callback.) So, better-sqlite3's synchronous model fits the Electron ipcRenderer.invoke/ipcMain.handle model well, while node-sqlite3's particular asynchronous model would be a force fit. Your mileage may vary. |
Reviving this discussion once again, taking advantage of the performance topic, to ask if running the db in memory provides extra performance by avoiding some disk IO bottleneck? Theoretically, yes, but has anyone ever tried it? I suppose this is more of a SQLite question than a |
Even if the disk i/o is serialized (which need not be true for SSD drives), there are multiple opportunities for parallelism higher up in the stack. Caches, indexes, computations can all be parallelized.
But that doesn't address the possibility of this library improving in performance even further due to async. The overhead of switching contexts in the event loop is extremely negligible compared to the rest of the operations (caching, indexing, disk i/o). So, I agree with @ronburk, that there might be a lot being left on the table with a sync API.
You could achieve the same ergonomics from an async API using Async Context. Some of those features are relatively new, so compatibility for very old versions of node might be a problem. Supporting an async API maybe extra work, and of course it is the choice of the maintainers whether they want to invest in it. But I think there are benefits to it. |
Another use case for parallelism: my server currently handles several dozen SQLite files. Some of them contain files such as images and PDFs. I'm reluctant to go the synchronous way for all I/O. |
What makes me hesitant to use better-sqlite3 is because node-sqlite3 more popular, if this is really
"better" why is the other party more popular?
The text was updated successfully, but these errors were encountered: