Show HN: SQLite Transaction Benchmarking Tool
128 points | seddonm1 | 1 year ago | github.com
Explanatory blog post: https://reorchestrate.com/posts/sqlite-transactions/
leononame|1 year ago
Since I don't know where else to ask, maybe this is a good place: How do async wrappers around SQLite (e.g. for node or python) work? SQLite only uses synchronous I/O if I'm not mistaken. Is it just a pretend async function with only synchronous code?
And, as a follow-up: If I have a server with, say, 100 incoming connections that will all read from the database, I've got 100 readers. No problem in WAL mode. However, I could still get congested by file I/O, right? Because every time a reader is waiting for data from disk, that thread is blocked and can't run another connection's application code. Is there any benefit to having a thread pool with a limit of more than $NUM_CPU readers?
And one more: Would you recommend actually pooling connections or just opening/closing the database for each request as needed? Could keeping a file handle open prevent SQLite from checkpointing under certain conditions?
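(On the first question: yes, in most async wrappers the SQLite calls themselves stay synchronous, and the wrapper dispatches them to a worker thread so the event loop isn't blocked. A minimal Python sketch of that mechanism; class and method names here are illustrative, not any particular library's API:)

```python
import asyncio
import sqlite3

# Sketch of how async SQLite wrappers commonly work: the underlying
# sqlite3 calls are synchronous, so the wrapper runs them on a worker
# thread and awaits the result. Illustrative names, not a real API.
class AsyncSQLite:
    def __init__(self, path):
        # check_same_thread=False because asyncio.to_thread may use
        # different worker threads; the lock serializes connection use.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._lock = asyncio.Lock()

    async def query(self, sql, params=()):
        async with self._lock:
            # "async" is real for the caller (the event loop stays
            # free), but the I/O itself still blocks a worker thread.
            return await asyncio.to_thread(self._run, sql, params)

    def _run(self, sql, params):
        return self._conn.execute(sql, params).fetchall()

async def main():
    db = AsyncSQLite(":memory:")
    await db.query("CREATE TABLE t(x)")
    await db.query("INSERT INTO t VALUES (1), (2)")
    return await db.query("SELECT sum(x) FROM t")

rows = asyncio.run(main())
print(rows)  # [(3,)]
```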
matharmin|1 year ago
When using async wrappers, a good solution is connection pooling like you mentioned - exactly the same concept as used by client->server database drivers. So you can have 5 or 10 read connections serving those 100 connections, with a statement/transaction queue to manage spikes in load. It's probably not worth having more connections than CPUs, but it depends a little on whether your queries are limited by I/O or CPU, and whether you have other delays in your transactions (each transaction requires exclusive use of one connection while it's running).
SQLite maintains an in-memory cache of recently-accessed pages of data. However, this gets cleared on all other connections whenever you write to the database, so it's not that efficient when you have high write loads. But the OS filesystem cache will still make a massive difference here - in many cases your connections will just read from the filesystem cache, which is much faster than the underlying storage.
Open connections don't block checkpointing in SQLite. The main case I'm aware of that does block it is always having one or more active transactions. I believe that's quite rare in practice unless you have really high and continuous load, but if you do then the WAL2 branch may be for you.
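(This distinction is easy to check directly with Python's sqlite3 in autocommit mode: an idle second connection doesn't block a TRUNCATE checkpoint, but a connection holding an open read transaction does. A sketch, assuming `PRAGMA wal_checkpoint` returns its usual (busy, log, checkpointed) row:)

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "wal_demo.db")

# isolation_level=None puts the sqlite3 module in autocommit mode,
# so we control transactions explicitly.
w = sqlite3.connect(path, isolation_level=None)
w.execute("PRAGMA journal_mode=WAL")
w.execute("CREATE TABLE t(x)")
w.execute("INSERT INTO t VALUES (1)")

r = sqlite3.connect(path, isolation_level=None)  # idle connection

# An idle open connection does not block a full checkpoint:
busy, _, _ = w.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
print(busy)  # 0 = checkpoint completed

# But an active read transaction pins the WAL:
r.execute("BEGIN")
r.execute("SELECT * FROM t").fetchall()  # read transaction now open
w.execute("INSERT INTO t VALUES (2)")    # new frame past reader's mark
busy2, _, _ = w.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
print(busy2)  # 1 = blocked by the reader's snapshot
r.execute("COMMIT")
```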
I feel connection pooling is much rarer in SQLite libraries than it should be. I maintain one implementation (sqlite_async for Dart), but I feel like this should be the standard for all languages with async/await support.
seddonm1|1 year ago
All good and valid questions.
1. I work mostly in Rust so I'll answer from that perspective. This library [0] uses queues to manage workload. I run a modified version [1] which creates 1 writer and n reader connections to a WAL-backed SQLite database and dispatches async transactions against them. The n readers pull work from a shared common queue.
2. Yes, there's not much you can do about file I/O, but SQLite is still a full database engine with caching. You could use this benchmarking tool to help understand where your limits would be (you can do a run against a ramdisk, then against your real storage).
3. As per #1, I keep connections open and distribute transactions across them myself. Checkpointing will only be a problem under considerable sustained write load, but you should be able to simulate your load and observe the behavior. The WAL2 branch of SQLite is intended to prevent sustained-load problems.
[0]: https://github.com/programatik29/tokio-rusqlite
[1]: https://github.com/seddonm1/s3ite/blob/0.5.0/src/database.rs
simonw|1 year ago
I have multiple threads for reads and a single dedicated thread for writes, which I send operations to via a queue. That way I avoid ever having two writes against the same connection at the same time.