I never stop being impressed at how often people will jump to odd, unsupportable conclusions like, "using MySQL will make this thing faster".
I've seen it so many times over the years regarding users and email configurations. I can't count the number of times I've dropped into someone's badly behaving mail configuration and found they had MySQL hosting the users, and explained it was for "performance" reasons. Somehow they didn't grasp that /etc/passwd fits entirely in memory, and the map files Postfix uses for various lookups and stuff are already a database (just one specifically designed for the task at hand) and also fit entirely in memory. Putting that MySQL layer in there is disastrous if performance matters; it is orders of magnitude slower for any case I've seen...still plenty fast for most cases, but it's ridiculous that this idea gets cargo-culted around that if you store your mail users in MySQL your mail server will be faster.
A little knowledge is a dangerous thing, is what I'm trying to say, and people who know MySQL is "fast" may not know enough to know that it's not the right tool for the job in a lot of cases...and is probably slower for many use cases. I'm pretty confident this is one of those cases. SQLite is wicked fast on small data sets, and being smaller means more of it will fit in memory; I can't think of any way MySQL could be a more performant choice for this workload.
Also, I don't even want to try to imagine shipping an installable desktop application for non-technical users that relies on MySQL!
> Also, I don't even want to try to imagine shipping an installable desktop application for non-technical users that relies on MySQL!
It's really not that hard to imagine. There is an embedded library version of MySQL called libmysqld which is made for this very purpose. Of course as you point out it's quite overkill for something that would work perfectly well with Berkeley DBs.
> I never stop being impressed at how often people will jump to odd, unsupportable, conclusions like, "using MySQL will make this thing faster".
That is a good opportunity to ask questions and learn about the person, their knowledge and their use case.
Yeah sometimes the answer is they don't know how stuff works underneath and are just following a blog post or a trend or heard someone talk about it at meetup.
Programmers love programming tools / frameworks / languages. There are so many new shiny things out there. New ones pop up every month. New language, new build system, new db, new UI library, new message queue, etc. Interestingly, over the years, lots of those have had great performance improvements. Maybe they took advantage of a new kernel API or some new algorithm and so on. That feeds the hype -- put new shiny things in and performance doubles.
So people have been trained to expect all the new things to just magically be better. But like I said, you have to probe and find out the reason they propose the change; only then can you see whether it is justified or not (one of the reasonable responses could be "I've actually measured with a production load and it does look better").
It's mostly about concurrency: SQLite serialises all writes. For an embedded database in a single-user application that is OK, but for a web application that may have a lot of concurrent writes going on, it's a performance disaster area. Note that for good performance on a write-heavy web application even MySQL can be a problem unless you use InnoDB, as MyISAM has the same write table-locking behaviour.
So many times I see people handing out a root password to edit /etc/passwd. Sure, it may be crazy for performance, but even a trivial number of users and aliases is more easily managed with a remote MySQL client. Everyone who has used Postfix in a production environment knows the pain otherwise; the rest, feel free to vote down.
> Also, I don't even want to try to imagine shipping an installable desktop application for non-technical users that relies on MySQL!
The Amarok music player on the KDE Linux desktop environment (at least on version 3.5) kept the music collection metadata in a database. You could choose SQLite, MySQL, or Postgres during installation, if I remember correctly. It worked OK because of the package system.
I guess if you used separately installed databases there would be some conflicts, though.
Not faster perhaps, but with SQLite3 I had locking issues, and after I switched to MySQL I don't. It's really that simple. The app was used by a small office of 40 people at first, then started being used by close to a hundred people, and that's when the locking errors began.
Wow, the article is such a breath of fresh air, primarily because the author demonstrates common sense.
He (they?) picked SQLite for all the correct reasons:
- best tool for the job for their situation;
- write-light and read-heavy;
- zero configuration;
- easy to embed;
- understanding that optimizing queries by far gives the best performance in the shortest amount of time.
As an aside, I'm currently using SQLite for Bacula and Postfix, and it's a joy to use; the only drawback I have found so far is the lack of REGEXP REPLACE in the SQL dialect which the database supports (a regex extension must be loaded with .load /path/to/lib/libpcre.so, but it is not part of the language). I used the Oracle RDBMS for my PowerDNS deployments, but in retrospect, the way PowerDNS works, SQLite would have been an even better match. All in all, it is great to read that someone picked it for all the correct reasons, rather than some fashion trend, as is often the case in the computer industry.
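For what it's worth, the missing REGEXP support can also be supplied from application code instead of a loadable extension. A minimal sketch with Python's sqlite3 module (the table and function names here are illustrative, not from Bacula or Postfix):

```python
import re
import sqlite3

con = sqlite3.connect(":memory:")

# SQLite's REGEXP operator has no built-in implementation; registering a
# user-defined function named "regexp" supplies one for this connection.
con.create_function("regexp", 2, lambda pat, s: re.search(pat, s) is not None)

# A REGEXP REPLACE analogue can be registered the same way.
con.create_function("regexp_replace", 3,
                    lambda s, pat, repl: re.sub(pat, repl, s))

con.execute("CREATE TABLE aliases (addr TEXT)")
con.execute("INSERT INTO aliases VALUES ('user@example.com')")

row = con.execute(
    "SELECT regexp_replace(addr, '@.*$', '@example.net') "
    "FROM aliases WHERE addr REGEXP '^user'").fetchone()
print(row[0])  # user@example.net
```

The registered functions only exist on that connection, so every process opening the database needs to register them itself.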
Premature optimization is evil, but preemptive optimization is necessary unless you want to paint yourself into a corner. I realized this after implementing a bitcoin full node.
In my bitcoin implementation, as an experiment, I tried storing the blockchain in SQLite, Postgres, and LevelDB. I gathered up a bunch of data from the first ~200k blocks of the blockchain and benchmarked all three databases. I queried for something like 30,000 utxos out of a set of a couple million. What took 300-400ms in LevelDB took 1.6 seconds in Postgres (in the REPL; in my actual node it would have taken longer due to deserialization of the utxos). What took 1.6 seconds in Postgres took over 30 seconds in SQLite.
Now, you can tell me I did the benchmarks wrong, and "oh, if you just did this it would be faster!", but 30+ seconds is slower to an absolutely insane level. Needless to say, I went the key-value store route, but I was still astounded at how slow sqlite was once it got a few million records in the database.
I actually like SQLite, but when you know you're going to be dealing with 70 GB of data and over 10 million records, preemptive optimization is key. If I were the author, I would consider switching to Postgres if over 500k-1m records are expected. That being said, if they're partial to SQLite, SQLightning (https://github.com/LMDB/sqlightning) looks pretty interesting (SQLite with an LMDB backend).
edit: To clarify, these weren't particularly scientific benchmarks. This was me timing a very specific query to get an idea of the level of data management I was up against. Don't take my word for it.
Can you share what the table schema is? What the queries are? And what indices were built for the Postgres and SQLite tables?
What operations were you doing in your benchmark? What was SQLite actually doing (CPU?, disk I/O?) while taking 30 seconds to finish? This would be a lot more useful and less FUDdy with more detail.
fsync()ing a couple hundred thousand individual INSERTs isn't fast.
> What took 300-400ms in LevelDB took 1.6 seconds in Postgres (in the REPL; in my actual node it would have taken longer due to deserialization of the utxos). What took 1.6 seconds in Postgres took over 30 seconds in SQLite.
I'm sorry to say but you're almost certainly doing something wrong then.
With relational databases, you absolutely can see subsecond to 30+ second ranges for the same query on the same data, if you don't have proper indexing or stats on your tables.
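As a toy illustration of how much indexing alone can swing things (Python's sqlite3; the table and column names are invented, not the actual benchmark schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE utxos (txid TEXT, value INTEGER)")
con.executemany("INSERT INTO utxos VALUES (?, ?)",
                ((f"tx{i}", i) for i in range(200_000)))

# Without an index, every lookup scans all 200k rows.
plan_before = con.execute(
    "EXPLAIN QUERY PLAN SELECT value FROM utxos WHERE txid = 'tx42'"
).fetchall()
print(plan_before[0][-1])  # e.g. "SCAN utxos" -- a full table scan

# With an index (and fresh planner statistics) it becomes a B-tree probe.
con.execute("CREATE INDEX utxos_txid ON utxos (txid)")
con.execute("ANALYZE")
plan_after = con.execute(
    "EXPLAIN QUERY PLAN SELECT value FROM utxos WHERE txid = 'tx42'"
).fetchall()
print(plan_after[0][-1])  # e.g. "SEARCH utxos USING INDEX utxos_txid (txid=?)"
```

EXPLAIN QUERY PLAN is the quickest way to check whether a slow query is scanning instead of seeking.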
https://engineering.microsoft.com/2015/10/29/sqlite-in-windo...
Between that, and packages readily available on most Linux and BSD distros out there (and, in most cases, installed by default), it's well on its way to becoming a de facto standard system API for relational storage.
It's amusing how Microsoft's different departments keep trying to kill off each other. They spent so much time trying to shove SQL Server Express down everyone's throat, and now everyone gets SQLite included instead.
So I am speaking about embeddable DBs here. Konstantin Knizhnik has implemented an impressive set of various embedded DBs: http://garret.ru/databases.html
For example, his POST++ has a direct mapping to C++ classes, so if you use C++ then you don't need any ORM.
In my Sciter[1] Engine I am using his DyBase library [3] as a built-in persistence for Sciter's script [2] (JavaScript++).
With DyBase in script you have features similar to MongoDB (a noSQL free-form DB) but without any need for an ORM and DAL - you can declare some root object as persistable and access those data trees as if they were JavaScript objects. The engine pumps objects from the DB into memory when they are needed:
var storage = Storage.open(...);
var dataRoot = storage.root; // all things inside are persistable
dataRoot.topics = []; // flat persistable list
dataRoot.topics.push({ foo:1, bar:2 }); // storing object
/* create indexed collection with string keys, keys can be unique or not */
dataRoot.titles = storage.createIndex(#string);
DyBase has Python bindings too.
[1] http://sciter.com - multiplatform HTML/CSS UI Engine for Desktop and Mobile Application
[2] TIScript - http://www.codeproject.com/Articles/33662/TIScript-Language-...
[3] DyBase - http://www.garret.ru/dybase.html
I was unfamiliar with this project and assumed it was a hosted service at first. Not so, this is a local application, so an embedded database makes sense.
It took until the very last paragraph for the blog post to make that point.
FWIW, I've run SQLite in a few production sites (low 6 figure pageviews per month) and it has worked fantastically. If you understand and work with the limitations, it really is amazing how much you can get out of it.
I'm actually surprised WordPress hasn't ever moved over to it for ease of installation/deployment - WordPress and PHP seem more likely to trip over in most deployments I've seen before SQLite would.
Agreed. Not that the 3 items are not relevant, but the fact that it's a local application sounds to me like the most decisive argument in favor of an embedded DB.
SQLite also does remarkably well with recovering from all manner of power loss / crashes / worst case scenarios. We created a "power loss" rig just to test this facility for one particular system. Really SQLite's biggest weakness is concurrency, and if your app needs that in any serious amount you probably ought to look elsewhere. If you're just dealing with occasional concurrency though SQLite shouldn't be dismissed out-of-hand.
The breadth and depth to which SQLite is tested[0] is both admirable and inspiring.
It's not just "internal" tests, like unit tests or things that can be run from client C programs, but tests of all kinds of hard-to-simulate external situations like out-of-memory and power failure situations.
[0]: https://www.sqlite.org/testing.html
I gained a newfound respect for SQLite recently after reading Dan Luu's post on file consistency (http://danluu.com/file-consistency/). I had always thought of SQLite as a bit of a toy database, but having read that post I was surprised by how rigorously it appears to have been developed.
It works very well for concurrent readers and can handle one concurrent writer when set to WAL mode. Saying that it's no good at concurrency is selling it short. It's blazing fast as long as the use case doesn't require multiple concurrent writers. In fact, it will be faster than client-server databases when you aren't hitting a weak point like that, since it has no IPC overhead.
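A small sketch of that one-writer/many-readers behavior with Python's sqlite3 (WAL needs a real file rather than :memory:, hence the temp path; isolation_level=None just disables the module's implicit transactions):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "app.db")

writer = sqlite3.connect(path, isolation_level=None)
mode = writer.execute("PRAGMA journal_mode=wal").fetchone()[0]
print(mode)  # wal

writer.execute("CREATE TABLE log (msg TEXT)")
writer.execute("BEGIN")
writer.execute("INSERT INTO log VALUES ('in-flight write')")

# Under WAL the reader is not blocked by the uncommitted write above;
# it simply sees the last committed snapshot.
reader = sqlite3.connect(path, isolation_level=None)
before = reader.execute("SELECT count(*) FROM log").fetchone()[0]
print(before)  # 0

writer.execute("COMMIT")
after = reader.execute("SELECT count(*) FROM log").fetchone()[0]
print(after)  # 1
```

In rollback-journal mode that first read would instead have to wait for the writer (or fail with SQLITE_BUSY); WAL is what makes readers and the single writer coexist.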
For any database that isn't huge, a library embedded into your application is going to be faster than anything that has to communicate with a server over a socket connection. Though both execute SQL queries, SQLite is completely different from relational database servers, and it is appropriate in many places where running a full RDBMS is not. For example, you can't run MySQL or Postgres on the iPhone, but you can use SQLite.
> we have almost no indices, no principled denormalization
Sounds like an easy win. People are probably suggesting a database switch because they're finding issues with the current speed, but they're not using their current database to its full potential yet.
Is the bottleneck even the database in the first place? From the article, that doesn't seem to be the case:
> The main case when beets writes to its database is on import, and import performance is dominated by I/O to the network and to music files.
Small file I/O is universally slow, even on SSDs, and if you're hitting rate-limited MusicBrainz servers for each file, database performance is almost irrelevant.
SQLite is fine for small-scale systems. It is not a "web scale" database, but not every web site is "web scale."
SQLite does have performance limits, and will break under a certain load, but until then, it's okay.
For single user databases, like desktop applications, SQLite is awesome!
What the others bring to the table is concurrent server performance, user management, and such.
Desktop apps are like the sweet spot for sqlite. It's practically made for them.
There's nothing surprising about this, right?
I'm a long time user and lover of SQLite, since way back when. Use it in a lot of our projects (web and Win32) that require local databases for logging etc.
Sure, for larger or concurrent user access to a db we use other databases, but nothing beats the 'zero footprint' install of SQLite. I even enjoyed the fact that earlier versions of SQLite forced me to think about optimising my queries due to the lack of nested SELECTs.
SQLite still kind of reminds me of my early days in MySQL. I was recently trying to downgrade MySQL 5.7 to 5.6 after a slew of issues, which forced me to reminisce about how simple things used to be when MySQL was a lot like SQLite still is now...
Beets is an awesome program; you should really check it out if you are still among the minority of people who actually have a music collection and don't rent access from Spotify/iTunes/etc.
I'm glad to see this post; one of the reasons that I like beet so much is that everything is self-contained.
You're doing it right for your application! MySQL or PostgreSQL would most probably be slower and introduce a lot more overhead as they are client/server oriented systems. Don't listen to those armchair architects!
I don't get the point of this article. SQLite is fine, especially as an embedded database, but once you have concurrent access it starts to suffer, because it has very coarse-grained locks, so a "real" database is better for a distributed single-DB design. It's more about using the right tool for the job, and the author seems to be talking himself out of some kind of guilt for SQLite being the right tool for him.
I can't blame them. I've been a huge fan of SQLite for years. Anytime I need storage it's my default choice unless there is a specific reason to use something else.
Another nice advantage is if you are distributing something that requires a small dataset[0][1]. If I give you both the code and the data already imported into a SQLite database, then you can use the code right away, or you can dump the data to a different database very easily.
[0] https://github.com/jedberg/wordgen
[1] https://github.com/jedberg/Postcodes
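That last step -- dumping to a different database -- is nearly built in. A sketch with Python's sqlite3 (the table here is invented, not from the linked repos):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE words (word TEXT PRIMARY KEY, freq INTEGER)")
con.executemany("INSERT INTO words VALUES (?, ?)",
                [("sqlite", 3), ("mysql", 1)])
con.commit()

# iterdump() yields the same SQL script the shell's .dump command emits;
# most other databases can ingest it with only minor dialect tweaks.
script = "\n".join(con.iterdump())
print(script)
```

Piping that script into psql or the mysql client is usually all a migration takes for small datasets.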
I have used SQLite for similar use cases, but occasionally it's led to a corrupted db. I had a cron task writing to it once a day, but an issue with the scheduler led to two tasks running one day, with the latter one finishing before the former.
Of course I can add locking or something in my code, but I'd prefer to handle at a lower level — for example, have SQLite take the latest write without corrupting. I'm hoping someone has solved this problem with SQLite elegantly.
SQLite is supposed to handle multiple concurrent writes by returning SQLITE_BUSY. I'm imagining hypothetical other causes for your corruption problem, like power loss at a bad moment.
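A sketch of what that looks like from Python's sqlite3, where SQLITE_BUSY surfaces as an OperationalError (the short timeout just keeps the demo from stalling; in real code you'd set a generous busy timeout or retry):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "jobs.db")

a = sqlite3.connect(path, isolation_level=None)
a.execute("CREATE TABLE jobs (id INTEGER)")
a.execute("BEGIN IMMEDIATE")   # take the write lock and hold it
a.execute("INSERT INTO jobs VALUES (1)")

# A second writer now gets SQLITE_BUSY.
b = sqlite3.connect(path, isolation_level=None, timeout=0.1)
err = None
try:
    b.execute("BEGIN IMMEDIATE")
except sqlite3.OperationalError as e:
    err = e
print(err)  # database is locked

a.execute("COMMIT")            # first writer finishes...
b.execute("BEGIN IMMEDIATE")   # ...and the lock is available again
b.execute("COMMIT")
```

The point is that a second concurrent writer is refused rather than silently corrupting anything; corruption points to something else, like power loss or a broken filesystem.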
Haven't used SQLite in a while, but this is how I did it.
You should enable WAL (PRAGMA journal_mode=wal); I believe you need to do this on every connection.
Now, whenever you are accessing the data, do it within transactions (https://www.sqlite.org/lang_transaction.html). Generally SQLite will acquire the proper locks when needed (it minimizes the amount of locked time by deferring locks until they are needed, but you can use the IMMEDIATE or EXCLUSIVE modifiers. If you, for example, use the EXCLUSIVE modifier, only one program will be able to read/write at a time; you will generally want the DEFERRED (default) behavior, since it still provides the safety while multiple applications can access the database at the same time).
Also note that locking might not work if you keep the database on network file system.
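The DEFERRED-vs-EXCLUSIVE distinction above can be seen directly with Python's sqlite3 against a default rollback-journal database (table name invented):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shared.db")

a = sqlite3.connect(path, isolation_level=None)
a.execute("CREATE TABLE t (x INTEGER)")
a.execute("INSERT INTO t VALUES (1)")

b = sqlite3.connect(path, isolation_level=None, timeout=0.1)

# DEFERRED (the default) takes no lock until the first read or write,
# so another connection can still read.
a.execute("BEGIN DEFERRED")
visible = b.execute("SELECT count(*) FROM t").fetchone()[0]
print(visible)  # 1
a.execute("COMMIT")

# EXCLUSIVE grabs the lock up front and shuts out even readers
# (in rollback-journal mode).
a.execute("BEGIN EXCLUSIVE")
err = None
try:
    b.execute("SELECT count(*) FROM t")
except sqlite3.OperationalError as e:
    err = e
print(err)  # database is locked
a.execute("COMMIT")
```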