I like to take every possible opportunity to sing the praises of SQLite, which is one of the most widely deployed RDBMSs in the world. Not only is it open source, it's public domain, and it's an incredibly stable, well-tested piece of software.
Please note that it's not for every application. It is not a client/server architecture. It's simply a tightly optimized piece of C code that interacts with a file (or RAM) to create a lightning-fast approximation of a SQL server that is ACID compliant.
The thing I've used it for most is small, ad-hoc internal web applications. With the webserver as its "single user", it is incredibly speedy and reliable, and it holds a shockingly large amount of data (it still works pretty well when the data gets up into the terabytes).
> it's an incredibly stable, well-tested piece of software.
It also has a policy of "safe by default, fast at your own risk" (mmap is itself an example: it is disabled by default), which is the only sane attitude when working with databases.
Contrast this with "modern" databases such as MongoDB, which are crazy fast out of the box, but once you start enabling durability, synchronous updates, etc., they become slower than standard DBs. This wins benchmarks, but in the long term it causes a lot of headaches and drives users away.
> which is one of the most widely deployed RDBMSs in the world
Being inside Chrome, Android, Firefox, Skype and other places (http://www.sqlite.org/famous.html), I'd argue that SQLite is by far the most widely deployed RDBMS in the world :-)
And, BTW, I love it too.
It is also an excellent tool to have in your command-line toolkit. Rather than piping through wc, sort, uniq and friends, pipe to a file and bring it into SQLite for more extensive querying.
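As a sketch of that workflow, here is the sort | uniq -c equivalent done through Python's built-in sqlite3 module, with made-up log lines standing in for the piped data:

```python
import sqlite3

# Hypothetical log lines you might otherwise push through sort | uniq -c.
lines = ["GET /a", "GET /b", "GET /a", "POST /c", "GET /a"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (line TEXT)")
conn.executemany("INSERT INTO log VALUES (?)", [(l,) for l in lines])

# The uniq -c equivalent, plus everything else SQL can do from here
# (joins, filters, aggregates, window functions...).
for line, n in conn.execute(
        "SELECT line, COUNT(*) AS n FROM log "
        "GROUP BY line ORDER BY n DESC"):
    print(n, line)
```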
One potential pitfall of SQLite is that the datatypes declared in your CREATE TABLE statements have purely indicative value. You can, in effect, store strings or booleans in an integer column. I'd welcome a "strict mode" which would do away with weak typing and behave more like traditional SQL databases.
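A minimal demonstration of that weak typing with Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")

# The declared type is an "affinity", not a constraint: a non-numeric
# string is stored as-is in the INTEGER column.
conn.execute("INSERT INTO t VALUES (?)", ("not a number",))

value, = conn.execute("SELECT n FROM t").fetchone()
stored_type, = conn.execute("SELECT typeof(n) FROM t").fetchone()
print(repr(value), stored_type)  # 'not a number' text
```

(For what it's worth, SQLite later grew exactly this: version 3.37 added `CREATE TABLE ... STRICT`, which rejects such inserts.)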
I love using SQLite for websites. Unlike MySQL, SQLite is a well-written and predictable database, there's no daemon, users or permissions to worry about, and the runtime is built into Python!
I have to agree wholeheartedly. I find SQLite a great choice for web apps and small database-driven websites, as it's so much easier to deploy and more than fast enough for most projects. Plus it's easy to upgrade if the need arises, as the queries are for the most part standard SQL.
Fun SQLite story: I had a project that needed to do reasonably large-scale data processing (gigabytes of data) on a pretty bare-bones machine in a pretty bare-bones environment at a customer location. I had a fairly up-to-date Perl install and Notepad. The process needed to look at any single element in the data and find elements similar to it. I thought long and hard about various complex data structures, about caching bits to disk to make portions of the problem set fit into memory, and about various ways of searching the data. It was okay if the process took a couple of days to finish.
It suddenly hit me, I wonder if the sysadmins had installed DBD::SQLite...yes they had!
Why mess with all that crap when I could just use SQLite? Reasonably fast indexing, expressive search tools, and I could dump the data onto disk and use an in-memory SQLite db at the same time: caching solved. It turned weeks of agonizing work (producing fragile code) into a quick two-week coding romp. The project was a huge success and spread to a couple of other departments.
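That disk-plus-memory trick can be sketched with ATTACH in Python's built-in sqlite3 module; the table names and data below are invented for illustration:

```python
import os
import sqlite3
import tempfile

# An on-disk database stands in for the multi-gigabyte data set.
path = os.path.join(tempfile.mkdtemp(), "data.db")
disk = sqlite3.connect(path)
disk.execute("CREATE TABLE items (id INTEGER, score REAL)")
disk.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, i / 10.0) for i in range(10)])
disk.commit()

# Attach an in-memory database to the same connection as a scratch
# cache; queries can then read and join across both sides freely.
disk.execute("ATTACH DATABASE ':memory:' AS cache")
disk.execute("CREATE TABLE cache.hot AS "
             "SELECT * FROM items WHERE score >= 0.5")

hot = disk.execute("SELECT COUNT(*) FROM cache.hot").fetchone()[0]
print(hot)  # 5
```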
One day they asked if I could advise a group to build a more "formal" implementation of the system, as they were finding the results valuable. A half dozen developers and a year later, they had succeeded in building a friendlier web interface running on a proper RDBMS (Oracle 10-something) with some Hadoop goodness running the processing on 8 or 9 dedicated machines.
In the meantime, and largely because SQLite let me query into the data in more and more useful ways I hadn't foreseen, I had extended the original prototype significantly, and it was outperforming their larger-scale solution on a 3-year-old, very unimpressive desktop with the same install of Perl and SQLite. On top of the raw calculations, it was also performing some automated results analysis and spitting out ready-to-present artifacts that could be stuck in your presentation tool of choice. Guess which one the users actually used?
Soon I had a users group, regular monthly releases, and a semi-regular training class on this kooky command-line tool (since most users had never seen a command line). I've since finished that job up and moved on, and last I heard my little script using SQLite was in heavy use, the enterprise dev team was still struggling to get the performance of their solution up to the level of my original script, and they hadn't even touched the enhancements yet.
SQLite was rock solid the entire time. There were a few things I wished it could do, but I usually found a way to manage. I wasn't using it for its features; its existing was the feature. SQLite sprinkled magic on my one-off hack job and turned it into a thing that people ended up using.
DBD::SQLite remains one of my greatest successes - so glad you like it.
So many projects I have prototyped using SQLite (although for most I later upgrade to PostgreSQL). Such a great little tool, and building DBD::SQLite the way I did (including all the source) made it so easy for anyone to install.
I've had a similar story: a direct port from SQLite to PostgreSQL (localhost connection) made the PostgreSQL version six times slower at first. But then I made a connection pool opening N, 2N, or 3N connections, where N is the number of logical CPUs. The trick was to use pg_export_snapshot and make all connections go through the same place.
This way I matched SQLite, but then I made it 3-4x faster by using COPY (SELECT ...) in BINARY format (it took some time to write the binary parser), and what loaded in 3-4 seconds from SQLite loads in 0.8 seconds from PostgreSQL 9.2 (Windows 7 / 32 GB / 6-core, 12-thread machine).
The point is that if you care, you can go and optimize a lot. This isn't a fair story, though, since I used real enums, floats and ints on PostgreSQL, which saved quite a lot of space. The DB was bigger than SQLite's even with that, but somehow faster... though faster only when used through multiple connections.
I could've gone the opposite way and found ways to make SQLite shine in this respect.
One lesson from the game development industry is that it takes five years to develop the best-looking games at the end of a hardware cycle. And the hardware did not change even a bit for the whole time.
It takes a lot of learning, experimenting, and trying out strange things until you get good. And even then, someone comes along and makes your solution look stupid :) (I hope someone does that for me, so I can just sit, enjoy and smile at their code; I've seen it happen several times.)
To add another war story: back in 2004 I was in the position of having to write software for a very, very underpowered barcode terminal running uClinux (the Intermec CK1, for those interested; only a year later it was discontinued (grrr) due to the EU finalizing the RoHS norm).
I needed to synchronize product data from our web application with that scanner in order to display useful info instead of just scanned bar codes.
Thankfully I found a patch for SQLite to work with the greatly reduced stack size in uClinux (the site is still around here: http://www.menie.org/georges/uClinux/sqlite.html)
Now, after 9 years, one user is still holding on to their device and we're still synchronizing product data to that SQLite database. We never had crashes or data loss (even though recovery would have been trivial), and ever since then, every subsequent hardware solution we used has relied on SQLite for storage.
I don't even want to start thinking about how much pain it would have been to write my own database for that slow CK1 monster. Seeing SQLite run on the device in my serial terminal was both a lot of fun and a great relief, as it saved me countless hours of work.
Me, I love SQLite :-)
Great story. I suspect there are about a million single-user applications out there running Oracle and paying huge license fees for DBs under a million rows that would have been orders of magnitude easier to set up, and faster, using SQLite.
I love to use SQLite for projects which process data locally and do not have concurrent writes. SQLite is faster than the other database alternatives I tried; I guess one of the main reasons is that there's no context switch on every query. Data-set sizes are usually tens of millions of keys and tens of gigabytes of data. Works like a charm. By the way, SQLite was over 10 times faster than the slowest SQL servers I tried. I have 20+ projects using SQLite successfully. The other tested options were MS SQL, PostgreSQL, MySQL, MongoDB and CodernityDB. MongoDB's performance isn't so great when every commit must be persisted.
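The "every commit must be persisted" cost is easy to see in SQLite itself. A small sketch with Python's sqlite3 module: committing per row pays the sync-to-disk price on every insert, while batching pays it once (file path and row counts are arbitrary; absolute timings vary by machine):

```python
import os
import sqlite3
import tempfile
import time

path = os.path.join(tempfile.mkdtemp(), "bench.db")
conn = sqlite3.connect(path, isolation_level=None)  # true autocommit
conn.execute("CREATE TABLE kv (k INTEGER, v TEXT)")

# One implicit transaction (and one sync to disk) per row.
t0 = time.perf_counter()
for i in range(100):
    conn.execute("INSERT INTO kv VALUES (?, 'x')", (i,))
per_row = time.perf_counter() - t0

# One explicit transaction for the whole batch: the durability cost
# is paid once instead of 100 times.
t0 = time.perf_counter()
conn.execute("BEGIN")
for i in range(100, 200):
    conn.execute("INSERT INTO kv VALUES (?, 'x')", (i,))
conn.execute("COMMIT")
batched = time.perf_counter() - t0

print(per_row, batched)  # batched is typically far smaller
```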
> The operating system must have a unified buffer cache in order for the memory-mapped I/O extension to work correctly, especially in situations where two processes are accessing the same database file and one process is using memory-mapped I/O while the other is not. Not all operating systems have a unified buffer cache. In some operating systems that claim to have a unified buffer cache, the implementation is buggy and can lead to corrupt databases.
Which commonly-used operating systems lack a bug-free unified buffer cache?
Seems like one where they had trouble was OpenBSD.
Not sure, but there are references to proven incidents of fsync bugs in the docs. If you get that wrong, it's not hard to believe that the buffer cache could be wrong too.
Can it be built using SQLite?
Yes => use SQLite;
No => use PostgreSQL.
Using SQLite on small/medium projects is so much better than the alternatives: it keeps all the ease of text files while giving the full power and reliability of an RDBMS.
When I was building the Stumbleupon toolbar for IE way back in the day (2009ish?), we decided that we were going to use two things that made our life extremely easy: 1) Javascript and 2) SQLite. We basically had some C++ code that exposed SQLite to the embedded JS, and ended up writing the majority of the code in JS rather than C++.
SQLite was a fantastic embedded DB that was trivial to integrate with the JS engine in Win32. I don't recall ever having to deal with any support issues around corrupted databases.
It looks like the IE toolbar is still using SQLite to this day - you can explore stumbledata.db from any SQLite explorer.
The SQLite site continues to offer a wealth of information about SQLite internals. Most of the pages (e.g. http://www.sqlite.org/fileio.html) are a good read for anyone interested in databases or in creating good system software.
Cool! Though it's a bit harsh to have it always disabled on OpenBSD. If you're running only one process or core, you could still benefit from mmapped I/O, couldn't you?
Honestly, nobody should use SQLite even for testing your application. If you are going to run PostgreSQL or MySQL in production, please install that DB and run your tests against it: SQLite and other DBMSs have a lot of incompatibilities. I would consider SQLite only if I just needed a database for a really, really small project, such as a CLI that speaks to an API and needs to save results locally; then I would consider shipping my CLI with SQLite.
I'm not much for using SQLite, but using mmap() is a no-brainer for fast file access and I'm surprised they hadn't used it before. I mean, it's trivial to use if you know how to use fopen()...
D. Richard Hipp, the inventor and primary developer of SQLite, considers the rock-solid reliability of SQLite of primary importance, which is one of the reasons everything from iOS to Skype uses it. (The fact that it has a public domain license doesn't hurt.) He lists the potential pitfalls of using this method on the page http://www.sqlite.org/mmap.html and leaves it to users to determine if those risks are acceptable to them. I'm sure he's been thinking about this method for a while.
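The opt-in itself is a one-line pragma. A minimal sketch with Python's sqlite3 module — the 256 MB figure is an arbitrary choice, the cap comes from the compile-time SQLITE_MAX_MMAP_SIZE, and builds without mmap support will report 0:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(path)

# Ask for a 256 MB memory-map budget on this connection; SQLite clamps
# the request to the build's maximum (possibly 0 if mmap is unsupported).
conn.execute("PRAGMA mmap_size = 268435456")
effective = conn.execute("PRAGMA mmap_size").fetchone()[0]
print(effective)
```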
> using mmap() is a no-brainer for fast file access and I'm surprised they hadn't used it before
With mmap, you map a file into process memory and read and write to it as if it were memory. When you write to this memory there isn't really a way to indicate that an error occurred because there's no function call to return an error code. Instead, the process is sent a SIGBUS to indicate that something happened. This causes control to jump from the memory access into the signal handler.
Normally this would be fine and you can handle the error there. But sqlite is a library. It lives in-process with the "real" application using it. What if the program had a signal handler there? You don't want to intercept signals that they need to receive, and you definitely don't want to mask errors that they are getting on their own mmapped files, and you don't know how to handle the errors that they should be seeing instead. And what if they have some crazy green-threading going on that you've now ripped control away from by entering the signal handler?
In this particular case, sqlite's answer is to crash when that happens, taking down the entire host application with it. That's not okay for a lot of applications, and why it defaults to being turned off.
And maybe this particular case does or doesn't apply to sqlite, but there are always cases that your application's domain-specific knowledge of how it uses disk and memory and CPU can out-perform the OS's more general optimisations.
That's not the only issue, and the particular subtleties of mmap aren't important here. What's important is that when very smart people like the authors of sqlite choose not to do something, it's generally not as much of a "no-brainer" as some know-it-all types would have you believe. It's easy to read something like Varnish's "always use mmap and leave it to the OS!" manifesto and think that you now automatically know better than all of those silly plebeians who just don't get it, but usually things are more nuanced than that.
> I mean, its trivial to use if you know how to use fopen
I seriously doubt that the difficulty of implementing it is a concern
> I'm not much for using SQLite but using mmap() is a no-brainer for fast file access and I'm surprised they hadn't used it before. I mean, its trivial to use if you know how to use fopen()...
mmap() is great for reads, but much tougher for writes. This is because it is hard for the kernel to determine what it is you are trying to do. madvise() helps here, but not many projects seem to use it. Using plain old write[v]() for writes can be informative enough for the kernel to "do the right thing".
http://www.sqlite.org/mmap.html
tl;dr is "Improves performance in some cases with certain caveats/risks, is disabled by default"
Is it always? I remember benchmarking sequential file reads and writes, and plain old read and write were sometimes faster, or there was no difference.
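A toy illustration of that read/write asymmetry using Python's mmap module (the file and its contents are invented): writes go through explicit write() calls, while reads map the file and treat it as a byte array.

```python
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.bin")

# Writes via plain write(): the kernel sees exactly which bytes changed
# and when, so it can schedule the I/O sensibly.
with open(path, "wb") as f:
    f.write(b"hello world")

# Reads via mmap: the file becomes a byte array; the kernel pages data
# in on demand and can read ahead behind the scenes.
with open(path, "rb") as f, \
        mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
    head = m[:5]

print(head)  # b'hello'
```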
I never saw much of a reason for SQLite either until I started doing mobile development. However, it's nearly a must for any sort of intensive storage on Android or iOS (unless your use cases are simple enough for XML or JSON).
I just wish doing asynchronous queries with SQLite on Android was not so obtuse and resulted in a bunch of boilerplate or using third party libraries for a core part of the API.