The Design Of SQLite4 (work in progress)

[+] bane|13 years ago|reply

Sqlite is really just one of those rare software projects that simply makes the world a better place. No more thinking about file storage formats, searching and indexing strategies, long term file readability (it's almost as future proofed as ASCII), extensible, flexible, fast and all around awesome.

The number of projects where sqlite turned a difficult project from impossible or very hard to doable is almost uncountable.

Here's a fun use case. Generate sqlite databases for each user, let them do whatever they want in that database from whatever interface you are letting them use. Store the sqlite file as a binary blob in a "real" database with associated metadata for fast look up and retrieval, plus whatever other things your database gives you (replication, redundancy etc.) And now you basically get sand boxed work spaces for your users almost for free.

[+] fusiongyro|13 years ago|reply

It's a clever idea. Please don't do it.

SQLite has one very limited write scenario: one writer at a time. Your real database isn't going to understand that. The real database will happily hand out the inner SQLite database to multiple connections. At best, you'll be converting SQLite's locking into a bunch of transaction failures for conflicting updates; at worst, you'll have a race condition where only the last of several edits gets persisted.

On top of that you have the disk churn associated with rewriting the database on every change. Suppose your SQLite database is 20 MB—every write to that means 20 MB from the real database to your application and back, which means 20 MB of disk thrashing too. Guess how many users need to do this before your database is spending all day rewriting essentially unchanged bytes. Depending on your storage backend, you can't even necessarily expect that the old ones are overwritten—with MVCC, the database may wind up just allocating new blocks, in which case you're creating a lot of garbage collection work and wasting a lot of space.

Every real database has a facility to let you isolate schemas from each other. In MySQL, you can create separate databases for each user; in PostgreSQL you can do that or just create separate schemas. There are better solutions to this problem that don't have the drawbacks identified above. It's a cute idea, but in practice I don't think it will work out at all well.

[+] gouranga|13 years ago|reply

That's a filesystem isn't it (not NFS as locks don't work properly on NFS filesystems)?

1. One per user - in home directory. Check.

2. Real database with metadata - filesystem. Check.

3. Replication - filesystem (block replication). Check.

4. Redundancy - filesystem (RAID). Check.

So technically it already does that without any need for further abstraction. You can use UNIX perms to set it read only for all and read-write for user, then you can share it :)

For reference, I've built a HUGE amount of stuff with sqlite3 - you are right it does make the world a better place! We use it for:

1. unit test back end for NHibernate as it's fast and in memory.

2. issue/tickets/wiki (trac)

3. data processing.

4. archival datasets.

5. general swiss army knife when you can't be bothered to do it in SSIS.

[+] malkia|13 years ago|reply

Perfect replacement for what you usually have in game dev studios - lots of small loose files (we have about 380,000 of those) - materials, textures, shaders, models, animations, etc.

No easier format than just reusing SQLite.

[+] ahi|13 years ago|reply

I am interested in more details/criticisms of the above use case. Better approaches? or "Holy balls, that's idiocy!"

[+] DrJokepu|13 years ago|reply

If you do this that would mean that you could have no shared locks safely on the SQLite database while you have a reserved lock, that is, if one process is planning to write to the database, other processes cannot even read it. Even then, you would have to be very very careful if you want to avoid losing data.

Basically, unless you can (and willing to) write your own SQLite VFS that is designed to handle this safely, I wouldn't do it.

[+] zacharyvoase|13 years ago|reply

As long as SQLite4 preserves the level of testing we've seen in SQLite3, I’ll be extremely happy: http://www.sqlite.org/testing.html

[+] swah|13 years ago|reply

Sidenote: this flat file organization (http://www.sqlite.org/src4/dir?name=src) gives me a wonderful feel about the design and size of the project, although it probably makes Java programmers cringe.

[+] gojomo|13 years ago|reply

"The default built-in storage engine is a log-structured merge database. It is very fast, faster than LevelDB, supports nested transactions, and stores all content in a single disk file. Future versions of SQLite4 might also include a built-in B-Tree storage engine. "

Faster than LevelDB, and in a single file? Interesting.

[+] swah|13 years ago|reply

Yeah, also I thought that comparison was a little out of place...

[+] wvenable|13 years ago|reply

The thing that really bugs me with SQLite3 is the inability to modify tables. It's not much of a problem unless you have foreign key constraints and then it's a nightmare. Ultimately I resorted to just recreating the entire database from a script every time I had to make a change. If this is "fixed" in SQLite4 then I'm sold.

I'm surprised that there aren't any real binary integers stored anymore but I can see how that would reduce complexity compared to how numbers were handled in SQLite3.

[+] rogerbinns|13 years ago|reply

You can alter tables to add columns. Doing more complex modifications would require SQLite internally to do pretty much what you do - recreating a new database from existing content.

[+] EzGraphs|13 years ago|reply

SQLite3 is great. Had a few questions that were not directly addressed on the linked page. 1) Will the SQL implementation be identical between SQL3 and SQL4? 2) Does "I/O is direct to disk" indicate the same sort of locking that exists in SQLite3 (the database is locked DML which makes its application as a web server db somewhat limited).

My guess is yes to both...

[+] cryptonector|13 years ago|reply

I want all sorts of things in SQLite4, but let's start by saying that usign a log-structured DB format is very welcomed news.

I'd also like the VFS to provide the page cache, so that we could use mmap(2) windows instead of read(2) and pread(2), which should result in fewer data copies and better performance.

Everything else I'd like to see in SQLite4 is icing on the cake.

[+] spdegabrielle|13 years ago|reply

I'm a big fan of SQLite3, and fossil-scm is my weapon of choice for solo projects. '4 is looking good.

[+] zvrba|13 years ago|reply

> SQLite4 does all numeric computations using decimal arithmetic. SQLite4 never uses C datatypes double or float (except in interface routines when converting between double and the internal decimal representation).

Is it only me who thinks this is a bad idea? Suddenly, what the DB computes for you won't be bit-for-bit identical with what your C program would have computed using IEEE arithmetic. Also, Goldberg has a number of slides showing that (counterintuitively), extra precision and multiple roundings produce LESS accurate results.

[+] AnIrishDuck|13 years ago|reply

Yeah, this section of the design document made me do a double-take. The problems with this aren't just limited to the impedance mismatch; it also seems that arithmetic will be a ton slower due to lack of dedicated hardware.

[+] j_s|13 years ago|reply

Nearly-on-topic warning: Android automatically deletes corrupt SQLite files http://stackoverflow.com/questions/7764943

[+] rogerbinns|13 years ago|reply

Not enough of a change from the existing SQLite 3.

What I'd like to see most is trivial interoperability with JSON. Almost all the code I write that goes near SQLite has a trip through JSON as well and it gets tedious mapping between them. This also means being able to support things like a value being a list which Postgres can do.

Other than that a way of doing offline mode and online synchronization builtin. CouchDB tried to address this space, but wasn't practical to integrate into most products.

[+] gcp|13 years ago|reply

This really sounds like the kind of thing that is perfectly doable outside the SQLite 3 core.

[+] edwinyzh|13 years ago|reply

Good news. Sqlite3 has already been great. It's also good to read that the API will not be changed much, it's very important.

[+] pdw|13 years ago|reply

I wonder how this pluggable storage architecture relates to the SQLite/BerkeleyDB hybrid that Oracle created a while back.

[+] ArbitraryLimits|13 years ago|reply

TL;DR All replacements for fopen() grow until they span multiple files.

[+] dchest|13 years ago|reply

Why multiple files?

The (default) on-disk database image is a single disk file with a well-documented and stable file format, making the SQLite4 library suitable for use as an application file format.

49 comments