top | item 4168645

The Design Of SQLite4 (work in progress)

169 points| dchest | 13 years ago |sqlite.org | reply

49 comments

order
[+] bane|13 years ago|reply
Sqlite is really just one of those rare software projects that simply makes the world a better place. No more thinking about file storage formats, searching and indexing strategies, long term file readability (it's almost as future proofed as ASCII), extensible, flexible, fast and all around awesome.

The number of projects where sqlite turned a difficult project from impossible or very hard to doable is almost uncountable.

Here's a fun use case. Generate sqlite databases for each user, let them do whatever they want in that database from whatever interface you are letting them use. Store the sqlite file as a binary blob in a "real" database with associated metadata for fast look up and retrieval, plus whatever other things your database gives you (replication, redundancy etc.) And now you basically get sand boxed work spaces for your users almost for free.

[+] fusiongyro|13 years ago|reply
It's a clever idea. Please don't do it.

SQLite has one very limited write scenario: one writer at a time. Your real database isn't going to understand that. The real database will happily hand out the inner SQLite database to multiple connections. At best, you'll be converting SQLite's locking into a bunch of transaction failures for conflicting updates; at worst, you'll have a race condition where only the last of several edits gets persisted.

On top of that you have the disk churn associated with rewriting the database on every change. Suppose your SQLite database is 20 MB—every write to that means 20 MB from the real database to your application and back, which means 20 MB of disk thrashing too. Guess how many users need to do this before your database is spending all day rewriting essentially unchanged bytes. Depending on your storage backend, you can't even necessarily expect that the old ones are overwritten—with MVCC, the database may wind up just allocating new blocks, in which case you're creating a lot of garbage collection work and wasting a lot of space.

Every real database has a facility to let you isolate schemas from each other. In MySQL, you can create separate databases for each user; in PostgreSQL you can do that or just create separate schemas. There are better solutions to this problem that don't have the drawbacks identified above. It's a cute idea, but in practice I don't think it will work out at all well.

[+] gouranga|13 years ago|reply
That's a filesystem isn't it (not NFS as locks don't work properly on NFS filesystems)?

1. One per user - in home directory. Check.

2. Real database with metadata - filesystem. Check.

3. Replication - filesystem (block replication). Check.

4. Redundancy - filesystem (RAID). Check.

So technically it already does that without any need for further abstraction. You can use UNIX perms to set it read only for all and read-write for user, then you can share it :)

For reference, I've built a HUGE amount of stuff with sqlite3 - you are right it does make the world a better place! We use it for:

1. unit test back end for NHibernate as it's fast and in memory.

2. issue/tickets/wiki (trac)

3. data processing.

4. archival datasets.

5. general swiss army knife when you can't be bothered to do it in SSIS.

[+] malkia|13 years ago|reply
Perfect replacement for what you usually have in game dev studios - lots of small loose files (we have about 380,000 of those) - materials, textures, shaders, models, animations, etc.

No easier format than just reusing SQLite.

[+] ahi|13 years ago|reply
I am interested in more details/criticisms of the above use case. Better approaches? or "Holy balls, that's idiocy!"
[+] DrJokepu|13 years ago|reply
If you do this that would mean that you could have no shared locks safely on the SQLite database while you have a reserved lock, that is, if one process is planning to write to the database, other processes cannot even read it. Even then, you would have to be very very careful if you want to avoid losing data.

Basically, unless you can (and willing to) write your own SQLite VFS that is designed to handle this safely, I wouldn't do it.

[+] gojomo|13 years ago|reply
"The default built-in storage engine is a log-structured merge database. It is very fast, faster than LevelDB, supports nested transactions, and stores all content in a single disk file. Future versions of SQLite4 might also include a built-in B-Tree storage engine. "

Faster than LevelDB, and in a single file? Interesting.

[+] swah|13 years ago|reply
Yeah, also I thought that comparison was a little out of place...
[+] wvenable|13 years ago|reply
The thing that really bugs me with SQLite3 is the inability to modify tables. It's not much of a problem unless you have foreign key constraints and then it's a nightmare. Ultimately I resorted to just recreating the entire database from a script every time I had to make a change. If this is "fixed" in SQLite4 then I'm sold.

I'm surprised that there aren't any real binary integers stored anymore but I can see how that would reduce complexity compared to how numbers were handled in SQLite3.

[+] rogerbinns|13 years ago|reply
You can alter tables to add columns. Doing more complex modifications would require SQLite internally to do pretty much what you do - recreating a new database from existing content.
[+] EzGraphs|13 years ago|reply
SQLite3 is great. Had a few questions that were not directly addressed on the linked page. 1) Will the SQL implementation be identical between SQL3 and SQL4? 2) Does "I/O is direct to disk" indicate the same sort of locking that exists in SQLite3 (the database is locked DML which makes its application as a web server db somewhat limited).

My guess is yes to both...

[+] cryptonector|13 years ago|reply
I want all sorts of things in SQLite4, but let's start by saying that usign a log-structured DB format is very welcomed news.

I'd also like the VFS to provide the page cache, so that we could use mmap(2) windows instead of read(2) and pread(2), which should result in fewer data copies and better performance.

Everything else I'd like to see in SQLite4 is icing on the cake.

[+] spdegabrielle|13 years ago|reply
I'm a big fan of SQLite3, and fossil-scm is my weapon of choice for solo projects. '4 is looking good.
[+] zvrba|13 years ago|reply
> SQLite4 does all numeric computations using decimal arithmetic. SQLite4 never uses C datatypes double or float (except in interface routines when converting between double and the internal decimal representation).

Is it only me who thinks this is a bad idea? Suddenly, what the DB computes for you won't be bit-for-bit identical with what your C program would have computed using IEEE arithmetic. Also, Goldberg has a number of slides showing that (counterintuitively), extra precision and multiple roundings produce LESS accurate results.

[+] AnIrishDuck|13 years ago|reply
Yeah, this section of the design document made me do a double-take. The problems with this aren't just limited to the impedance mismatch; it also seems that arithmetic will be a ton slower due to lack of dedicated hardware.
[+] rogerbinns|13 years ago|reply
Not enough of a change from the existing SQLite 3.

What I'd like to see most is trivial interoperability with JSON. Almost all the code I write that goes near SQLite has a trip through JSON as well and it gets tedious mapping between them. This also means being able to support things like a value being a list which Postgres can do.

Other than that a way of doing offline mode and online synchronization builtin. CouchDB tried to address this space, but wasn't practical to integrate into most products.

[+] gcp|13 years ago|reply
This really sounds like the kind of thing that is perfectly doable outside the SQLite 3 core.
[+] edwinyzh|13 years ago|reply
Good news. Sqlite3 has already been great. It's also good to read that the API will not be changed much, it's very important.
[+] pdw|13 years ago|reply
I wonder how this pluggable storage architecture relates to the SQLite/BerkeleyDB hybrid that Oracle created a while back.
[+] ArbitraryLimits|13 years ago|reply
TL;DR All replacements for fopen() grow until they span multiple files.
[+] dchest|13 years ago|reply
Why multiple files?

The (default) on-disk database image is a single disk file with a well-documented and stable file format, making the SQLite4 library suitable for use as an application file format.