SQLite may become foundational for digital progress

165 points | alexrustic | 3 years ago | venturebeat.com

120 comments

[+] endisneigh|3 years ago|reply
I've yet to see how you can use SQLite for a multi-user multi-write app effectively. Unless we're all going back to single-tenant, single-user applications, SQLite seems overhyped for the new use cases.

Don't get me wrong, there are plenty of uses for SQLite, but I think the hype is getting out of hand IMHO.

If you are making an app that only a single person is going to use at a time, then there are plenty of options, including SQLite. Heck, IndexedDB is sufficient. CouchDB tried the whole DB-per-user thing and it didn't end super well.

Happy to be proven wrong though. If anyone has an example of a site with more than say, 10K concurrent writers (edit: changed from users) running on a single SQLite DB I'd probably change my mind.

---

Now, what would be interesting is a way to architect an app such that you can have a central entity, like this site, for instance, but all of your posts are actually referring to your own SQLite, or equivalent store. In that sense you can own your data completely. For performance you can specify a TTL for your data and the consumer (this site) could cache* it accordingly.

Though this would probably end up being a lot less performant than simply using something like Postgres, but at least you'd have more control over your own data.

* - This caching ideally would be enforced by the browser, and not the server, that way the server actually never touches or knows what data you have.

[+] simonw|3 years ago|reply
In my experience most SQLite writes take less than 1ms. So if your app is handling less than a thousand writes per second you will probably be OK!

A common pattern (which I've implemented myself in Datasette) is to put your writes in an in-memory queue and apply them using a single dedicated connection.
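A minimal sketch of that queue-plus-dedicated-writer pattern (my own illustration, not Datasette's actual code; the function names are made up):

```python
import queue
import sqlite3
import threading

def start_writer(db_path):
    """Start a single dedicated writer thread: every write in the app
    funnels through one queue and one SQLite connection."""
    q = queue.Queue()

    def writer():
        # The connection lives entirely inside this thread, which also
        # satisfies sqlite3's default same-thread check.
        conn = sqlite3.connect(db_path)
        while True:
            item = q.get()
            if item is None:  # sentinel: shut down
                break
            sql, params, done = item
            conn.execute(sql, params)
            conn.commit()
            done.set()  # tell the caller the write has landed
        conn.close()

    threading.Thread(target=writer, daemon=True).start()
    return q

def write(q, sql, params=()):
    """Enqueue a write and block until the writer thread applies it."""
    done = threading.Event()
    q.put((sql, params, done))
    done.wait()
```

Any number of request-handler threads can call `write()` concurrently; SQLite itself only ever sees one writer.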

[+] stefanos82|3 years ago|reply
Bloomberg [1] uses SQLite in a custom distributed RDBMS engine of theirs, in a way whose design I lack the knowledge to completely understand.

I remember Richard Hipp mentioning Bloomberg in one of his interviews (don't ask me which one, I don't remember), saying that they use SQLite to serve billions of bank transactions on a daily basis without a problem.

  [1] https://github.com/bloomberg/comdb2
Update: I have found the following paper that describes the engine in greater detail: http://www.vldb.org/pvldb/vol9/p1377-scotti.pdf
[+] nine_k|3 years ago|reply
It's like asking whether it's possible to write an OS kernel module in Python.

No, not really possible, and attempts to do so would be misguided. But it does not prevent Python from being highly successful in other areas.

Same with SQLite: it's a fine tool for a large spectrum of applications, but high-throughput OLTP is not it; SQLite can't reasonably replace Cassandra or even Postgres.

The point of much of the "SQLite hype" is that there are many applications where read load is high and queries complicated, but the write load is low to non-existent. In situations like these, you can make do with a smaller DB than Postgres.
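For the read-heavy, write-light case described above, WAL mode is the usual first knob: readers proceed while the single writer commits. A small sketch (the file path is illustrative; WAL does not apply to in-memory databases):

```python
import os
import sqlite3
import tempfile

# WAL needs a real file on disk (it has no effect on :memory: databases).
path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(path)

# In WAL mode, readers don't block the writer and the writer doesn't
# block readers -- a good fit for high read load with rare writes.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
```

The pragma is persistent: once set, every later connection to the same file opens in WAL mode.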

[+] catmanjan|3 years ago|reply
10K users isn’t much, but 10k concurrent writers is a different proposition - I’d be surprised if 100k users had 10k concurrent writers
[+] jrockway|3 years ago|reply
I have a controversial opinion.

We got to the canonical "LAMP" stack incrementally, shaped by the computing environment at the time. The servers that you had available to your web app 15 years ago were maybe 2 cores running at 1.5GHz and you had a couple gigs of RAM, and you wrote the app in a very slow dynamic language. So in order to scale out to the Internet, you had to split your application into replicas that ran on different servers; the traffic you were getting and the architecture of your application meant that you needed more than 1 CPU second per second, and that was the only way to get that at the time. All of these replicas needed some coordination, so you had a database server to handle that coordination, and it ran on the biggest computer possible because it was the single point of failure. (And when one computer wasn't big enough, it was time for ugly hacks like sharding.)

Fast forward 15 years and the landscape has changed dramatically. You can get 448-vCPU machines from AWS. You're writing your application in a language that is compiled or JIT'd into native code. On the same hardware, your application of today might be 10-100x faster than your application of 15 years ago. But the hardware is even faster, and computers are bigger (you can go get a 64-core processor off the shelf at an electronics store!). So one node might now be 1000-10000x more powerful than where this "I must have a hundred replicas of my application" mindset came from.

With that in mind, if you start right now, you can build a web application that scales beyond your wildest dreams with a single computer. If you only have one computer, why not build the database into the application? You will have made so much money by the time the 448 core computer is inadequate for your application that it doesn't even matter. Hire 100 developers to move the thing to FoundationDB or Spanner or Postgres or whatever "real" database you think you need to scale to the next 10 billion users. (Interestingly, the top article on HN right now is "What if it changes?", a sarcastic reminder that maybe today you should build the system you need for today. If it sucks in a year, fix it in a year.)

I am throwing some pretty important things out the window. You probably update your application, so you want some rolling deployment that aborts if the new version fails to start up or whatever. A tornado blowing up your datacenter on your biggest use day of the year would be bad. Your users are all over the world, so you probably want to serve as much content as you can from a computer near them. But honestly, the state of the art for these concepts are pretty new. Globally consistent ("planet scale") transactions are hard, and not that fast. Automated canarying is also not trivial. So you might build a really complicated application to support those needs, and not even achieve them, because nobody has achieved perfection there yet.

Anyway, my controversial opinion is: don't build a distributed system unless you are absolutely sure you must have a distributed system. Every time you split up your state storage, your work as a programmer becomes harder. A single thread; no need for transactions, everything happens in order. A single computer with multiple threads; you'll need some locks, write barriers, or atomic instructions. Multiple computers with multiple threads? Now you're writing a computer science paper. That's a super fun activity, but check that discovering new classes of computer science problems is what your company's business is before you go out of your way to start doing it. You might be able to make a lot of money by being pretty boring. When you're relaxing on your private island, you might find a recreational activity even more fun than finding bugs in distributed transaction protocols. Who knows.

The currently-accepted simplification is to use a single database server to coordinate your stateless application replicas. That doesn't protect you from tornadoes, give users in Antarctica sub-millisecond page loads, or let you upgrade the database without downtime. If you're OK with that, it's totally reasonable to just build the database into your application as SQLite does. It's simpler, and no worse to an outside observer.

(That said, I tend to reach for Postgres first because it has a lot of polish that I don't think SQLite has. But I don't think anyone is unreasonable or stupid for picking SQLite. Especially if the SQLite instance streams its data to S3 and you have point-in-time recovery options for a disaster or bad release. You're going to want those with Postgres too, and they aren't enabled out of the box.)

[+] tepitoperrito|3 years ago|reply
"This caching ideally would be enforced by the browser, and not the server"

I don't think we'd need a UI served by that central server at that point. A SQLite file and some local program that communicates with other (de)centralized hosts for whatever services or data aren't available locally sounds like a neat departure from the approach the Solid project is taking with pluggable data backends.

Instead, whatever ecosystem springs up around really making use of SQLite at "the edge" (here I mean individuals' computing devices - not the fly.io and Cloudflare edge that TFA mostly refers to) could invert how web apps are delivered and consumed for the 1% of people who care about this kind of thing. By invert I just mean that the user hosts their own frontends and interfaces with DNS-named hosts / ports providing functionality, CPU, disk, network a la plan9.

[+] alatkins|3 years ago|reply
> CouchDB tried the whole DB-per-user thing and it didn't end super well.

Can anyone give further context here? Haven't looked at CouchDB for several years, remember it as being good for certain use cases (intermittently offline devices etc).

[+] tlb|3 years ago|reply
Concurrent writers aren't a thing people directly need. What they need is high write throughput. One way to improve write throughput is with concurrency, but the other way is by just being fast.

Making the database fast has the advantage of not requiring extra concurrency complications throughout the rest of your application.
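One way to "just be fast" without any concurrency is to batch writes into a single transaction, since the per-COMMIT fsync usually dominates SQLite write cost. A sketch (the table and column names are made up):

```python
import sqlite3

def insert_rows(conn, rows):
    """Apply a whole batch of inserts under one transaction.

    One COMMIT (one fsync) for the batch, instead of one per row,
    raises write throughput with no concurrency in the application.
    """
    with conn:  # the context manager commits on success, rolls back on error
        conn.executemany("INSERT INTO events (payload) VALUES (?)", rows)
```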

[+] anamexis|3 years ago|reply
If I understand your idea correctly, wouldn't that require connecting to the browser of every single person who has posted to load the content?
[+] otabdeveloper4|3 years ago|reply
My only experience with SQLite is having to spend time ripping it out and replacing it with something that actually works.

SQLite is almost never the right solution.

[+] tiffanyh|3 years ago|reply
The answer is much simpler.

When people created desktop apps - they needed a datastore that was simple and reliable to use. SQLite was perfect for this.

Now, no one is creating desktop apps anymore (everything is a web app), but people still need a simple & reliable datastore. Even though SQLite has historically been advertised as not for client/server use, people are finding that it works great for most web apps as well.

[+] luvs2code|3 years ago|reply
I really miss desktop apps. They worked so well and didn't have half the problems that come with web apps.

I'm working on a project, for fun not for money, that uses Golang and talks to a SQLite database for things like accounts and user content. So far I really enjoy writing it a lot but I have to admit that it would probably have issues if it scaled to a large number of users.

In fairness to SQLite, it was designed in a different era technologically. Maybe there will be a replacement for the web-app era if we distill what we like about SQLite.

[+] marpstar|3 years ago|reply
"Native Mobile Apps" are the new "Desktop App". They can both be client apps to a remote server/datastore, or not. "Desktop Apps" didn't go away, the internet-connected apps just overtook them.
[+] NonNefarious|3 years ago|reply
My company just finished a new desktop app and will continue to produce more.
[+] TheAceOfHearts|3 years ago|reply
I'm still bitter that we could've had SQLite in the browser as WebSQL but it got killed off in favor of IndexedDB which is terrible.

The code is in the public domain. The spec could just say to include SQLite.

IndexedDB implementations all use SQLite under the hood anyway, AFAIK.

[+] hitekker|3 years ago|reply
Back then, many NoSQL/MongoDB opportunists were in positions of authority on the web. When WebSQL was discussed, those folks pushed the lie that "SQL is SQLite. Therefore, SQL does not belong in the browser". They pretended that SQLite and SQL APIs were the exact same thing, as if no browser could offer a dialect of SQL without fully exposing the database engine running under the hood. In their rhetoric, encapsulation was impossible; the specification was the implementation. With SQL and SQLite conflated, the opportunists were then able to wield the valid argument "SQLite is not a standard. Browsers should be based on standards" against the simple point "SQL ought to be a standard". Once they had beaten the more reasonable voices into submission, they were free to assert that only NoSQL could be standardized in the browser, which misdirected years of effort into IndexedDB.

Nowadays, if you look up those opportunists on LinkedIn, you can see their NoSQL careers didn't quite go the way they were expecting. Their wrongdoing is a very, very sore point for them, and it's unfortunate the web had to pay for their hubris.

For more details: https://nolanlawson.com/2014/04/26/web-sql-database-in-memor...

[+] LAC-Tech|3 years ago|reply
The conspiracy theorist in me thinks Mozilla has it in for Richard Hipp.

Wonder if there's some schadenfreude seeing Firefox fade rapidly into obscurity and SQLite continue to take over the world.

[+] i5heu|3 years ago|reply
It is possible to run SQLite in WASM and use IndexedDB as a filesystem
[+] richardc323|3 years ago|reply
For reasons I won't go into here, I built a system with a similar approach 10 years ago. The system was horizontally scalable. There was no database tier; instead each server had a local replica of the database, which was used for reads. The servers discovered each other and nominated one server as the master, which writes were sent to. Replication was done by having the master send the DML queries to a writer process on each server. When a new server joined the cluster, it was sent a copy of the entire database and a stream of queries to catch up on before it joined the cluster. There were other tricks to make sure reads from replicas waited until the replicas were sufficiently up to date.

It worked fine as the system was read heavy and write light. SQLite serialises writes so does not perform well with multiple writers, particularly if the write transactions are long running. Reads were blazingly fast as there was no round-trips across the network to a separate database tier. The plan for dealing with performance problems if/when they arrived was to shard the servers into groups of customers.
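The statement-shipping scheme described above can be sketched in miniature like this (a toy illustration only: discovery, election, catch-up, and read fencing are omitted, and plain in-process connections stand in for the per-server writer processes):

```python
import sqlite3

class Master:
    """Toy statement-based replication: the elected master applies each
    write locally, then ships the identical DML to every replica."""

    def __init__(self, conn, replicas):
        self.conn = conn          # the master's local SQLite database
        self.replicas = replicas  # stand-ins for per-server writer processes

    def execute_write(self, sql, params=()):
        # Apply locally first, then replay the same statement everywhere.
        self.conn.execute(sql, params)
        self.conn.commit()
        for r in self.replicas:
            r.execute(sql, params)
            r.commit()
```

Reads would go straight to each server's local replica, which is where the "blazingly fast, no network round-trip" property comes from.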

I moved on and the next developer ripped it out and replaced it with Postgres because it was such an oddball system. I came back six months later to fix the mess as the new developer messed up transactions with the new database code.

Technically, using SQLite with replication tacked on works fine. Superficially it is all the same because it is SQL. However, the performance characteristics are very different from those of a conventional multi-version concurrency control (MVCC) database such as Postgres.

This is where the problem lies with this kind of database - developers seeing SQL and assuming they can develop exactly the same way they would with other SQL databases. That said I love approaches that get away from the database architectures of last century.

[+] kumarvvr|3 years ago|reply
As a Postgres enthusiast, I am really curious to know how the new developer messed up the transactions.

Could you please elaborate?

[+] TheRealPomax|3 years ago|reply
SQLite is your file format. If your use case is poorly dealt with by using files, it's poorly dealt with by using SQLite. But a _lot_ of things actually work fine with a file backing until they need to scale, and most projects that use SQLite never need that scale. But when they do, unlike a home-rolled file format, SQLite has "prepping your data for importing into a real data store" baked right in.
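That baked-in export path is visible even from Python's stdlib bindings, where `iterdump()` produces the same portable SQL as the CLI's `.dump` command (a sketch; the helper name is made up):

```python
import sqlite3

def dump_sql(path):
    """Render an entire SQLite file as portable SQL text -- the
    'prep your data for importing elsewhere' step, for free."""
    conn = sqlite3.connect(path)
    try:
        # iterdump() yields CREATE/INSERT statements for every object,
        # equivalent to the .dump command in the sqlite3 shell.
        return "\n".join(conn.iterdump())
    finally:
        conn.close()
```

The resulting text can be piped into `psql` or another database's SQL loader, modulo dialect tweaks.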
[+] Mertax|3 years ago|reply
I think the potential exists for many multi-tenancy B2B SaaS products to be converted to an SQLite-per-tenant model.

CRUD Database products like Jira, Salesforce, Airtable, Notion, Clickup, Asana, etc. seem like internal line of business services that just need to be thick clients with a distributed database. Most individual businesses/tenants won’t have to deal with more than 100 concurrent writes at a time (if that) so SQLite could definitely handle that.

[+] lenkite|3 years ago|reply
How does one handle HA and failover with SQLite? Also, the products you mentioned are more than just simple CRUD; they have full plugin engines and workflows.
[+] ec109685|3 years ago|reply
Are there really questions about using SQLite?

> Another wrinkle is that it’s not exactly open source. The original developer of SQLite, Dwane Richard Hipp, placed it in the public domain. Generally, this means that there are no legal restrictions on using the code at all, although there are some questions whether every country recognizes this view.

A bigger issue than the license is that it’s not open for contributions. They don’t accept pull requests and the extensive test suite is proprietary.

[+] capableweb|3 years ago|reply
> A bigger issue than the license is that it’s not open for contributions

SQLite is open for contributions, as long as you "submit an affidavit dedicating your contribution into the public domain" (https://www.sqlite.org/copyright.html). It's not "Open-Contribution" in the sense that some npm packages are for example, where if you send any PR, it will most likely be merged and then you get write access to the repository. If you're curious how you can contribute, check out this page: https://system.data.sqlite.org/index.html/doc/trunk/www/cont...

Personally, I prefer projects that are limited in what contributions they accept to the main trunk. Code quality tends to be a lot higher in those. As long as the source is open so I can freely maintain my own patches, I have no qualms with that workflow.

[+] cryptonector|3 years ago|reply
This.

The public domain thing is a red herring not worthy of the attention it's gotten, but the infeasibility of making contributions is very much worth talking about.

Basically, Richard Hipp is an open source business genius. He made an exceedingly popular open source thing. Then he found a way to make it so no one would bother using any forks. Then he created a consortium that pays for development and upkeep.

The secret to that success is the proprietary test suite and their refusal to accept contributions. Anyone wanting to fork the code base will not be able to recreate that test suite, nor the upkeep, therefore they won't be anywhere near as trusted as the SQLite dev team.

[+] smitty1e|3 years ago|reply
> “I think the biggest complication for us is that there’s no tooling for it.” said Mackey. “We have people deploy apps. They’re like, ‘How do I connect to my database and like query things? How do I import data?’”

Wut?

Every scripting language includes a SQLite driver.

Python's SQLAlchemy wraps SQLite beautifully. You can develop locally and then deploy to an enterprise server with a simple change of connection string.
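The connection-string swap works because the URL scheme selects the driver. A stdlib-only sketch of the idea (this is not SQLAlchemy itself; only the SQLite branch is implemented, and the URL format just mimics SQLAlchemy's):

```python
import sqlite3

def connect(database_url):
    """Dispatch on the URL scheme so dev and prod differ only by a
    connection string. A server-database branch would go where the
    NotImplementedError is raised."""
    if database_url.startswith("sqlite:///"):
        return sqlite3.connect(database_url[len("sqlite:///"):])
    raise NotImplementedError("plug in a server-database driver here")
```

Flip `DATABASE_URL` from `sqlite:///dev.db` to a server URL at deploy time and nothing else in the app changes.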

I must be missing the speaker's point.

[+] nicoburns|3 years ago|reply
I think they mean GUI tooling for ad-hoc inspection of the database. And possibly the stumbling block is difficulty accessing it over a network?
[+] sally1620|3 years ago|reply
This sounds like hype that is going to die at some point. SQLite has its own uses in client devices and small webservers, but it is not built to scale to millions of transactions per second.

It is possible to build all kinds of cool things on top of SQLite, but most of these articles sound like it is going to replace PostgreSQL or MySQL.

[+] euroderf|3 years ago|reply
I want to see a mountable file system contained in an SQLite file. It creates possibilities. [1] is Java (yikes) and has been inactive six years. [2] is in Rust and has been inactive three years but does not seem to be available in cargo. [3] is in Go, lacks a LICENSE file, has been inactive four months, the README is in Russian, but well the demo code works. pkg.go.dev will not display its API.

[1] https://github.com/andyhebear/sqlitefs [2] https://github.com/narumatt/sqlitefs [3] https://github.com/jilio/sqlitefs, https://github-com.translate.goog/jilio/sqlitefs?_x_tr_sl=ru...

[+] eterps|3 years ago|reply
Why would that be useful? (I'm genuinely curious)
[+] metadat|3 years ago|reply
This makes sense; I have experience with and lots of appreciation for SQLite. It's exemplary across many areas of software engineering: elegant simplicity, no bloat, high performance, broad API library support, an amazing 3rd-party ecosystem (rqlite [1], litestream [2]). It's wildly nice!

With that said, I've found that I'm a bit hesitant to use SQLite for anything interactive for one silly reason:

It's so hard to query precise date ranges in an ergonomic way interactively. There is no timestamp type, and the general advice I've seen is to use Unix Epoch offsets. This works fine except for exploring.

I keep finding myself wanting to port my SQLite databases to PostgreSQL just for the comparatively easy ranged date exploration.

[1] https://github.com/rqlite/rqlite

[2] https://github.com/benbjohnson/litestream
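One workaround for the ergonomics gripe above is to let SQLite's own date functions translate ISO-8601 strings into epochs at query time, so interactive range queries stay readable (the schema here is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (name TEXT, created_at INTEGER)")  # epoch seconds

# Insert using strftime('%s', ...) rather than hand-computed epoch values.
conn.execute(
    "INSERT INTO events VALUES (?, CAST(strftime('%s', ?) AS INTEGER))",
    ("launch", "2022-05-20 12:00:00"),
)

# The range query reads like dates, while the comparison stays on integers.
rows = conn.execute(
    """SELECT name FROM events
       WHERE created_at >= CAST(strftime('%s', '2022-05-01') AS INTEGER)
         AND created_at <  CAST(strftime('%s', '2022-06-01') AS INTEGER)"""
).fetchall()
```

(Recent SQLite versions also offer `unixepoch()` as a shorthand for the cast.)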

[+] piaste|3 years ago|reply
SQLite supports computed columns, so you could just add a virtual text column next to the 'real' epoch column defined as the date representation of your choice.

https://www.sqlite.org/draft/gencol.html

Ignore the text column in your application so it won't be computed at runtime, but it will appear when you 'select *' or use a db browser in exploration mode.
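A sketch of that virtual-column trick using Python's bundled SQLite (requires SQLite 3.31+ for generated columns; the schema is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        created_at INTEGER,  -- the 'real' epoch column the app writes
        -- VIRTUAL: computed only when read, e.g. by SELECT * or a DB browser
        created_at_text TEXT AS (datetime(created_at, 'unixepoch')) VIRTUAL
    )
""")
conn.execute("INSERT INTO events (created_at) VALUES (0)")
row = conn.execute("SELECT created_at_text FROM events").fetchone()
```

The application keeps querying `created_at`; the text column exists purely for human eyes during exploration.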

[+] somenameforme|3 years ago|reply
Can you elaborate on the problem there? I'm confused because as you described it, it seems just something like

  select * from table where startDate >= 123 and endDate <= 456;

would solve the problem.

[+] TazeTSchnitzel|3 years ago|reply
“may become”? It already is the backbone of so much software.
[+] Matthias247|3 years ago|reply
May become? Sounds like the author missed that it's already used in lots of extremely important places. Among other use cases, Android has used SQLite for more than 10 years.
[+] adamnemecek|3 years ago|reply
Can’t wait for something better to come along. The text-orientedness of SQL is annoying. Something like LINQ without the underlying SQL.
[+] jdmichal|3 years ago|reply
Isn't "LINQ without the underlying SQL" simply list comprehensions?
[+] boyter|3 years ago|reply
The thing I always liked about LINQ was its lazy evaluation. You could write simple selects on your data model and join and filter in the service, allowing unit tests that operated on the joins.

I’ve yet to find a better way to write CRUD apps than that pattern.

[+] mfarstad|3 years ago|reply
Obligatory: https://github.com/mathaou/termdbms

Granted I'm the author, but until I get around to rewriting it from the ground up, I really think it saves a lot of time for editing single cells, querying, and viewing in a headless environment.

[+] ryandvm|3 years ago|reply
Wait until these kids discover embedding Berkeley DB...
[+] ProtoMathics|3 years ago|reply
Just another filesystem container.
[+] NonNefarious|3 years ago|reply
"I can probably figure out how to do what I need to do with [Key-Value] but"

It's key/value.