comex | 13 days ago

If it works, then it’s impressive. Does it work? Looking at test.sh, the oracle tests (the ones compared against SQLite) seem to consist in their entirety of three trivial SELECT statements. SQLite has tens of thousands of tests; it should be possible to port some of those over to get a better idea of how functional this codebase is.
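For what an oracle test amounts to, here is a minimal sketch, not the project's actual harness. The `Engine` trait and `first_divergence` are hypothetical names; a real setup would wrap SQLite as the oracle and the new codebase as the candidate behind the same interface.

```rust
/// Hypothetical interface: neither engine's real API is assumed here.
/// A query returns its rows as strings, or an error message.
trait Engine {
    fn query(&mut self, sql: &str) -> Result<Vec<Vec<String>>, String>;
}

/// Runs every statement against both engines, in order, and returns the
/// first statement whose results differ (None if they always agree).
fn first_divergence<'a>(
    oracle: &mut dyn Engine,
    candidate: &mut dyn Engine,
    statements: &[&'a str],
) -> Option<&'a str> {
    statements
        .iter()
        .copied()
        .find(|sql| oracle.query(sql) != candidate.query(sql))
}
```

Comparing error strings verbatim is probably too strict for real use (two engines rarely word errors identically), so a practical version would likely only compare whether both sides failed.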

Edit: I looked over some of the code.

It's not good. It's certainly not anywhere near SQLite's quality, performance, or codebase size. Many elements are the most basic thing that could possibly work, or else missing entirely. To name some examples:

- Absolutely no concurrency.

- The B-tree implementation has a line "// TODO: Free old overflow pages if any."

- When the pager adds a page to the free list, it does a linear search through the entire free list (which can get arbitrarily large) just to make sure the page isn't in the list already.

- "//! The current planner scope is intentionally small: - recognize single-table `WHERE` predicates that can use an index - choose between full table scan and index-driven lookup."

- The pager calls clone() on large buffers, which is needlessly inefficient, kind of a newbie Rust mistake.
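To make the free-list and clone() points concrete, here is a rough sketch of the usual alternatives. None of this is taken from the project: `FreeList` and `PageCache` are hypothetical names, and the on-disk free-list format is ignored entirely.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory free list (illustrative, not the project's type).
struct FreeList {
    pages: Vec<u32>,       // free page numbers, most recently freed last
    members: HashSet<u32>, // the same page numbers, for O(1) membership checks
}

impl FreeList {
    fn new() -> Self {
        FreeList { pages: Vec::new(), members: HashSet::new() }
    }

    /// Adds a page to the free list. Returns false if the page was already
    /// free; the check is a hash lookup instead of a scan over `pages`.
    fn release(&mut self, page: u32) -> bool {
        if !self.members.insert(page) {
            return false;
        }
        self.pages.push(page);
        true
    }

    /// Hands out the most recently freed page, if there is one.
    fn allocate(&mut self) -> Option<u32> {
        let page = self.pages.pop()?;
        self.members.remove(&page);
        Some(page)
    }
}

/// Hypothetical page cache (illustrative, not the project's type).
struct PageCache {
    pages: Vec<Vec<u8>>,
}

impl PageCache {
    /// Lends the page to the caller as a slice, so reading a page does not
    /// copy the whole buffer the way returning a clone() would.
    fn page(&self, index: usize) -> Option<&[u8]> {
        self.pages.get(index).map(|p| p.as_slice())
    }
}
```

The extra set costs memory proportional to the free list; whether that trade is worth it depends on how large the list gets in practice.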

However…

It does seem like a codebase that would basically work. At a large scale, it has the necessary components and the architecture isn't insane. I'm sure there are bugs, but I think the AI could iron out the bugs, given some more time spent working on testing. And at that point, I think it could be perfectly suitable as an embedded database for some application as long as you don't have complex needs.

In practice, there is little reason not to just reach for actual SQLite, which is much more sophisticated. But I can think of one possible reason: SQLite has been known to have memory safety vulnerabilities, whereas this codebase is written in Rust with no unsafe code. It might eat your data, but it won't corrupt memory.

That is impressive enough for now, I think.

alt187|13 days ago

> But I can think of one possible reason: SQLite has been known to have memory safety vulnerabilities, whereas this codebase is written in Rust with no unsafe code.

I've lost every single shred of confidence I had in the comment's more optimistic claims the moment I read this.

If you read through SQLite's CVE history, you'll notice most of those are spurious at best.

Some more context here: https://sqlite.org/cves.html

ii41|13 days ago

I am using sqlite in my project. It definitely solves problems, but I keep seeing overly arrogant and sometimes even irresponsible statements from their website, and can't really appreciate much of their attitude towards software engineering. The below quote from this CVE page is one more example of such statements.

> All historical vulnerabilities reported against SQLite require at least one of these preconditions:

> 1. ...

> 2. The attacker can submit a maliciously crafted database file to the application that the application will then open and query.

> Few real-world applications meet either of these preconditions, and hence few real-world applications are vulnerable, even if they use older and unpatched versions of SQLite.

This second precondition is literally one of the idiomatic usages of sqlite that they've suggested on their site: https://sqlite.org/appfileformat.html

wedog6|13 days ago

SQLite is tested against failure to allocate at every step of its operation: running out of memory never causes it to fail in a serious way, e.g. data loss. It's far more robust than almost every other library.

gzread|13 days ago

assuming your malloc function returns NULL when out of memory. Linux systems don't. They return fake addresses that kill your process when you use them.

Lucky that SQLite is also robust against random process death.

sigmoid10|13 days ago

Unfortunately it is not so easy. If rigorous tests at every step were able to guarantee that your program can't be exploited, we wouldn't need languages like Rust at all. But once you have a program in an unsafe language that is sufficiently complex, you will have memory corruption bugs. And once you have memory corruption bugs, you eventually will have code execution exploits. You might have to chain them more than in the good old days, but they will be there. SQLite even had single memory write bugs that allowed code execution which lay in the code for 20 years without anyone spotting them. Who knows how many hackers and three letter agencies had tapped into that by the time it was finally found by benevolent security researchers.

camgunz|13 days ago

I'm not impressed:

- if you're not passing SQLite's open test suite, you didn't build SQLite

- this is a "draw the rest of the owl" scenario; in order to transform this into something passing the suite, you'd need an expert in writing databases

These projects are misnamed. People didn't build counterstrike, a browser, a C compiler, or SQLite solely with coding agents. You can't use them for that purpose--like, you can't drop this in for maybe any use case of SQLite. They're simulacra (slopulacra?)--their true use is as a prop in a huge grift: tricking people (including, and most especially, the creators) into thinking this will be an economical way to build complex software products in the future.

stavros|13 days ago

I'm generally not this pedantic, but yeah, "I wrote an embedded database" is fine to say. If you say "I built SQLite", I expect to at least see how many of the SQLite tests your thing passed.

gf000|13 days ago

Also, the very idea is flawed. These are open-source projects and the code is definitely part of the training data.

wseqyrku|13 days ago

> tricking people (including, and most especially, the creators),

I believe it's an ad. Everything about it is trying so hard to seem legit and it's the most pointless thing I have ever seen.

9dev|13 days ago

Well--given a full copy of the SQLite test suite, I'm pretty sure it'd get there eventually. I agree that most of these show-off projects are just prop pieces, but that's kind of the point: Demonstrate it's technically possible to do the thing, not actually doing the thing, because that'd have diminishing returns for the demonstration. Still, the idea of setting a swarm of agents to a task, and, given a suitable test suite, have them build a compliant implementation, is sound in itself.

kyars|13 days ago

Sorry for misleading, added an update stating that this is a simulacrum of SQLite.

rstuart4133|13 days ago

> That is impressive enough for now, I think.

There are a lot of embedded SQL libraries out there. I'm not particularly enamoured with some of the design choices SQLite made, for example the "flexible" approach they take to naming column types, so that isn't why I use it.

I use it for one reason: it is the most reliable SQL implementation I know of. If a file is corrupted, or invariants I tried to keep no longer hold, I can safely assume SQLite isn't the cause. By completely eliminating one branch of the failure tree, it saves me time.

That one reason is the one thing this implementation lacks - while keeping what I consider SQLite's warts.

olmo23|13 days ago

IIRC the official test-suite is not open-source, so I'm not sure how possible this is.

SQLite|13 days ago

You do not recall correctly. There is more than 500K SLOC of test code in the public source tree. If you "make releasetest" from the public source tarball on Linux, it runs more than 15 million test cases.

It is true that the half-million lines of test code found in the public source tree are not the entirety of the SQLite test suite. There are other parts that are not open-source. But the part that is public is a big chunk of the total.

IshKebab|13 days ago

> I think the AI could iron out the bugs, given some more time spent working on testing

I would need to see evidence of that. In my experience it's really difficult to get AI to fix one bug without having it introduce others.

simonw|13 days ago

Have it maintain and run a test suite.