30 minutes seems long. Is there a lot of data? I’ve been working on bootstrapping sqlite dbs off of lots of json data and by holding a list of values and then inserting 10k at a time with inserts, Ive found a good perf sweet spot where I can insert plenty of rows (millions) in minutes. I had to use some tricks with bloom filters and LRU caching, but can build a 6 gig db in like 20ish minutes now
zeroq|10 months ago
I create a new in-mem db, run schema and then import every table in one single transaction (in my testing it showed that it doesn't matter if it's a single batch or multiple single inserts as long are they part of single transaction).
I do a single string replacement per every CSV line to handle an edge case. This results in roughly 15 million inserts per minute (give or take, depending on table length and complexity). 450k inserts per second is a magic barrier I can't break.
I then run several queries to remove unwanted data, trim orphans, add indexes, and finally run optimize and vacuum.
Here's quite recent log (on stock Ryzen 5900X):
thechao|10 months ago
pessimizer|10 months ago