top | item 39709305

roozbeh18 | 1 year ago

Can someone tell me how this is collected in SQLite?

wolfgang42|1 year ago

I wrote a blog post a while back about reading these dumps: https://search.feep.dev/blog/post/2021-09-04-stackexchange

Presumably they have a script that does something similar to that process, and then writes the resulting data into a predefined table structure.

JasonPunyon|1 year ago

Nice post!

Yep, my process is similar. It goes...

  - decompress (users|posts)  
  - split into batches of 10,000  
  - xsltproc the batch into sql statements  
  - pipe the batches of statements into sqlite in parallel using flocks for coordination

On my M1 Max it takes about 40 minutes for the whole network. Then I compress each database with brotli, which takes about 5 hours.
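
For anyone who wants to try the same flow without xsltproc, here is a rough single-process sketch in Python. It is not the script described above: it swaps the XSLT step for `xml.etree.ElementTree.iterparse` and the parallel `flock` coordination for one transaction per batch. The file layout (a `Users.xml` made of `<row .../>` elements with `Id`, `DisplayName` and `Reputation` attributes), the `users` table, and the function names are all assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import islice

BATCH_SIZE = 10_000  # batch size mentioned in the steps above


def iter_rows(xml_path):
    """Stream <row .../> elements from a dump file as attribute dicts."""
    for _event, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag == "row":
            yield dict(elem.attrib)
            elem.clear()  # drop the element's attributes once copied


def batches(iterable, size):
    """Yield lists of at most `size` items until the input is exhausted."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def load_users(xml_path, db_path, batch_size=BATCH_SIZE):
    """Load an (assumed) Users.xml into a hypothetical `users` table.

    Returns the number of rows written.
    """
    conn = sqlite3.connect(db_path)
    # Hypothetical schema: only three of the available attributes are kept.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "id INTEGER PRIMARY KEY, display_name TEXT, reputation INTEGER)"
    )
    total = 0
    for batch in batches(iter_rows(xml_path), batch_size):
        with conn:  # commits one transaction per batch
            conn.executemany(
                "INSERT OR REPLACE INTO users (id, display_name, reputation) "
                "VALUES (?, ?, ?)",
                [
                    (int(r["Id"]), r.get("DisplayName"), int(r.get("Reputation", 0)))
                    for r in batch
                ],
            )
        total += len(batch)
    conn.close()
    return total
```

Calling `load_users("Users.xml", "users.db")` on a decompressed dump would fill `users.db`. Wrapping each batch in its own transaction is what keeps the inserts fast; running several writers in parallel, as in the steps above, would still need some external locking because SQLite allows only one writer at a time.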