top | item 10138004

Efficient Tabular Storage

62 points| ah- | 10 years ago |matthewrocklin.com | reply

9 comments

order
[+] TheGuyWhoCodes|10 years ago|reply
Vertica has all those performance enhancements, great DB can't recommend because of pricing :(
[+] beagle3|10 years ago|reply
kdb+ answers same description.

And it's a 300KB executable with no dependencies (other than glib/MSVCRT).

[+] owlish|10 years ago|reply
How do databases like MySQL store data efficiently for querying? It seems like something like protobuf would do well here, though you'd need to generate code for each dataset.
[+] kragen|10 years ago|reply
Typically they use row-oriented binary storage, optionally with individual columns or subsets of columns duplicated into indices for fast querying. Have you tried protobufs? How many hundreds of megs per second do you get? I think it is remarkably slow on the scales we're talking about here.
[+] brudgers|10 years ago|reply
Traditional DBMS's get performance by optimizing storage down to the physical layout of the data on the hardware. So MySQL makes a lot of assumptions based on the mechanics if spinning disks and buffers tailored to their physics. Database Systems: The Complete Book is a good text on the subject and the second half is all about the hardware and software used in implementing traditional systems.
[+] Little_Peter|10 years ago|reply
How is it different from HDF5 (h5py and pytables)?
[+] Sprint|10 years ago|reply
This is super interesting, thanks!