
mpercy | 4 years ago

Who is developing this product? Under what circumstances is it 30% faster than Parquet?

mwlon | 4 years ago

It is a new startup I'm building.

It's a new type of database that can ingest streaming data with very fast (~10ms) response times and output batch data with very high throughput. To do that, it uses a new columnar file format and compression algorithm. Together, these make its columnar files 30-50% smaller under most circumstances while decoding just as quickly. That means storage costs are lower, and it's 30+% faster assuming the same network bandwidth is used to transfer the data for all columns. And this is a pessimistic scenario, since most queries have a `select column_0, column_1, ...` clause that PancakeDB can leverage better than Parquet, transferring only the exact columns needed!
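The 30+% figure follows from simple bandwidth arithmetic: a file that is 30-50% smaller takes proportionally less time to move over the same link. A minimal sketch, with all sizes and bandwidth values hypothetical:

```python
# Illustrative transfer-time arithmetic behind the "30+% faster" claim.
# All numbers below are hypothetical, not measurements.

def transfer_seconds(size_mb: float, bandwidth_mb_s: float) -> float:
    """Time to move size_mb megabytes over a link of bandwidth_mb_s MB/s."""
    return size_mb / bandwidth_mb_s

parquet_mb = 100.0  # hypothetical Parquet file size
pancake_mb = 65.0   # same data, ~35% smaller (mid-range of the 30-50% claim)
bandwidth = 100.0   # MB/s, identical for both transfers

saved = 1.0 - transfer_seconds(pancake_mb, bandwidth) / transfer_seconds(parquet_mb, bandwidth)
print(f"transfer time saved: {saved:.0%}")  # prints "transfer time saved: 35%"
```

Column pruning then compounds this: a `select` touching only a few columns transfers only those columns' (already smaller) byte ranges.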

You can find edge cases (e.g. very long strings of uniformly random bytes) where it's only a few % faster instead of 30%, but in every real-world-resembling scenario I've tried, the advantage is much greater.