It's a new type of database that ingests streaming data with very fast (~10ms) response times and serves batch reads with very high throughput. To do that, it uses a new columnar file format and compression algorithm. Together, these make its columnar files 30-50% smaller than Parquet's under most circumstances while decoding just as quickly. That means lower storage costs, and at the same network bandwidth, 30+% faster transfers when all columns are fetched. And that's the pessimistic scenario, since most queries have a `select column_0, column_1, ...` clause that PancakeDB can leverage better than Parquet, transferring only the exact columns needed!
You can find edge cases (e.g. very long strings of uniformly random bytes) where it's only a few percent faster instead of 30+%, but in every real-world-resembling scenario I've tried, the advantage is much greater.
mwlon|4 years ago
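To make the bandwidth claim concrete, here's a back-of-envelope sketch. All numbers are hypothetical (a 100 MB column payload, a 35% size reduction as the midpoint of the 30-50% range, a 1 Gbit/s link); none come from actual PancakeDB or Parquet benchmarks. The point is just that at fixed bandwidth, transfer time scales directly with bytes on the wire:

```python
# Hypothetical sizes: not real benchmark numbers.
baseline_bytes = 100_000_000           # 100 MB of Parquet column data (assumed)
pancake_bytes = baseline_bytes * 0.65  # 35% smaller, midpoint of the 30-50% claim

# Assumed link: 1 Gbit/s, expressed in bytes per second.
bandwidth_bytes_per_s = 1_000_000_000 / 8

# Transfer time is bytes / bandwidth, so the ratio of times
# equals the ratio of sizes, independent of the actual link speed.
t_parquet = baseline_bytes / bandwidth_bytes_per_s
t_pancake = pancake_bytes / bandwidth_bytes_per_s
speedup = t_parquet / t_pancake

print(f"transfer speedup: {speedup:.2f}x")  # → transfer speedup: 1.54x
```

Note the asymmetry: a 35% smaller file gives a ~1.54x speedup (1/0.65), not 1.35x, because the saving compounds in the denominator. That's also why the column-pruning case is even better: dropping unneeded columns entirely shrinks the numerator before compression even enters the picture.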