mwlon's comments
mwlon | 4 years ago | on: PancakeDB Is Now Free
I've released it under BSL so that any company can run it on their own servers for free.
mwlon | 4 years ago | on: PancakeDB offers columnar reads 30% faster than Parquet
It's a new type of database that can take in streaming data with very fast (~10ms) response times and output batch data with very high throughput. To do that, it uses a new columnar file format and compression algorithm. Together, these make its columnar files 30-50% smaller under most circumstances while decoding just as quickly. That means storage costs are lower, and reads are 30+% faster assuming the same network bandwidth is used to transfer the data for all columns. And this is a pessimistic scenario, since most queries have a `select column_0, column_1, ...` clause that PancakeDB can leverage better than Parquet, transferring only the exact columns needed!
You can find edge cases (e.g. very long strings of uniformly random bytes) where it's only a few percent faster instead of 30%, but in every real-world-resembling scenario I've tried, the advantage is much greater.
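To make the bandwidth argument above concrete, here is a rough back-of-the-envelope sketch. It assumes transfer time is simply file size divided by bandwidth and that decode speed is unchanged, as the comment states. The 10 GB file size and 1 GB/s bandwidth are made-up numbers for illustration, not measurements of PancakeDB or Parquet.

```python
def transfer_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Time to move a file over the network, ignoring decode time."""
    return size_gb / bandwidth_gb_per_s


def throughput_gain_percent(size_reduction: float) -> float:
    """Throughput gain (in %) when files shrink by `size_reduction`
    (a fraction in [0, 1)) and bandwidth stays the same."""
    return (1.0 / (1.0 - size_reduction) - 1.0) * 100.0


# Hypothetical values, chosen only to illustrate the arithmetic.
baseline_gb = 10.0   # assumed size of the larger columnar file
bandwidth = 1.0      # assumed network bandwidth in GB/s

for reduction in (0.30, 0.50):
    smaller_gb = baseline_gb * (1.0 - reduction)
    t_base = transfer_seconds(baseline_gb, bandwidth)
    t_small = transfer_seconds(smaller_gb, bandwidth)
    print(
        f"{reduction:.0%} smaller: {t_base:.1f}s -> {t_small:.1f}s, "
        f"throughput +{throughput_gain_percent(reduction):.0f}%"
    )
```

Under these assumptions, a file that is 30% smaller takes 30% less time to transfer, which works out to roughly 43% more rows per second. That is one way to read "30+%". Reading only a subset of columns would shrink the transferred size further.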
mwlon | 4 years ago | on: New, better compression for columns of numerical data
I also made a blog post that introduces the idea more from the math perspective: https://graphallthethings.com/posts/quantile-compression
You might have seen me post about Quantile Compression in previous years. Pco is its successor! Pco gets a slightly better compression ratio, robustly handles more types of data, and (most importantly) decompresses much faster.
If you're interested in using it, there's a Rust API, a Python (PyO3) API, and a CLI.