(no title)
wesm | 5 years ago
http://cidrdb.org/cidr2021/papers/cidr2021_paper08.pdf
"A common, efficient serialized and wire format across data engines is a transformational development. Many previous systems and approaches (e.g., [26, 36, 38, 51]) have observed the prohibitive cost of data conversion and transfer, precluding optimizers from exploiting inter-DBMS performance advantages. By contrast, inmemory data transfer cost between a pair of Arrow-supporting systems is effectively zero. Many major, modern DBMSs (e.g., Spark, Kudu, AWS Data Wrangler, SciDB, TileDB) and data-processing frameworks (e.g., Pandas, NumPy, Dask) have or are in the process of incorporating support for Arrow and ArrowFlight. Exploiting this is key for Magpie, which is thereby free to combine data from different sources and cache intermediate data and results, without needing to consider data conversion overhead."
polskibus|5 years ago
data_ders|5 years ago