andygrove | 4 years ago

Yes. The Ballista crate (part of the arrow-datafusion repo) provides distributed query execution, and its scheduler exposes a gRPC service. Arrow Flight is used internally as well, but it is not directly exposed to users. There is also work in progress to add Python bindings for Ballista (they already exist for DataFusion).

jarpineh|4 years ago

Thank you. I went through its GitHub repo looking for docs. It seems I need to dig a bit deeper. How to get started with my Parquet files isn’t immediately obvious.

I assume the Python bindings would talk through gRPC. Could I use gRPC directly, perhaps?

andygrove|4 years ago

The best "Getting Started" documentation right now is the one on docs.rs: https://docs.rs/ballista/0.5.0/ballista/

This demonstrates using the Rust client (BallistaContext + DataFrame). There are already Python bindings for DataFrame but not BallistaContext yet.

Documentation for Ballista is severely lacking right now, and this will be an area of focus for the next release.