ruw1090 | 8 years ago

What is the benefit of shipping data to the GPU for execution if the data is on S3 or HDFS? Won't most of the cost of the query be I/O?

felipe_aramburu|8 years ago

Sure, the very first time you run a query. But with multi-tiered caching, the data you access frequently sits closer and closer to the GPUs, which alleviates that bottleneck over time to an extent. Also, what is a fantastic way of improving I/O? Compression and decompression. Our own file format compresses and decompresses using the GPU. We are working on doing the same for some of the Parquet decompression steps. I/O is almost always your main concern here, but you can improve upon it greatly by leveraging processes that might not have been computationally feasible before.