ConnectorX: Accelerating Data Loading from Databases to Dataframes [pdf]

3 points | gruuya | 3 years ago | vldb.org

1 comment

gruuya | 3 years ago
What really struck me at first was how little time Pandas read_sql spends on actual query execution and data transfer, while client-side processing takes up the majority (~85%) of the time.

It makes more sense, though, once you realise that they're talking about unsaturated networks, so they can focus on relatively simple optimisation techniques (e.g. query partitioning and zero-copy) to get a significant speedup in data loading.
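For anyone wondering what query partitioning looks like in practice, here's a minimal sketch using only the Python standard library, with SQLite standing in for a real database server. It is not ConnectorX's code or API; `partition_queries`, `fetch` and `load_partitioned` are hypothetical names. The idea: ask the database for the MIN/MAX of an integer column (assumed to contain no NULLs), split that range into N range-bounded subqueries, and fetch each one over its own connection in parallel.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def partition_queries(query, column, lo, hi, n):
    """Split `query` into up to n subqueries, each covering a contiguous
    range of integer values of `column` within [lo, hi]."""
    step = (hi - lo + n) // n  # ceil((hi - lo + 1) / n), so every value is covered
    parts = []
    for i in range(n):
        start = lo + i * step
        if start > hi:
            break
        end = min(start + step - 1, hi)
        # Identifiers are interpolated directly: fine for a sketch, not for untrusted input
        parts.append(
            f"SELECT * FROM ({query}) AS q WHERE q.{column} BETWEEN {start} AND {end}"
        )
    return parts


def fetch(db_path, sql):
    """Run one partition on its own connection, as a parallel loader would."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


def load_partitioned(db_path, query, column, n=4):
    """Fetch the rows of `query` using n range-partitioned subqueries in parallel."""
    conn = sqlite3.connect(db_path)
    try:
        lo, hi = conn.execute(
            f"SELECT MIN({column}), MAX({column}) FROM ({query}) AS q"
        ).fetchone()
    finally:
        conn.close()
    if lo is None:  # empty result: nothing to partition
        return []
    subqueries = partition_queries(query, column, lo, hi, n)
    with ThreadPoolExecutor(max_workers=len(subqueries)) as pool:
        chunks = list(pool.map(lambda sql: fetch(db_path, sql), subqueries))
    # Stitch the partitions back together in range order
    return [row for chunk in chunks for row in chunk]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE t (id INTEGER, val TEXT)")
        setup.executemany("INSERT INTO t VALUES (?, ?)",
                          [(i, f"v{i}") for i in range(1, 101)])
        setup.commit()
        setup.close()
        rows = load_partitioned(path, "SELECT id, val FROM t", "id", n=4)
        print(len(rows))  # 100
```

With a local SQLite file and Python threads you shouldn't expect much speedup; the structure is the point. It pays off when the server can run the range subqueries concurrently and the client can decode the results in parallel, which is the situation where an unsaturated network leaves headroom.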