karbarcca | 5 years ago
I agree that for a data scientist doing exploratory analysis locally on their computer, it doesn't make nearly as much of a difference (also because they're usually not working on crazy large files).
The performance work in the CSV.jl package (that the article is about) was very much geared towards these kinds of production scenarios.
mcrad|5 years ago
ChrisRackauckas|5 years ago
Why should the exploratory and production teams be using completely different tools? That seems like it would add friction and open up gaps where translation errors creep in. I would venture to say that having the exploratory and production teams work from the same code base is a very strong productivity gain, and we've seen this hold true at many companies.
mr_toad|5 years ago
All stored on NVMe SSDs? Because unless you have really fast IO the CSV parser isn’t going to be the bottleneck.
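One rough way to check this on your own hardware is to parse from memory, which takes the disk out of the picture, and then compare the parser's MB/s against what your storage can actually deliver. The following is a minimal sketch using Python's standard-library `csv` module, not CSV.jl; the data is synthetic and the function names `make_csv` and `parse_throughput` are made up for illustration.

```python
import csv
import io
import time


def make_csv(rows: int, cols: int) -> bytes:
    """Build a synthetic CSV payload in memory (hypothetical data)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([f"col{i}" for i in range(cols)])  # header row
    for r in range(rows):
        writer.writerow([r * cols + c for c in range(cols)])
    return buf.getvalue().encode("utf-8")


def parse_throughput(payload: bytes) -> tuple[int, float]:
    """Parse the payload from memory; return (rows parsed, MB/s)."""
    text = payload.decode("utf-8")
    start = time.perf_counter()
    # Count every parsed row, header included; no disk IO happens here.
    rows = sum(1 for _ in csv.reader(io.StringIO(text)))
    elapsed = time.perf_counter() - start
    mb = len(payload) / 1e6
    return rows, (mb / elapsed if elapsed > 0 else float("inf"))


payload = make_csv(rows=50_000, cols=10)
rows, mb_per_s = parse_throughput(payload)
print(f"parsed {rows} rows at {mb_per_s:.0f} MB/s from memory")
```

If the number printed is well above your disk's sequential read rate, IO is the limit and a faster parser won't help much; if it is well below, the parser is the bottleneck.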
andi999|5 years ago
karbarcca|5 years ago
We've started exploring the Apache Arrow format, a compressible binary format with a dedicated wire format, just to cut down on parsing costs.