deshpand's comments

deshpand | 4 years ago | on: DuckDB-Wasm: Efficient analytical SQL in the browser

Many enterprises are adopting a pattern where they replicate data from the database (say, Redshift) into Parquet files (a data lake?) and direct more traffic, including analytical workloads, to those Parquet files.

DuckDB will be very useful here, since it can query the Parquet files directly instead of you having to use Redshift Spectrum or whatever.

deshpand | 4 years ago | on: Reflecting on Four Years at Databricks

If you can stay away from Python/pandas entirely and get all your work done in typed languages like Scala/Java, that's good. But a lot of scientists and non-CS folks use Python/R. They need to avoid the mishmash of bringing in Spark and SQL for some bits and then going back to Python/R. Native Python, especially, offers mature ways to handle data in the 100s of GB. Learning to incorporate Dask and Numba is going to be far easier than teaching all these folks distributed programming and spinning up Spark clusters, which in many cases are unnecessary anyway.
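Dask's dataframe is, roughly, pandas applied partition by partition. A minimal sketch of that same idea in plain pandas (no Dask or Numba, to keep it self-contained): stream a CSV in chunks so only one chunk is in memory at a time, reduce each chunk, then combine the partial results. The column names and the synthetic data are made up.

```python
import io

import pandas as pd


def chunked_group_sum(csv_source, key: str, value: str, chunksize: int = 1000) -> pd.Series:
    """Sum `value` per `key` while holding only one chunk in memory at a time."""
    partials = []
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        # Reduce each chunk to a small partial result...
        partials.append(chunk.groupby(key)[value].sum())
    # ...then combine the partials into the final answer.
    return pd.concat(partials).groupby(level=0).sum()


# Stand-in for a file far larger than RAM: 10,000 synthetic rows.
lines = ["sensor,reading"] + [f"s{i % 4},{i % 10}" for i in range(10_000)]
source = io.StringIO("\n".join(lines))
totals = chunked_group_sum(source, key="sensor", value="reading", chunksize=2_500)
print(totals)
```

With a real file you would pass its path instead of the `StringIO` object; Dask essentially automates this split/apply/combine and can run the chunks in parallel.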

deshpand | 4 years ago | on: Reflecting on Four Years at Databricks

Spark may be a mature solution for truly big data (1 TB and more) processed in a SQL-like fashion. But I constantly see it being misused, even with datasets as small as 5 GB. Maybe the company's valuation reflects this 'growth' and 'adoption'. And data locality is a thing: you can't efficiently read terabytes from object storage over HTTP. Also, batch-oriented map-reduce is not conducive to many ML algorithms, where state needs to be passed around from one iteration to the next.
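A toy illustration of the "state" point, using plain 1-D k-means as the example algorithm (my choice, not something from the comment). Each pass needs the centroids produced by the previous pass, so the loop carries state across iterations.

```python
def kmeans_1d(points, centroids, max_iter=20):
    """Plain 1-D k-means; returns the final centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: depends on the centroids from the previous pass.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
            clusters[nearest].append(p)
        # Update step: the new centroids become the state for the next pass.
        updated = [sum(c) / len(c) if c else centroids[k] for k, c in enumerate(clusters)]
        if updated == centroids:  # converged, nothing changed
            break
        centroids = updated
    return centroids


print(kmeans_1d([1, 2, 3, 10, 11, 12], [0, 5]))
```

In one process the centroids simply stay in memory; in a classic chain of MapReduce jobs, each pass would typically be a separate job that re-reads the input and writes the centroids out for the next job.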

deshpand | 4 years ago | on: Is BI dead? – On dismantling data's ship of Theseus

A little training in the Python stack (pandas/numpy/matplotlib or other visualization libraries) can go a long way toward simplifying the tech stack and getting rid of these mind-numbing BI tools.
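As a rough example of the kind of report people often build in a BI tool, here is a minimal pandas sketch. The sales data is made up; in practice it would come from something like `pd.read_parquet` or `pd.read_sql`.

```python
import pandas as pd

# Hypothetical sales extract.
sales = pd.DataFrame(
    {
        "region": ["East", "East", "West", "West", "West"],
        "quarter": ["Q1", "Q2", "Q1", "Q1", "Q2"],
        "revenue": [100, 150, 80, 70, 120],
    }
)

# Region x quarter revenue table, the typical BI "pivot".
report = sales.pivot_table(
    index="region", columns="quarter", values="revenue", aggfunc="sum", fill_value=0
)
print(report)

# With matplotlib installed, report.plot(kind="bar") would chart the same table.
```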

And companies are trying. For example: https://www.bobsguide.com/articles/barclays-gordon-risk-mana...

But I also see companies spending more on these tools in the name of innovation, because some bigshot likes tool X and that's what he wants to use. And guess what: now it also has to be made available in the cloud.

deshpand | 4 years ago | on: Scikit-Learn Version 1.0

> I hate finding CSVs that other data scientists

Ideally you should be using Parquet, a binary format that preserves column types and indexes [df.to_parquet(<file>); df = pd.read_parquet(<file>)].

You can avoid a lot of problems simply by not using text files.

deshpand | 6 years ago | on: The Fasting Cure Is No Fad

Another related idea is to eat more raw/uncooked food. I'm noticing how much less I eat as I cut down on cooked portions. It may be easier, or better, than outright fasting.

Of course, I am only referring to certain veggies, or to soaking things like peas/beans/lentils in water overnight instead of cooking them. Please don't start eating raw meat or anything else because of my message!

deshpand | 7 years ago | on: Python Developer Survey 2018 Results

Python is not perfect, and languages do evolve. Maybe Rust or Go or something new will take over the world. But there's a trade-off between developer time and execution time, and you can't optimize for just one. When you embrace Python, you also get a large ecosystem of libraries (and talent, if you are a large enterprise) that may not yet be available for some newer languages.

Prototyping usually involves working with an existing code base and ecosystem, although if you are building a brand-new product or application, your context may be different.

deshpand | 7 years ago | on: Python Developer Survey 2018 Results

Maybe you have a luxury of rewriting that others don't have. Do you rewrite in assembly? That's going to be really fast, you know!

With a proper test framework, people can build, and are building, large-scale applications. There are plenty of ways to optimize performance without a complete rewrite. To a large extent, Python is really Python+C.
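A minimal sketch of the "Python+C" point: keep the code in Python but move a hot loop onto a built-in that CPython implements in C, and check with a test that the answer is unchanged. The function names are made up for illustration; actual timings depend on the machine.

```python
import timeit


def total_loop(values):
    # Pure-Python loop: every addition goes through the interpreter.
    acc = 0
    for v in values:
        acc += v
    return acc


def total_builtin(values):
    # Built-in sum() is implemented in C in CPython, so the loop runs in C.
    return sum(values)


data = list(range(1_000_000))
# Regression check first: the faster version must give the same answer.
assert total_loop(data) == total_builtin(data)

for fn in (total_loop, total_builtin):
    seconds = timeit.timeit(lambda: fn(data), number=5)
    print(f"{fn.__name__}: {seconds:.3f}s for 5 runs")
```

The same workflow (profile, swap the hot spot for a C-backed library such as NumPy, keep the tests green) avoids a full rewrite.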

deshpand | 7 years ago | on: Neuron, a new VS Code extension for data science

You have to give credit to JupyterCon for inviting him! Most of his issues relate to Jupyter, and he hasn't (or hadn't) seen JupyterLab. You can trust that many of these issues will get worked out in the long run. I have nothing against VS Code, but Jupyter(Lab) didn't win the ACM Software System Award (the Nobel Prize for software) for nothing.