top | item 42990987

(no title)

This is a head-scratcher of a take. Have you actually done any in-depth work on data pipelines and analytics tooling? If so, what precisely do you see LLMs making easier?

I tried using enterprise chat gpt to write a query to load some json data into a data warehouse. I was impressed with how good a job it did, but it still required several rounds of refinement and hand-holding and the end result was almost, but not quite, correct. So I'm not coming at this from the perspective of hating LLMs a priori, but I am unimpressed with the hype and over-selling of its capabilities. In the end, it was no faster than writing the query myself, but it wasn't slower either, so I can see it being somewhat helpful in limited conditions.

Unless the technology makes another quantum leap improvement at the same time the price drops like a stone, I don't see LLMs coming anywhere close to your claim.

That said, I expect to see a huge amount of snake oil and enterprise dollars wastefully burned on executive pipe dreams of "here's a pile of data now magic me a better business!" in the next few years of LLM over-hyped nonsense. There's always a quick buck to make in duping clueless execs drooling over replacing pesky, annoying, "over-paid" tech people.

discuss

robwwilliams|1 year ago

Let me give you a complementary perspective. Same problems all of you have but I work in a small lab team of PhD biologist who generate huge omics data set and even larger lightsheet microscopy and MRI datasets but don’t know how to do a VLOOKUP in Excel. And who do not know the exotic acronyms: LIMS, QA, QC, or SQL. Yes, really.

What do we typically do in academic biomedical research in this situation?

The lead PI looks around the lab and finds a grad student or postdoc who knows how to turn on a computer and if very lucky also has had 6 months of experience noodling around with R or Python. This grad or postdoc is then charged with running some statistical analyses without any training whatsoever in data science. What is an outlier anyway, what do you mean by “normalize”, what is metadata exactly?

You get my drift: It is newbies in data science and programming (often 40-and 50-year-olds) leading novices (20- and 30-year-olds) to the slaughter. Might contribute to some lack of replicability ;-)

And it has been this way in the majority of academic labs since I started using CPM on an Apple 2 in 1980 at UC Davis in an electrophysiology lab in Psychology, to the first Macs I set up at Yale in a developmental neurobiology lab in 1984, and up to the point at which I set up my own lab in neurogenetics at the University of Tennessee with a pair of Mac IIs in 1989 and $150,000 in set-up funds, just enough for me to hire one very inexperience technician to help me do everything.

So in this context I hope all of you can appreciate that ANY help in bringing some real data science into mom-and-pop laboratories would be a huge huge boon.

And please god, let it be FOSS.

drunkpotato|1 year ago

I feel you, and LLMs are no doubt a boon in tooling to help in this kind of scenario. I'm not poo-pooing LLMs in general; they are very cool! I wish they were allowed to just be very cool while we incorporate them into our tooling and workflows, rather than over-hyped.