kzuberi's comments

kzuberi | 2 years ago | on: ChatGPT: The end of new programming languages?

I think LLMs will interact with programming language design, and software design, in a way that will likely be significant but is currently very difficult to predict. Sort of a singularity moment for software, where we're not sure what comes next but it does seem likely to change significantly.

From my own experience, writing software today isn't that much different than it was 20 years ago using some Java IDE from that era (Borland? maybe I'm remembering it better than it was). I can imagine what development will be like next year with increasing integration of Copilot-like tools, but I can't imagine with any confidence what software development will be like 10 years from now.

kzuberi | 3 years ago | on: Consider working on genomics

I also found the quality & proliferation of data pipeline tools to be baffling. Somehow always more painful to put these together than it seemed like it ought to be.

At one point we wrote an internal tool (I think lots of organizations do this, since all the 100s of existing tools somehow don't fit, so you invent #101) and while it was tremendously satisfying getting batch jobs with 1000s of CPUs churning away, that kind of data infrastructure needs to be standardized. I think some companies are doing this, e.g. I saw a presentation about Arvados/Curii that seemed interesting (but I haven't used it so I'm not sure). Maybe CWL will turn out to be the way forward here?

kzuberi | 3 years ago | on: Consider working on genomics

> this really makes the engineer's end of the bargain sound like janitorial work

I don't think you should interpret it that way. Another take would be that it's like collaborating with a domain expert outside your specialization.

What's important is that your potential impact as an engineer can grow as you become more knowledgeable in the relevant bio. Most of the scientists I've worked with were happy to teach background (and some were just exceptional, fun times if you also found the field interesting as I did!). Obviously some allowance must be made for differences in culture from org to org, and that likely accounts for some of the disappointed voices - but I'm not convinced this is endemic to the field as opposed to organization specific. Just like with an opportunity at any particular company, do your research.

Incidentally, working on a well-defined engineering+optimization problem, if you are lucky enough to bump into one, is just candy for lots of engineering types. OK, a quick & simple one: a scientist I worked with was doing some analysis that involved intersecting piles of genomic intervals with each other, which was taking many hours for a single run - super painful to tweak and re-execute. Our team showed them how to use interval-trees and made these available integrated in our internal tools, and the problem transformed into ~10 min execution runs. See, a wee bit of comp-sci where suddenly you're the domain expert. And appropriately appreciated!
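For anyone curious what the interval-tree trick looks like, here is a minimal sketch. It is not the internal tool mentioned above, just an illustration: the names (`Node`, `build`, `overlaps`, `intersect`) are made up for this example, and it assumes half-open `[start, end)` coordinates on a single chromosome. The idea is a balanced search tree keyed on interval start, where each node also remembers the largest end in its subtree so whole branches can be skipped during a query.

```python
from typing import List, Optional, Tuple

# Assumption: half-open [start, end) coordinates, one chromosome at a time.
Interval = Tuple[int, int]


class Node:
    """Tree node holding one interval plus the max end seen in its subtree."""

    def __init__(self, interval: Interval,
                 left: Optional["Node"], right: Optional["Node"]) -> None:
        self.interval = interval
        self.left = left
        self.right = right
        self.max_end = interval[1]
        for child in (left, right):
            if child is not None:
                self.max_end = max(self.max_end, child.max_end)


def build(sorted_ivs: List[Interval], lo: int = 0,
          hi: Optional[int] = None) -> Optional[Node]:
    """Build a balanced tree from intervals already sorted by start."""
    if hi is None:
        hi = len(sorted_ivs)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    return Node(sorted_ivs[mid],
                build(sorted_ivs, lo, mid),
                build(sorted_ivs, mid + 1, hi))


def overlaps(node: Optional[Node], start: int, end: int,
             out: Optional[List[Interval]] = None) -> List[Interval]:
    """Collect every stored interval that overlaps [start, end)."""
    if out is None:
        out = []
    # Nothing in this subtree ends after the query starts: skip it entirely.
    if node is None or node.max_end <= start:
        return out
    overlaps(node.left, start, end, out)
    s, e = node.interval
    if s < end and start < e:
        out.append(node.interval)
    # Everything to the right starts at or after s, so only descend
    # if s is still before the end of the query.
    if s < end:
        overlaps(node.right, start, end, out)
    return out


def intersect(queries: List[Interval],
              features: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return (query, feature) pairs that overlap."""
    tree = build(sorted(features))
    return [(q, f) for q in queries for f in overlaps(tree, q[0], q[1])]


if __name__ == "__main__":
    features = [(100, 200), (150, 300), (400, 500), (450, 460)]
    for pair in intersect([(180, 420), (200, 400)], features):
        print(pair)
```

Building the tree costs a sort, and each query then touches roughly log(n) nodes plus the number of hits, instead of scanning every feature. That difference is the kind of thing that can turn hours into minutes when both piles of intervals are large, though the actual speedup depends on the data.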

kzuberi | 3 years ago | on: Open Source Tools for Computational Biology

Are any end users willing to pay for comp-bio software tools? Or for professional support of open source tools? I understand academic labs' preference for free/open-source software, but there are lots of biotech companies out there as well.

Seems like there is some funded software in this space, and lots of academic research code of varying quality - sometimes very useful and I've certainly appreciated it. But also common are many shortcomings: usability, performance, integration with other tools, packaging & distribution to users, docs & training material, abandoned tools, etc.

Maybe the use cases are too diverse, with the common needs having evolved good open source solutions, leaving a constant uneven frothing of other bits of software being born and then declining for all the other specialist needs. Or something. Still, I wonder if it could be better.

kzuberi | 4 years ago | on: The human genome is, at long last, complete

The term junk DNA triggers a lot of confused discussion (on HN and everywhere else), and I suspect a part of that is our getting defensive about the idea of our DNA containing "junk". That term is just more loaded than saying something more benign like "non-functional".

But another part is that the term is poorly defined: this article seems to use junk DNA to mean the until-recently unsequenced portions of our genome (and I think that's an unconventional usage), some comments here take it to mean non-protein coding, and another common use is for the term to mean non-functional.

If it helps, a defensible recent accounting is probably something like 1% of our genome being protein coding, perhaps 10% being functional in some way but not protein coding (e.g. regulatory, or transcribed to RNA that is functional etc), and the remaining 90% being without known function and likely non-functional.

After further years and much great painstaking work we'll perhaps learn that a bit more is functional, though it may end up being say 11% vs 89% non-functional. And that's ok! I wouldn't worry about progress being stunted by assumptions of too much of the genome being non-functional; rather the opposite, continuing to believe there is function where there is little evidence to warrant it.

disclaimer: not a geneticist, but sometimes write tools they might use.

kzuberi | 4 years ago | on: Ask HN: Who wants to collaborate?

The choice of Go for this is interesting. Having worked with Python & BioPython for bioinformatics problems I've found that there was a good deal of complexity around eking out performance (Cython, C++ extensions, Numba and so on) and also around distributing the tools (e.g. conda packaging). I've been wondering if Go would provide a reasonable middle ground in performance and ease of use between Python and C++ here. I'm not actually convinced yet but think it's worth exploring. Noticed there's a BioGo project that's been around for a while, not sure of its uptake. Probably figuring out how well Go works for this domain will be a hobby project for me this year.

kzuberi | 4 years ago | on: RNA Takes Over

While new discoveries of biological function are fascinating, I don't think it follows that we are overturning the idea of junk DNA (however the phrase was coined). More likely we are converging towards an understanding where most of the genome really is non-functional. Here's a recent accounting [1] of known function suggesting 90% is junk. I wonder how much this differs from our understanding of decades ago; it really may not have changed much, at least in terms of broad accounting. Now the potential significance of those little fractional bits of recently uncovered function, well that's a different discussion and more aligned with the original article, which I don't think mentioned junk anyway.

[1] https://sandwalk.blogspot.com/2021/11/whats-in-your-genome-2...
