Introduction to Genomics for Engineers

310 points| froggychairs | 3 years ago |learngenomics.dev | reply

81 comments

[+] glofish|3 years ago|reply
Those looking for a proper and comprehensive introduction to genomics from a programmer's perspective should try the Biostar Handbook:

https://www.biostarhandbook.com/

I have learned so much from it.

It is an introduction to what it is like to do genomics in a scientific environment. The content at the link the OP posted appears to be an oversimplified, high-level and naive overview.

[+] Ultimatt|3 years ago|reply
The opening paragraph of this resource states that it's absolutely not meant to be a comprehensive introduction to genomics. I strongly disagree with the sentiment that it's naive or oversimplified. It's trying to give someone with no knowledge a working mental model from which to start building a comprehensive view. A framework of analogy is an extremely helpful learning device for many people, and one frequently left out of comprehensive scientific or engineering texts.
[+] nosianu|3 years ago|reply
Actually, I would throw this into the ring instead:

https://www.edx.org/course/introduction-to-biology-the-secre...

by Professor Eric Lander

> Introduction to Biology - The Secret of Life

> Explore the secret of life through the basics of biochemistry, genetics, molecular biology, recombinant DNA, genomics and rational medicine.

It's really well done and genomics is the focus. I took many dozens of edX and Coursera courses over the years, and I would say this one is in the top 5% of the courses there.

I don't understand the phrase "from a programmer's perspective", or "for Engineers" in the title on top.

I'm a programmer who studied CS but also took numerous life science courses throughout my life. If you want to learn biology, you study biology; what does a "programmer's view", or an engineer's, have to do with it? You use the correct tool for the job, and having a background in both, I don't see this working out well, more like the opposite actually.

The point of looking at biology for an engineer or programmer should be to broaden one's horizons, not to take one's internal models, built for a completely different field, and use them in another one that really is not like that at all. IMO it's best to forget all computer metaphors here.

----

By the way, since there was something about this yesterday, there also is this course: https://www.edx.org/course/principles-of-biochemistry - it too is very good. A good knowledge of organic chemistry is a prerequisite, but there are plenty of equally interesting course resources for that available too, including even Khan Academy (https://www.khanacademy.org/science/organic-chemistry), or to give a(nother) random link, https://ocw.mit.edu/courses/5-12-organic-chemistry-i-spring-...

Biology becomes a lot more fun with this foundation already established in one's head.

[+] gravelc|3 years ago|reply
This is indeed a far better resource.
[+] dddiaz1|3 years ago|reply
I have absolutely loved working in genomics. I am a huge believer that genomics will be a huge part of healthcare in the future, and I have two examples to motivate that point that I think may be interesting to the reader.

1) The Moderna vaccine was made with the help of Illumina genome sequencing. They were able to sequence the virus and send that sequence of nucleotides over to Moderna for them to develop the vaccine, turning a classical biology problem into a software problem and reducing the need for them to bring the virus in house.

2) Illumina has a cancer screening test called Galleri that can identify a bunch of cancers from a blood test. It identifies mutated DNA released by cancer cells. This is huge: if we can identify cancer before someone even starts to show symptoms, the chances of having a useful treatment go up dramatically.

Disclaimer: I work for Illumina, views my own.

I wrote some more about why genomics is cool from a technical point of view here (truly big data, hardware accelerated bioinformatics) : https://dddiaz.com/post/genomics-is-cool/

[+] mtlmtlmtlmtl|3 years ago|reply
The thing I'm most excited about long term is biocomputing.

Having Turing complete programmatic control over biological systems has an absolutely endless list of transformative applications.

Imagine being able to program bacteria that can "infect" the patient and attack tumor cells, or act as fodder to keep autoimmune disease in check.

Or let's say we could program stem cells into "liver repair mode" to go and differentiate into new liver cells.

Then there are the implications for things like drug synthesis: with the ability to programmatically control enzyme levels, you could compile more or less arbitrary biosynthetic pathways into fast-growing photosynthetic algae, turning CO2, water and sunlight into medicine.

It's still a long way off being at that level of applicability, but man oh man it's gonna change everything.

[+] pinkwinds|3 years ago|reply
Purposefully blocked for certain countries?

"The Amazon CloudFront distribution is configured to block access from your country."

[+] agumonkey|3 years ago|reply
what kind of math/cs/algorithmic skills do you think one should work on to get a job in this kind of company ?
[+] crispycas12|3 years ago|reply
TBH I'm surprised how hard Illumina is already pushing Galleri as a product. Current ctDNA/cfDNA tests are imperfect even for advanced cancers, which should have a lot of shedding to begin with. Additionally, CHIP is an outstanding issue. DNA methylation sequencing has promise, but I feel more data would be needed to truly make diagnostic findings. So to see Illumina market it as a ready-to-go product is quite worrying. It may burn a lot of people.
[+] civilized|3 years ago|reply
Really glad to see this, but it reminds me of the earlier HN post that said engineers don't go into genomics because it doesn't pay and requires a lot of investment in learning biology.
[+] firstplacelast|3 years ago|reply
https://news.ycombinator.com/item?id=33671264

^Most recent discussion I’ve seen.

I worked in genomics and left this year, because in 95% of cases you’re underpaid, often disregarded “IT help” assisting the wildly over-educated and underpaid people who drive the actual research.

[+] conradev|3 years ago|reply
If you want some personal motivation to get into genomics, you can get your whole genome sequenced for a few hundred bucks and play around with the raw files yourself. I used Dante Labs[1] and they are great. You can even ask them to delete your data and samples!

[1] – https://dantelabs.com/

[+] zosima|3 years ago|reply
Working with genomics technology is too far away from the money to become rich from. There are too many middlemen in-between technology and application.

But it's a fun subject, and as the technology develops, middle layers will disappear and then the money from expertise will become better.

The number of people who are both capable software developers and have a good understanding of cellular biology is quite small and will probably remain so for the foreseeable future.

[+] bsder|3 years ago|reply
The reason why San Diego has such a craft brew scene is that it has a lot of underpaid microbiologists.
[+] ramraj07|3 years ago|reply
There are a lot of starry-eyed individuals who are ready to “sacrifice” a stable, well-paid career to “make a difference” by working in fields like biology.

Then there are also engineers from XKCD 1831 https://xkcd.com/1831

[+] wheresmycraisin|3 years ago|reply
You basically end up with the salary of a helpdesk person at a university.
[+] faizshah|3 years ago|reply
One of my favorite books in this space is “Bioinformatics Data Skills.” It’s just nice, concise coverage of a lot of basic tech skills like git, bash, tmux etc., followed by coverage of basic bioinformatics skills.

For me, coming from a SWE background, the computational skills are very easy to pick up, especially if you work with bioinformaticians you can ask questions. It’s the genomics knowledge that is very difficult for an engineer to acquire.

[+] ramraj07|3 years ago|reply
Starts with “This Guide is written specifically by and for computer scientists and engineers”

And yeah, it shows: one contrived example after another, and honestly not a great description of anything.

If you want to truly understand genomics you have to understand how biology works. And honestly it’s great info for anyone even if you’re not getting into genomics or whatever.. why would you not want a working model of how life is put together? In that case I’d just recommend dusting off a biochem or cell bio text book and reading just the first 5-8 chapters. Typically they lay it out very simply from basic principles and the authors have far more experience and understanding and writing help than this weird tutorial course thing.

[+] ArchD|3 years ago|reply
Do you have an example of a contrived example and explanation of why it is contrived, for the non-biologist to see why it is contrived?

I once tried reading a few chapters of a bioinformatics book explaining DNA, RNA, protein creation, etc. The basic idea seems very simple but to my mind they explained it non-systematically with too many words. There seems to be an internal information structure in these RNA- and DNA- related processes that was not being concisely presented and it seemed that if the writers presented the material in terms of computer-science concepts, so much time could be saved.

[+] tonto|3 years ago|reply
I think perhaps this (learngenomics.dev) resource is a little too shallow on some levels, but has interesting depth in odd places. I think there is a need to get users up to speed with things like the SAM format, which is fundamental to 99% of "DNA sequencing" projects. It's an odd format in some ways because it's quite low level, so trying to get people to understand how the basics of biology interact with it is worthwhile. I made my own attempt in this sometimes-updated blog post https://cmdcolin.github.io/posts/2022-02-06-sv-sam
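To give a feel for how low level SAM is, here is a minimal sketch (not taken from the linked post or from learngenomics.dev) that parses one alignment line using only the standard library. The record `example_line` is made up for illustration, and the helper names `parse_sam_line`, `reference_span` and `is_reverse_strand` are hypothetical. It assumes a well-formed line with the 11 mandatory tab-separated fields from the SAM specification; real projects would normally use an existing library rather than hand-rolled parsing.

```python
import re
from typing import NamedTuple, Tuple


class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flags
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # how the read aligns to the reference
    seq: str     # read bases


CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations that advance along the reference (per the SAM spec).
REF_CONSUMING = set("MDN=X")


def parse_sam_line(line: str) -> SamRecord:
    """Parse the mandatory fields of one alignment line (optional tags ignored)."""
    f = line.rstrip("\n").split("\t")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5], f[9])


def reference_span(rec: SamRecord) -> Tuple[int, int]:
    """Return the 1-based inclusive (start, end) the read covers on the reference."""
    length = sum(int(n) for n, op in CIGAR_OP.findall(rec.cigar) if op in REF_CONSUMING)
    return rec.pos, rec.pos + length - 1


def is_reverse_strand(rec: SamRecord) -> bool:
    """Flag bit 0x10 means the read aligned to the reverse strand."""
    return bool(rec.flag & 0x10)


# Made-up record: 5 matches, a 2 bp deletion, 3 matches, a 1 bp insertion, 2 matches.
example_line = "read1\t16\tchr1\t100\t60\t5M2D3M1I2M\t*\t0\t0\tACGTACGTACG\t*"
rec = parse_sam_line(example_line)
print(reference_span(rec), is_reverse_strand(rec))
```

The deletion consumes reference bases but no read bases, and the insertion does the opposite, which is why an 11-base read spans 12 reference positions here. That mismatch between read length and reference span is exactly where the biology (indels, structural variants) shows up in the file format.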
[+] User23|3 years ago|reply
H-bonds! It's totally h-bonds all the way down.

Amusingly that's literally like 80% true. Water is just a really big deal in biochem.

[+] ALittleLight|3 years ago|reply
I didn't get this from skimming the first page, but what will this let me do? If I take this course, will I be able to mess with a cell, or will I just learn some stuff about biology?

I saw a recent Lex Fridman podcast where the guest talks about "bioelectric patterns" and somehow getting a worm to grow a second head by messing with those patterns. I would absolutely start on this course now if it was a realistic pathway to doing something like that.

[+] pgayed|3 years ago|reply
This is the worst outcome of regulation of the life sciences.

There is no REPL for the cell. No tinkering allowed.

When Marvin Minsky was growing up in New York, neighborhood pharmacists owned fluoroscopes. He said those fluoroscopes were like “great black boxes” to him and that “those kinds of black boxes don't exist for kids anymore.”

[+] guy4242|3 years ago|reply
It's difficult to get into this field if you don't have a graduate degree. I was a double major, Computer Science and Biochemistry, with a minor in Biotechnology. I sent my resume to many biotech and pharma companies, but could not even get an interview. A lot of the jobs said you need 0 years of experience if you have a PhD, but 10 years of experience if you have a Bachelor's. Now that I have 10 years of experience as a developer, I've forgotten almost everything I learned in my science education, and I've lost interest.
[+] gravelc|3 years ago|reply
Don't want to be too disparaging, but this to me doesn't seem to be an 'Introduction to Genomics', but more an introduction to read mapping and variant detection in human (or more broadly diploid) genomes.

Genomics stretches vastly beyond this - assembly and annotation to start with.

I'd argue the most interesting problem space for software engineers is outside of what is covered in the document.

[+] Ultimatt|3 years ago|reply
The number of startups cashing in on genomics by making shiny web apps, where software engineers need to understand something about human diploid genetic variation, is far higher though. That's where the money and engineers are, not in fundamental algo development for slime mould assembly.
[+] rainmaker124|3 years ago|reply
CS person with biology PhD here. The mix of biology and computation is huge, and with the right skill set, interdisciplinary unicorns make tons of money. If you want to see how computation and biology mix, first dive into a standard university Intro Biology course, and then with that foundation, look into computational biology & bioinformatics (they're distinct). You'll find that genomics is only one piece of a much bigger and absolutely fascinating story.

To get that basic biology foundation, another post mentioned an edX Intro Biology course, which would be a terrific start, or just get a recent university-level intro biology textbook. It's not terribly difficult material and you'll be in far better shape than after reading a biology-for-laypersons pamphlet.

[+] yuppiepuppie|3 years ago|reply
Genomics is where I started learning how to program. Working as a bench scientist in a genetics lab, I understood nothing about my lab mates' research when they were showing me Python scripts of their analysis, which initially got me curious. Now, having been in the industry developing APIs for large companies for the past 8 years, I’d be keen to get back into it. Any ideas where to start or find jobs in the space?
[+] chairhairair|3 years ago|reply
I have a similar story with chemistry. I’d also like to get back into the sciences, but I’m not sure how relevant programmers are.
[+] lordofgibbons|3 years ago|reply
I find the field extremely interesting, but I wish the pay in genomics was better. Compared to FAANG/unicorn-type companies, their pay is way below market and it's really hard to justify the massive pay cut.
[+] f6v|3 years ago|reply
> their pay is way below market

The pay is exactly where the market is. There are a ton of wet-lab people wanting to get into “data”. And the industry is less lucrative than showing ads like Google does.

[+] qualudeheart|3 years ago|reply
Does this touch on recent developments in information biology?
[+] penciltwirler|3 years ago|reply
Nice, but I feel like really the only thing you need to know as an eng is DNA -> RNA -> Protein. Sometimes RNA -> DNA via reverse transcriptase. Everything else is just normal Python scripting.
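For readers who have not seen it written down, the DNA -> RNA -> Protein flow can be sketched in a few lines of Python. This is a deliberately simplified illustration, not anyone's production method: it assumes the input is the coding strand (so transcription is just T -> U), uses the standard genetic code, starts at the first AUG, and ignores splicing, regulation and everything else a real cell does. The function names `transcribe` and `translate` and the sequence `dna` are made up for the example.

```python
from itertools import product

# Standard genetic code, laid out in the conventional U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """DNA -> RNA, assuming the input is the coding (sense) strand."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """RNA -> protein: read codons from the first AUG until a stop codon ('*')."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError on non-ACGU input
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCCTGGTAA"  # made-up 12-base sequence
mrna = transcribe(dna)
print(mrna, translate(mrna))
```

The toy input yields the three-residue peptide M-A-W (Met, Ala, Trp) before hitting the UAA stop codon.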
[+] janeway|3 years ago|reply
Oh no. This is a major flaw that kills projects: to run a valid statistical test you need to understand the underlying reality of the data. Otherwise you just run tests until you find “something”.

How do you handle one genomic variant affecting dozens of different RNA transcripts and isoforms? How do you handle tissue-specific expression? LD haplotype blocks? Frequency across populations and reference choice? Sample handling affecting read depth? Mixed direction of effects in phenotype-genotype? The critical (and, IMO, beautiful) feature of bioinfo is that it requires an understanding of how your dataset can rarely be considered clean and as simple as _observation name_ and _observation value_. To succeed it is usually critical to know a lot about the observation metadata which is not collected in the dataset. Hopefully in the future it will be better curated and less esoteric.

[+] greazy|3 years ago|reply
...no. There is more to genomics than Python scripting. This is a wildly incorrect assumption.

A new generation of bioinformaticians and computational biologists are using Rust, Go, and the web to create, share and deliver.

Check out nextclade.org

[+] epgui|3 years ago|reply
I’m a biochemist + software engineer, and while I understand where you’re coming from, IMO that’s a very harmful/self-sabotaging attitude.

As soon as you start touching science, everything is important.

[+] joshuahedlund|3 years ago|reply
That’s what I thought too until I learned about

- the DNA that doesn’t code for proteins but makes up the vast majority of human DNA

- the intron regions of genes that are transcribed into RNA but then spliced out of the RNA and not translated into protein, and are 5x larger than the coding parts

Those two things alone are absolutely critical to understand to interpret a genome sequence. Of course there is much more.
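As a toy illustration of the intron point, here is a minimal Python sketch of splicing. It assumes the exon coordinates are already known and given as 0-based half-open intervals, sorted and non-overlapping; the sequence, the coordinates and the function names `splice` and `intron_fraction` are all made up for the example. Real genes have many exons, alternative isoforms and annotation files to deal with.

```python
from typing import List, Tuple


def splice(pre_mrna: str, exons: List[Tuple[int, int]]) -> str:
    """Keep only the exon intervals and join them, dropping the introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def intron_fraction(pre_mrna: str, exons: List[Tuple[int, int]]) -> float:
    """Fraction of the transcript that is removed by splicing."""
    exonic = sum(end - start for start, end in exons)
    return 1 - exonic / len(pre_mrna)


# Made-up transcript: exon 1, then a 12-base intron (GU...AG), then exon 2.
pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "UGGUAA"
exons = [(0, 6), (18, 24)]

mature = splice(pre_mrna, exons)
print(mature, intron_fraction(pre_mrna, exons))
```

In this toy case half of the transcript is intron. The practical consequence is that a position in the mature mRNA does not map to the genome by simple offset; you need the exon coordinates to translate between the two.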

[+] aquafox|3 years ago|reply
You do know that there are things like epigenetics, DNA repair (using specialized proteins), RNAi, post-translational modifications, metabolites (just to name a few)?
[+] otherme123|3 years ago|reply
Sooner or later you'll have to learn all the other stuff in the linked page: file formats used only in genomics, structural variants, NGS, evolution, regulation, polygenics, etc.
[+] gravelc|3 years ago|reply
Who knew complex large polyploid genome assembly (i.e. sugar cane) was just a matter of python scripting?
[+] Exendroinient00|3 years ago|reply
Surely sellouts working on ads won't interject in the comment section.
[+] zach_garwood|3 years ago|reply
Wow, those are some sour grapes you got there!