
How Perl Saved the Human Genome Project (1996)

80 points | bsima | 13 years ago | bioperl.org

63 comments

codex | 13 years ago
Perl and the human genome are almost perfectly matched; both are almost incomprehensible, with no central design, accreted haphazardly over a long time.
kbenson | 13 years ago
> with no central design

Any supporting evidence for that? If you look into it, you may be surprised.

Here's a hint: Perl doesn't necessarily optimize for the same things other languages optimize for.

kamaal | 13 years ago
Perl has no central design? Seriously?

The very reason it has survived nearly three decades is that it solves a great variety of problems with a very centralized design philosophy, one that no other tool has addressed to this day.

bane | 13 years ago
Honest question from somebody who doesn't know any better: the HGP data has been available for quite some time now, but to me (as a layperson) it doesn't appear to have had the promised impact, such as suddenly providing a genetic map that lets us quickly find and target genetic diseases and other undesirable traits. Could anybody knowledgeable in the field provide some insight into what kind of impact the HGP data has had?
epistasis | 13 years ago
In terms of science research, the impact has been unparalleled. It's hard to find a molecular biology paper that doesn't owe a ton to having the full human genome sequence. There have also been technology side effects. Just like defense projects fueled the early market for silicon-based semiconductors, the HGP kick-started a market for sequencing machines that is generating a huge revolution in sequencing technology where costs are now falling 5x-10x per year, which is quite a bit faster than Moore's law.
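To put the "5x-10x per year" figure next to Moore's law, here is a small Python sketch that compounds each rate over five years. It assumes Moore's law means a doubling roughly every two years (about 1.41x per year); the 18-month variant would give a somewhat larger number. The five-year horizon and the `cumulative_factor` helper are illustrative choices, not figures from the thread.

```python
# Compare cumulative cost reductions under assumed annual improvement rates.
# Assumption: Moore's law is modeled as a doubling every two years,
# i.e. a factor of 2 ** (1 / 2) (about 1.41x) per year.
MOORE_FACTOR_PER_YEAR = 2 ** 0.5


def cumulative_factor(factor_per_year: float, years: int) -> float:
    """Total improvement after `years` of compounding at `factor_per_year`."""
    return factor_per_year ** years


if __name__ == "__main__":
    years = 5
    for label, rate in [("Moore's law (assumed)", MOORE_FACTOR_PER_YEAR),
                        ("sequencing, 5x/year", 5.0),
                        ("sequencing, 10x/year", 10.0)]:
        print(f"{label:>24}: {cumulative_factor(rate, years):>10,.0f}x "
              f"cheaper after {years} years")
```

Under those assumptions the gap over five years is roughly 5.7x for Moore's law versus 3,125x to 100,000x for sequencing, which is what "quite a bit faster" amounts to once the rates compound.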

In terms of finding genetic causes of disease, this happens every day and is practically mundane, but there are two complications in getting to cures. First, most disease is far, far more complicated than a single gene; any single gene may account for just a percent or two of what we call the same disease. Second, knowing which gene is broken does not provide a cure for that disease; even for a given small molecule, determining whether it will interact with a gene's protein or have any effect on that protein's structure or function is a task that physics has not been able to tackle. Additionally, the genome has only been available for a mere decade, and for many if not most diseases, going from a known gene target to an approved drug is going to take far longer than 10 years.

So the HGP has fueled a huge amount of discovery, is the foundation of nearly all human biology research, and is completely indispensable. It has not yet delivered new cures for various diseases, but really it shouldn't have to.

micro_cam | 13 years ago
And set us back years as well.

Too much perl code is essentially write once and forget. It gets results quickly, but it is a disaster for repeatability, which is an essential part of science. I've worked on bioinformatics perl projects where bugs canceled each other out (i.e., code that was supposed to clear an array and repopulate it with corrected values did neither, so the original values were returned). And I've spent far too many hours trying to figure out what a perl script that serves as the reference implementation for a certain procedure actually does.

There certainly still are scientists who use it, but Python and R are gaining ground for good reason.

Wiring together analysis pipelines with pipes as they describe is, however, an excellent technique regardless of language.
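The pipe-wiring technique mentioned above works in any language; below is a minimal Python sketch using only the standard library. Each stage is a separate process whose stdout feeds the next stage's stdin, as in a shell `a | b | c`. The three stage programs (a record producer, a GC-content filter, and a formatter) and the `run_pipeline` helper are hypothetical stand-ins; in a real pipeline each stage would usually be its own script or tool, written in Perl or anything else.

```python
import subprocess
import sys

# Hypothetical stage 1: emit (id, sequence) records as tab-separated lines.
PRODUCER = r'''
records = [("seq1", "ATGCGC"), ("seq2", "ATATAT"), ("seq3", "GGGCCA")]
for name, seq in records:
    print(f"{name}\t{seq}")
'''

# Hypothetical stage 2: pass through only records with GC fraction >= 0.5.
GC_FILTER = r'''
import sys
for line in sys.stdin:
    name, seq = line.rstrip("\n").split("\t")
    gc = sum(base in "GC" for base in seq) / len(seq)
    if gc >= 0.5:
        print(f"{name}\t{gc:.2f}")
'''

# Hypothetical stage 3: reformat each record for display.
FORMATTER = r'''
import sys
for line in sys.stdin:
    name, gc = line.rstrip("\n").split("\t")
    print(f"{name.upper()} {gc}")
'''


def run_pipeline(stage_sources):
    """Run each stage as its own process, wiring stdout of one stage to
    stdin of the next, like `stage1 | stage2 | stage3` in a shell."""
    procs = []
    upstream = None
    for src in stage_sources:
        proc = subprocess.Popen(
            [sys.executable, "-c", src],
            stdin=upstream,
            stdout=subprocess.PIPE,
            text=True,
        )
        if upstream is not None:
            # The child now holds its own copy of this pipe; closing ours
            # lets the upstream stage get SIGPIPE if this one exits early.
            upstream.close()
        upstream = proc.stdout
        procs.append(proc)

    output = upstream.read()  # collect the final stage's output
    upstream.close()
    for proc in procs:
        proc.wait()
    if any(proc.returncode != 0 for proc in procs):
        raise RuntimeError("a pipeline stage failed")
    return output


if __name__ == "__main__":
    print(run_pipeline([PRODUCER, GC_FILTER, FORMATTER]), end="")
```

Because the stages talk only through text streams, any one of them can be rewritten, or swapped for a tool in another language, without touching the others. Closing the parent's copy of each intermediate pipe follows the pattern the `subprocess` documentation gives for replacing shell pipelines.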

Mithaldu | 13 years ago
> Too much perl code is essentially write once and forget.

Please stop repeating this misconception. People who put little effort into learning how to program write "write-once" code in any language. Perl had the "misfortune" of being the only dynamic language on the block for a long time, leading to many people reaching for it to get things done without bothering to actually learn the language, thus creating a vast corpus of low quality code.

(It does not help that the definitive Perl resource for bioinformatics people, which I've seen in libraries like those of the Genome Campus in Cambridge, isn't worthy of being used as toilet paper, yet influenced a whole generation of scientists.)

> I've spent far too many hours trying to figure out what a perl script does

How often do you reach for perltidy when you do this?

Moto7451 | 13 years ago
I've had to deal with my fair share of C# that was written in a "write once" manner. My favorite was when a particular dev wrote an entire Silverlight application while ignoring how the framework functioned. The best bit of obfuscation was sections of code littered with constructs like "((DesiredType)parent.parent.parent.parent.parent).method", where each parent was an Object reference.

Any language can be used to write a riddle. Even Python[1][2]. It's the community's values that determine code quality. The banner of "impossible to read" hangs over the Perl community and it has been my experience that it reminds us to write much better code.

[1] http://www.python.org/search/hypermail/python-1993/0232.html
[2] http://p-nand-q.com/python/obfuscated_python.html

nijk | 13 years ago
It is insane to claim that R is a language of repeatable science.

Python is barely better.

None of these languages comes anywhere near a reliable, verifiable formal model of what they claim to compute.

sciurus | 13 years ago
17 years later, Perl still seems to be the go-to tool in bioinformatics.
epistasis | 13 years ago
In my personal experience, I see no new Perl scripts these days. Python has completely replaced Perl for new code.
bsima | 13 years ago
I'm learning Perl now, specifically for bioinformatics. I appreciated this article.
mrmagooey | 13 years ago
The PUG I'm a member of had a very interesting presentation on PyCogent (http://pycogent.org/), which is meant to be a Python-based successor to BioPerl. IANA bioinformatics researcher, so I have no idea of the actual relative strengths of each, but the PyCogent guys appear to have put in the hard yards (~8 years of development and still going).
manish_gill | 13 years ago
Is there any point in new programmers learning Perl when they have the choice of Ruby (which was supposedly inspired by Perl)?
timr | 13 years ago
Perl is very different than Ruby. It's also very different than Python. It's also faster and more concise than both.

You should learn Perl because you're likely to encounter a lot of Perl code in the wild, and because you can learn something from it. Knowing how to generate a Perl one-liner that does something incredible will take your CLI skills to a new level in a way that knowing Ruby will not.

Perl is still one of my go-to languages for sysadmin scripts, because it's so concise and powerful. It's a long-beard language.

abraininavat | 13 years ago
Every article about Perl leads to the same pattern of comments. Most people think Perl is horrible and lends itself to incomprehensible code. And Perl people have their backs against a wall, furiously defending their language with prevarications like this:

> Perl had the "misfortune" of being the only dynamic language on the block for a long time, leading to many people reaching for it to get things done without bothering to actually learn the language, thus creating a vast corpus of low quality code.

Sure, couldn't have anything to do with the language. The whole rest of the world just doesn't get it.

kbenson | 13 years ago
I think it's more a case of people thinking they understand Perl, or assuming they can apply their existing C/PHP/shell programming experience to Perl without problems, when that is not always the case. The fact is, Perl is fundamentally different, but looks just similar enough to fool people.

If it looked like Lisp, people would be less likely to think that it's just a matter of applying their C experience, but alas, it generally looks pretty familiar, if a bit messy, to users of other imperative languages.

If you are trying to understand, write, or change a Perl script, and you don't know what context a statement takes place in, or what I mean by context in this case, then you don't know what you are doing. (I mean "you" in the general sense, not as an indictment of the parent.)

Mithaldu | 13 years ago
> The whole rest of the world just doesn't get it.

Never claimed that. If you read the quote you chose, you will find that I claimed many did not even bother to try to get it, because they got things done and had no reason to.