
A Python-based programming language for high-performance computational genomics

22 points | wh1teknight | 4 years ago | nature.com

17 comments

shepardrtc | 4 years ago
xvilka | 4 years ago
For high performance there is the BioJulia[1][2] ecosystem. They did a comparison with Seq specifically[3]. If you are willing to help improve the state of biology and genomics in the Julia language, they accept donations[4] as well.

[1] https://biojulia.net/

[2] https://github.com/BioJulia

[3] https://biojulia.net/post/seq-lang/

[4] https://opencollective.com/biojulia

sundarurfriend | 4 years ago
tl;dr of the comparison post: BioSequences (from BioJulia) used to be slower than Seq, with most of that time spent on input validation (which Seq does not do). After this paper, BioSequences was performance-tuned, so that it's now on par with Seq in speed, while still retaining input validation and other benefits.

> With the updates, BioSequences [2.X] rivals Seq in speed while keeping its advantages of a lower memory footprint and doing data validation.
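
For anyone wondering what "data validation" and "lower memory footprint" mean concretely, here is a minimal Python sketch, assuming a plain A/C/G/T alphabet: every base is checked against the alphabet and packed at 2 bits per base (4 bases per byte) instead of one byte per character. The names `pack_dna` and `unpack_dna` are hypothetical; this is not how BioSequences or Seq actually implement it.

```python
# Hypothetical sketch: validate a DNA string and pack it at 2 bits per base.
# Not the actual BioSequences or Seq implementation.
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_BASES = "ACGT"


def pack_dna(seq: str) -> tuple[bytearray, int]:
    """Validate `seq` and pack it 4 bases per byte; returns (buffer, length)."""
    seq = seq.upper()
    buf = bytearray((len(seq) + 3) // 4)  # one byte holds four 2-bit codes
    for i, base in enumerate(seq):
        code = _CODE.get(base)
        if code is None:  # validation: reject anything outside A/C/G/T
            raise ValueError(f"invalid base {base!r} at position {i}")
        buf[i // 4] |= code << (2 * (i % 4))
    return buf, len(seq)


def unpack_dna(buf: bytearray, length: int) -> str:
    """Decode the first `length` bases from a buffer produced by pack_dna."""
    return "".join(
        _BASES[(buf[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length)
    )


packed, n = pack_dna("ACGTAC")
print(len(packed), unpack_dna(packed, n))  # prints: 2 ACGTAC
```

The per-character check is the kind of work the comparison post says Seq skips; a real library would do it far more efficiently than a Python-level loop.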

prirun | 4 years ago
I messed around with Seq when it was posted here a while back. At the time, I was looking for a more performant language than Python for HashBackup (I'm the author), and was looking into D, Go, and Nim. I had a few microbenchmarks to get familiar with the syntax and check performance on things that were a problem in Python, like huge dicts of unique integers (each integer in Python is 24 bytes).

The HN post on Seq came up right as I was doing this, so I figured I'd check it too. It did fantastically on the dict microbenchmark, using something like 350 MB of RAM while Python used 1.8 GB, or something like that.
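
A rough way to reproduce this kind of measurement in CPython is the stdlib `tracemalloc` module. This is a sketch, assuming the benchmark was a dict mapping unique ints to ints; the function name `dict_of_ints_bytes` is hypothetical, and the exact bytes per entry depend on the Python version and platform, so no figures are claimed here.

```python
import sys
import tracemalloc


def dict_of_ints_bytes(n: int, start: int = 1_000_000) -> int:
    """Bytes still allocated (per tracemalloc) after building {k: k} for n unique ints."""
    # Start well above 256 so CPython's cached small ints don't skew the count.
    tracemalloc.start()
    try:
        # Keep a reference so the dict is still alive when we take the snapshot.
        d = {k: k for k in range(start, start + n)}
        current, _peak = tracemalloc.get_traced_memory()
        return current
    finally:
        tracemalloc.stop()


if __name__ == "__main__":
    n = 1_000_000
    total = dict_of_ints_bytes(n)
    print(f"one int object: {sys.getsizeof(10**6)} bytes")
    print(f"dict of {n:,} ints: {total / 2**20:.1f} MiB, {total / n:.1f} bytes/entry")
```

The total includes both the int objects and the dict's own hash table, which is why per-entry cost ends up well above the size of a single int.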

I have no use for any of the genome features, and when I talked with them, they had no use for crypto features. The things that were important to me were not a high priority on their roadmap, so I didn't pursue it.

BiteCode_dev | 4 years ago
If anybody has used it, what's the benefit of it being a DSL over, say, a regular library?
globular-toast | 4 years ago
The only thing I can see in the demo code that you can't do in regular Python is the sequence literal. So `s"ACGT"` vs `Seq("ACGT")` or something. Oh, and having the Python 2 `print` instead of Python 3 `print()`.

Is there something else?
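
For comparison, here is a hypothetical plain-Python library type covering the `Seq("ACGT")` spelling, with validation, reverse complement, and k-mer iteration. The class and method names are illustrative only, not Seq's or any existing library's API.

```python
from typing import Iterator


class Seq:
    """Hypothetical library-style DNA sequence type (not Seq-lang's real API)."""

    __slots__ = ("_data",)
    _COMPLEMENT = str.maketrans("ACGT", "TGCA")

    def __init__(self, data: str) -> None:
        data = data.upper()
        bad = set(data) - set("ACGT")
        if bad:  # validate once, on construction
            raise ValueError(f"non-ACGT characters: {sorted(bad)}")
        self._data = data

    def __len__(self) -> int:
        return len(self._data)

    def __str__(self) -> str:
        return self._data

    def __repr__(self) -> str:
        return f"Seq({self._data!r})"

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Seq) and self._data == other._data

    def __hash__(self) -> int:
        return hash(self._data)

    def reverse_complement(self) -> "Seq":
        # Complement each base, then reverse the string.
        return Seq(self._data.translate(self._COMPLEMENT)[::-1])

    def kmers(self, k: int) -> Iterator["Seq"]:
        """Yield every overlapping substring of length k."""
        for i in range(len(self._data) - k + 1):
            yield Seq(self._data[i : i + k])


print(Seq("aacg").reverse_complement())  # CGTT
print([str(km) for km in Seq("ACGTA").kmers(3)])  # ['ACG', 'CGT', 'GTA']
```

The trade-off: this runs in any Python, but every method call goes through the interpreter, which is where a compiled DSL can claim its speedups.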

tenaciousDaniel | 4 years ago
The marketing page claims to offer "up to a 160x" perf improvement over Python. No idea how, though.
tenaciousDaniel | 4 years ago
Noob question. What does it mean to say that a language is "python-based"? Python is itself a language. Does it mean the parser/compiler is written in Python?
BiteCode_dev | 4 years ago
> "Seq enables users to write high-level, Pythonic code without having to worry about low-level or domain-specific optimizations, and allows for the seamless expression of the algorithms, idioms and patterns found in many genomics or bioinformatics applications. "

I gave it a look, and indeed it's not complete Python: it will break once you start using many language/stdlib features.

Stdlib modules I couldn't import:

- sqlite3

- urllib

- pathlib

- hashlib

- json

Infra unavailable:

- Debug mode

- pip/venv

- shell

Syntax/built in that didn't work:

- byte and complex literals

- type()

- Some unpacking (E.g: [*[0]])

- raise from

- async/await
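
If you want to check the list above yourself, a probe script along these lines exercises each item. It is a sketch: under CPython every check should come out True; it has not been run under Seq, and an unsupported import or syntax form there would likely fail before `probe()` even runs. The function name `probe` is hypothetical.

```python
import asyncio
import hashlib
import json
import pathlib
import sqlite3
import urllib.parse


def probe() -> dict:
    """Exercise each feature from the list; every value should be True on CPython."""
    results = {}
    results["bytes_literal"] = b"ACGT" == bytes([65, 67, 71, 84])
    results["complex_literal"] = (1 + 2j).imag == 2.0
    results["type_builtin"] = type(1) is int
    results["star_unpacking"] = [*[0]] == [0]

    # raise ... from ... should chain the original exception as __cause__
    try:
        try:
            raise KeyError("inner")
        except KeyError as exc:
            raise ValueError("outer") from exc
    except ValueError as err:
        results["raise_from"] = isinstance(err.__cause__, KeyError)

    async def answer() -> int:
        return 42

    results["async_await"] = asyncio.run(answer()) == 42

    # stdlib modules from the list
    results["json"] = json.loads('{"a": 1}') == {"a": 1}
    results["hashlib"] = len(hashlib.sha256(b"ACGT").hexdigest()) == 64
    results["pathlib"] = pathlib.PurePosixPath("a/b.fa").suffix == ".fa"
    results["urllib"] = urllib.parse.quote("a b") == "a%20b"
    conn = sqlite3.connect(":memory:")
    try:
        results["sqlite3"] = conn.execute("SELECT 1").fetchone() == (1,)
    finally:
        conn.close()
    return results


for name, ok in probe().items():
    print(f"{name:16} {'ok' if ok else 'FAIL'}")
```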

It also adds syntax that is not valid Python, such as `s""` and `|>`.
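
Whether a snippet is valid Python syntax can be checked with the stdlib `ast.parse`. A short sketch (the helper name `is_valid_python` is hypothetical) confirming that CPython's parser rejects `s"ACGT"` and `x |> f`, while the plain-call spellings parse fine, assuming `|>` passes its left operand as the argument to the right-hand function:

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if CPython's parser accepts `source`."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


snippets = {
    "seq literal": 's"ACGT"',
    "pipe operator": "x |> f",
    "library call": 'Seq("ACGT")',
    "plain call": "f(x)",
}
for label, src in snippets.items():
    print(f"{label:14} {src!r:14} valid Python: {is_valid_python(src)}")
```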

So despite what the README says ("the vast majority of Python programs should work without any modifications"), it's actually the opposite.

The project has real value, though; you just need to understand what you're buying here.

chomp | 4 years ago
In this instance, it means that it has a Python-compatible syntax.
MR4D | 4 years ago
I wonder how well paywalled programming languages do relative to more openly available languages.