A Python-based programming language for high-performance computational genomics

[+] shepardrtc|4 years ago|reply

Non-paywall link: https://www.biorxiv.org/content/10.1101/2020.10.29.361402v1....

Link to the language itself: https://seq-lang.org/

[+] asicsp|4 years ago|reply

HN discussions:

* https://news.ycombinator.com/item?id=28537179 (5 days ago, 58 comments)

* https://news.ycombinator.com/item?id=22107510 (Jan 21, 2020, 68 comments)

[+] m-watson|4 years ago|reply

Ah you got this in as I was also typing it, well done! Deleting mine now.

Just to make my comment less useless here is the github: https://github.com/seq-lang/seq

[+] xvilka|4 years ago|reply

For high-performance work there is the BioJulia[1][2] ecosystem. They did a comparison with Seq specifically[3]. If you are willing to help improve the state of biology and genomics in the Julia language, they accept donations[4] as well.

[1] https://biojulia.net/

[2] https://github.com/BioJulia

[3] https://biojulia.net/post/seq-lang/

[4] https://opencollective.com/biojulia

[+] sundarurfriend|4 years ago|reply

tl;dr of the comparison post: BioSequences (from BioJulia) used to be slower than Seq, with most of that time spent on input validation (which Seq does not do). After this paper, BioSequences was performance-tuned, so that it's now on par with Seq in speed, while still retaining input validation and other benefits.

> With the updates, BioSequences [2.X] rivals Seq in speed while keeping its advantages of a lower memory footprint and doing data validation.

[+] prirun|4 years ago|reply

I messed around with Seq when it was posted here a while back. At the time, I was looking for a more performant language than Python for HashBackup (author), and was looking into D, Go, and Nim. I had a few microbenchmarks to get me a little familiar with the syntax and check performance on things that were a problem in Python, like huge dicts of unique integers (each integer in Python is 24 bytes).

The HN post on Seq came up right as I was doing this so I figured I'd check it too. It did really fantastic on the dict microbenchmark, using something like 350MB of RAM while Python used 1.8 GB, or something like that.
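For anyone who wants to reproduce that kind of measurement on the CPython side, here is a minimal sketch of a dict-of-unique-integers microbenchmark. The function name and the entry count are my own choices for illustration, not the original benchmark, and the exact per-object sizes depend on interpreter version and platform.

```python
import sys
import tracemalloc


def dict_memory_bytes(n):
    """Return the bytes allocated while building a dict of n unique int keys.

    Hypothetical helper for illustration; not the original microbenchmark.
    """
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    d = {i: i for i in range(n)}  # unique integer keys and values
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    assert len(d) == n  # keep d alive until after the measurement
    return after - before


if __name__ == "__main__":
    n = 200_000  # assumed size; scale up to approach the numbers quoted above
    used = dict_memory_bytes(n)
    print(f"one int object: {sys.getsizeof(2**20)} bytes")
    print(f"dict of {n:,} ints: {used / 1e6:.1f} MB, {used / n:.1f} bytes per entry")
```

tracemalloc only counts allocations made through Python's allocator, so the figure will not match resident memory as reported by the OS, but it is enough to compare runs against each other.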

I have no use for any of the genome features, and when I talked with them, they had no use for crypto features. The things that were important to me were not a high priority on their roadmap, so I didn't pursue it.

[+] BiteCode_dev|4 years ago|reply

If anybody has used it, what's the benefit of it being a DSL over, say, a regular library?

[+] globular-toast|4 years ago|reply

The only thing I can see in the demo code that you can't do in regular Python is the sequence literal. So `s"ACGT"` vs `Seq("ACGT")` or something. Oh, and having the Python 2 `print` instead of Python 3 `print()`.

Is there something else?
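To make the library-style alternative concrete, here is a rough sketch of what a `Seq("ACGT")` wrapper could look like in plain Python. The class, its methods, and the use of `~` for reverse complement are all hypothetical choices for illustration, not the API of Seq or of any existing library.

```python
class Seq:
    """Hypothetical DNA sequence wrapper; not the API of Seq or any real library."""

    _ALPHABET = frozenset("ACGTN")
    _COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

    def __init__(self, bases):
        bases = bases.upper()
        bad = set(bases) - self._ALPHABET
        if bad:
            # Validation happens at construction time, on every call.
            raise ValueError(f"invalid bases: {sorted(bad)}")
        self.bases = bases

    def __invert__(self):
        # ~seq returns the reverse complement (operator choice is illustrative).
        return Seq(self.bases.translate(self._COMPLEMENT)[::-1])

    def kmers(self, k):
        # Yield every length-k window, left to right.
        for i in range(len(self.bases) - k + 1):
            yield Seq(self.bases[i:i + k])

    def __len__(self):
        return len(self.bases)

    def __repr__(self):
        return f'Seq("{self.bases}")'


if __name__ == "__main__":
    s = Seq("AACGT")
    print(s, ~s, [k.bases for k in s.kmers(3)])
```

Everything here runs through ordinary Python objects and method calls, so a library like this gets the ergonomics but not, by itself, compiled-code speed.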

[+] tenaciousDaniel|4 years ago|reply

The marketing page claims to offer "up to a 160x" perf improvement over Python. No idea how though.

[+] koeng|4 years ago|reply

Check out Poly [1] for a great Golang library specifically built for forward engineering and synthetic biology:

[1] https://github.com/TimothyStiles/poly/issues

[+] tenaciousDaniel|4 years ago|reply

Noob question. What does it mean to say that a language is "python-based"? Python is itself a language. Does it mean the parser/compiler is written in Python?

[+] shoulderchipper|4 years ago|reply

> Seq is a Python-compatible language, and the vast majority of Python programs should work without any modifications

https://github.com/seq-lang/seq

[+] BiteCode_dev|4 years ago|reply

> "Seq enables users to write high-level, Pythonic code without having to worry about low-level or domain-specific optimizations, and allows for the seamless expression of the algorithms, idioms and patterns found in many genomics or bioinformatics applications. "

I gave it a look, and indeed it's not complete Python: it will break if you start using more of the language or stdlib features.

Stdlib modules I couldn't import:

- sqlite3

- urllib

- pathlib

- hashlib

- json
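A check like this is easy to script. The sketch below probes the same five modules and reports which ones import. It relies on `importlib`, which is standard in CPython; whether Seq itself provides it is an assumption I have not verified, so there you may need plain `import` statements instead. The `probe_imports` helper is a name made up for this example.

```python
import importlib

# The stdlib modules listed above.
MODULES = ["sqlite3", "urllib", "pathlib", "hashlib", "json"]


def probe_imports(names):
    """Try to import each module by name and record the outcome.

    Hypothetical helper for illustration.
    """
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = "ok"
        except ImportError as exc:
            status[name] = f"failed: {exc}"
    return status


if __name__ == "__main__":
    for name, result in probe_imports(MODULES).items():
        print(f"{name}: {result}")
```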

Infra unavailable:

- Debug mode

- pip/venv

- shell

Syntax/built-ins that didn't work:

- byte and complex literals

- type()

- Some unpacking (E.g: [*[0]])

- raise from

- async/await
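For reference, this is roughly what those features look like in standard Python 3, as a small script that should run unmodified under CPython. It is a sketch for comparison, not the exact test that was run against Seq; the function name and the values are made up for the example.

```python
import asyncio


def standard_python_features():
    """Exercise each feature from the list above and collect the results."""
    results = {}

    results["bytes"] = b"ACGT"        # bytes literal
    results["complex"] = 1 + 2j       # complex literal
    results["type"] = type(results["bytes"]).__name__  # type() builtin
    results["unpacking"] = [*[0]]     # unpacking inside a list display

    # "raise ... from ..." chains the new exception to the original one.
    try:
        try:
            int("not a number")
        except ValueError as err:
            raise RuntimeError("parse failed") from err
    except RuntimeError as wrapped:
        results["cause"] = type(wrapped.__cause__).__name__

    # async/await driven by the stdlib event loop.
    async def answer():
        await asyncio.sleep(0)
        return 42

    results["async"] = asyncio.run(answer())
    return results


if __name__ == "__main__":
    for name, value in standard_python_features().items():
        print(name, value)
```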

It also adds incompatible syntax that is not Python, such as 's""' and '|>'.

So despite the README saying "the vast majority of Python programs should work without any modifications", it's actually the opposite.

The project has real value though, you just need to understand what you are buying here.

[+] chomp|4 years ago|reply

In this instance, it means that it has a Python-compatible syntax.

[+] MR4D|4 years ago|reply

I wonder how well paywalled programming languages do relative to more openly available languages.

[+] syntonym2|4 years ago|reply

The programming language itself and all documentation can be found at https://seq-lang.org/ .

[+] Proven|4 years ago|reply

[deleted]

17 comments