east2west | 4 years ago

I recall that the group that created Spark had a bioinformatics project on Spark, but I don't know what happened to it. All I could find now is a paper[1] hosted by Databricks.

[1]https://databricks.com/wp-content/uploads/2018/08/SSE15-40-D...

heuermh|4 years ago

We're here, still plugging along.

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

https://github.com/bigdatagenomics/adam

dekhn|4 years ago

Yep, that's the one I was thinking of (along with gnomAD, which IIRC uses ADAM or some similar tech). My main complaint with ADAM was that they came up with their own file format (which had some flaws). But the general idea is the right one.

heuermh|4 years ago

I'm interested in chatting with you about this, and about genomics on Spark more generally. Feel free to reach out on GitHub or via my username at the usual suspects.