I recall that the group that created Spark had a bioinformatics project built on Spark, but I don't know what happened to it. All I could find now is a paper[1] hosted by Databricks.
Yep, that's the one I was thinking of (along with gnomAD, which IIRC uses ADAM or some similar tech). My main complaint with ADAM was that they came up with their own file format, which had some flaws. But the general idea is the right one.
I'm interested in chatting with you about this, and about genomics on Spark more generally; feel free to reach out on GitHub or via my username at the usual suspects.
heuermh|4 years ago
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
https://github.com/bigdatagenomics/adam
dekhn|4 years ago
heuermh|4 years ago