
Machine Learning Spots Treasure Trove of Elusive Viruses

33 points | portofcall | 8 years ago | nature.com

3 comments

maxander | 8 years ago
The article doesn't really say how the algorithm works beyond "they use machine learning," but the basic principle turns out to be pretty simple. In the work by Amgarten, they classify stretches of sequenced DNA by counting the k-mers (read: substrings of length k, with k ranging from 2 to 10) that occur in each; it simply turns out that k-mer frequencies differ enough between viral DNA and cellular DNA for this to work.
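To make the idea concrete, here is a minimal sketch of k-mer-frequency classification, not the paper's actual pipeline. Assumptions: only k = 2..4 is used (k up to 10 would mean over a million features per sequence), the classifier is a plain nearest-centroid rule swapped in for whatever model the authors trained, and the "virus"/"cell" training strings are made-up toy sequences, not real genomes. The helper names (kmer_profile, features, train, classify) are hypothetical.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq, k):
    """Normalized frequencies of all 4**k DNA k-mers in seq, in a fixed ACGT order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Only count k-mers made of A/C/G/T; windows containing N etc. are ignored.
    total = sum(counts[km] for km in kmers) or 1
    return [counts[km] / total for km in kmers]


def features(seq, ks=range(2, 5)):
    """Concatenate k-mer profiles for several k (assumed k = 2..4 here)."""
    vec = []
    for k in ks:
        vec.extend(kmer_profile(seq, k))
    return vec


def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train(labelled):
    """labelled maps label -> list of sequences; returns label -> centroid.
    Nearest-centroid is a stand-in, not necessarily the model the paper used."""
    return {label: centroid([features(s) for s in seqs])
            for label, seqs in labelled.items()}


def classify(model, seq):
    """Assign seq to the label whose centroid is closest in k-mer space."""
    f = features(seq)
    return min(model, key=lambda label: distance(model[label], f))


# Toy usage with synthetic sequences (AT-rich vs GC-rich), purely illustrative.
toy_model = train({
    "virus": ["AT" * 20, "AAT" * 13],
    "cell": ["GC" * 20, "GGC" * 13],
})
```

With this toy model, classify(toy_model, "AT" * 10) lands on "virus" because its 2-, 3- and 4-mer frequencies sit much closer to the AT-rich centroid. Real tools train on labelled genomes and use stronger classifiers, but the feature vector is the same kind of object.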

The article mysteriously doesn't cite Roux, and I don't have time right now to track it down, but it's probably similar.

comp1927 | 8 years ago
Googling the reference text leads to the PubMed page, where you can download the paper for free.