top | item 16623692

Machine Learning Spots Treasure Trove of Elusive Viruses

33 points | portofcall | 8 years ago | nature.com

3 comments

maxander | 8 years ago

The article doesn't really say how the algorithm works beyond "they use machine learning," but the basic principle turns out to be pretty simple. In the work by Amgartden, they classify stretches of sequenced DNA by counting the k-mers (read: substrings of length k, with k ranging from 2 to 10) that occur in each. It simply turns out that k-mer frequencies differ enough between viral DNA and cellular DNA for this to work.
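To make the k-mer idea concrete, here is a minimal Python sketch. It counts overlapping k-mers in a sequence, turns the counts into a normalized frequency vector, and labels a new sequence by whichever class centroid its vector is closer to. The nearest-centroid rule, the function names, and the use of k from 2 to 4 are assumptions for illustration only. The comment mentions k up to 10, but there are 4^10 (about a million) possible 10-mers, so small k keeps the vectors short. This is not the classifier from the paper or from VirFinder.

```python
from collections import Counter
from itertools import product
import math

def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping substrings of length k, skipping any that contain non-ACGT characters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts

def kmer_profile(seq: str, ks=range(2, 5)) -> list[float]:
    """Concatenate normalized k-mer frequency vectors for each k, in a fixed lexicographic order."""
    features = []
    for k in ks:
        counts = kmer_counts(seq, k)
        total = sum(counts.values()) or 1  # avoid dividing by zero on very short sequences
        for kmer in map("".join, product("ACGT", repeat=k)):
            features.append(counts[kmer] / total)
    return features

def centroid(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(seq: str, viral_c: list[float], host_c: list[float], ks=range(2, 5)) -> str:
    """Hypothetical nearest-centroid rule: pick the class whose mean k-mer profile is closer."""
    p = kmer_profile(seq, ks)
    return "viral" if distance(p, viral_c) < distance(p, host_c) else "host"

# Toy, made-up training sequences (not real viral or host DNA).
viral_train = ["ATATATATATAT", "TATATATATATA"]
host_train = ["GCGCGCGCGCGC", "CGCGCGCGCGCG"]
viral_c = centroid([kmer_profile(s) for s in viral_train])
host_c = centroid([kmer_profile(s) for s in host_train])
print(classify("ATATATATAT", viral_c, host_c))  # prints "viral"
```

Real tools train a proper statistical model on many labeled contigs of known origin; the centroid rule here only illustrates why differing k-mer frequencies can be enough to separate the two classes.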

The article mysteriously doesn't cite Roux, and I don't have time right now to track it down, but it's probably similar.

comp1927 | 8 years ago

Googling the reference text leads to the PubMed page, where you can download the paper for free.

car | 8 years ago

paper: http://rdcu.be/tZ33

code: https://github.com/jessieren/VirFinder

[EDIT] CNN-based approach: https://github.com/jessieren/VirFinder/tree/master/VFdeep