The article doesn't really say how the algorithm works aside from "they use machine learning," but the basic principle turns out to be pretty simple. In the work by Amgartden, they classify lengths of sequenced DNA by looking at the counts of k-mers (read: substrings of length k, where they use k between 2 and 10) that occur in each; it simply turns out that k-mer occurrence differs enough between viral DNA and cell DNA that this works.
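To make the k-mer idea concrete, here is a minimal sketch of the feature-extraction step only: counting overlapping substrings of length k and turning them into a fixed-length frequency vector. This is not the code from either paper; the function names `kmer_counts` and `kmer_profile` and the example sequence are made up for illustration, and the classifier that would consume these vectors is left out.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_profile(seq, k):
    """Return k-mer frequencies in a fixed order over the alphabet ACGT.

    The vector has 4**k entries; k-mers containing other characters
    (e.g. N) are ignored. Illustrative only, not the papers' method.
    """
    counts = kmer_counts(seq, k)
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


example = "ACGTACGTAC"  # made-up sequence, not real data
print(kmer_counts(example, 2))
print(kmer_profile(example, 2))
```

A vector like this, computed for each stretch of DNA, is the sort of input a standard classifier could be trained on to separate viral from non-viral sequences; which classifier the papers actually use is not stated here.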
The article mysteriously doesn't cite Roux, and I don't have time right now to track it down, but it's probably similar.
maxander|8 years ago
comp1927|8 years ago
car|8 years ago
code: https://github.com/jessieren/VirFinder
[EDIT] CNN based approach: https://github.com/jessieren/VirFinder/tree/master/VFdeep