top | item 28651551

(no title)

That's great. However the method that you use to find the canonical representative [1] is quadratic (when the string has length N, there are N rotations and for each rotation you need to check N characters to determine whether this is earlier than the best on that you have found so far). For large strings you would probably want to switch to one of the linear minimal string rotation algorithms [2], for example Booth's Algorithm.

[1] https://github.com/Benjamin-Lee/viroiddb/blob/main/scripts/c...

[2] https://en.wikipedia.org/wiki/Lexicographically_minimal_stri...

discuss

benjamin-lee|4 years ago

You hit the nail on the head. This is just the lexicographically minimal string rotation with a canonicalization step. I have actually had this on my to-do list for a while. The truth is that the data set I'm applying this to is small (though I'm doing my best to change that by discovering new viroid-like agents) so the optimization has yet to become pressing. At some point, I'd like to spin this script out into a standalone tool but I'm not sure the demand is there yet.