top | item 2209890

(no title)

lvvlvv | 15 years ago

> Note that the worst case of complexity for this algorithm is much, much worse than the worst case complexity for Boyer Moore.

Can you explain your thought process for why worst case complexity is worse than other? Did you do any measurements? I believe you are incorrect. I am author of this page. I've just did quick test for SS = "aaa ...aaaBaaa...aaa", with SS_size=240. My algorithm is faster than 3 BMs out of 4 tested.

> Do not use this algorithm carelessly. For example, if you use it in a thoughtless way in your web server, you may open yourself to a DoS attack.

Same can be said about all BM, naive and BSD's memmem/strstr. Possibility of DOS for substring search algorithm was known for a long time - but it never materialized. The cure is trivial - limit substring size.

> That immediately made me think he might have devised it for genomics research.

No, it wasn't. It was devised when I was preparing for Google interview (which I failed). And of cause algorithms which pre-index haystack will always be faster. I myself do not consider haystack pre-indexing algorithms a "text search" algorithms (maybe incorrectly).

> gross exaggeration

If you count pre-indexing algorithms and edge cases - then you are correct. For most common case - can you show me something faster (from a student or even yourself)?

discuss

order

eru|15 years ago

> Can you explain your thought process for why worst case complexity is worse than other? Did you do any measurements? I believe you are incorrect. I am author of this page. I've just did quick test for SS = "aaa ...aaaBaaa...aaa", with SS_size=240. My algorithm is faster than 3 BMs out of 4 tested.

You write on the page that your algorithm has "O(NM) worst case complexity." Compare Boyer-Moore's time complexity. The grandfather was interested in asymptotics. Not in some examples.

> Same can be said about all BM, naive and BSD's memmem/strstr.

Red herring.

> Possibility of DOS for substring search algorithm was known for a long time - but it never materialized.

What are you talking about?

> If you count pre-indexing algorithms and edge cases - then you are correct. For most common case - can you show me something faster (from a student or even yourself)?

What do you mean by egde cases? All the cases that make your program run slow?

Using your terminology, KMP has complexity O(N + M). That's better than O(MN).