

28mm | 8 years ago

Ah, interesting observation. I’ll look at changing it to something more like what you’ve suggested.

If memory serves, the reason it doesn't split the description on whitespace first is that some categories contain whitespace themselves, so they would never match.
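To illustrate why, here is a minimal sketch of word-by-word matching. It assumes categories are stored as a set of lowercase strings; the names `CATEGORIES` and `match_by_words` are hypothetical and not taken from the original code.

```python
# Hypothetical category set; the real categories are not shown in this thread.
CATEGORIES = {"coffee", "ice cream", "gas station"}

def match_by_words(descr, categories):
    """Naive matching: split on whitespace and look up each single word."""
    return {word for word in descr.lower().split() if word in categories}

# split() never yields "ice cream" as one token, so only "coffee" matches.
print(match_by_words("Coffee and ice cream", CATEGORIES))
```

Any category containing a space is unreachable this way, because no single token can contain one.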

Thanks!


haikuginger | 8 years ago

If you need to match multi-word tags, I'd suggest a utility function that expands a list of words into every contiguous phrase of one or more words. That should still be substantially faster than the current approach, and you can speed it up further by only generating phrases up to the word count of the longest tag.

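    def get_all_phrases(descr):
        """Return every contiguous run of one or more words in descr."""
        words = descr.split()
        phrases = []
        # Single words first, then two-word phrases, up to the whole description.
        for length in range(1, len(words) + 1):
            phrases += get_phrases_of_len(length, words)
        return phrases

    def get_phrases_of_len(length, words):
        """Return every run of `length` consecutive words, joined by single spaces."""
        return [' '.join(words[i:i + length]) for i in range(len(words) - length + 1)]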