
kastnerkyle | 6 years ago

Yes - the crux is just to add some logic and throw out beams which don't match your constraint, then rank candidates based on sequence probability.
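A minimal sketch of that idea: run ordinary beam search, discard finished beams that violate the constraint, and rank the survivors by total sequence log-probability. The toy "language model" table, token names, and scores below are all hypothetical, purely for illustration.

```python
import heapq

# Toy next-token log-probabilities standing in for a real model.
# Every token and score here is made up for illustration.
LM = {
    (): {"the": -0.4, "a": -1.1},
    ("the",): {"man": -0.5, "dog": -1.0},
    ("a",): {"man": -0.9, "dog": -0.6},
    ("the", "man"): {"ran": -0.3, "sat": -1.4},
    ("the", "dog"): {"ran": -0.8, "sat": -0.7},
    ("a", "man"): {"ran": -0.5, "sat": -1.0},
    ("a", "dog"): {"ran": -1.2, "sat": -0.4},
}

def constrained_beam_search(beam_width, length, constraint):
    """Beam search that throws out completed beams failing `constraint`,
    then ranks the rest by cumulative sequence log-probability."""
    beams = [((), 0.0)]  # (sequence, cumulative log-prob)
    for _ in range(length):
        candidates = []
        for seq, score in beams:
            for tok, lp in LM.get(seq, {}).items():
                candidates.append((seq + (tok,), score + lp))
        # Standard beam pruning: keep the top-k candidates by score.
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[1])
    # The constraint logic: discard beams that don't match.
    survivors = [(seq, score) for seq, score in beams if constraint(seq)]
    return sorted(survivors, key=lambda c: c[1], reverse=True)

# Force the word "dog" to appear somewhere in the output.
best = constrained_beam_search(4, 3, lambda seq: "dog" in seq)
print(best[0][0])  # → ('the', 'dog', 'sat')
```

Note the globally most probable beam ("the man ran") is discarded because it lacks the forced word; the ranking then picks the most probable sequence that satisfies the constraint.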

You can roll back the generation process and/or mask the probability distribution using simple secondary logic, but I find beam search generally gives better results, especially when the word I want to force is very low probability - most of my sequence models kind of go off the rails when they are forced into a low-probability sequence ("the man went to the xylophone zebra sdawoqhdjwna"). I also find this problem gets worse in domains without "reset" tokens like spaces, where high-entropy positions (the letter after a space has a lot of good choices) are followed by lower-entropy ones (after the first letter, there are often fewer good choices - at least until you hit another space). Particularly in music generation, models that sample a "surprising" sequence tend to go off the rails. For me this behavior also seems worse in RNNs than in transformers.
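The masking alternative mentioned above can be sketched as follows: at each step, zero out the tokens the constraint rejects and decode from the survivors. The toy model table, the forced token, and the greedy decoding choice are all assumptions for illustration (sampling would work the same way after renormalizing).

```python
# Toy next-token log-probabilities standing in for a real model's
# output; all tokens and scores here are hypothetical.
def next_token_logprobs(prefix):
    table = {
        (): {"the": -0.4, "a": -1.1, "zebra": -5.0},
        ("the",): {"man": -0.5, "dog": -1.0, "zebra": -4.5},
        ("the", "man"): {"ran": -0.3, "sat": -1.4, "zebra": -6.0},
    }
    return table.get(prefix, {})

def masked_greedy(length, mask_fn):
    """At each step, drop tokens the mask rejects, then take the
    highest-probability survivor (greedy decoding with a hard mask)."""
    seq = ()
    for step in range(length):
        lps = next_token_logprobs(seq)
        allowed = {t: lp for t, lp in lps.items() if mask_fn(step, t)}
        if not allowed:
            break  # no legal continuation; a rollback step would go here
        seq += (max(allowed, key=allowed.get),)
    return seq

# Force the very low-probability "zebra" at position 2.
forced = masked_greedy(3, lambda step, tok: tok == "zebra" if step == 2 else True)
print(forced)  # → ('the', 'man', 'zebra')
```

This shows the failure mode the comment describes: masking *guarantees* the forced token appears, but it shoves the model into a region it assigns very low probability, which is where generation tends to go off the rails.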
