(no title)
lindig | 2 years ago
https://www2.stat.duke.edu/~scs/Courses/Stat376/Papers/Tempe...
"Annealing, as implemented by the Metropolis procedure, differs from iterative improvement in that the procedure need not get stuck since transitions out of a local optimum are always possible at nonzero temperature. A second and more important feature is that a sort of adaptive divide-and-conquer occurs. Gross features of the eventual state of the system appear at higher tempera-tures; fine details develop at lower tem-peratures. This will be discussed with specific examples."
_0ffh|2 years ago
WanderPanda|2 years ago
marcosdumay|2 years ago
There are all kinds of possibilities for specific problems, but if you want something generic, you have to traverse the possibility space and use its topology to get into an optimum. And if the topology is chaotic, you are out of luck, and if it's completely random, there's no hope.
mturmon|2 years ago
And beyond this intuition (escape from local optima), the reason that annealing matters is that you can show that (under conditions) with the right annealing schedule (it's rather slow, T ~ 1/log(Nepoch) iirc?) you will converge to the global optimum.
I'm not well-versed enough to recall the conditions, but it wouldn't surprise me if they are quite restrictive, and/or hard to implement (e.g., with no explicit annealing guidance to choose a specific temperature).