cleancoder0|4 years ago
Eventually it was extended to work with an external data source that it queries. This is not a new idea; for example, image style transfer and some other image tasks attempted before NNs dominated did the same thing (linear models would query the database for help with guided feature extraction).
The greatest effect in transformers comes from the attention mechanism combined with self-supervised learning. Investigating self-supervised training tasks (the article illustrates the one-word-gap task, but there are others) can yield superior models that are sometimes even easier to train.
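That one-word-gap task (hide a word, recover it from context) can be sketched with a toy stand-in model; the corpus and the bigram "model" below are made up for illustration, a real transformer learns the same objective at vastly larger scale:

```python
from collections import Counter, defaultdict

# Toy illustration of the "one word gap" self-supervised task: hide a word
# and predict it from context. A bigram counter stands in for a real model;
# the corpus is invented.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat sat near the door",
]

follows = defaultdict(Counter)   # word -> counts of the word that follows it
for sentence in corpus:
    words = sentence.split()
    for prev, cur in zip(words, words[1:]):
        follows[prev][cur] += 1

def fill_gap(words, gap_index):
    """Predict the hidden word from its left neighbour."""
    candidates = follows[words[gap_index - 1]]
    return candidates.most_common(1)[0][0] if candidates else None

print(fill_gap("the cat ??? on the mat".split(), 2))  # -> sat
```

No labels were needed: the training signal comes entirely from the text itself, which is what makes the task cheap to scale.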
As for SAT and optimization, graph neural networks might end up being more effective (due to the highly structured inputs). I'm definitely waiting for a traveling salesman solver or similar, guided by an NN, solving things faster and reaching optimality more frequently than optimized heuristic algos.
bglazer|4 years ago
There was a competition for exactly this at NeurIPS 2021:
https://www.ecole.ai/2021/ml4co-competition/
Not sure how much they improved over handcrafted heuristics, but the summary paper may give some insights
https://arxiv.org/abs/2203.02433
yobbo|4 years ago
Learning from data is a different problem from optimization. For example, learning could help with the travelling salesman problem if facts about cities gave additional clues about the optimal order beyond their locations, or if the cost of paths is known only implicitly through data examples.
Compare to how NNs can be used for data compression, for example upscaling images, by learning from photographs only the tiny subset of all possible images that are meaningful to humans. But that is not useful for general data compression.
cleancoder0|4 years ago
Optimization is also data: given a local state, can you identify the sequence of transformations that will get you to a better state? The reward is instantly measurable, and the goal is minimizing the total cost.
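That framing can be sketched with plain 2-opt local search on a made-up Euclidean instance; no NN involved, just the "measure the reward of each transformation" loop an NN would learn to guide:

```python
import math

# Sketch of "optimization as data": from a local state (a tour), enumerate
# transformations (2-opt segment reversals) and keep any that lower the
# instantly measurable total cost. The points are invented.
def tour_cost(tour, pts):
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(tour, pts):
    tour = tour[:]
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                # Reverse the segment tour[i..j]:
                candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                # The "reward" of a move is directly measurable:
                if tour_cost(candidate, pts) < tour_cost(tour, pts) - 1e-9:
                    tour, improved = candidate, True
    return tour

pts = [(0, 0), (0, 1), (1, 0), (1, 1)]       # unit square
tour = two_opt([0, 1, 2, 3], pts)            # start from a crossing tour
print(tour_cost(tour, pts))                  # perimeter of the square: 4.0
```

A learned policy would replace the exhaustive inner loops with a ranking of promising moves, but the reward signal stays the same.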
graycat|4 years ago
Just in case we are not being clear, let's be clear. Bluntly in nearly every practical sense, the traveling salesman problem (TSP) is NOT very difficult. Instead we have had good approaches for decades.
I got into the TSP writing software to schedule the fleet for FedEx. A famous, highly accomplished mathematician asked me what I was doing at FedEx, and as soon as I mentioned scheduling the fleet he waved his hand and concluded I was only wasting time, that the TSP was too hard. He was wrong, badly wrong.
Once I was talking with some people in a startup to design the backbone of the Internet. They were convinced that the TSP was really difficult. In one word, WRONG. Big mistake. Expensive mistake. Hype over reality.
I mentioned that my most recent encounter with combinatorial optimization was solving a problem with 600,000 0-1 variables and 40,000 constraints. They immediately, about 15 of them, concluded I was lying. I was telling the full, exact truth.
So, what is difficult about the TSP? Okay, we would like an algorithm for some software that would solve TSP problems (1) to exact optimality, (2) in worst cases, (3) in time that grows no faster than some polynomial in the size of the input data to the problem. So, for (1), being provably within 0.025% of exact optimality is not enough. And for (2), exact optimality in polynomial time for 99 44/100% of real problems is not enough.
In the problem I attacked with 600,000 0-1 variables and 40,000 constraints, a real world case of allocation of marketing resources, I came within the 0.025% of optimality. I know I was this close due to some bounding from some nonlinear duality -- easy math.
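The kind of bound he describes can be illustrated on a toy problem (a made-up 0-1 knapsack, not his actual model): relaxing the hard constraint into the objective with a multiplier gives, for any multiplier value, a provable bound on the true optimum, so the gap between a feasible solution and optimality is measurable without knowing the optimum itself:

```python
# Made-up 0-1 knapsack to illustrate bounding via duality (Lagrangian
# relaxation): move the weight constraint into the objective with a
# multiplier lam >= 0. For any lam, the relaxed optimum upper-bounds the
# true optimum, so the gap to any feasible solution is measurable.
values  = [10, 7, 5, 4]
weights = [ 5, 4, 3, 2]
cap = 8

def lagrangian_bound(lam):
    # max over x in {0,1}^n of sum((v_i - lam*w_i) * x_i) + lam*cap:
    # just take every item whose reduced value is positive.
    return sum(max(v - lam * w, 0.0) for v, w in zip(values, weights)) + lam * cap

def greedy_feasible():
    # Cheap feasible solution by value/weight ratio (a lower bound).
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    total_v = total_w = 0
    for i in order:
        if total_w + weights[i] <= cap:
            total_v += values[i]
            total_w += weights[i]
    return total_v

best = greedy_feasible()
bound = min(lagrangian_bound(lam / 10) for lam in range(51))  # crude grid on lam
gap = (bound - best) / bound
print(best, round(bound, 3), round(gap, 3))
```

The same logic scales: on the 600,000-variable problem above, a dual bound like this is what certifies "within 0.025% of optimality" without ever exhibiting the exact optimum.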
So, in your
> reaching optimality more frequently than optimized heuristic algos.
heuristics may not be, in nearly all of reality probably are not, reaching "optimality" in the sense of (2).
The hype around the TSP has been to claim that the TSP is really difficult. Soooo, given some project that is to cost $100 million, where an optimal solution might save $15 million, software based on what has long been known (e.g., from G. Nemhauser) that can save all but $1500 of that is supposedly not of interest. Bummer. Wasted nearly all of $15 million.
For this, see the cartoon early in Garey and Johnson where they confess they can't solve the problem (optimal network design at Bell Labs) but neither can a long line of other people. WRONG. SCAM. The stockholders of AT&T didn't care about the last $1500 and would be thoroughly pleased by the $15 million without the $1500. Still that book wanted to say the network design problem could not yet be solved -- that statement was true only in the sense of exact optimality in polynomial time on worst case problems, a goal of essentially no interest to the stockholders of AT&T.
For neural networks (NNs), I don't expect (A) much progress in any sense over what has been known (e.g., Nemhauser et al.) for decades. And (B) whatever progress NNs do make promises to be in performance aspects other than getting to exact optimality.
Yes, there are some reasons for taking the TSP and the issue of P versus NP seriously, but optimality on real world optimization problems is not one of the main reasons.
Here my goal is to get us back to reality and set aside some of the hype about how difficult the real world TSP is.
cleancoder0|4 years ago
When TSP is mentioned today, unlike 50 years ago when the LK heuristic was published, I assume all of the popular and practical variants: time window constraints, pickup and delivery, capacity constraints, maximum drop time after pickup, flexible route start, location-independent breaks (a break can happen anywhere in the sequence or in a particular time window of the day), etc. Some of the subproblems are so constrained that you cannot even move around as effectively as you can with raw TSP.
Some of the subproblems have O(n) or O(n log n) evaluations of the best local moves; generic solvers are even worse at handling that (Concorde's LP optimizations cannot cover it efficiently). When no improving moves are possible, you have to see which moves bring you back to a feasible solution and how many local changes you need to accomplish this.
For example, just adding time windows complicates, or outright invalidates, most well-known TSP heuristics. Now imagine adding a requirement between pairs of locations that they be visited at most X time apart (picking up and then delivering perishable goods), that the route can start at an arbitrary moment, etc.
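A sketch of why a single time window already breaks plain TSP moves; the travel times and windows below are invented:

```python
# Made-up instance: two stops, each with an [earliest, latest] service
# window. A route that is fine as a pure distance tour can become
# infeasible once the windows are added.
travel = {("depot", "a"): 10, ("a", "b"): 10,
          ("depot", "b"): 25, ("b", "a"): 10}
windows = {"a": (0, 15), "b": (20, 40)}   # earliest, latest service start

def feasible(route, start_time=0):
    """Walk the route, waiting when early, failing when late."""
    t, pos = start_time, "depot"
    for stop in route:
        t += travel[(pos, stop)]
        earliest, latest = windows[stop]
        if t > latest:
            return False          # arrived after the window closed
        t = max(t, earliest)      # wait if we arrived early
        pos = stop
    return True

print(feasible(["a", "b"]), feasible(["b", "a"]))  # -> True False
```

Reversing the two stops, the most basic 2-opt move, turns a feasible route into an infeasible one, so the classic neighborhood no longer preserves feasibility and every move must be re-checked against the windows.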
I personally spent quite a lot of time working on these algorithms, and I'd say the biggest issue is instance representation (is it enough to have a sequence of location ids?). For example, one of my recent experiments used zero-suppressed binary decision diagrams to easily traverse some of these constrained neighborhoods and maintain the invariants after local changes. Still too slow for some instances I handle (real-world is 5000 locations, 100 salesmen, and an insane number of location/salesman constraints).
enchiridion|4 years ago