xkcd386 | 2 years ago
In terms of theoretical computational complexity, the best algorithms for "integer linear programming" [2] (whether the variables are binary or general integers, as in the case tackled by the paper) are based on lattices: they have the best known worst-case big-O complexity. Unfortunately, all current implementations need (1) arbitrary-precision rational arithmetic (such as that provided by gmplib [3]), which is memory-hungry and a bit slow in practice, and (2) some LLL-type lattice reduction step [4], which does not take advantage of matrix sparsity. As a result, those algorithms cannot even start tackling problems with matrices larger than 1000x1000: the matrices typically don't fit in memory... and even if they did, the algorithms would be prohibitively slow.
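To make point (1) concrete, here is a toy sketch of my own (not from the paper): plain Gaussian elimination over Python's fractions.Fraction turns a matrix of single-digit integers into exact fractions whose denominators need hundreds of bits.

  from fractions import Fraction
  import random

  random.seed(0)
  n = 40
  # dense random matrix with tiny integer entries
  A = [[Fraction(random.randint(-9, 9)) for _ in range(n)] for _ in range(n)]

  for k in range(n - 1):
      # partial pivoting, just to avoid dividing by a zero pivot
      p = max(range(k, n), key=lambda i: abs(A[i][k]))
      A[k], A[p] = A[p], A[k]
      if A[k][k] == 0:
          continue
      for i in range(k + 1, n):
          m = A[i][k] / A[k][k]
          for j in range(k, n):
              A[i][j] -= m * A[k][j]

  # the entries are still exact, but their representations have exploded
  print(max(x.denominator.bit_length() for row in A for x in row))

An LLL-type algorithm has to carry this kind of exact arithmetic through every vector operation, and the reduced basis it produces is dense even when the input matrix was sparse.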
In practice instead, integer programming solvers are based on branch-and-bound, a type of backtracking algorithm (like those used in SAT solving), and at every iteration they solve a "linear programming" relaxation (the same problem as the original, but with all variables continuous). Each of those linear programming problems could be solved in polynomial time (with algorithms called interior-point methods), but solvers use the simplex method instead, which is exponential in the worst case!! The reason is that the successive linear programming problems are very similar to each other, and the simplex method can take advantage of that in practice: it can be warm-started from the solution of the previous one. Moreover, all the algorithms involved take great advantage of sparsity in any vector or matrix involved. As a result, some people routinely solve integer programming problems with millions of variables within days or even hours.
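Here is a minimal branch-and-bound sketch, assuming SciPy's linprog as the LP solver (a toy illustration of the structure, not of how production solvers are engineered -- they warm-start dual simplex, keep a node queue, add cutting planes, etc.):

  import math
  from scipy.optimize import linprog

  def branch_and_bound(c, A_ub, b_ub):
      # minimize c.x subject to A_ub x <= b_ub, x >= 0 integer
      best_val, best_x = math.inf, None

      def solve(bounds):
          nonlocal best_val, best_x
          res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
          if not res.success or res.fun >= best_val:
              return  # infeasible relaxation, or pruned by bound
          frac = next((i for i, v in enumerate(res.x)
                       if abs(v - round(v)) > 1e-6), None)
          if frac is None:  # integral solution: new incumbent
              best_val, best_x = res.fun, [int(round(v)) for v in res.x]
              return
          lo, hi = bounds[frac]
          v = res.x[frac]
          # branch: x_frac <= floor(v), then x_frac >= ceil(v)
          solve(bounds[:frac] + [(lo, math.floor(v))] + bounds[frac + 1:])
          solve(bounds[:frac] + [(math.ceil(v), hi)] + bounds[frac + 1:])

      solve([(0, None)] * len(c))
      return best_val, best_x

  # tiny example: min -x0 - x1 s.t. 3 x0 + 2 x1 <= 12, x0 + 2 x1 <= 6
  print(branch_and_bound([-1, -1], [[3, 2], [1, 2]], [12, 6]))  # best value -4 at x = [3, 1]

Note that each node re-solves an LP differing from its parent in a single variable bound -- exactly the similarity that warm-started simplex exploits, and that this sketch wastefully ignores.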
As you can see, solver implementers are not chasing the absolute best theoretical complexity. One could say that the theory and the practice of discrete optimization have somewhat diverged.
That said, the Reis & Rothvoss paper [1] is deep mathematical work. It is extremely impressive on its own to anyone with an interest in discrete maths. It settles a 10-year-old conjecture by Dadush (the length of time a conjecture remains open is a rough heuristic many mathematicians use to evaluate how hard it is to (dis)prove). Last November, it was presented at FOCS, one of the two top conferences in theoretical computer science (together with STOC). Direct practical applicability is beside the point; the authors will readily confess as much if asked in an informal setting (they will of course insist otherwise in grant applications -- that's part of the game). That does not mean the work is useless: in addition to having tremendous value in itself because it advances our mathematical knowledge, one can imagine that practical algorithms based on its ideas could push the state of the art of solvers, a few generations of researchers down the line.
At the end of the day, all those algorithms are exponential in the worst case anyway. In theory, one would try to slightly shrink the polynomial in the exponent of the worst-case complexity. Practitioners, instead, typically want to solve one big optimization problem, not a family of problems of increasing size n. They don't care about the growth rate of the solving-time trend line. They care about solving their one big instance, which typically has structure that does not make it a "worst-case" instance for its size. This leads to distinct engineering decisions.
[1] https://arxiv.org/abs/2303.14605
[2] min { c^T x : A x >= b, x in R^n, some components of x in Z }
[3] https://gmplib.org/
[4] https://www.math.leidenuniv.nl/~hwl/PUBLICATIONS/1982f/art.p...
Bimos|2 years ago
1. Usually we deal with models that have both integer and continuous variables (MILP). Conceptually, B&B tackles ILP and MILP in similar ways. Is there any difficulty in extending lattice-based methods to solve MILP?
2. How likely do you think it is that lattice-type algorithms will overcome the difficulties you mentioned and eventually replace B&B, totally or partly (like barrier vs simplex methods)?
xkcd386|2 years ago
I don't think that continuous variables are an issue. Even when all the explicit variables are integer, we have implicit continuous variables as soon as we have an inequality: the slack of that inequality. There is probably some linear algebra trick one can use to transform any problem into a form that is convenient for lattice-based algorithms.
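Concretely: an inequality a^T x >= b becomes an equality a^T x - s = b after introducing a slack variable s >= 0, and nothing forces that slack to be declared integer. So even a "pure integer" model already carries continuous quantities in this sense.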
> How likely do you think it is that lattice-type algorithms will overcome the difficulties you mentioned and eventually replace B&B, totally or partly (like barrier vs simplex methods)?
Very unlikely in the next 5 years. Beyond that, they could be the next small revolution, maybe. "Cutting planes" were another tool that had some good theory but were thought to be impractical. Then 25 years ago, people found a way to make them work, and they were a huge boost to solvers. We may be due for another big jump.
Lattice-based methods are already effective in some niches. Branch-and-bound solvers are horrible at cryptography and number-theory problems (those problems are bad fits for floating-point arithmetic in general), and lattice-based methods shine there. There are also some rare dense optimization problems that benefit from lattice-based methods (typically, one would use lattices in a pre-processing step, then pass the reformulated problem to a regular branch-and-bound solver [1]).
[1] https://link.springer.com/chapter/10.1007/3-540-48777-8_1
djoldman|2 years ago
Would you say that the following is a good summary? -> This is an important theoretical result, but most real-world problems are far from worst-case scenarios, therefore improving the worst case currently has little practical use.
xkcd386|2 years ago
This statement is probably mostly correct, but I think that in one way it could be misleading: I would not want to imply that real-world problem instances are somehow easier than the worst case in terms of computational complexity. They still very much exhibit exponential growth in computational cost as you scale them up.
Instead, most real-world instances have structure. Some of that structure is well understood (for example, 99% of optimization problems involve extremely sparse matrices), some is not. But sometimes we can exploit structure even without understanding it fully (some algorithmic techniques work wonders on some instances, and we don't fully know why).
It could be argued that by exploiting structure, it is the constant factor in the big-O computational complexity that gets dramatically decreased. If that is the case, theory and practice do not really contradict each other. It is just that in practice, we are willing to accept a larger exponent in exchange for a smaller constant factor. Asymptotically, that is a losing bargain; but for a given instance, it can be extremely beneficial.
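With made-up round numbers: an algorithm costing 1.5^n operations beats one costing 10^6 * 1.1^n for every n below roughly 45 (the crossover is where (1.5/1.1)^n = 10^6), even though it loses badly as n grows -- and "n below 45" may be exactly where your one instance lives.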
asimpletune|2 years ago