
Nearly all binary searches and mergesorts are broken (2006)

164 points | thunderbong | 1 year ago | research.google | reply

189 comments

[+] motorest|1 year ago|reply
> (2006)

At the very least, this should feature in the title. In fact, previous submissions from over a decade ago also included the year in the title.

Otherwise it conveys the idea that this is some major epiphany.

[+] joshka|1 year ago|reply
The problem with this article's name is that we're in an era where actually checking whether "nearly all sorts are broken" against all publicly available implementations of binary searches would be almost feasible, but that's not actually what the article is claiming.

> Moreover, to be really certain that a program is correct, you have to test it for all possible input values, but this is seldom feasible.

This is incorrect. Generally it's just a little excessive to try to solve the halting problem in a library's unit tests ;). You don't have to test a program for all possible inputs, only for all materially unique state transitions. In a binary search, each variable can be zero, one, some(*), max, or overflowed. The combinations of these are not infinite, as the some(*) values that cause different behavior become much more manageable when grouped into chunks with the same behavior.

[+] PaulKeeble|1 year ago|reply
It was certainly possible to run the algorithm on a 16 GB array before the bug first manifested, but did the original developer have that sort of space on their desktop at the time? Possibly not.

If a unit test only runs on a server and not on the laptops of the developers, then it's not going to be written. Ideally someone would write a test that is tagged to run only on the server, but that is a lot of extra work if it isn't already a common practice on the project. Even now I would be quite wary of writing a max-size-input test for an array of ints, and especially of objects; it is going to raise some questions for being a slow and highly resource-consuming test.

If I were worried about the same thing in aerospace programming, however, then no question: that test would get written, and we would make sure it ran on the hardware it was designed for. In typical business software it's less common to run potentially very long-running tests and simulations of machine states; everyone wants to go faster than that, especially for the sake of a max-input test.

[+] rileymat2|1 year ago|reply
> In a binary search, each variable can be zero, one, some(*), max, overflowed. The combination of these is not infinite as the values of some that cause different behavior is much more reasonable when grouped into chunks of the same behavior.

You are presuming that the algorithm is correct in the first place. Contrived example:

   binary_search(array, key) {
     if (key == 453) explode_world()
     //Do correct binary search.
   }
So you also need to prove that explode_world is not called, or that something similar but less contrived is not hiding in there.
[+] dotancohen|1 year ago|reply

> each variable can be zero, one, some(*), max, overflowed

These are the value ranges I test in my unit tests, plus values at and just before powers of 2 (e.g. 65535 and 65536), and -1. That is because -1 is uniquely used in some systems to indicate an error, and thus trips some bugs.
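As a sketch, such a boundary-value set could be generated like this (the helper name and the chosen powers of two are illustrative, not from any particular test suite):

```java
import java.util.ArrayList;
import java.util.List;

public class BoundaryValues {
    // Hypothetical helper collecting the boundary values described above:
    // 0, 1, -1, Integer.MAX_VALUE, and values at / just before powers of two.
    static List<Integer> boundaryInts() {
        List<Integer> vals = new ArrayList<>(List.of(0, 1, -1, Integer.MAX_VALUE));
        for (int shift : new int[] {8, 16, 24}) {
            vals.add((1 << shift) - 1);  // just below, e.g. 65535
            vals.add(1 << shift);        // at, e.g. 65536
        }
        return vals;
    }

    public static void main(String[] args) {
        System.out.println(boundaryInts());
    }
}
```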
[+] MaxikCZ|1 year ago|reply
Are you quantizing information?
[+] manquer|1 year ago|reply
You mean property testing? QuickCheck would be the go-to library (the original is in Haskell, but most languages have an equivalent).
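A QuickCheck-style property can also be hand-rolled without any library. The sketch below (names and trial counts are illustrative) checks a binary search against a linear scan over random sorted arrays:

```java
import java.util.Arrays;
import java.util.Random;

public class BinarySearchProperty {
    // The search under test, with the overflow-safe midpoint.
    static int search(int[] a, int key) {
        int low = 0, high = a.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            if (a[mid] < key) low = mid + 1;
            else if (a[mid] > key) high = mid - 1;
            else return mid;
        }
        return -1;
    }

    // Property: on any sorted array, search finds the key iff a linear scan does.
    static boolean property(long seed, int trials) {
        Random rng = new Random(seed);
        for (int t = 0; t < trials; t++) {
            int[] a = rng.ints(rng.nextInt(50)).toArray();
            Arrays.sort(a);
            // Half the time, probe with an element known to be present.
            final int key = (a.length > 0 && rng.nextBoolean())
                    ? a[rng.nextInt(a.length)] : rng.nextInt();
            boolean present = Arrays.stream(a).anyMatch(x -> x == key);
            int got = search(a, key);
            if (present != (got >= 0)) return false;
            if (got >= 0 && a[got] != key) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(property(42L, 1_000));
    }
}
```

Real property-testing libraries add shrinking (minimizing a failing input), which a hand-rolled loop like this lacks.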
[+] LiamPowell|1 year ago|reply
This bug, and many others, can be detected with a trivial amount of formal verification. I really think formal verification should see much wider use for things as heavily used as standard libraries, even if it's just for trivial properties like overflow and out-of-bounds access.

In the code below, a formal verification tool (GNATProve) detects both the original error and the out-of-bounds access that it causes. Doing the arithmetic in a larger type clears both reported errors without the need for any additional annotations for GNATProve.

    function Search (A : A_Type; Target : Integer) return Integer is
       Left : Integer := A'First;
       Right : Integer := A'Last;
    begin
       while Left <= Right loop
          declare
             Mid : Integer := (Left + Right) / 2;
          begin
             if A (Mid) = Target then
                return Mid;
             elsif A (Mid) < Target then
                Left := Mid + 1;
             elsif A (Mid) > Target then
                Right := Mid - 1;
             end if;
          end;
       end loop;
    end Search;
    
GNATProve output:

    Phase 1 of 2: generation of Global contracts ...
    Phase 2 of 2: flow analysis and proof ...

    wrapper.adb:12:36: medium: overflow check might fail, cannot prove lower bound for Left + Right
       12 |            Mid : Integer := (Left + Right) / 2;
          |                             ~~~~~~^~~~~~~~
      reason for check: result of addition must fit in a 32-bits machine integer
    wrapper.adb:12:45: info: division check proved
    
    wrapper.adb:14:19: medium: array index check might fail
       14 |            if A (Mid) = Target then
          |                  ^~~
      reason for check: value must be a valid index into the array
[+] jheriko|1 year ago|reply
can also be spotted with experience.

these kinds of problems are present in very many standard treatments of algorithms, and in some cases don't survive their first contact with real-world use.

"needs a carry bit or a wider type" is common for arithmetic operations that actually use the range.

[+] bhouston|1 year ago|reply
It makes the case that we need a Math.mean or Math.avg function that we can use in these cases rather than reinventing it.

Basically, we should favor built-in functions as much as possible, because they should be more reliable and better tested than the ad hoc code we write. And compilers should optimize those built-in functions well, so there is no extra cost in using them.
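For illustration, here is what such a helper could look like in Java (the method names are hypothetical; java.lang.Math has no midpoint function). Both bodies are overflow-safe forms suggested in the linked article:

```java
public class SafeMidpoint {
    // Safe for any array indices 0 <= low <= high: (high - low) cannot overflow.
    static int midpoint(int low, int high) {
        return low + (high - low) / 2;
    }

    // The JDK's own fix: let the sum wrap, then shift as an unsigned value.
    // Also safe for any non-negative low and high.
    static int midpointShift(int low, int high) {
        return (low + high) >>> 1;
    }

    public static void main(String[] args) {
        int low = 2_000_000_000, high = 2_100_000_000;
        System.out.println(midpoint(low, high));      // 2050000000, not negative
        System.out.println(midpointShift(low, high)); // 2050000000
    }
}
```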

[+] sltkr|1 year ago|reply
C++ added std::midpoint() to the standard library: https://en.cppreference.com/w/cpp/numeric/midpoint

Another fun case, besides integer overflow, is negative values: in Java and C/C++ (i + j)/2 will round towards j, but i + (j - i)/2 will round towards i instead. Sometimes the difference matters.
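A quick demonstration of the difference (with i negative, Java's truncation toward zero rounds the first form up, while the second still rounds toward i):

```java
public class RoundingDirection {
    static int midTowardZero(int i, int j) { return (i + j) / 2; }
    static int midTowardI(int i, int j)    { return i + (j - i) / 2; }

    public static void main(String[] args) {
        System.out.println(midTowardZero(-7, 0)); // -3: -7/2 truncates toward zero
        System.out.println(midTowardI(-7, 0));    // -4: rounds toward i
    }
}
```

For non-negative i and j the two forms agree; the divergence only appears once negative values are involved.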

[+] ape4|1 year ago|reply
I was thinking the same. And the code would be clearer too.
[+] heinrichhartman|1 year ago|reply

    int mid = (low + high) / 2;
> Specifically, it fails if the sum of low and high is greater than the maximum positive int value (2^31 - 1).

I would really challenge calling this kind of effect a "bug" or "breakage".

It's like calling Newton's law of gravity broken because it's not accurate at predicting how galaxies move.

Things are engineered and tested for a certain scale.

Knowing which tools to use at which scale is part of the craft of engineering.

[+] wat10000|1 year ago|reply
Sometimes they’re engineered and tested for a certain scale.

More often they’re engineered and tested for an arbitrary scale. The limits aren’t considered, behavior at the edges isn’t accounted for, and it’s assumed it will be good enough for real world inputs.

The use of `int` tends to be a dead giveaway. There are some cases where it’s clearly correct: where the spec says so (like argv), where you’re starting from a smaller type and it’s impossible for the calculations to overflow in an int (like adding two uint8), that sort of thing. And there are cases where it’s subtly correct, because you know the range of the value is sufficiently limited, either by mechanics or by spec.

But most of the time, int gets chosen because it’s the apparent default and it’s easy to type. No analysis has been done to see if it’s correct or if you want to declare your code to only support inputs in a certain range.

It’s really clear if you’ve written a binary search (or anything else that works on general arrays) in C and you use int as the index type. There’s pretty much no scenario where that makes sense. In theory you could analyze the entire program and prove that over-large arrays are never passed in, and keep doing it to make sure it stays that way, but that’s not realistic. If the programmer actually took one second to think about the appropriate data types, they’d use size_t rather than int.

You can still have this bug with size_t, of course. But it won’t be “this falls apart with arrays over 1G elements on 64-bit systems that can easily handle them.” If you declare that you wrote the obvious midpoint calculation with size_t because you didn’t intend to support byte arrays larger than half the address space, it’s at least plausible.

[+] alanfranz|1 year ago|reply
> certain scale

Make it explicit. If the array is too large, throw an IllegalArgumentException. Document the limit. Then I agree with you.

Otherwise, if an allowed input crashes the program at runtime with a random exception, I respectfully disagree.

[+] ajuc|1 year ago|reply
The problem is that it's not that much harder to make it work for all the valid inputs.

Not doing that is not good enough.

Another example is summing lots of floats naively instead of using Kahan's algorithm.
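For reference, a minimal sketch of Kahan summation, with an input chosen to defeat naive accumulation entirely:

```java
public class KahanDemo {
    // Kahan (compensated) summation: a running compensation term recovers
    // the low-order bits lost in each addition.
    static double kahanSum(double[] xs) {
        double sum = 0.0, c = 0.0;
        for (double x : xs) {
            double y = x - c;    // apply the correction carried from the last step
            double t = sum + y;  // low-order bits of y are lost here...
            c = (t - sum) - y;   // ...and recovered here
            sum = t;
        }
        return sum;
    }

    static double naiveSum(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum;
    }

    public static void main(String[] args) {
        // 1.0 followed by a million values of 1e-16: each tiny addend is below
        // half an ulp of 1.0, so naive summation drops every single one.
        double[] xs = new double[1_000_001];
        xs[0] = 1.0;
        for (int i = 1; i < xs.length; i++) xs[i] = 1e-16;
        System.out.println(naiveSum(xs)); // exactly 1.0
        System.out.println(kahanSum(xs)); // ~1.0000000001
    }
}
```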

It's like having a theory of gravity that doesn't work on Mars because of an unnecessary division by (m - mass of Mars) in our laws :) It wouldn't be good physics.

[+] javcasas|1 year ago|reply
> Certain scale

Or just put the algorithm in a 16-bit microcontroller, put some table that needs to be looked up (think precomputed sine table), put that table near the end of the memory range, and just make the mistake to call binary search specifying the start and end memory positions of the table.

[+] perching_aix|1 year ago|reply
These implementations are definitely broken when the specification goes like "you just pass in your array here and it will perform binary search on it for your value." Yes, you could constrain this spec, but come on...

It's such a blatant example of a bug that I struggle to believe anyone can even remotely conceive of, and pivot to, the idea that "nuh-uh, this is a bad spec, not a bad implementation!".

[+] secondcoming|1 year ago|reply
I would disagree. The inputs to these functions are user controlled and so can be forced to break, whereas humans cannot change how gravity works.
[+] djoldman|1 year ago|reply
> The bug is in this line:

> In Programming Pearls Bentley says that the analogous line "sets m to the average of l and u, truncated down to the nearest integer." On the face of it, this assertion might appear correct, but it fails for large values of the int variables low and high. Specifically, it fails if the sum of low and high is greater than the maximum positive int value (2^31 - 1). The sum overflows to a negative value, and the value stays negative when divided by two. In C this causes an array index out of bounds with unpredictable results. In Java, it throws ArrayIndexOutOfBoundsException.
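The quoted failure is easy to reproduce directly with two large but perfectly valid array indices:

```java
public class MidpointOverflow {
    // The quoted line, verbatim: fine until low + high exceeds 2^31 - 1.
    static int brokenMid(int low, int high) {
        return (low + high) / 2;
    }

    public static void main(String[] args) {
        // Both are valid indices into a large (but legal) Java array.
        int low = 1_500_000_000, high = 1_800_000_000;
        System.out.println(brokenMid(low, high)); // -497483648, not 1650000000
    }
}
```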

At some point we have to draw an arbitrary line. Even an "arbitrary precision" calculation is bounded by system memory.

"bug" is not well-defined, or perhaps "bug" is more of a continuous label as opposed to discrete.

[+] poincaredisk|1 year ago|reply
Why do we need to draw that line anywhere? The fixed solution works for the full range of Java's int.
[+] borntyping|1 year ago|reply
Nearly all binary searches and mergesorts are broken in languages with bounded integers.
[+] dietr1ch|1 year ago|reply
Doing any math with bounded integers considered harmful.

At some point during my undergrad I realized this and tried to be really careful when implementing algorithms, but it's stupidly hard to do in a tidy way, at least in old C. It's not practical, and people would just rather close their eyes and live in blissful avoidance.

[+] perching_aix|1 year ago|reply
And on machines with finite memory, right? Which would be every actual computer ever built?
[+] xigoi|1 year ago|reply
So previously it worked for arrays up to length 2³⁰ and now it works for arrays up to length 2³¹. Is it really that much of a difference? Wouldn’t it be better to change the indexing type to a 64-bit integer?
[+] ddxxdd|1 year ago|reply
The difference is that we know for a fact that the proper implementation works for integers up to 2^31, whereas the improper implementation deceived us into thinking that the code would work in situations where the code actually doesn't work.

I find it valuable to understand when my code will crash.

[+] foldr|1 year ago|reply
The article was written in 2006, when 32-bit architectures were dominant.

I suppose the issue is moot now on 64-bit architectures, as the difference between 2^62 and 2^63 isn't relevant. (2^62 bytes is 4,611,686 terabytes.)

[+] eesmith|1 year ago|reply
The use of:

   int high = a.length - 1;
tells us that a.length-1 is supposed to fit into an int, so there is no need to handle 2³¹ or larger.
[+] MattPalmer1086|1 year ago|reply
Not really. Arrays are limited to an int in size. You would be using more memory for the calculation and have to cast the value down to a 32 bit value to use as an array index.

Or you could just write the code so it isn't vulnerable to integer overflow.

[+] Ameo|1 year ago|reply
A cynical takeaway from this is that it's hard to write good code and it doesn't really matter if you do or not.

Most code at every company I've worked at and every project I've built is littered with technical incorrectness and buggy half-measure implementations. It's human nature, or time pressure, or something like that, but the application continues providing business value despite it.

[+] SoftTalker|1 year ago|reply
Because it's sort of like premature optimization. If your business case will never be dealing with billion-element arrays, it's a waste of time to make sure your code can handle them.
[+] 3ple_alpha|1 year ago|reply
No they're not. If you're using an array with length over a billion in Java, your code stinks already before you start using binary search.
[+] coldtea|1 year ago|reply
That's not only wrong in itself, but totally orthogonal.

A binary search implementation should still work, regardless of the array length, or have the limitation documented.

And of course an array "with length over a billion" can be totally valid, depending on the use case, your tradeoffs, available memory, etc. It could even be the optimal data structure for some use cases.

[+] poincaredisk|1 year ago|reply
I'm not a Java programmer, but how would you load a 1 GB file into memory? I assume read returns some kind of an array.

Also, big arrays being (supposedly) a code smell doesn't mean that code handling them improperly is not buggy.

[+] danvonk|1 year ago|reply
Relational databases often require searching and sorting gigabytes of data to answer queries (sometimes larger than RAM if e.g. k-way merge sort is used) so it doesn't seem that far-fetched, especially given that there are database systems written in Java.
[+] tehjoker|1 year ago|reply
not doing much scientific programming eh?
[+] sonu27|1 year ago|reply
Title needs updating with the year 2006
[+] usr1106|1 year ago|reply
I often think AI is mostly crap, wasting a lot of energy for very questionable benefits. But could/should this repetitive task of reminding submitters to follow the submission guidelines and add the year to submissions of old articles be automated?
[+] nsxwolf|1 year ago|reply
Write this in a Leetcode interview and I suspect the average interviewer will fail you and not believe your reasoning.
[+] nritchie|1 year ago|reply
Isn't the larger point of this article that errors like this can sneak in and remain dormant for years? Even if this example is old, isn't the lesson still relevant? Expecting that we are now immune to this class of errors because it is 2025 and not 2006 is hubris. Hubris rarely ends well.
[+] zahlman|1 year ago|reply
The simplest fix is to use 64-bit indices; that way you couldn't possibly allocate enough memory for a sum of valid index values to overflow (even if your array stores single-byte types, there are necessarily other things in memory besides the array).

(Also, what brabel said: https://news.ycombinator.com/item?id=42665755.)

(Or you could use a language with arbitrary-sized integers. Python surged in popularity in 2005, as I recall.)
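The 64-bit-index fix described above can be sketched by simply widening before the addition (a minimal illustration, not the JDK's actual code):

```java
public class WideMidpoint {
    // Widen to 64 bits before adding: two valid 32-bit indices sum to at
    // most about 2^32, nowhere near Long.MAX_VALUE, so the addition is exact.
    static int midpoint(int low, int high) {
        return (int) (((long) low + high) / 2);
    }

    public static void main(String[] args) {
        System.out.println(midpoint(2_000_000_000, 2_100_000_000)); // 2050000000
    }
}
```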

[+] jiggawatts|1 year ago|reply
When I first read this article over a decade ago, it made me internalise the fact that "int" types are not mathematical integers but are instead elements of a finite ring.

If you look at every algorithm through that lens, interpreting them with the actual type instead of an idealised one, then bugs like this will jump out at you. (Similarly, compiler optimizer passes and the like all need to account for this.)
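Concretely, Java's int arithmetic is arithmetic modulo 2^32, i.e. in the ring Z/2^32 Z:

```java
public class IntRing {
    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE + 1 == Integer.MIN_VALUE); // true: wraps around
        System.out.println((int) 4_294_967_296L); // 0, since 2^32 is congruent to 0 (mod 2^32)
        System.out.println(1_000_000_000 * 5);    // wraps silently, not 5000000000
    }
}
```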

[+] cvoss|1 year ago|reply
> It is not sufficient merely to prove a program correct; you have to test it too.

Well... if your proof made the (false) assumption that int is an unbounded integral type, then you didn't prove the program correct at all. What you proved was that an algorithm in some ideal universe is correct. But your program is a different beast that lives in a specific programming language.

[+] kazinator|1 year ago|reply
You can fix it by using an integer type that is the same width as the pointer type.

Then there is no way your array can be so big that high + low overflows, unless it's an array of bytes spanning more than half the entire virtual address space, which is a fiction.

(If you're binary searching some abstract space that is not an array, I retract my ignorant statements.)