The problem with this article's title is that we're in an era where actually checking the claim that "nearly all sorts are broken" against all publicly available implementations of binary search would be almost feasible, but that's not actually what the article is claiming.
> Moreover, to be really certain that a program is correct, you have to test it for all possible input values, but this is seldom feasible.
This is incorrect. Generally it's just a little excessive to try to solve the halting problem in a library's unit tests ;). You don't have to test a program for all possible inputs, only for all materially unique state transitions. In a binary search, each variable can be zero, one, some(*), max, overflowed. The combination of these is not infinite, as the "some" values that cause different behavior become much more manageable when grouped into chunks with the same behavior.
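For illustration, a minimal sketch of what covering those state classes might look like against Java's built-in Arrays.binarySearch (the class name and boundary values here are my own choices, not from the article):

```java
import java.util.Arrays;

public class BoundaryCases {
    public static void main(String[] args) {
        // "zero" class: empty array, target absent -> -(insertionPoint) - 1 = -1
        assert Arrays.binarySearch(new int[] {}, 5) == -1;
        // "one" class: single element, found at index 0
        assert Arrays.binarySearch(new int[] {5}, 5) == 0;
        // "some" class: one representative mid-sized input is enough
        assert Arrays.binarySearch(new int[] {1, 3, 5, 7, 9}, 5) == 2;
        assert Arrays.binarySearch(new int[] {1, 3, 5, 7, 9}, 4) == -3;
        // The "max"/"overflowed" classes need arrays longer than 2^30 elements,
        // which is exactly the case in which the article's bug went undetected.
        System.out.println("boundary cases ok");
    }
}
```

Run with `java -ea` so the assert statements are actually checked.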
It was certainly possible to run the algorithm on a 16 GB array before the moment the bug first surfaced, but did the original developer have that sort of space on their desktop at the time? Possibly not.
If a unit test only runs on a server and not on the developers' laptops, then it's not going to be written. Ideally someone would write a test that is tagged to run only on the server, but that's a lot of extra work if it isn't already a common practice on the project. Even now I would be quite wary of producing a max-size-input test for an array of ints, and especially of objects; that is going to raise some questions for being a slow and highly resource-consuming test.
If I were worried about the same thing in aerospace programming, however, there is no question that test would get written, and we would ensure it ran on the hardware it was designed to run on. In typical business software it's less common to run potentially very long-running tests and simulations of the machine's states; everyone wants to go faster than that, especially for the sake of a max-input test.
> In a binary search, each variable can be zero, one, some(*), max, overflowed. The combination of these is not infinite as the values of some that cause different behavior is much more reasonable when grouped into chunks of the same behavior.
You are presuming that the algorithm is correct in the first place. Contrived example.
> each variable can be zero, one, some(*), max, overflowed
These are the value ranges that I test in my unit tests, plus values at and just before some powers of 2 (e.g. 65535 and 65536), and -1. That is because -1 is uniquely used in some systems to indicate an error, and thus trips some bugs.
This bug, and many others, can be detected with a trivial amount of formal verification. I really think formal verification should see much wider use for things that see as much use as standard libraries, even if it's just for the trivial things like overflow and out-of-bounds access.
In the code below we can see a formal verification tool (GNATProve) detect both the original error and the out-of-bounds access that it causes. Doing the arithmetic in a larger type clears both reported errors without the need for any additional annotations for GNATProve.
function Search (A : A_Type; Target : Integer) return Integer is
   Left  : Integer := A'First;
   Right : Integer := A'Last;
begin
   while Left <= Right loop
      declare
         Mid : Integer := (Left + Right) / 2;
      begin
         if A (Mid) = Target then
            return Mid;
         elsif A (Mid) < Target then
            Left := Mid + 1;
         elsif A (Mid) > Target then
            Right := Mid - 1;
         end if;
      end;
   end loop;
end Search;
GNATProve output:
Phase 1 of 2: generation of Global contracts ...
Phase 2 of 2: flow analysis and proof ...
wrapper.adb:12:36: medium: overflow check might fail, cannot prove lower bound for Left + Right
12 | Mid : Integer := (Left + Right) / 2;
| ~~~~~~^~~~~~~~
reason for check: result of addition must fit in a 32-bits machine integer
wrapper.adb:12:45: info: division check proved
wrapper.adb:14:19: medium: array index check might fail
14 | if A (Mid) = Target then
| ^~~
reason for check: value must be a valid index into the array
These kinds of problems are present in very many standard treatments of algorithms, and in some cases they don't survive their first contact with real-world use.
"needs a carry bit or a wider type" is common for arithmetic operations that actually use the range.
It makes the case that we need a Math.mean or Math.avg function that we can use in these cases rather than reinventing it.
Basically we should favor using built-in functions as much as possible, because those should be more reliable and better tested than ad hoc code we write. And compilers should optimize those built-in functions well, so there is no extra cost in using them.
Another fun case, besides integer overflow, is negative values: in Java and C/C++ (i + j)/2 will round towards j, but i + (j - i)/2 will round towards i instead. Sometimes the difference matters.
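A small sketch of that difference (the method names here are invented for illustration). Java's integer division truncates toward zero, which is what makes the two formulas disagree once the sum goes negative:

```java
public class MidpointRounding {
    // (i + j) / 2 truncates the sum toward zero; with negative values
    // that pulls the result toward j.
    static int midTowardJ(int i, int j) { return (i + j) / 2; }

    // i + (j - i) / 2 truncates the offset toward zero, so the result
    // stays pulled toward i.
    static int midTowardI(int i, int j) { return i + (j - i) / 2; }

    public static void main(String[] args) {
        // True midpoint of -7 and 0 is -3.5; the formulas round differently.
        assert midTowardJ(-7, 0) == -3;
        assert midTowardI(-7, 0) == -4;
        System.out.println("rounding demo ok");
    }
}
```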
Sometimes they’re engineered and tested for a certain scale.
More often they’re engineered and tested for an arbitrary scale. The limits aren’t considered, behavior at the edges isn’t accounted for, and it’s assumed it will be good enough for real world inputs.
The use of `int` tends to be a dead giveaway. There are some cases where it’s clearly correct: where the spec says so (like argv), where you’re starting from a smaller type and it’s impossible for the calculations to overflow in an int (like adding two uint8), that sort of thing. And there are cases where it’s subtly correct, because you know the range of the value is sufficiently limited, either by mechanics or by spec.
But most of the time, int gets chosen because it’s the apparent default and it’s easy to type. No analysis has been done to see if it’s correct or if you want to declare your code to only support inputs in a certain range.
It’s really clear if you’ve written a binary search (or anything else that works on general arrays) in C and you use int as the index type. There’s pretty much no scenario where that makes sense. In theory you could analyze the entire program and prove that over-large arrays are never passed in, and keep doing it to make sure it stays that way, but that’s not realistic. If the programmer actually took one second to think about the appropriate data types, they’d use size_t rather than int.
You can still have this bug with size_t, of course. But it won’t be “this falls apart with arrays over 1G elements on 64-bit systems that can easily handle them.” If you declare that you wrote the obvious midpoint calculation with size_t because you didn’t intend to support byte arrays larger than half the address space, it’s at least plausible.
The problem is that it's not that much harder to make it work for all the valid inputs.
Not doing that is not good enough.
Another example is summing lots of floats naively instead of using Kahan's algorithm.
It's like having a theory of gravity that doesn't work on Mars because of an unnecessary division by (m - Mars's mass) in our laws :) It wouldn't be good physics.
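As a sketch of the float-summation point above (the class name and input are my own; Kahan's algorithm itself is standard):

```java
public class KahanDemo {
    // Naive summation: each tiny addend is rounded away against the
    // large running total.
    static double naiveSum(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum;
    }

    // Kahan summation: carry the rounding error of each addition in a
    // separate compensation term c.
    static double kahanSum(double[] xs) {
        double sum = 0.0, c = 0.0;
        for (double x : xs) {
            double y = x - c;        // apply the previous correction
            double t = sum + y;      // low-order bits of y may be lost here
            c = (t - sum) - y;       // recover exactly what was lost
            sum = t;
        }
        return sum;
    }

    public static void main(String[] args) {
        // A million addends of 1e-16 after an initial 1.0: true sum is 1.0 + 1e-10.
        double[] xs = new double[1_000_001];
        xs[0] = 1.0;
        for (int i = 1; i < xs.length; i++) xs[i] = 1e-16;

        assert naiveSum(xs) == 1.0;                            // every addend rounded away
        assert Math.abs(kahanSum(xs) - (1.0 + 1e-10)) < 1e-12; // error stays tiny
        System.out.println("kahan demo ok");
    }
}
```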
Or just put the algorithm on a 16-bit microcontroller, add a table that needs to be looked up (think of a precomputed sine table), place that table near the end of the memory range, and make the mistake of calling binary search with the start and end memory positions of the table.
These implementations are definitely broken when the specification goes like "you just pass in your array here and it will perform binary search on it for your value." Yes, you could constrain this spec, but come on...
It's such a blatant example of a bug that I struggle to believe anyone can even remotely conceive of, and pivot to, the idea that "nuh-uh, this is a bad spec, not a bad implementation!".
> In Programming Pearls Bentley says that the analogous line "sets m to the average of l and u, truncated down to the nearest integer." On the face of it, this assertion might appear correct, but it fails for large values of the int variables low and high. Specifically, it fails if the sum of low and high is greater than the maximum positive int value (2^31 - 1). The sum overflows to a negative value, and the value stays negative when divided by two. In C this causes an array index out of bounds with unpredictable results. In Java, it throws ArrayIndexOutOfBoundsException.
At some point we have to draw an arbitrary line. Even an "arbitrary precision" calculation is bounded by system memory.
"bug" is not well-defined, or perhaps "bug" is more of a continuous label as opposed to discrete.
Doing any math with bounded integers considered harmful.
At some point during my undergrad I realized this and tried to be really careful when implementing algorithms, but it's stupidly hard to do in a tidy way, at least in old C. It's not practical, and people would rather just close their eyes and live in blissful avoidance.
So previously it worked for arrays up to length 2³⁰ and now it works for arrays up to length 2³¹. Is it really that much of a difference? Wouldn’t it be better to change the indexing type to a 64-bit integer?
The difference is that we know for a fact that the proper implementation works for integers up to 2^31, whereas the improper implementation deceived us into thinking that the code would work in situations where the code actually doesn't work.
I find it valuable to understand when my code will crash.
Not really. Arrays are limited to an int in size. You would be using more memory for the calculation and have to cast the value down to a 32 bit value to use as an array index.
Or you could just write the code so it isn't vulnerable to integer overflow.
A cynical takeaway from this is that it's hard to write good code and it doesn't really matter if you do or not.
Most code at every company I've worked at and project I've built is littered with technical incorrectness and buggy half-measure implementations. It's human nature or time pressure or something like that, but the application continues providing business value despite that.
Because it's sort of like premature optimization. If your business case will never be dealing with billion-element arrays, it's a waste of time to make sure your code can handle them.
That's not only wrong in itself, but totally orthogonal.
A binary search implementation should still work, regardless of the array length, or have the limitation documented.
And of course an array "with length over a billion" can be totally valid, depending on the use case, your tradeoffs, available memory, etc. It could even be the optimal data structure for some use cases.
Relational databases often require searching and sorting gigabytes of data to answer queries (sometimes larger than RAM if e.g. k-way merge sort is used) so it doesn't seem that far-fetched, especially given that there are database systems written in Java.
I often think AI is mostly crap, wasting a lot of energy for very questionable benefits. But could/should this repetitive task of reminding submitters to follow the submission guidelines and add the year to submissions of old articles be automated?
Isn't the larger point of this article that errors like this can sneak in and remain dormant for years? Even if this example is old, isn't the lesson still relevant? Expecting that we are now immune from this class of errors because it is now 2025 and not 2006 is hubris. Hubris rarely ends well.
The simplest fix is to use 64-bit indices; that way you couldn't possibly allocate enough memory for a sum of valid index values to overflow (even if your array stores single-byte types, there are necessarily other things in memory besides the array).
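In Java, where array indices are ints but the intermediate arithmetic can be widened, the same idea can be sketched like this (the helper name is made up):

```java
public class WideMidpoint {
    // Widen to 64 bits before adding: the sum of two valid int indices
    // always fits in a long, and the midpoint of two valid indices is
    // itself a valid index, so narrowing back to int is safe.
    static int mid(int low, int high) {
        return (int) (((long) low + high) / 2);
    }

    public static void main(String[] args) {
        assert mid(0, 10) == 5;
        assert mid(2_000_000_000, 2_100_000_000) == 2_050_000_000; // no wraparound
        System.out.println("wide midpoint ok");
    }
}
```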
This article when I first read it over a decade ago made me internalise the fact that “int” types are not mathematical integers but are instead rings.
If you look at every algorithm through that lens, interpreting them with the actual type instead of an idealised one, then bugs like this will jump out at you. (Similarly, compiler optimizer passes and the like all need to account for this.)
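That lens is easy to demonstrate in Java, where int arithmetic is arithmetic modulo 2^32 rather than over the mathematical integers:

```java
public class RingDemo {
    public static void main(String[] args) {
        // Addition wraps around instead of growing without bound.
        assert Integer.MAX_VALUE + 1 == Integer.MIN_VALUE;
        // Which is exactly why (low + high) / 2 can go negative in a binary search:
        // 2 * (2^31 - 1) wraps to -2, and -2 / 2 is -1.
        assert (Integer.MAX_VALUE + Integer.MAX_VALUE) / 2 == -1;
        System.out.println("ring demo ok");
    }
}
```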
> It is not sufficient merely to prove a program correct; you have to test it too.
Well... If your proof made the (false) assumption that int is an unbounded integral type, then you didn't prove the program is correct at all. What you proved was that an algorithm in some ideal universe is correct. But your program is a different beast that lives in a specific programming language.
How you can fix it is by using an integer type that is the same width as the pointer type.
Then there is no way your array can be so big that high + low overflows, unless it's an array of bytes spanning more than half the entire 64-bit virtual space, which is a fiction.
(If you're binary searching some abstract space that is not an array, I retract my ignorant statements.)
jsnell | 1 year ago:
Past discussions:
https://news.ycombinator.com/item?id=33492122
https://news.ycombinator.com/item?id=14906429
https://news.ycombinator.com/item?id=12147703
https://news.ycombinator.com/item?id=9857392
https://news.ycombinator.com/item?id=6799336
https://news.ycombinator.com/item?id=1130463
https://news.ycombinator.com/item?id=621557
motorest | 1 year ago:
At the very least, this should feature in the title. In fact, previous submissions from over a decade ago also featured the year in the title.
Otherwise it conveys the idea that this is some major epiphany.
rileymat2 | 1 year ago:
So you also need to prove that explode_world is not called, or that something similar but less contrived isn't in there.
heinrichhartman | 1 year ago:
I would really challenge calling these kinds of effects "bugs" or "breakage".
It's like calling Newton's law of gravity broken because it's not accurate at predicting how galaxies move.
Things are engineered and tested for a certain scale. Knowing which tools to use at which scale is part of the craft of engineering.
alanfranz | 1 year ago:
Make it explicit. If the array is too large, throw an IllegalArgumentException. Document the limit. Then I agree with you.
Otherwise, if an allowed input crashes the program at runtime with a random exception, I respectfully disagree.
foldr | 1 year ago:
I suppose the issue is moot now on 64-bit architectures, as the difference between 2^62 and 2^63 isn't relevant. (2^62 bytes is 4,611,686 terabytes.)
poincaredisk | 1 year ago:
Also, big arrays being (supposedly) a code smell doesn't mean that code handling them improperly is not buggy.
zahlman | 1 year ago:
(Also, what brabel said: https://news.ycombinator.com/item?id=42665755.)
(Or you could use a language with arbitrary-sized integers. Python surged in popularity in 2005, as I recall.)
secondcoming | 1 year ago:
Interestingly, calculating the midpoint of doubles seems quite complex (according to gcc at least) [1]
[0] https://en.cppreference.com/w/cpp/numeric/midpoint
[1] https://godbolt.org/z/G1P611o7Y
kunley | 1 year ago:
Here is a relevant line from Go, even with a comment that it's about overflow:
https://github.com/golang/go/blob/19e9231/src/sort/search.go...