Beautiful Code: That Pesky Binary Search

August 02, 2007 at 06:19 PM | Uncategorized | View Comments

Beautiful Code: That Pesky Binary Search

I was on the subway yesterday, reading Beautiful Code. In Chapter 7, Alberto Savoia is talking about "Beautiful Tests". The code he decides to use to demonstrate beautiful testing is binary search. As he explains:

Binary search is a great example because it's so simple and yet it's so easy to implement incorrectly. In Programming Pearly, Jon Bently shares how, over the years, he asked hundreds of programmers to implement binary search after providing them with a description of the basic algorithm. He gave them a very generous two hours to write it, and even allowed them to use the high-level language of their choice (including pseudocode). Surprisingly, only about 10 percent of professional programmers implemented binary search correctly. (Beautiful Code, p. 87)

Well, I realize that there are some "professional" programmers out there who utterly incompetent... But binary search? Two hours? Pseudo code? I found that very hard to believe. So when he went on to quote Jon Bently again:

Most programmers think that with the above description [of binary search] in hand, writing the code is easy. They are wrong. The only way to believe this is by putting down the column right now and writing the code yourself. Try it. (Programming Pearls)

I didn't have any options other than putting down the book, pulling my laptop out and writing a binary search function. Two TTC stops (about 10 minutes) later, I had a tested solution.

But that's not quite fair. Chapter 4, which I read about a week ago, has binary search code very similar to the code I produced. I'm also fresh out of first-year computer science, where they teach you all sorts of boring wonderful theory.

Well, I work with a bunch of programmers... So why not see what they can do?

Well, out of the four people who wrote a binary search function, the three who vaguely remembered the algorithm were able to code it in less than 20 minutes (including testing). The fourth, who had never heard of the search before, took a little bit longer -- 40 minutes or so -- but still finished with a correct implementation.

But, wait. According to the next page of the book, none of us are correct: every one of us used this line: mid = (low + high) / 2. That could cause an integer overflow if low and high are both greater than 230. Just how big is that? Well, an array with 230 32 bit integers would need 4 gigabytes of memory... Have you ever worked with a single, 4 gigabyte array? I know I haven't. Heck, I'd be willing to bet that 90% of programmers have never dealt with that much data (although Sun had this exact problem with Java)... So I don't think programmers were being penalized for this particular bug.

So what WERE those programmers studied by Jon Bentley doing? Well, after reading the first page of the original article, the numbers start to make a bit more sense:

The professional programmers had one hour (sometimes more) to convert the above description [of a binary search] into a program in the language of their choice; a high-level pseudocode was fine. At the end of the specified time, almost all the programmers reported that they had correct code for the task. We would then take 30 minutes to examine their code, which the programmers did which test cases. In many different classes and with over a hundred programmers, the results varied little: 90 percent of the programmers found bugs in their code (and I wasn't always convinced of the correctness of the code in which no bugs were found).

I don't know about you, but this gives quite a different impression than the description Alberto gave. This makes it seem like 90% of programmers had bugs in their untested code, and the other 10% simply weren't able to find the bugs that were, in fact, there. Now, there is a statistic I would believe.

It's just unfortunate that Alberto did not make that clear... Because the rest of the chapter, where tests are written for a binary search implementation, is about as interesting as testing can be. I just had a hard time appreciating it because I was stuck on the thought that I was better than 90% of professional programmers... Which, thank heaven, I am not.