Hopefully you'll forgive me for posting this later than I said I would - I was on a plane going to Hong Kong when I finished writing it, and just now had an opportunity to post it for your viewing pleasure.
Last week I explained what binary searching is and how to do it. However, checking for the existence of an element in a sorted array is just one (particularly easy to explain) way to use this technique.
Any time you have a problem where you could guess an answer to a question and dependably know if it's less than or greater than the desired answer, you can apply binary searching. Also, sometimes binary searching can be used to approximate values that can't easily be calculated exactly.
nth Roots
Let's take the example of finding the nth root of some number. Normally you can just call a library function to do it, but minus those tools, you could use binary searching to estimate the answer. Let's say we want to have a function that looks like:
public double nthRoot(int n, double x)
To make things easier, we'll assume that n is small and x is positive. The first step is to come up with bounds for where the answer could be. If x is positive, the nth root is also positive (sure, if n is even, it also has a negative nth root, but if someone wants that, they can negate it themselves!). That means we can start with a minimum of 0. Intuitively, we could say that the nth root will always be less than x, but that's not the case if x is less than 1. So, we can start our maximum bound as either x or 1, whichever is greater.
The next step is to ask ourselves how accurate we want to be. The more times we iterate on our binary search, the lower our margin of error will be (for each iteration, our potential margin of error is cut in half). If we iterate a flat 100 times, we'll be able to accurately find about 16 decimal digits for the root of a 10-digit number. That's probably sufficient for most of our needs, and also isn't too far off from the accuracy we can store in a double.
My code looks like this:
public double nthRoot(int n, double x)
{
double min = 0;
double max = Math.max(1, x);
for (int i=0; i<100; i++)
{
double mid = (min+max)/2;
double midpow = 1;
for (int j=0; j<n; j++)
midpow *= mid;
if (midpow > x)
max = mid;
else
min = mid;
}
return (min+max)/2;
}
If we expected n to be larger, we could change the inner "j" loop to do something more efficient, but this is better for clarity (and most people don't care what the 2000th root of some number is).
AutoLoan from TopCoder SRM 258, Division 1 250/Division 2 500
The problem here is to figure out the Annual Percentage Rate of interest on a car if you know what the monthly payment is, the number of months it takes to pay off the loan, and the price of the car. The interest is compounded monthly (so the amount of interest charged each month is 1/12 of the APR). The input constraints on this problem are also important to consider:
- The price of the car is between 1 and 1000000.
- The monthly payment is between 0 and price/2.
- The number of payments is between 1 and 600.
- The answer is between 0 and 100.
- We need to find the answer to with a margin of error of 10-9
Well, the first step is done for us - we have a minimum and maximum bound. We also know what accuracy we need (but to be sure we don't get surprised, we should go a little further). The basic algorithm we follow is just like the nth power code above, except for the contents above. If we think of a continuous binary search in this way:
max := some value
min := some value
while max-min > threshold
mid := (max+min)/2
find out if mid is too low or too high
if mid is too low then
min = mid
else
max = mid
The only thing left for us to figure out is deciding if mid is too high or too low. This can be done with a simple simulation - take the price, multiply by 1+mid/100/12 and then subtract the payment amount repeatedly for the number of payments, and if the final balance is less than 0, the interest rate was too low (otherwise it was too high).
When many people read this problem, it looks like an obvious Dynamic Programming exercise. Now, I won't tell you that it couldn't be solved with DP, but the way I found to solve it was with a binary search.
The problem is that you have N stalls at unique locations to put C cows (C is between 2 and N, inclusive), and you want to maximize the minimum distance. The locations of the stalls are on a single line, and you need to return the largest minimum distance between cows.
Since I already tipped you off that this can be solved with binary searching, you might be wondering what exactly you are binary searching on. Well, the positions of the stalls are between 0 and 1 billion, so we will never have an answer better than 1 billion, and we can't have them any closer than 1 if the locations of the stalls are unique. So there are some bounds. You could also come up with a better maximum bound - if you find the minimum and maximum actual values of stalls, you could divide the difference by C-1 and get a smaller bound, but in the end, it doesn't matter much.
Now all we need is to be able to decide, given a guess at a minimum distance, if it's too high or too low. If we sort the stalls, we can make a linear pass on the stalls and put each cow in the first one that isn't closer than mid from the previous. We can prove that if an arrangement exists where all the cows are spread out by at least mid, they can be assigned this way, because we can never come up with a better situation by skipping one that isn't blocked. Our goal is to find the largest mid for which an assignment exists.
Since we can figure out if a distance is possible, the only other piece of information we need to show that binary searching works for this problem is that there is some distance where all greater distances are impossible and all smaller distances have valid assignments. That way, given a guess, if there is a valid assignment for a given distance, it is less than (or equal to) the final answer. Otherwise, the guess is too high. This is easy to prove, because if a valid assignment exists for a distance d, that same assignment is a valid answer ford-1, so there can be no "islands" of valid distances.
DNAMultiMatcher from TopCoder SRM 187, Division 1 900
I'm almost positive the writer of this problem didn't intend for it to be solved with binary searching, but the fact that this difficult problem has so many valid solutions is what makes it so cool. I think the intended answer was to build a suffix tree (which is possibly improved by the fact that there are only 4 valid letters), but after reading the editorial on the match for which this problem was used, the solution that made the most sense to me was the one that used binary searching.
The problem here is to find the longest common substring between 3 long strings (each string is up to 2500 characters). The obvious solution is to check each substring of 2 strings by length (starting with the longest possible length and decreasing) and if we find a common one, look to see if it exists in the third. As with most obvious solutions, this one could take a good hour to finish on strings of this length.
The binary searching solution is to start by guessing the answer (as we like to do with binary searching), and then finding out if there is a substring of that length that is shared in common by all 3 strings.
In order for this to run under TopCoder's time limit, we need to be able to figure out if such a common substring exists in O(n2) time or less (so the runtime of the whole algorithm is O(n2log(n))). Even this part of the algorithm isn't obvious. The way we can do it is to create a hash-set of substrings of length mid for each of the strings - this can be done in O(n2) time - and then find the set intersection of the three sets (each intersection can be done in O(n) time). If you don't have good libraries to help you, the implementation is non-trivial, but the concept is valid. The Java standard library, STL and .NET Framework all have structures to efficiently do this, though, and those are the languages used on TopCoder. Using a tree-set could also work, but the runtime bound is raised slightly to O(n2log2n).
Summary
A problem may be able to be solved with binary searching if:
- You can set a finite bound on the range of possible answers
- Given a prospective answer, you can correctly identify if that answer is larger or smaller than the correct answer
As an exercise for the reader, a good (rather advanced) problem to try with binary searching is NegativePhotoresist on TopCoder (SRM 210, Division 1 1000). This problem has sentimental value to me, because it was the first division 1 hard problem I solved correctly during a competition.
One last note - a friend pointed out on last week's post that the arithmetic for finding mid ((min+max)/2) could overflow for large numbers. If the numbers you are using are particularly large, you might consider using min+(max-min)/2 instead. The problems discussed above are written so that the applicable range is small enough that this isn't necessary (the constraints on the Aggressive Cows problem were written so they wouldn't quite overflow a signed 32-bit integer if the arithmetic from last week was used).