## Archive for the ‘Binary Search’ Category

### Binary Search Tree

Friday, June 29th, 2012

Binary Search Tree by Stoimen Popov.

Nothing new but clearly explained and well illustrated, two qualities that make this post merit mentioning.

To say nothing of the related posts at the bottom of this one that cover related material in an equally effective manner.

BTW, if you do use these illustrations in slides or teaching, give credit where credit is due. It will encourage others to contribute as well.

### Binary Jumbled String Matching: Faster Indexing in Less Space

Saturday, June 16th, 2012

Binary Jumbled String Matching: Faster Indexing in Less Space by Golnaz Badkobeh, Gabriele Fici, Steve Kroon, and Zsuzsanna Lipták.

Abstract:

We introduce a new algorithm for the binary jumbled string matching problem, where the aim is to decide whether a given binary string of length n has a substring whose multiplicity equals a query vector (x, y). For example, for the string abaaababab, the query (3, 1) would return “yes”, and the query (5, 1) “no”. Previous solutions answered queries in constant time by creating an index of size O(n) in a pre-processing step. The fastest known approach to constructing this index is O(n^2/logn) [Burcsi et al., FUN 2010; Moosa and Rahman, IPL 2010] resp. O(n^2/log2 n) in the word-RAM model [Moosa and Rahman, JDA, 2012]. We propose an algorithm which creates an index for an input string s by using the string’s run-length encoding. This index can be queried in logarithmic time. Our index has worst-case size n, but extensive experimentation has consistently yielded a size which is between 0.8 and 3 times the length of the run-length encoding of s. The algorithm runs in time O(r^2 log r), where r is the number of a-runs of s, i.e., half the length of the run-length encoding of the string. This is no worse than previous solutions if r = O(n/logn) and better if r = o(n/logn)-which is the case for binary strings in many domains. Our experimentation further shows that in the vast majority of cases, the construction algorithm does not exceed the space needed for the index, and when it does, it does so only by a tiny constant.

Queries of binary strings in logarithmic time anyone?

I cite this in part because some topic mappers may be indexing binary strings but also as encouragement to consider the properties of your data carefully.

You may discover it has properties that lend themselves to very efficient processing.

### We Need To Talk About Binary Search

Wednesday, February 8th, 2012

From the post:

Although the basic idea of binary search is comparatively straightforward, the details can be surprisingly tricky.

– Donald Knuth

Why is binary search so damn hard to get right? Why is it that 90% of programmers are unable to code up a binary search on the spot, even though it’s easily the most intuitive of the standard algorithms?

• Firstly, binary search has a lot of potential for off-by-one errors. Do you do inclusive bounds or exclusive bounds? What’s your break condition: lo=hi+1, lo=hi, or lo=hi-1? Is the midpoint (lo+hi)/2 or (lo+hi)/2 - 1 or (lo+hi)/2 + 1? And what about the comparison, < or ? Certain combinations of these work, but it’s easy to pick one that doesn’t.
• Secondly, there are actually two variants of binary search: a lower-bound search and an upper-bound search. Bugs are often caused by a careless programmer accidentally applying a lower-bound search when an upper-bound search was required, or vice versa.
• Finally, binary search is very easy to underestimate and very hard to debug. You’ll get it working on one case, but when you increase the array size by 1 it’ll stop working; you’ll then fix it for this case, but now it won’t work in the original case!

I want to generalise and nail down the binary search, with the goal of introducing a shift in the way the you perceive it. By the end of this post you should be able to code any variant of binary search without hesitation and with complete confidence. But first, back to the start: here is the binary search you were probably taught…

In order to return a result to a user or for use in some other process, you have to find it first. This post may help you do just that, reliably.