Haskell : find smallest nonnegative integer not in list in linear time, using only lists
Consider a function minout :: [Int] -> Int which takes a list of distinct nonnegative integers, and returns the smallest nonnegative integer not present in the list. The behaviour of the function if the input has duplicates does not matter. Can this be implemented in linear time, using only lists (no arrays or vectors or other data structures with efficient random access)?
(This came up here.)
If l has all numbers between 0 and (length l) - 1 inclusive, then minout l is length l, otherwise, it lies in [0..(length l - 1)]. So minout l always lies in [0..(length l)], and only the elements of l which are in [0..(length l - 1)] are relevant. We can discard the remaining elements. Using this idea we can implement a linear-time divide-and-conquer solution. Unlike in merge sort, at each step of the recursion, we recurse into only one of two sublists each of which is at most half the size of the original (after doing some linear work). This gives a linear time complexity.
minout :: [Int] -> Int minout = minoutaux 0 where minoutaux :: Int -> [Int] -> Int -- \list base -> smallest integer >= base not occuring in list minoutaux base  = base minoutaux base [x] = base + (if x==base then 1 else 0) minoutaux base xs = if (length smallpart == n2) then minoutaux (base+n2) bigpart else minoutaux base smallpart where n = (length xs) n2 = n `div` 2 smallpart = [x | x <- xs , base <= x , x < base + n2] bigpart = [x | x <- xs, base + n2 <= x, x < base + n]
In the above code, minoutaux is a function which given a "base" integer and a list with distinct entries, returns the smallest integer which is at least base and does not occur in the list. To do this, it discards the "irrelevant" elements which can be discarded, as explained earlier, and generates two lists, consisting of those numbers which lie in [base, base + n2) (called smallpart), and [base + n2, base + n) (called bigpart). Each of these lists will have length at most n2. If length smallpart == n2, then smallpart has all numbers in [base, base + n2), and so the answer must lie in bigpart, otherwise, there is a "gap" in smallpart itself, so the answer lies in smallpart.
Why does this run in linear time? First, the whole list of length N is traversed a few times, which needs some 10N operations, let's say. Then minoutaux is called on a smaller list, of size at most N/2. So we have (at most) 10N/2 more operations. Then 10N/4, 10N/8, and so on. Adding all these, we get a bound of 10(2N) = 20N. (the constant 10 was just used as an example)
Here we are traversing the list multiple times to compute its length, compute smallpart, compute bigpart, and so on. One could optimize that fairly easily by doing all this in a single pass. However this is still a linear time solution, and I wanted to make the code clearer, rather than optimize on constant factors.
This question and solution is not my original; I came across it in class when I learnt Haskell.