Haskell : find smallest nonnegative integer not in list in linear time, using only lists

Consider a function minout :: [Int] -> Int which takes a list of distinct nonnegative integers, and returns the smallest nonnegative integer not present in the list. The behaviour of the function if the input has duplicates does not matter. Can this be implemented in linear time, using only lists (no arrays or vectors or other data structures with efficient random access)?

(This came up here.)


If l has all numbers between 0 and (length l) - 1 inclusive, then minout l is length l, otherwise, it lies in [0..(length l - 1)]. So minout l always lies in [0..(length l)], and only the elements of l which are in [0..(length l - 1)] are relevant. We can discard the remaining elements. Using this idea we can implement a linear-time divide-and-conquer solution. Unlike in merge sort, at each step of the recursion, we recurse into only one of two sublists each of which is at most half the size of the original (after doing some linear work). This gives a linear time complexity.

minout :: [Int] -> Int
minout = minoutaux 0
    minoutaux :: Int -> [Int] -> Int -- \list base -> smallest integer >= base not occuring in list
    minoutaux base [] = base
    minoutaux base [x] = base + (if x==base then 1 else 0)
    minoutaux base xs = if (length smallpart == n2) then  minoutaux (base+n2) bigpart else minoutaux base smallpart
        n = (length xs)
        n2 = n `div` 2
        smallpart = [x | x <- xs , base <= x , x < base + n2]
        bigpart = [x | x <- xs, base + n2 <= x, x < base + n]

In the above code, minoutaux is a function which given a "base" integer and a list with distinct entries, returns the smallest integer which is at least base and does not occur in the list. To do this, it discards the "irrelevant" elements which can be discarded, as explained earlier, and generates two lists, consisting of those numbers which lie in [base, base + n2) (called smallpart), and [base + n2, base + n) (called bigpart). Each of these lists will have length at most n2. If length smallpart == n2, then smallpart has all numbers in [base, base + n2), and so the answer must lie in bigpart, otherwise, there is a "gap" in smallpart itself, so the answer lies in smallpart.

Why does this run in linear time? First, the whole list of length N is traversed a few times, which needs some 10N operations, let's say. Then minoutaux is called on a smaller list, of size at most N/2. So we have (at most) 10N/2 more operations. Then 10N/4, 10N/8, and so on. Adding all these, we get a bound of 10(2N) = 20N. (the constant 10 was just used as an example)

Here we are traversing the list multiple times to compute its length, compute smallpart, compute bigpart, and so on. One could optimize that fairly easily by doing all this in a single pass. However this is still a linear time solution, and I wanted to make the code clearer, rather than optimize on constant factors.

This question and solution is not my original; I came across it in class when I learnt Haskell.

Need Your Help

How to setup GIT bare HTTP-available repository on IIS-machine

git version-control iis-7 dvcs

server already runs IIS to serve 80 and 443 port over TCP. I want to make centralized "push/pull" GIT repository available to all my team members over the Internet.

How do I get a UTF-8 string out of an MD5 digest?


I am trying to use an API that requires an MD5 hash to be sent in UTF-8 format.

About UNIX Resources Network

Original, collect and organize Developers related documents, information and materials, contains jQuery, Html, CSS, MySQL, .NET, ASP.NET, SQL, objective-c, iPhone, Ruby on Rails, C, SQL Server, Ruby, Arrays, Regex, ASP.NET MVC, WPF, XML, Ajax, DataBase, and so on.