'Proper' collection to use to obtain items in O(1) time in C# .NET?

Something I do often if I'm storing a bunch of string values and I want to be able to find them in O(1) time later is:

foreach (String value in someStringCollection)
{
    someDictionary.Add(value, String.Empty);
}

This way, I can comfortably perform constant-time lookups on these string values later on, such as:

if (someDictionary.containsKey(someKey))
{
    // etc
}

However, I feel like I'm cheating by making the value String.Empty. Is there a more appropriate .NET Collection I should be using?

Answers


If you're using .Net 3.5, try HashSet. If you're not using .Net 3.5, try C5. Otherwise your current method is ok (bool as @leppie suggests is better, or not as @JonSkeet suggests, dun dun dun!).

HashSet<string> stringSet = new HashSet<string>(someStringCollection);

if (stringSet.Contains(someString))
{
    ...
}

You can use HashSet<T> in .NET 3.5, else I would just stick to you current method (actually I would prefer Dictionary<string,bool> but one does not always have that luxury).


something you might want to add is an initial size to your hash. I'm not sure if C# is implemented differently than Java, but it usually has some default size, and if you add more than that, it extends the set. However a properly sized hash is important for achieving as close to O(1) as possible. The goal is to get exactly 1 entry in each bucket, without making it really huge. If you do some searching, I know there is a suggested ratio for sizing the hash table, assuming you know beforehand how many elements you will be adding. For example, something like "the hash should be sized at 1.8x the number of elements to be added" (not the real ratio, just an example).

From Wikipedia:

With a good hash function, a hash table can typically contain about 70%–80% as many elements as it does table slots and still perform well. Depending on the collision resolution mechanism, performance can begin to suffer either gradually or dramatically as more elements are added. To deal with this, when the load factor exceeds some threshold, it is necessary to allocate a new, larger table, and add all the contents of the original table to this new table. In Java's HashMap class, for example, the default load factor threshold is 0.75.


I should probably make this a question, because I see the problem so often. What makes you think that dictionaries are O(1)? Technically, the only thing likely to be something like O(1) is access into a standard integer-indexed fixed-bound array using an integer index value (there being no look-up in arrays implemented that way).

The presumption that if it looks like an array reference it is O(1) when the "index" is a value that must be looked up somehow, however behind the scenes, means that it is not likely an O(1) scheme unless you are lucky to obtain a hash function with data that has no collisions (and probably a lot of wasted cells).

I see these questions and I even see answers that claim O(1) [not on this particular question, but I do seem them around], with no justification or explanation of what is required to make sure O(1) is actually achieved.

Hmm, I guess this is a decent question. I will do that after I post this remark here.


Need Your Help

Infinite 100% CPU usage at java.io.FileInputStream.readBytes(Native Method)

java jvm cpu-usage infinite-loop

I'm right now debugging a program which has two threads per one external process, and those two threads keep on reading Process.getErrorStream() and Process.getInputStream() using a while ((i = in....

How to disable the ListItems without disabling the CheckBoxes embedded on them?

wpf xaml checkbox combobox listitem

I'm working on a ComboBox that has to have a CheckBox on its items. I've implemented a functionality to attend changes on those checkboxes. Part of that functionality establishes the content of the