SQL Server Hash Indexes
When using a CHECKSUM column to artificially create a hash index, is the lookup actually O(1), or is it still O(lg n) as it is for a clustered index? I have a table from which I will select based on its ID column and I need the lookup to be as fast as possible, so is the clustered index the fastest possible option? I am looking for something that will provide O(1) performance.
Okay, two points.

First, the SQL CHECKSUM function is not a good hash function. It calculates a CRC-style checksum, which is a poor candidate to base a hash check on because there will be a relatively large number of collisions. You should look at the HASHBYTES function if you want a proper hash function.

Secondly, you are not actually creating a hash index. You are creating a normal b-tree on a hash value, so the lookup time will be exactly the same as for any other b-tree index on a similarly sized data type, that is, O(lg n).

There is a chance that you could gain a little performance by using a CRC or hash of a long varchar value, since that allows comparisons of a smaller number of bytes. However, string comparison only checks as many bytes as it needs to, which is as far as the first character that doesn't match, and if you do match on the hashed value, you then need to double-check the actual value anyway. So unless you have a lot of very similar strings, you will probably end up comparing MORE bytes by using the hash (or CRC).

In short, I don't think this is a sensible plan, but as with all optimisations you should test it in your specific case and then decide. I would be interested to see your results if you would care to post them.

And I don't believe that there is any faster way to locate a row in SQL Server than by using a clustered index. In case you care, Ingres (by CA) can create hash indexes, which would then achieve O(1). There may be other RDBMSs out there that also support true hash indexes.
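To make the second point concrete, here is a rough Python sketch (an analogy only, not SQL Server code or internals). A sorted list of (hash, key) pairs searched by binary search stands in for a b-tree built on a hash column, and a Python dict stands in for a true hash index. The function names, the use of zlib.crc32 as the checksum, and the "key-N" test keys are all my own assumptions for illustration.

```python
import zlib


def build_sorted_index(keys):
    """Sorted (hash, key) pairs: a stand-in for a b-tree on a hash column."""
    return sorted((zlib.crc32(k.encode("utf-8")), k) for k in keys)


def sorted_lookup(index, key):
    """Binary-search the hash, then verify the real key.

    Returns (found, probes), where probes counts the index entries inspected.
    """
    target = zlib.crc32(key.encode("utf-8"))
    lo, hi = 0, len(index)
    probes = 0
    # Lower-bound binary search on the hash value: about log2(n) steps.
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if index[mid][0] < target:
            lo = mid + 1
        else:
            hi = mid
    # Hashes can collide, so the actual key still has to be checked.
    while lo < len(index) and index[lo][0] == target:
        probes += 1
        if index[lo][1] == key:
            return True, probes
        lo += 1
    return False, probes


def build_hash_index(keys):
    """A dict is a real hash table: expected O(1) lookup."""
    return {k: True for k in keys}


if __name__ == "__main__":
    keys = [f"key-{i}" for i in range(1024)]
    sorted_index = build_sorted_index(keys)
    hash_index = build_hash_index(keys)

    found, probes = sorted_lookup(sorted_index, "key-500")
    print("sorted index on hash:", found, "entries inspected:", probes)
    print("true hash table:", "key-500" in hash_index)
```

The number of entries inspected in the sorted index grows with log2 of the row count no matter what kind of value is stored in it, which is why putting a hash value in an ordinary index does not buy O(1) lookups.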