MySQL efficient "select id else insert" query

I have a MySQL table consisting of:

CREATE TABLE `url_list` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `crc32` int(10) unsigned NOT NULL,
  `url` varchar(512) NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `crc32` (`crc32`)
);

When inserting data into a related table I need to lookup the primary key from this table, and using the crc32 really speeds that up whilst allowing a small index. The URLs do need to be unique, but I'd like to avoid having more index than actual data.

If the value isn't present I need to insert it, but using structures such as INSERT IGNORE, or ON DUPLICATE KEY either requires me to put a unique on the huge varchar, or don't take advantage of my index.

How can I "SELECT id else INSERT", whilst preserving the lookup speed for the 80-90% of hits that are already in the table?

Answers


I would recommend ditching the id column and the crc32 because they're not necessary.

You can use an MD5() hash to provide a fixed-length, virtually unique value computed from the lengthy URL data, and then use that hash as the primary key.

CREATE TABLE `url_list` (
  `url_hash` BINARY(16) NOT NULL PRIMARY KEY
  `url`      VARCHAR(512) NOT NULL
);

DELIM !!
CREATE TRIGGER `url_ins` BEFORE INSERT ON `url_list`
FOR EACH ROW
BEGIN
  SET NEW.`url_hash` = UNHEX( MD5( NEW.`url` ) );
END!!

Then you can use INSERT..ON DUPLICATE KEY UPDATE because unlike crc32, the hash should have a very low chance of collision.

edit: See http://en.wikipedia.org/wiki/Birthday_attack. If you log 1 million distinct URL's per day for 2,000 years, the MD5 hashes of these URL's are still less likely to include a collision than your hard disk is to have an uncorrectable bit error.


This website offers a solution to a similar problem.


Need Your Help

Using the jQuery validatation plugin to send multiple values to an ASP.NET MVC controller action?

asp.net-mvc jquery-plugins jquery jquery-validate

Using the jQuery Validation plugin and AJAX, how can I validate the contents of say an input (textbox) but pass more than one parameter to a controller action?

Storing hash of username using ASP.NET identity

owin asp.net-identity-2 asp.net-authorization

I'm writing an application which needs to have personally identifiable information removed/absent at all times from the database. Given that someone may use their real name in their username, and t...