Is there a REAL performance difference between INT and VARCHAR primary keys?
Is there a measurable performance difference between using INT vs. VARCHAR as a primary key in MySQL? I'd like to use VARCHAR as the primary key for reference lists (think US States, Country Codes) and a coworker won't budge on the INT AUTO_INCREMENT as a primary key for all tables.
My argument, as detailed here, is that the performance difference between INT and VARCHAR is negligible, since every INT foreign key reference will require a JOIN to make sense of the reference, a VARCHAR key will directly present the information.
So, does anyone have experience with this particular use-case and the performance concerns associated with it?
You make a good point that you can avoid some number of joined queries by using what's called a natural key instead of a surrogate key. Only you can assess if the benefit of this is significant in your application.
That is, you can measure the queries in your application that are the most important to be speedy, because they work with large volumes of data or they are executed very frequently. If these queries benefit from eliminating a join, and do not suffer by using a varchar primary key, then do it.
Don't use either strategy for all tables in your database. It's likely that in some cases, a natural key is better, but in other cases a surrogate key is better.
Other folks make a good point that it's rare in practice for a natural key to never change or have duplicates, so surrogate keys are usually worthwhile.
It's not about performance. It's about what makes a good primary key. Unique and unchanging over time. You may think an entity such as a country code never changes over time and would be a good candidate for a primary key. But bitter experience is that is seldom so.
INT AUTO_INCREMENT meets the "unique and unchanging over time" condition. Hence the preference.