In the last week or so, I worked on a very interesting problem that dealt with using varchar's as a primary key in MySQL. I wrote some Java code that was migrating sensitive user data from a SQLite database to a MySQL database. I was inserting the migrated data into a MySQL table that declared a varchar field as a primary key. On several of the inserts, I received an "ERROR 1062 (23000): Duplicate entry 'bleh' for key 1". I was puzzled, because the primary key I was inserting into the table was the same string of characters, just with different case. For example, one key was "Ozzie" and another was "ozzie". Technically speaking, these SHOULD be different primary keys (and they are) but MySQL didn't immediately think so.Continue reading for an explanation.
For example, this will fail even though "Ozzie" and "ozzie" are technically separate primary keys:
mysql> create table cat ( name varchar(20), primary key ( name ) );
Query OK, 0 rows affected (0.04 sec)
mysql> insert into cat ( name ) values ( 'ozzie' );
Query OK, 1 row affected (0.00 sec)
mysql> insert into cat ( name ) values ( 'Ozzie' );
ERROR 1062 (23000): Duplicate entry 'Ozzie' for key 1
Declaring the varchar name field BINARY when creating the table solves this problem:
mysql> create table cat ( name varchar(20) BINARY, primary key ( name ) );
Query OK, 0 rows affected (0.04 sec)
mysql> insert into cat ( name ) values ( 'ozzie' );
Query OK, 1 row affected (0.00 sec)
mysql> insert into cat ( name ) values ( 'Ozzie' );
Query OK, 1 row affected (0.00 sec)
I'm not sure why MySQL, by default, uses a lexical ordering instead of a more appropriate approach like comparing the underlying character code values for primary keys.
Anyways, enjoy.


Did you find this post helpful, or at least, interesting?