The problem that most people face when setting up a UTF-8 database in mySQL is that without calling ‘SET NAMES’ in the mySQL client prior to issuing any queries (PHP, C++ etc …) that the client connection will actually in most cases default to  latin-1.

However as of mySQL 5.x or higher you can issue a statement in the my.cnf file calling init_connect.

This will trigger a series of defined commands / queries every time a non super user connects (So if you are using root to connect to your mySQL database, stop reading now and slap yourself HARD).

i.e.

1
2
3
4
5
6
7
[mysqld]
init_connect='SET collation_connection = utf8_general_ci'
init_connect='SET NAMES utf8'
default-character-set=utf8
character-set-server=utf8
collation-server=utf8_general_ci
skip-character-set-client-handshake

UPDATE 04/09/09

my mySQL version 5.0.45 x64 only picks up the last entry of init_connect

Use this example in this case:

1
2
3
4
5
[mysqld]
init_connect='SET collation_connection = utf8_general_ci; SET NAMES utf8;'
default-character-set=utf8
character-set-server=utf8
collation-server=utf8_general_ci

Restart mySQL and check the mysqld.log has not returned any errors (Or your event viewer if you are using windows).

Every client connection will now default to utf-8 encoding and not latin-1, removing the need to add a SET NAMES call on every connection.

This will work for PHP, C++, ruby etc… as the client encoding is now handeled server side, rather that waiting on the client to issue a SET NAMES command.

UPDATE 30/03/09: Added “skip-character-set-client-handshake” this ignores the clients request to set the connection charset, this info courtesy of “wardo” http://word.wardosworld.com/?p=164

UPDATE 10/09/09

Been having some issues with this working the workaround is to add this config as a single line:

1
init_connect='SET collation_connection = utf8_general_ci; SET NAMES utf8;'

Tags: , , , , ,
17 Responses to “mySQL forcing utf-8 compliance for all connections.”
  1. Thank you!

    I have been looking everywhere for hints on how to get UTF-8 working on my new server.

    This installation is a fully updated FreeBSD 7.0 with Apache 2.2.9 and MySQL 5.0.67.

    When importing my old utf8_general_ci database everything looked fine… but reading it from my website showed characters wrong. Writing new entries into the database (from web) made something unreadable to the human eye for special chars like æøåöëü etc.

    I have tried SO many options for both Apache, MySQL and my own (perl) code.

    Page bookmarked! (and I will copy some info to my soon to come wiki http://www.confighell.com)

    Thanks again

    – Nicolai

  2. [...] existing literature on the topic including possible fixes [here, here, here, here, here, here, horror story here, here] encourage the configuration of the MySQL daemon (mysqld) to [...]

  3. Olle says:

    Thank you! Works great

  4. [...] Force mySQL utf8 connections [...]

  5. Massimiliano says:

    Thanks Buzz,
    you saves my life! i was stuck since 9.00 AM…

  6. Quick question. Looking through `show variables;` I see the following, seemingly character/collation related variables:
    character_set_client
    character_set_connection
    character_set_database
    character_set_results
    character_set_server
    character_set_system

    collation_connection
    collation_database
    collation_server

    The reason I say related, is because by default they all have a latin based character/collation value on a fresh mysql install. Your command switches all but the following to utf8/utf8_general_ci:
    character_set_client
    character_set_connection
    character_set_results
    collation_connection

    Those remain latin1/latin1_swedish_ci

    —-
    My question is, do these not need to be set manually as all databases will now be created accordingly in utf8/general? Is there a reason them being latin based doesn’t cause issues? I would assume making them the appropriate utf8 type would make it more consistent, but am curious what they would affect. I assumed (based on their names alone) that the connection ones (character_set_connection, collation_connection) might be of interest.

    Any thoughts?

  7. “When a client connects to the server, it sends the name of the character set that it wants to use. The server uses the name to set the character_set_client, character_set_results, and character_set_connection system variables. In effect, the server performs a SET NAMES operation using the character set name.”


    found that in mysql’s docs. is that saying that since we have the init_connect setting, when we connect to the db, from say a php script, it’ll automatically set those three settings accordingly on it’s own?

  8. moon says:

    i’ve just slapped myself.
    good doc btw. ty.

  9. Jordan says:

    Great write-up, you’ve saved me countless hours of frustration dealing with the world’s in ability to agree on a common set of characters. Thanks!!
    This worked perfectly!

  10. Jason says:

    Most useful! Thanks for sharing.

  11. Sathish says:

    Thank you very much….

    The following 3 lines in my.conf given the me perfect result.

    init_connect=’SET collation_connection = utf8_unicode_ci; SET NAMES utf8;’
    character_set_server = utf8
    collation_server=utf8_unicode_ci

    Thank you once again

  12. Gary says:

    Hi

    I cannot locate the my.cnf file, could someone please clarify where I can locate this file. I read else where that I should create an “\etc\my.cnf” folder and file however this for some reason does not work.

    G

  13. Dan says:

    The mysqld variable “default-character-set=utf8″ is unknown for v5.5 (not sure about others) so you have to use this…

    [mysqld]
    init_connect=’SET collation_connection = utf8_general_ci’
    init_connect=’SET NAMES utf8′
    character-set-server=utf8
    collation-server=utf8_general_ci
    skip-character-set-client-handshake

  14. [...] MySQL forcing utf-8 compliance for all connections [...]

  15. Maurice says:

    Anyone knows how to do something similar to this for java? I changed the my.cnf file to be exactly what is written on top but it still doesn’t work unless I use set names utf8.

  16. AMIT KUMAR SINGH says:

    When I am transefering a unicode data like 日本語 by C++ progrmme from Mysql source database to Mysql destination database, i am getting proper HEX value
    for given word but not able to write it in Mysql destination database,it is getting replce with ??? maks.
    even by using coneection string as utf-8, column is type Varcahr(100),

    If i am using coneection string as utf-16 it is not able to eastablish connection.

    ANY SOLUTION..ANY ONE HAVE?? PLz Rply…

  17.  
Leave a Reply