<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Saiweb &#187; utf8</title>
	<atom:link href="http://www.saiweb.co.uk/tag/utf8/feed" rel="self" type="application/rss+xml" />
	<link>http://www.saiweb.co.uk</link>
	<description>Ramblings of a Sys admin</description>
	<lastBuildDate>Mon, 06 Feb 2012 14:57:49 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Converting mySQL latin1 to utf8</title>
		<link>http://www.saiweb.co.uk/mysql/converting-mysql-latin1-to-utf8</link>
		<comments>http://www.saiweb.co.uk/mysql/converting-mysql-latin1-to-utf8#comments</comments>
		<pubDate>Tue, 14 Jul 2009 08:46:02 +0000</pubDate>
		<dc:creator>Buzz</dc:creator>
				<category><![CDATA[mySQL]]></category>
		<category><![CDATA[convert]]></category>
		<category><![CDATA[converting]]></category>
		<category><![CDATA[iconv]]></category>
		<category><![CDATA[latin-1]]></category>
		<category><![CDATA[latin1]]></category>
		<category><![CDATA[multibyte]]></category>
		<category><![CDATA[mysqldump]]></category>
		<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[utf-8]]></category>
		<category><![CDATA[utf8]]></category>

		<guid isPermaLink="false">http://www.saiweb.co.uk/?p=696</guid>
		<description><![CDATA[The problem We&#8217;ve all been in this position at some point, working for a company who wants to internationalize their website, and so their mySQL CMS data &#8230; But all is not so well as just using &#8216;SET NAMES utf8&#8217; and changing all &#8216;charset&#8217; on tables to utf8, You may fall foul of seeing content [...]]]></description>
			<content:encoded><![CDATA[<ul>
<strong>The problem</strong></ul>
<p>We&#8217;ve all been in this position at some point: working for a company that wants to internationalize its website, and so its mySQL CMS data &#8230;</p>
<p>Unfortunately, it is not as simple as issuing &#8216;SET NAMES utf8&#8217; and changing the &#8216;charset&#8217; of every table to utf8.</p>
<p>If you do only that, you may see content like &#194;&pound; where a &pound; sign should be.</p>
<p>This happens because the latin1-encoded &pound; has not been properly converted to utf8, so it does not render correctly. The same is true of most characters that are &#8216;multibyte&#8217; in utf8.</p>
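<p>To see where a sequence like &#194;&pound; can come from, here is a minimal sketch using the standard iconv(1) tool (assumptions: GNU/glibc iconv and bash; the helper name is hypothetical). It takes the utf8 bytes of &pound; (0xC2 0xA3), wrongly treats them as latin1 and encodes them to utf8 a second time, which a utf8 terminal or browser then displays as &#194;&pound;.</p>

```shell
# Hypothetical helper: "double encode" a utf8 string by reading its bytes
# as latin1 and converting them to utf8 again.
double_encode() {
  printf '%s' "$1" | iconv -f LATIN1 -t UTF-8
}

# The pound sign as its utf8 bytes (0xC2 0xA3)...
pound=$(printf '\302\243')
# ...comes out as four bytes (0xC3 0x82 0xC2 0xA3), i.e. the two characters
# capital A circumflex followed by the pound sign.
double_encode "$pound" | od -An -tx1
```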
<ul>
<strong>The solution</strong></ul>
<p>What you actually need to do is convert the character set of the data itself to utf8. To do this you need to run the data through a conversion program: you could use iconv if you are already familiar with it, or, if your system has Python installed, you can grab a copy of my <a href="http://www.saiweb.co.uk/sysadmin">sysadmin</a> program, which has iconv-like functionality but is far more user friendly.</p>
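<p>If you go the plain iconv route, the conversion could look something like the sketch below (assumptions: GNU/glibc iconv, bash, and a hypothetical helper name). It writes the result next to the input with a .utf-8 suffix, mirroring the file name the sysadmin program produces. It defaults to CP1252 rather than strict ISO-8859-1 because mySQL&#8217;s &#8216;latin1&#8217; is in practice Windows-1252, so bytes such as 0x93 (a curly quote) are real characters rather than control codes; pass LATIN1 as the second argument if you want strict ISO-8859-1.</p>

```shell
# Hypothetical helper: convert a dump to utf8 with iconv(1).
# Usage: dump_to_utf8 FILE [SOURCE_ENCODING]   (writes FILE.utf-8)
dump_to_utf8() {
  local in=$1
  local from=${2:-CP1252}   # mySQL "latin1" is Windows-1252 in practice
  if ! iconv -f "$from" -t UTF-8 "$in" > "$in.utf-8"; then
    # iconv reports the failing position on stderr; drop the partial output
    rm -f "$in.utf-8"
    return 1
  fi
}
```

<p>For example, dump_to_utf8 ./databasename-latin1.sql would produce ./databasename-latin1.sql.utf-8.</p>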
<ul>
<strong>What you will need</strong></ul>
<ul>
<li>Text Editor (vi/nano/pico/emacs)</li>
<li>Python 2.4 or higher</li>
<li><a href="http://linux.about.com/od/commands/l/blcmdl1_sed.htm">SED</a> package</li>
<li><a href="http://www.saiweb.co.uk/sysadmin">Sysadmin program</a></li>
<li>mySQL</li>
</ul>
<ul>
<strong>Preparing the file</strong></ul>
<p>This assumes the database currently uses latin1; in theory the source could be any encoding.</p>
<p>Get a dump of the database:</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">mysqldump <span style="color: #660033;">--default-character-set</span>=latin1 <span style="color: #660033;">--set-charset</span> <span style="color: #660033;">-u</span> user <span style="color: #660033;">-pPASSWORD</span> databasename <span style="color: #000000; font-weight: bold;">&gt;</span> databasename-latin1.sql</div></td></tr></tbody></table></div>
<p>Next, be aware of exactly what you need to replace using sed. You can&#8217;t just replace every instance of &#8216;latin1&#8217;, because <a href="http://en.wikipedia.org/wiki/Murphy%27s_law">Murphy&#8217;s law</a> being what it is, &#8216;latin1&#8217; will appear somewhere in the actual content (it certainly would in a mySQL dump of this blog).</p>
<p>As such, start by replacing the following line:</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #000000; font-weight: bold;">/*!</span><span style="color: #000000;">40101</span> SET NAMES latin1 <span style="color: #000000; font-weight: bold;">*/</span>;</div></td></tr></tbody></table></div>
<p>If your database dump is small enough (under 100MB) you can edit this line directly in your text editor; otherwise, you can do the following:</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br /></div></td><td><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #c20cb9; font-weight: bold;">sed</span> <span style="color: #ff0000;">'s/SET NAMES latin1/SET NAMES utf8/g'</span> .<span style="color: #000000; font-weight: bold;">/</span>databasename-latin1.sql <span style="color: #000000; font-weight: bold;">&gt;</span> tmp<br />
<span style="color: #c20cb9; font-weight: bold;">cat</span> .<span style="color: #000000; font-weight: bold;">/</span>tmp <span style="color: #000000; font-weight: bold;">&gt;</span> .<span style="color: #000000; font-weight: bold;">/</span>databasename-latin1.sql<br />
<span style="color: #c20cb9; font-weight: bold;">rm</span> <span style="color: #660033;">-f</span> .<span style="color: #000000; font-weight: bold;">/</span>tmp</div></td></tr></tbody></table></div>
<p>Now you need to replace all instances of &#8216;CHARSET=latin1&#8217;:</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br /></div></td><td><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #c20cb9; font-weight: bold;">sed</span> <span style="color: #ff0000;">'s/CHARSET=latin1/CHARSET=utf8/g'</span> .<span style="color: #000000; font-weight: bold;">/</span>databasename-latin1.sql <span style="color: #000000; font-weight: bold;">&gt;</span> tmp<br />
<span style="color: #c20cb9; font-weight: bold;">cat</span> .<span style="color: #000000; font-weight: bold;">/</span>tmp <span style="color: #000000; font-weight: bold;">&gt;</span> .<span style="color: #000000; font-weight: bold;">/</span>databasename-latin1.sql<br />
<span style="color: #c20cb9; font-weight: bold;">rm</span> <span style="color: #660033;">-f</span> .<span style="color: #000000; font-weight: bold;">/</span>tmp</div></td></tr></tbody></table></div>
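<p>If you would rather not risk touching the content at all, the two replacements above can be combined into a single pass that only rewrites the dump&#8217;s own declarations. This is a minimal sketch, assuming bash and the line layout mysqldump typically produces (a /*!40101 SET NAMES ... */; header line, and table definitions that close on a line starting with &#8220;) ENGINE=&#8221;); the helper name is hypothetical. It leaves any COLLATE=latin1_&#8230; clauses untouched, so check for those if your tables use a non-default collation.</p>

```shell
# Hypothetical helper: rewrite only the charset declarations in a mysqldump file.
# Expression 1 matches the SET NAMES header line exactly (whole line).
# Expression 2 only edits lines that close a CREATE TABLE, e.g.
#   ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
# so 'CHARSET=latin1' inside INSERT data is left alone.
fix_dump_charset_decls() {
  local dump=$1
  sed \
    -e 's|^/\*!40101 SET NAMES latin1 \*/;$|/*!40101 SET NAMES utf8 */;|' \
    -e '/^) ENGINE=/s/CHARSET=latin1/CHARSET=utf8/' \
    "$dump" > "$dump.tmp" && mv "$dump.tmp" "$dump"
}
```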
<p>Now run the file through the character set converter:</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">sysadmin <span style="color: #660033;">-c</span> iconv <span style="color: #660033;">-d</span> .<span style="color: #000000; font-weight: bold;">/</span>databasename-latin1.sql,latin-<span style="color: #000000;">1</span>,utf-<span style="color: #000000;">8</span></div></td></tr></tbody></table></div>
<p>If your SQL dump is over 30MB you will be prompted to confirm that you wish to proceed. Remember that this loads the entire file into memory, so make sure you have enough free system memory before proceeding; I also suggest not running this on a production server.</p>
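<p>One way to do that memory check from the shell is sketched below. It assumes Linux (it reads MemAvailable from /proc/meminfo, which needs kernel 3.14 or later) and uses a hypothetical helper name; the factor of two is an arbitrary safety margin, not something the sysadmin program requires.</p>

```shell
# Hypothetical helper: succeed if twice FILE's size fits in available memory.
fits_in_memory() {
  local size_kb avail_kb
  size_kb=$(( $(wc -c < "$1") / 1024 ))
  # MemAvailable is reported in kB; empty if the kernel does not provide it
  avail_kb=$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)
  [ -n "$avail_kb" ] && [ $(( size_kb * 2 )) -lt "$avail_kb" ]
}
```

<p>For example: fits_in_memory ./databasename-latin1.sql || echo "not enough free memory"</p>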
<p>If any characters cannot be converted, you will be alerted to their exact position within the file; from there you will need to replace them, either with sed or in your text editor.</p>
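<p>If you want to look over the affected lines yourself, a rough way to list every line that contains a non-ASCII byte (and will therefore change during conversion) is sketched below, assuming GNU grep and bash; the helper name is hypothetical. The commented sed line shows the kind of byte-level replacement you might then apply with GNU sed.</p>

```shell
# Hypothetical helper: print the line numbers of FILE that contain any byte
# in the 0x80-0xFF range, i.e. every character that is not plain ASCII.
non_ascii_lines() {
  # C locale: match raw bytes; -a: never treat the dump as a binary file
  LC_ALL=C grep -n -a "$(printf '[\200-\377]')" "$1" | cut -d: -f1
}

# Once you have found a problem byte, GNU sed can replace it, for example
# removing every stray 0x81 byte (adjust the byte and file names to suit):
#   LC_ALL=C sed 's/\x81//g' dump.sql > dump.fixed.sql
```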
<p>If all went well you now have ./databasename-latin1.sql.utf-8 (note the .utf-8 extension). This is a complete utf8 mySQL dump, so all you need to do now is import it.</p>
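<p>Before importing, it can be worth confirming that the converted file really is valid utf8. A minimal sketch, assuming GNU/glibc iconv and bash (the helper name is hypothetical): converting utf8 to utf8 exits with a non-zero status at the first invalid byte sequence, which makes it a cheap validity check. The import is shown as a comment, using the same user and database placeholders as the dump step.</p>

```shell
# Hypothetical helper: succeed only if FILE is valid utf8.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Example import once the check passes (placeholders, as in the dump step):
#   is_valid_utf8 ./databasename-latin1.sql.utf-8 &&
#     mysql --default-character-set=utf8 -u user -p databasename < ./databasename-latin1.sql.utf-8
```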
<ul>
<strong>Further reading</strong></ul>
<ol>
<li><a href="http://www.saiweb.co.uk/mysql/mysql-forcing-utf-8-compliance-for-all-connections">Force mySQL utf8 connections</a></li>
<li><a href="http://www.saiweb.co.uk/mysql/mysql-bash-backup-script">mySQL backup script</a></li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.saiweb.co.uk/mysql/converting-mysql-latin1-to-utf8/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>mySQL forcing utf-8 compliance for all connections.</title>
		<link>http://www.saiweb.co.uk/mysql/mysql-forcing-utf-8-compliance-for-all-connections</link>
		<comments>http://www.saiweb.co.uk/mysql/mysql-forcing-utf-8-compliance-for-all-connections#comments</comments>
		<pubDate>Wed, 12 Nov 2008 10:05:45 +0000</pubDate>
		<dc:creator>Buzz</dc:creator>
				<category><![CDATA[mySQL]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[init_connect]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[utf-8]]></category>
		<category><![CDATA[utf8]]></category>

		<guid isPermaLink="false">http://www.saiweb.co.uk/mysql/mysql-forcing-utf-8-compliance-for-all-connections</guid>
		<description><![CDATA[The problem that most people face when setting up a UTF-8 database in mySQL is that, unless the client (PHP, C++ etc &#8230;) calls &#8216;SET NAMES&#8217; before issuing any queries, the connection will in most cases default to latin-1. However as of mySQL 5.x or higher you can issue [...]]]></description>
			<content:encoded><![CDATA[<p>The problem that most people face when setting up a UTF-8 database in mySQL is that, unless the client (PHP, C++ etc &#8230;) calls &#8216;SET NAMES&#8217; before issuing any queries, the connection will in most cases default to latin-1.</p>
<p>However, from mySQL 5.x onwards you can set the init_connect option in the my.cnf file.</p>
<p>This runs the defined statements every time a user without the SUPER privilege connects (so if you are using root to connect to your mySQL database, stop reading now and slap yourself HARD).</p>
<p>For example:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br />4<br />5<br />6<br />7<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">[mysqld]<br />
init_connect='SET collation_connection = utf8_general_ci'<br />
init_connect='SET NAMES utf8'<br />
default-character-set=utf8<br />
character-set-server=utf8<br />
collation-server=utf8_general_ci<br />
skip-character-set-client-handshake</div></td></tr></tbody></table></div>
<p><strong>UPDATE 04/09/09</strong></p>
<p>My mySQL 5.0.45 (x64) install only picks up the last init_connect entry.</p>
<p>In that case, use this example instead:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br />4<br />5<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">[mysqld]<br />
init_connect='SET collation_connection = utf8_general_ci; SET NAMES utf8;'<br />
default-character-set=utf8<br />
character-set-server=utf8<br />
collation-server=utf8_general_ci</div></td></tr></tbody></table></div>
<p>Restart mySQL and check that mysqld.log shows no errors (or the Event Viewer if you are using Windows).</p>
<p>Every client connection will now default to utf-8 encoding and not latin-1, removing the need to add a SET NAMES call on every connection.</p>
<p>This works for PHP, C++, Ruby, etc&#8230;, as the connection encoding is now handled server side rather than relying on the client to issue a SET NAMES command.</p>
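<p>To confirm the change took effect, you can ask the server what a fresh connection ended up with. The sketch below (bash and awk; the helper name and the appuser account are hypothetical) reads the tab-separated output of SHOW VARIABLES from the mysql client in batch mode and fails if the client, connection or results character set is not a utf8 variant. Run it as a user without the SUPER privilege, since init_connect is skipped for SUPER users. Any value starting with utf8 is accepted, so utf8mb4 also passes.</p>

```shell
# Hypothetical helper: read tab-separated name/value rows on stdin and fail if
# character_set_client, _connection or _results is not a utf8 variant.
check_session_charsets() {
  awk -F '\t' '
    $1 ~ /^character_set_(client|connection|results)$/ && $2 !~ /^utf8/ {
      print "not utf8: " $1 "=" $2
      bad = 1
    }
    END { exit bad }
  '
}

# Example (prompts for appuser's password):
#   mysql -N -B -u appuser -p -e "SHOW VARIABLES LIKE 'character_set_%'" | check_session_charsets
```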
<p><strong>UPDATE 30/03/09</strong>: Added &#8220;skip-character-set-client-handshake&#8221;, which ignores the client&#8217;s request to set the connection character set; this info courtesy of &#8220;wardo&#8221; <a href="http://word.wardosworld.com/?p=164">http://word.wardosworld.com/?p=164</a></p>
<p><strong>UPDATE 10/09/09</strong></p>
<p>I have been having some issues getting this working; the workaround is to put the init_connect setting on a single line:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">init_connect='SET collation_connection = utf8_general_ci; SET NAMES utf8;'</div></td></tr></tbody></table></div>
]]></content:encoded>
			<wfw:commentRss>http://www.saiweb.co.uk/mysql/mysql-forcing-utf-8-compliance-for-all-connections/feed</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
	</channel>
</rss>

