Code

Convert an Entire MySQL Table to UTF-8

My running log has been attracting more French speaking runners lately and they've been bringing their foreign characters with them, running in cities such as . Those accents were not always displaying properly on the site – often they'd appear as a question mark or a square box instead of the intended character. After doing some research, I came to the conclusion that the best solution would be to get everything into UTF-8 format. Only one problem… My database was all collated as latin1_german2_ci and I didn't know how to update it. The function below is the solution I came up with.

Before I display the code, let me discuss a couple trouble points I ran into. Be careful when using this on existing data! Create a backup before you begin! When I was updating my own tables, I discovered that htmlentities() sometimes returns a blank string when you have the charset set manually. As a result, a bunch of cities got wiped out of my log and I had to start over from my backup. The same or similar might happen as you're updating your data, so use this function at your own risk.

Also, if your data is already in UTF-8 (like if it's coming in on a form post from a UTF-8 encoded page) it needs to be decoded before being run through htmlentities(). I've included a commented out example of that where $original is converted to $new.

One other thing – if your table is really big, this is going to time out so you'll have to either adjust the max_execution_time or break this into smaller chunks (or both).

Finally, note that the code uses some custom database functions that you'll need to define before using it.

<?php

function convert_table_to_utf8($table_name, $execute=true) {
    global $link;
    $text_words = array('text','char','blob'); //columns with one of these in the column type will have a collation set
    $text_columns = array();
    $primary_key = null;
    $result = dbQuery('SHOW COLUMNS FROM `'.$table_name.'`');
    while ($row = dbGetRow($result,$link,false,false)) {
        if ($row['3'] == 'PRI') { $primary_key = $row['0']; }
        foreach ($text_words as $word) {
            //look to see if this column contains text (and therefore will need to be converted to UTF-8)
            if (stripos($row['1'],$word) !== false) {
                $text_columns[] = array('field'=>$row['0'],'type'=>$row['1']);
                break;
            }
        }
    }

    //update the default collation for your table
    $query = 'ALTER TABLE `'.$table_name.'` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci';
    echo $query.';<br />';
    if ($execute) { dbQuery($query); }

    //that last query doesn"t affect existing text rows, so you"ll have to update them separately...
    foreach ($text_columns as $col) {
        $query = 'ALTER TABLE `'.$table_name.'` CHANGE `'.$col['field'].'` `'.$col['field'].'` '.strtoupper($col['type']).' CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL';
        echo $query.';<br />';
        if ($execute) { dbQuery($query); }
    }

    //if you already have data in your table, it needs to be converted to UTF-8 so that it will display properly
    //but, this can\t happen unless there is a primary key set
    if (!$primary_key) { echo '<br />Could not update data within table because there is no primary key set. You can get around this by editing the query at the bottom of the function.'; return; }
    $result = dbQuery('SELECT * FROM `'.$table_name.'`');
    while ($row = dbGetRow($result)) { //note: if not using Ink Plant's database helpers, make sure you have the MYSQL_ASSOC flag set
        $updates = array();
        foreach ($text_columns as $col) {
            $field = $col['field'];
            if ($row[$field]) {
                $original = stripslashes($row[$field]);
                $new = $original;
                //$new = utf8_decode($new);
                $new = htmlentities($new); //note: if that doesn't work, try this: $new = htmlentities($new, ENT_COMPAT, 'UTF-8');
                $new = utf8_encode($new);
                if ($original != $new) { $updates[] = '`'.$field.'`=\''.dbEscape($new).'\''; }
            }
        }
        if (count($updates) > 0) {
            $query = 'UPDATE `'.$table_name.'` SET '.implode($updates,',').' WHERE `'.$primary_key.'` = \''.$row[$primary_key].'\' LIMIT 1';
            echo $query.';<br />';
            if ($execute) { dbQuery($query); }
        }
    }
    return true;
}

?>

The second part of this is making sure that your users' browsers know that they should be displaying UTF-8. This is easy to do – Just add the following meta tag in between <head> and </head> on the top of all your pages:

<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />

Taking this whole thing one step further, you can also change the collation of your entire MySQL database by using the following syntax:

ALTER DATABASE `db_name` CHARACTER SET utf8 COLLATE utf8_general_ci


2 Comments

December 18, 2011, 10:34 am
This script is very useful, I used it to nonvert my big DB, thanks!
July 14, 2012, 4:07 pm
Thank you so much : you saved my life ! :-)

Leave a Comment

Name
Email
Website
Comment
Name and email are required. Your email will not be published.

This post was published on March 13th, 2011 by Robert James Reese in the following categories: HTML, MySQL, PHP.

Before using any of the code or other content in this post, you must read and agree to our Terms & Conditions.

Copyright © 2014, Ink Plant. All rights reserved.