I have been trying to get Movable Type to run Unicode natively for a while. When Movable Type was upgraded to version 3.3, I saw my chance. The new version already includes much of the encoding and decoding code I needed, which made my job much easier than before.
If you remember my previous travails, the DBD::mysql module lacked UTF-8 support. Almost immediately after my changes, the developer release of DBD::mysql finally included a UTF-8 patch. But that was too late for me. Besides, I am going to wait for it to be included in a regular release, since DBD::mysql is somewhat complicated.
What I did was to set the UTF-8 flag for everything coming out of the database using a wrapper around the DBI module. I used Pavel Kudinov’s code for that, which is given below.
# UTF8DBI.pm re-implementation by Pavel Kudinov http://search.cpan.org/~kudinov/
# originally from: http://dysphoria.net/code/perl-utf8/
package UTF8DBI ; use base DBI ;
package UTF8DBI::db; use base DBI::db;
package UTF8DBI::st; use base DBI::st;
sub _utf8_() {
use Encode;
if (ref $_ eq 'ARRAY'){ &_utf8_() foreach @$_ }
elsif (ref $_ eq 'HASH' ){ &_utf8_() foreach values %$_ }
else { Encode::_utf8_on($_) };
$_;
};
sub fetch { return _utf8_ for shift->SUPER::fetch (@_) };
sub fetchrow_arrayref { return _utf8_ for shift->SUPER::fetchrow_arrayref(@_) };
sub fetchrow_hashref { return _utf8_ for shift->SUPER::fetchrow_hashref (@_) };
sub fetchall_arrayref { return _utf8_ for shift->SUPER::fetchall_arrayref(@_) };
sub fetchall_hashref { return _utf8_ for shift->SUPER::fetchall_hashref (@_) };
sub fetchcol_arrayref { return _utf8_ for shift->SUPER::fetchcol_arrayref(@_) };
sub fetchrow_array { @{shift-> fetchrow_arrayref(@_)} };
1;
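As a rough illustration (not part of Pavel Kudinov's module), here is what setting the UTF-8 flag does to a single value. The sample string is made up, and the snippet assumes the database really returns well-formed UTF-8 bytes, since Encode::_utf8_on trusts its input without checking it. Encode::decode is shown next to it as the validating alternative.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical value as a non-UTF-8-aware driver would return it:
# U+00E9 encoded in UTF-8 is the two bytes 0xC3 0xA9.
my $from_db = "caf\xC3\xA9";
my $bytes_before = length $from_db;   # flag off: length counts bytes

# The same step the wrapper applies to every fetched value.
Encode::_utf8_on($from_db);
my $chars_after = length $from_db;    # flag on: length counts characters

# Validating alternative: decode the bytes instead of trusting them.
my $decoded = Encode::decode('UTF-8', "caf\xC3\xA9");
my $same = ($decoded eq $from_db) ? 1 : 0;

print "$bytes_before bytes, $chars_after characters, same=$same\n";
```

The flag-flipping approach is cheap because it copies nothing, but it will happily mark malformed data as UTF-8, which is worth keeping in mind if the tables hold any legacy Latin-1 rows.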
With the UTF8DBI wrapper in place, I needed to replace calls to the DBI module with calls to the UTF8DBI module, as shown in the patches below.
--- lib/MT/ObjectDriver/DBI.pm.orig 2006-09-06 19:27:17.000000000 -0700
+++ lib/MT/ObjectDriver/DBI.pm 2006-09-06 19:23:09.000000000 -0700
@@ -7,7 +7,7 @@
package MT::ObjectDriver::DBI;
use strict;
-use DBI;
+use UTF8DBI;
use MT::Util qw( offset_time_list );
use MT::ObjectDriver;
--- lib/MT/ObjectDriver/DBI/mysql.pm.orig 2006-09-06 19:26:55.000000000 -0700
+++ lib/MT/ObjectDriver/DBI/mysql.pm 2006-09-06 19:24:20.000000000 -0700
@@ -93,10 +93,10 @@
$dsn .= ';hostname=' . $cfg->DBHost if $cfg->DBHost;
$dsn .= ';mysql_socket=' . $cfg->DBSocket if $cfg->DBSocket;
$dsn .= ';port=' . $cfg->DBPort if $cfg->DBPort;
- $driver->{dbh} = DBI->connect($dsn, $cfg->DBUser, $cfg->DBPassword,
+ $driver->{dbh} = UTF8DBI->connect($dsn, $cfg->DBUser, $cfg->DBPassword,
{ RaiseError => 0, PrintError => 0 })
or return $driver->error(MT->translate("Connection error: [_1]",
- $DBI::errstr));
+ $UTF8DBI::errstr));
$driver;
}
However, that didn’t fix all the problems. The Perl CGI module was still working in Latin-1 mode. I could have wrapped it in a UTF8CGI module, but newer versions of the CGI module support Unicode, so I just upgraded the version of CGI bundled with Movable Type. I still needed to tell the CGI module that the character set in use was UTF-8. I could either do that every single time the CGI module was called or set its default character set to UTF-8. Since this copy of the CGI module lives in the Movable Type extlib folder, I decided to modify its default character set.
--- extlib/CGI.pm.orig 2006-09-15 10:39:30.000000000 -0700
+++ extlib/CGI.pm 2006-09-15 10:39:59.000000000 -0700
@@ -517,8 +517,8 @@
$fh = to_filehandle($initializer) if $initializer;
- # set charset to the safe ISO-8859-1
- $self->charset('ISO-8859-1');
+ # set charset to utf-8
+ $self->charset('utf-8');
METHOD: {
I also set the utf8 mode (the :utf8 I/O layer) on the filehandle used for writing files to disk.
--- lib/MT/FileMgr/Local.pm.orig 2006-09-27 06:56:39.000000000 -0700
+++ lib/MT/FileMgr/Local.pm 2006-09-27 06:57:36.000000000 -0700
@@ -75,6 +75,9 @@
binmode(FH);
binmode($from) if $fmgr->is_handle($from);
}
+ else {
+ binmode(FH, ":utf8");
+ }
## Lock file unless NoLocking specified.
flock FH, LOCK_EX unless $fmgr->{cfg}->NoLocking;
seek FH, 0, 0;
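To make the effect of that binmode call concrete, here is a small standalone sketch. It is not Movable Type code: the temporary file and the sample string are assumptions for illustration. It writes a character string through a :utf8 handle and then reads the file back as raw bytes.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $text = "caf\x{E9}";               # 4 characters, one of them non-ASCII

# Write through a handle with the same layer the patch adds.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out, ':utf8');
print {$out} $text;
close $out or die "close failed: $!";

# Read the file back as raw bytes to see what reached the disk.
open my $in, '<:raw', $path or die "cannot open $path: $!";
my $on_disk = do { local $/; <$in> };
close $in;

my $byte_count    = length $on_disk;
my $is_utf8_bytes = ($on_disk eq "caf\xC3\xA9") ? 1 : 0;
print "$byte_count bytes written, utf8=$is_utf8_bytes\n";
```

Without the layer, Perl writes characters below U+0100 as single Latin-1 bytes and emits a "Wide character" warning for anything above that, so published files would not be reliably UTF-8.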
These changes caused problems with file uploads through the Movable Type interface. I expected this since I have run into this problem with PHP and mbstring as well. The following patch fixed this issue by switching the CGI object back to ISO-8859-1 in the upload handler.
--- lib/MT/App/CMS.pm.orig 2006-10-08 21:17:11.000000000 -0700
+++ lib/MT/App/CMS.pm 2006-10-08 21:17:37.000000000 -0700
@@ -8334,6 +8334,7 @@
$app->validate_magic() or return;
my $q = $app->param;
+ $q->charset('iso-8859-1');
my($fh, $no_upload);
if ($ENV{MOD_PERL}) {
my $up = $q->upload('file');
Then it was time to comment out the liberally sprinkled code to switch off the utf8 flag in Movable Type.
--- lib/MT/I18N/default.pm.orig 2006-09-16 20:22:22.000000000 -0700
+++ lib/MT/I18N/default.pm 2006-09-16 20:23:26.000000000 -0700
@@ -292,7 +292,7 @@
$text = $class->_conv_to_utf8($text, $enc) if $enc ne 'utf-8';
Encode::_utf8_on($text);
$text = substr($text, $startpos, $length);
- Encode::_utf8_off($text);
+# Encode::_utf8_off($text);
$text = $class->_conv_from_utf8($text, $enc) if $enc ne 'utf-8';
$text;
}
@@ -322,7 +322,7 @@
}
}
- Encode::_utf8_off($text) if $to eq 'utf-8';
+# Encode::_utf8_off($text) if $to eq 'utf-8';
$text;
}
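Here is a minimal sketch of why those _utf8_off calls get in the way once everything else works in characters. The sample string is made up and the snippet does not use any Movable Type code. It takes a substring of a decoded string, switches the flag off as the original code did, and then shows what happens if a later stage encodes the result again.

```perl
use strict;
use warnings;
use Encode ();

# A character string with three characters: e-acute, "t", e-acute.
my $text = Encode::decode('UTF-8', "\xC3\xA9t\xC3\xA9");

# With the flag on, substr cuts on character boundaries.
my $first_two = substr($text, 0, 2);
my $chars = length $first_two;

# Switching the flag off exposes the underlying UTF-8 bytes again.
Encode::_utf8_off($first_two);
my $bytes = length $first_two;

# Any later encoding step (for example a :utf8 output layer) now treats
# each byte as a Latin-1 character and encodes it a second time.
my $double = length(Encode::encode_utf8($first_two));

print "$chars characters, $bytes bytes after _utf8_off, $double bytes if encoded again\n";
```

The double encoding at the end is the usual source of sequences like "Ã©" showing up in published pages.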
Finally, I had to make changes to the MTHash plugin that I use to force comment previews. The Digest::SHA1 module only accepts bytes, so the character strings had to be encoded as UTF-8 bytes before being passed to any function in the module. Here is my patch:
--- lib/MT/App/Comments.pm.orig 2006-09-16 21:01:21.000000000 -0700
+++ lib/MT/App/Comments.pm 2006-09-16 21:03:08.000000000 -0700
@@ -266,9 +266,10 @@
require Digest::SHA1;
my $sha1 = Digest::SHA1->new;
- $sha1->add($q->param('text') . $q->param('entry_id') . $app->remote_ip
- . $q->param('author') . $q->param('email') . $q->param('url')
- . $q->param('convert_breaks'));
+ my $octets = Encode::encode_utf8($q->param('text') . $q->param('entry_id') . $app->remote_ip
+ . $q->param('author') . $q->param('email') . $q->param('url')
+ . $q->param('convert_breaks'));
+ $sha1->add($octets);
my $salt_file = MT::ConfigMgr->instance->PluginPath .'/salt.txt';
my $FH;
open($FH, $salt_file) or die "cannot open file <$salt_file> ($!)";
--- plugins/MTHash.pl.orig 2006-09-16 20:29:22.000000000 -0700
+++ plugins/MTHash.pl 2006-09-16 20:57:22.000000000 -0700
@@ -32,7 +32,8 @@
or return $ctx->error($ctx->errstr);
my $sha1 = Digest::SHA1->new;
- $sha1->add($content);
+ my $octets = Encode::encode_utf8($content);
+ $sha1->add($octets);
my $salt_file = MT::ConfigMgr->instance->PluginPath .'/salt.txt';
open(FH, $salt_file) or die "cannot open file <$salt_file> ($!)";
$sha1->addfile(FH);
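The same encode-then-hash step can be tried outside Movable Type. This sketch makes two assumptions: it uses the core Digest::SHA module in SHA-1 mode as a stand-in for Digest::SHA1, and the comment text is invented. The direct call is wrapped in eval because digest modules typically die with a "Wide character" error when handed characters above U+00FF.

```perl
use strict;
use warnings;
use Encode ();
use Digest::SHA ();

# Hypothetical comment text containing a character above U+00FF.
my $comment = "na\x{EF}ve \x{263A}";

# Feeding the character string straight to the digest may die with
# "Wide character"; record whether it succeeded instead of crashing.
my $direct_ok = eval { Digest::SHA->new(1)->add($comment); 1 } ? 1 : 0;

# Encoding first hands the digest a plain byte string, as in the patch.
my $octets = Encode::encode_utf8($comment);
my $digest = Digest::SHA->new(1)->add($octets)->hexdigest;

print "direct add ok: $direct_ok\n";
print "sha1 of encoded text: $digest\n";
```

Whatever encoding is chosen, it has to be the same on both sides (the plugin and the comment handler), otherwise the two hashes will never match.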
One thing that I still need to do is to fix the Serializer and Un-serializer used by Movable Type plugins.