Common UTF-8 Problems

There are three main areas to check if you are having trouble entering or displaying UTF-8 characters in Sakai:

1) Make sure Tomcat's server.xml sets UTF-8 encoding (URIEncoding="UTF-8") on all of its connectors.

1a) The following example assumes port 8080 is the primary browser port:

<Connector port="8080" maxHttpHeaderSize="8192" URIEncoding="UTF-8"
maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
enableLookups="false" redirectPort="8443" acceptCount="100"
connectionTimeout="20000" disableUploadTimeout="true" />
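Why the attribute matters: browsers typically percent-encode non-ASCII characters in URLs as UTF-8 bytes, and older Tomcat releases decode those bytes as ISO-8859-1 unless URIEncoding says otherwise. The sketch below shows what each choice does to "café". The class and method names are illustrative. It uses java.net.URLDecoder, which follows form-encoding rules, as a stand-in for Tomcat's own URI decoder, and it needs Java 10 or later for the Charset overload.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;

public class UriDecodingDemo {
    // Decode a percent-encoded string with the named charset.
    // URLDecoder also turns '+' into a space (form encoding), which does not
    // affect this example.
    static String decode(String encoded, String charsetName) {
        return URLDecoder.decode(encoded, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // "café" as percent-encoded UTF-8: the é is the two bytes C3 A9
        String fromBrowser = "caf%C3%A9";
        // What URIEncoding="UTF-8" asks Tomcat to do
        System.out.println("UTF-8:      " + decode(fromBrowser, "UTF-8"));
        // ISO-8859-1 turns each UTF-8 byte into a separate character
        System.out.println("ISO-8859-1: " + decode(fromBrowser, "ISO-8859-1"));
    }
}
```

Decoded as ISO-8859-1, the string becomes cafÃ©, the typical symptom of a connector without URIEncoding="UTF-8".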

1b) If you have set up your Tomcat instance to run behind Apache, you will also need to add URIEncoding="UTF-8" to the AJP connector in server.xml. For example:

<Connector port="8009"
enableLookups="false" redirectPort="8443" protocol="AJP/1.3" URIEncoding="UTF-8" />
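To confirm that every connector is covered, a small checker can parse server.xml and report the ports of connectors that do not explicitly set URIEncoding="UTF-8". This is a sketch: ConnectorCheck is a hypothetical name, and the default path conf/server.xml (relative to the Tomcat directory) is an assumption you can override with the first argument.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class ConnectorCheck {
    // Returns the port of every <Connector> that does not set URIEncoding="UTF-8"
    static List<String> connectorsWithoutUtf8(InputStream serverXml) {
        try {
            NodeList connectors = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(serverXml)
                    .getElementsByTagName("Connector");
            List<String> missing = new ArrayList<>();
            for (int i = 0; i < connectors.getLength(); i++) {
                Element connector = (Element) connectors.item(i);
                // getAttribute returns "" when the attribute is absent
                if (!"UTF-8".equalsIgnoreCase(connector.getAttribute("URIEncoding"))) {
                    missing.add(connector.getAttribute("port"));
                }
            }
            return missing;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse server.xml", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass the real path as the first argument
        String path = args.length > 0 ? args[0] : "conf/server.xml";
        try (InputStream in = new FileInputStream(path)) {
            List<String> missing = connectorsWithoutUtf8(in);
            System.out.println(missing.isEmpty()
                    ? "All connectors set URIEncoding=\"UTF-8\""
                    : "Connectors missing URIEncoding=\"UTF-8\" on ports: " + missing);
        }
    }
}
```

Connectors that appear only inside XML comments in server.xml are ignored, since the DOM parser does not treat them as elements.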

2) Make sure your database is created with UTF-8 encoding.

2a) For example, this will work for MySQL:

create database sakai default character set utf8;
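To confirm which character set an existing database actually uses, you can query information_schema (available from MySQL 5.0). This JDBC sketch assumes the MySQL driver is on the classpath, a database named sakai on 127.0.0.1:3306, and a user name and password passed as arguments; the class name is hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Locale;

public class MySqlCharsetCheck {
    // Hypothetical connection URL; adjust host, port and database name
    static final String URL =
            "jdbc:mysql://127.0.0.1:3306/sakai?useUnicode=true&characterEncoding=UTF-8";

    // MySQL names its UTF-8 variants utf8, utf8mb3 and utf8mb4
    static boolean isUtf8(String charsetName) {
        return charsetName != null && charsetName.toLowerCase(Locale.ROOT).startsWith("utf8");
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 2) {
            System.err.println("usage: java MySqlCharsetCheck <user> <password>");
            return;
        }
        String sql = "SELECT DEFAULT_CHARACTER_SET_NAME FROM information_schema.SCHEMATA"
                + " WHERE SCHEMA_NAME = ?";
        try (Connection conn = DriverManager.getConnection(URL, args[0], args[1]);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, "sakai");
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    String charset = rs.getString(1);
                    System.out.println("sakai uses " + charset
                            + (isUtf8(charset) ? " (UTF-8)" : " (NOT UTF-8)"));
                } else {
                    System.out.println("database 'sakai' not found");
                }
            }
        }
    }
}
```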

2b) The Oracle UTF-8 settings are reportedly as follows:

http://docs.oracle.com/cd/B10500_01/server.920/a96529/ch10.htm#1009904

SQL> SHUTDOWN IMMEDIATE; -- or NORMAL
SQL> STARTUP MOUNT;
SQL> ALTER SYSTEM ENABLE RESTRICTED SESSION;
SQL> ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;
SQL> ALTER DATABASE OPEN;
SQL> ALTER DATABASE CHARACTER SET AL32UTF8;
SQL> SHUTDOWN IMMEDIATE; -- or NORMAL
SQL> STARTUP;

A good reference for Oracle 9i (especially Figures 2-8 and 2-9) is at http://download.oracle.com/docs/cd/B10500_01/server.920/a96529/ch2.htm

If that does not work, or it fails with the error "ORA-12712: new character set must be a superset of old character set", you can try:

ALTER DATABASE CHARACTER SET INTERNAL_USE AL32UTF8;

This command skips the check that the new character set is a superset of the old one.
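Before and after the conversion, the current database character set can be read from the NLS_DATABASE_PARAMETERS view. A JDBC sketch, assuming the Oracle JDBC driver is on the classpath; the thin-driver URL (host, port and SID) is a placeholder to replace with your own, and the class name is hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class OracleCharsetCheck {
    // Placeholder thin-driver URL: replace host, port and SID
    static final String URL = "jdbc:oracle:thin:@localhost:1521:ORCL";

    // The conversion above targets AL32UTF8
    static boolean isAl32Utf8(String value) {
        return value != null && "AL32UTF8".equalsIgnoreCase(value.trim());
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 2) {
            System.err.println("usage: java OracleCharsetCheck <user> <password>");
            return;
        }
        String sql = "SELECT value FROM nls_database_parameters"
                + " WHERE parameter = 'NLS_CHARACTERSET'";
        try (Connection conn = DriverManager.getConnection(URL, args[0], args[1]);
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            if (rs.next()) {
                String charset = rs.getString(1);
                System.out.println("NLS_CHARACTERSET=" + charset
                        + (isAl32Utf8(charset) ? "" : " (not AL32UTF8)"));
            }
        }
    }
}
```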

3) Make sure the MySQL JDBC connection URL in sakai.properties specifies UTF-8 encoding. Note that previous releases of Sakai shipped an incorrect default value, in which the & separating the parameters was written as the HTML entity &amp;:

   WRONG: url@javax.sql.BaseDataSource=jdbc:mysql://127.0.0.1:3306/sakai?useUnicode=true&amp;characterEncoding=UTF-8

   CORRECT: url@javax.sql.BaseDataSource=jdbc:mysql://127.0.0.1:3306/sakai?useUnicode=true&characterEncoding=UTF-8
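To see why the &amp; form fails, split the query string on & the way a URL parser would: the second parameter's name becomes amp;characterEncoding, so the driver never receives a property called characterEncoding. The parser below is a simplified illustration with a hypothetical class name, not Connector/J's own URL-handling code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JdbcUrlParams {
    // Split the query part of a JDBC URL into name/value pairs on '&' and '='
    static Map<String, String> params(String url) {
        Map<String, String> result = new LinkedHashMap<>();
        int q = url.indexOf('?');
        if (q < 0) {
            return result;
        }
        for (String pair : url.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                result.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String wrong = "jdbc:mysql://127.0.0.1:3306/sakai?useUnicode=true&amp;characterEncoding=UTF-8";
        String correct = "jdbc:mysql://127.0.0.1:3306/sakai?useUnicode=true&characterEncoding=UTF-8";
        // The wrong URL yields a key named "amp;characterEncoding"
        System.out.println("WRONG:   " + params(wrong));
        System.out.println("CORRECT: " + params(correct));
    }
}
```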

There is a fourth issue that only affects Sakai releases before 2.1.0 running on MySQL. A patch for it can be backported manually to these systems; more information, including the patch file, is available in JIRA ticket 1737.

One more thing: make sure your MySQL connector (JDBC driver) is version 3.1.14 or later. Some tools (RWiki and JForum) store content as BLOBs and do byte-to-string conversions, so the wrong version of the MySQL driver will cause puzzling problems with UTF-8 characters.
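When checking the driver version, compare version numbers numerically: compared as strings, "3.1.9" sorts after "3.1.14". A sketch that pulls the version out of a driver JAR file name, assuming the usual mysql-connector-java-<version>-bin.jar naming; the class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DriverVersionCheck {
    static final String MINIMUM = "3.1.14";
    // First dotted number in the name, e.g. "3.1.14" in mysql-connector-java-3.1.14-bin.jar
    private static final Pattern VERSION = Pattern.compile("(\\d+(?:\\.\\d+)+)");

    // Compare dotted versions component by component as integers
    static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // True if the JAR name contains a version at or above MINIMUM
    static boolean isNewEnough(String jarName) {
        Matcher m = VERSION.matcher(jarName);
        return m.find() && compareVersions(m.group(1), MINIMUM) >= 0;
    }

    public static void main(String[] args) {
        for (String jar : args) {
            System.out.println(jar + (isNewEnough(jar) ? ": OK" : ": too old or unrecognised"));
        }
    }
}
```

For example, run it as java DriverVersionCheck mysql-connector-java-3.1.14-bin.jar with the JAR names found in Tomcat's library directories.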

4) Windows users only (Tomcat running as a Windows service)

Add -Dfile.encoding=UTF-8 to Tomcat's Java options: open a command window, type "tomcat5w", select the "Java" tab, and add the property to the "Java Options:" field.

You can check your file encoding with the following Java class (FileEncoding.java):

FileEncoding.java
// Prints the file.encoding system property of the running JVM
// (expect UTF-8 once -Dfile.encoding=UTF-8 is set)
public class FileEncoding {
    public static void main(String[] args) {
        System.out.println("file.encoding=" + System.getProperty("file.encoding"));
    }
}
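The setting matters wherever code turns bytes into strings without naming a charset, because such code falls back to the JVM default charset. A small illustration with a hypothetical class name: on a JVM whose default charset is not UTF-8 (for example windows-1252 on many Windows systems, before Java 18 made UTF-8 the default), the second check prints false, while the explicit UTF-8 decode always succeeds.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {
    // Decoding with an explicit charset does not depend on file.encoding
    static String decodeExplicit(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Decoding with the JVM default charset, as code that omits the charset does
    static String decodeDefault(byte[] utf8) {
        return new String(utf8, Charset.defaultCharset());
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("explicit UTF-8 round trip ok: " + decodeExplicit(bytes).equals(original));
        System.out.println("default charset round trip ok: " + decodeDefault(bytes).equals(original));
    }
}
```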