Uploaded files not using UTF-8 Unicode encoding do not display properly


If a text or HTML file that contains Unicode characters is uploaded in an encoding other than UTF-8, it is not displayed correctly when the user attempts to "Edit Content" in Resources. Web applications typically support only UTF-8 encoding.

There isn't an easy solution (that I can think of). Some possibilities:
1) Test for correct encoding on files when uploaded or edited – but this would likely be unacceptably slow.
2) Always display a warning in the edit window that non-UTF-8 content will not be displayed correctly.
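
As a rough illustration of option 1, here is a minimal Python sketch of an encoding check on uploaded bytes. It is not taken from the Sakai code base; the function names `sniff_non_utf8_bom` and `is_valid_utf8` are hypothetical. It assumes the whole file is available as bytes, looks for a UTF-16/UTF-32 byte-order mark, and then attempts a strict UTF-8 decode, which is a single linear pass over the data.

```python
import codecs

# Byte-order marks that signal a non-UTF-8 Unicode encoding.
# UTF-32 comes first because the UTF-32-LE BOM begins with the UTF-16-LE BOM.
_NON_UTF8_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_non_utf8_bom(data: bytes):
    """Return the encoding implied by a UTF-16/UTF-32 BOM, or None (hypothetical helper)."""
    for bom, name in _NON_UTF8_BOMS:
        if data.startswith(bom):
            return name
    return None


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data has no UTF-16/32 BOM and decodes cleanly as UTF-8."""
    if sniff_non_utf8_bom(data) is not None:
        return False
    try:
        # Strict decoding raises on the first byte sequence that is not valid UTF-8.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

One caveat with this approach: UTF-16 content saved without a byte-order mark can consist entirely of bytes that are also valid UTF-8, so a check like this can only reject files that are definitely not UTF-8. It cannot prove that a file is.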

One way to reproduce this problem is to create an MS Word document with non-Latin characters and save it as Unicode Text (UTF-16).
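
For anyone without MS Word, a comparable test file can probably be produced with a short script. The Python sketch below is only an assumption about what an equivalent file looks like (UTF-16 text with a byte-order mark). The file name, the sample text, and the helper names are made up for illustration. It also shows what a UTF-8-only reader would make of the bytes.

```python
import tempfile
from pathlib import Path

# Arbitrary non-Latin sample text (Cyrillic and Japanese).
SAMPLE_TEXT = "Привет, мир. こんにちは世界"


def write_utf16_sample(directory: Path) -> Path:
    """Write SAMPLE_TEXT as UTF-16 and return the file path (hypothetical helper)."""
    path = directory / "utf16-sample.txt"
    # The "utf-16" codec writes a byte-order mark followed by the encoded text.
    path.write_text(SAMPLE_TEXT, encoding="utf-16")
    return path


def read_as_utf8(path: Path) -> str:
    """Decode the file as a UTF-8-only editor might, replacing invalid bytes."""
    return path.read_bytes().decode("utf-8", errors="replace")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = write_utf16_sample(Path(tmp))
        # The result is not the original text; invalid bytes become U+FFFD.
        print(repr(read_as_utf8(sample)))
```

Uploading the generated file to Resources and opening "Edit Content" should then exercise the same path as the Word-based steps.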





Jean-François Lévêque May 6, 2013 at 4:58 AM

Shouldn't we document that CLE requires UTF-8 and close this as non-issue?

Beth Kirschner May 3, 2013 at 10:42 AM

Test for this is in the description:
One way to reproduce this problem is to create an MS Word document with non-Latin characters and save it as Unicode Text (UTF-16).

Jean-François Lévêque May 3, 2013 at 5:02 AM

What is the test for this, Beth?

Sam Ottenhoff June 7, 2011 at 7:30 AM

I believe the SAK issue was fixed in KNL.

Any OAE issues should be filed there.

Pierre-Luc Rigaux June 6, 2011 at 9:52 AM


I know that problem. I work with Apache Sling and ran into the same thing. Basically, resources can't have a UTF-8 URI. It's a bug.

In my project I patched a Sling file and it worked, but that was just a workaround. The real problem was in Felix. You can see the bug report at https://issues.apache.org/jira/browse/FELIX-1979. Luckily, it's fixed now.

For Nakamura to solve this problem, you just have to change org.apache.felix.http.whiteboard from version 2.0.4 to version 2.2.0.

Glad to help





Created October 26, 2007 at 8:32 AM
Updated May 6, 2013 at 8:13 AM
Resolved May 6, 2013 at 8:13 AM