Uploaded files not using UTF-8 Unicode encoding do not display properly
Description
If a text or HTML file is uploaded that contains Unicode characters but is not encoded as UTF-8, it is not displayed correctly when the user attempts to "Edit Content" in Resources. Web applications typically support only UTF-8 encoding.
There isn't an easy solution (that I can think of). Some possibilities:
1) Test for correct encoding on files when uploaded or edited, but this would likely be unacceptably slow.
2) Always display a warning in the edit window that non-UTF-8 content will not be displayed correctly.
One way to reproduce this problem is to create an MS Word document with non-Latin characters and save it as Unicode Text (UTF-16).
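As a rough illustration of possibility 1, the check could be a single strict decoding pass over the uploaded bytes. This is only a sketch, not code from the product: the class name `Utf8Check` and the method `isValidUtf8` are hypothetical, and it assumes the whole file content is already available as a byte array. Whether the cost is acceptable for large uploads would need to be measured.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any existing API.
final class Utf8Check {

    private Utf8Check() {
    }

    /**
     * Returns true if the bytes decode cleanly as UTF-8.
     * A strict decoder is used so that malformed byte sequences
     * raise an exception instead of being silently replaced.
     */
    static boolean isValidUtf8(byte[] content) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(content));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

A caller could run `Utf8Check.isValidUtf8(bytes)` at upload time and show the warning from possibility 2 only when it returns false. Note the limitation: a UTF-16 file that starts with a byte-order mark is rejected, because the mark is not a valid UTF-8 sequence, but UTF-16 text saved without one can consist entirely of bytes that are also valid UTF-8, so this check alone cannot catch every case.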
Shouldn't we document that CLE requires UTF-8 and close this as non-issue?
Beth Kirschner May 3, 2013 at 10:42 AM
The test for this is in the description: one way to reproduce this problem is to create an MS Word document with non-Latin characters and save it as Unicode Text (UTF-16).
Jean-François Lévêque May 3, 2013 at 5:02 AM
What is the test for this, Beth?
Sam Ottenhoff June 7, 2011 at 7:30 AM
I believe the SAK issue was fixed in KNL.
Any OAE issues should be filed there.
Pierre-Luc Rigaux June 6, 2011 at 9:52 AM
Hello,
I know that problem: I work with Apache Sling and ran into the same thing. Basically, resources can't have a UTF-8 URI. It's a bug.
In my project I patched a Sling file; it worked, but it was just a workaround. The real problem was in Felix. You can see the bug report at https://issues.apache.org/jira/browse/FELIX-1979. Luckily, it's fixed now.
To solve this problem in Nakamura, you just have to change org.apache.felix.http.whiteboard from version 2.0.4 to version 2.2.0.