Uploaded files not using UTF-8 Unicode encoding do not display properly

Description

If an uploaded text or HTML file contains Unicode characters in an encoding other than UTF-8 (for example UTF-16), it is not displayed correctly when the user opens it with "Edit Content" in Resources. Only UTF-8 encoding is typically supported in web applications.

There isn't an easy solution (that I can think of). Some possibilities:
1) Test for correct encoding on files when uploaded or edited – but this would likely be unacceptably slow.
2) Always display a warning in the edit window that non-UTF-8 content will not be displayed correctly.
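
To make option 1 concrete, here is a minimal sketch of an upload-time encoding check. It is written in Python purely for illustration and is not Sakai code; the function name `check_upload_encoding` and its return values are assumptions made up for this example. It looks for a UTF-16 or UTF-32 byte-order mark first, then tries a strict UTF-8 decode of the whole content. The cost of that decode grows with file size, which is the performance concern raised in option 1.

```python
import codecs

# Byte-order marks that identify common non-UTF-8 Unicode encodings.
# The UTF-32 marks are listed first because the UTF-32 LE mark
# begins with the same two bytes as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def check_upload_encoding(data: bytes) -> str:
    """Classify uploaded bytes (hypothetical helper for illustration).

    Returns "utf-16" or "utf-32" when a matching byte-order mark is
    found, "utf-8" when the bytes decode as strict UTF-8, and
    "unknown" otherwise.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    try:
        data.decode("utf-8")  # strict decode: raises on invalid sequences
    except UnicodeDecodeError:
        return "unknown"
    return "utf-8"
```

A caller could accept the file silently when the result is "utf-8" and otherwise show the warning from option 2. Note that pure ASCII content is also reported as "utf-8", since ASCII is a subset of UTF-8, and that a UTF-16 file saved without a byte-order mark would not be recognized by this sketch.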

One way to reproduce this problem is to create an MS Word document with non-Latin characters and save it as Unicode Text (UTF-16).
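
If MS Word is not at hand, an equivalent test file can probably be produced with a short script. The following is a sketch in Python; the sample text and the helper names `write_utf16_sample` and `decode_as_utf8` are assumptions for illustration only. It writes non-Latin text as UTF-16 and shows what a consumer that assumes UTF-8 would see when reading the same bytes.

```python
from pathlib import Path

# Non-Latin sample text (an arbitrary choice for this example).
SAMPLE_TEXT = "Привет, мир"


def write_utf16_sample(path: str) -> bytes:
    """Write SAMPLE_TEXT to `path` as UTF-16 and return the raw bytes.

    Python's "utf-16" codec prepends a byte-order mark when encoding.
    """
    data = SAMPLE_TEXT.encode("utf-16")
    Path(path).write_bytes(data)
    return data


def decode_as_utf8(data: bytes) -> str:
    """Decode bytes the way a UTF-8-only consumer would.

    Invalid sequences become U+FFFD replacement characters instead of
    raising, which approximates the garbled display in an editor.
    """
    return data.decode("utf-8", errors="replace")
```

Uploading the file written by `write_utf16_sample` and opening it with "Edit Content" should exercise the same path as the Word-generated file.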

Attachments

1

Activity

Jean-François Lévêque May 6, 2013 at 4:58 AM

Shouldn't we document that CLE requires UTF-8 and close this as non-issue?

Beth Kirschner May 3, 2013 at 10:42 AM

Test for this is in the description:
One way to reproduce this problem is to create an MS Word document with non-Latin characters and save it as Unicode Text (UTF-16).

Jean-François Lévêque May 3, 2013 at 5:02 AM

What is the test for this, Beth?

Sam Ottenhoff June 7, 2011 at 7:30 AM

I believe the SAK issue was fixed in KNL.

Any OAE issues should be filed there.

Pierre-Luc Rigaux June 6, 2011 at 9:52 AM

Hello,

I know this problem: I work with Apache Sling and ran into the same thing. Basically, resources can't have a UTF-8 URI. It's a bug.

In my project I patched a Sling file, which worked, but it was just a workaround. The real problem was in Felix. You can see the bug report at https://issues.apache.org/jira/browse/FELIX-1979. Luckily, it's fixed now.

For Nakamura to solve this problem, you just have to change org.apache.felix.http.whiteboard from version 2.0.4 to version 2.2.0.

Glad to help

Pier

Non-Issue

Details

Created October 26, 2007 at 8:32 AM
Updated May 6, 2013 at 8:13 AM
Resolved May 6, 2013 at 8:13 AM