Inserting emojis that uses 4-byte utf-8 does not work with MySQL
Description
is depended on by
Activity

Sam Ottenhoff September 13, 2016 at 12:53 PM
Confirmed on nightly instances: I can't post a message to Forums with a 4-byte emoji to Sakai 11. To Sakai master the emoji is stripped as expected and my post succeeds.

Sam Ottenhoff August 8, 2016 at 11:27 AM
Yes, let's backport this fix to 10.x. Can someone confirm this works on Sakai 11?
Matthew Buckett August 8, 2016 at 11:20 AMEdited
We just had a failure locally on 10.x because an email for the email archive tool contained an emoji and it failed to get inserted into the DB.

Matthew Jones April 1, 2016 at 12:56 PMEdited
It looks like as of MySQL 5.7.7 the large_prefix is default to be on, and the old prefix size (767) is deprecated. So I think this plan is
Sakai 11, have a property that will remove 4-byte utf-8 characters (or convert to something else) on by default. Shouldn't be needed for Oracle though.
In Sakai 12+ maybe have a complete conversion for people running with this property or MySQL 5.7.7+.

Matthew Jones March 24, 2016 at 8:38 AMEdited
Yeah, that sounds like a good quick fix for 11. What would we replace them with, A space? Some other character? It looks pretty straightforward. But I'd take the patch if they had something in production too.
Some patterns like this might be enough too, if we also don't care about a replacement.
byte[] converttoBytes = string.getBytes("UTF-8");
string = new String(converttoBytes, "UTF-8");
There is an emoji library (https://github.com/vdurmont/emoji-java) that can remove emojis but I know there's characters outside of the emoji range that would also cause problems.
This article is the more appropriate one for this case.
http://stackoverflow.com/questions/12867000/how-to-remove-surrogate-characters-in-java
Details
Details
Priority
Affects versions
Fix versions
Components
Assignee

Reporter

Adding an emoji(common when using mobile devices) that uses 4 bytes in the editor does not work. Behaviour differs from tool to tool. For example in Lessons tool when pasting the "grinning face with smiling eyes" from http://apps.timwhitlock.info/emoji/tables/unicode in to the editor user will be presented with the error:
The correct solution to this is to convert from utf8 to utf8mb4(https://mathiasbynens.be/notes/mysql-utf8mb4).
Please feel free to edit to change to the correct components, I am not sure how to label this issue.