Tinymce and emojis

Guy · November 29, 2022, 3:30pm

Hello,
I have a problem with html editor. When i insert an emoji by menu insert/emoji, i can see the emoji. But when i save record, i show special character. My database in in utf8mb4. I think the problem it’s only with htlm editor.
Do you have same problem

sc 9.9 and 9.7
Thanks
Guy

alejandro.canavesi.f · April 7, 2023, 8:19am

I have the same problem

Guy · April 7, 2023, 12:31pm

Hi,
Always same problem here with 9.9.009
Guy

Guy · April 15, 2026, 12:29pm

Hi,
Here the patch
ACTUAL BEHAVIOR

The emoji is replaced with a replacement character followed by the
literal hex string of the low surrogate, e.g.
“Tu continues l’aventure avec nous ! �dd2b”

EXPECTED BEHAVIOR

The emoji should remain intact, as it does on the initial (pre-AJAX)
render and after a full page refresh.

ROOT CAUSE ANALYSIS

File: scriptcase/devel/lib/php/json.php
Class: Services_JSON
Method: encode()
Lines: approximately 326-342 (ScriptCase 9.13)

The buggy code:

case (($ord_var_c & 0xF8) == 0xF0):
// characters U-00010000 - U-001FFFFF, mask 11110XXX
$char = pack(‘C*’, $ord_var_c,
ord($var[$c + 1]),
ord($var[$c + 2]),
ord($var[$c + 3]));
$c += 3;
$utf16 = $this->utf82utf16($char);
$ascii .= sprintf(’\u%04s’, bin2hex($utf16));
break;

The call mb_convert_encoding($utf8, ‘UTF-16’, ‘UTF-8’) on a 4-byte
UTF-8 input returns a 4-byte UTF-16 surrogate pair (for example
D83E DD2B for U+1F92B), not a single UTF-16 code unit.

The expression sprintf(’\u%04s’, bin2hex($utf16)) then emits a
SINGLE \u escape followed by 8 hex digits (e.g. \ud83edd2b), which
is NOT valid JSON. The JSON specification requires surrogate pairs
outside the BMP to be encoded as two separate \uXXXX escapes.

JSON parsers read only the first 4 hex digits of each \u escape, so
\ud83edd2b is parsed as the lone high surrogate U+D83E followed by
the literal text “dd2b”. When rendered to the DOM, the lone surrogate
displays as U+FFFD (replacement character) and “dd2b” remains as
visible text, which matches the observed corruption exactly.

The companion utf82utf16() helper also lacks explicit handling of
the supplementary plane (comment in code: “ignoring UTF-32 for now,
sorry”), indicating this code path was never properly implemented
for characters outside the BMP.

PROPOSED FIX

Replace the 4-byte branch with a correct surrogate pair emission:

case (($ord_var_c & 0xF8) == 0xF0):
// characters U-00010000 - U-001FFFFF, mask 11110XXX
$b1 = $ord_var_c;
$b2 = ord($var[$c + 1]);
$b3 = ord($var[$c + 2]);
$b4 = ord($var[$c + 3]);
$c += 3;
// Decode 4-byte UTF-8 to Unicode code point
$codepoint = (($b1 & 0x07) << 18)
| (($b2 & 0x3F) << 12)
| (($b3 & 0x3F) << 6)
| ($b4 & 0x3F);
// Encode as UTF-16 surrogate pair (two \uXXXX escapes)
$codepoint -= 0x10000;
$high = 0xD800 | (($codepoint >> 10) & 0x3FF);
$low = 0xDC00 | ($codepoint & 0x3FF);
$ascii .= sprintf(’\u%04x\u%04x’, $high, $low);
break;

This patch does not touch the ASCII, 2-byte or 3-byte branches of
the switch, so it is backward-compatible with all existing behavior.
It has been applied in a production ScriptCase 9.13 installation
and fully resolves the issue on Grid pagination.

I sent an email to ScriptCase asking them to create a patch
Guy

Tinymce and emojis

Hi, Here the patch ACTUAL BEHAVIOR

EXPECTED BEHAVIOR

ROOT CAUSE ANALYSIS

PROPOSED FIX

Hi,
Here the patch
ACTUAL BEHAVIOR