Closed Bug 1781724 Opened 2 years ago Closed 2 years ago

TextEncoder broken on some strings

Categories

(Core :: JavaScript Engine, defect)

Firefox 102
defect

Tracking

()

VERIFIED FIXED
105 Branch
Tracking Status
firefox-esr91 --- wontfix
firefox-esr102 --- verified
firefox103 --- wontfix
firefox104 --- verified
firefox105 --- verified

People

(Reporter: ondras, Assigned: emilio)

References

(Regression)

Details

(Keywords: regression)

Attachments

(1 file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0

Steps to reproduce:

TextEncoder.encode stops encoding early on a particular string. However, if the string contents are accessed in a different way, TextEncoder starts to behave correctly. Perhaps the string's internal representation gets changed in the meantime (contents are the same all the time).

Actual results:

See https://jsfiddle.net/ondras/9y3254hu/: a string is built via repeated String.fromCharCode calls. It then get serialized three times:

  1. via TextEncoder
  2. via encodeURIComponent
  3. via TextEncoder again

The bug manifests with the first serialization being shorter than the third. In particular, my Firefox 102 displays:

"Encoded lengths: 695, 3980, 2818"

Expected results:

The first and third serialization should be the same (2818 bytes).

Component: Untriaged → Internationalization
Product: Firefox → Core

Weird, might be a bug with how JSString -> UTF8String works.

Flags: needinfo?(emilio)
Keywords: regression

Minimal testcase:

charCodes = [111,117,353,237,109,32,116,97,107,116,111,55357,56842,46];
str = "";
charCodes.forEach(cc => str += String.fromCharCode(cc));
encoded1 = new TextEncoder().encode(str);
encoded2 = encodeURIComponent(str);
encoded3 = new TextEncoder().encode(str);
[encoded1.length, encoded3.length];

AR:

Array [ 17, 18 ]

ER:

Array [ 18, 18 ]

encodeInto shows that all input characters are not consumed:

charCodes = [111,117,353,237,109,32,116,97,107,116,111,55357,56842,46];
str = "";
charCodes.forEach(cc => str += String.fromCharCode(cc));
ary = new Uint8Array(charCodes.length * 3);
encoded1 = new TextEncoder().encodeInto(str, ary);...

Array(4) [ 13, 17, 14, 18 ]

I think JS_EncodeStringToUTF8BufferPartial is broken.

Component: Internationalization → JavaScript Engine

(In reply to Masatoshi Kimura [:emk] from comment #4)

I think JS_EncodeStringToUTF8BufferPartial is broken.

Yes, that also matches the regression window from comment #2. Bug 1561567 is the likely culprit, unfortunately :hsivonen isn't currently available to further investigate this bug.

Test case adjusted to work in the js-shell:

charCodes = [111,117,353,237,109,32,116,97,107,116,111,55357,56842,46];
str = "";
charCodes.forEach(cc => str += String.fromCharCode(cc));
ary = new Uint8Array(charCodes.length * 3);
encoded1 = encodeAsUtf8InBuffer(str, ary);
print(encoded1);
print(ary);

// Linearise rope.
str = str.split("").join("");

ary = new Uint8Array(charCodes.length * 3);
encoded1 = encodeAsUtf8InBuffer(str, ary);
print(encoded1);
print(ary);
Regressed by: 1561567
Assignee: nobody → emilio
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Flags: needinfo?(emilio)

We can't exit early if src.IsEmpty() if the stack is non-empty.

Set release status flags based on info from the regressing bug 1561567

Pushed by ealvarez@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/84c168e750bb
Fix JSString::encodeUTF8Partial with some ropes. r=anba

Comment on attachment 9287278 [details]
Bug 1781724 - Fix JSString::encodeUTF8Partial with some ropes. r=anba

Beta/Release Uplift Approval Request

  • User impact if declined: Some string conversions don't work properly. Arguably, this only happens in rather odd cases (you have to be in a deep rope-represented string, with the beginning of a surrogate at the end of the rope, and just one character at the right side of the rope), but it's still unpredictable from the developer point of view, so it'd be nice to fix.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: comment 0 + other comments have other more reduced test-cases.
  • List of other uplifts needed: none
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Pretty simple fix + test.
  • String changes made/needed: none
  • Is Android affected?: Yes
Attachment #9287278 - Flags: approval-mozilla-beta?
Flags: qe-verify+
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 105 Branch
QA Whiteboard: [qa-triaged]

Reproduced the issue with Firefox 105.0a1 (2022-07-26) on Windows 10x64 using the jsfiddle from comment 0 and the minimal test case from 3. The results from comment 0 are Encoded lengths: 695, 3980, 2818, and comment 3 test case ran in consle is Array [ 17, 18 ].
The issue is verified fixed with Firefox 105.0a1 (20220728215301) on Windows 10x64, macOS 11 and Ubuntu 20.04. The result for https://jsfiddle.net/ondras/9y3254hu/ is Encoded lengths: 2818, 3980, 2818, and the result for comment 3 test case is Array [ 18, 18 ].

Comment on attachment 9287278 [details]
Bug 1781724 - Fix JSString::encodeUTF8Partial with some ropes. r=anba

Approved for 104.0b4

Attachment #9287278 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Verified fixed with Firefox 104.0b4 (20220731190208) on Windows 10x64, macOS 11 and Ubuntu 20.04. The result for https://jsfiddle.net/ondras/9y3254hu/ is Encoded lengths: 2818, 3980, 2818, and the result for the comment 3 test case is Array [ 18, 18 ].

Status: RESOLVED → VERIFIED
Flags: qe-verify+

Is this something we should fix on ESR102 as well? Please nominate if so.

Flags: needinfo?(emilio)

Comment on attachment 9287278 [details]
Bug 1781724 - Fix JSString::encodeUTF8Partial with some ropes. r=anba

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: Simple fix for rather obscure string handling bug that affects a lot of DOM string APIs, potentially.
  • User impact if declined: see above.
  • Fix Landed on Version: 104
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Patch is simple and well understood.
Flags: needinfo?(emilio)
Attachment #9287278 - Flags: approval-mozilla-esr102?

Comment on attachment 9287278 [details]
Bug 1781724 - Fix JSString::encodeUTF8Partial with some ropes. r=anba

Approved for 102.2esr.

Attachment #9287278 - Flags: approval-mozilla-esr102? → approval-mozilla-esr102+

Verified fixed with Firefox 102.2.0esr (20220804141626) on Windows 10x64, macOS 11 and Ubuntu 21. The result for https://jsfiddle.net/ondras/9y3254hu/ is Encoded lengths: 2818, 3980, 2818, and the result for the comment 3 test case is Array [ 18, 18 ].

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: