Closed Bug 1561567 Opened 5 months ago Closed 2 months ago

Introduce rope-walking conversion from JS strings to UTF-8 in a byte span

Categories

(Core :: JavaScript Engine, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
mozilla71
Tracking Status
firefox71 --- fixed

People

(Reporter: hsivonen, Assigned: hsivonen)

References

(Blocks 3 open bugs)

Details

Attachments

(2 files, 1 obsolete file)

Introduce a public function in the JS API that takes a JSString* and a mozilla::Span<uint8_t>, by walking the rope without linearizing or inflating the string writes the UTF-8 conversion to the span (as the length of the span permits) and returns mozilla::Tuple<size_t, size_t> with the number of code units read and written (with the buffer end behavior prescribed by TextEncoder.encodeInto).

The conversion should use encoding_rs::mem (for SIMD acceleration) on a per-rope-segment basis taking into account the case of a non-final UTF-16 rope segment ending with a high surrogate and the next segmement being a UTF-16 segment starting with a low surrogate.

(Optimizations for short rope segments can be considered a follow-up if necessary and are explicitly not part of this bug.)

Blocks: 1449861
Blocks: 1561573
Depends on: 1490601
Priority: -- → P3

Todo: Surrogate pairs crossing rope segment boundaries.

Unit tests will go into bug 1561573.

Blocks: 1578396
Blocks: 1578437

Just posted a patch to add a testing function for encoding a string as UTF-8 into a buffer -- should be easy to test for expected UTF-8 bytes using that. Local manual testing of it atop the patch here (and preceding patches) seems to check out, but actually running the adapted encoding test with it seems like the real judge of its correctness.

Blocks: 1581743
Attachment #9092117 - Attachment description: Bug 1561567 - Implement a |[read, written] = encodeAsUtf8InBuffer(str, uint8Array)| testing function for use in testing string encoding as UTF-8 into a buffer. r=arai → Bug 1561567 - Implement a |[read, written] = encodeAsUtf8InBuffer(str, uint8Array)| testing function for use in testing string encoding as UTF-8 into a buffer.

The test works for me locally, but fails on try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a2ec39c0628bd3e23cf8fd981b9c6766caee7b57&selectedJob=267010608

Why does it get a different compartment configuration on try? What can I do about it?

Flags: needinfo?(jwalden)
Pushed by hsivonen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6fbb5c422ee1
Introduce rope-walking conversion from JS strings to UTF-8 in a byte span. r=jwalden
https://hg.mozilla.org/integration/autoland/rev/ac0ff32d087b
Implement a |[read, written] = encodeAsUtf8InBuffer(str, uint8Array)| testing function for use in testing string encoding as UTF-8 into a buffer. r=arai
See Also: → 1582050

(In reply to Henri Sivonen (:hsivonen) from comment #8)

The test works for me locally, but fails on try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a2ec39c0628bd3e23cf8fd981b9c6766caee7b57&selectedJob=267010608

Why does it get a different compartment configuration on try? What can I do about it?

I landed the patches here, because I was spending too much time daily just managing the patch queue. Let's sort out the non-DOM test in bug 1582050.

Flags: needinfo?(jwalden)

Comment on attachment 9093232 [details]
Bug 1561567 test - Test JS_EncodeStringToUTF8BufferPartial without DOM code.

Revision D46131 was moved to bug 1582050. Setting attachment 9093232 [details] to obsolete.

Attachment #9093232 - Attachment is obsolete: true
Status: ASSIGNED → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla71
You need to log in before you can comment on or make changes to this bug.