Closed Bug 1271014 Opened 9 years ago Closed 9 years ago

make DeflateStringToUTF8Buffer return partial result if dest buffer runs out of space

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla49

Tracking Flags:

Tracking

Status

firefox49

---

fixed

People

(Reporter: myk, Assigned: myk)

Details

(Whiteboard: [v8api])

Attachments

(2 files, 3 obsolete files)

return partial result and num bytes/chars written 9 years ago Myk Melez [:myk] [@mykmelez] 5.62 KB, patch	jandem : feedback+	Details \| Diff \| Splinter Review
test DeflateStringToUTF8Buffer 9 years ago Myk Melez [:myk] [@mykmelez] 11.71 KB, patch	jandem : review+	Details \| Diff \| Splinter Review
return partial result and num bytes/chars written 9 years ago Myk Melez [:myk] [@mykmelez] 5.57 KB, patch	jandem : review+	Details \| Diff \| Splinter Review
test DeflateStringToUTF8Buffer 9 years ago Myk Melez [:myk] [@mykmelez] 11.66 KB, patch		Details \| Diff \| Splinter Review
test DeflateStringToUTF8Buffer 9 years ago Myk Melez [:myk] [@mykmelez] 11.65 KB, patch	jandem : review+	Details \| Diff \| Splinter Review

Myk Melez [:myk] [@mykmelez]

Assignee

Description

•

9 years ago

SpiderShim currently has a custom implementation of DeflateStringToUTF8Buffer [1], since neither of the existing ones (the public one in CharacterEncoding and a private one in CTypes) does quite what it wants, which is to return a partial result if the destination buffer runs out of space (CharacterEncoding requires the buffer to have enough space [2]) and continue on a bad surrogate (CTypes aborts on a bad surrogate [3]). In order to support SpiderShim, the public DeflateStringToUTF8Buffer in CharacterEncoding should return a partial result if the destination buffer runs out of space. [1] https://github.com/mozilla/spidernode/blob/master/deps/spidershim/src/v8string.cc#L455-L538 [2] https://dxr.mozilla.org/mozilla-central/rev/e5a10b/js/public/CharacterEncoding.h#208-213 [3] https://dxr.mozilla.org/mozilla-central/rev/e5a10b/js/src/ctypes/CTypes.cpp#172-177

Myk Melez [:myk] [@mykmelez]

Assignee

Comment 1

•

9 years ago

I'll refactor the SpiderShim code for this.

Assignee: nobody → myk

Myk Melez [:myk] [@mykmelez]

Assignee

Comment 2

•

9 years ago

Attached patch return partial result and num bytes/chars written (obsolete) — Details — Splinter Review

This patch refactors the SpiderShim implementation to be a compatible replacement for the existing function in CharacterEncoding. It adds two optional parameters, *dstlenp and *numcharsp, which enable the function to write a partial result (up to *dstlenp bytes) and return the number of bytes and chars written (by updating *dstlenp and *numcharsp). (The SpiderShim implementation includes a third parameter to configure the encoder to write an invalid surrogate to the buffer instead of replacing it with the UTF-8 replacement character, but Node doesn't use that feature of the V8 API, so I won't add it here.) This passes the check-spidermonkey tests, although I don't think those exercise this API, at least not directly.

Attachment #8750502 - Flags: review?(terrence)

Myk Melez [:myk] [@mykmelez]

Assignee

Updated

•

9 years ago

Status: NEW → ASSIGNED

Myk Melez [:myk] [@mykmelez]

Assignee

Comment 3

•

9 years ago

Comment on attachment 8750502 [details] [diff] [review] return partial result and num bytes/chars written It looks like Terrence is on PTO, so I'm redirecting this review request to another reviewer who has touched this area of code in the past.

Attachment #8750502 - Flags: review?(terrence) → review?(jdemooij)

Jan de Mooij [:jandem]

Comment 4

•

9 years ago

I can review this patch tomorrow - I didn't take a close look but we should definitely add some jsapi-tests for all interesting (edge) cases here.

Myk Melez [:myk] [@mykmelez]

Assignee

Comment 5

•

9 years ago

(In reply to Jan de Mooij [:jandem] from comment #4) > I can review this patch tomorrow - I didn't take a close look but we should > definitely add some jsapi-tests for all interesting (edge) cases here. I can do so, although there aren't currently tests for DeflateStringToUTF8Buffer, neither for edge cases nor for basic functionality.

Myk Melez [:myk] [@mykmelez]

Assignee

Comment 6

•

9 years ago

Attached patch test DeflateStringToUTF8Buffer (obsolete) — Details — Splinter Review

Here's a jsapi-test for both the existing and the new functionality, including basic tests and tests of various edge cases.

Attachment #8751516 - Flags: review?(jdemooij)

Jan de Mooij [:jandem]

Comment 7

•

9 years ago

Comment on attachment 8750502 [details] [diff] [review] return partial result and num bytes/chars written Review of attachment 8750502 [details] [diff] [review]: ----------------------------------------------------------------- Looks good, but a suggestion below to simplify the code a bit. If you can post an updated patch with that fixed I can r+ quickly. ::: js/src/vm/CharacterEncoding.cpp @@ +73,5 @@ > ? ::GetDeflatedUTF8StringLength(s->latin1Chars(nogc), s->length()) > : ::GetDeflatedUTF8StringLength(s->twoByteChars(nogc), s->length()); > } > > +const char16_t UTF8_REPLACEMENT_CHAR = u'\uFFFD'; According to the following page, VS 2013 doesn't support this syntax yet: https://developer.mozilla.org/en-US/docs/Using_CXX_in_Mozilla_code So to be safe, I'd just use: static const char16_t UTF8_REPLACEMENT_CHAR = 0xFFFD; @@ +98,5 @@ > if (srclen < 1) { > + v = UTF8_REPLACEMENT_CHAR; > + } else { > + char16_t c2 = *src; > + if ((c2 < 0xDC00) || (c2 > 0xDFFF)) { Pre-existing, but please remove the parentheses around the < and > expressions while you're here. (I like extra parentheses when there's potential confusion about the evaluation order, but (x < y || a > b) is common enough that it shouldn't confuse anyone :) @@ +113,5 @@ > if (v < 0x0080) { > /* no encoding necessary - performance hack */ > + if (dstlenp && dstlen == 0) { > + goto finish; > + } Nit: no {} (because each of condition, else-body, then-body fit on a single line). The control-flow and the subtraction at the end seem more complicated than necessary though. What if we do the following: - At the start of this function: if (dstlenp) *dstlenp = 0; Then at the end of the loop body: if (dstlenp) *dstlenp += utf8Len; And to bounds check here: if (dstlenp && *dstlenp + 1 < origDstlen) return; What do you think? That way we always maintain the correct state so we can just return when we're done.

Attachment #8750502 - Flags: review?(jdemooij) → feedback+

Jan de Mooij [:jandem]

Comment 8

•

9 years ago

Comment on attachment 8751516 [details] [diff] [review] test DeflateStringToUTF8Buffer Review of attachment 8751516 [details] [diff] [review]: ----------------------------------------------------------------- Beautiful! Thank you for adding these tests. ::: js/src/jsapi-tests/testDeflateStringToUTF8Buffer.cpp @@ +22,5 @@ > + // Test with an ASCII string, which calls JSFlatString::latin1Chars > + // to retrieve the characters from the string and generates UTF-8 output > + // that is identical to the ASCII input. > + > + str = JS_NewStringCopyZ(cx, "Ohai"); // { 0x4F, 0x68, 0x61, 0x69 } Nit: add a MOZ_RELEASE_ASSERT(str); here and after the JS_New* calls below. (We don't have to handle OOM gracefully in tests but crashing immediately helps if this ever fails.) @@ +98,5 @@ > + // Test with a Latin-1 string, which calls JSFlatString::latin1Chars > + // like with the ASCII string but generates UTF-8 output that is different > + // from the ASCII input. > + > + str = JS_NewUCStringCopyZ(cx, u"Óhãï"); // { 0xD3, 0x68, 0xE3, 0xEF } VS 2013 comment applies here too, you can use MOZ_UTF16("foo") instead. Hopefully we'll drop support for VS 2013 soon.

Attachment #8751516 - Flags: review?(jdemooij) → review+

Myk Melez [:myk] [@mykmelez]

Assignee

Comment 9

•

9 years ago

Attached patch return partial result and num bytes/chars written — Details — Splinter Review

(In reply to Jan de Mooij [:jandem] from comment #7) > ::: js/src/vm/CharacterEncoding.cpp > @@ +73,5 @@ > > ? ::GetDeflatedUTF8StringLength(s->latin1Chars(nogc), s->length()) > > : ::GetDeflatedUTF8StringLength(s->twoByteChars(nogc), s->length()); > > } > > > > +const char16_t UTF8_REPLACEMENT_CHAR = u'\uFFFD'; > > According to the following page, VS 2013 doesn't support this syntax yet: > https://developer.mozilla.org/en-US/docs/Using_CXX_in_Mozilla_code Ah, thanks for the reference, I was unaware of that page! > So to be safe, I'd just use: > > static const char16_t UTF8_REPLACEMENT_CHAR = 0xFFFD; Sounds good, fixed in this patch. > @@ +98,5 @@ > > if (srclen < 1) { > > + v = UTF8_REPLACEMENT_CHAR; > > + } else { > > + char16_t c2 = *src; > > + if ((c2 < 0xDC00) || (c2 > 0xDFFF)) { > > Pre-existing, but please remove the parentheses around the < and > > expressions while you're here. Done! > (I like extra parentheses when there's potential confusion about the > evaluation order, but (x < y || a > b) is common enough that it shouldn't > confuse anyone :) Indeed, I noticed that too but figured I'd resist the urge to make unrelated changes. Thanks for encouraging my inner pedant! ;-) > @@ +113,5 @@ > > if (v < 0x0080) { > > /* no encoding necessary - performance hack */ > > + if (dstlenp && dstlen == 0) { > > + goto finish; > > + } > > Nit: no {} (because each of condition, else-body, then-body fit on a single > line). Right, fixed. > The control-flow and the subtraction at the end seem more complicated than > necessary though. What if we do the following: > > - At the start of this function: > > if (dstlenp) > *dstlenp = 0; > > Then at the end of the loop body: > > if (dstlenp) > *dstlenp += utf8Len; > > And to bounds check here: > > if (dstlenp && *dstlenp + 1 < origDstlen) > return; Here presumably you mean: > if (dstlenp && *dstlenp + 1 > origDstlen) > return; And then the check for the general case would be: > if (dstlenp && *dstlenp + utf8Len > origDstlen) > return; (We could refactor a bit further, setting utf8Len first based on the value of |v| and then consolidating these conditional early returns. But I don't think it'd add clarity.) > What do you think? That way we always maintain the correct state so we can > just return when we're done. I like it! I adapted the original implementation from the DeflateStringToUTF8Buffer function in CTypes.cpp <https://dxr.mozilla.org/mozilla-central/rev/3461f3/js/src/ctypes/CTypes.cpp#124-187>, but I prefer yours, as it maintains the loop invariant that *dstlenp is the number of bytes written to the destination buffer. So I've adopted your implementation in this version of the patch. The only changes I made are the aforementioned correction to the conditionals, s/origDstlen/capacity/ (to make its purpose clearer), and consolidation of the code that initializes capacity and *dstlenp, which is now: > size_t capacity = 0; > if (dstlenp) { > capacity = *dstlenp; > *dstlenp = 0; > }

Attachment #8750502 - Attachment is obsolete: true

Attachment #8751817 - Flags: review?(jdemooij)

Jan de Mooij [:jandem]

Comment 10

•

9 years ago

Comment on attachment 8751817 [details] [diff] [review] return partial result and num bytes/chars written Review of attachment 8751817 [details] [diff] [review]: ----------------------------------------------------------------- Thanks, looks great.

Attachment #8751817 - Flags: review?(jdemooij) → review+

Myk Melez [:myk] [@mykmelez]

Assignee

Comment 11

•

9 years ago

Attached patch test DeflateStringToUTF8Buffer (obsolete) — Details — Splinter Review

(In reply to Jan de Mooij [:jandem] from comment #8) > Nit: add a MOZ_RELEASE_ASSERT(str); here and after the JS_New* calls below. > > (We don't have to handle OOM gracefully in tests but crashing immediately > helps if this ever fails.) Indeed. I added that to this patch. > VS 2013 comment applies here too, you can use MOZ_UTF16("foo") instead. > Hopefully we'll drop support for VS 2013 soon. Right. Fixed in this patch.

Myk Melez [:myk] [@mykmelez]

Assignee

Comment 12

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=bcfacefb3a6a

Myk Melez [:myk] [@mykmelez]

Assignee

Comment 13

•

9 years ago

It looks like one (or more) of the tests is failing on Windows in the try run. I'll build and debug there.

Myk Melez [:myk] [@mykmelez]

Assignee

Comment 14

•

9 years ago

Attached patch test DeflateStringToUTF8Buffer — Details — Splinter Review

(In reply to Myk Melez [:myk] [@mykmelez] from comment #13) > It looks like one (or more) of the tests is failing on Windows in the try > run. I'll build and debug there. The problem on Windows is that the encoding of wide string literals (L"") is undefined in the C++ specification for characters outside the ASCII range, so compilers are free to choose an encoding, and MSVC is choosing a different one. This hasn't been an issue in Mozilla code before because we've only used MOZ_UTF16 with ASCII string literals to date. The fix (until we require MSVC 2015 and get rid of MOZ_UTF16, anyway) is to specify the characters via hexadecimal escape sequences, i.e.: >- str = JS_NewUCStringCopyZ(cx, MOZ_UTF16("Óhãï")); // { 0xD3, 0x68, 0xE3, 0xEF } >+ str = JS_NewUCStringCopyZ(cx, MOZ_UTF16("\xD3\x68\xE3\xEF")); // u"Óhãï" > … >- str = JS_NewUCStringCopyZ(cx, MOZ_UTF16("Όhȃї")); // { 0x038C, 0x0068, 0x0203, 0x0457 } >+ str = JS_NewUCStringCopyZ(cx, MOZ_UTF16("\x038C\x0068\x0203\x0457")); // u"Όhȃї" Here's an updated patch with that change. https://treeherder.mozilla.org/#/jobs?repo=try&revision=73185ae37418

Attachment #8751516 - Attachment is obsolete: true

Attachment #8751827 - Attachment is obsolete: true

Attachment #8752398 - Flags: review?(jdemooij)

Jan de Mooij [:jandem]

Updated

•

9 years ago

Attachment #8752398 - Flags: review?(jdemooij) → review+

Pulsebot

Comment 15

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/6a5fe7a3af40 https://hg.mozilla.org/integration/mozilla-inbound/rev/ea6e8a3c5880

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 16

•

9 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/6a5fe7a3af40 https://hg.mozilla.org/mozilla-central/rev/ea6e8a3c5880

Status: ASSIGNED → RESOLVED

Closed: 9 years ago

status-firefox49: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla49

You need to log in before you can comment on or make changes to this bug.