Closed
Bug 1271014
Opened 9 years ago
Closed 9 years ago
make DeflateStringToUTF8Buffer return partial result if dest buffer runs out of space
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla49
Tracking | Status | |
---|---|---|
firefox49 | --- | fixed |
People
(Reporter: myk, Assigned: myk)
Details
(Whiteboard: [v8api])
Attachments
(2 files, 3 obsolete files)
5.57 KB,
patch
|
jandem
:
review+
|
Details | Diff | Splinter Review |
11.65 KB,
patch
|
jandem
:
review+
|
Details | Diff | Splinter Review |
SpiderShim currently has a custom implementation of DeflateStringToUTF8Buffer [1], since neither of the existing ones (the public one in CharacterEncoding and a private one in CTypes) does quite what it wants, which is to return a partial result if the destination buffer runs out of space (CharacterEncoding requires the buffer to have enough space [2]) and continue on a bad surrogate (CTypes aborts on a bad surrogate [3]).
In order to support SpiderShim, the public DeflateStringToUTF8Buffer in CharacterEncoding should return a partial result if the destination buffer runs out of space.
[1] https://github.com/mozilla/spidernode/blob/master/deps/spidershim/src/v8string.cc#L455-L538
[2] https://dxr.mozilla.org/mozilla-central/rev/e5a10b/js/public/CharacterEncoding.h#208-213
[3] https://dxr.mozilla.org/mozilla-central/rev/e5a10b/js/src/ctypes/CTypes.cpp#172-177
Assignee | ||
Comment 2•9 years ago
|
||
This patch refactors the SpiderShim implementation to be a compatible replacement for the existing function in CharacterEncoding.
It adds two optional parameters, *dstlenp and *numcharsp, which enable the function to write a partial result (up to *dstlenp bytes) and return the number of bytes and chars written (by updating *dstlenp and *numcharsp).
(The SpiderShim implementation includes a third parameter to configure the encoder to write an invalid surrogate to the buffer instead of replacing it with the UTF-8 replacement character, but Node doesn't use that feature of the V8 API, so I won't add it here.)
This passes the check-spidermonkey tests, although I don't think those exercise this API, at least not directly.
Attachment #8750502 -
Flags: review?(terrence)
Assignee | ||
Updated•9 years ago
|
Status: NEW → ASSIGNED
Assignee | ||
Comment 3•9 years ago
|
||
Comment on attachment 8750502 [details] [diff] [review]
return partial result and num bytes/chars written
It looks like Terrence is on PTO, so I'm redirecting this review request to another reviewer who has touched this area of code in the past.
Attachment #8750502 -
Flags: review?(terrence) → review?(jdemooij)
Comment 4•9 years ago
|
||
I can review this patch tomorrow - I didn't take a close look but we should definitely add some jsapi-tests for all interesting (edge) cases here.
Assignee | ||
Comment 5•9 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #4)
> I can review this patch tomorrow - I didn't take a close look but we should
> definitely add some jsapi-tests for all interesting (edge) cases here.
I can do so, although there aren't currently tests for DeflateStringToUTF8Buffer, neither for edge cases nor for basic functionality.
Assignee | ||
Comment 6•9 years ago
|
||
Here's a jsapi-test for both the existing and the new functionality, including basic tests and tests of various edge cases.
Attachment #8751516 -
Flags: review?(jdemooij)
Comment 7•9 years ago
|
||
Comment on attachment 8750502 [details] [diff] [review]
return partial result and num bytes/chars written
Review of attachment 8750502 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good, but a suggestion below to simplify the code a bit. If you can post an updated patch with that fixed I can r+ quickly.
::: js/src/vm/CharacterEncoding.cpp
@@ +73,5 @@
> ? ::GetDeflatedUTF8StringLength(s->latin1Chars(nogc), s->length())
> : ::GetDeflatedUTF8StringLength(s->twoByteChars(nogc), s->length());
> }
>
> +const char16_t UTF8_REPLACEMENT_CHAR = u'\uFFFD';
According to the following page, VS 2013 doesn't support this syntax yet: https://developer.mozilla.org/en-US/docs/Using_CXX_in_Mozilla_code
So to be safe, I'd just use:
static const char16_t UTF8_REPLACEMENT_CHAR = 0xFFFD;
@@ +98,5 @@
> if (srclen < 1) {
> + v = UTF8_REPLACEMENT_CHAR;
> + } else {
> + char16_t c2 = *src;
> + if ((c2 < 0xDC00) || (c2 > 0xDFFF)) {
Pre-existing, but please remove the parentheses around the < and > expressions while you're here.
(I like extra parentheses when there's potential confusion about the evaluation order, but (x < y || a > b) is common enough that it shouldn't confuse anyone :)
@@ +113,5 @@
> if (v < 0x0080) {
> /* no encoding necessary - performance hack */
> + if (dstlenp && dstlen == 0) {
> + goto finish;
> + }
Nit: no {} (because each of condition, else-body, then-body fit on a single line).
The control-flow and the subtraction at the end seem more complicated than necessary though. What if we do the following:
- At the start of this function:
if (dstlenp)
*dstlenp = 0;
Then at the end of the loop body:
if (dstlenp)
*dstlenp += utf8Len;
And to bounds check here:
if (dstlenp && *dstlenp + 1 < origDstlen)
return;
What do you think? That way we always maintain the correct state so we can just return when we're done.
Attachment #8750502 -
Flags: review?(jdemooij) → feedback+
Comment 8•9 years ago
|
||
Comment on attachment 8751516 [details] [diff] [review]
test DeflateStringToUTF8Buffer
Review of attachment 8751516 [details] [diff] [review]:
-----------------------------------------------------------------
Beautiful! Thank you for adding these tests.
::: js/src/jsapi-tests/testDeflateStringToUTF8Buffer.cpp
@@ +22,5 @@
> + // Test with an ASCII string, which calls JSFlatString::latin1Chars
> + // to retrieve the characters from the string and generates UTF-8 output
> + // that is identical to the ASCII input.
> +
> + str = JS_NewStringCopyZ(cx, "Ohai"); // { 0x4F, 0x68, 0x61, 0x69 }
Nit: add a MOZ_RELEASE_ASSERT(str); here and after the JS_New* calls below.
(We don't have to handle OOM gracefully in tests but crashing immediately helps if this ever fails.)
@@ +98,5 @@
> + // Test with a Latin-1 string, which calls JSFlatString::latin1Chars
> + // like with the ASCII string but generates UTF-8 output that is different
> + // from the ASCII input.
> +
> + str = JS_NewUCStringCopyZ(cx, u"Óhãï"); // { 0xD3, 0x68, 0xE3, 0xEF }
VS 2013 comment applies here too, you can use MOZ_UTF16("foo") instead. Hopefully we'll drop support for VS 2013 soon.
Attachment #8751516 -
Flags: review?(jdemooij) → review+
Assignee | ||
Comment 9•9 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #7)
> ::: js/src/vm/CharacterEncoding.cpp
> @@ +73,5 @@
> > ? ::GetDeflatedUTF8StringLength(s->latin1Chars(nogc), s->length())
> > : ::GetDeflatedUTF8StringLength(s->twoByteChars(nogc), s->length());
> > }
> >
> > +const char16_t UTF8_REPLACEMENT_CHAR = u'\uFFFD';
>
> According to the following page, VS 2013 doesn't support this syntax yet:
> https://developer.mozilla.org/en-US/docs/Using_CXX_in_Mozilla_code
Ah, thanks for the reference, I was unaware of that page!
> So to be safe, I'd just use:
>
> static const char16_t UTF8_REPLACEMENT_CHAR = 0xFFFD;
Sounds good, fixed in this patch.
> @@ +98,5 @@
> > if (srclen < 1) {
> > + v = UTF8_REPLACEMENT_CHAR;
> > + } else {
> > + char16_t c2 = *src;
> > + if ((c2 < 0xDC00) || (c2 > 0xDFFF)) {
>
> Pre-existing, but please remove the parentheses around the < and >
> expressions while you're here.
Done!
> (I like extra parentheses when there's potential confusion about the
> evaluation order, but (x < y || a > b) is common enough that it shouldn't
> confuse anyone :)
Indeed, I noticed that too but figured I'd resist the urge to make unrelated changes. Thanks for encouraging my inner pedant! ;-)
> @@ +113,5 @@
> > if (v < 0x0080) {
> > /* no encoding necessary - performance hack */
> > + if (dstlenp && dstlen == 0) {
> > + goto finish;
> > + }
>
> Nit: no {} (because each of condition, else-body, then-body fit on a single
> line).
Right, fixed.
> The control-flow and the subtraction at the end seem more complicated than
> necessary though. What if we do the following:
>
> - At the start of this function:
>
> if (dstlenp)
> *dstlenp = 0;
>
> Then at the end of the loop body:
>
> if (dstlenp)
> *dstlenp += utf8Len;
>
> And to bounds check here:
>
> if (dstlenp && *dstlenp + 1 < origDstlen)
> return;
Here presumably you mean:
> if (dstlenp && *dstlenp + 1 > origDstlen)
> return;
And then the check for the general case would be:
> if (dstlenp && *dstlenp + utf8Len > origDstlen)
> return;
(We could refactor a bit further, setting utf8Len first based on the value of |v| and then consolidating these conditional early returns. But I don't think it'd add clarity.)
> What do you think? That way we always maintain the correct state so we can
> just return when we're done.
I like it! I adapted the original implementation from the DeflateStringToUTF8Buffer function in CTypes.cpp <https://dxr.mozilla.org/mozilla-central/rev/3461f3/js/src/ctypes/CTypes.cpp#124-187>, but I prefer yours, as it maintains the loop invariant that *dstlenp is the number of bytes written to the destination buffer.
So I've adopted your implementation in this version of the patch. The only changes I made are the aforementioned correction to the conditionals, s/origDstlen/capacity/ (to make its purpose clearer), and consolidation of the code that initializes capacity and *dstlenp, which is now:
> size_t capacity = 0;
> if (dstlenp) {
> capacity = *dstlenp;
> *dstlenp = 0;
> }
Attachment #8750502 -
Attachment is obsolete: true
Attachment #8751817 -
Flags: review?(jdemooij)
Comment 10•9 years ago
|
||
Comment on attachment 8751817 [details] [diff] [review]
return partial result and num bytes/chars written
Review of attachment 8751817 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks, looks great.
Attachment #8751817 -
Flags: review?(jdemooij) → review+
Assignee | ||
Comment 11•9 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #8)
> Nit: add a MOZ_RELEASE_ASSERT(str); here and after the JS_New* calls below.
>
> (We don't have to handle OOM gracefully in tests but crashing immediately
> helps if this ever fails.)
Indeed. I added that to this patch.
> VS 2013 comment applies here too, you can use MOZ_UTF16("foo") instead.
> Hopefully we'll drop support for VS 2013 soon.
Right. Fixed in this patch.
Assignee | ||
Comment 12•9 years ago
|
||
Assignee | ||
Comment 13•9 years ago
|
||
It looks like one (or more) of the tests is failing on Windows in the try run. I'll build and debug there.
Assignee | ||
Comment 14•9 years ago
|
||
(In reply to Myk Melez [:myk] [@mykmelez] from comment #13)
> It looks like one (or more) of the tests is failing on Windows in the try
> run. I'll build and debug there.
The problem on Windows is that the encoding of wide string literals (L"") is undefined in the C++ specification for characters outside the ASCII range, so compilers are free to choose an encoding, and MSVC is choosing a different one.
This hasn't been an issue in Mozilla code before because we've only used MOZ_UTF16 with ASCII string literals to date.
The fix (until we require MSVC 2015 and get rid of MOZ_UTF16, anyway) is to specify the characters via hexadecimal escape sequences, i.e.:
>- str = JS_NewUCStringCopyZ(cx, MOZ_UTF16("Óhãï")); // { 0xD3, 0x68, 0xE3, 0xEF }
>+ str = JS_NewUCStringCopyZ(cx, MOZ_UTF16("\xD3\x68\xE3\xEF")); // u"Óhãï"
> …
>- str = JS_NewUCStringCopyZ(cx, MOZ_UTF16("Όhȃї")); // { 0x038C, 0x0068, 0x0203, 0x0457 }
>+ str = JS_NewUCStringCopyZ(cx, MOZ_UTF16("\x038C\x0068\x0203\x0457")); // u"Όhȃї"
Here's an updated patch with that change.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=73185ae37418
Attachment #8751516 -
Attachment is obsolete: true
Attachment #8751827 -
Attachment is obsolete: true
Attachment #8752398 -
Flags: review?(jdemooij)
Updated•9 years ago
|
Attachment #8752398 -
Flags: review?(jdemooij) → review+
Comment 15•9 years ago
|
||
Comment 16•9 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/6a5fe7a3af40
https://hg.mozilla.org/mozilla-central/rev/ea6e8a3c5880
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
status-firefox49:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
You need to log in
before you can comment on or make changes to this bug.
Description
•