If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

Make more code work with Latin1 strings

RESOLVED FIXED in mozilla33

Status

()

Core
JavaScript Engine
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: jandem, Assigned: jandem)

Tracking

unspecified
mozilla33
Points:
---
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(10 attachments, 2 obsolete attachments)

7.13 KB, patch
njn
: review+
Details | Diff | Splinter Review
1.70 KB, patch
luke
: review+
Details | Diff | Splinter Review
3.90 KB, patch
njn
: review+
Details | Diff | Splinter Review
3.20 KB, patch
bhackett
: review+
Details | Diff | Splinter Review
8.96 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
1.07 KB, patch
luke
: review+
Details | Diff | Splinter Review
1.75 KB, patch
njn
: review+
Details | Diff | Splinter Review
10.16 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
12.96 KB, patch
njn
: review+
Details | Diff | Splinter Review
8.86 KB, patch
luke
: review+
Details | Diff | Splinter Review
Will post a number of patches I wrote last week for starting the shell and sunspider without asserting.
Created attachment 8440670 [details] [diff] [review]
Part 1 - Frontend

Although the parser and TokenStream will still work on jschar, there are some functions like TokenStream::checkForKeyword and IsIdentifier that take JSStrings/JSAtoms that can have Latin1 chars.
Attachment #8440670 - Flags: review?(n.nethercote)
(Assignee)

Updated

3 years ago
Attachment #8440670 - Flags: review?(n.nethercote)
Created attachment 8440678 [details] [diff] [review]
Part 1 - Frontend

There was a problem with the previous patch: reportError etc can GC, so the AutoCheckCannotGC would fail. This patch avoids that problem.
Attachment #8440670 - Attachment is obsolete: true
Attachment #8440678 - Flags: review?(n.nethercote)
Created attachment 8440687 [details] [diff] [review]
Part 2 - string.split

Very small patch. This is the separator-is-string case; it can also be a regex but I'll deal with that once the regex patches are in.
Attachment #8440687 - Flags: review?(luke)
Created attachment 8440693 [details] [diff] [review]
Part 3 - CompareStrings
Attachment #8440693 - Flags: review?(n.nethercote)
Created attachment 8440698 [details] [diff] [review]
Part 4 - StringEqualsAscii and EqualStringsHelper
Attachment #8440698 - Flags: review?(bhackett1024)

Updated

3 years ago
Attachment #8440687 - Flags: review?(luke) → review+
Attachment #8440678 - Flags: review?(n.nethercote) → review+
Created attachment 8440711 [details] [diff] [review]
Part 5 - Eval JSON optimization

Make the eval("JSON") optimization work.

I restructured the code a bit so that we only use AutoStableStringChars if the string looks like JSON. We also skip scanning for 0x2028/0x2029 if the string is Latin1.
Attachment #8440711 - Flags: review?(jwalden+bmo)
Comment on attachment 8440693 [details] [diff] [review]
Part 3 - CompareStrings

Review of attachment 8440693 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsstr.h
@@ -53,2 @@
>  {
> -    size_t n = Min(l1, l2);

|l1| and |l2| are terrible identifiers. Thanks for changing them :)
Attachment #8440693 - Flags: review?(n.nethercote) → review+
Created attachment 8440749 [details] [diff] [review]
Part 6 - flattenInternal fix

I forgot to convert a line in JSRope::flattenInternal where it assumed TwoByte strings. Not adding a testcase as the sunspider jit-tests hit this.
Attachment #8440749 - Flags: review?(luke)
Created attachment 8440805 [details] [diff] [review]
Part 7 - Sprinter::putString
(Assignee)

Updated

3 years ago
Attachment #8440805 - Flags: review?(n.nethercote)
Created attachment 8440851 [details] [diff] [review]
Part 8 - JS_EncodeStringToUTF8, print() etc
Attachment #8440851 - Flags: review?(jwalden+bmo)
Created attachment 8440855 [details] [diff] [review]
Part 8 - JS_EncodeStringToUTF8, print() etc
Attachment #8440851 - Attachment is obsolete: true
Attachment #8440851 - Flags: review?(jwalden+bmo)
Attachment #8440855 - Flags: review?(jwalden+bmo)
Created attachment 8440906 [details] [diff] [review]
Part 9 - eval
Attachment #8440906 - Flags: review?(n.nethercote)
Created attachment 8440936 [details] [diff] [review]
Part 10 - join

The patch specializes CharSeparatorOp for Latin1Char, to avoid the branch in append(jschar) in the common case.

StringSeparatorOp stores a HandleLinearString, as GetElement can GC. I think this is fine; most join calls use a single char as separator. We can optimize this later if needed.
Attachment #8440936 - Flags: review?(luke)
Comment on attachment 8440711 [details] [diff] [review]
Part 5 - Eval JSON optimization

Review of attachment 8440711 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Eval.cpp
@@ +146,5 @@
>  };
>  
> +template <typename CharT>
> +static bool
> +EvalStringIsJSON(Range<const CharT> chars)

Hurrah for range-checking assertions in all this code!  \o/

EvalString*Is*JSON is an overly assertive name, if I read this right.  EvalStringMightBeJSON seems better, right?

@@ +164,5 @@
>          // Rather than force the JSON parser to handle this quirk when used by
>          // eval, we simply don't use the JSON parser when either character
>          // appears in the provided string.  See bug 657367.
> +        if (sizeof(CharT) > 1) {
> +            for (const CharT *cp = &chars[1], *end = &chars[length - 1]; cp < end; cp++) {

for (RangedPtr<const Chart> cp = chars.start() + 1, end = chars.end() - 1;
     cp < end;
     cp++)

I do sort of like the &chars[#] style of code, but that'd require operator[] return something other than CharT, that implicitly converts to CharT.  Scope-creepy, and not sure about style sense of it, and then you'd have to overload operator& which is itself dubious, so the above seems better to me.  If you think otherwise, feel free to say something and we can tweak operator[] for this case.

::: js/src/jit-test/tests/latin1/json.js
@@ +41,5 @@
> +    // TwoByte
> +    arr = eval("[1, 2, 3, \"abc\u1200\"]");
> +    assertEq(JSON.stringify(arr), '[1,2,3,"abc\u1200"]');
> +}
> +testEvalHack();

Probably worth something like

function testEvalHackNotJSON() {
  // Latin1
  var arr = eval(toLatin1("[]; var q; [1, 2, 3, \"abc\"]"));
  assertEq(JSON.stringify(arr), '[1,2,3,"abc"]');

  // TwoByte
  arr = eval("[]; var z; [1, 2, 3, \"abc\u1200\"]");
  assertEq(JSON.stringify(arr), '[1,2,3,"abc\u1200"]');

  try {
    eval("[1, 2, 3, \"abc\u2028\"]");
    throw new Error("U+2028 shouldn't eval");
  } catch (e) {
    assertEq(e instanceof SyntaxError, true,
             "should have thrown a SyntaxError, instead got " + e);
  }
}
testEvalHackNotJSON();

too.
Attachment #8440711 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8440855 [details] [diff] [review]
Part 8 - JS_EncodeStringToUTF8, print() etc

Review of attachment 8440855 [details] [diff] [review]:
-----------------------------------------------------------------

Oh, before I forget: it's probably worth making const Range the parameter type, unless we expect to overwrite the pointer/length in the range.  (Which seems a very uncommon thing, perhaps not something we should ever do, even if it might seemingly make sense to do it.)  I noticed one or two instances of this in the last patch, which would need changing for this.

::: js/src/shell/js.cpp
@@ -320,5 @@
>      return nullptr;
>  }
>  
> -static char *
> -JSStringToUTF8(JSContext *cx, JSString *str)

Woo, duplicate method die die die.

::: js/src/vm/CharacterEncoding.cpp
@@ +35,2 @@
>          if (c < 0x80)
>              continue;

Probably worth surrounding the whole loop in an |if (sizeof(CharT) > 1)| with an else that does the simple thing.

@@ +78,2 @@
>  static bool
> +DeflateStringToUTF8Buffer(js::ThreadSafeContext *cx, const CharT *src, size_t srclen,

Followup to take src/srclen as a Range?

@@ +153,5 @@
>      if (!utf8)
>          return UTF8CharsZ();
>  
>      /* Encode to UTF8. */
> +    DeflateStringToUTF8Buffer(cx, str, chars.length(), (char *)utf8, &len);

I don't think this (char*) is needed.  And if it is, followup to remove it.
Attachment #8440855 - Flags: review?(jwalden+bmo) → review+
Attachment #8440698 - Flags: review?(bhackett1024) → review+
Attachment #8440805 - Flags: review?(n.nethercote) → review+
Comment on attachment 8440906 [details] [diff] [review]
Part 9 - eval

Review of attachment 8440906 [details] [diff] [review]:
-----------------------------------------------------------------

r=me once you fix up maybeTakeOwnership(), see below.

::: js/src/builtin/Eval.cpp
@@ +337,5 @@
> +
> +        const jschar *chars = flatChars.twoByteRange().start().get();
> +        SourceBufferHolder::Ownership ownership = flatChars.maybeTakeOwnership()
> +                                                  ? SourceBufferHolder::GiveOwnership
> +                                                  : SourceBufferHolder::NoOwnership;

maybeTakeOwnership() returns a bool, but you're assigning to a variable with type |Ownership|.

Also, I'm having trouble with the name of this function. |flatChars.maybeTakeOwnership()| makes it sound like |flatChars| is taking ownership, but I think it's actually giving up ownership? I find this part confusing.

@@ +410,5 @@
> +
> +        const jschar *chars = flatChars.twoByteRange().start().get();
> +        SourceBufferHolder::Ownership ownership = flatChars.maybeTakeOwnership()
> +                                                  ? SourceBufferHolder::GiveOwnership
> +                                                  : SourceBufferHolder::NoOwnership;

Ditto.
Attachment #8440906 - Flags: review?(n.nethercote) → review+
Parts 1-3:

https://hg.mozilla.org/integration/mozilla-inbound/rev/361aecdb5e5d
https://hg.mozilla.org/integration/mozilla-inbound/rev/10a5ed0d867f
https://hg.mozilla.org/integration/mozilla-inbound/rev/548ea28a6cd6
Keywords: leave-open
Parts 4, 5, 7, 8:

https://hg.mozilla.org/integration/mozilla-inbound/rev/752ad0d02cb3
https://hg.mozilla.org/integration/mozilla-inbound/rev/b4932e874227
https://hg.mozilla.org/integration/mozilla-inbound/rev/56e4788545db
https://hg.mozilla.org/integration/mozilla-inbound/rev/212f914cb83a

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #14)
> EvalString*Is*JSON is an overly assertive name, if I read this right. 
> EvalStringMightBeJSON seems better, right?

Agreed, done.

> for (RangedPtr<const Chart> cp = chars.start() + 1, end = chars.end() - 1;
>      cp < end;
>      cp++)
> 
> I do sort of like the &chars[#] style of code, but that'd require operator[]
> return something other than CharT, that implicitly converts to CharT. 
> Scope-creepy, and not sure about style sense of it, and then you'd have to
> overload operator& which is itself dubious, so the above seems better to me.
> If you think otherwise, feel free to say something and we can tweak
> operator[] for this case.

OK I went with the RangedPtr, it seems reasonable.

> Probably worth something like
> 
> function testEvalHackNotJSON() {

Thanks for the test, I added it to part 9. Can't add it to this patch because eval will assert on non-Latin1, non-JSON strings :)

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #15)
> Oh, before I forget: it's probably worth making const Range the parameter
> type, unless we expect to overwrite the pointer/length in the range.  (Which
> seems a very uncommon thing, perhaps not something we should ever do, even
> if it might seemingly make sense to do it.)  I noticed one or two instances
> of this in the last patch, which would need changing for this.

I fixed this in EvalStringMightBeJSON, then after pushing I realized ParseEvalStringAsJSON has the same issue. Will change that one too when I push the other patches.

> Probably worth surrounding the whole loop in an |if (sizeof(CharT) > 1)|
> with an else that does the simple thing.

The code looks like this:

    jschar c = *chars;
    if (c < 0x80)
        continue;
    if (0xD800 <= c && c <= 0xDFFF) {

If *chars is a Latin1Char, the compiler should be smart enough to eliminate the second "if". I just tested this with Clang and it's even vectorizing this loop with SIMD instructions somehow :)

> Followup to take src/srclen as a Range?

Filed bug 1026490; there's more we can do to clean this up.

> I don't think this (char*) is needed.  And if it is, followup to remove it.

Done.

Updated

3 years ago
Attachment #8440749 - Flags: review?(luke) → review+

Comment 19

3 years ago
Comment on attachment 8440936 [details] [diff] [review]
Part 10 - join

Nice
Attachment #8440936 - Flags: review?(luke) → review+
Parts 6, 9, 10:

https://hg.mozilla.org/integration/mozilla-inbound/rev/82ae65ae640f
https://hg.mozilla.org/integration/mozilla-inbound/rev/793e644f5d9b
https://hg.mozilla.org/integration/mozilla-inbound/rev/56b033bd8c47

(In reply to Nicholas Nethercote [:njn] from comment #16)
> > +        SourceBufferHolder::Ownership ownership = flatChars.maybeTakeOwnership()
> > +                                                  ? SourceBufferHolder::GiveOwnership
> > +                                                  : SourceBufferHolder::NoOwnership;
> 
> maybeTakeOwnership() returns a bool, but you're assigning to a variable with
> type |Ownership|.

No the bool is used in a ternary that returns an |Ownership|.

> Also, I'm having trouble with the name of this function.
> |flatChars.maybeTakeOwnership()| makes it sound like |flatChars| is taking
> ownership, but I think it's actually giving up ownership? I find this part
> confusing.

I changed it to maybeGiveOwnershipToCaller. Let me know if you think it's still confusing.
Keywords: leave-open
And thanks for all the quick reviews!
https://hg.mozilla.org/mozilla-central/rev/361aecdb5e5d
https://hg.mozilla.org/mozilla-central/rev/10a5ed0d867f
https://hg.mozilla.org/mozilla-central/rev/548ea28a6cd6
https://hg.mozilla.org/mozilla-central/rev/752ad0d02cb3
https://hg.mozilla.org/mozilla-central/rev/b4932e874227
https://hg.mozilla.org/mozilla-central/rev/56e4788545db
https://hg.mozilla.org/mozilla-central/rev/212f914cb83a
https://hg.mozilla.org/mozilla-central/rev/82ae65ae640f
https://hg.mozilla.org/mozilla-central/rev/793e644f5d9b
https://hg.mozilla.org/mozilla-central/rev/56b033bd8c47
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
> I changed it to maybeGiveOwnershipToCaller. Let me know if you think it's
> still confusing.

Much better, thanks!
You need to log in before you can comment on or make changes to this bug.