Closed Bug 1025875 Opened 10 years ago Closed 10 years ago

Make more code work with Latin1 strings

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla33

People

(Reporter: jandem, Assigned: jandem)

References

Details

Attachments

(10 files, 2 obsolete files)

7.13 KB, patch
n.nethercote
: review+
Details | Diff | Splinter Review
1.70 KB, patch
luke
: review+
Details | Diff | Splinter Review
3.90 KB, patch
n.nethercote
: review+
Details | Diff | Splinter Review
3.20 KB, patch
bhackett1024
: review+
Details | Diff | Splinter Review
8.96 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
1.07 KB, patch
luke
: review+
Details | Diff | Splinter Review
1.75 KB, patch
n.nethercote
: review+
Details | Diff | Splinter Review
10.16 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
12.96 KB, patch
n.nethercote
: review+
Details | Diff | Splinter Review
8.86 KB, patch
luke
: review+
Details | Diff | Splinter Review
Will post a number of patches I wrote last week for starting the shell and sunspider without asserting.
Attached patch Part 1 - Frontend (obsolete) — Splinter Review
Although the parser and TokenStream will still work on jschar, there are some functions like TokenStream::checkForKeyword and IsIdentifier that take JSStrings/JSAtoms that can have Latin1 chars.
Attachment #8440670 - Flags: review?(n.nethercote)
Attachment #8440670 - Flags: review?(n.nethercote)
There was a problem with the previous patch: reportError etc can GC, so the AutoCheckCannotGC would fail. This patch avoids that problem.
Attachment #8440670 - Attachment is obsolete: true
Attachment #8440678 - Flags: review?(n.nethercote)
Very small patch. This is the separator-is-string case; it can also be a regex but I'll deal with that once the regex patches are in.
Attachment #8440687 - Flags: review?(luke)
Attachment #8440693 - Flags: review?(n.nethercote)
Attachment #8440698 - Flags: review?(bhackett1024)
Attachment #8440687 - Flags: review?(luke) → review+
Attachment #8440678 - Flags: review?(n.nethercote) → review+
Make the eval("JSON") optimization work.

I restructured the code a bit so that we only use AutoStableStringChars if the string looks like JSON. We also skip scanning for 0x2028/0x2029 if the string is Latin1.
Attachment #8440711 - Flags: review?(jwalden+bmo)
Comment on attachment 8440693 [details] [diff] [review]
Part 3 - CompareStrings

Review of attachment 8440693 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsstr.h
@@ -53,2 @@
>  {
> -    size_t n = Min(l1, l2);

|l1| and |l2| are terrible identifiers. Thanks for changing them :)
Attachment #8440693 - Flags: review?(n.nethercote) → review+
I forgot to convert a line in JSRope::flattenInternal where it assumed TwoByte strings. Not adding a testcase as the sunspider jit-tests hit this.
Attachment #8440749 - Flags: review?(luke)
Attachment #8440805 - Flags: review?(n.nethercote)
Attachment #8440851 - Flags: review?(jwalden+bmo)
Attachment #8440851 - Attachment is obsolete: true
Attachment #8440851 - Flags: review?(jwalden+bmo)
Attachment #8440855 - Flags: review?(jwalden+bmo)
Attached patch Part 9 - evalSplinter Review
Attachment #8440906 - Flags: review?(n.nethercote)
Attached patch Part 10 - joinSplinter Review
The patch specializes CharSeparatorOp for Latin1Char, to avoid the branch in append(jschar) in the common case.

StringSeparatorOp stores a HandleLinearString, as GetElement can GC. I think this is fine; most join calls use a single char as separator. We can optimize this later if needed.
Attachment #8440936 - Flags: review?(luke)
Comment on attachment 8440711 [details] [diff] [review]
Part 5 - Eval JSON optimization

Review of attachment 8440711 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Eval.cpp
@@ +146,5 @@
>  };
>  
> +template <typename CharT>
> +static bool
> +EvalStringIsJSON(Range<const CharT> chars)

Hurrah for range-checking assertions in all this code!  \o/

EvalString*Is*JSON is an overly assertive name, if I read this right.  EvalStringMightBeJSON seems better, right?

@@ +164,5 @@
>          // Rather than force the JSON parser to handle this quirk when used by
>          // eval, we simply don't use the JSON parser when either character
>          // appears in the provided string.  See bug 657367.
> +        if (sizeof(CharT) > 1) {
> +            for (const CharT *cp = &chars[1], *end = &chars[length - 1]; cp < end; cp++) {

for (RangedPtr<const Chart> cp = chars.start() + 1, end = chars.end() - 1;
     cp < end;
     cp++)

I do sort of like the &chars[#] style of code, but that'd require operator[] return something other than CharT, that implicitly converts to CharT.  Scope-creepy, and not sure about style sense of it, and then you'd have to overload operator& which is itself dubious, so the above seems better to me.  If you think otherwise, feel free to say something and we can tweak operator[] for this case.

::: js/src/jit-test/tests/latin1/json.js
@@ +41,5 @@
> +    // TwoByte
> +    arr = eval("[1, 2, 3, \"abc\u1200\"]");
> +    assertEq(JSON.stringify(arr), '[1,2,3,"abc\u1200"]');
> +}
> +testEvalHack();

Probably worth something like

function testEvalHackNotJSON() {
  // Latin1
  var arr = eval(toLatin1("[]; var q; [1, 2, 3, \"abc\"]"));
  assertEq(JSON.stringify(arr), '[1,2,3,"abc"]');

  // TwoByte
  arr = eval("[]; var z; [1, 2, 3, \"abc\u1200\"]");
  assertEq(JSON.stringify(arr), '[1,2,3,"abc\u1200"]');

  try {
    eval("[1, 2, 3, \"abc\u2028\"]");
    throw new Error("U+2028 shouldn't eval");
  } catch (e) {
    assertEq(e instanceof SyntaxError, true,
             "should have thrown a SyntaxError, instead got " + e);
  }
}
testEvalHackNotJSON();

too.
Attachment #8440711 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8440855 [details] [diff] [review]
Part 8 - JS_EncodeStringToUTF8, print() etc

Review of attachment 8440855 [details] [diff] [review]:
-----------------------------------------------------------------

Oh, before I forget: it's probably worth making const Range the parameter type, unless we expect to overwrite the pointer/length in the range.  (Which seems a very uncommon thing, perhaps not something we should ever do, even if it might seemingly make sense to do it.)  I noticed one or two instances of this in the last patch, which would need changing for this.

::: js/src/shell/js.cpp
@@ -320,5 @@
>      return nullptr;
>  }
>  
> -static char *
> -JSStringToUTF8(JSContext *cx, JSString *str)

Woo, duplicate method die die die.

::: js/src/vm/CharacterEncoding.cpp
@@ +35,2 @@
>          if (c < 0x80)
>              continue;

Probably worth surrounding the whole loop in an |if (sizeof(CharT) > 1)| with an else that does the simple thing.

@@ +78,2 @@
>  static bool
> +DeflateStringToUTF8Buffer(js::ThreadSafeContext *cx, const CharT *src, size_t srclen,

Followup to take src/srclen as a Range?

@@ +153,5 @@
>      if (!utf8)
>          return UTF8CharsZ();
>  
>      /* Encode to UTF8. */
> +    DeflateStringToUTF8Buffer(cx, str, chars.length(), (char *)utf8, &len);

I don't think this (char*) is needed.  And if it is, followup to remove it.
Attachment #8440855 - Flags: review?(jwalden+bmo) → review+
Attachment #8440698 - Flags: review?(bhackett1024) → review+
Attachment #8440805 - Flags: review?(n.nethercote) → review+
Comment on attachment 8440906 [details] [diff] [review]
Part 9 - eval

Review of attachment 8440906 [details] [diff] [review]:
-----------------------------------------------------------------

r=me once you fix up maybeTakeOwnership(), see below.

::: js/src/builtin/Eval.cpp
@@ +337,5 @@
> +
> +        const jschar *chars = flatChars.twoByteRange().start().get();
> +        SourceBufferHolder::Ownership ownership = flatChars.maybeTakeOwnership()
> +                                                  ? SourceBufferHolder::GiveOwnership
> +                                                  : SourceBufferHolder::NoOwnership;

maybeTakeOwnership() returns a bool, but you're assigning to a variable with type |Ownership|.

Also, I'm having trouble with the name of this function. |flatChars.maybeTakeOwnership()| makes it sound like |flatChars| is taking ownership, but I think it's actually giving up ownership? I find this part confusing.

@@ +410,5 @@
> +
> +        const jschar *chars = flatChars.twoByteRange().start().get();
> +        SourceBufferHolder::Ownership ownership = flatChars.maybeTakeOwnership()
> +                                                  ? SourceBufferHolder::GiveOwnership
> +                                                  : SourceBufferHolder::NoOwnership;

Ditto.
Attachment #8440906 - Flags: review?(n.nethercote) → review+
Parts 4, 5, 7, 8:

https://hg.mozilla.org/integration/mozilla-inbound/rev/752ad0d02cb3
https://hg.mozilla.org/integration/mozilla-inbound/rev/b4932e874227
https://hg.mozilla.org/integration/mozilla-inbound/rev/56e4788545db
https://hg.mozilla.org/integration/mozilla-inbound/rev/212f914cb83a

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #14)
> EvalString*Is*JSON is an overly assertive name, if I read this right. 
> EvalStringMightBeJSON seems better, right?

Agreed, done.

> for (RangedPtr<const Chart> cp = chars.start() + 1, end = chars.end() - 1;
>      cp < end;
>      cp++)
> 
> I do sort of like the &chars[#] style of code, but that'd require operator[]
> return something other than CharT, that implicitly converts to CharT. 
> Scope-creepy, and not sure about style sense of it, and then you'd have to
> overload operator& which is itself dubious, so the above seems better to me.
> If you think otherwise, feel free to say something and we can tweak
> operator[] for this case.

OK I went with the RangedPtr, it seems reasonable.

> Probably worth something like
> 
> function testEvalHackNotJSON() {

Thanks for the test, I added it to part 9. Can't add it to this patch because eval will assert on non-Latin1, non-JSON strings :)

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #15)
> Oh, before I forget: it's probably worth making const Range the parameter
> type, unless we expect to overwrite the pointer/length in the range.  (Which
> seems a very uncommon thing, perhaps not something we should ever do, even
> if it might seemingly make sense to do it.)  I noticed one or two instances
> of this in the last patch, which would need changing for this.

I fixed this in EvalStringMightBeJSON, then after pushing I realized ParseEvalStringAsJSON has the same issue. Will change that one too when I push the other patches.

> Probably worth surrounding the whole loop in an |if (sizeof(CharT) > 1)|
> with an else that does the simple thing.

The code looks like this:

    jschar c = *chars;
    if (c < 0x80)
        continue;
    if (0xD800 <= c && c <= 0xDFFF) {

If *chars is a Latin1Char, the compiler should be smart enough to eliminate the second "if". I just tested this with Clang and it's even vectorizing this loop with SIMD instructions somehow :)

> Followup to take src/srclen as a Range?

Filed bug 1026490; there's more we can do to clean this up.

> I don't think this (char*) is needed.  And if it is, followup to remove it.

Done.
Attachment #8440749 - Flags: review?(luke) → review+
Comment on attachment 8440936 [details] [diff] [review]
Part 10 - join

Nice
Attachment #8440936 - Flags: review?(luke) → review+
Parts 6, 9, 10:

https://hg.mozilla.org/integration/mozilla-inbound/rev/82ae65ae640f
https://hg.mozilla.org/integration/mozilla-inbound/rev/793e644f5d9b
https://hg.mozilla.org/integration/mozilla-inbound/rev/56b033bd8c47

(In reply to Nicholas Nethercote [:njn] from comment #16)
> > +        SourceBufferHolder::Ownership ownership = flatChars.maybeTakeOwnership()
> > +                                                  ? SourceBufferHolder::GiveOwnership
> > +                                                  : SourceBufferHolder::NoOwnership;
> 
> maybeTakeOwnership() returns a bool, but you're assigning to a variable with
> type |Ownership|.

No the bool is used in a ternary that returns an |Ownership|.

> Also, I'm having trouble with the name of this function.
> |flatChars.maybeTakeOwnership()| makes it sound like |flatChars| is taking
> ownership, but I think it's actually giving up ownership? I find this part
> confusing.

I changed it to maybeGiveOwnershipToCaller. Let me know if you think it's still confusing.
Keywords: leave-open
And thanks for all the quick reviews!
> I changed it to maybeGiveOwnershipToCaller. Let me know if you think it's
> still confusing.

Much better, thanks!
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: