Closed
Bug 1025875
Opened 10 years ago
Closed 10 years ago
Make more code work with Latin1 strings
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla33
People
(Reporter: jandem, Assigned: jandem)
References
Details
Attachments
(10 files, 2 obsolete files)
7.13 KB,
patch
|
n.nethercote
:
review+
|
Details | Diff | Splinter Review |
1.70 KB,
patch
|
luke
:
review+
|
Details | Diff | Splinter Review |
3.90 KB,
patch
|
n.nethercote
:
review+
|
Details | Diff | Splinter Review |
3.20 KB,
patch
|
bhackett1024
:
review+
|
Details | Diff | Splinter Review |
8.96 KB,
patch
|
Waldo
:
review+
|
Details | Diff | Splinter Review |
1.07 KB,
patch
|
luke
:
review+
|
Details | Diff | Splinter Review |
1.75 KB,
patch
|
n.nethercote
:
review+
|
Details | Diff | Splinter Review |
10.16 KB,
patch
|
Waldo
:
review+
|
Details | Diff | Splinter Review |
12.96 KB,
patch
|
n.nethercote
:
review+
|
Details | Diff | Splinter Review |
8.86 KB,
patch
|
luke
:
review+
|
Details | Diff | Splinter Review |
Will post a number of patches I wrote last week for starting the shell and sunspider without asserting.
Assignee | ||
Comment 1•10 years ago
|
||
Although the parser and TokenStream will still work on jschar, there are some functions like TokenStream::checkForKeyword and IsIdentifier that take JSStrings/JSAtoms that can have Latin1 chars.
Attachment #8440670 -
Flags: review?(n.nethercote)
Assignee | ||
Updated•10 years ago
|
Attachment #8440670 -
Flags: review?(n.nethercote)
Assignee | ||
Comment 2•10 years ago
|
||
There was a problem with the previous patch: reportError etc can GC, so the AutoCheckCannotGC would fail. This patch avoids that problem.
Attachment #8440670 -
Attachment is obsolete: true
Attachment #8440678 -
Flags: review?(n.nethercote)
Assignee | ||
Comment 3•10 years ago
|
||
Very small patch. This is the separator-is-string case; it can also be a regex but I'll deal with that once the regex patches are in.
Attachment #8440687 -
Flags: review?(luke)
Assignee | ||
Comment 4•10 years ago
|
||
Attachment #8440693 -
Flags: review?(n.nethercote)
Assignee | ||
Comment 5•10 years ago
|
||
Attachment #8440698 -
Flags: review?(bhackett1024)
Updated•10 years ago
|
Attachment #8440687 -
Flags: review?(luke) → review+
Updated•10 years ago
|
Attachment #8440678 -
Flags: review?(n.nethercote) → review+
Assignee | ||
Comment 6•10 years ago
|
||
Make the eval("JSON") optimization work. I restructured the code a bit so that we only use AutoStableStringChars if the string looks like JSON. We also skip scanning for 0x2028/0x2029 if the string is Latin1.
Attachment #8440711 -
Flags: review?(jwalden+bmo)
Comment 7•10 years ago
|
||
Comment on attachment 8440693 [details] [diff] [review] Part 3 - CompareStrings Review of attachment 8440693 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jsstr.h @@ -53,2 @@ > { > - size_t n = Min(l1, l2); |l1| and |l2| are terrible identifiers. Thanks for changing them :)
Attachment #8440693 -
Flags: review?(n.nethercote) → review+
Assignee | ||
Comment 8•10 years ago
|
||
I forgot to convert a line in JSRope::flattenInternal where it assumed TwoByte strings. Not adding a testcase as the sunspider jit-tests hit this.
Attachment #8440749 -
Flags: review?(luke)
Assignee | ||
Comment 9•10 years ago
|
||
Assignee | ||
Updated•10 years ago
|
Attachment #8440805 -
Flags: review?(n.nethercote)
Assignee | ||
Comment 10•10 years ago
|
||
Attachment #8440851 -
Flags: review?(jwalden+bmo)
Assignee | ||
Comment 11•10 years ago
|
||
Attachment #8440851 -
Attachment is obsolete: true
Attachment #8440851 -
Flags: review?(jwalden+bmo)
Attachment #8440855 -
Flags: review?(jwalden+bmo)
Assignee | ||
Comment 12•10 years ago
|
||
Attachment #8440906 -
Flags: review?(n.nethercote)
Assignee | ||
Comment 13•10 years ago
|
||
The patch specializes CharSeparatorOp for Latin1Char, to avoid the branch in append(jschar) in the common case. StringSeparatorOp stores a HandleLinearString, as GetElement can GC. I think this is fine; most join calls use a single char as separator. We can optimize this later if needed.
Attachment #8440936 -
Flags: review?(luke)
Comment 14•10 years ago
|
||
Comment on attachment 8440711 [details] [diff] [review] Part 5 - Eval JSON optimization Review of attachment 8440711 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/builtin/Eval.cpp @@ +146,5 @@ > }; > > +template <typename CharT> > +static bool > +EvalStringIsJSON(Range<const CharT> chars) Hurrah for range-checking assertions in all this code! \o/ EvalString*Is*JSON is an overly assertive name, if I read this right. EvalStringMightBeJSON seems better, right? @@ +164,5 @@ > // Rather than force the JSON parser to handle this quirk when used by > // eval, we simply don't use the JSON parser when either character > // appears in the provided string. See bug 657367. > + if (sizeof(CharT) > 1) { > + for (const CharT *cp = &chars[1], *end = &chars[length - 1]; cp < end; cp++) { for (RangedPtr<const Chart> cp = chars.start() + 1, end = chars.end() - 1; cp < end; cp++) I do sort of like the &chars[#] style of code, but that'd require operator[] return something other than CharT, that implicitly converts to CharT. Scope-creepy, and not sure about style sense of it, and then you'd have to overload operator& which is itself dubious, so the above seems better to me. If you think otherwise, feel free to say something and we can tweak operator[] for this case. ::: js/src/jit-test/tests/latin1/json.js @@ +41,5 @@ > + // TwoByte > + arr = eval("[1, 2, 3, \"abc\u1200\"]"); > + assertEq(JSON.stringify(arr), '[1,2,3,"abc\u1200"]'); > +} > +testEvalHack(); Probably worth something like function testEvalHackNotJSON() { // Latin1 var arr = eval(toLatin1("[]; var q; [1, 2, 3, \"abc\"]")); assertEq(JSON.stringify(arr), '[1,2,3,"abc"]'); // TwoByte arr = eval("[]; var z; [1, 2, 3, \"abc\u1200\"]"); assertEq(JSON.stringify(arr), '[1,2,3,"abc\u1200"]'); try { eval("[1, 2, 3, \"abc\u2028\"]"); throw new Error("U+2028 shouldn't eval"); } catch (e) { assertEq(e instanceof SyntaxError, true, "should have thrown a SyntaxError, instead got " + e); } } testEvalHackNotJSON(); too.
Attachment #8440711 -
Flags: review?(jwalden+bmo) → review+
Comment 15•10 years ago
|
||
Comment on attachment 8440855 [details] [diff] [review] Part 8 - JS_EncodeStringToUTF8, print() etc Review of attachment 8440855 [details] [diff] [review]: ----------------------------------------------------------------- Oh, before I forget: it's probably worth making const Range the parameter type, unless we expect to overwrite the pointer/length in the range. (Which seems a very uncommon thing, perhaps not something we should ever do, even if it might seemingly make sense to do it.) I noticed one or two instances of this in the last patch, which would need changing for this. ::: js/src/shell/js.cpp @@ -320,5 @@ > return nullptr; > } > > -static char * > -JSStringToUTF8(JSContext *cx, JSString *str) Woo, duplicate method die die die. ::: js/src/vm/CharacterEncoding.cpp @@ +35,2 @@ > if (c < 0x80) > continue; Probably worth surrounding the whole loop in an |if (sizeof(CharT) > 1)| with an else that does the simple thing. @@ +78,2 @@ > static bool > +DeflateStringToUTF8Buffer(js::ThreadSafeContext *cx, const CharT *src, size_t srclen, Followup to take src/srclen as a Range? @@ +153,5 @@ > if (!utf8) > return UTF8CharsZ(); > > /* Encode to UTF8. */ > + DeflateStringToUTF8Buffer(cx, str, chars.length(), (char *)utf8, &len); I don't think this (char*) is needed. And if it is, followup to remove it.
Attachment #8440855 -
Flags: review?(jwalden+bmo) → review+
Updated•10 years ago
|
Attachment #8440698 -
Flags: review?(bhackett1024) → review+
Updated•10 years ago
|
Attachment #8440805 -
Flags: review?(n.nethercote) → review+
Comment 16•10 years ago
|
||
Comment on attachment 8440906 [details] [diff] [review] Part 9 - eval Review of attachment 8440906 [details] [diff] [review]: ----------------------------------------------------------------- r=me once you fix up maybeTakeOwnership(), see below. ::: js/src/builtin/Eval.cpp @@ +337,5 @@ > + > + const jschar *chars = flatChars.twoByteRange().start().get(); > + SourceBufferHolder::Ownership ownership = flatChars.maybeTakeOwnership() > + ? SourceBufferHolder::GiveOwnership > + : SourceBufferHolder::NoOwnership; maybeTakeOwnership() returns a bool, but you're assigning to a variable with type |Ownership|. Also, I'm having trouble with the name of this function. |flatChars.maybeTakeOwnership()| makes it sound like |flatChars| is taking ownership, but I think it's actually giving up ownership? I find this part confusing. @@ +410,5 @@ > + > + const jschar *chars = flatChars.twoByteRange().start().get(); > + SourceBufferHolder::Ownership ownership = flatChars.maybeTakeOwnership() > + ? SourceBufferHolder::GiveOwnership > + : SourceBufferHolder::NoOwnership; Ditto.
Attachment #8440906 -
Flags: review?(n.nethercote) → review+
Assignee | ||
Comment 17•10 years ago
|
||
Parts 1-3: https://hg.mozilla.org/integration/mozilla-inbound/rev/361aecdb5e5d https://hg.mozilla.org/integration/mozilla-inbound/rev/10a5ed0d867f https://hg.mozilla.org/integration/mozilla-inbound/rev/548ea28a6cd6
Keywords: leave-open
Assignee | ||
Comment 18•10 years ago
|
||
Parts 4, 5, 7, 8: https://hg.mozilla.org/integration/mozilla-inbound/rev/752ad0d02cb3 https://hg.mozilla.org/integration/mozilla-inbound/rev/b4932e874227 https://hg.mozilla.org/integration/mozilla-inbound/rev/56e4788545db https://hg.mozilla.org/integration/mozilla-inbound/rev/212f914cb83a (In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #14) > EvalString*Is*JSON is an overly assertive name, if I read this right. > EvalStringMightBeJSON seems better, right? Agreed, done. > for (RangedPtr<const Chart> cp = chars.start() + 1, end = chars.end() - 1; > cp < end; > cp++) > > I do sort of like the &chars[#] style of code, but that'd require operator[] > return something other than CharT, that implicitly converts to CharT. > Scope-creepy, and not sure about style sense of it, and then you'd have to > overload operator& which is itself dubious, so the above seems better to me. > If you think otherwise, feel free to say something and we can tweak > operator[] for this case. OK I went with the RangedPtr, it seems reasonable. > Probably worth something like > > function testEvalHackNotJSON() { Thanks for the test, I added it to part 9. Can't add it to this patch because eval will assert on non-Latin1, non-JSON strings :) (In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #15) > Oh, before I forget: it's probably worth making const Range the parameter > type, unless we expect to overwrite the pointer/length in the range. (Which > seems a very uncommon thing, perhaps not something we should ever do, even > if it might seemingly make sense to do it.) I noticed one or two instances > of this in the last patch, which would need changing for this. I fixed this in EvalStringMightBeJSON, then after pushing I realized ParseEvalStringAsJSON has the same issue. Will change that one too when I push the other patches. > Probably worth surrounding the whole loop in an |if (sizeof(CharT) > 1)| > with an else that does the simple thing. The code looks like this: jschar c = *chars; if (c < 0x80) continue; if (0xD800 <= c && c <= 0xDFFF) { If *chars is a Latin1Char, the compiler should be smart enough to eliminate the second "if". I just tested this with Clang and it's even vectorizing this loop with SIMD instructions somehow :) > Followup to take src/srclen as a Range? Filed bug 1026490; there's more we can do to clean this up. > I don't think this (char*) is needed. And if it is, followup to remove it. Done.
Updated•10 years ago
|
Attachment #8440749 -
Flags: review?(luke) → review+
Comment 19•10 years ago
|
||
Comment on attachment 8440936 [details] [diff] [review] Part 10 - join Nice
Attachment #8440936 -
Flags: review?(luke) → review+
Assignee | ||
Comment 20•10 years ago
|
||
Parts 6, 9, 10: https://hg.mozilla.org/integration/mozilla-inbound/rev/82ae65ae640f https://hg.mozilla.org/integration/mozilla-inbound/rev/793e644f5d9b https://hg.mozilla.org/integration/mozilla-inbound/rev/56b033bd8c47 (In reply to Nicholas Nethercote [:njn] from comment #16) > > + SourceBufferHolder::Ownership ownership = flatChars.maybeTakeOwnership() > > + ? SourceBufferHolder::GiveOwnership > > + : SourceBufferHolder::NoOwnership; > > maybeTakeOwnership() returns a bool, but you're assigning to a variable with > type |Ownership|. No the bool is used in a ternary that returns an |Ownership|. > Also, I'm having trouble with the name of this function. > |flatChars.maybeTakeOwnership()| makes it sound like |flatChars| is taking > ownership, but I think it's actually giving up ownership? I find this part > confusing. I changed it to maybeGiveOwnershipToCaller. Let me know if you think it's still confusing.
Keywords: leave-open
Assignee | ||
Comment 21•10 years ago
|
||
And thanks for all the quick reviews!
Comment 22•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/361aecdb5e5d https://hg.mozilla.org/mozilla-central/rev/10a5ed0d867f https://hg.mozilla.org/mozilla-central/rev/548ea28a6cd6 https://hg.mozilla.org/mozilla-central/rev/752ad0d02cb3 https://hg.mozilla.org/mozilla-central/rev/b4932e874227 https://hg.mozilla.org/mozilla-central/rev/56e4788545db https://hg.mozilla.org/mozilla-central/rev/212f914cb83a https://hg.mozilla.org/mozilla-central/rev/82ae65ae640f https://hg.mozilla.org/mozilla-central/rev/793e644f5d9b https://hg.mozilla.org/mozilla-central/rev/56b033bd8c47
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
Comment 23•10 years ago
|
||
> I changed it to maybeGiveOwnershipToCaller. Let me know if you think it's
> still confusing.
Much better, thanks!
You need to log in
before you can comment on or make changes to this bug.
Description
•