Closed Bug 1023778 Opened 10 years ago Closed 10 years ago

JSON parser should handle Latin1 strings

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla33

People

(Reporter: jandem, Assigned: jandem)

References

Details

Attachments

(8 files, 1 obsolete file)

      No description provided.
This class adds JSONParserBase as we discussed. This class holds all state and methods that can be shared between the two encodings.
Attachment #8438316 - Flags: review?(jwalden+bmo)
Attached patch Part 2 - Parameterize JSONParser (obsolete) — Splinter Review
With this patch everything compiles, because we only instantiate JSONParser<jschar>, but we need more work to handle other char types.
Attachment #8438333 - Flags: review?(jwalden+bmo)
Attachment #8438333 - Attachment is obsolete: true
Attachment #8438333 - Flags: review?(jwalden+bmo)
Attachment #8438334 - Flags: review?(jwalden+bmo)
The JSON parser uses AtomizeChars and js_NewStringCopyN, this patch makes these handle Latin1 strings.
Attachment #8438468 - Flags: review?(luke)
This patch doesn't use it yet, but we now compile the JSON parser for Latin1Chars.
Attachment #8438470 - Flags: review?(jwalden+bmo)
AutoStableStringChars gives us access to a string's chars across potential GCs.

Once we allocate strings and chars in the nursery, AutoStableStringChars will have to make a copy of the chars. I think/hope this will be fine; only relatively small strings will have their chars allocated in the nursery, and I think there will be very few places where we need this class.
Attachment #8438473 - Flags: review?(terrence)
Attachment #8438478 - Flags: review?(jwalden+bmo)
Attached patch Part 7 - TestsSplinter Review
Attachment #8438479 - Flags: review?(jwalden+bmo)
Comment on attachment 8438473 [details] [diff] [review]
Part 5 - Add AutoStableStringChars

Review of attachment 8438473 [details] [diff] [review]:
-----------------------------------------------------------------

r=me

I think once we store string data in the nursery, we'll also need to make this a CustomRooted and make ::trace update s_. That will also require updating the analysis, so we should wait on that -- for now this is a perfect placeholder.

::: js/src/vm/String.h
@@ +1072,5 @@
> +    State state_;
> +    bool ownsChars_;
> +
> +  public:
> +    explicit AutoStableStringChars(JSContext *cx, JSLinearString *s)

|explicit| is only required on one-arg constructors. (Although it's fine if we've decided to put it unconditionally on all constructors and I just missed the memo.)
Attachment #8438473 - Flags: review?(terrence) → review+
(In reply to Terrence Cole [:terrence] from comment #9)
> I think once we store string data in the nursery, we'll also need to make
> this a CustomRooted and make ::trace update s_. That will also require
> updating the analysis, so we should wait on that -- for now this is a
> perfect placeholder.

My plan was to have AutoStableStringChars::init* copy the chars to some malloc'ed buffer for nursery-allocated chars, and then never touch the string again (the RootedString is only there to keep it alive). Why would we need the CustomRooted/trace?

> |explicit| is only required on one-arg constructors. (Although it's fine if
> we've decided to put it unconditionally on all constructors and I just
> missed the memo.)

Initially the constructor had one argument and the string was passed to init. Then I moved the argument to the constructor and forgot to remove the explicit, will fix.

Thanks for the quick review!
Comment on attachment 8438316 [details] [diff] [review]
Part 1 - Add JSONParserBase class

Review of attachment 8438316 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Eval.cpp
@@ +173,5 @@
>  
>              if (cp == end) {
>                  bool isArray = (chars[0] == '[');
>                  JSONParser parser(cx, isArray ? chars : chars + 1U, isArray ? length : length - 2,
> +                                  JSONParserBase::NoError);

Why won't JSONParser::NoError still work?  Several places in JSONParser.cpp have the same sort of thing in play.

::: js/src/jsonparser.h
@@ +187,5 @@
> +  public:
> +    /* Public API */
> +
> +    /* Create a parser for the provided JSON data. */
> +    JSONParser(JSContext *cx, JS::ConstTwoByteChars data, size_t length,

It is incredibly strange that TwoByteChars is a Range, yet ConstTwoByteChars is a RangedPtr.  Both should be ranges.  Please fix this in a separate patch.  For now, I'll let this slide.

@@ +192,5 @@
> +               ErrorHandling errorHandling = RaiseError)
> +      : JSONParserBase(cx, errorHandling),
> +        current(data),
> +        begin(data),
> +        end((data + length).get(), data.get(), length)

ConstTwoByteChars being a Range would let you have

  current(data.start()),
  begin(current),
  end(data.end()),
Attachment #8438316 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8438334 [details] [diff] [review]
Part 2 - Parameterize JSONParser

Review of attachment 8438334 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsonparser.cpp
@@ +108,1 @@
>  template<JSONParserBase::StringType ST>

template<>...template<> everywhere.

</meme>

Okay, more seriously, the spacing here should be consistent.
Attachment #8438334 - Flags: review?(jwalden+bmo) → review+
Attachment #8438470 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8438478 [details] [diff] [review]
Part 6 - Make JSON.parse handle Latin1 strings

Review of attachment 8438478 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsapi.cpp
@@ +5683,5 @@
>      AssertHeapIsIdle(cx);
>      CHECK_REQUEST(cx);
>  
>      RootedValue reviver(cx, NullValue());
> +    return ParseJSONWithReviver(cx, mozilla::Range<const jschar>(chars, len), reviver, vp);

Oh hey, you're just fixing this later.  :-)

I wonder why TwoByteChars (holding non-const jschars) isn't a typedef of Range<jschar>, and ConstTwoByteChars (holding const jschars) isn't a typedef of Range<const jschar>.  We should fix that.

::: js/src/json.cpp
@@ +778,4 @@
>                           HandleValue reviver, MutableHandleValue vp)
>  {
>      /* 15.12.2 steps 2-3. */
> +    JSONParser<CharT> parser(cx, chars.start(), chars.length());

Again having ConstTwoByteChars be a Range would make more sense to let you just pass |chars| directly.

@@ +823,4 @@
>      if (!flat)
>          return false;
>  
>      JS::Anchor<JSString *> anchor(flat);

Are we to the point of not scanning the stack these days, I think?  If so, preserving the Rooted above and getting rid of this Anchor seems good.

Or, no, ASSC contains a Rooted.  Is there a way we can have the ensureFlat flow directly into that Rooted or so, somehow?  Musing, mostly.

@@ +832,5 @@
>      RootedValue reviver(cx, args.get(1));
>  
>      /* Steps 2-5. */
> +    if (flatChars.isLatin1()) {
> +        mozilla::Range<const Latin1Char> chars(flatChars.latin1Chars(), flat->length());

Could latin1Chars() return a Range, maybe?

@@ +837,5 @@
> +        return ParseJSONWithReviver(cx, chars, reviver, args.rval());
> +    }
> +
> +    mozilla::Range<const jschar> chars(flatChars.twoByteChars(), flat->length());
> +    return ParseJSONWithReviver(cx, chars, reviver, args.rval());

Would also be nice for twoByteChars to return a Range.
Attachment #8438478 - Flags: review?(jwalden+bmo) → review+
Attachment #8438479 - Flags: review?(jwalden+bmo) → review+
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #11)
> Why won't JSONParser::NoError still work?  Several places in JSONParser.cpp
> have the same sort of thing in play.

It'd have to be JSONParser<CharT>::NoError or JSONParser<jschar>::NoError with the other patches.

> It is incredibly strange that TwoByteChars is a Range, yet ConstTwoByteChars
> is a RangedPtr.  Both should be ranges.  Please fix this in a separate
> patch.  For now, I'll let this slide.

Yes I noticed it too; pretty weird. The other patches rewrite all this code to use Range/RangedPtr directly though.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #13)
> Are we to the point of not scanning the stack these days, I think?  If so,
> preserving the Rooted above and getting rid of this Anchor seems good.
> 
> Or, no, ASSC contains a Rooted.  Is there a way we can have the ensureFlat
> flow directly into that Rooted or so, somehow?  Musing, mostly.

I think it's good to keep the Anchor a little longer until the conservative scanner dies, hopefully soon. I don't think there's a nice way to make the result of ensureFlat flow into that Rooted, because we also need the OOM check...

> Could latin1Chars() return a Range, maybe?
> 
> Would also be nice for twoByteChars to return a Range.

Yes, nice idea.
The JSONParser constructor now takes a Range instead of RangedPtr + length. I also added latin1Range and twoByteRange methods to AutoStableStringChars (I want to keep the jschar*/Latin1Char* methods to avoid .start().get() etc when we don't want a Range).

I also tried to make ConstTwoByteChars a Range instead of RangedPtr, but it requires a bunch of changes everywhere and as it's a pre-existing issue I think this is better done as a follow-up bug.
Attachment #8439101 - Flags: review?(jwalden+bmo)
Depends on: 1017107
(In reply to Jan de Mooij [:jandem] from comment #15)
> (I want to keep the jschar*/Latin1Char* methods to avoid .start().get() etc
> when we don't want a Range).

Actually it feels to me like exposing the data *only* as a Range is nice, in that users will always have both at the ready when needed.  The raw pointer has no real use on its own, only in connection with its extent of validity.

As for .start().get(), I think we need to push RangedPtr down into users a bit harder, so this extra typing isn't needed.  Worth keeping in mind for patches and reviews, I think.

> I also tried to make ConstTwoByteChars a Range instead of RangedPtr, but it
> requires a bunch of changes everywhere and as it's a pre-existing issue I
> think this is better done as a follow-up bug.

Fine by me.
Comment on attachment 8439101 [details] [diff] [review]
Part 8 - Address comments

Review of attachment 8439101 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Eval.cpp
@@ +175,5 @@
>  
>              if (cp == end) {
>                  bool isArray = (chars[0] == '[');
> +                Range<const jschar> jsonChars(isArray ? chars.get() : chars.get() + 1U,
> +                                              isArray ? length : length - 2);

I might do

  auto jsonChars = isArray
                   ? Range<const jschar>(chars.get(), length)
                   : Range<const jschar>(chars.get() + 1U, length - 2);

which seems more readable to me about the pointer/length in use.
Attachment #8439101 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8438468 [details] [diff] [review]
Part 3 - AtomizeChars and js_NewStringCopyN

Review of attachment 8438468 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsstr.cpp
@@ +4293,4 @@
>  {
> +    if (EnableLatin1Strings) {
> +        if (JSFatInlineString::lengthFits<CharT>(n))
> +            return NewFatInlineString<allowGC>(cx, mozilla::Range<CharT>(const_cast<CharT*>(s), n));

Could you instead change NewFatInlineString to take a Range<const CharT>?

@@ +4304,5 @@
> +
> +        JSFlatString *str = js_NewString<allowGC>(cx, news.get(), n);
> +        if (str)
> +            news.forget();
> +        return str;

Same comment about preferring "if (!str) return nullptr;" as other bug.
Attachment #8438468 - Flags: review?(luke) → review+
And 3-8:

https://hg.mozilla.org/integration/mozilla-inbound/rev/4a930d3fe2aa
https://hg.mozilla.org/integration/mozilla-inbound/rev/bc9e05912b9a
https://hg.mozilla.org/integration/mozilla-inbound/rev/43dea8212085
https://hg.mozilla.org/integration/mozilla-inbound/rev/bea335771e45
https://hg.mozilla.org/integration/mozilla-inbound/rev/b1d281ccac3b
https://hg.mozilla.org/integration/mozilla-inbound/rev/30daebca67b3

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #16)
> Actually it feels to me like exposing the data *only* as a Range is nice, in
> that users will always have both at the ready when needed.  The raw pointer
> has no real use on its own, only in connection with its extent of validity.
> 
> As for .start().get(), I think we need to push RangedPtr down into users a
> bit harder, so this extra typing isn't needed.  Worth keeping in mind for
> patches and reviews, I think.

OK, done. We can always add it back if it becomes too verbose.

> Fine by me.

Bug 1025184.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #17)
> I might do
> 
>   auto jsonChars = isArray
>                    ? Range<const jschar>(chars.get(), length)
>                    : Range<const jschar>(chars.get() + 1U, length - 2);
> 
> which seems more readable to me about the pointer/length in use.

Done.

(In reply to Luke Wagner [:luke] from comment #18)
> Could you instead change NewFatInlineString to take a Range<const CharT>?

Done.

> Same comment about preferring "if (!str) return nullptr;" as other bug.

Fixed.
Keywords: leave-open
https://hg.mozilla.org/mozilla-central/rev/090d9d226f63
https://hg.mozilla.org/mozilla-central/rev/c2714755f3f7
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: