Closed Bug 836968 Opened 9 years ago Closed 9 years ago

Speed up JSON parsing and type information when operating on many similar objects


(Core :: JavaScript Engine, defect)

Other Branch
Not set





(Reporter: bhackett1024, Assigned: bhackett1024)




(2 files, 1 obsolete file)

The patterns of property names in JSON objects will typically repeat themselves for a given workload --- JSON encodes XML like data and the encoded objects should be well structured and similar to one another.  Currently, our JSON parser does not take advantage of this, and calls the generic define path for every encountered property.  This define path is slow, and requires lots of roots which will get slower with exact marking.

It would be better if the parser waited until it collected all the properties for an object, then used a cache to quickly compute its final shape, going the define route if the properties are new.  Allocating the object last also means that the object can be sized to the number of properties it will have, improving memory utilization / performance.

Similarly, if arrays are not allocated until all their elements have been parsed then they can be sized to their number of elements.

TI already has a cache for giving initializer objects types according to their properties.  This can be adapted to be used by JSON, and as an added bonus means that the JSON objects will have accurate type information, fixing a long standing performance issue (we currently don't use TI at all with objects that were produced by JSON.parse).

Patches to fix this coming up in a few minutes.  Kraken has a test, json-parse-financial, that pounds on JSON.parse.  These patches improve our score on this benchmark from 136ms to 77ms (v8 is 87ms).
Attached patch JSON changes (obsolete) — Splinter Review
JSON patch.  This restructures the parser to keep its intermediate state in a new data structure rather than in a stack of values, and only allocates the JS objects and arrays when all their contents are known.
Assignee: general → bhackett1024
Attachment #708843 - Flags: review?(jwalden+bmo)
Attached patch TI changesSplinter Review
Changes to allow TI to allocate an object from a list of ids/values, rather than from an object whose shape has already been computed.
Attachment #708844 - Flags: review?(jdemooij)
Comment on attachment 708843 [details] [diff] [review]
JSON changes

Review of attachment 708843 [details] [diff] [review]:

Looks pretty nice.  Mostly I think I want more comments/documentation of how it all fits together before I'll be fully happy with it -- that and a little renaming/slight reordering, which probably wants another look just to be safe.  Sorry for the delay; I'll try to see if I can turn this around faster next time, so you're not waiting forever on review of a smaller set of changes to what I've already seen.

::: js/src/jsapi.h
@@ +1091,5 @@
>          IONALLOC =    -29, /* js::ion::AutoTempAllocatorRooter */
>          WRAPVECTOR =  -30, /* js::AutoWrapperVector */
>          WRAPPER =     -31, /* js::AutoWrapperRooter */
> +        OBJOBJHASHMAP=-32, /* js::AutoObjectObjectHashMap */
> +        JSONPARSER =  -33  /* JSONParser */


::: js/src/jsonparser.cpp
@@ +18,5 @@
>  using mozilla::RangedPtr;
> +JSONParser::~JSONParser()
> +{
> +    for (size_t i = 0; i < stack.length(); i++) {

size_t i = 0, len = stack.length(); i < len; i++

Same sort of thing for all the other loops in the patch, too.

@@ +519,5 @@
> +inline bool
> +JSONParser::finishObject(MutableHandleValue vp)
> +{
> +    PropertyVector &properties = stack.back().properties();

Let's have the caller pass this, seeing as two callers already know it.  (And the last case is for a legacy parsing mode that I'm almost certain we can just remove now.)  Then just assert here that the passed-in properties truly are the back of the stack.

@@ +527,5 @@
> +     * properties.
> +     */
> +    JSObject *obj = NULL;
> +    if (cx->typeInferenceEnabled())
> +        obj = cx->compartment->types.newTypedObject(cx, properties.begin(), properties.length());

It would be nice to make this method take a mozilla::Range so that you don't have to compute separate values here.  Maybe a followup to add a range() method to Vector, that could be used for this?

@@ +534,5 @@
> +        vp.set(ObjectValue(*obj));
> +        freeProperties.append(&properties);
> +        stack.popBack();
> +        return true;
> +    }

This is the same as the tail end of the function, except for it missing a failure-check on the append.  Could you instead make this the contents of finishObject():

JSObject *obj = NULL;
if (cx->TIEnabled() {
    obj = ...;
if (!obj) {
    obj = createFinishedObject(&properties);
    if (!obj)
        return false;

if (!freeProperties.append(&properties))
    return false;
return true;

And then have a separate createFinishedObject() that does all the messy, slow bits separately.

@@ +569,5 @@
> +        cx->compartment->types.fixObjectType(cx, nobj);
> +
> +    vp.setObject(*nobj);
> +    if (!freeProperties.append(&properties))
> +        return false;

We're going to accumulate empty vectors up to the max depth of the JSON structure when summed across freeProperties and freeElements, right?  This kind of worries me, a little.  But I guess let's wait for the real world to complain before deciding to sometimes just delete empty arrays whenever the free lists get too long.

@@ +577,5 @@
> +
> +inline bool
> +JSONParser::finishArray(MutableHandleValue vp)
> +{
> +    ElementVector &elements = stack.back().elements();

Assert this too, and have callers pass in this value.  Again, only the legacy parsing case doesn't have it readily available.

@@ +612,5 @@
> +            properties.back().value = value;
> +
> +            token = advanceAfterProperty();
> +            if (token == ObjectClose) {
> +                if (!finishObject(&value))

With finishObject taking a PropertyVector, you can pass |properties| here.

@@ +631,5 @@
>            JSONMember:
>              if (token == String) {
> +                jsid id = AtomToId(atomValue());
> +                PropertyVector &properties = stack.back().properties();
> +                if (!properties.append(IdValue(id)))

IdValue suggests to me an id that's a value.  I think we even had an IdValue method at one point, to make it even more misleading.

ObjectMember would be a better name, and more in line with the surrounding naming in the parser, but I guess you're trying to use this outside just here.  So that's probably out.  Maybe IdAndValue?  Let's find something that works better here than just IdValue, tho.

@@ +729,3 @@
>                  token = advanceAfterObjectOpen();
> +                if (token == ObjectClose) {
> +                    if (!finishObject(&value))

You can pass properties here.

@@ +747,5 @@
>                       * profile upgrades from 3.6 become unsupported.)  Permit
>                       * such trailing commas only when specifically
>                       * instructed to do so.
>                       */
> +                    if (!finishArray(&value))

Here you'll have to pass &stack.back().elements(), as you don't already have it.

@@ -656,5 @@
>          }
>      }
>      JS_ASSERT(end == current);
> -    JS_ASSERT(valueStack.length() == 1);

You should be able to assert |stack.length() == 0| now, as you've basically combined the two stacks stateStack and valueStack into one.  (Definitely a good change to have less stack pushing/popping happening here, by the way -- should have seen it myself.)  Arguably this is vacuous given the only way out of the while (true) loop is past an empty check.  But given the complexity of this, I think there's little harm, and a little extra fast-reader clarity, to have it nonetheless.

::: js/src/jsonparser.h
@@ +42,5 @@
>                   ObjectOpen, ObjectClose,
>                   Colon, Comma,
>                   OOM, Error };
> +
> +    typedef Vector<Value, 20> ElementVector;

All this stuff here needs documentation comments by it.  Or, probably better, one overview comment for how all the fields/classes work.  Looking back, I probably should have added more comments about this stuff before.  But at least it was all in a single method, so you only had to look at one file to understand it.  With it now spread across header and implementation files, it's starting to get unwieldy.

Parsing the recursive structure of JSON requires tracking information for any array or object being processed,

@@ +45,5 @@
> +
> +    typedef Vector<Value, 20> ElementVector;
> +    typedef Vector<IdValue, 10> PropertyVector;
> +
> +    enum ParserState { FinishArrayElement, FinishObjectMember, JSONValue };

Looking closer, with the new-ish semantics you have here (heck, probably with the old semantics had I the wit to notice it), there's no reason JSONValue has to exist at all.  We should just be able to goto directly to the value parsing bit from the get-go, reducing this to two states.

We should also rename them to InArray and InObject, because the idea of their being related to "finishing" isn't really true any more, with the semantic changes you've made.  Make this a separate patch please, atop the big stuff being done in this bug, to keep the hard parts more easily reviewable.

From there the next obvious step is to maybe convert the switch to if-else or something, but probably best not to string work here along that far -- punt that to a followup on more-stable code.

@@ +49,5 @@
> +    enum ParserState { FinishArrayElement, FinishObjectMember, JSONValue };
> +
> +    struct StackEntry {
> +        ParserState state;
> +        void *vector;

Could you make vector private here so it's only accessible via the asserting accessors?

@@ +53,5 @@
> +        void *vector;
> +
> +        ElementVector &elements() {
> +            JS_ASSERT(state == FinishArrayElement);
> +            return * (ElementVector *) vector;

C++ static_cast<>.

@@ +58,5 @@
> +        }
> +
> +        PropertyVector &properties() {
> +            JS_ASSERT(state == FinishObjectMember);
> +            return * (PropertyVector *) vector;

And here too.

@@ +70,5 @@
> +          : state(FinishObjectMember), vector(properties)
> +        {}
> +    };
> +
> +    Vector<StackEntry, 10> stack;

Stack of object/array properties/elements state for the structure currently being parsed, at each level of nested-structure parsing.  At the top level of parsing this stack.

@@ +73,5 @@
> +
> +    Vector<StackEntry, 10> stack;
> +
> +    Vector<ElementVector*, 5> freeElements;
> +    Vector<PropertyVector*, 5> freeProperties;

These could use a separate comment after them, something like "Previously-used, now-emptied vectors available for use in calculating elements/properties of newly-encountered JSON structures."

@@ +99,4 @@
>          current(data),
>          end((data + length).get(), data.get(), length),
>          parsingMode(parsingMode),
> +        errorHandling(errorHandling),

Probably parsingMode and errorHandling, and the corresponding fields, should move to the end of the class to possibly pack slightly better.
Attachment #708843 - Flags: review?(jwalden+bmo) → review-
Attached patch JSON changesSplinter Review
Updated patch.  I left in JSONValue as it is still needed at the start of the parse and seems preferable to just have the field be uninitialized, though added comments for the various ParserStates.  I think the enums will be int sized so it shouldn't matter much where in the structure they're placed, and there isn't really value in trying to micro-optimize layout for stack allocated structures.
Attachment #708843 - Attachment is obsolete: true
Attachment #723032 - Flags: review?(jwalden+bmo)
Comment on attachment 723032 [details] [diff] [review]
JSON changes

Review of attachment 723032 [details] [diff] [review]:

Good comments; future readers are going to be happy about that.

::: js/src/jsonparser.cpp
@@ +523,5 @@
> +    /*
> +     * Look for an existing cached type and shape for objects with this set of
> +     * properties.
> +     */
> +    if (cx->typeInferenceEnabled()) {

I'd thought we might want this inline in the other method to avoid call overhead when there's a pattern match, but it's totally reasonable waiting to do that until data demonstrates it.
Attachment #723032 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 708844 [details] [diff] [review]
TI changes

Review of attachment 708844 [details] [diff] [review]:

::: js/src/jsinfer.cpp
@@ +3269,5 @@
> +          : properties(properties), nproperties(nproperties), nfixed(nfixed)
> +        {}
> +    };
> +
> +    static inline uint32_t hash(const Lookup &lookup) {

Nit: s/uint32_t/HashNumber

@@ +3341,5 @@
>      if (obj->slotSpan() == 0 || obj->inDictionaryMode() || !obj->hasEmptyElements())
>          return;
> +    Vector<IdValue> properties(cx);
> +    properties.resize(obj->slotSpan());

Do we have to nuke types if resize returns false?
Attachment #708844 - Flags: review?(jdemooij) → review+
And those suspicions turned out to be correct.
Updated patch that doesn't seem to trigger this failure on try.  The earlier patch added a weak shape pointer to ObjectTableEntry but didn't check to see if it was about to be finalized during sweeping so that it could end up pointing to garbage after a GC.
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla22
Depends on: 851807
This seems to have caused bug 851807, currently the #1 topcrasher on the 3-16 nightly.
Depends on: 851806
Backed out from trunk in  The topcrash should be fixed by bug 851635 but I can't push that bug because the trees are a mess right now.
Depends on: 852563
Depends on: 852865
You need to log in before you can comment on or make changes to this bug.