Closed Bug 1580246 Opened 5 years ago Closed 5 years ago

Investigate removal of singleton object run-once optimization

Categories: (Core :: JavaScript Engine, task, P1)

Tracking: RESOLVED FIXED in mozilla72
Tracking Status: firefox72 --- fixed

People: (Reporter: cfallin, Assigned: cfallin)

Attachments: (2 files, 2 obsolete files)

As per :tcampbell, this optimization blocks the adoption of a new frontend parser, because it mutates the parse output. We'd prefer the parser result (including literal objects built by the script, or templates or code to create them) to be immutable.

The current patch works by generating bytecode (JSOP_NEWINIT plus property-addition ops) instead of a literal object (JSOP_NEWOBJECT) when the object is in singleton context. The JSOP_NEWINIT opcode's arg (previously an unused extra 32 bits, kept so that sizeof(NEWINIT) == sizeof(NEWOBJECT)) is repurposed as a 1-bit singleton-context hint, plus a hint for the number of property slots (otherwise the new object gets the generic default size, and some object-size tests fail due to the increased memory).

Raptor-on-try runs:

With change
Without change

It looks like we have a significant negative impact on page-load times still with the current form of this change: e.g., looking at TP6-1, YouTube pageload goes from 583ms to 608ms (+4.3%). Others not quite as severe, but that YouTube degradation is no good... Will look further into where the degradation comes from.

Confirmed that the objects created with patch applied are still singleton-group objects (via printf in js::NewObjectOperation, comparing baseline to build with patch). I've been playing with the Gecko Profiler but it's difficult to do a differential comparison between the two runs. My working hypothesis is that we're simply seeing the cost of the additional bytecode execution throughput -- surely it's more expensive to dispatch a bunch of property-store ops from the interpreter than directly from the ObjectEmitter?

If that's the case, then I can think of a few different approaches. We could design a very lightweight mini-bytecode just for object literals, then emit this from ObjectEmitter and quickly instantiate it into a new object in a nice optimized tight loop at runtime. We could also eat the cost simultaneously to a transition to a new parser that requires immutable parse results, assuming the new parser is faster to a degree that it would still be a win overall. Other thoughts?
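The "very lightweight mini-bytecode" idea above might look roughly like the following toy sketch (all names here are invented for illustration, not SpiderMonkey API): a writer packs (key index, constant value) pairs into a flat byte buffer at emit time, and instantiation replays the buffer in one tight loop, with no per-property interpreter dispatch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Hypothetical mini-bytecode writer: each "instruction" is a packed
// (keyIndex, value) pair appended to a byte buffer.
class MiniLiteralWriter {
 public:
  void addProperty(uint32_t keyIndex, double value) {
    append(&keyIndex, sizeof(keyIndex));
    append(&value, sizeof(value));
  }
  const std::vector<uint8_t>& bytes() const { return code_; }

 private:
  void append(const void* p, size_t n) {
    const uint8_t* b = static_cast<const uint8_t*>(p);
    code_.insert(code_.end(), b, b + n);
  }
  std::vector<uint8_t> code_;
};

// Instantiate the literal in one tight loop -- this stands in for the
// hoped-for win over dispatching NEWINIT + INITPROP*n in the interpreter.
std::map<std::string, double> instantiateLiteral(
    const std::vector<uint8_t>& code, const std::vector<std::string>& atoms) {
  std::map<std::string, double> obj;
  size_t pc = 0;
  while (pc < code.size()) {
    uint32_t keyIndex;
    double value;
    std::memcpy(&keyIndex, &code[pc], sizeof(keyIndex));
    pc += sizeof(keyIndex);
    std::memcpy(&value, &code[pc], sizeof(value));
    pc += sizeof(value);
    obj[atoms[keyIndex]] = value;
  }
  return obj;
}
```

The key property is that the per-field cost is a memcpy and a map store, not an opcode dispatch.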

I wonder, could we emit an opcode which has the starting and ending position of the object in the source and use a “JSON” parser when the opcode is executed?

You're probably right that the interpreter dispatch loop is adding too much overhead to object initialization. Nicolas' idea is interesting -- it's basically turning literal creation into a single large operation that takes the object structure as an immediate operand. There may be issues with that approach if it turns out that we have some implicit, not-well-known constraints on the size of bytecode instructions, but those should be fixable.

Attachment #9091790 - Attachment description: Bug 1580246: Remove singleton object literals. WIP. → Bug 1580246: remove singleton object literals.

Updated patch and did another raptor run -- here's the Perfherder comparison: link. This version is a little better -- it uses NEWOBJECT (not OBJECT) rather than NEWINIT when possible, which uses the shape that was derived during the parse object build rather than simply the slot count. The YouTube perf hit is now ~2%, but there are several other not-great impacts (e.g. Outlook at 9.5% worse).

Reusing the JSON parser is an interesting thought. I had a similar thought but a little more barebones -- basically record a series of constant values and slot numbers, along with the shape, so the object alloc/init is a fast loop that's just stashing values into memory. Interestingly this actually applies outside of singleton context now too, so we potentially have wins, not just reductions of perf degradation...

The JSON parser couldn't be reused, because it handles Latin-1 and UTF-16 -- not UTF-8 and UTF-16, as the parser requires. JSON also has different semantics from normal object literal syntax in certain cases. And tokenizing and parsing in general is a lot of work -- you'd be recreating all those things, like string parsing, that can be cached if you do something faintly like what we do now, just not quite as bad.

The barebones idea seems pretty clearly like the right tack here.

Potential upsides of a simple object-creation-IR is that it would be:

  • Easier to integrate a potential new frontend
  • Could be used to replace the 'template objects' of the JITs, which are ad-hoc objects that carry information (such as target group and #slots) through the JITs.
  • Could be used to represent ECMAScript field-initializers that we'd currently use some sort of lambda expression for.

(I don't know if this motivates such a design immediately or not, but they are interesting future tie-ins)

I want to state that the initial goal of Brian Hackett was to avoid creating the object twice, as it would consume memory, and some of these objects are Huge.

Having any form of persistent representation will consume extra memory in addition to the object created at the execution.

To avoid memory regressions, these run-once opcodes should probably discard their intermediate representation once executed.

(In reply to Jeff Walden [:Waldo] from comment #7)

JSON has different semantics […]

This is why I added “quotes” in comment 4. However, as we do not discard the source, we have the ability to reuse it as much as we want.

Priority: -- → P1

(In reply to Nicolas B. Pierron [:nbp] from comment #9)

I want to state that the initial goal of Brian Hackett was to avoid creating the object twice, as it would consume memory, and some of these objects are Huge.

Having any form of persistent representation will consume extra memory in addition to the object created at the execution.

To avoid memory regressions, these run-once opcodes should probably discard their intermediate representation once executed.

The need to reduce memory overhead is definitely noted. I suppose I'm trying to reconcile the various requirements -- (a) completely immutable parser output (so no stealing pieces to use directly by the script), (b) low memory (so holding an extra object in addition to the runtime result is undesirable), and (c) fast construction (so we don't regress on pageload times).

Our options are:

  1. JSOP_OBJECT, with object built by parser and stolen by script as it runs at most once. Ideal for (b) and (c) (as low overhead as possible -- the object has to be built once anyway) but fails requirement (a).

  2. JSOP_OBJECT with cloneSingletons flag set. Fits (a) but fails (b) (two copies of object) and (c) (deep clone is slow).

  3. JSOP_NEWOBJECT, with a template object (shape but no values) built by parser and used to create the new object at runtime, then bytecode to fill in the values. This is what had a ~2% pageload regression on YouTube and 9.5% on Outlook above. So fits hard requirement (a) but fails (c). Memory is less than in option 2 (the template object has no values) but we still have the constant field values encoded as a bunch of bytecode snippets, which is less than ideal.

  4. JSOP_NEWINIT (new {} object) and bytecode. Strictly worse than option 3 because it retains the bytecode-execution overhead but also has to re-derive the shape.

  5. some new IR (IR may be over-selling it: just a list of field slots/shapes and values). Meets requirement (a), hopefully faster than option 3 (stream through list of memory pokes, rather than dispatch bytecode), and less memory than options 3 or 4 (list is more compact than bytecode).

Note also that options 2, 3, 4, 5 also work outside of singleton context, i.e., this could accelerate a loop that builds lots of constant-field object literals, too.

It seems to me that option 5 is the least bad; we're forced to avoid the current design (option 1) as long as requirement (a) exists.

As for memory use reduction and run-once scripts: from first principles, this is a GC semantics question, in the sense that we could inform the GC that part of a script is dead once it's executed and won't execute again, true? This seems more general than just object literals: I would imagine there's more data on the script object that could be freed as soon as we pass a certain PC in the bytecode. (Sort of like a marathon race where the start of the course is cleaned and reopened as the runners are still on the latter half.) Factoring out that idea might allow the script (parse result) to be immutable in principle, but partially reclaimable in practice if there's memory pressure... just a thought.

Attachment #9091790 - Attachment description: Bug 1580246: remove singleton object literals. → Bug 1580246: remove singleton object literals. WIP.

An update: I built a prototype of something like option 5 (diff here on Phab) -- no Raptor run as there are some issues with a full Firefox build, but a JS shell run with a little test that evals an object literal in a loop shows a bit too much overhead (~20% vs. JSOP_OBJECT) so I'm trying to understand more where the overhead is coming from. As per conversation with Ted today, something like nbp's idea of a lazy literal parse is interesting medium-to-long-term, but I'll try to get something working in the shorter term as well. I'm planning to try something lighter-weight than the ObjectFactory / IR-like approach -- perhaps just use JSOP_NEWOBJECT together with some sort of batching for the field value initialization (a put-these-N-constant-values-in-these-N-slots opcode).

Attachment #9091790 - Attachment description: Bug 1580246: remove singleton object literals. WIP. → Bug 1580246: Remove singleton object run-once optimization by emitting NEWOBJECT instead of OBJECT opcode.

Regarding storing information twice: with the planned JSScript/LazyScript merge it will likely become possible to discard bytecode (and related data structures) for top-level scripts so this will be less of a concern then.

I was unsure of the reliability of the earlier try runs so I ran another here -- does anyone know how the confidence scores are computed and how much, err, confidence I should put into the confidence? (I looked briefly at the options and saw that number of experiment repetitions is a knob -- but it's already at 10 and I don't want to go overkill on testing resources). In any case, the scores moved a bit in both directions, but we still see ~1-2% increase in pageload (sometimes more) in a bunch of cases.

I instrumented the object allocation (JSOP_NEWOBJECT/JSOP_OBJECT) and found that overall, we are not talking about a large number of singleton objects. E.g., browsing to YouTube manually and scrolling a bit, I see 8956 object allocations, but only 96 are in singleton context. This seems to suggest (to me anyway) that the actual perf issue might have more to do with TI and/or object layout -- there's something different about the created object -- than the overhead at allocation time. (Thoughts on that?) I'll poke at this more today.

Blocks: 1584646

Update after more discussion with folks (Matt, Ted, Kannan, Iain) -- the first step here will be to isolate the object allocation as a separate GC-using action at the end of parse, as part of Matt's work on bug 1584646. That removes the more near-term blockage for the parser work, and also gives us a separate object-factory thing that we can then experiment with using more (e.g., maybe for every object literal allocation, even inside loops).

Attachment #9098397 - Attachment description: Bug 1580246: WIP. → Bug 1580246: WIP. Works for all tiers but Ion (run test case with --no-ion).
Attachment #9098397 - Attachment description: Bug 1580246: WIP. Works for all tiers but Ion (run test case with --no-ion). → Bug 1580246: WIP.
Attachment #9098397 - Attachment description: Bug 1580246: WIP. → Bug 1580246: Remove object-literal singleton objects allocated at parse. r=tcampbell,djvj

Here's a Raptor comparison of the latest patch vs. its baseline revision: link. I respawned each tp6 job until I had 6 runs per tp6-{1..10}. The confidence scores are still a bit spread out despite the repetitions, but the high- and medium-confidence results are all 1-5% perf improvement (pageload decrease) with no regressions, and even counting low-confidence/high-variance results, there's only one +5% and one +1% pageload result. Overall looks pretty good IMHO -- other thoughts?

The latest patch actually replaces all "compatible" object and array allocations: these are pure-constant objects (field values are not computed expressions or other objects) with no other exotic things like property spreads. This removes JSOP_OBJECT with the run-once "please steal this object off the script" semantics, but also removes JSOP_NEWOBJECT with template objects completely. The optimization where NEWINIT (new {} empty object) + bytecode was previously replaced by a NEWOBJECT is now gone, but that was a smaller win anyway because it only gave the shape right away -- the values still had to be filled in with extra bytecode. In any case, the perf results seem to indicate that overall this is both a simplicity win and a performance win.

Some additional try runs show a regression on Speedometer (77.22 --> 71.63) and Octane (34155 --> 27604) -- note that this is N=1 (additional runs are taking a bit to run on Taskcluster at the moment) but we do seem to have some real signal here. Octane regression is all in splay.js, which allocates an object literal for every splay-tree node (source). We suspect that this regression might have to do with no longer having COW-array functionality. Unsure about the Speedometer case. In any case, after some discussion with Ted, the short-term plan is to try a version of the patch that uses JSOP_OBJLITERAL only in run-once (singleton) contexts, where JSOP_OBJECT used to be used, and let the frontend emit a NEWINIT / INITPROP*n sequence otherwise (without patch, this would have been NEWOBJECT, but we don't build the template object in the frontend anymore either). Longer-term, we can consider redefining JSOP_OBJECT semantics from "please just steal this object, you'll only run once" to "please nicely clone this object and leave the original for others to enjoy", and then transmuting JSOP_OBJLITERAL into JSOP_OBJECT with the canonical source object at some point after parse (say, at "GC publish" time, or when tiering up to Baseline or Ion).

Latest runs with above change: TP6 shows just one high-confidence result, a 3.47% pageload improvement (reduction) on LinkedIn, and a few medium-confidence results too (2-3% improvement on Yahoo Mail and Imgur), with the rest of the numbers low-confidence deltas (mostly ~0-1% improvements though).

Speedometer: -1.47% impact. This is not perfect, but as per conversation with :djvj, probably acceptable given that this patch blocks an immediate need (GC-free parsing).

Octane reduces performance from 34180 to 28291 (-17.2%), but this is all from splay.js, which creates object literals in hot paths of the splay tree setup. Our hypothesis is that the removal of template objects from the parser in this patch is causing this. In theory, this could be addressed in a followup by using JSOP_OBJLITERAL in a different mode where it only specifies a template object (no field values), then using this where JSOP_NEWOBJECT is used today. As per conversation with :djvj this morning, though, we don't want to block on this single benchmark for landing this patch first.

Update: I was informed that even the 1.x% degradation on Speedometer might be an issue, so we're working to reduce this (ideally to zero impact).

I made another attempt at things, using JSOP_OBJLITERAL to build a template object where pre-patch we would have used JSOP_NEWOBJECT (and previous to this latest change, we were instead using JSOP_NEWINIT). It turns out that this seems to do even worse: another try run shows Speedometer at ~68 (baseline is 77), or -11% impact. I'm now working on understanding why.

I've updated the patch to take a new approach. Instead of adding an OBJLITERAL opcode, and implementing this in C++ interp, BL interp, BL compiler, and Ion in a way that has to be ~as performant as existing OBJECT/NEWOBJECT, the patch now uses the ObjLiteralWriter to store descriptions of objects that the parser would have allocated (either as templates or as run-once singletons) and then allocates these right at the end of parse. The intent is to slot into the GC-free parse work neatly. Another patch at a later time can work out the performance implications of carrying the object-literal ops into the backend in some other form later.

Performance is unchanged now w.r.t. the unpatched baseline: TP6 (all low-confidence deltas, i.e. in the noise), Speedometer (-0.12% with low confidence in delta), and Octane (-0.28% with low confidence in delta).

Attachment #9098397 - Attachment description: Bug 1580246: Remove object-literal singleton objects allocated at parse. r=tcampbell,djvj → Bug 1580246: Remove object-literal singleton objects allocated at parse. r=mgaudet

Depends on D47985

Attachment #9105427 - Attachment is obsolete: true
Pushed by mgaudet@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8e192f9c8f84
Remove object-literal singleton objects allocated at parse. r=djvj,mgaudet
Attachment #9091790 - Attachment is obsolete: true

Backed out for build bustages on BytecodeSection.h

backout: https://hg.mozilla.org/integration/autoland/rev/e38292f13cdf83367480cd192dd8e197d0a40ea9

push: https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&revision=8e192f9c8f844623a1b0a2d96b0c8ca22543f183&searchStr=build&selectedJob=274455092

failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=274455092&repo=autoland&lineNumber=5187

[task 2019-11-04T18:44:16.461Z] 18:44:16 INFO - In file included from /builds/worker/workspace/build/src/js/src/frontend/BytecodeControlStructures.h:17:
[task 2019-11-04T18:44:16.461Z] 18:44:16 ERROR - /builds/worker/workspace/build/src/js/src/frontend/BytecodeSection.h:78:51: error: Type 'js::ObjLiteralCreationData' must not be used as parameter
[task 2019-11-04T18:44:16.463Z] 18:44:16 INFO - MOZ_MUST_USE bool append(ObjLiteralCreationData objlit, uint32_t* index) {
[task 2019-11-04T18:44:16.464Z] 18:44:16 INFO - ^
[task 2019-11-04T18:44:16.464Z] 18:44:16 INFO - /builds/worker/workspace/build/src/js/src/frontend/BytecodeSection.h:78:51: note: Please consider passing a const reference instead
[task 2019-11-04T18:44:16.465Z] 18:44:16 INFO - /builds/worker/workspace/build/src/js/src/frontend/ObjLiteral.h:395:20: note: 'js::ObjLiteralCreationData' is a non-param type because member 'writer_' is a non-param type 'js::ObjLiteralWriter'
[task 2019-11-04T18:44:16.465Z] 18:44:16 INFO - ObjLiteralWriter writer_;
[task 2019-11-04T18:44:16.466Z] 18:44:16 INFO - ^
[task 2019-11-04T18:44:16.466Z] 18:44:16 INFO - /builds/worker/workspace/build/src/js/src/frontend/ObjLiteral.h:161:8: note: 'js::ObjLiteralWriter' is a non-param type because it inherits from a non-param type 'js::ObjLiteralWriterBase'
[task 2019-11-04T18:44:16.466Z] 18:44:16 INFO - struct ObjLiteralWriter : private ObjLiteralWriterBase {
[task 2019-11-04T18:44:16.466Z] 18:44:16 INFO - ^
[task 2019-11-04T18:44:16.466Z] 18:44:16 INFO - /builds/worker/workspace/build/src/js/src/frontend/ObjLiteral.h:110:23: note: 'js::ObjLiteralWriterBase' is a non-param type because member 'code_' is a non-param type 'Vector<uint8_t, 64>' (aka 'Vector<unsigned char, 64UL, js::TempAllocPolicy>')
[task 2019-11-04T18:44:16.466Z] 18:44:16 INFO - Vector<uint8_t, 64> code_;
[task 2019-11-04T18:44:16.466Z] 18:44:16 INFO - ^
[task 2019-11-04T18:44:16.467Z] 18:44:16 INFO - 1 error generated.
[task 2019-11-04T18:44:16.467Z] 18:44:16 INFO - /builds/worker/workspace/build/src/config/rules.mk:787: recipe for target 'Unified_cpp_js_src_frontend2.o' failed
[task 2019-11-04T18:44:16.467Z] 18:44:16 ERROR - make[4]: *** [Unified_cpp_js_src_frontend2.o] Error 1
[task 2019-11-04T18:44:16.467Z] 18:44:16 INFO - make[4]: Leaving directory '/builds/worker/workspace/build/src/obj-firefox/js/src/frontend'
[task 2019-11-04T18:44:16.467Z] 18:44:16 INFO - make[4]: *** Waiting for unfini

Flags: needinfo?(cfallin)
Pushed by mgaudet@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/78d02a12be59
Remove object-literal singleton objects allocated at parse. r=djvj,mgaudet
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla72

No, definitely not -- we ran extensive try-run tests; the latest run with ~no impact is linked in Phab (on my way in so can't find it at the moment). I can do a sanity check try run in a few hours to be extra-super sure.

Flags: needinfo?(cfallin)

Hmm. One thing to note is that comparison is only one run; but I'd like to see some retriggers and how that goes.

Here's the last run we did on this patch, for reference (5 baseline / 5 patched runs): link

Speedometer showed 0.04% delta, and the other TP6 benchmarks are mostly in the noise. So I wouldn't expect this to result in 10% on another run.

I just added some dromaeo test jobs (which also host the kraken test) to those two try runs.

...and, in doing so, I just noticed that your try comparison was on plain 'opt' builds, so they are missing out on LTO and PGO optimizations, which could leave you with misleading results. I recommend using only 'shippable' builds for performance testing.

Yeah, Kraken is part of Talos, not Raptor -- maybe that's why it was missed on Try. Kraken definitely depends on singleton types for the global arrays.

Ugh -- stanford-crypto-aes is a torture test for the JS parser:

https://hg.mozilla.org/projects/kraken/file/tip/tests/kraken-1.1/stanford-crypto-aes-data.js#l223

That's a 400KB array-of-objects-of-arrays literal. We're adding work to parse (extra stage text->ObjLiteral->object, rather than text->object), so it's definitely going to have some overhead. I wonder if the time is going into parse or if something else is becoming wacky with TI...

(In reply to Chris Fallin [:cfallin] from comment #33)

I wonder if the time is going into parse or if something else is becoming wacky with TI...

It shouldn't be parse time because the data file is loaded before the harness starts measuring (I think that's why it was moved into a separate file). As long as it's in the global scope lazy/full parsing also doesn't apply.

OK, I was able to reproduce this slowdown from command line (for posterity: just concatenate stanford-crypto-aes-data.js and stanford-crypto-aes.js and run the result with the js shell).

A look at perf diff between the baseline and this patch shows +35% time in JIT code, and +24% in js::NativeSetProperty, and a bunch of 1-2% increases in callees of the latter. The basic problem is that we're generating bytecode to build the object literal described by that 400KB of source code, whereas without this patch, the parser is able to directly allocate the tree (array-of-objects-of-arrays) in one go and because this is in singleton context, this becomes simply a JSOP_OBJECT in the bytecode.

A possible fix would be to handle nested object literal construction with the ObjLiteral infrastructure, but that's a large amount of additional complexity (and it still would not reach parity with the baseline in this case), and we didn't see an issue with the current approach on the more mainstream web tests (e.g., TP6). Anyway, whether or not to do that is a value judgment, and I'm open to discussion. Thoughts?

(In reply to Jan de Mooij [:jandem] from comment #34)

(In reply to Chris Fallin [:cfallin] from comment #33)

I wonder if the time is going into parse or if something else is becoming wacky with TI...

It shouldn't be parse time because the data file is loaded before the harness starts measuring (I think that's why it was moved into a separate file). As long as it's in the global scope lazy/full parsing also doesn't apply.

Ah, OK, that differs from my test just now. I will try to find the harness and run locally...

Issue still remains when measuring only run-time, after the main data array has been created.

Perf delta is still showing up as increased time in JIT code -- no useful information beyond that.

I've tried several hypotheses and have been unable as yet to work out what's going on:

  • I've worked through the cases in BytecodeEmitter::getConstantValue, which is what the singleton case that we're replacing had used to build the object at parse time, and the only possible difference I can see is that nested constant objects (e.g. from the top-level array) are always created with their own group via ObjectGroup::newPlainObject, whereas we do that only in singleton context. (Our weird hybrid solution is necessary to get an exact match to what code was doing before in non-singleton cases and avoid a nasty heap increase -- see last week's Phab comments.) I tried manually forcing every object created by ObjLiteral to be a singleton (force singleton flag to true in js/src/frontend/ObjLiteral.cpp) and this does not make a difference in this test's runtime.

  • I thought perhaps it had to do with the way in which we're encoding the integers -- they're uint32s (some of them bigger than 2^31), and I had to do a funny dance with the embedded Values that we use to build the object literals to avoid some test failures (Value::setInt32 if int32, Value::setDouble otherwise), so I thought perhaps we were falling into doing floating-point math when we should be doing integer math. However, in getConstantValue in the baseline, the same logic is used (via Value::setNumber, which I suppose I could have used instead...).
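The int32-vs-double "funny dance" from the second bullet amounts to something like the following sketch (FakeValue and encodeNumber are invented stand-ins, not the real js::Value API): store the number as an int32 payload only when it round-trips exactly, treating -0.0 as a double, and fall back to a double payload otherwise.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Toy tagged value: either an int32 payload or a double payload.
struct FakeValue {
  bool isInt32;
  int32_t i;
  double d;
};

// Mirror of the described logic: use the int32 representation when the
// double is exactly representable as int32 (and is not negative zero).
FakeValue encodeNumber(double v) {
  if (v >= INT32_MIN && v <= INT32_MAX) {
    int32_t asInt = static_cast<int32_t>(v);
    if (static_cast<double>(asInt) == v &&
        !(asInt == 0 && std::signbit(v))) {
      return {true, asInt, 0.0};
    }
  }
  return {false, 0, v};  // e.g. values above 2^31, fractions, -0.0
}
```

Values above 2^31 (as mentioned above) take the double path here, which is where integer-vs-float math suspicions would come from.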

I'm out of ideas. It's difficult to diff directly, because there are two different code paths that are supposed to produce identical on-heap objects. Jan, do you have any other suggestions, and/or would you mind looking at this? I feel like I'm bashing my head against a wall (and I've done that for a month already to get this patch to avoid other pitfalls).

I looked a bit into this. There's the following JSOP_SETELEM (line 47 in my beautified version):

        d[a] = d[a - b] ^ c

Where we now end up doing a VM call to js::SetObjectElementWithReceiver instead of doing something more efficient.

From what I can tell this is because all those non-singleton objects have the same shape but not the same group, so we get too polymorphic for the IC to handle. This is because the IC requires a type barrier so we do a group guard here.

Our options are (1) give these objects the same group (maybe ObjectGroup::newPlainObject?) (2) fix the ICs to not get polymorphic when the group is different.

Flags: needinfo?(cfallin)

Thanks for looking into that, Jan. I've gone ahead and created a rollback for this, for now (see bug 1594753), and am working on a fix.

The groups heuristics have been central to all of the perf and heap-bloat issues -- the second-to-last revision of this patch actually used ObjectGroup::newPlainObject to allocate all literal objects but this turned out to be a ~2% heap increase vs. an exact match of what the old code did, namely NewBuiltinClassInstance<PlainObject> for template objects to give to JSOP_NEWOBJECT. (See diff between the patches here -- this is what fixed the 2% heap regression.) So I don't think it's quite as simple as just using ObjectGroup::newPlainObject.

My principle in fixing all the regressions has been "do exactly what the old code did (w.r.t. final heap state), because it reached a local maximum and any change in behavior will become a regression somewhere". So, digging into the old code:

  • Singleton initializers (for JSOP_OBJECT) are always allocated with ObjectGroup::newPlainObject.

  • The object that becomes the template object (fields only, no values) for JSOP_NEWOBJECT is always allocated with NewBuiltinClassInstance<PlainObject>.

  • The choice between the two is determined by checkSingletonContext().

However, the old code also supported nested singleton initializers, via BytecodeEmitter::getConstantValue: in this case, there is an array of objects whose fields are arrays. The old code built this entire tree during parse and then emitted a single JSOP_OBJECT to produce the value. The ObjLiteral infrastructure doesn't support nested object creation with values (call this the "deep constant" case) but it does support template objects as before. So in the new code, what is likely happening (I haven't disassembled yet) is (i) emit the leaf-node arrays as JSOP_OBJECT, (ii) emit a bunch of JSOP_NEWOBJECT with templates, each of which is allocated as a singleton because it's in singleton context, (iii) assign the arrays to the objects, (iv) create a new top-level array, and (v) assign the objects into the array slots.

The procedural initialization shouldn't matter for this regression because it's happening at data-load time, not runtime, as you mentioned earlier. However, the key difference leading to the different groups is step (ii). I think the right fix should be to switch the "is it a singleton" logic for template objects from singleton-context to "never" -- this matches what the old code did.

Given the history of this patch, I have some guarded skepticism (there's a distinct possibility we'll hit some other obscure regression) but I'll give this a try and run the usual tests with it.

Flags: needinfo?(cfallin)

I think I've fixed it locally; try-runs are going here.

It turns out to have been really simple -- the patch is here with just an additional if-condition clause. The above thoughts weren't quite right -- it didn't have to do with (general) objects but with arrays -- and in particular, the way that the heuristics choose to either embed an object with JSOP_NEWARRAY_COPYONWRITE or bytecode to build the array from scratch.

Prior to this patch:

  • In non-singleton context, all array literals of size >= 5 are built by embedding an array object (with values) in bytecode and using the COW opcode.
  • In non-singleton context, all array literals of size <= 4 are built by just generating the bytecode to fill in slots, as this is (according to the source comment) faster.
  • In singleton context, all array literals with compatible constants are built by BytecodeEmitter::getConstantValue at parse time and used straight from the script via JSOP_OBJECT.

With this original patch, prior to today's fix:

  • In non-singleton or singleton context, array literals of size >= 5 are built with ObjLiteralWriter and instantiated with either JSOP_OBJECT (singleton) or JSOP_NEWARRAY_COPYONWRITE (non-singleton).
  • In non-singleton or singleton context, array literals of size <= 4 are built by generated bytecode.

So the fix is simply to force use of ObjLiteralWriter no matter what in singleton context. So, for small arrays in singleton context, we go from producing arrays with individual groups to COW'ing a canonical array, which preserves the group. The overall behavior during build is still less optimal than the entire deep-constant tree that the old code could do, but once constructed, the heap state should now be the same as before the patch.
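The post-fix emit-strategy choice described above can be condensed into a single decision function (names and the exact predicate are invented for illustration; the real logic lives in the BytecodeEmitter):

```cpp
#include <cassert>
#include <cstddef>

// Toy model of the array-literal emit strategy after the fix.
enum class ArrayEmit { ObjLiteral, CowTemplate, ElementBytecode };

ArrayEmit chooseArrayEmit(bool singletonContext, size_t length,
                          bool allConstant) {
  if (!allConstant) return ArrayEmit::ElementBytecode;
  // The fix: singleton context always takes the ObjLiteral path,
  // regardless of length, so small arrays COW a canonical array and
  // keep its group instead of getting individual groups.
  if (singletonContext) return ArrayEmit::ObjLiteral;
  // Old non-singleton threshold: >= 5 elements uses a COW template,
  // smaller arrays are filled by per-element bytecode.
  return length >= 5 ? ArrayEmit::CowTemplate : ArrayEmit::ElementBytecode;
}
```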

The rollback is Lando'ing at the moment, so IMHO we should let that go through, and wait for the above tests, and make sure everything is happy, then try landing again tomorrow or Monday.

Instead, this patch introduces a new ObjLiteral mini-bytecode format
that is used to carry object-literal information from parse time to a
later time at which GC objects are safe to allocate. The mini-bytecode simply
specifies a list of fields and constant field values.

The original intent of this patch (realized in previous versions of it)
was to make this an opcode, and completely replace object creation
sequences (NEWINIT, INITPROP, INITPROP, ...) with one OBJLITERAL opcode.
However, there are quite a few performance regressions that occur when
replacing the finely-tuned set of optimizations around this with a new
mechanism.

As a result, this patch only defers allocation of the objects until the
very end of parse. Each object literal adds an ObjLiteralCreationData
instance to the GC-things list, and when the GC-things list is processed
to perform deferred allocations, the described objects will be created.

This is a rebased version of the original patch (landed as D47985 and
then backed out) with the Kraken regression (bug 1594753) fixed as noted
in the bug above.
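The deferred-allocation flow in the commit message might be sketched like this (CreationData, GCThing, and the toy object type are all invented stand-ins): during parse, literals append creation data to the GC-things list instead of allocating, and a post-parse pass walks the list and replaces each entry with a real object.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Toy object and its parse-time description.
using ToyObject = std::map<std::string, double>;
struct CreationData {
  std::vector<std::pair<std::string, double>> fields;
};

// A GC-things entry is either still a description or an allocated object.
using GCThing = std::variant<CreationData, ToyObject>;

// Post-parse pass: allocate every deferred literal in place. In the real
// patch this is the point where GC allocation first becomes safe.
void performDeferredAllocations(std::vector<GCThing>& gcThings) {
  for (GCThing& thing : gcThings) {
    if (auto* data = std::get_if<CreationData>(&thing)) {
      ToyObject obj(data->fields.begin(), data->fields.end());
      thing = std::move(obj);
    }
  }
}
```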

I think this should be ready to try a re-land now. The Phab patch above is rebased to m-c (after landing the rollback), and I fired off a few more try-runs just in case on Friday: awsy, talos, js-bench. Everything looks good, I think. (Unfortunately, Taskcluster seems to have gone through an upgrade/config change over the weekend, and I can't add any more jobs to those try-runs as a result ("Wrong version of actions.json, unable to continue"), but I think this combined with the Kraken run above should be enough?)

Attachment #9107555 - Attachment description: Bug 1580246: Remove object-literal singleton objects allocated at parse. r=mgaudet → Bug 1580246: Remove object-literal singleton objects allocated at parse. r=mgaudet,jandem
Pushed by mgaudet@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/21f755c04005
Remove object-literal singleton objects allocated at parse. r=mgaudet,jandem
Regressions: 1597988
Regressions: 1610192
