Closed Bug 1488579 Opened Last year Closed 7 months ago

Generate fast constructors for PlainObjects

Categories

(Core :: JavaScript Engine: JIT, enhancement, P3)

enhancement

Tracking

()

RESOLVED INACTIVE
Tracking Status
firefox63 --- affected

People

(Reporter: mgaudet, Assigned: mgaudet)

References

Details

Attachments

(1 file, 4 obsolete files)

One of the barriers standing in the way of removing unboxed objects is the speed of JSON.parse. 

Investigation suggests that the majority of that speedup happens because of the generated constructor that is created for an unboxed layout. 

In principle, for the JSON.parse case, it seems that there's nothing that would stop us from generating a similar constructor for regular plain objects.
Attached is a 1-file version of Kraken's parse-financial benchmark. 

As well, this version is modified to run the benchmark in a loop to help identify noise.
Attachment #9006380 - Attachment description: Kraken Parse Financial with inline Data (1 file versio) → Kraken Parse Financial with inline Data (1 file version)
This patch adds constructor generation to a PlainObject construction path, as
well as adds an object-construction-method-selector to
JSONParserBase::finishObject for the purposes of evaluation.

FastConstruction:
- A constructor is generated for a plain object with a set of properties after
  a certain numberr of allocations have been seen (100 by default here).

The object construction is exposed via the 'JSON_MODE' environment variable.
The modes are as follows:

JSON_MODE=0: FastConstruct, only tenure allocate if explicitly requested
JSON_MODE=1: FastConstruct, tenure allocate if explicitly requested, or if the
             object group has been marked as shouldPreTenure.
JSON_MODE=2: MaybeUnboxedObject construct: This construction path may fill in
             an unboxed layout, and may use an unboxed layout constructor
JSON_MODE=3: Regular PlainObject construction, nothing special at all.
Assignee: nobody → mgaudet
Status: NEW → ASSIGNED
Here's the results of the modes from my machine: 

    $ for i  in 0 1 2 3; do echo -ne "$i\t"; JSON_MODE=$i  ./dist/bin/js  ~/perf/kraken-parse-financial.js; done;
    0       Times: 34,29,30,29,29,28,27,28,28,27 average:  28.9
    1       Times: 54,50,50,49,49,52,56,55,54,53 average:  52.2
    2       Times: 34,33,32,32,31,32,30,31,30,30 average:  31.5
    3       Times: 52,48,48,49,49,48,48,48,53,51 average:  49.4

As well: 

    JSON_MODE=2 ./dist/bin/js --no-unboxed-objects  ~/perf/kraken-parse-financial.js
    Times: 34,33,32,32,36,34,34,33,34,35 average:  33.7

Ok, my diagnosis of the constructor being the root cause of unboxed objects speed seems to have been misremembered from last time I was looking at this. These results seem to indicate it might be worth re-investigating why ObjectGroup::newPlainObject is so fast even with unboxed objects disabled. 

Another thing stands out to me.

JSON_MODE 0 and 1 are identical outside of the tenured allocation check: For some reason the object group seems to have its pre-tenured bits set, and this impacts performance quite badly. What I'm unclear on is if this indicates a more fundamental issue with our pretenuring heuristic. 

ni?sfink as he was interested in this based on IRC discussions.
Flags: needinfo?(sphink)
Severity: normal → enhancement
Priority: -- → P3
QA Contact: sdetar
QA Contact: sdetar
Will update more Monday, but I figured the tenuring issue out, so dropping the ni?
Flags: needinfo?(sphink)
See Also: → 1500920
This patch adds constructor generation to a PlainObject construction path, as
well as adds an object-construction-method-selector to
JSONParserBase::finishObject for the purposes of evaluation.

FastConstruction:
- A constructor is generated for a plain object with a set of properties after
  a certain numberr of allocations have been seen (100 by default here).

The object construction is exposed via the 'JSON_MODE' environment variable.
The modes are as follows:

JSON_MODE=0: FastConstruct, only tenure allocate if explicitly requested
JSON_MODE=1: FastConstruct, tenure allocate if explicitly requested, or if the
             object group has been marked as shouldPreTenure.
JSON_MODE=2: MaybeUnboxedObject construct: This construction path may fill in
             an unboxed layout, and may use an unboxed layout constructor
JSON_MODE=3: MaybeUnboxedObject no construct : This construction path may fill in
             an unboxed layout, and will not generate a constructor.
JSON_MODE=4: Regular PlainObject construction, nothing special at all.
JSON_MODE=5: Regular PlainObject construction, forced tenuring.
Attachment #9006390 - Attachment is obsolete: true
Current status: Very orange try build [1] due to issues sweeping the Fast Construction table. 

  0  INTERNED_STRING_TO_JSID(JSContext*, JSString*) [StringType.h : 996 + 0x0]
  1  <name omitted> [jsfriendapi.cpp : 1425 + 0x7]
  2  js::DispatchTyped<IsAboutToBeFinalizedInternalFunctor<jsid>, bool *> [jsfriendapi.h : 2532 + 0xb]
  3  bool js::gc::IsAboutToBeFinalizedInternal<jsid>(jsid*) [Marking.cpp : 3499 + 0x5]
  4  js::ObjectGroupRealm::FastConstructionTableSweepPolicy::needsSweep(js::ObjectGroupRealm::FastConstructionKey*, js::ObjectGroupRealm::FastConstructionEntry*) [Marking.h : 102 + 0x8]
  5  JS::GCHashMap<js::ObjectGroupRealm::FastConstructionKey, js::ObjectGroupRealm::FastConstructionEntry, js::ObjectGroupRealm::FastConstructionKey, js::SystemAllocPolicy, js::ObjectGroupRealm::FastConstructionTableSweepPolicy>::sweep() [GCHashTable.h : 80 + 0x8]
  6  js::ObjectGroupRealm::sweep() [ObjectGroup.cpp : 2009 + 0x5]
  7  SweepObjectGroups(js::GCParallelTask*) [Realm.h : 557 + 0x5]
  8  js::GCParallelTask::runFromHelperThread(js::AutoLockHelperThreadState&) [GCParallelTask.h : 130 + 0x8]
  9  js::HelperThread::handleGCParallelWorkload(js::AutoLockHelperThreadState&) [HelperThreads.cpp : 1782 + 0xb]
 10  js::HelperThread::threadLoop() [HelperThreads.cpp : 2642 + 0x14]
 11  js::HelperThread::ThreadMain(void*) 

Having said that, from a performance perspective I now have pretty much identical performance between the plain object constructors and the unboxed object constructors. There was an error in checking the tenured bit, described further in Bug 1500920.

[1]: https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=207007027&revision=b1fcebdb433d484566381087cdacdeb09494711a
Adds a new object constructor function

    js::NewPlainObjectWithPropertiesMaybeFastConstructed

This object creation path keeps track of objects with a particular set of
properties in a weak reference table.  If enough object construction occurs for
a particular entry in the table, and certain safety checks are passed, this
codepath will compile and execute a specialized constructor for future objects
with these properties.

This object construction path is only used at the moment in the JSON parser.
Attachment #9019088 - Attachment is obsolete: true
Curiously this shows a large (and significant) degradation  on perfherder for kraken-audio-beat-detector [1]. Which is baffling, as, AFAICT JSON.parse isn't invoked there [2]. 

JSON.parse financial does show a small degredation as well (though not marked as significant by perfherder). In my local testing however, I would put it at ~10-15% 

[1]: https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-central&newProject=try&newRevision=fae2ce55e42f&originalSignature=123a08fb5b7095bf46b82502d49f9b71e16aa3ad&newSignature=123a08fb5b7095bf46b82502d49f9b71e16aa3ad&framework=1&selectedTimeRange=172800

[2]: https://github.com/mozilla/arewefastyet/tree/master/benchmarks/kraken/tests/kraken-1.1
(In reply to Matthew Gaudet (he/him) [:mgaudet] from comment #8)
> Curiously this shows a large (and significant) degradation  on perfherder
> for kraken-audio-beat-detector [1]. Which is baffling, as, AFAICT JSON.parse
> isn't invoked there [2]. 

You don't see it locally? I wonder if it's an extra GC.

> JSON.parse financial does show a small degredation as well (though not
> marked as significant by perfherder). In my local testing however, I would
> put it at ~10-15%

What about "--no-unboxed-objects without patch" vs "--no-unboxed-objects with patch"?
(In reply to Jan de Mooij [:jandem] from comment #9)
> (In reply to Matthew Gaudet (he/him) [:mgaudet] from comment #8)
> > Curiously this shows a large (and significant) degradation  on perfherder
> > for kraken-audio-beat-detector [1]. Which is baffling, as, AFAICT JSON.parse
> > isn't invoked there [2]. 
> 
> You don't see it locally? I wonder if it's an extra GC.

I'm going to take a closer look at this in a minute, will get back to you. 

> 
> > JSON.parse financial does show a small degredation as well (though not
> > marked as significant by perfherder). In my local testing however, I would
> > put it at ~10-15%
> 
> What about "--no-unboxed-objects without patch" vs "--no-unboxed-objects
> with patch"?

So, to try to combat the variance I'm seeing on my machine, I am averaging twenty iterations. 

No Patch, No Unboxed Objects:   37,35,33,31,32,32,32,32,34,37,35,36,37,38,36,35,32,32,34,29 average:  33.95
No Patch, With Unboxed Objects: 32,30,30,30,28,28,28,30,28,29,29,28,29,29,28,30,26,26,26,26 average:  28.5
Patched,  No Unboxed Objects:   33,31,30,30,29,29,28,29,29,29,29,31,29,32,30,30,27,27,26,27 average:  29.25
Patched,  With Unboxed Objects: 30,29,29,29,29,28,29,28,28,29,29,30,31,30,31,29,27,29,27,29 average:  29

(Have I mentioned this is a terrible benchmark yet?)
(In reply to Jan de Mooij [:jandem] from comment #9)
> (In reply to Matthew Gaudet (he/him) [:mgaudet] from comment #8)
> > Curiously this shows a large (and significant) degradation  on perfherder
> > for kraken-audio-beat-detector [1]. Which is baffling, as, AFAICT JSON.parse
> > isn't invoked there [2]. 
> 
> You don't see it locally? I wonder if it's an extra GC.

No luck reproducing locally either, but the benchmark seems riddled with high variance.
Sorry for the terrible delay :/

I started looking at this. One thing that worries me is that this patch falls back to the slower NewPlainObjectWithProperties path in more cases than the current code. For instance, (even) with --no-unboxed-objects I expect the micro-benchmark below will regress with this patch.

So I wonder if there's a more incremental approach that just replaces the makeConstructorCode logic with something that handles both unboxed and native objects, but keeps using ObjectGroup::newPlainObject. Maybe we could start by moving UnboxedLayout::constructorCode_ to the plain object HashMap?

---

var arr = [];
for (var i = 0; i < 18; i++) {
    arr.push(`"x${i}":8`);
}
var str = "{" + arr.join(",") + "}";
function f() {
    var t = new Date;
    for (var i = 0; i < 100000; i++) {
        o = JSON.parse(str);
    }
    print(new Date - t);
}
f();
Blocks: 1505574
https://bugzilla.mozilla.org/show_bug.cgi?id=1505574#c2 Has some performance discussion of this patch. 

While it works for kraken-parse-financial, it doesn't help reduce the impacts on Speedometer.

To see if it was the threshold for generating a constructor, I dropped the threshold to 20, and compared against 100 (both with unboxed objects disabled): https://mzl.la/2PkoVke 

No effect
Attachment #9019510 - Attachment description: Bug 1488579 - Code generate constructors for PlainObjects r?jandem → Bug 1488579: Code generate constructors for PlainObjects r?jandem
I looked at the GC stuff here some, but I think I'm getting a little lost as to what the desired behavior is.

Where I came in, there was an unrooted PlainObjectEntry being passed in and held live across a GC that invalidated its contents. So I suggested implementing a trace() method that traces everything necessary to keep the entry alive and unswept, and using a Rooted<PlainObjectEntry>. (Except that seems to be wrong; it should be Rooted<PlainObjectEntry*>, which means we need to make it use NonGCPointerPolicy. But it's the same idea.)

However, there is already a GCPolicy<js::ObjectGroupRealm::PlainObjectEntry> trace() method that is used for a different purpose. It traces only constructorCode_. Presumably that gets called during plainObjectTable->trace(trc) in ObjectGroupRealm::trace() in Realm::traceRoots(). So... all constructor code is unconditionally rooted? Is the intention that it only get swept during compacting GCs or something?

By specializing templates the right way, we can certainly make plainObjectTable->trace() do something different than Rooted<PlainObjectEntry*>'s trace() does. But I'm trying to understand what's actually desired. A Rooted<PlainObjectEntry*> clearly should keep everything in that entry alive. Should constructorCode always be kept alive once it enters this plainObjectTable cache? Or at least, kept alive until a compacting GC?

If so, then... ugh. Rooted<T> uses GCPolicy<T>::trace() with no way to override, and GCHashMap<K,V> calls GCPolicy<K>::trace() and GCPolicy<V>::trace() with no way to override. You can use your own sweep policy no problem, but the trace policy appears to be hardcoded.

I guess the thing to do would be to make PlainObjectEntry::trace() trace everything (for Rooted<>), and instead of calling plainObjectTable->trace(), do the table tracing manually for constructor roots. I think it's all public stuff, so just doing a loop like this should work:

  for (Enum e(*this); !e.empty(); e.popFront()) {
    TraceNullableEdge(trc, &entry->constructorCode_, "constructor code");
  }

maybe? It actually makes a little more sense to me to do this trace directly when handling plainObjectTable, since it's the semantics of that instance in particular that warrants the funky tracing.

I think https://searchfox.org/mozilla-central/source/js/src/jit/RematerializedFrame.h#264-267 is the code to copy from that makes Rooted<Entry*> work.
Attachment #9026430 - Attachment is obsolete: true
Attachment #9019510 - Attachment is obsolete: true

Resolving this as inactive: Unless we find a need to resuscitate this as an optimization, we'll keep this closed.

Status: ASSIGNED → RESOLVED
Closed: 7 months ago
Resolution: --- → INACTIVE
You need to log in before you can comment on or make changes to this bug.