Closed Bug 962256 Opened 9 years ago Closed 9 years ago

Exact rooting regressed dromaeo_css and mootools by ~5%


(Core :: JavaScript Engine, defect)

Not set



Tracking Status
firefox29 + fixed
firefox30 --- fixed


(Reporter: terrence, Assigned: bzbarsky)




(4 files, 2 obsolete files)

See this for a starting place:

We should be able to profile the harder-hit tests to find the regressing roots.
More precisely, it regressed dromaeo_css by 5%.  The mootools subtest regressed by 10%.  The subtests of the mootools subtest I don't have data on from compare-talos, but it's worth running on to see what's going on there.
Could you help us understand how this will impact users?
Flags: needinfo?(bzbarsky)
How a 10% regression in our handling of a popular library will impact users?  Probably by making web pages slower.
Flags: needinfo?(bzbarsky)
(In reply to Boris Zbarsky [:bz] from comment #3)
> How a 10% regression in our handling of a popular library will impact users?
> Probably by making web pages slower.

Thanks for that I was not aware of the popularity of the library. Tracking based on Comment 3.
For the mootools:div~div test, most of the extra time looks like it's distributed like:

6.3% - js::Proxy::callProp
  6.1% - js::Proxy::get
    5.5% - mozilla::dom::HTMLDocumentBinding::DOMProxyHandler::get
      1.8% - JS_ForwardGetPropertyTo
      1.6% - JS_HasPropertyById
      0.9% - JS_AlreadyHasOwnPropertyById
5.2% - js::jit::DoGetElemFallback
  2.1% - proxy_GetElement
    1.9% - js::Proxy::get
      1.4% - mozilla::dom::NodeListBinding::DOMProxyHandler::get
1.5% - js::jit::DoGetPropFallback
1.2% - nsINode::QuerySelectorAll

Percentages are increase of total time.

Also of note: Call->Invoke->Invoke->RunScript->Interpret appears to be about 0.2% slower in this trace.
JS_ForwardGetPropertyTo ends up re-rooting everything.  Maybe we should just handlify it?  The callsites in mozilla::dom::HTMLDocumentBinding::DOMProxyHandler::get are all already rooted.

JS_HasPropertyById is already handlified, though....  So is JS_AlreadyHasOwnPropertyById.

nsINode::QuerySelectorAll shouldn't be affected by exact rooting at all.  :(
I can take a look at handlifying JS_ForwardGetPropertyTo.  It doesn't seem to be on the list of bug 959787.
Depends on: 963652
I'd guess that JS_HasPropertyById and JS_AlreadyHasOwnPropertyById are more expensive with exact rooting because they locally root the return value of a function they call (an object and a shape).  Handlification doesn't help because they aren't returning those values to any other function, they are just checking that the return value satisfies some property.

JS_HasPropertyById only checks that the returned shape isn't null (ignoring the object), and JS_AlreadyHasOwnPropertyById only checks that the returned object is the same as the object passed in (ignoring the shape).  In theory, I could imagine creating some specialized version of LookupPropertyById that avoids the need to root the return values, by performing those checks deep within the callees, but in practice it would have to happen deep within some call stack, and it seems to involve some kind of generic function callbacks, so I'm not sure if you could even do it.
Two of those four roots aren't used after the call, so it may be safish to from-marked-location on them, as attached. Unfortunately that code calls into the ops table, so if someone is using those outparams to root and expects them to stay live, it could be quite bad.

Another option would be to make the ops support <NoGC>. Or to have some ballast rooted slots around. We could also have an interface that passes in the temporaries so that the caller could shave roots, maybe using the JS stack. All of that is going to be much more code, however.
Attached patch shave_roots_bl_get_stubs-v0.diff (obsolete) — Splinter Review
This is a bit less sketchy. It removes or delays some roots in baseline's GetProp and GetElem fallbacks. I don't think it introduces any new rooting issues, but I've kicked off a try run to verify. Please disregard the review request if that is red.

It seems to save about 0.3% - 1.2%, depending on the trace. Having run a handful more traces now, I can say that the stack from main up to event dispatch is either 0% or 6% slower, depending on the trace. I'm guessing this is a layout/code size/timing issue related to adding tons of new inline code more than specific instructions. Would be nice to hit this with VTune to be sure.

The majority of the remaining overhead (~5%) that might feasibly be in our control appears to be the Rooted in Proxy's INVOKE_ON_PROTOTYPE. I'm going to try getPrototype<NoGC> next.
Attachment #8365567 - Attachment is obsolete: true
Attachment #8366269 - Flags: review?(sphink)
I was (unsurprisingly) reading instrument's output incorrectly. For GetElem, the above patch actually does:

          Total   Self
Without:  660ms  255ms
With:     619ms  188ms
No EXR:   536ms  160ms

Not perfect, but still a substantial improvement. Looks like there is one hazard for me to deal with.

The GetProp stub is restored to its original performance, but is too few samples to make any real conclusions either way.
This should be better.
Attachment #8366269 - Attachment is obsolete: true
Attachment #8366269 - Flags: review?(sphink)
Attachment #8366346 - Flags: review?(sphink)
Proxy::get is not implicated: we bail to handler->get before creating the root. One instance of Proxy::get is even faster in the traces I'm looking at. It looks like the 1% performance shift either way is down to noise.

What *is* implicated is mozilla::dom::HTMLDocumentBinding::DOMProxyHandler::get: the self time here goes from 74ms before to 158ms afterwards. Other than that it's down to 5ms here, 7ms there. There may be some fat in the SetElementIC to trim too, including 20ms in DefinePropertyOrElement. Note, another instance of DOMProxyHandler for NodeListBinding is only a tiny bit slower too, so it may be something specific to HTMLDocumentBinding.
Attachment #8366369 - Attachment description: cssquery-mootools.html → A single-file testcase useful for profiling.
Looking at the generated code, it seems that HTMLDocumentBinding::DOMProxyHandler::get has 4 new Rooted where 1 would do and NodeListBinding::DOMProxyHandler::get has maybe 1, so the numbers above do seem reasonable.
Hmm.  So the Rooted in HTMLDocumentBinding::DOMProxyHandler::get are:

1)  unforgeableHolder.  Really does need to be rooted, afaict.
2)  expando.  Needs to be rooted if not null.
3)  nameVal.  Needs to be rooted, afaict.
4)  result.  This could probably be a raw JSObject*.

How would 1 Rooted do here?
Those 4 roots have non-overlapping lifetimes, so for the objects you could do what we do for the interpreter: Rooted<JSObject*> root; Rooted<JSObject*>& unforgeableHolder = root; Rooted<JSObject*>& expando = root; etc. Or just re-use the root with a generic name.

The Value nameVal is only used in the else branch of a MOZ_LIKELY if-statement, so could be moved into the MOZ_UNLIKELY half and hopefully not created on most calls.

Of course all this is generated code, so those interdependent pieces might be quite hard to implement in practice. I was not even able to find where any of that is generated with my brief grepping.
Ah, I see.  I'll write up a patch to try that so we can test it.
Comment on attachment 8366346 [details] [diff] [review]

Switching review, since Steve is sick this week.
Attachment #8366346 - Flags: review?(sphink) → review?(jcoppeard)
Comment on attachment 8366346 [details] [diff] [review]

Review of attachment 8366346 [details] [diff] [review]:

So reduces the amount of rooting by rooting things later or not at all, mostly the JSScript where this is not used past something that can GC.

As such it looks fine, except for:

::: js/src/jit/BaselineIC.cpp
@@ +3971,2 @@
>          // Handle optimized arguments[i] access.
>          if (!GetElemOptimizedArguments(cx, frame, &lhsCopy, rhs, res, &isOptimizedArgs))

It turns out that GetElemOptimizedArguments() can set isOptimizedArgs to false and also update lhsCopy, meaning that this rearrangement is not valid.

We need to root lhsCopy in the original place so that GelElementOperation() below can see the new value.

The name 'isOptimizedArgs' is pretty misleading - it seems it really means 'is optimised args, and has been handled already'.  Maybe renaming to 'handledOptimizedArgs' would make this clearer.
Attachment #8366346 - Flags: review?(jcoppeard) → review+
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
How do the numbers look now?  Should I still be looking into the comment 16 issue?
Flags: needinfo?(terrence)
(In reply to Boris Zbarsky [:bz] from comment #22)
> How do the numbers look now?  Should I still be looking into the comment 16
> issue?[[72,63,21]]&sel=none&displayrange=30&datatype=running

Hard to interpret. We're maybe about halfway back to where we were on dromaeo-css on 11 Jan, but still ahead of where we were on 10 Jan? The roots in the last patch appeared to be about 20% of the overall slowdown with the roots in comment 16 being about 60%. I think we should still at least try to take a stab at those still. Deadline for uplift is this weekend; is there anything I can do to help?
Flags: needinfo?(terrence)
Resolution: FIXED → ---
Another interesting question: why are we even ending up in Proxy::get here?  What's a typical jsid coming through on those gets?
Note that this basically backs out bug 950991, since JSAutoCompartment in fact can't GC now
Assignee: terrence → bzbarsky
I filed bug 965992 on not calling into get() here at all.  Once we land the attached patch, I think we should just mark this resolved and then focus on avoiding the slow path.  ;)
Depends on: 965992
Agreed! I'm going to close this, as I think we've reached satisfaction with our investigation.
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
Target Milestone: mozilla29 → ---
Whoops! I didn't realize that Boris' patch was still waiting on review.
Resolution: FIXED → ---
Flags: needinfo?(bobbyholley)
And please don't forget the uplift request to 29 ;)
Hrm.  Which parts of this need uplifting to 29?  We should not have landed parts of the bug without others... :(
That might sound stupid but patches which fixed this bug ? :)
Yes, and which of those are not on 29 already?
Sorry, I don't know. This bug is on my radar because it still open and tracked.
Comment on attachment 8368199 [details] [diff] [review]
yet another part.  Eliminate some unnecessary roots in HTMLDocumentBinding::DOMProxyHandler::get.

Review of attachment 8368199 [details] [diff] [review]:

The generated code here looks reasonable enough. I'll trust Boris on the design bits. r=bholley
Attachment #8368199 - Flags: review?(peterv) → review+
Flags: needinfo?(bobbyholley)
So to answer my own question, the parts that are not my patch already landed on 29, so don't need backporting.
Comment on attachment 8368199 [details] [diff] [review]
yet another part.  Eliminate some unnecessary roots in HTMLDocumentBinding::DOMProxyHandler::get.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Exact rooting turn on
User impact if declined: Slightly slower DOM manipulation
Testing completed (on m-c, etc.): Compiles and runs.
Risk to taking this patch (and alternatives if risky): Risk should be pretty low.
String or IDL/UUID changes made by this patch:  None.
Attachment #8368199 - Flags: approval-mozilla-aurora?
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
Attachment #8368199 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.