Closed Bug 1273481 Opened 4 years ago Closed 3 years ago

addEventListener much slower than in Chrome and Safari

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: jandem, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Noticed this on the Knockout benchmark in bug 1241091.

See the attached addEventListener micro-benchmark:

Safari 9.1:  202 ms
Chrome 52:  1366 ms
Nightly 49: 2238 ms (with asyncStack > 2800, but disabled on release)

A quick profile shows:

* 35% under dom::GetIncumbentGlobal(). Most of it is under JS::GetScriptedCallerGlobal(). This API we can easily make much faster, I think. I'll work on that.

* 26% under dom::CallbackObject::Init. Almost all of this is PLDHashTable::Add.

* 27% under nsINode::AddEventListener

I can work on GetScriptedCallerGlobal and then take a closer look at the other two.
FWIW, a recent Servo build gets 400-420 ms, much closer to Safari's numbers.
One could of course argue that testing addEventListener performance is super silly, since it is usually done very rarely (comparing to other DOM operations), but fine.

Is there a easily runnable testcase for this?


If dom::CallbackObject::Init is about mozilla::HoldJSObjects hashtable usage, we can make that quite a bit faster by appending initially to some array and populate the hashtable on demand.
(In reply to Olli Pettay [:smaug] from comment #2)
> One could of course argue that testing addEventListener performance is super
> silly, since it is usually done very rarely (comparing to other DOM
> operations), but fine.

The frameworks benchmark in bug 1241091 happens to hit addEventListener a lot, especially the Knockout version. I agree that addEventListener shouldn't be super hot, but frameworks/benchmarks do a lot of weird stuff.

> Is there a easily runnable testcase for this?

Oops forgot to attach my micro benchmark, will do it now.
Attached file Microbenchmark
Hmm, for this particular benchmark it would be better if we didn't create the binding layer EventListener at all, but just did the comparison using JS object and the type+capture flags.
That shouldn't be too hard to implement, assuming EventTarget.addEventListener took object? as listener. But silly it certainly would be.

Another option, perhaps better, is that callee of some method which takes a callback  or callback interface should somehow tell to the caller (binding layer) that it accepted the listener, and only if so, the binding layer would call holdjsobjects etc.
Depends on: 1273511
> If dom::CallbackObject::Init is about mozilla::HoldJSObjects hashtable usage

It is.  Lots of stuff there gets inlined (e.g. my profiler on Mac claims we call PLDHashTable::Add directly from the CallbackObject ctor).  But that's the source of the hashtable bits.

> Another option, perhaps better, is that callee of some method which takes a callback or
> callback interface should somehow tell to the caller (binding layer) that it accepted
> the listener, and only if so, the binding layer would call holdjsobjects etc.

This we can theoretically do, for the specific case of function arguments.

But here's my question: if we don't call HoldJSObjects, but we're storing a pointer to a JS thing, and a GC happens, will we end up tracing our thing just for purposes of it moving?  I'm guessing we won't, so we need to handle that somehow, right?
So a possible proposal; please let me know if this makes sense:

1)  We add a way to construct a CallbackObject without it calling HoldJSObjects.
2)  We use this in bindings for operation arguments (the cases when isMember is false).
3)  We store the resulting CallbackObject in a Rooted<> specialization that traces its JS bits,
    to make sure they stay alive and get updated on move while the Rooted is on the stack
4)  When coming off the stack, we check the refcount of the CallbackObject.  If it's not 1,
    we call HoldJSObjects on it.
well, we could either perhaps ensure that we don't GC during the method call, or store the JS object in JS::Handle, and only put it to JS::Heap once HoldJSObjects has been called.
And the proposal sounds ok. (4) should probably move the rooted JS object to the relevant JS::Heap member variable.
(In reply to Olli Pettay [:smaug] from comment #9)
> And the proposal sounds ok. (4) should probably move the rooted JS object to
> the relevant JS::Heap member variable.

Well, I think the idea is that you always store the JS object in the member variable, but you have a Rooted subclass that just looks at the CallbackObject to do the trace.

The proposal makes sense to me, assuming that we are often going to not have to HoldJSObjects when we care about performance. (I don't know what is going on in this test case.)
Can Rooted<> play well with Heap<>? I mean, in case of moving GC, do we have easy ways to ask Heap<> stuff to be moved inside Rooted<>? Maybe tracing just works in this case. Can't recall the APIs for this stuff now.

With this case we currently always call HoldJSObjects, but we actually need to call it only once out of all those 2000000 calls.
> we could either perhaps ensure that we don't GC during the method call

Can't be done.  Need to ensure this not just during the method call, but also during processing of the later arguments.  So it totally breaks even for addEventListener.

> or store the JS object in JS::Handle, and only put it to JS::Heap once HoldJSObjects has been called.

We could do that.  But then we'd have to break the optimization as soon as we enable (if ever) async stacks, because _that_ JSObject is only available inside the CallbackObject impl.  Hence my proposal to just trace everything inside the CallbackObject.
The patches in bug 1273661 implement comment 7 and shave off somewhere around 300ms (on a baseline of 1900-2000) for me.
Depends on: 1273828
OK, I tried this again with the patches for bug 1273661, bug 1273511, and bug 1273828.  That brings the time for the attachment down to about 400-500ms.  Safari is at 200-250 on the same hardware, Chrome at 1600-1700.

Most of the time (80+% certainly, maybe more; depends on the profiler overhead because I'm comparing profiler-measured times to wall-clock times) is still under addEventListener.  This breaks down like so (all percentages of the addEventListener time):

30% dom::GetIncumbentGlobal:
  6% is getting the current JSContext
 10% JS::GetScriptedCallerGlobal
  5% xpc::NativeGlobal (complete with icky QI bits; we could make this better!)
  2% getting subject principal
  2% Subsumes checks

30% nsINode::AddEventListener:
  5% GetListenerManagerForNode, mostly the hashtable Add call.
  3% IsChromeDoc() calls
 20% AddEventListenerByType; well over half of this is the GetEventMessageAndAtom call.

12% malloc.  Presumably for the EventListener thing.

 9% CallbackObject constructor, partly self time, partly the postBarrier call the Heap assignment does.

 8% CallbackObject::AddRef, mostly the NS_CycleCollectorSuspect3 call
    (shouldn't that be in Release? Is my profiler lying?)

 5% post barrier when clearing out the mCallback member in HoldJSObjectsIfMoreThanOneOwner.

If we really care, there are some more possible wins here, but I'm not sure how much we care.  Getting rid of 50% of the remaining time would take some work, for sure.
(In reply to Boris Zbarsky [:bz] from comment #14)
> OK, I tried this again with the patches for bug 1273661, bug 1273511, and
> bug 1273828.  That brings the time for the attachment down to about
> 400-500ms.  Safari is at 200-250 on the same hardware, Chrome at 1600-1700.

Awesome! I think that's good enough for now: addEventListener wasn't a *huge* part of the (original) total benchmark time, but this should definitely help a bit.

I'll investigate more and file bugs. I think these micro-benchmarks are silly but they often reveal some performance issues that are relatively easy to fix.
(In reply to Boris Zbarsky [:bz] from comment #14)
>  8% CallbackObject::AddRef, mostly the NS_CycleCollectorSuspect3 call
>     (shouldn't that be in Release? Is my profiler lying?)

It is either-or. AddRef calls suspect if it is not suspected already, otherwise Release.
This change was done during incremental CC stuff.
Jandem, sorry to ask in the wrong place, but does "javascript.options.asyncstack" affect other benchmarks?
I tried this benchmark and was seeing a regression between Release and Nightly, but in the end it as due to that option. Nightly got 3x faster after disabling it. (And I know it's not right to benchmark on Nightly).
> but does "javascript.options.asyncstack" affect other benchmarks?

Yes.  It makes Promises slower and it makes all webidl callbacks slower.
I'm marking this fixed, since the obvious slow stuff has been fixed and Nightly (when the async stack stuff is disabled) is quite a bit faster than blink (Nightly 575, Chrome52 1600 on my linux64). And the testcase is totally insane.
Feel free to open a new bug for "slower than Safari" case if needed.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.