Open Bug 1425466 Opened 6 years ago Updated 2 years ago

[meta] investigate reducing xpconnect overhead based on frontend usage

Categories

(Firefox :: General, enhancement, P3)

enhancement

Tracking

()

Tracking Status
firefox59 --- affected

People

(Reporter: Gijs, Unassigned)

References

(Depends on 4 open bugs)

Details

(Keywords: meta)

Attachments

(1 file)

Attached is a log of startup + shutdown of a clean profile, and all the js-to-cpp xpconnect accesses, excluding QI. (NB: so this doesn't include accesses the other way round where cpp accesses JS)

I don't yet know if it's going to be more helpful to have this type of log where we actually do more stuff (e.g. for some mochitest or talos runs) but this seems like an interesting start. I'll also look at the QI-specific fastpath we have, and I'll be looking at the opposite direction (cpp -> js). But even in this under-1h effort to instrument some of this, there's some obvious low-hanging fruit, so I figured I'd start off a metabug.

Specifically, based on the head of this file:

3078 Calling native method equals on iface nsIJSID with 2 args
1544 Calling native method import on iface nsIXPCComponents_Utils with 5 args
 710 Calling native method filename on iface nsIPluginTag with 1 args
 612 Calling native method ID on iface nsIXULAppInfo with 1 args
 574 Calling native method utils on iface nsIXPCComponents with 1 args
 516 Calling native method getBoolPref on iface nsIPrefBranch with 4 args
 456 Calling native method addMessageListener on iface nsIMessageListenerManager with 3 args
 453 Calling native method interfaces on iface nsIXPCComponents with 1 args
 441 Calling native method name on iface nsIPluginTag with 1 args
 437 Calling native method name on iface nsIJSID with 1 args

- we do a *lot* of interface id equals / name operations from JS, most likely from within QI implementations. We should be able to reduce that number dramatically, or if that doesn't work, build a fast path for it in xpconnect. It's not clear that JS QI implementations (which includes the XPCOMUtils-generated ones) that return 'this' really actually need to do any work (ie are there actually cases where it would be bad to return `this` without checking iid), and perhaps using nsIClassInfo can reduce this somewhat, too.
- turns out we use Cu.import a lot! We could just put it on e.g. ChromeUtils or some other webidl interface, which would likely make it faster. There may be other stuff on Cu that will want to move there.
- no idea why we're calling so much stuff on nsIPluginTag or nsIXULAppInfo.ID . We shouldn't need to. I suspect the PluginTag stuff is because of the blocklist stuff on startup. Florian, do we have bugs on file to make the blocklist thing better?
- Global replace of Components.{interfaces,classes,utils} usage with Ci/c/u (and making them available on globals) will reduce some of that overhead.
- we should probably look if there are prefs we're looking up "manually" (without caching) some crazy order of magnitude ( > 5).
- perhaps we can move the message listener stuff to webidl? I believe this might be bug 888600.

There's going to be more stuff here, I'll file dep bugs on some of this as I work through it.
(In reply to :Gijs from comment #0)

> - Global replace of Components.{interfaces,classes,utils} usage with Ci/c/u
> (and making them available on globals) will reduce some of that overhead.

See bug 767640.
Depends on: 767640
(In reply to :Gijs from comment #0)
> 3078 Calling native method equals on iface nsIJSID with 2 args

These are probably from XPCOMUtils.generateQI. We could probably add a native helper for that. From what I remember of the startup profiles I looked at, I don't think it would save more than 10ms.

> 1544 Calling native method import on iface nsIXPCComponents_Utils with 5 args

Bug 1388221 will probably help with this.

>  710 Calling native method filename on iface nsIPluginTag with 1 args

That... is extremely unfortunate.

>  612 Calling native method ID on iface nsIXULAppInfo with 1 args

These can probably mostly switch to AppConstants

>  516 Calling native method getBoolPref on iface nsIPrefBranch with 4 args

I wonder how many of these can use cached prefs, instead. The pref service has plenty of overhead beyond XPConnect.

>  456 Calling native method addMessageListener on iface
> nsIMessageListenerManager with 3 args

This is bug 888600
Depends on: 1384727, 888600
(In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're blocked) from comment #3)
> (In reply to :Gijs from comment #0)
> > 3078 Calling native method equals on iface nsIJSID with 2 args
> 
> These are probably from XPCOMUtils.generateQI. We could probably add a
> native helper for that. From what I remember of the startup profiles I
> looked at, I don't think it would save more than 10ms.

AIUI, if we fix 1384727, these ids will just be ints that we could compare directly in JS without ever going into native code. 

In any case, even if it's "only 10ms", I'm not 100% sure how reliable sample-based profiling is for stuff that is this fast individually, but used this frequently. I could be wrong, though. It certainly seems like something that we could improve, and 10ms on startup seems meaningful enough if it doesn't require rearchitecting the world.


The other thing I'm currently looking at is that for a few of these calls, overhead seems to be much, *much* larger than for other calls (like 1-30 millis. instead of ~10 micros. for *individual* calls, counting *only* (a subset of) the xpconnect overhead).
(In reply to :Gijs from comment #4)
> (In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're
> blocked) from comment #3)
> > (In reply to :Gijs from comment #0)
> > > 3078 Calling native method equals on iface nsIJSID with 2 args
> > 
> > These are probably from XPCOMUtils.generateQI. We could probably add a
> > native helper for that. From what I remember of the startup profiles I
> > looked at, I don't think it would save more than 10ms.
> 
> AIUI, if we fix 1384727, these ids will just be ints that we could compare
> directly in JS without ever going into native code. 

Unlikely. We'd probably still have nsIID objects, but they'd have numeric IDs rather than UUID IDs, and even accessing those would require an XPConnect call.

> In any case, even if it's "only 10ms", I'm not 100% sure how reliable
> sample-based profiling is for stuff that is this fast individually, but used
> this frequently. I could be wrong, though. It certainly seems like something
> that we could improve, and 10ms on startup seems meaningful enough if it
> doesn't require rearchitecting the world.

Just thinking in terms of cost-benefit analysis. It probably wouldn't be a huge amount of work to replace XPCOMUtils.generateQI with a native version, though. And we could probably even fast path that case when QIing an XPCWrappedJS.

> The other thing I'm currently looking at is that for a few of these calls,
> overhead seems to be much, *much* larger than for other calls (like 1-30
> millis. instead of ~10 micros. for *individual* calls, counting *only* (a
> subset of) the xpconnect overhead).

Are you sure those calls aren't just triggering GCs?
(In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're blocked) from comment #5)
> > The other thing I'm currently looking at is that for a few of these calls,
> > overhead seems to be much, *much* larger than for other calls (like 1-30
> > millis. instead of ~10 micros. for *individual* calls, counting *only* (a
> > subset of) the xpconnect overhead).
> 
> Are you sure those calls aren't just triggering GCs?

The ones I've looked at seem to be slow in NativeData2JS. Like, accessing the content window through the `content` property on the frame message manager (in browser-child.js) and through `getInterface(nsIDOMWindow)` in browser.xml seems to take between 1 and 30-40 milliseconds. This is reproducibly the same set of calls every time I run my naive test (open browser, wait a bit, quit browser). When looking at this with :froydnj, we figured the overhead will be creating (xray?) wrappers for the first time for that content window, though things on the order of 30 milliseconds still seem like too much to be spending on this (this is a non-pgo opt build on a fast mbp).

I don't know how to easily tell whether GC is happening in the meantime - at the moment I'm just printf-instrumenting with mozilla::TimeStamp diffs inside XPCNativeWrapper's CallMethodHelper::Call (and subtracting time spent in NS_InvokeByIndex to exclude the 'actual' method call), because the individual callsites are mostly way too quick for these bits to accurately show up in sample-based profiling. If there are better points to do this logging, I'd be happy to know.
(In reply to :Gijs from comment #6)
> (In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're
> blocked) from comment #5)
> > > The other thing I'm currently looking at is that for a few of these calls,
> > > overhead seems to be much, *much* larger than for other calls (like 1-30
> > > millis. instead of ~10 micros. for *individual* calls, counting *only* (a
> > > subset of) the xpconnect overhead).
> > 
> > Are you sure those calls aren't just triggering GCs?
> 
> The ones I've looked at seem to be slow in NativeData2JS. Like, accessing
> the content window through the `content` property on the frame message
> manager (in browser-child.js) and through `getInterface(nsIDOMWindow)` in
> browser.xml seems to take between 1 and 30-40 milliseconds.

Interesting. So, when NativeData2JS is slow, it's usually either because we're creating XPCNativeWrappers, or because we're GCing. For DOM windows, we use a preserved wrapper that's... special, but mostly WebIDL. So the only slowness there should be in creating CCWs or X-rays. It shouldn't be to the order of milliseconds unless we GC (which we very often do).

Do you have a profile handy that shows this?

> I don't know how to easily tell whether GC is happening in the meantime - at
> the moment I'm just printf-instrumenting with mozilla::TimeStamp diffs
> inside XPCNativeWrapper's CallMethodHelper::Call

If you're just printf instrumenting, I'd be willing to bet that this is just GC overhead. I see GCs from NativeData2JS in profiles all the time.
(In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're blocked) from comment #7)
> (In reply to :Gijs from comment #6)
> > (In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're
> > blocked) from comment #5)
> > > > The other thing I'm currently looking at is that for a few of these calls,
> > > > overhead seems to be much, *much* larger than for other calls (like 1-30
> > > > millis. instead of ~10 micros. for *individual* calls, counting *only* (a
> > > > subset of) the xpconnect overhead).
> > > 
> > > Are you sure those calls aren't just triggering GCs?
> > 
> > The ones I've looked at seem to be slow in NativeData2JS. Like, accessing
> > the content window through the `content` property on the frame message
> > manager (in browser-child.js) and through `getInterface(nsIDOMWindow)` in
> > browser.xml seems to take between 1 and 30-40 milliseconds.
> 
> Interesting. So, when NativeData2JS is slow, it's usually either because
> we're creating XPCNativeWrappers, or because we're GCing. For DOM windows,
> we use a preserved wrapper that's... special, but mostly WebIDL. So the only
> slowness there should be in creating CCWs or X-rays. It shouldn't be to the
> order of milliseconds unless we GC (which we very often do).
> 
> Do you have a profile handy that shows this?

I made one, I think. Expand from here:

https://perfht.ml/2jXhoXa

and, uh, at least as far as I can tell, it seems that actually getting `content` off the frame message manager is triggering the creation of about:blank, and the stylesheet cache for it, and goodness knows what else.

Off-hand, this seems confusing because it implies that native code could totally get a pointer to this window without all that stuff being there, but when we create the JS wrapper, we ensure we have a content viewer for the doc and we take a long time - is that expected?

If not, it may be worth filing a separate dep to figure this out. Of course, the browser-child js code is fetching the outer window ID and for that to happen maybe we need to do all that work anyway... not sure how easy it is to disentangle that.
(In reply to :Gijs from comment #8)
> and, uh, at least as far as I can tell, it seems that actually getting
> `content` off the frame message manager is triggering the creation of
> about:blank, and the stylesheet cache for it, and goodness knows what else.

Ah, yes, unfortunately this is a problem we run into all over the place, and
not really XPConnect-related... :( Basically, when we initially create a
docShell, we don't create a document or window for it, but that has to be
invisible to JS. So if JS code tries to access the document or window before
it's ready, we have to create an about:blank content viewer eagerly, and then
usually wind up blowing it away once the actual load starts.

We should really try to do less of that, but it's probably a separate issue.

> Off-hand, this seems confusing because it implies that native code could
> totally get a pointer to this window without all that stuff being there, but
> when we create the JS wrapper, we ensure we have a content viewer for the
> doc and we take a long time - is that expected?

It can, to some extent, yes. But the same sort of automatic eager creation
happens for native code unless it uses something like GetExtantDoc().

> If not, it may be worth filing a separate dep to figure this out. Of course,
> the browser-child js code is fetching the outer window ID and for that to
> happen maybe we need to do all that work anyway... not sure how easy it is
> to disentangle that.

Agreed. We definitely wind up creating about:blank content viewers that we
don't need way too often.
Depends on: 1425586
See Also: → 1425587
Depends on: 1425600
Depends on: 1425611
Depends on: 1425613
Depends on: 1425616
Depends on: 1421969
Depends on: 1425617
Depends on: 1425619
Depends on: 1436584
Depends on: 1446940
No longer depends on: 1446940
Depends on: 1456035
Priority: -- → P3
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: