Closed Bug 1331684 Opened 5 years ago Closed 5 years ago

RemoteAddonsChild.jsm shouldLoad sync IPC messages are too expensive

Categories

(Toolkit :: General, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: ehsan.akhgari, Assigned: kmag)

References

(Blocks 1 open bug)

Details

(Whiteboard: [qf:p1])

Attachments

(1 file)

See this profile for example: <https://clptr.io/2jkIRR5>.  All of the Msg_RpcMessage sync IPC messages are coming from <http://searchfox.org/mozilla-central/rev/790b2cb423ea1ecb5746722d51633caec9bab95a/toolkit/components/addoncompat/RemoteAddonsChild.jsm#187>.  This is taking a total of ~1.5 seconds in this profile.  As far as I can tell, we do this once for each load where we need to check the content policy in the child process.  I have uBlock Origin installed.

mconley, is there something we can do about this?
Flags: needinfo?(mconley)
The solution as far as uBlock is concerned is migrating it to a the WebExtension version, which already exists. I'm not sure if there's a good general fix we can implement before we turn off XUL extension support and rip out this code, but it would be nice if we had one.

We do also run into this problem for WebExtensions, but to a much smaller extent, since we only use those listeners for non-HTTP URLs. But we're going to need a better solution there, too, since we need to be able to handle extension WebRequest listeners asynchronously, and content policy checks need to be synchronous. The synchronous requests that we send there[1] can actually only be blocked by legacy extensions using WebRequest.jsm, currently.

[1]: http://searchfox.org/mozilla-central/rev/790b2cb423ea1ecb5746722d51633caec9bab95a/toolkit/modules/addons/WebRequestContent.js#169
Component: Add-ons Manager → General
Product: Toolkit → Core
Do we only run this code when uBlock Origin (and the like) are installed?  I'm asking because I disabled the add-on and re-profiled and got the same result...
In theory we should only run it when an add-on with e10s shims is installed and registers a content-policy listener in the parent process. And, actually, at this point the latest version of uBlock Origin should have shims disabled, so we shouldn't run it even then.
(In reply to :Ehsan Akhgari from comment #2)
> Do we only run this code when uBlock Origin (and the like) are installed? 
> I'm asking because I disabled the add-on and re-profiled and got the same
> result...

That's pretty strange. kmag is right on all counts - having uBlock use the WebExtension APIs would avoid the shims, naturally. If the add-on is marked multiprocessCompatible, it'll also skip the shims.

Which version of uBlock do you have? Did you restart after you'd disabled the add-on? Perhaps it didn't clean itself up properly.
Flags: needinfo?(mconley) → needinfo?(ehsan)
(In reply to Mike Conley (:mconley)  - PTO on Jan 20th from comment #4)
> (In reply to :Ehsan Akhgari from comment #2)
> > Do we only run this code when uBlock Origin (and the like) are installed? 
> > I'm asking because I disabled the add-on and re-profiled and got the same
> > result...
> 
> That's pretty strange. kmag is right on all counts - having uBlock use the
> WebExtension APIs would avoid the shims, naturally. If the add-on is marked
> multiprocessCompatible, it'll also skip the shims.
> 
> Which version of uBlock do you have?

1.10.4.

> Did you restart after you'd disabled
> the add-on? Perhaps it didn't clean itself up properly.

I didn't restart, but I hope that doesn't matter!

How do I check to see whether this add-on is multiprocessCompatible?  And how do I check the rest of my add-ons to find the culprit?  I'm not sure how to tell whether an add-on is using e10s shims.
Flags: needinfo?(ehsan)
Since there was some IRC confusion about this, I'm using uBlock *Origin* not uBlock.
(In reply to :Ehsan Akhgari from comment #5)
> (In reply to Mike Conley (:mconley)  - PTO on Jan 20th from comment #4)
> > (In reply to :Ehsan Akhgari from comment #2)
> > > Do we only run this code when uBlock Origin (and the like) are installed? 
> > > I'm asking because I disabled the add-on and re-profiled and got the same
> > > result...
> > 
> > That's pretty strange. kmag is right on all counts - having uBlock use the
> > WebExtension APIs would avoid the shims, naturally. If the add-on is marked
> > multiprocessCompatible, it'll also skip the shims.
> > 
> > Which version of uBlock do you have?
> 
> 1.10.4.
> 
> > Did you restart after you'd disabled
> > the add-on? Perhaps it didn't clean itself up properly.
> 
> I didn't restart, but I hope that doesn't matter!
> 
> How do I check to see whether this add-on is multiprocessCompatible?  And
> how do I check the rest of my add-ons to find the culprit?  I'm not sure how
> to tell whether an add-on is using e10s shims.

I thought we'd added that flag to telemetry and about:support, but apparently we didn't. The only way to check is to look at the install.rdf and check for the multiprocessCompatible flag. uBlock Origin 1.10.4 is definitely marked multiprocessCompatible, though.

Can you post your complete list of add-ons from about:support?
From your profile data, it looks like this is being caused by the MEGA add-on. It uses multi-process shims and has a content policy registered in the parent process:

https://dxr.mozilla.org/addons/source/addons/420906/bootstrap.js#190

That add-on has a quarter of a million users, too, so perhaps we should try to get it fixed (needinfoing Jorge and Andreas for that).

It might also be worth adding some telemetry to find out how much performance we're losing from this in the wild, and what add-ons are causing it...
Flags: needinfo?(jorge)
Flags: needinfo?(awagner)
(In reply to Kris Maglione [:kmag] from comment #8)
> From your profile data, it looks like this is being caused by the MEGA
> add-on. It uses multi-process shims and has a content policy registered in
> the parent process:
> 
> https://dxr.mozilla.org/addons/source/addons/420906/bootstrap.js#190

Ah yes, that makes perfect sense.

> That add-on has a quarter of a million users, too, so perhaps we should try
> to get it fixed (needinfoing Jorge and Andreas for that).

That would be wonderful.  I'll disable it locally for now.

> It might also be worth adding some telemetry to find out how much
> performance we're losing from this in the wild, and what add-ons are causing
> it...

Yes, I think that's a great idea.  FWIW (and sorry if I'm repeating what you already know) the performance implications of this can be extremely bad.  The shouldLoad function gets called once for every resource Gecko tries to load (typically through a call to NS_CheckContentLoadPolicy).  For example, a page with a 100 images/css/js files can trigger 100 of these synchronous IPC calls.  In my main browsing profile, in cases where the chrome process is busy, this sometimes takes a few seconds overall while loading large pages.
(In reply to :Ehsan Akhgari from comment #9)
> Yes, I think that's a great idea.  FWIW (and sorry if I'm repeating what you
> already know) the performance implications of this can be extremely bad. 
> The shouldLoad function gets called once for every resource Gecko tries to
> load (typically through a call to NS_CheckContentLoadPolicy).  For example,
> a page with a 100 images/css/js files can trigger 100 of these synchronous
> IPC calls.  In my main browsing profile, in cases where the chrome process
> is busy, this sometimes takes a few seconds overall while loading large
> pages.

Yes, I'd expect the impact to be pretty bad, but I'd like to have some more data on how many add-ons and how many users are affected before we decide how to handle it. There are a few options I can think of:

- Disable e10s for users of affected add-ons, and warn if they try to re-enable it without disabling those add-ons.
- Warn e10s users when those add-ons are causing problems via self-support. I don't think this problem will show up in the usual add-on performance number, since most of the overhead is in the non-add-on-specific RemoteAddonChild code.
- Try to get developers of the more popular add-ons to stop using these shims.
- Softblock these add-ons when e10s is enabled.
- Disable the content-policy shims entirely, and deal with whatever add-on breakage results.

I don't think we can actually make this code more performant without doing something extremely complicated, like trying to predict and pre-dispatch the checks, so those are the only kinds of options we're left with.

But which ones we choose depends a lot on the specifics.
Oh, or, possibly:

- Send the requests asynchronously, and cancel the load of the resource later if they're denied.

That would obviously have issues for privacy add-ons, and for loads of scripts and other things with side-effects (some of which we might be able to work around), but it would probably at least be better than killing the shims entirely.
Adding Mega contact. Kris, are there any good workarounds that don't involve migrating to WebExtensions, or is that the only way forward?
Flags: needinfo?(jorge)
Flags: needinfo?(awagner)
Registering the content policy in the content process rather than the parent, and disabling shims by marking the extension multiprocessCompatible, should be enough.
(In reply to Kris Maglione [:kmag] from comment #11)
> Oh, or, possibly:
> 
> - Send the requests asynchronously, and cancel the load of the resource
> later if they're denied.
> 
> That would obviously have issues for privacy add-ons, and for loads of
> scripts and other things with side-effects (some of which we might be able
> to work around), but it would probably at least be better than killing the
> shims entirely.

Doing anything async will block nsIContentPolicy semantics unfortunately, so that's not an option.

(In reply to Kris Maglione [:kmag] from comment #10)
> Yes, I'd expect the impact to be pretty bad, but I'd like to have some more
> data on how many add-ons and how many users are affected before we decide
> how to handle it. There are a few options I can think of:
> 
> - Disable e10s for users of affected add-ons, and warn if they try to
> re-enable it without disabling those add-ons.
> - Warn e10s users when those add-ons are causing problems via self-support.
> I don't think this problem will show up in the usual add-on performance
> number, since most of the overhead is in the non-add-on-specific
> RemoteAddonChild code.
> - Try to get developers of the more popular add-ons to stop using these
> shims.
> - Softblock these add-ons when e10s is enabled.
> - Disable the content-policy shims entirely, and deal with whatever add-on
> breakage results.
> 
> I don't think we can actually make this code more performant without doing
> something extremely complicated, like trying to predict and pre-dispatch the
> checks, so those are the only kinds of options we're left with.

We can't really predict anything useful because we need to know everything about the load, such as which document it is coming from, what its content policy type is, and so on.  I agree that there is no performant way of supporting this without changing the content policy API, which realistically isn't going to happen.  What we want is for the API to be async, but due to the place that it is called (by consumers of the channel before opening the channel) it would be extremely hard to change all consumers to not initiate their loads as one synchronous operation.
(In reply to :Ehsan Akhgari from comment #14)
> Doing anything async will block nsIContentPolicy semantics unfortunately, so
> that's not an option.

Right, we wouldn't be able to preserve the same semantics. We'd have to
either,

1) Allow the load initially and then, after getting a block response, cancel
the in-progress load, or remove the already loaded media, or,

2) Block the load initially, and then after getting an accept response,
attempt loading it again.

That's obviously not ideal, but as an alternative to removing the stubs
entirely, it could be worse.

> We can't really predict anything useful because we need to know everything
> about the load, such as which document it is coming from, what its content
> policy type is, and so on.

I suspect we could make Good Enough predictions about certain loads, like
batches of image nodes from the HTML parser. I just don't think it's worth the
trouble when the real solution is to stop using the shims.

> I agree that there is no performant way of supporting this without changing
> the content policy API, which realistically isn't going to happen.  What we
> want is for the API to be async, but due to the place that it is called (by
> consumers of the channel before opening the channel) it would be extremely
> hard to change all consumers to not initiate their loads as one synchronous
> operation.

I think that at some point, we're going to have to make the effort. Or at
least add a second async content policy implementation and use it in as many
places as possible. WebExtension APIs are required to be asynchronous, since
we extension code to be able to run in its own process. They currently handle
HTTP content blocking by suspending requests until the handlers make a
decision, but we also need to be able to handle blocking things like data:
URLs, and preventing windows from opening when a link click is blocked. (It
would be nice to handle that for window.open too, but our hands are tied with
that one needing to be synchronous)
(In reply to Kris Maglione [:kmag] from comment #15)
> > I agree that there is no performant way of supporting this without changing
> > the content policy API, which realistically isn't going to happen.  What we
> > want is for the API to be async, but due to the place that it is called (by
> > consumers of the channel before opening the channel) it would be extremely
> > hard to change all consumers to not initiate their loads as one synchronous
> > operation.
> 
> I think that at some point, we're going to have to make the effort. Or at
> least add a second async content policy implementation and use it in as many
> places as possible. WebExtension APIs are required to be asynchronous, since
> we extension code to be able to run in its own process.

I agree.  I think in the ideal world these checks would run in the chrome process instead of the content process, as part of the content security checks.  This makes us be able to trust the validity of the checks in the case of a compromised child, and it also allows us to properly suspend the channel while all checks (sync or async) are in progress.  However the nsIContentPolicy API is pretty popular in old school extensions, so our hands are fairly tied while we need to support the current API.  There is also a significant amount of engineering involved in actually implementing this change.

> They currently handle
> HTTP content blocking by suspending requests until the handlers make a
> decision, but we also need to be able to handle blocking things like data:
> URLs, and preventing windows from opening when a link click is blocked. (It
> would be nice to handle that for window.open too, but our hands are tied with
> that one needing to be synchronous)

We actually run a nested event loop when you call window.open() from a web page waiting for the XUL window opened to finish its loading, so at least in theory we should be able to receive information about whether the URL being opened can be loaded before returning from open() so we'd have the opportunity to return null when that happens or something...
    > The solution as far as uBlock is concerned is migrating it to a the WebExtension version, which already exists.

    I know that WebExtension is the ultimate solution (I do build a webext version[1]), but for the present the next stable version of uBO (post 1.10.4) will no longer install a nsIContentPolicy handler for FF 44+.[2] The latest dev version on AMO contains the change.[3]

    I did benchmark an improvement in page load speed without the nsIContentPolicy handler.[4] It has been a while now, but last time I measured the webext version of uBO, it was slower than the non-webext version, and that was at a time where the nsIContentPolicy handler was still in use. Anyways, I understand all this is still under development.

    [1] https://github.com/gorhill/uBlock/releases -- see "uBlock0.webext.zip"

    [2] https://github.com/gorhill/uBlock/commit/a303c7800ed441c1741e203e04bdee2a945f02c0#diff-d3ea8a3bc47566813b636414574b74ca

    [3] https://addons.mozilla.org/en-US/firefox/addon/ublock-origin/versions/beta

    [4] Using http://www.raymondhill.net/ublock/pageloadspeed.html and benchmarking https://en.wikipedia.org/wiki/List_of_country_calling_codes
(In reply to Kris Maglione [:kmag] from comment #15)
> I think that at some point, we're going to have to make the effort. Or at
> least add a second async content policy implementation and use it in as many
> places as possible. WebExtension APIs are required to be asynchronous, since
> we extension code to be able to run in its own process. They currently handle
> HTTP content blocking by suspending requests until the handlers make a
> decision, but we also need to be able to handle blocking things like data:
> URLs, and preventing windows from opening when a link click is blocked. (It
> would be nice to handle that for window.open too, but our hands are tied with
> that one needing to be synchronous)

I think those will be necessary anyway since webextensions don't provide the full capability of nsIContentPolicy. They can't access the DOM node that triggered the request. see bug 1210990
(In reply to The 8472 from comment #18)
> (In reply to Kris Maglione [:kmag] from comment #15)
> > I think that at some point, we're going to have to make the effort. Or at
> > least add a second async content policy implementation and use it in as many
> > places as possible. WebExtension APIs are required to be asynchronous, since
> > we extension code to be able to run in its own process. They currently handle
> > HTTP content blocking by suspending requests until the handlers make a
> > decision, but we also need to be able to handle blocking things like data:
> > URLs, and preventing windows from opening when a link click is blocked. (It
> > would be nice to handle that for window.open too, but our hands are tied with
> > that one needing to be synchronous)
> 
> I think those will be necessary anyway since webextensions don't provide the
> full capability of nsIContentPolicy. They can't access the DOM node that
> triggered the request. see bug 1210990

Do you mean synchronous access to the DOM node in question?  That seems like something that we can't support efficiently across the process boundary. :(
It would not have to be across process boundaries. Such a callback could work in the context of a content script.
You're right!
Attachment #8836536 - Flags: review?(benjamin)
Comment on attachment 8836536 [details]
Bug 1331684: Record time spent blocked on shouldLoad shims per add-on.  data-r?bsmedberg

https://reviewboard.mozilla.org/r/111944/#review113838

I'd like to see this again with descriptions.

::: toolkit/components/telemetry/Histograms.json:37
(Diff revision 1)
> +    "expires_in_version": "58",
> +    "kind": "exponential",
> +    "high": 60000,
> +    "n_buckets": 20,
> +    "keyed": true,
> +    "description": "The amount of time the content process blocked processing an add-on's shouldLoad shim for a given document prior to the load event.",

This needs to describe the key. I presume that it's the addon ID?

Reading the histogram description, it seems that one value will be recorded per document? Is that so? Can you make that more explicit in the description?
Attachment #8836536 - Flags: review?(benjamin)
(In reply to Benjamin Smedberg [:bsmedberg] from comment #23)
> This needs to describe the key. I presume that it's the addon ID?

Yes, it's the add-on ID.

> Reading the histogram description, it seems that one value will be recorded
> per document? Is that so? Can you make that more explicit in the description?

Yes, once per document for each of the two histograms.
Comment on attachment 8836536 [details]
Bug 1331684: Record time spent blocked on shouldLoad shims per add-on.  data-r?bsmedberg

https://reviewboard.mozilla.org/r/111944/#review114320
Attachment #8836536 - Flags: review?(wmccloskey) → review+
Attachment #8836536 - Flags: review?(benjamin)
Comment on attachment 8836536 [details]
Bug 1331684: Record time spent blocked on shouldLoad shims per add-on.  data-r?bsmedberg

https://reviewboard.mozilla.org/r/111944/#review114624

data-r=me
Attachment #8836536 - Flags: review?(benjamin) → review+
Assignee: nobody → kmaglione+bmo
Keywords: leave-open
https://hg.mozilla.org/integration/mozilla-inbound/rev/cea69ac2d670c7b506f4eae1cc31b9889923b1c8
Bug 1331684: Record time spent blocked on shouldLoad shims per add-on. r=billm data-r=bsmedberg
Well, from the initial telemetry, it looks like this is a pretty big problem:

https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2017-02-20&keys=firefox%2540mega.co.nz!wrc%2540avast.com!adblockultimate%2540adblockultimate.net!%257Bfe272bd1-5f76-4ea4-8501-a05d35d823fc%257D&max_channel_version=nightly%252F54&measure=ADDON_CONTENT_POLICY_SHIM_BLOCKING_LOADING_MS&min_channel_version=null&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2017-02-18&table=0&trim=1&use_submission_date=0

It looks like the various AdBlock forks tend to block for longer than half a second prior to page load, and sometimes longer than 10 seconds. Avast averages about 30ms, but is sometimes much longer. Mega is not the most expensive, but it does affect the most users.

Jorge, how do you thing we should start handling this? As I understand it, the Mega add-on is dead, so perhaps we should consider blocklisting it. Maybe we should contact some of the other authors and ask for updates.
Flags: needinfo?(jorge)
I think this is a pretty good argument that we should not allow add-ons that aren't explicitly marked as multiprocessCompatible on release. Even if we blacklist the ones that are causing problems here, I don't think enough work has been done to measure the performance impact of other shims.
I'll contact the developers. I can't find "{257Bfe272bd1-5f76-4ea4-8501-a05d35d823fc}" on AMO, however.
Flags: needinfo?(jorge)
Thanks for the analysis, Kris, this is really great!

Should we also land similar probes for other shims to find the add-ons impacted negatively by them?
(In reply to :Ehsan Akhgari from comment #33)
> Should we also land similar probes for other shims to find the add-ons
> impacted negatively by them?

I think Bill's right. We need to discuss whether we want to allow shims on release at all. If we don't, it's probably not worth the effort of instrumenting the other shims.
Blocks: 1322441
Whiteboard: [qf:p1]
Kris, what form does the action on this bug take?  We'd like to figure out what to do with this bug in the QF triage.  Do we need to land a mozilla-central change as part of this bug, or do something else?  Thanks!
Flags: needinfo?(kmaglione+bmo)
Bill came up with a list of safe add-ons for e10s, and Kev is working on changing the roll-out plan based on that.

We should probably still consider disabling these shims, or at least warning users of those add-ons that they shouldn't use them in e10s mode, but it looks like they won't be rolling out to release users.
Flags: needinfo?(kmaglione+bmo)
Kris - can we move this bug out of Core::General ?
Flags: needinfo?(kmaglione+bmo)
We're removing support for shims, so this is no longer something we need to address.
Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(kmaglione+bmo)
Keywords: leave-open
Product: Core → Toolkit
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.