Loading youtube takes 3x longer in e10s when using Adblock Plus

RESOLVED FIXED in mozilla36

Status

()

defect
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: Ehsan, Assigned: billm)

Tracking

(Blocks 1 bug)

unspecified
mozilla36
x86
macOS
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(e10s+)

Details

Attachments

(6 attachments, 1 obsolete attachment)

e10s is generally slower in loading all web pages, but it is particularly slow in loading youtube.com.  Testing here in a non-e10s and e10s window in the same running browser, I get roughly 3s of load time in non-e10s and 14s in e10s mode.  That difference in behavior is very evident, it would be nice to know what's going on here...

Comment 1

5 years ago
HTML5 youtube or Flash-youtube? If this is Flash-related, any profiling or diagnosis should wait for bug 874016 which will change the architecture significantly.
Flags: needinfo?(ehsan.akhgari)
This is youtube.com, the home page.  Running |document.querySelectorAll("embed").length| in the Web Console on that page returns 1, so there is one Flash plugin on that page.  But I cannot reproduce this on <https://www.adobe.com/software/flash/about/>.

Also, note that terrible load times are totally reproducible on HTML5 video pages as well.
Flags: needinfo?(ehsan.akhgari)
See also bug 1022988 about a regression in the speedometer benchmark, which relies heavily on frame loading
See Also: → 1022988
Flags: needinfo?(mconley)

Updated

5 years ago
Blocks: e10s-perf

Comment 4

5 years ago
Ehsan:

Can you take my suggestion from bug 1022988 comment 3, i.e. run this with NSPR logging, and see if there's any appreciable delays between AsyncOpen of channels on the child and their corresponding AsyncOpen on the parent?  (And in the other direction, OnStart/OnData/OnStop)?  That could give us some idea of whether necko is delaying things here, or something else.

If that's too complicated just posting the NSPR log (or emailing it to me if you're got passwords/cookies in it) would be a start.

My simple tests of repeated necko loads in e10s xpcshell has so far never shown any significant latency addition under e10s. Obviously things could get complicated if the main thread on the child gets real busy, etc.  But the issue here may be more that other things in e10s are slow, and so they don't launch network loads until a later timestamp.
Flags: needinfo?(ehsan.akhgari)

Comment 5

5 years ago
s/other things in e10s are slow/other things in a full e10s browser (vs xpcshell) are slow/
(In reply to Jason Duell (:jduell) from comment #4)
> Ehsan:
> 
> Can you take my suggestion from bug 1022988 comment 3, i.e. run this with
> NSPR logging, and see if there's any appreciable delays between AsyncOpen of
> channels on the child and their corresponding AsyncOpen on the parent?  (And
> in the other direction, OnStart/OnData/OnStop)?  That could give us some
> idea of whether necko is delaying things here, or something else.

Sure.  What modules do you want logged?  And is that logging turned on in Nightly or do I need to kick of a local opt build?

> If that's too complicated just posting the NSPR log (or emailing it to me if
> you're got passwords/cookies in it) would be a start.

I'd probably do this.  :-)


(Note that this may or may not be networking related...)
Flags: needinfo?(ehsan.akhgari) → needinfo?(jduell.mcbugs)

Comment 7

5 years ago
Ehsan,

The modules listed here will be fine: (make sure to use 'timestamp')

   https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging

Logging is enabled on all of our builds--you just need to set the environment variables to turn it on.
Flags: needinfo?(jduell.mcbugs)
Sent the log to Jason.
Jason, did you get a chance to look at my log?
Flags: needinfo?(jduell.mcbugs)

Comment 10

5 years ago
Very sorry to take so long with this.  I realized quickly that the NSPR log format was too hard to get decent data out of, so I need to instrument necko a bit to get some relevant statistics more easily.  Here's what I came up with:

I captured both the "start time" of each URI loaded from going to www.youtube.com (measured as a delta from the start time of the initial www.youtube.com load), and the "load time" (the delta from AsyncOpen to OnStopRequest). The "start time" is to get a sense of whether we're taking longer to request network channels under e10s (i.e. the layers above necko are churning more slowly than in single-process mode).

The summary is that I'm not seeing anything like a 3x slowdown under e10s in loading youtube.com (I'm on linux, with an opt build of m-c).  Experientially there's no major discernable difference in the viewport experience, and the stats I gathered don't show much difference either.

Over 7 runs (both e10s and single-process), I got these aggregrate statistics:

Delta between start of initial www.youtube.com and start of other channels: 
single process average: 2590 ms 
e10s: average channel launch time was 117ms later than single process

Channel load time: 
single process average: 263 ms
e10s: average was 90ms faster!

There's some variance in these numbers, but it's not massive.

The shorter e10s channel load time is counterintuitive (logically e10s loads are exactly the same as single process, except with IPDL overhead, so you'd think they ought to be >= in timing, not shorter).  My guess is that the main thread event loop is busier (higher latency) in single-process mode, and there are several places (on-modify-request, on-examine-response) where the networking thread essentially does a blocking wait on a main-thread event dispatch. (and the OnStart/Data/Stop events obviously may wait in the queue longer).  This is a really interesting result, and makes me wonder if we've got some serious optimizations possible if we can skip/refactor the way we do our event dispatching in necko.  In any case, it looks like e10s necko is not a serious perf hit, but actually a win.

The delay in launching necko channels does look like a performance loss.  I don't know what to speculate is causing things to take longer here.  But the difference is more like 5% than 300%.

Notes:
1) I cleared the cache before each run, so these are all network load times, not cache
2) There's some "dark matter" here: I only measured URIs that were loaded by both a single process and an e10s run.  About 1/3rd of URIs were different each time I loaded youtube, so those loads aren't captured in the stats here.

Ehsan, are you still seeing a 3x slowdown?
Flags: needinfo?(jduell.mcbugs)

Comment 11

5 years ago
In case anyone is curious about how I gathered the data here.

Comment 13

5 years ago
I'm happy to post the other 6 logs if someone is interested.

Updated

5 years ago
Flags: needinfo?(ehsan.akhgari)

Comment 14

5 years ago
I'm a little puzzled as to why we take so long to launch page subresources in these traces.  We issue .css loads only 80 ms after launching the initial youtube.com URI, which means we're getting data back by then.  But then it's 1000ms before we issue the next resource load.
I'm definitely still seeing the 3x slowdown (basically for my real usage, I have completely stopped using youtube in e10s windows, I just load it in a non-e10s window.)

I think comment 14 and comment 10 are probably suggesting that the problem is not in the networking code.  I think Necko was mostly just a guess, and perhaps it's the wrong one. :-)
Flags: needinfo?(ehsan.akhgari)

Comment 16

5 years ago
Ehsan, note that comment 14 applies to both e10s and single-process modes.

Any chance you could apply the first patch and give me the (tweaked) NSPR log with some e10s/non-e10s runs?  Make sure to clear the cache before each run.

I agree the problem is probably not necko here, but I'd expect we'll see a difference in how soon your channels are launched and that might help focus a little on where the slowdown is. A stab in the dark, but a fairly low-cost one...

BTW you only need the nsHttp module to get what I need (will keep log size down):

  NSPR_LOG_MODULES=timestamp,nsHttp:5
Flags: needinfo?(ehsan.akhgari)
Sent the log from a build with Jason's patch to him via email.
Flags: needinfo?(ehsan.akhgari)

Comment 19

5 years ago
ehsan: your e10s log was missing info for the 1st youtube.com load.  I think I found the issue and put up a new instrumentation patch.  Would you mind applying it and re-running the logs?  Probably best to re-run both single/e10s, since youtube.com's URI set changes with the weather.
Flags: needinfo?(ehsan.akhgari)
Log sent.
Flags: needinfo?(ehsan.akhgari)

Comment 21

5 years ago
Posted file results.ehsan
So ehsan is living in a very different world than me :)  He's getting incredible crappy results:

Average channel launch (after initial youtube.com load) in single process: 3474 ms
(e10s loads started on average 15601 ms later)
Average Load time (AsyncOpen->OnStopRequest) of channel, single process: 648
(e10s channels took on average 621 ms longer to load)

He gave me two run--the stats above are for the attached log.  The summary of the other one is similar:

Average channel launch (after initial youtube.com load) in single process: 2593 ms
(e10s loads started on average 16713 ms later)
Average Load time (AsyncOpen->OnStopRequest) of channel, single process: 622
(e10s channels took on average 352 ms longer to load)

Necko results here aren't great (50%-100% slower to load channels) but the real killer is that channels are getting launched on average 15-16 *seconds* later than in single process mode.

I've got no clue at this point why ehsan's e10s is so slow while mine is basically just as fast as single-process mode.  needinfo'ing ehsan in case he's got any useful idea/info (are you running opt/debug? what platform? Addons?)
Flags: needinfo?(ehsan.akhgari)

Comment 22

5 years ago
Comment on attachment 8499712 [details]
results.ehsan

Note: my benchmarking patch turns out not to handle redirects well, so redirected channels aren't included in the results.  I don't expect that it skews the results too much, but it's why the initial "www.youtube.com" URI isn't in the results (it gets redirected to https)
Posted file about:support
I have no idea what's going on here, attached my about:supports.  This is normal Nightly in my main profile.
Flags: needinfo?(ehsan.akhgari)
Hah!  I just disabled Adblock Plus, and the problem went away!  Jason, do you have this extension?  If not, can you please install it and test with that?
Flags: needinfo?(jduell.mcbugs)

Comment 25

5 years ago
Yeah, the problem here is definitely AdBlock Plus.

CPOW?

Average channel launch (after initial youtube.com load) in single process: 2043 ms
(e10s loads started on average 8047 ms later)
Average Load time (AsyncOpen->OnStopRequest) of channel, single process: 453
(e10s channels took on average 1346 ms longer to load)
Flags: needinfo?(jduell.mcbugs)
Summary: Loading youtube takes more than 3x the amount of time in e10s compared to non-e10s windows → Loading youtube takes 3x longer in e10s when using Adblock Plus
I think the next step here would be to profile this.
Depends on: 1008435
Bug 1073664 might be relevant here, it has a profile of LastPass walking the DOM via CPOW that is also incredibly slow.
As we've traced the slowdown to ABP, I don't think we needinfo from me anymore.
Flags: needinfo?(mconley)
I will look into what could be done here on either side.
Assignee: nobody → evilpies
Sorry to steal this Tom. This looks like a bug in my code. But please don't let that stop you from switching Adblock to use content scripts :-).

Here are some notes. I'll try to post a patch tomorrow.

- Youtube HTML player loads the same 1x1 image via a data: URL over 700 times. Each one triggers the content policy in Adblock.
- The content process doesn't return to the event loop during this time.
- Meanwhile, the chrome process has already started loading video and sending it to the content process over IPC.
- Each time the content process runs the content policy, it sends a sync message to the parent asking if it should block the request. While it's waiting for the sync response, it iterates over all the HTTP messages that have come in to see if it should dispatch them. In doing so, it copies them all. Some of them are quite big (> 100KB of data).

Simply having to process 700 requests is kind of bad, but it's something CPOWs should be able to handle. At 50us per CPOW request, that's not so bad. There's a lot of redundancy in the CPOW traffic, which I'd like to fix. But I think the real problem here is the last bullet. Prior to bug 1049879, we didn't need to copy all these messages. I think that avoiding the copies (which is definitely possible) will help a ton here.

However, Ehsan filed this bug before bug 1049879 landed, so he was experiencing a different problem. Maybe just too many CPOWs, I don't know. In any case, I'll fix that stuff in other bugs.
Please see the previous comment for an explanation.
Assignee: evilpies → wmccloskey
Status: NEW → ASSIGNED
Attachment #8513780 - Flags: review?(dvander)
Comment on attachment 8513780 [details] [diff] [review]
ipc-slowness-fix

Review of attachment 8513780 [details] [diff] [review]:
-----------------------------------------------------------------

Seems like IPC::Message could do with move semantics at some point.
Attachment #8513780 - Flags: review?(dvander) → review+
Haha. Yeah, I was thinking about doing that, but it looked kinda hard. I filed bug 1091258 though.
https://hg.mozilla.org/integration/mozilla-inbound/rev/5fa716a9fa1b

Bug 1082849 will track improvements in CPOW traffic, which will also help Youtube+Adblock.
https://hg.mozilla.org/mozilla-central/rev/5fa716a9fa1b
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
You need to log in before you can comment on or make changes to this bug.