Closed Bug 1022988 Opened 6 years ago Closed 3 years ago

[e10s] Speedometer benchmark regresses significantly in e10s mode

Categories

(Core :: General, defect, P3)

x86_64
Linux
defect

Tracking

RESOLVED WORKSFORME
Tracking Status
e10s + ---

People

(Reporter: johns, Assigned: mrbkap)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [qf:p1])

Benchmark: http://browserbench.org/Speedometer/
More info: https://www.webkit.org/blog/3395/speedometer-benchmark-for-web-app-responsiveness/

I drop from ~47 to ~27 runs per minute in e10s mode. Tested on Linux x64. Basic layers vs acceleration doesn't seem to have an impact.
tracking-e10s: --- → ?
See Also: 1022239
So looking at this, it looks like there are noticeably longer pauses between refreshes of the iframe the test is using, which happens 480 times during the test. The networking panel shows second-long gaps between each burst of network activity, but doesn't show us *in* a network request during those gaps, though I don't know whether it simply isn't tracking time spent in IPC or some such.

@sworkman, do you know if there's a reasonable way to profile these requests in e10s mode, to try and see where the slowdown is in these loads? We may end up having to wait for the Gecko profiler to be e10s-ready to get good insight here.
Flags: needinfo?(sworkman)
I don't know any offhand. Jason wrote our IPDL code, so he might have an idea.
Flags: needinfo?(sworkman) → needinfo?(jduell.mcbugs)
I assume we track network timings in IPC mode, i.e. the sort of timings in between AsyncOpen and actually getting data, etc.

I suppose it's possible that somehow there could be a big delay involving necko e10s. The easiest way to tell might be to use NSPR logging with timestamps on, and see if there seems to be a big delay between HttpChannelChild::AsyncOpen on the child and nsHttpChannel::AsyncOpen on the parent (or the OnStartRequest/OnDataAvailable/OnStopRequest calls going in the opposite direction).
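To make that comparison concrete, here is a minimal sketch of measuring the gap between the two AsyncOpen calls from a pair of timestamped logs. The line layout in the comment is an assumption, not taken from a real log, and the helper names `first_timestamp` and `open_delay_seconds` are hypothetical; adjust the regular expression to whatever the logs actually contain.

```python
import re
from datetime import datetime

# Assumed line shape (hypothetical, not verified against a real log):
#   2014-06-10 18:02:11.123456 UTC - 1234[abcd]: HttpChannelChild::AsyncOpen [this=...]
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) UTC")


def first_timestamp(lines, marker):
    """Return the timestamp of the first line containing `marker`, or None."""
    for line in lines:
        if marker in line:
            match = TIMESTAMP_RE.match(line)
            if match:
                return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
    return None


def open_delay_seconds(child_lines, parent_lines):
    """Seconds between the child's AsyncOpen and the parent's AsyncOpen.

    Returns None when either marker is missing from its log.
    """
    child = first_timestamp(child_lines, "HttpChannelChild::AsyncOpen")
    parent = first_timestamp(parent_lines, "nsHttpChannel::AsyncOpen")
    if child is None or parent is None:
        return None
    return (parent - child).total_seconds()
```

This only pairs the first occurrence in each log. Matching individual channels across processes would need some shared identifier, which this sketch does not assume exists.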

If the main thread on the child is very busy it's possible that it could block necko from delivering data.  The HTML parser in non-e10s mode now has OnDataAvailable delivered off-main thread, but we're still stuck with main thread delivery on the child for now.
Flags: needinfo?(jduell.mcbugs)
Retesting this in a recent nightly, I get ~50 in non-e10s and ~40 in e10s builds, rerunning each test twice and getting very similar numbers. So we've closed about half of the gap, but there's still a definite regression.
See Also: → 1062713
Assignee: nobody → mrbkap
Blocks: e10s-perf
:johns, I don't know if you'd find this useful, but my patches in bug 1062713 comment 10 may give you more info here.  Specifically they can help distinguish between delays that are caused by taking a longer time to launch a necko channel, vs the necko channel taking longer to load.
Flags: needinfo?(jmathies)
ie11: 33.9
e10s: 45.8
non-e10s: 51.7
chrome: 96.8

11.4% e10s regression on Windows.
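For reference, the 11.4% figure follows from the e10s and non-e10s scores above. A small sketch of the arithmetic, where `regression_percent` is a hypothetical helper name and a higher score is taken to be better:

```python
def regression_percent(baseline, candidate):
    """Percentage by which `candidate` falls short of `baseline`."""
    return (baseline - candidate) / baseline * 100.0


# non-e10s (baseline) vs e10s, using the scores reported in this comment
print(round(regression_percent(51.7, 45.8), 1))  # 11.4
```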

P2 maybe?
Flags: needinfo?(jmathies) → needinfo?(blassey.bugs)
Not part of our release criteria and still better than Chrome, so I'm going to go with a P3 for now.
Flags: needinfo?(blassey.bugs)
Priority: -- → P3
(In reply to Brad Lassey [:blassey] (use needinfo?) from comment #7)
> Not part of our release criteria and still better than Chrome, so I'm going
> to go with a P3 for now.

A higher score is better, so I think you meant we're still better than IE? Chrome is currently kicking e10s and non-e10s butt, which isn't surprising since it's their test suite.
http://browserbench.org/Speedometer/
Firefox 47.0.1 - 31.3
Firefox 50.0a1 (e10s) - 35.1
Chromium 51.0.2704.106 - 69.24
Firefox is slower than Chromium by almost 2x.
We're seeing at least a 2X perf difference here between Chromium (now 56) and Firefox (now 51). Not only on Speedometer, but in our own (large) codebase.

Any thoughts from the Mozilla team here? Can we get this changed from being a P3 to something a bit more urgent?
Filing a separate bug with a way to reproduce the problem you're experiencing in your own code would be valuable. There's no guarantee that there's any relationship between the problem with Speedometer and your code.
No longer blocks: TimeToFirstPaint_FB
Is this Speedometer regression with e10s still an issue?
Whiteboard: [qf]
Flags: needinfo?(bob)
Tested SM2 (https://sm.duct.io/) with the 2017-08-10 nightly off m-c rev 4d54ac07b8c97f0e6713dab2ba694023b5b2f3b5. Clean profile for each run.

Win10, Firefox 64-bit
e10s on:  118.7
e10s off: 114.3

Win10, Firefox 32-bit
e10s on:  123.8
e10s off: 122.4

Ubuntu 17.04 VM, Firefox 64-bit
e10s on:  84.96
e10s off: 70.76

Looks like e10s is a win now, if anything. Let me know if you need anything else.
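A quick sketch of the per-platform difference implied by these numbers, expressed as the change from e10s off to e10s on. The helper name `e10s_change_percent` and the dictionary layout are illustrative only; the scores are the ones listed above.

```python
def e10s_change_percent(e10s_on, e10s_off):
    """Relative change of the e10s-on score versus e10s-off, in percent."""
    return (e10s_on - e10s_off) / e10s_off * 100.0


# (e10s on, e10s off) scores from the runs listed above
results = {
    "Win10, Firefox 64-bit": (118.7, 114.3),
    "Win10, Firefox 32-bit": (123.8, 122.4),
    "Ubuntu 17.04 VM, Firefox 64-bit": (84.96, 70.76),
}

for platform, (on, off) in results.items():
    print(f"{platform}: {e10s_change_percent(on, off):+.1f}%")
```

A positive value means e10s scored higher on that configuration.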
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME
Flags: needinfo?(bob)
\o/  Thanks for testing!
Yes, thank you for testing it so quickly. Updating the bug so it's reflected in QF reporting.
Whiteboard: [qf] → [qf:p1]