Closed
Bug 1347666
Opened 8 years ago
Closed 2 months ago
Huff Post page load CPU usage much higher in Nightly vs 45 ESR
Categories
(Core :: General, defect)
Core
General
Tracking
()
RESOLVED
WORKSFORME
Performance Impact | low |
People
(Reporter: Harald, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: perf)
Attachments
(1 file)
309.58 KB,
image/png
|
Details |
(Accessible behind VPN on http or auth0 on https):
Combined dashboard [1] shows that huff post article page loads about 35% slower in Nightly vs ESR 45.
Firefox 45: http://wpt1.dev.mozaws.net/result/170308_EQ_4B/1/details/
Nighty: http://wpt1.dev.mozaws.net/result/170313_PC_4C/1/details/
Patrick McManaus looked at the results and pointed out that CPU usage is much higher in Nightly.
[1]: https://metrics.mozilla.com/protected/shiny/dstrohmeier/poor_nightly/ (pick uff post in the 'Aggregated boxplots over time' section to get clickable scatterplot)
Comment 1•8 years ago
|
||
its this page
http://www.huffingtonpost.com/entry/why-you-need-to-quit-your-job_us_58b5ac8de4b02f3f81e44cb8
and that's a 55 nightly.. not sure if we can get the build cset or not
Reporter | ||
Updated•8 years ago
|
Blocks: QuantumFlow
Updated•8 years ago
|
Whiteboard: [qf] → [qf:investigate]
Comment 2•8 years ago
|
||
Someone needs to capture a profile.
Profile from today's nightly on Win64: https://perfht.ml/2mSvicf
(I'm new to this; please let me know if I captured the wrong thing.)
Comment 4•8 years ago
|
||
There is definitely something weird going on here but I don't think that's your fault! There are a fair number of sync IPC messages going around here <https://perfht.ml/2neDRkl> and I just refuse to believe that they all happen to finish in exactly half a millisecond, even though it seems like they all took one sample. Was your sampling frequency 0.5ms?
Markus, this looks like a super fishy profile. Can you please have a look?
Flags: needinfo?(mstange)
Comment 5•8 years ago
|
||
I think the profile looks fine, with the exception of the profile interval: It looks like the chosen interval was 0.5ms but the actual interval we were able to achieve was 2ms. I've seen this before on Windows profiles. Either we are slower to capture stacks on Windows, or we're running into timer resolution problems, or something else is happening.
The tracing markers at the top give you the exact times for the sync IPC messages. Most of them are below 0.5ms. For example I can see one GetCookieString message that took 0.11ms.
That's not hard to believe because the parent process looks completely idle.
Flags: needinfo?(mstange)
You're right -- I played around with the addon when I first installed it, and later forgot that I had set my sampling frequency to 0.5ms. Is there a recommended value that I should use for reporting profiles in bugs?
Comment 7•8 years ago
|
||
Usually 1ms, but since we apparently don't manage to go lower than 2ms on Windows, I recommend 2ms on Windows.
If the intended interval is close to the actual interval, the numbers on perf.html are more trustworthy and the graphs look better.
(In the thread stack graphs at the top, we draw each sample's bar with the width of the intended interval. In the profile you posted, if you zoom in a few times (by selecting a range and clicking the magnifying glass button), you'll see large gaps between the individual sample bars. If the actual interval and the intended interval match, these bars will be mostly adjacent.)
Comment 8•8 years ago
|
||
I filed bug 1350593 about the profiler resolution issue on Windows. I was wondering why those lines look like that...
David, can you please re-profile? Thanks!
Oh sorry, I thought I had already posted a new one.
Here's one at 2ms: https://perfht.ml/2nX5UXc
The ads took forever to load, maybe bad hotel wifi?
Comment 10•8 years ago
|
||
And that one seems to have a problem with symbols. Third time's the charm? https://perfht.ml/2o6UUmG
Comment 11•8 years ago
|
||
The reflow times may be worth looking into (no idea whether they are unusual or not.) The paint times also seem really high, but it seems like they're all in some kind of JIT code? And a lot in CrossProcessSemaphore::Wait().
Other than these, I don't see anything else too unusual.
Comment 12•8 years ago
|
||
maybe run a profile of esr45 to see if it can repro the huge difference?
Comment 13•8 years ago
|
||
Does the profiler work on esr45?
Comment 14•8 years ago
|
||
You won't get C++ stacks from the official esr45 builds so it'll not be as useful. But it'll work if you compile the esr45 source as "nightly".
Comment 15•8 years ago
|
||
I'd be interested in the profile comparison too.
CrossProcessSemaphore::Wait() should just be an alternative blocking point (we previously blocked in a sync IPDL call when sending a layers transaction, now we block for the compositor to have finished with the transaction when we try paint next). I'd expect this to just move blocking time from one location to the other, and hopefully be a shorter wait.
If the semaphore time is actually worse, then we might have an actual regression where we're not releasing read locks when we should.
Flags: needinfo?(matt.woodrow)
Comment 16•8 years ago
|
||
I am not particularly keen to build esr45 locally, so I took a profile with xperf.
At first glance, e10s seems to be slowing us down a lot, compared to nightly without e10s.
Comment 17•8 years ago
|
||
Does the Gecko Profiler not work in ESR45?
Comment 18•8 years ago
|
||
(In reply to Jonathan Watt [:jwatt] from comment #17)
> Does the Gecko Profiler not work in ESR45?
I wouldn't count on it.
Comment 19•8 years ago
|
||
(In reply to :Ehsan Akhgari (busy) from comment #11)
> The reflow times may be worth looking into (no idea whether they are unusual
> or not.) The paint times also seem really high, but it seems like they're
> all in some kind of JIT code? And a lot in CrossProcessSemaphore::Wait().
>
> Other than these, I don't see anything else too unusual.
During the one particularly long rasterize in the profile posted by David. The paint times just seem to be in D3D/D2D code, but nothing that makes it particularly obvious what kind of work is being done. From what I can tell it seems like the GPU was busy during that draw. CrossProcessSemaphore::Wait() can happen when the GPU is busy as well.
What I've seen for higher CPU usage when the GPU is busy, is related to the mQuery spin at the end of EndFrame, I have some patches locally to reduce that code's CPU footprint. However it should be noted they don't affect user facing performance (except maybe of other applications than Firefox)
Flags: needinfo?(bas)
Updated•7 years ago
|
Whiteboard: [qf:investigate] → [qf:p3]
Updated•3 years ago
|
Performance Impact: --- → P3
Whiteboard: [qf:p3]
Updated•2 years ago
|
Severity: normal → S3
Comment 20•2 months ago
|
||
This is what I get today : https://share.firefox.dev/4dUEIgx
Loading felt pretty fast to me. I will close this bug, but please reopen if people can still reproduce.
Status: NEW → RESOLVED
Closed: 2 months ago
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•