1347666 - Huff Post page load CPU usage much higher in Nightly vs 45 ESR

Reporter

Description

•

8 years ago

(Accessible behind VPN on http or auth0 on https): Combined dashboard [1] shows that huff post article page loads about 35% slower in Nightly vs ESR 45. Firefox 45: http://wpt1.dev.mozaws.net/result/170308_EQ_4B/1/details/ Nighty: http://wpt1.dev.mozaws.net/result/170313_PC_4C/1/details/ Patrick McManaus looked at the results and pointed out that CPU usage is much higher in Nightly. [1]: https://metrics.mozilla.com/protected/shiny/dstrohmeier/poor_nightly/ (pick uff post in the 'Aggregated boxplots over time' section to get clickable scatterplot)

Patrick McManus [:mcmanus]

Comment 1

•

8 years ago

its this page http://www.huffingtonpost.com/entry/why-you-need-to-quit-your-job_us_58b5ac8de4b02f3f81e44cb8 and that's a 55 nightly.. not sure if we can get the build cset or not

:Harald Kirschner :digitarald

Reporter

Updated

•

8 years ago

Blocks: QuantumFlow

Andrew Overholt [:overholt]

Updated

•

8 years ago

Whiteboard: [qf] → [qf:investigate]

(no longer active)

Comment 2

•

8 years ago

Someone needs to capture a profile.

(Away)

Comment 3

•

8 years ago

Profile from today's nightly on Win64: https://perfht.ml/2mSvicf (I'm new to this; please let me know if I captured the wrong thing.)

(no longer active)

Updated

•

8 years ago

Comment 4

•

8 years ago

There is definitely something weird going on here but I don't think that's your fault! There are a fair number of sync IPC messages going around here <https://perfht.ml/2neDRkl> and I just refuse to believe that they all happen to finish in exactly half a millisecond, even though it seems like they all took one sample. Was your sampling frequency 0.5ms? Markus, this looks like a super fishy profile. Can you please have a look?

Flags: needinfo?(mstange)

Markus Stange [:mstange]

Comment 5

•

8 years ago

I think the profile looks fine, with the exception of the profile interval: It looks like the chosen interval was 0.5ms but the actual interval we were able to achieve was 2ms. I've seen this before on Windows profiles. Either we are slower to capture stacks on Windows, or we're running into timer resolution problems, or something else is happening. The tracing markers at the top give you the exact times for the sync IPC messages. Most of them are below 0.5ms. For example I can see one GetCookieString message that took 0.11ms. That's not hard to believe because the parent process looks completely idle.

Flags: needinfo?(mstange)

(Away)

Comment 6

•

8 years ago

You're right -- I played around with the addon when I first installed it, and later forgot that I had set my sampling frequency to 0.5ms. Is there a recommended value that I should use for reporting profiles in bugs?

Markus Stange [:mstange]

Comment 7

•

8 years ago

Usually 1ms, but since we apparently don't manage to go lower than 2ms on Windows, I recommend 2ms on Windows. If the intended interval is close to the actual interval, the numbers on perf.html are more trustworthy and the graphs look better. (In the thread stack graphs at the top, we draw each sample's bar with the width of the intended interval. In the profile you posted, if you zoom in a few times (by selecting a range and clicking the magnifying glass button), you'll see large gaps between the individual sample bars. If the actual interval and the intended interval match, these bars will be mostly adjacent.)

(no longer active)

Comment 8

•

8 years ago

I filed bug 1350593 about the profiler resolution issue on Windows. I was wondering why those lines look like that... David, can you please re-profile? Thanks!

(Away)

Comment 9

•

8 years ago

Oh sorry, I thought I had already posted a new one. Here's one at 2ms: https://perfht.ml/2nX5UXc The ads took forever to load, maybe bad hotel wifi?

(Away)

Comment 10

•

8 years ago

And that one seems to have a problem with symbols. Third time's the charm? https://perfht.ml/2o6UUmG

(no longer active)

Comment 11

•

8 years ago

The reflow times may be worth looking into (no idea whether they are unusual or not.) The paint times also seem really high, but it seems like they're all in some kind of JIT code? And a lot in CrossProcessSemaphore::Wait(). Other than these, I don't see anything else too unusual.

Blocks: FastReflows

Flags: needinfo?(matt.woodrow)

Flags: needinfo?(bas)

Patrick McManus [:mcmanus]

Comment 12

•

8 years ago

maybe run a profile of esr45 to see if it can repro the huge difference?

(Away)

Comment 13

•

8 years ago

Does the profiler work on esr45?

Markus Stange [:mstange]

Comment 14

•

8 years ago

You won't get C++ stacks from the official esr45 builds so it'll not be as useful. But it'll work if you compile the esr45 source as "nightly".

Matt Woodrow (:mattwoodrow)

Comment 15

•

8 years ago

I'd be interested in the profile comparison too. CrossProcessSemaphore::Wait() should just be an alternative blocking point (we previously blocked in a sync IPDL call when sending a layers transaction, now we block for the compositor to have finished with the transaction when we try paint next). I'd expect this to just move blocking time from one location to the other, and hopefully be a shorter wait. If the semaphore time is actually worse, then we might have an actual regression where we're not releasing read locks when we should.

Flags: needinfo?(matt.woodrow)

(Away)

Comment 16

•

8 years ago

Attached image cpu usage graphs — Details

I am not particularly keen to build esr45 locally, so I took a profile with xperf. At first glance, e10s seems to be slowing us down a lot, compared to nightly without e10s.

Jonathan Watt [:jwatt]

Comment 17

•

8 years ago

Does the Gecko Profiler not work in ESR45?

(no longer active)

Comment 18

•

8 years ago

(In reply to Jonathan Watt [:jwatt] from comment #17) > Does the Gecko Profiler not work in ESR45? I wouldn't count on it.

Bas Schouten (:bas.schouten)

Comment 19

•

8 years ago

(In reply to :Ehsan Akhgari (busy) from comment #11) > The reflow times may be worth looking into (no idea whether they are unusual > or not.) The paint times also seem really high, but it seems like they're > all in some kind of JIT code? And a lot in CrossProcessSemaphore::Wait(). > > Other than these, I don't see anything else too unusual. During the one particularly long rasterize in the profile posted by David. The paint times just seem to be in D3D/D2D code, but nothing that makes it particularly obvious what kind of work is being done. From what I can tell it seems like the GPU was busy during that draw. CrossProcessSemaphore::Wait() can happen when the GPU is busy as well. What I've seen for higher CPU usage when the GPU is busy, is related to the mQuery spin at the end of EndFrame, I have some patches locally to reduce that code's CPU footprint. However it should be noted they don't affect user facing performance (except maybe of other applications than Firefox)

Flags: needinfo?(bas)

Naveed Ihsanullah [:naveed]

Updated

•

7 years ago

Whiteboard: [qf:investigate] → [qf:p3]

Firefox Bug Husbandry Bot

Updated

•

7 years ago

Keywords: perf

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

3 years ago

Performance Impact: --- → P3

Whiteboard: [qf:p3]

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

Mayank Bansal

Comment 20

•

2 months ago

This is what I get today : https://share.firefox.dev/4dUEIgx
Loading felt pretty fast to me. I will close this bug, but please reopen if people can still reproduce.

Status: NEW → RESOLVED

Closed: 2 months ago

Resolution: --- → WORKSFORME