Closed Bug 658799 Opened 13 years ago Closed 8 years ago

8.3% Tp4 CPU% regression on Windows

Categories

(Core :: Networking: HTTP, defect)

x86
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED INCOMPLETE
Tracking Status
firefox6 - ---

People

(Reporter: dao, Unassigned)

References

Details

(Whiteboard: nominated at filing)

Regression :( Tp4 (%CPU) increase 8.32% on XP Firefox
-----------------------------------------------------
    Previous: avg 52.138 stddev 0.987 of 30 runs up to revision f594c196fac7
    New     : avg 56.477 stddev 0.446 of 5 runs since revision f1d79c22fd71
    Change  : +4.339 (8.32% / z=4.398)
    Graph   : http://mzl.la/ljrnVR

Changeset range: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f594c196fac7&tochange=f1d79c22fd71

Regression :( Tp4 (%CPU) increase 8.27% on Win7 Firefox
-------------------------------------------------------
    Previous: avg 60.215 stddev 1.474 of 30 runs up to revision f594c196fac7
    New     : avg 65.194 stddev 0.390 of 5 runs since revision f1d79c22fd71
    Change  : +4.979 (8.27% / z=3.378)
    Graph   : http://mzl.la/lduVoO

Changeset range: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f594c196fac7&tochange=f1d79c22fd71
Depends on: 658580
quick summary - this is a platform regression in cpu (see 658580), not in time. 

backing out just the patch that seems responsible is not an option because the patch is a syn retry crash workaround of an ssl bug. So we'd be back to crashes.

Backing out the whole feature is not, imo, a good idea because in doing so you would sacrifice improvements in time (both measured and largely uncaptured by our tests because they don't induce latency) which are far more important. in this case the reason cpu doesn't translate into time is the added cpu is no doubt being used during ssl connection establishment, which is generally an idle time.

we can of course do better - thus 658580. My opinion is live with it because it has no practical cost in time and get it improved for the next train.
(In reply to comment #1)
> quick summary - this is a platform regression in cpu (see 658580), not in
> time. 
> 
> backing out just the patch that seems responsible is not an option because
> the patch is a syn retry crash workaround of an ssl bug. So we'd be back to
> crashes.

Crashes we have in Firefox 4 and 5? Are these topcrashes? If not, increasing the CPU usage by 8% for Windows users doesn't sound like the right tradeoff.

Is bug 658580 going to be ready for Firefox 6?
(In reply to comment #2)

> Crashes we have in Firefox 4 and 5? Are these topcrashes? 

perhaps I wasn't clear or perhaps we disagree (or perhaps both!).

for clarity - the patch alone cannot be backed out because it will make m-c resume crashing. It's not a good option. The entire feature (which is not on 4 or 5) including this patch could be backed out. Doing so would give up improvements in time, which are currently being had at the expense of increased CPU in this particular test. Personally, I think that's an ok tradeoff.


> Is bug 658580 going to be ready for Firefox 6?

unlikely to be resolved in the next 72 hrs.
(In reply to comment #3)
> for clarity - the patch alone cannot be backed out because it will make m-c
> resume crashing. It's not a good option.

I'm not saying it's a good option, but neither is the extra CPU load.

Since the crash wouldn't affect 4 or 5, the interesting question is: Is it a hypothetical problem that was only hit in unrealistic tests? Did it affect nightly users, did it show up in crash stats?

> The entire feature (which is not on
> 4 or 5) including this patch could be backed out. Doing so would give up
> improvements in time, which are currently being had at the expense of
> increased CPU in this particular test. Personally, I think that's an ok
> tradeoff.

How big is the time improvement? (Did talos reflect that as well?)

> > Is bug 658580 going to be ready for Firefox 6?
> 
> unlikely to be resolved in the next 72 hrs.

How about the next few weeks while Firefox 6 is on the aurora branch?
(In reply to comment #4)
> (In reply to comment #3)
> > for clarity - the patch alone cannot be backed out because it will make m-c
> > resume crashing. It's not a good option.
> 
> I'm not saying it's a good option, but neither is the extra CPU load.
> 
> Since the crash wouldn't affect 4 or 5, the interesting question is:

> Is it a
> hypothetical problem that was only hit in unrealistic tests? Did it affect
> nightly users, did it show up in crash stats?

It showed up as an intermittent orange about a dozen times in 20 days on mozilla-central. It was also reported as a failure for "Topsite Crashtests for Trunk", though it seems clear that the triggering factor in a connection failure can just as (or more) likely be the way the test harness is running as it is the content - so in that sense 'topsite' is not obviously relevant. Dbaron ran into it in a nightly. 

So, it's not obscure but you are right that it certainly doesn't happen so often as to make the browser useless - we all run nightlies.

I don't have any information one way or another about general nightly users.

> How big is the time improvement? (Did talos reflect that as well?)

I've got a mail from ted showing 3.9% on win7 on tp4 as part of it. And, I've I said, tp4 won't capture the real good wins because they scale with network latency and those are the ones I am more interested in. I'm not sure if I mentioned nick hurley is building something to try and be able to quantify this kind of thing objectively across different sets of conditions.

> How about the next few weeks while Firefox 6 is on the aurora branch?

it's certainly plausible. But there is too little information for me to say with any certainty. I think we could have a good answer to that question within perhaps a fortnight maybe even with a fix depending on the underlying issue. Maybe that is a reasonable way to track this.
Whiteboard: nominated at filing
not going to track but we'd consider a patch for Aurora (you've got 5 weeks left.)
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.