5% dromaeo_css talos regression on windows from bug 563878

RESOLVED WONTFIX

Status

()

Core
Layout
RESOLVED WONTFIX
8 years ago
8 years ago

People

(Reporter: tnikkel, Assigned: tnikkel)

Tracking

Trunk
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Assignee)

Description

8 years ago
See bug 563878 comment 73 and on.
(Assignee)

Comment 1

8 years ago
After much trying I could not reproduce any slowdown on Linux.

Using opt builds (but not PGO) on Windows 7 I could also not reproduce any slowdown. Numbers are at http://dromaeo.com/?id=110620,110630 where the left column is before my push, and the right column includes all the changesets of my push.
(Assignee)

Comment 2

8 years ago
I believe PGO requires Visual Studio professional, which I do not have. So I will be doing a bunch of try server pushes to try to bisect this in my changesets.
(Assignee)

Comment 3

8 years ago
I didn't do a push to try server with a changeset before my push to "calibrate" and make sure the numbers correspond to m-c numbers, but it appears they do, and it seems that the first changeset in my push, http://hg.mozilla.org/mozilla-central/rev/9c9ee87d42fa , is causing the regression.
That's .... really weird.  Are you hitting the new cross-doc code much during those dromaeo tests?
And is this a windows-pgo-only regression?
(Assignee)

Comment 6

8 years ago
(In reply to comment #5)
> And is this a windows-pgo-only regression?

The only place I've been able to reproduce is on tinderbox with windows. I've tried opt builds on linux and windows and failed to reproduce. Whether it is PGO or something else about the machines, configurations, setup, or how I'm running the tests I can't say.
(Assignee)

Comment 7

8 years ago
(In reply to comment #4)
> That's .... really weird.  Are you hitting the new cross-doc code much during
> those dromaeo tests?

The only thing that uses the new cross-doc code in that changeset is nsDisplayListBuilder::ToReferenceFrame. I'm going to try undoing that.
Hmm.  I thought you linked me to a dromaeo.com url showing the slowdown...

Using the try pgo builds on your machine on dromaeo.com doesn't show the slowdown?
(Assignee)

Comment 9

8 years ago
(In reply to comment #8)
> Hmm.  I thought you linked me to a dromaeo.com url showing the slowdown...

After running more tests I got confusing results and rerunning the test on changesets I'd already tried I got wildly different numbers. After I'd run enough times it looked like there was no slowdown with some outliers.

After that I tried on windows 7 on the same machine and there was no slowdown.

> Using the try pgo builds on your machine on dromaeo.com doesn't show the
> slowdown?

I will try that.
(Assignee)

Comment 10

8 years ago
(In reply to comment #8)
> Using the try pgo builds on your machine on dromaeo.com doesn't show the
> slowdown?

It does show the slowdown.
So it's entirely possible that there's some sort of weird pgo effect here, btw....  :(
(Assignee)

Comment 12

8 years ago
I've been trying many different permutations of the regressing changeset on try server, so far nothing gets the performance back.
(Assignee)

Comment 13

8 years ago
Looks like the culprit is adding the function nsPoint::ConvertAppUnits.
(Assignee)

Comment 14

8 years ago
Created attachment 460079 [details] [diff] [review]
patch

This restores our performance on this test on try server.
Assignee: nobody → tnikkel
Attachment #460079 - Flags: review?(bzbarsky)
Comment on attachment 460079 [details] [diff] [review]
patch

r=me, but that's _really_ weird....  Makes me wish we had better benchmarks for some of the stuff I've been profiling on Mac, since the behavior may be so different on Windows.  :(
(Assignee)

Comment 16

8 years ago
ConvertAppUnits is the only function in nsPoint that has a branch, so maybe that is why it affects PGO somehow.
(Assignee)

Comment 17

8 years ago
Landing this on m-c didn't have the desired affect of returning those numbers to what they used to be. My try server pushes were on top of the regressing changeset, so confounding factors could happen in any changeset after that one. Back to doing random try server pushes I guess.
(Assignee)

Comment 18

8 years ago
The landing was
http://hg.mozilla.org/mozilla-central/rev/ba9be0418b15

Backed it out because it didn't work and caused non-libxul builds to fail
http://hg.mozilla.org/mozilla-central/rev/dcddffb3b329
http://hg.mozilla.org/mozilla-central/rev/b8dcd7f66c98

If I push something similar I need to remember to add NS_GFX to nsPoint to not break non-libxul builds.

Comment 19

8 years ago
May be only remove the 'inline' keyword prevents the PGO regression?
(Assignee)

Comment 20

8 years ago
(In reply to comment #19)
> May be only remove the 'inline' keyword prevents the PGO regression?

If I do that I also need to add NS_GFX to the definition of nsPoint so the build doesn't fail even earlier, and then I get errors about nsPoint::ConvertAppUnits being defined multiple times.
(Assignee)

Comment 21

8 years ago
So the first changeset for which applying the patch in this bug does not fix the perf regression is http://hg.mozilla.org/mozilla-central/rev/99ea6685c1f7

I have no idea why that is.
Ted, Benjamin, any ideas on what pgo might be smoking here?
I think there are thresholds (of size, perhaps?) that cause noticeable jumps in PGO's performance, but I don't think we should try and figure out how to fix each regression.

It would be nice if we could figure out what the input into the step function is so we can measure regressions in that input better, though.
The actual workings of PGO are a mystery to me. You can try asking Microsoft.
(Assignee)

Comment 25

8 years ago
I tried applying the changeset 99ea6685c1f7 to before my original push and that too caused the perf regression.

Unless anyone has a better idea I'm going to resolve this as wont fix and move on.
(Assignee)

Updated

8 years ago
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.