Closed Bug 579963 Opened 14 years ago Closed 14 years ago

5% dromaeo_css talos regression on windows from bug 563878

Categories

(Core :: Layout, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: tnikkel, Assigned: tnikkel)

References

Details

Attachments

(1 file)

See bug 563878 comment 73 and on.
After much trying I could not reproduce any slowdown on Linux.

Using opt builds (but not PGO) on Windows 7 I could also not reproduce any slowdown. Numbers are at http://dromaeo.com/?id=110620,110630 where the left column is before my push, and the right column includes all the changesets of my push.
I believe PGO requires Visual Studio professional, which I do not have. So I will be doing a bunch of try server pushes to try to bisect this in my changesets.
I didn't do a push to try server with a changeset before my push to "calibrate" and make sure the numbers correspond to m-c numbers, but it appears they do, and it seems that the first changeset in my push, http://hg.mozilla.org/mozilla-central/rev/9c9ee87d42fa , is causing the regression.
That's .... really weird.  Are you hitting the new cross-doc code much during those dromaeo tests?
And is this a windows-pgo-only regression?
(In reply to comment #5)
> And is this a windows-pgo-only regression?

The only place I've been able to reproduce is on tinderbox with windows. I've tried opt builds on linux and windows and failed to reproduce. Whether it is PGO or something else about the machines, configurations, setup, or how I'm running the tests I can't say.
(In reply to comment #4)
> That's .... really weird.  Are you hitting the new cross-doc code much during
> those dromaeo tests?

The only thing that uses the new cross-doc code in that changeset is nsDisplayListBuilder::ToReferenceFrame. I'm going to try undoing that.
Hmm.  I thought you linked me to a dromaeo.com url showing the slowdown...

Using the try pgo builds on your machine on dromaeo.com doesn't show the slowdown?
(In reply to comment #8)
> Hmm.  I thought you linked me to a dromaeo.com url showing the slowdown...

After running more tests I got confusing results and rerunning the test on changesets I'd already tried I got wildly different numbers. After I'd run enough times it looked like there was no slowdown with some outliers.

After that I tried on windows 7 on the same machine and there was no slowdown.

> Using the try pgo builds on your machine on dromaeo.com doesn't show the
> slowdown?

I will try that.
(In reply to comment #8)
> Using the try pgo builds on your machine on dromaeo.com doesn't show the
> slowdown?

It does show the slowdown.
So it's entirely possible that there's some sort of weird pgo effect here, btw....  :(
I've been trying many different permutations of the regressing changeset on try server, so far nothing gets the performance back.
Looks like the culprit is adding the function nsPoint::ConvertAppUnits.
Attached patch patchSplinter Review
This restores our performance on this test on try server.
Assignee: nobody → tnikkel
Attachment #460079 - Flags: review?(bzbarsky)
Comment on attachment 460079 [details] [diff] [review]
patch

r=me, but that's _really_ weird....  Makes me wish we had better benchmarks for some of the stuff I've been profiling on Mac, since the behavior may be so different on Windows.  :(
ConvertAppUnits is the only function in nsPoint that has a branch, so maybe that is why it affects PGO somehow.
Landing this on m-c didn't have the desired affect of returning those numbers to what they used to be. My try server pushes were on top of the regressing changeset, so confounding factors could happen in any changeset after that one. Back to doing random try server pushes I guess.
The landing was
http://hg.mozilla.org/mozilla-central/rev/ba9be0418b15

Backed it out because it didn't work and caused non-libxul builds to fail
http://hg.mozilla.org/mozilla-central/rev/dcddffb3b329
http://hg.mozilla.org/mozilla-central/rev/b8dcd7f66c98

If I push something similar I need to remember to add NS_GFX to nsPoint to not break non-libxul builds.
May be only remove the 'inline' keyword prevents the PGO regression?
(In reply to comment #19)
> May be only remove the 'inline' keyword prevents the PGO regression?

If I do that I also need to add NS_GFX to the definition of nsPoint so the build doesn't fail even earlier, and then I get errors about nsPoint::ConvertAppUnits being defined multiple times.
So the first changeset for which applying the patch in this bug does not fix the perf regression is http://hg.mozilla.org/mozilla-central/rev/99ea6685c1f7

I have no idea why that is.
Ted, Benjamin, any ideas on what pgo might be smoking here?
I think there are thresholds (of size, perhaps?) that cause noticeable jumps in PGO's performance, but I don't think we should try and figure out how to fix each regression.

It would be nice if we could figure out what the input into the step function is so we can measure regressions in that input better, though.
The actual workings of PGO are a mystery to me. You can try asking Microsoft.
I tried applying the changeset 99ea6685c1f7 to before my original push and that too caused the perf regression.

Unless anyone has a better idea I'm going to resolve this as wont fix and move on.
<sigh>.  Yeah, indeed.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WONTFIX
Attachment #460079 - Flags: review?(bzbarsky)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: