874141 - 25ms Windows ts_paint regression from PGO exclusion

Reporter

Description

•

11 years ago

Bug 871712 switched PGO from opt-out to opt-in per directory on Windows.  This caused a regression of around 25ms (3-4%) of all Ts benchmarks on Windows platforms.  For example:

> Regression: Firefox - Ts Paint, MAX Dirty Profile - Win7 - 3.3% increase
> ------------------------------------------------------------------------
> Previous: avg 764.048 stddev 3.793 of 12 runs up to revision feccfce43b59
> New     : avg 789.267 stddev 4.972 of 12 runs since revision ea767da526ff
> Change  : +25.219 (3.3% / z=6.650)
> Graph   : http://mzl.la/13C1ZJs

https://groups.google.com/d/topic/mozilla.dev.tree-management/Dr_ZXqVKzqw/discussion

(From that thread it looks like Kraken, SVG Opacity, and TResize might be regressed by the same change, though I haven't looked closer at those yet.  The statistics on inbound were confused by several "PGO" builds right before the change that apparently were not PGOed successfully; see bug 874139.)

Matt Brubeck (:mbrubeck)

Reporter

Comment 1

•

11 years ago

Ted suggested on IRC that we might be able to improve this by tweaking the threshold for the analysis from bug 871712.

Matt Brubeck (:mbrubeck)

Reporter

Updated

•

11 years ago

Summary: 15-20ms Windows ts_paint regression from PGO exclusion → 25ms Windows ts_paint regression from PGO exclusion

(no longer active)

Comment 2

•

11 years ago

Ted, can you please see if you can make this go away by adjusting the threshold?

Assignee: nobody → ted

(not currently active) Ted Mielczarek

Assignee

Comment 3

•

11 years ago

I'm in SF this week, so I won't have time to look at this until next week, but yes, I can test.

(no longer active)

Comment 4

•

11 years ago

(In reply to comment #3)
> I'm in SF this week, so I won't have time to look at this until next week, but
> yes, I can test.

Sounds good.  Thanks!

bhavana bajaj [:bajaj]

Updated

•

11 years ago

tracking-firefox24: ? → +

(no longer active)

Comment 5

•

11 years ago

Ted, ping?

Flags: needinfo?(ted)

(not currently active) Ted Mielczarek

Assignee

Comment 6

•

11 years ago

Sorry, I forgot about this. I halved the threshold to 50,000 and pushed to try:
https://tbpl.mozilla.org/?tree=Try&rev=ff8e825786d9

Flags: needinfo?(ted)

(not currently active) Ted Mielczarek

Assignee

Comment 7

•

11 years ago

There are a bunch of results on Try, but I have a hard time reading the tea leaves. Matt, Ehsan: can one of you see if this made a difference?

Flags: needinfo?(mbrubeck)

Flags: needinfo?(ehsan)

Matt Brubeck (:mbrubeck)

Reporter

Comment 8

•

11 years ago

Here's the parent push for comparison:
https://tbpl.mozilla.org/?rev=81b227f1a522&jobname=win.*pgo.talos

Here are raw numbers (before -> after this patch) for the Ts tests.  Overall, I don't see a significant improvement, though the "dirtypaint" tests are slightly improved (based on this very small sample):

Win XP:
ts_paint: 443.58 -> 445.53
tspaint_places_generated_med: 454.95 -> 447.84
tspaint_places_generated_max: 446.47 -> 448.16

Win 7:
ts_paint: 681.68 -> 693.58
tspaint_places_generated_med: 690.53 -> 662.53
tspaint_places_generated_max: 676.63 -> 657.37

Win 8:
ts_paint: 635.32 -> 635.32
tspaint_places_generated_med: 641.58 -> 638.63
tspaint_places_generated_max: 639.95 -> 635.42

linker_max_vsize: 3276537856 ->  3346546688 (+2.1%)

Also note that all these tests are on the new iX slaves, so they can't be directly compared to numbers from comment 0 which ran on the old Talos hardware.

Flags: needinfo?(mbrubeck)

(no longer active)

Comment 9

•

11 years ago

Also, <http://perf.snarkfest.net/compare-talos/index.html?oldRevs=81b227f1a522&newRev=ff8e825786d9&submit=true> is not screaming with sadness

Flags: needinfo?(ehsan)

(not currently active) Ted Mielczarek

Assignee

Comment 10

•

11 years ago

So uh, do either of you have a verdict here? Should we try to land this patch?

Matt Brubeck (:mbrubeck)

Reporter

Comment 11

•

11 years ago

(In reply to :Ehsan Akhgari (needinfo? me!) from comment #9)
> Also,
> <http://perf.snarkfest.net/compare-talos/index.
> html?oldRevs=81b227f1a522&newRev=ff8e825786d9&submit=true> is not screaming
> with sadness

Unfortunately, that link is comparing PGO results from Try with both PGO and non-PGO results from mozilla-central because of this compare-talos bug:
https://bitbucket.org/mconnor/compare-talos/issue/13

I triggered some more runs of the "other" and "dirtypaint" suites and will do a manual comparison with slightly more data when they finish.

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #10)
> So uh, do either of you have a verdict here? Should we try to land this
> patch?

The perf win is small or nil - it definitely does not appear to restore most of the startup perf we lost when bug 871712 landed.  Meanwhile, it adds back 67 MB of the 431 MB linker vsize that we shaved off.  So I'd say it's mostly a wash, depending on how much margin we think we need for the linker memory thing.

One nice thing about bug 871712 is that it slowed the growth in linker vsize to near zero, so if the perf wins are real then maybe we can land this and take the vsize hit without dooming our future selves.

(no longer active)

Comment 12

•

11 years ago

(In reply to comment #11)
> One nice thing about bug 871712 is that it slowed the growth in linker vsize to
> near zero, so if the perf wins are real then maybe we can land this and take
> the vsize hit without dooming our future selves.

Yes, and we can always go back and revert it when we get close to the cap again.

(not currently active) Ted Mielczarek

Assignee

Comment 13

•

11 years ago

I thought about this a little bit and realized what might be at fault. If you read my ramblings about all the data munging I had to do to get the "instructions per directory" counts out of the pgomgr data, you'd find that I had to discard about 25% of the data from pgomgr because I couldn't uniquely map function names back to object files. It is entirely possible that the regression here is in some code that is being executed in our profiling run but can't be properly mapped back to its object file and thus doesn't have PGO enabled.

(no longer active)

Comment 14

•

11 years ago

(In reply to comment #13)
> I thought about this a little bit and realized what might be at fault. If you
> read my ramblings about all the data munging I had to do to get the
> "instructions per directory" counts out of the pgomgr data, you'd find that I
> had to discard about 25% of the data from pgomgr because I couldn't uniquely
> map function names back to object files. It is entirely possible that the
> regression here is in some code that is being executed in our profiling run but
> can't be properly mapped back to its object file and thus doesn't have PGO
> enabled.

Hmm, do you have any ideas on how we could do a better job at the mapping?

(not currently active) Ted Mielczarek

Assignee

Comment 15

•

11 years ago

I don't. :-(

(not currently active) Ted Mielczarek

Assignee

Comment 16

•

11 years ago

I keep getting release tracking emails about this, but I don't know that we ever reached consensus. Ehsan, Matt: do either of you know what we should do here?

Flags: needinfo?(mbrubeck)

Flags: needinfo?(ehsan)

Matt Brubeck (:mbrubeck)

Reporter

Comment 17

•

11 years ago

I think maybe we should just take the hit for now (i.e. WONTFIX this bug), since no one seems to have a feasible solution to the problem.  There may be easier and more sustainable ways to win back the startup perf than further wrestling with PGO exclusions.

Flags: needinfo?(mbrubeck)

(no longer active)

Comment 18

•

11 years ago

I agree.  Our basic choices at this point are either taking this regression or being unable to release our product to users.  I'll go for the first one.

Should I write to dev.platform about this?

Status: NEW → RESOLVED

Closed: 11 years ago

Flags: needinfo?(ehsan)

Resolution: --- → WONTFIX

(not currently active) Ted Mielczarek

Assignee

Comment 19

•

11 years ago

Doesn't seem incredibly important, but up to you. Thanks guys!

(not currently active) Ted Mielczarek

Assignee

Comment 20

•

11 years ago

Release drivers: I'm going to tracking- this bug since we've decided to WONTFIX it (see comment 17, comment 18).

tracking-firefox24: + → -

BMO Automation

Updated

•

6 years ago

Product: Core → Firefox Build System

Bugzilla

Quick Search

25ms Windows ts_paint regression from PGO exclusion

Categories

(Firefox Build System :: General, defect)

Tracking

(firefox23 unaffected, firefox24- affected)

People

(Reporter: mbrubeck, Assigned: ted)

References

Details

(Keywords: perf, regression)

Crash Data

Security

(public)

User Story

Description

Comment 1

Updated

Comment 2

Comment 3

Comment 4

Updated

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Comment 13

Comment 14

Comment 15

Comment 16

Comment 17

Comment 18

Comment 19

Comment 20

Updated