Closed
Bug 874141
Opened 11 years ago
Closed 11 years ago
25ms Windows ts_paint regression from PGO exclusion
Categories
(Firefox Build System :: General, defect)
Tracking
(firefox23 unaffected, firefox24- affected)
RESOLVED
WONTFIX
Tracking | Status | |
---|---|---|
firefox23 | --- | unaffected |
firefox24 | - | affected |
People
(Reporter: mbrubeck, Assigned: ted)
References
Details
(Keywords: perf, regression)
Bug 871712 switched PGO from opt-out to opt-in per directory on Windows. This caused a regression of around 25ms (3-4%) of all Ts benchmarks on Windows platforms. For example:
> Regression: Firefox - Ts Paint, MAX Dirty Profile - Win7 - 3.3% increase
> ------------------------------------------------------------------------
> Previous: avg 764.048 stddev 3.793 of 12 runs up to revision feccfce43b59
> New : avg 789.267 stddev 4.972 of 12 runs since revision ea767da526ff
> Change : +25.219 (3.3% / z=6.650)
> Graph : http://mzl.la/13C1ZJs
https://groups.google.com/d/topic/mozilla.dev.tree-management/Dr_ZXqVKzqw/discussion
(From that thread it looks like Kraken, SVG Opacity, and TResize might be regressed by the same change, though I haven't looked closer at those yet. The statistics on inbound were confused by several "PGO" builds right before the change that apparently were not PGOed successfully; see bug 874139.)
Reporter | ||
Comment 1•11 years ago
|
||
Ted suggested on IRC that we might be able to improve this by tweaking the threshold for the analysis from bug 871712.
Reporter | ||
Updated•11 years ago
|
Summary: 15-20ms Windows ts_paint regression from PGO exclusion → 25ms Windows ts_paint regression from PGO exclusion
Comment 2•11 years ago
|
||
Ted, can you please see if you can make this go away by adjusting the threshold?
Assignee: nobody → ted
Assignee | ||
Comment 3•11 years ago
|
||
I'm in SF this week, so I won't have time to look at this until next week, but yes, I can test.
Comment 4•11 years ago
|
||
(In reply to comment #3)
> I'm in SF this week, so I won't have time to look at this until next week, but
> yes, I can test.
Sounds good. Thanks!
Updated•11 years ago
|
Assignee | ||
Comment 6•11 years ago
|
||
Sorry, I forgot about this. I halved the threshold to 50,000 and pushed to try:
https://tbpl.mozilla.org/?tree=Try&rev=ff8e825786d9
Flags: needinfo?(ted)
Assignee | ||
Comment 7•11 years ago
|
||
There are a bunch of results on Try, but I have a hard time reading the tea leaves. Matt, Ehsan: can one of you see if this made a difference?
Flags: needinfo?(mbrubeck)
Flags: needinfo?(ehsan)
Reporter | ||
Comment 8•11 years ago
|
||
Here's the parent push for comparison:
https://tbpl.mozilla.org/?rev=81b227f1a522&jobname=win.*pgo.talos
Here are raw numbers (before -> after this patch) for the Ts tests. Overall, I don't see a significant improvement, though the "dirtypaint" tests are slightly improved (based on this very small sample):
Win XP:
ts_paint: 443.58 -> 445.53
tspaint_places_generated_med: 454.95 -> 447.84
tspaint_places_generated_max: 446.47 -> 448.16
Win 7:
ts_paint: 681.68 -> 693.58
tspaint_places_generated_med: 690.53 -> 662.53
tspaint_places_generated_max: 676.63 -> 657.37
Win 8:
ts_paint: 635.32 -> 635.32
tspaint_places_generated_med: 641.58 -> 638.63
tspaint_places_generated_max: 639.95 -> 635.42
linker_max_vsize: 3276537856 -> 3346546688 (+2.1%)
Also note that all these tests are on the new iX slaves, so they can't be directly compared to numbers from comment 0 which ran on the old Talos hardware.
Flags: needinfo?(mbrubeck)
Comment 9•11 years ago
|
||
Also, <http://perf.snarkfest.net/compare-talos/index.html?oldRevs=81b227f1a522&newRev=ff8e825786d9&submit=true> is not screaming with sadness
Flags: needinfo?(ehsan)
Assignee | ||
Comment 10•11 years ago
|
||
So uh, do either of you have a verdict here? Should we try to land this patch?
Reporter | ||
Comment 11•11 years ago
|
||
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #9)
> Also,
> <http://perf.snarkfest.net/compare-talos/index.
> html?oldRevs=81b227f1a522&newRev=ff8e825786d9&submit=true> is not screaming
> with sadness
Unfortunately, that link is comparing PGO results from Try with both PGO and non-PGO results from mozilla-central because of this compare-talos bug:
https://bitbucket.org/mconnor/compare-talos/issue/13
I triggered some more runs of the "other" and "dirtypaint" suites and will do a manual comparison with slightly more data when they finish.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #10)
> So uh, do either of you have a verdict here? Should we try to land this
> patch?
The perf win is small or nil - it definitely does not appear to restore most of the startup perf we lost when bug 871712 landed. Meanwhile, it adds back 67 MB of the 431 MB linker vsize that we shaved off. So I'd say it's mostly a wash, depending on how much margin we think we need for the linker memory thing.
One nice thing about bug 871712 is that it slowed the growth in linker vsize to near zero, so if the perf wins are real then maybe we can land this and take the vsize hit without dooming our future selves.
Comment 12•11 years ago
|
||
(In reply to comment #11)
> One nice thing about bug 871712 is that it slowed the growth in linker vsize to
> near zero, so if the perf wins are real then maybe we can land this and take
> the vsize hit without dooming our future selves.
Yes, and we can always go back and revert it when we get close to the cap again.
Assignee | ||
Comment 13•11 years ago
|
||
I thought about this a little bit and realized what might be at fault. If you read my ramblings about all the data munging I had to do to get the "instructions per directory" counts out of the pgomgr data, you'd find that I had to discard about 25% of the data from pgomgr because I couldn't uniquely map function names back to object files. It is entirely possible that the regression here is in some code that is being executed in our profiling run but can't be properly mapped back to its object file and thus doesn't have PGO enabled.
Comment 14•11 years ago
|
||
(In reply to comment #13)
> I thought about this a little bit and realized what might be at fault. If you
> read my ramblings about all the data munging I had to do to get the
> "instructions per directory" counts out of the pgomgr data, you'd find that I
> had to discard about 25% of the data from pgomgr because I couldn't uniquely
> map function names back to object files. It is entirely possible that the
> regression here is in some code that is being executed in our profiling run but
> can't be properly mapped back to its object file and thus doesn't have PGO
> enabled.
Hmm, do you have any ideas on how we could do a better job at the mapping?
Assignee | ||
Comment 15•11 years ago
|
||
I don't. :-(
Assignee | ||
Comment 16•11 years ago
|
||
I keep getting release tracking emails about this, but I don't know that we ever reached consensus. Ehsan, Matt: do either of you know what we should do here?
Flags: needinfo?(mbrubeck)
Flags: needinfo?(ehsan)
Reporter | ||
Comment 17•11 years ago
|
||
I think maybe we should just take the hit for now (i.e. WONTFIX this bug), since no one seems to have a feasible solution to the problem. There may be easier and more sustainable ways to win back the startup perf than further wrestling with PGO exclusions.
Flags: needinfo?(mbrubeck)
Comment 18•11 years ago
|
||
I agree. Our basic choices at this point are either taking this regression or being unable to release our product to users. I'll go for the first one.
Should I write to dev.platform about this?
Status: NEW → RESOLVED
Closed: 11 years ago
Flags: needinfo?(ehsan)
Resolution: --- → WONTFIX
Assignee | ||
Comment 19•11 years ago
|
||
Doesn't seem incredibly important, but up to you. Thanks guys!
Assignee | ||
Comment 20•11 years ago
|
||
Release drivers: I'm going to tracking- this bug since we've decided to WONTFIX it (see comment 17, comment 18).
Updated•7 years ago
|
Product: Core → Firefox Build System
You need to log in
before you can comment on or make changes to this bug.
Description
•