Closed Bug 1029968 Opened 10 years ago Closed 10 years ago

June 19 regression in all windows PGO talos performance numbers

Categories

(Core :: Graphics: Layers, defect)

x86
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla33
Tracking Status
firefox33 - ---

People

(Reporter: dbaron, Unassigned)

References

Details

(Keywords: perf, Whiteboard: [talos_regression])

All (or at least many) Windows performance numbers on Talos regressed substantially on mozilla-inbound on June 19, in this range:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=bff872c9d4b2&tochange=e589c195f61d
The regression later appeared on mozilla-central and fx-team in what I believe although haven't yet checked are corresponding ranges:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=79e69d064957&tochange=bdac18bd6c74
https://hg.mozilla.org/integration/fx-team/pushloghtml?fromchange=3a4d57044461&tochange=36efd6ffbcd0
See example graphs at:
http://mzl.la/1kWKQT9 (Tp5 Optimized WINNT 6.1)
http://mzl.la/1kWLhga (Paint WINNT 5.1)
but there are many more.
Given that it's Windows-specific, I think bug 1027365 seems the most likely in that range at first glance.
Flags: needinfo?(nical.bugzilla)
(In reply to David Baron [:dbaron] (UTC-7) (needinfo? for questions) from comment #0)
> The regression later appeared on mozilla-central and fx-team in what I
> believe although haven't yet checked are corresponding ranges:
> https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=79e69d064957&tochange=bdac18bd6c74
> https://hg.mozilla.org/integration/fx-team/pushloghtml?fromchange=3a4d57044461&tochange=36efd6ffbcd0
Actually, the ranges don't correspond. Maybe we changed our infrastructure?
So, let's just focus on the graphs for those two tests (which I picked sort of at random, although partly because they were larger numbers and probably important tests):
The graphs for mozilla-central:
http://mzl.la/1q60K5F (Tp5 Optimized WINNT 6.1)
http://mzl.la/1q61eZv (Paint WINNT 5.1)
The graphs for fx-team:
http://mzl.la/1nzcV4u (Tp5 Optimized WINNT 6.1)
http://mzl.la/1nzagI3 (Paint WINNT 5.1)
Do we run non-PGO versions of these tests? Maybe we just hit some PGO cliff?
Flags: needinfo?(ehsan)
(In reply to David Baron [:dbaron] (UTC-7) (needinfo? for questions) from comment #0)
> Given that it's Windows-specific, I think bug 1027365 seems the most likely
> in that range at first glance.
Bug 1027365 only simplified the prefs around enabling async-video and did not affect Windows (async-video was already enabled without e10s and disabled with e10s). The only change is that async-video is now enabled by default on Linux+emulator and Mac+emulator.
Flags: needinfo?(nical.bugzilla)
Thanks for filing this! The error seems to be PGO-specific; I have kicked off a few PGO builds to fill in the holes. This might take a bit of magic and luck. I will be on PTO later today and tomorrow, so if we don't figure this out by Friday, I will work on it more. Here is a narrow view of the tbpl ranges to work with:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=Windows%20XP%2032-bit%20mozilla-inbound%20pgo%20talos&fromchange=f57cf85fd128&tochange=fefe4c4ffe93
Whiteboard: [talos_regression]
(In reply to David Baron [:dbaron] (UTC-7) (needinfo? for questions) from comment #2)
> So, let's just focus on the graphs for those two tests (which I picked sort
> of at random, although partly because they were larger numbers and probably
> important tests):
>
> The graphs for mozilla-central:
> http://mzl.la/1q60K5F (Tp5 Optimized WINNT 6.1)
> http://mzl.la/1q61eZv (Paint WINNT 5.1)
>
> The graphs for fx-team:
> http://mzl.la/1nzcV4u (Tp5 Optimized WINNT 6.1)
> http://mzl.la/1nzagI3 (Paint WINNT 5.1)
>
> Do we run non-PGO versions of these tests? Maybe we just hit some PGO cliff?
I'm pretty sure we run both PGO and non-PGO versions of these tests (according to TBPL) but the last time I looked at this stuff was a while ago, not sure if I can provide any meaningful info here...
Flags: needinfo?(ehsan)
(In reply to David Baron [:dbaron] (UTC-7) (needinfo? for questions) from comment #2)
> Do we run non-PGO versions of these tests?
Yes, the branch name has the *-Non-PGO suffix on graph server:
Same graphs for mozilla-central Non-PGO (branch 94):
http://mzl.la/1pkNcng (Tp5 Optimized WINNT 6.1)
http://mzl.la/UKM1jd (Paint WINNT 5.1)
Summary: June 19 regression in all windows talos performance numbers → June 19 regression in all windows PGO talos performance numbers
Nominating for tracking Firefox 33 given that this appears to be a 20%-70% performance regression across our primary performance tests on Windows.
regressions:
Windows 7:
* 10% tresize: 21.25 -> 23.75
* 16% kraken: 1650 -> 1912
* 23% dromaeo_css: 4875 -> 3750
* 30% dromaeo_dom: 1250 -> 885
* 15% session_restore: 1317 -> 1515
* 355% a11y: 165 -> 583
* 45% tpaint: 140 -> 203
* 10% ts_paint: 750 -> 823
* 15% sessionrestore_no_auto_restore: 1310 -> 1514
* 3.5% tscrollx: 3.37 -> 3.49
* 30% tsvgr_opacity: 220 -> 286
* 29% tart: 6.85 -> 8.85
* 45% cart: 43.5 -> 64
* 40% tsvgx: 212 -> 297
* 75% tp5o: 207 -> 364
* 255% tp5o_responsiveness: 38 -> 97
Windows XP:
* 27% tresize: 10.3 -> 13.1
* 1% canvasmark: 6780-7000 -> 6710-6750 (lower is worse), possible noise levels
* 19% session_restore: 1080 -> 1285
* 280% a11y: 170 -> 470
* 50% tpaint: 126 -> 193
* 10% ts_paint: 615 -> 675
* 18% sessionrestore_no_auto_restore: 1090 -> 1275
* 9% tscrollx: 2.4 -> 2.6, 3.1 -> 3.26 # this is bimodal and it shifted
* 51% tsvgr_opacity: 352 -> 537
* 40% tart: 4.5 -> 6.45
* 38% cart: 40.25 -> 55.5
* 14% tsvgx: 484 -> 552
* 75% tp5o: 189 -> 333
* 300% tp5o_responsiveness: 29 -> 82
Windows 8 is similar.
The offending push is:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=e589c195f61d
You can see the retriggers I did here:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=mozilla-inbound%20pgo%20talos&fromchange=70f19803d1ba&tochange=5f1041f40876
:jandem, can you take this bug and fix this?
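For readers who want to sanity-check the percentages in the list above, here is a minimal sketch in Python. It is not how Talos or the graph server computes alerts; the helper name `percent_change` and the handful of Windows 7 pairs chosen are assumptions made for illustration. It prints both the relative change and the after/before ratio, because the largest figures quoted (a11y, tp5o_responsiveness) appear to be closer to the ratio than to the relative increase.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent (positive = value went up)."""
    return (after - before) / before * 100.0


# A few Windows 7 (before, after) pairs copied from the list above.
# Hypothetical selection, for illustration only.
samples = {
    "kraken": (1650, 1912),
    "tpaint": (140, 203),
    "tp5o": (207, 364),
    "a11y": (165, 583),
    "tp5o_responsiveness": (38, 97),
}

for name, (before, after) in samples.items():
    change = percent_change(before, after)
    ratio = after / before * 100.0  # "after" expressed as a percentage of "before"
    print(f"{name}: {change:+.1f}% change, {ratio:.0f}% of baseline")
```

Note that for tests where higher is better (the dromaeo tests, for example, dropped from 4875 to 3750), a negative change is the regression.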
Flags: needinfo?(jdemooij)
Yes, this seems to be some kind of PGO compiler issue; there's no way my patches can regress performance like this, and the non-PGO builds confirm this. See also the discussion in bug 1030706. I'll post more info in this bug tomorrow or early next week.
thanks :jandem! We can hack next week to figure out the PGO issue. Since we ship PGO this actually has a real impact on what we ship.
here is a list of all the regressions and 1 improvement as seen on mozilla.dev.tree-management: http://54.215.155.53:8080/alerts.html?rev=e589c195f61d&showAll=1
See Also: → 1030706
jmaher, is it possible this regression was fixed yesterday? Several other tests that were affected by the PGO regression seem to be fixed and I see some improvement mails on dev-tree-management. It's scary because a pretty minor string patch introduced it and another pretty small string patch "fixed" it, let's hope it stays this way.
Flags: needinfo?(jmaher)
oh, things look better. Strings are scary things for Firefox! Thanks for fixing this and following up!
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(jmaher)
Resolution: --- → FIXED
(In reply to Joel Maher (:jmaher) from comment #13)
> oh, things look better. Strings are scary things for Firefox! Thanks for
> fixing this and following up!
To be clear, I didn't fix it intentionally. It looks like another, unrelated string patch somehow "fixed" the MSVC PGO bug... Scary because it may come back when we land another patch...
Flags: needinfo?(jdemooij)
Target Milestone: --- → mozilla33