1226627 - 3.5% Win7 Canvasmark PGO regression on November 17, 2015 (v.45) from push 029ab28c46cc

Reporter

Description

•

9 years ago

This is a PGO only regression, it is not seen on non-pgo. This is important because we ship PGO! You can see a regression in canvasmark (higher is better): https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,975bd3027c1a2bc77dadf2224503b425474e2e09,1]&zoom=1447680842451.128,1447877789338.346,7698.884758364313,8960.96654275093 This comes from filling in some missing builds and doing extra retriggers: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&fromchange=ee70b9acae8a&tochange=233f67378440&filter-searchStr=Windows%207%20pgo%20Talos%20Performance%20Talos%20chrome%20T%28c%29 and we come up with a list of patches checked in at once: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=029ab28c46cc There are a lot of patches there, pgo builds take 4.5 hours, I don't mind pushing to try to narrow this down, thought I would see if someone had an idea of the culprit, or was expecting this and knows a valid reason why this is happening.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 1

•

9 years ago

:mats, can you take a brief look at this and let me know your thoughts?

Flags: needinfo?(mats)

Mats Palmgren (inactive)

Assignee

Comment 2

•

9 years ago

It seems unlikely any of my patches could affect this test. Most of them only touch CSS Grid related code. Bug 576927 is the only one I can think of that could affect performance on anything else. Does this test calculate font sizes *a lot*?

Flags: needinfo?(mats) → needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 3

•

9 years ago

I don't know a lot about this test specifically, when it runs, I don't believe it does a lot of text related stuff, it is more rendering of the canvas. :snorp, do you have knowledge if canvasmark calculates font sizes many times?

Flags: needinfo?(jmaher) → needinfo?(snorp)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 4

•

9 years ago

I have pushed to try to bisect this down: https://treeherder.mozilla.org/#/jobs?repo=try&author=jmaher@mozilla.com&fromchange=cef44a16b418&tochange=7f7e9ecd0d74 In ~8 hours we should have results, I can determine the cause tomorrow morning.

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 5

•

9 years ago

(In reply to Joel Maher (:jmaher) from comment #3) > I don't know a lot about this test specifically, when it runs, I don't > believe it does a lot of text related stuff, it is more rendering of the > canvas. > > :snorp, do you have knowledge if canvasmark calculates font sizes many times? Nope, no idea, sorry.

Flags: needinfo?(snorp)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 6

•

9 years ago

oh, try is 90-120 minutes faster for PGO windows builds- I have results and the culprit is: https://hg.mozilla.org/integration/mozilla-inbound/rev/4b328d0b448c (bug 576927) :mats, your initial guess was correct :) Here is a breakdown of the subtests of canvasmark if it helps: https://treeherder.allizom.org/perf.html#/comparesubtest?originalProject=try&originalRevision=465d422c1fcb&newProject=try&newRevision=7d396a39245c&originalSignature=fcc1e66ac3fc1b53d6b620e94eafc1e2aac54363&newSignature=fcc1e66ac3fc1b53d6b620e94eafc1e2aac54363

Blocks: 576927

Mats Palmgren (inactive)

Assignee

Comment 7

•

9 years ago

Is this the testsuite we're talking about? http://mxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/canvasmark/

Flags: needinfo?(jmaher)

Mats Palmgren (inactive)

Assignee

Comment 8

•

9 years ago

FTR, the code involved in that patch: from gfx/src/nsCoord.h: inline nscoord NSCoordSaturatingMultiply(nscoord aCoord, float aScale) { return _nscoordSaturatingMultiply(aCoord, aScale, false); } inline nscoord _nscoordSaturatingMultiply(nscoord aCoord, float aScale, bool requireNotNegative) { VERIFY_COORD(aCoord); if (requireNotNegative) { MOZ_ASSERT(aScale >= 0.0f, "negative scaling factors must be handled manually"); } #ifdef NS_COORD_IS_FLOAT return floorf(aCoord * aScale); #else float product = aCoord * aScale; if (requireNotNegative ? aCoord > 0 : (aCoord > 0) == (aScale > 0)) return NSToCoordRoundWithClamp(std::min<float>(nscoord_MAX, product)); return NSToCoordRoundWithClamp(std::max<float>(nscoord_MIN, product)); #endif } inline nscoord NSToCoordRound(double aValue) { #if defined(XP_WIN32) && defined(_M_IX86) && !defined(__GNUC__) && !defined(__clang__) return NS_lroundup30((float)aValue); #else return nscoord(floor(aValue + 0.5f)); #endif /* XP_WIN32 && _M_IX86 && !__GNUC__ */ } inline nscoord NSToCoordRoundWithClamp(float aValue) { #ifndef NS_COORD_IS_FLOAT // Bounds-check before converting out of float, to avoid overflow if (aValue >= nscoord_MAX) { return nscoord_MAX; } if (aValue <= nscoord_MIN) { return nscoord_MIN; } #endif return NSToCoordRound(aValue); } and from xpcom/ds/nsMathUtils.h: #if defined(XP_WIN32) && defined(_M_IX86) && !defined(__GNUC__) && !defined(__clang__) inline int32_t NS_lroundup30(float x) { /* Code derived from Laurent de Soras' paper at */ /* http://ldesoras.free.fr/doc/articles/rounding_en.pdf */ /* Rounding up on Windows is expensive using the float to */ /* int conversion and the floor function. A faster */ /* approach is to use f87 rounding while assuming the */ /* default rounding mode of rounding to the nearest */ /* integer. This rounding mode, however, actually rounds */ /* to the nearest integer so we add the floating point */ /* number to itself and add our rounding factor before */ /* doing the conversion to an integer. We then do a right */ /* shift of one bit on the integer to divide by two. */ /* This routine doesn't handle numbers larger in magnitude */ /* than 2^30 but this is fine for NSToCoordRound because */ /* Coords are limited to 2^30 in magnitude. */ static const double round_to_nearest = 0.5f; int i; __asm { fld x ; load fp argument fadd st, st(0) ; double it fadd round_to_nearest ; add the rounding factor fistp dword ptr i ; convert the result to int } return i >> 1; /* divide by 2 */ } #endif /* XP_WIN32 && _M_IX86 && !__GNUC__ */

Mats Palmgren (inactive)

Assignee

Comment 9

•

9 years ago

So from that I would guess that it's the rounding operation that is expensive. The original code before bug 576927 truncated the value so we could try that and see if it helps. BTW, after looking briefly at the tests I think it's the "FPS" and other stat numbers that triggers these calls. I suspect they are frequent and they pass the font by name and set it for every single drawing operation: http://mxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/canvasmark/scripts/canvasmark_v6.js#1202

Mats Palmgren (inactive)

Assignee

Comment 10

•

9 years ago

This should fix it I think: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ca371aedc56b

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 11

•

9 years ago

Thanks for finding a fix and pushing to try. Looking at the numbers they seem to be half fixed (which is great), but could be fully fixed depending on the baseline used. Overall, this seems like a win for us. As for the canvasmark code, yes- you found the correct source. That is my fault for not linking to the source and giving instructions for how to run it locally and on try. I am glad you were able to figure it out.

Flags: needinfo?(jmaher)

Mats Palmgren (inactive)

Assignee

Comment 12

•

9 years ago

Attached patch fix — Details — Splinter Review

Assignee: nobody → mats

Attachment #8690405 - Flags: review?(roc)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 13

•

9 years ago

Comment on attachment 8690405 [details] [diff] [review] fix Review of attachment 8690405 [details] [diff] [review]: ----------------------------------------------------------------- I find it hard to believe these measurably regress performance, but OK.

Attachment #8690405 - Flags: review?(roc) → review+

Mats Palmgren (inactive)

Assignee

Updated

•

9 years ago

Keywords: checkin-needed

Pulsebot

Comment 14

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/1981ff2a71fa

Keywords: checkin-needed

Mats Palmgren (inactive)

Assignee

Comment 15

•

9 years ago

I don't see any difference from this change, do you agree? https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,975bd3027c1a2bc77dadf2224503b425474e2e09,1]

Flags: needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 16

•

9 years ago

I agree, this isn't showing any effect on the data after landing for the last 24+ hours.

Flags: needinfo?(jmaher)

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 17

•

9 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/1981ff2a71fa

Status: NEW → RESOLVED

Closed: 9 years ago

status-firefox45: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla45

Mats Palmgren (inactive)

Assignee

Updated

•

9 years ago

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Mats Palmgren (inactive)

Assignee

Comment 18

•

9 years ago

I tried this: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ffb241bad8dc but I can't seem to get a useful graph for it at: https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=ffb241bad8dc Joel, can you get the results out of that Try push?

Flags: needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 19

•

9 years ago

I looked at the parent revision and found it on inbound, right now I have retriggered the test on inbound, so in 30 minutes or so we should have enough data in perfherder to see the real impact. Here is the compare view for it: https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&originalRevision=cf322a94b295&newProject=try&newRevision=ffb241bad8dc&originalSignature=fcc1e66ac3fc1b53d6b620e94eafc1e2aac54363&newSignature=fcc1e66ac3fc1b53d6b620e94eafc1e2aac54363

Flags: needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 20

•

9 years ago

data is in, from what I can tell this is a win.

Mats Palmgren (inactive)

Assignee

Comment 21

•

9 years ago

Thanks! Yes, this looks nice. I'll see if I can make the same change to the other clamping functions too...

Mats Palmgren (inactive)

Assignee

Comment 22

•

9 years ago

Attached patch fix part 2 — Details — Splinter Review

fmin[f]/fmax[f] are in C99 so I think we can use it generally without #ifdef. The numbers certainly looks impressive, but I'm not sure how much I believe in them though... https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=c321d8403851&newProject=try&newRevision=bc6178eb89f9

Attachment #8693114 - Flags: review?(roc)

William Lachance (:wlach)

Comment 23

•

9 years ago

(In reply to Mats Palmgren (:mats) from comment #22) > Created attachment 8693114 [details] [diff] [review] > fix part 2 > > fmin[f]/fmax[f] are in C99 so I think we can use it generally without #ifdef. > > The numbers certainly looks impressive, but I'm not sure how much I believe > in them though... > https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla- > central&originalRevision=c321d8403851&newProject=try&newRevision=bc6178eb89f9 As hinted at in the graphs view, you should do more retriggers on the base revision to increase confidence in your results. If you click on the graph for (e.g.) cart, you'll see that the test has bimodal behaviour: https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-central,8d47c06d6df551fe085883ad80cfe3fcb2d501f6,1]&series=[try,8d47c06d6df551fe085883ad80cfe3fcb2d501f6,1]&highlightedRevisions=c321d8403851&highlightedRevisions=bc6178eb89f9&timerange=604800

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

9 years ago

Attachment #8693114 - Flags: review?(roc) → review+

William Lachance (:wlach)

Comment 24

•

9 years ago

(In reply to William Lachance (:wlach) from comment #23) > As hinted at in the graphs view, you should do more retriggers on the base > revision to increase confidence in your results. > > If you click on the graph for (e.g.) cart, you'll see that the test has > bimodal behaviour: Actually ignore that, I just realized I misread the graphs. The improvements do seem to be real, though I did a bunch of retriggers on the base (m-c) revision to be more sure.

Pulsebot

Comment 25

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/673461c0b772

Mats Palmgren (inactive)

Assignee

Comment 26

•

9 years ago

It turns out we can't use this technique in NSToCoordRoundWithClamp because it calls NSToCoordRound which calls NS_lroundup30 on Windows and that function can't deal with values larger than (2^30)-1. http://hg.mozilla.org/mozilla-central/annotate/7883e81f3c30/xpcom/ds/nsMathUtils.h#l43 nscoord_MAX is 2^30 and that makes NSToCoordRound(float(nscoord_MAX)) return a negative value on win32. (Which was a surprise to me, I didn't know NSToCoordRound had badness like that.) So I landed without the change in NSToCoordRoundWithClamp to avoid this. The author of the paper points that out and suggests using a 64-bit int for the temp value if you need the full integer range (see the PDF link in the code). I filed a follow-up bug 1228858 to investigate that.

Blocks: 1228858

Mats Palmgren (inactive)

Assignee

Comment 27

•

9 years ago

Backed out part 2 in https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=53aa14927954 because it may have caused some Talos regressions on Android, e.g. http://mzl.la/1NXppAf

Carsten Book [:Tomcat]

Comment 28

•

9 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/673461c0b772

Status: REOPENED → RESOLVED

Closed: 9 years ago → 9 years ago

Resolution: --- → FIXED

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 29

•

9 years ago

should we reopen this as the second patch was backed out?

Mats Palmgren (inactive)

Assignee

Comment 30

•

9 years ago

(In reply to Joel Maher (:jmaher) from comment #29) > should we reopen this as the second patch was backed out? I don't think so; the tcanvasmark numbers in the past 14 days have slowly improved in periods unrelated to this bug (Nov 24 - Nov 28) and is now back where it was before: https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,975bd3027c1a2bc77dadf2224503b425474e2e09,1 The two clusters on Nov 29 are before/after part 2 landed, and it shows just a minor improvement in comparison. I'm not very confident it's significant. (I'll investigate this more in bug 1228858 and I'll run some micro-benchmarks on these *Clamp functions to get more accurate figures. When time allows...)

William Lachance (:wlach)

Updated

•

9 years ago

Comment 31

•

9 years ago

Note that the Android regressions attributed to this bug are almost certainly misfiled due to an omitted clobber. See bug 1229118 for more details.

fix 9 years ago Mats Palmgren (inactive) 1.39 KB, patch	roc : review+	Details \| Diff \| Splinter Review
fix part 2 9 years ago Mats Palmgren (inactive) 3.32 KB, patch	roc : review+	Details \| Diff \| Splinter Review