3.5% Win7 Canvasmark PGO regression on November 17, 2015 (v.45) from push 029ab28c46cc

RESOLVED FIXED in Firefox 45

Status

()

defect
RESOLVED FIXED
4 years ago
3 years ago

People

(Reporter: jmaher, Assigned: mats)

Tracking

(Blocks 1 bug, {perf, regression})

Trunk
mozilla45
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox45 fixed)

Details

(Whiteboard: [talos_regression])

Attachments

(2 attachments)

(Reporter)

Description

4 years ago
This is a PGO only regression, it is not seen on non-pgo.  This is important because we ship PGO!

You can see a regression in canvasmark (higher is better):
https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,975bd3027c1a2bc77dadf2224503b425474e2e09,1]&zoom=1447680842451.128,1447877789338.346,7698.884758364313,8960.96654275093

This comes from filling in some missing builds and doing extra retriggers:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&fromchange=ee70b9acae8a&tochange=233f67378440&filter-searchStr=Windows%207%20pgo%20Talos%20Performance%20Talos%20chrome%20T%28c%29

and we come up with a list of patches checked in at once:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=029ab28c46cc

There are a lot of patches there, pgo builds take 4.5 hours, I don't mind pushing to try to narrow this down, thought I would see if someone had an idea of the culprit, or was expecting this and knows a valid reason why this is happening.
(Reporter)

Comment 1

4 years ago
:mats, can you take a brief look at this and let me know your thoughts?
Flags: needinfo?(mats)
(Assignee)

Comment 2

4 years ago
It seems unlikely any of my patches could affect this test.
Most of them only touch CSS Grid related code.  Bug 576927 is the only
one I can think of that could affect performance on anything else.
Does this test calculate font sizes *a lot*?
Flags: needinfo?(mats) → needinfo?(jmaher)
(Reporter)

Comment 3

4 years ago
I don't know a lot about this test specifically, when it runs, I don't believe it does a lot of text related stuff, it is more rendering of the canvas.

:snorp, do you have knowledge if canvasmark calculates font sizes many times?
Flags: needinfo?(jmaher) → needinfo?(snorp)
(Reporter)

Comment 4

4 years ago
I have pushed to try to bisect this down:
https://treeherder.mozilla.org/#/jobs?repo=try&author=jmaher@mozilla.com&fromchange=cef44a16b418&tochange=7f7e9ecd0d74

In ~8 hours we should have results, I can determine the cause tomorrow morning.
(In reply to Joel Maher (:jmaher) from comment #3)
> I don't know a lot about this test specifically, when it runs, I don't
> believe it does a lot of text related stuff, it is more rendering of the
> canvas.
> 
> :snorp, do you have knowledge if canvasmark calculates font sizes many times?

Nope, no idea, sorry.
Flags: needinfo?(snorp)
(Assignee)

Comment 7

4 years ago
Is this the testsuite we're talking about?
http://mxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/canvasmark/
Flags: needinfo?(jmaher)
(Assignee)

Comment 8

4 years ago
FTR, the code involved in that patch:

from gfx/src/nsCoord.h:

inline nscoord NSCoordSaturatingMultiply(nscoord aCoord, float aScale) {
  return _nscoordSaturatingMultiply(aCoord, aScale, false);
}

inline nscoord _nscoordSaturatingMultiply(nscoord aCoord, float aScale,
                                          bool requireNotNegative) {
  VERIFY_COORD(aCoord);
  if (requireNotNegative) {
    MOZ_ASSERT(aScale >= 0.0f,
               "negative scaling factors must be handled manually");
  }
#ifdef NS_COORD_IS_FLOAT
  return floorf(aCoord * aScale);
#else
  float product = aCoord * aScale;
  if (requireNotNegative ? aCoord > 0 : (aCoord > 0) == (aScale > 0))
    return NSToCoordRoundWithClamp(std::min<float>(nscoord_MAX, product));
  return NSToCoordRoundWithClamp(std::max<float>(nscoord_MIN, product));
#endif
}

inline nscoord NSToCoordRound(double aValue)
{
#if defined(XP_WIN32) && defined(_M_IX86) && !defined(__GNUC__) && !defined(__clang__)
  return NS_lroundup30((float)aValue);
#else
  return nscoord(floor(aValue + 0.5f));
#endif /* XP_WIN32 && _M_IX86 && !__GNUC__ */
}

inline nscoord NSToCoordRoundWithClamp(float aValue)
{
#ifndef NS_COORD_IS_FLOAT
  // Bounds-check before converting out of float, to avoid overflow
  if (aValue >= nscoord_MAX) {
    return nscoord_MAX;
  }
  if (aValue <= nscoord_MIN) {
    return nscoord_MIN;
  }
#endif
  return NSToCoordRound(aValue);
}


and from xpcom/ds/nsMathUtils.h:

#if defined(XP_WIN32) && defined(_M_IX86) && !defined(__GNUC__) && !defined(__clang__)
inline int32_t NS_lroundup30(float x)
{
  /* Code derived from Laurent de Soras' paper at             */
  /* http://ldesoras.free.fr/doc/articles/rounding_en.pdf     */

  /* Rounding up on Windows is expensive using the float to   */
  /* int conversion and the floor function. A faster          */
  /* approach is to use f87 rounding while assuming the       */
  /* default rounding mode of rounding to the nearest         */
  /* integer. This rounding mode, however, actually rounds    */
  /* to the nearest integer so we add the floating point      */
  /* number to itself and add our rounding factor before      */
  /* doing the conversion to an integer. We then do a right   */
  /* shift of one bit on the integer to divide by two.        */

  /* This routine doesn't handle numbers larger in magnitude  */
  /* than 2^30 but this is fine for NSToCoordRound because    */
  /* Coords are limited to 2^30 in magnitude.                 */

  static const double round_to_nearest = 0.5f;
  int i;

  __asm {
    fld     x                   ; load fp argument
    fadd    st, st(0)           ; double it
    fadd    round_to_nearest    ; add the rounding factor
    fistp   dword ptr i         ; convert the result to int
  }
  return i >> 1;                /* divide by 2 */
}
#endif /* XP_WIN32 && _M_IX86 && !__GNUC__ */
(Assignee)

Comment 9

4 years ago
So from that I would guess that it's the rounding operation that is expensive.

The original code before bug 576927 truncated the value so we could try that
and see if it helps.

BTW, after looking briefly at the tests I think it's the "FPS" and other stat
numbers that triggers these calls.  I suspect they are frequent and they pass
the font by name and set it for every single drawing operation:
http://mxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/canvasmark/scripts/canvasmark_v6.js#1202
(Reporter)

Comment 11

4 years ago
Thanks for finding a fix and pushing to try.  Looking at the numbers they seem to be half fixed (which is great), but could be fully fixed depending on the baseline used.  Overall, this seems like a win for us.

As for the canvasmark code, yes- you found the correct source.  That is my fault for not linking to the source and giving instructions for how to run it locally and on try.  I am glad you were able to figure it out.
Flags: needinfo?(jmaher)
(Assignee)

Comment 12

4 years ago
Posted patch fixSplinter Review
Assignee: nobody → mats
Attachment #8690405 - Flags: review?(roc)
Comment on attachment 8690405 [details] [diff] [review]
fix

Review of attachment 8690405 [details] [diff] [review]:
-----------------------------------------------------------------

I find it hard to believe these measurably regress performance, but OK.
Attachment #8690405 - Flags: review?(roc) → review+
(Assignee)

Updated

4 years ago
Keywords: checkin-needed
(Assignee)

Comment 15

4 years ago
I don't see any difference from this change, do you agree?
https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,975bd3027c1a2bc77dadf2224503b425474e2e09,1]
Flags: needinfo?(jmaher)
(Reporter)

Comment 16

4 years ago
I agree, this isn't showing any effect on the data after landing for the last 24+ hours.
Flags: needinfo?(jmaher)

Comment 17

4 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/1981ff2a71fa
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
(Assignee)

Updated

4 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 18

3 years ago
I tried this:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ffb241bad8dc
but I can't seem to get a useful graph for it at:
https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=ffb241bad8dc

Joel, can you get the results out of that Try push?
Flags: needinfo?(jmaher)
(Reporter)

Comment 19

3 years ago
I looked at the parent revision and found it on inbound, right now I have retriggered the test on inbound, so in 30 minutes or so we should have enough data in perfherder to see the real impact.  Here is the compare view for it:
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&originalRevision=cf322a94b295&newProject=try&newRevision=ffb241bad8dc&originalSignature=fcc1e66ac3fc1b53d6b620e94eafc1e2aac54363&newSignature=fcc1e66ac3fc1b53d6b620e94eafc1e2aac54363
Flags: needinfo?(jmaher)
(Reporter)

Comment 20

3 years ago
data is in, from what I can tell this is a win.
(Assignee)

Comment 21

3 years ago
Thanks!  Yes, this looks nice.  I'll see if I can make the same change to the other
clamping functions too...
(Assignee)

Comment 22

3 years ago
Posted patch fix part 2Splinter Review
fmin[f]/fmax[f] are in C99 so I think we can use it generally without #ifdef.

The numbers certainly looks impressive, but I'm not sure how much I believe
in them though...
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=c321d8403851&newProject=try&newRevision=bc6178eb89f9
Attachment #8693114 - Flags: review?(roc)
(In reply to Mats Palmgren (:mats) from comment #22)
> Created attachment 8693114 [details] [diff] [review]
> fix part 2
> 
> fmin[f]/fmax[f] are in C99 so I think we can use it generally without #ifdef.
> 
> The numbers certainly looks impressive, but I'm not sure how much I believe
> in them though...
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-
> central&originalRevision=c321d8403851&newProject=try&newRevision=bc6178eb89f9

As hinted at in the graphs view, you should do more retriggers on the base revision to increase confidence in your results.

If you click on the graph for (e.g.) cart, you'll see that the test has bimodal behaviour:

https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-central,8d47c06d6df551fe085883ad80cfe3fcb2d501f6,1]&series=[try,8d47c06d6df551fe085883ad80cfe3fcb2d501f6,1]&highlightedRevisions=c321d8403851&highlightedRevisions=bc6178eb89f9&timerange=604800
(In reply to William Lachance (:wlach) from comment #23)

> As hinted at in the graphs view, you should do more retriggers on the base
> revision to increase confidence in your results.
> 
> If you click on the graph for (e.g.) cart, you'll see that the test has
> bimodal behaviour:

Actually ignore that, I just realized I misread the graphs. The improvements do seem to be real, though I did a bunch of retriggers on the base (m-c) revision to be more sure.
(Assignee)

Comment 26

3 years ago
It turns out we can't use this technique in NSToCoordRoundWithClamp because it
calls NSToCoordRound which calls NS_lroundup30 on Windows and that function
can't deal with values larger than (2^30)-1.
http://hg.mozilla.org/mozilla-central/annotate/7883e81f3c30/xpcom/ds/nsMathUtils.h#l43

nscoord_MAX is 2^30 and that makes NSToCoordRound(float(nscoord_MAX)) return a negative
value on win32.  (Which was a surprise to me, I didn't know NSToCoordRound had badness
like that.)  So I landed without the change in NSToCoordRoundWithClamp to avoid this.

The author of the paper points that out and suggests using a 64-bit int for the temp
value if you need the full integer range (see the PDF link in the code).  I filed
a follow-up bug 1228858 to investigate that.
Blocks: 1228858
(Assignee)

Comment 27

3 years ago
Backed out part 2 in https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=53aa14927954
because it may have caused some Talos regressions on Android, e.g. http://mzl.la/1NXppAf

Comment 28

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/673461c0b772
Status: REOPENED → RESOLVED
Last Resolved: 4 years ago3 years ago
Resolution: --- → FIXED
(Reporter)

Comment 29

3 years ago
should we reopen this as the second patch was backed out?
(Assignee)

Comment 30

3 years ago
(In reply to Joel Maher (:jmaher) from comment #29)
> should we reopen this as the second patch was backed out?

I don't think so; the tcanvasmark numbers in the past 14 days have slowly
improved in periods unrelated to this bug (Nov 24 - Nov 28) and is now back
where it was before:
https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,975bd3027c1a2bc77dadf2224503b425474e2e09,1

The two clusters on Nov 29 are before/after part 2 landed, and it shows just
a minor improvement in comparison.  I'm not very confident it's significant.
(I'll investigate this more in bug 1228858 and I'll run some micro-benchmarks
on these *Clamp functions to get more accurate figures.  When time allows...)
Note that the Android regressions attributed to this bug are almost certainly misfiled due to an omitted clobber. See bug 1229118 for more details.
You need to log in before you can comment on or make changes to this bug.