Closed Bug 1026742 Opened 10 years ago Closed 10 years ago

43.8% robocheck2 android regression on inbound (fx33) from revision 32c13ea4c9e4

Categories

(Core :: Graphics: Layers, defect)

33 Branch
ARM
Android
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox31 --- unaffected
firefox32 --- wontfix
firefox33 --- wontfix

People

(Reporter: jmaher, Assigned: kats)

References

Details

(Keywords: perf, regression, Whiteboard: [talos_regression])

android 4.0 has a 43.8% regression
android 2.2 has a 26.1% regression
Summary: 43.8% robocop android regression on inbound (fx33) from revision 32c13ea4c9e4 → 43.8% robocheck2 android regression on inbound (fx33) from revision 32c13ea4c9e4
I just uplifted bug 1023882 to aurora so this regression will likely show up there as well. I also pushed a bunch of try pushes to bisect which patch in the set was responsible for this. I'll update with results here.
Version: unspecified → 33 Branch
Based on the try pushes I did, the regression appears to have come from the last patch in the set, part 8. Pretty surprising to me, I'll look into it some more.

https://tbpl.mozilla.org/?tree=Try&rev=0cb92f2a5b29 (based off cset 8508e0cc42b8)
https://tbpl.mozilla.org/?tree=Try&rev=930957239032 (based off cset 32c13ea4c9e4)
The only possible way part 8 [1] could have a functional difference is if we reach the bottom of ClientTiledThebesLayer::Render with updatedLowPrecision == false && !mPaintData.mPaintFinished. In this case, we now call EndPaint() where before we would not have run that code. I pushed another try push [2] to see if changing this back fixes the robocheck2 regression.

If this is in fact the case, then I think the new behaviour is more correct, even if it regresses the talos numbers. My reasoning is that if we reach the end of that function, then we have drawn neither new high-precision nor new low-precision content. Setting the repeat transaction flag will presumably just do the same thing over and over (and cause infinite recursion), which is why we're not setting a repeat transaction flag there. So it makes sense that we should mark the paint as finished.

The difference in talos numbers probably comes from the mLastScrollOffset value not getting updated, and causing the next paint to behave differently (and incorrectly, I would think).

Chris, thoughts? I'm inclined to mark this bug WONTFIX because I think the new behaviour is more correct than it used to be, and should actually behave more consistently in user-facing scenarios.

[1] https://hg.mozilla.org/try/rev/32c13ea4c9e4
[2] https://tbpl.mozilla.org/?tree=Try&rev=7c34238ba939
Flags: needinfo?(chrislord.net)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #4)
> The only possible way part 8 [1] could have a functional difference is if we
> reach the bottom of ClientTiledThebesLayer::Render with updatedLowPrecision
> == false && !mPaintData.mPaintFinished. In this case, we now call EndPaint()
> where before we would not have run that code. I pushed another try push [2]
> to see if changing this back fixes the robocheck2 regression.
> 
> If this is in fact the case, then I think the new behaviour is more correct,
> even if it regresses the talos numbers. My reasoning is that if we reach the
> end of that function, then we have drawn neither new high-precision nor new
> low-precision content. Setting the repeat transaction flag will presumably
> just do the same thing over and over (and cause infinite recursion), which
> is why we're not setting a repeat transaction flag there. So it makes sense
> that we should mark the paint as finished.
> 
> The difference in talos numbers probably comes from the mLastScrollOffset
> value not getting updated, and causing the next paint to behave differently
> (and incorrectly, I would think).
> 
> Chris, thoughts? I'm inclined to mark this bug WONTFIX because I think the
> new behaviour is more correct than it used to be, and should actually behave
> more consistently in user-facing scenarios.
> 
> [1] https://hg.mozilla.org/try/rev/32c13ea4c9e4
> [2] https://tbpl.mozilla.org/?tree=Try&rev=7c34238ba939

This analysis sounds reasonable, but if it was correct, if anything I'd have expected tcheck to have gotten better and tscroll to get worse (i.e. I think the result of what you're saying would have been drawing either skipped or done in low precision, which would likely negatively affect tcheck numbers).

That said, I'm often finding my expectations are challenged wrt tiling/progressive rendering, so let's wait on the numbers (and make sure to get lots of repeat tests too) - it might also be worth enabling and looking at the logging output with and without this part 8 to see the metrics and paths being chosen.
Flags: needinfo?(chrislord.net)
Btw the try push [2] in comment 4 does seem to fix the regression, so it is the case that we are reaching the end of the function with updatedLowPrecision == false && !mPaintData.mPaintFinished. Previously we would exit the function at that point without scheduling a repeat transaction or finishing the paint. Now we finish the paint, and that difference is the cause of the talos regression.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #6)
> Btw the try push [2] in comment 4 does seem to fix the regression, so it is
> the case that we are reaching the end of the function with
> updatedLowPrecision == false && !mPaintData.mPaintFinished. Previously we
> would exit the function at that point without scheduling a repeat
> transaction or finishing the paint. Now we finish the paint, and that
> difference is the cause of the talos regression.

So the only difference is that the last scroll offset is being updated?
Well, the EndPaint function does this:

  mPaintData.mLastScrollOffset = mPaintData.mScrollOffset;
  mPaintData.mPaintFinished = true;
  mPaintData.mFirstPaint = false;

So technically it also sets the mPaintFinished flag to true when it wouldn't have otherwise, and sets the mFirstPaint to false when it might not have otherwise. I don't think mFirstPaint would be the cause of this regression because this would have to happen on the very first paint for it to make any difference at all.

I suppose setting mPaintFinished to true might cause it to bounce out of [1] on the next paint? I'm not really sure why that check is there, to be honest. It seems like if we get to that point we should just be painting (or trying to). That might also cause more checkerboarding; I'll do a try push with that removed to see if it helps.

[1] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/client/ClientTiledThebesLayer.cpp?rev=fc7250e9b202#353
Actually, BeginPaint() always resets mPaintFinished back to false, so there's no way that check will ever trigger.
Assignee: nobody → bugmail.mozilla
So based on some try pushes ([1], [2], [3], [4]) it looks like a third or so of the regression comes from setting the mLastScrollOffset, and the remainder comes from setting mPaintFinished = true.

[1] https://tbpl.mozilla.org/?tree=Try&rev=7c34238ba939
[2] https://tbpl.mozilla.org/?tree=Try&rev=de06cfa07135
[3] https://tbpl.mozilla.org/?tree=Try&rev=dc5155ffcfa5
[4] https://tbpl.mozilla.org/?tree=Try&rev=cbad9abf5b66
So based on the analysis above, I think this regression is acceptable because the code is more correct now.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.