Bug 741988 (Closed) — Opened 8 years ago, Closed 8 years ago

Apparent regression in checkerboarding on taskjs.org benchmark on 2012-03-28

Categories

(Firefox for Android :: General, defect)

Hardware: ARM
OS: Android
Type: defect
Priority: Not set

Tracking


RESOLVED INVALID
Tracking Status
firefox14 --- fixed
blocking-fennec1.0 --- +

People

(Reporter: wlach, Assigned: kats)

References

Details

(Whiteboard: [gfx])

Attachments

(1 file)

If you look at the eideticker dashboard, it looks like there's been a fairly big regression in checkerboarding on the taskjs benchmark as of the March 28 nightly:

http://wrla.ch/eideticker/dashboard/#/checkerboarding

If you actually look at the videos, you can see this difference for yourself:

BEFORE: http://wrla.ch/eideticker/dashboard/videos/video-1332914705.89.webm
AFTER: http://wrla.ch/eideticker/dashboard/videos/video-1333001105.18.webm
OS: Linux → Android
Hardware: x86_64 → ARM
Narrowed it down by running the regression test manually with eideticker. It looks like the culprit is bug 729528.

Checkerboard area/duration measures (the sum, over the taskjs capture, of the percentage of each frame that was checkerboarded):

Revision 263150bd32bb (just before the patchset in bug 729528 was committed): 59.61821547874225
Revision c69c54b0bb11 (just after the patchset was committed): 178.4899762327045
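The metric quoted here is just a sum over per-frame checkerboard percentages across a capture. A minimal sketch in Python; `frame_percentages` and the example values are hypothetical inputs, since the real per-frame numbers come from eideticker's video analysis:

```python
# Sketch of the checkerboard area/duration metric: the sum, over all
# captured frames, of the percentage of each frame that was
# checkerboarded. The per-frame percentages come from eideticker's
# video analysis; the values below are made-up examples.

def checkerboard_area_duration(frame_percentages):
    """Sum of per-frame checkerboard percentages over a capture."""
    return sum(frame_percentages)

# Hypothetical example: five frames with varying checkerboard coverage.
frames = [0.0, 12.5, 40.0, 7.0, 0.0]
print(checkerboard_area_duration(frames))  # 59.5
```

Higher totals mean more (or longer-lasting) checkerboarding, which is why the jump from ~60 to ~178 above indicates a regression.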
Assignee: nobody → bugmail.mozilla
Bug 729528 shouldn't have changed any of the default behaviour, so I don't see how that could trigger this change. From the videos it looks like the "After" case does an extra draw of the wrong area, which could be caused just by the order in which events get queued and handled. It's strange that it's so reproducible though, if that were the only problem.

:wlach, if I put together a couple of builds with partial backouts of that bug, would you be able to run them to see which specific patch introduced the problem?
(In reply to Kartikaya Gupta (:kats) from comment #2)

> :wlach, if I put together a couple of builds with partial backouts of that
> bug, would you be able to run them to see which specific patch introduced
> the problem?

Absolutely!
blocking-fennec1.0: --- → beta+
I've uploaded builds to http://people.mozilla.org/~kgupta/tmp/ - they are named after the rev they were built from.
Did a manual bisection; it looks like commit 56aab2bfad06 is to blame:

ff25df0d0a27: 152.545836371
56aab2bfad06: 180.23932065
3e5eee67741f: 57.3819290836
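The manual bisection follows the standard pattern: measure the midpoint revision and keep the half of the range where the scores disagree. A sketch of that loop, where `build_and_measure` is a hypothetical stand-in for the build-plus-eideticker-measurement step (in practice each probe means producing a build and running it through eideticker, as in the builds uploaded above):

```python
def bisect_regression(revisions, build_and_measure, threshold):
    """Find the first revision whose checkerboard score exceeds
    `threshold`, assuming scores are low before the culprit commit
    and high from the culprit onward."""
    lo, hi = 0, len(revisions) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if build_and_measure(revisions[mid]) > threshold:
            hi = mid       # regression already present at mid
        else:
            lo = mid + 1   # regression introduced after mid
    return revisions[lo]

# Hypothetical scores keyed by revision, mimicking the jump seen above.
scores = {"r1": 57.4, "r2": 58.0, "r3": 152.5, "r4": 180.2}
revs = ["r1", "r2", "r3", "r4"]
print(bisect_regression(revs, scores.get, 100))  # r3
```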
Apparently calling setResolution even with the same resolution will trigger an extra draw, which is what caused the regression. :wlach got me up and running with eideticker so I tested this patch and it does appear to fix the regression.
Attachment #612659 - Flags: review?(chrislord.net)
Comment on attachment 612659 [details] [diff] [review]
Fix for regression

Review of attachment 612659 [details] [diff] [review]:
-----------------------------------------------------------------

One nit, looks fine.

::: mobile/android/chrome/content/browser.js
@@ +1643,2 @@
>        cwu.setResolution(resolution, resolution);
> +    } else if (BrowserApp.selectedTab != this && resolution != zoom) {

Although I hope that this doesn't matter, this if/else could be logically simplified to

if (BrowserApp.selectedTab == this) {
  if (resolution != this._drawZoom) {
    ...
  }
} else if (resolution != zoom) {
  ...
}

I think this reads better, perhaps.
Attachment #612659 - Flags: review?(chrislord.net) → review+
Landed on inbound with the adjusted if condition.

https://hg.mozilla.org/integration/mozilla-inbound/rev/2f28c73810cd
Target Milestone: --- → Firefox 14
https://hg.mozilla.org/mozilla-central/rev/2f28c73810cd
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Hmm, unfortunately this regression still seems to be present. Not sure what's going on, maybe somebody broke it again after your commit -- unfortunately we don't have data for the last few days because the dashboard was down. :(

http://wrla.ch/eideticker/dashboard/#/scrolling

I can try to do some manual regression finding tomorrow.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to William Lachance (:wlach) from comment #10)
> Hmm, unfortunately this regression still seems to be present. Not sure
> what's going on, maybe somebody broke it again after your commit --
> unfortunately we don't have data for the last few days because the dashboard
> was down. :(
> 
> http://wrla.ch/eideticker/dashboard/#/scrolling
> 
> I can try to do some manual regression finding tomorrow.

Did some digging. It looks like the patch did not actually fix the problem. :kats is looking into it.
I created a try build using this patch applied directly on top of 39e2091a2e20 (which is the last patch on bug 729528).

https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/kgupta@mozilla.com-0837dc27fdd9/try-android/

Running this on the eideticker checkerboard test gave me this result:

Checkerboard area/duration (a sum of per-frame percentages, not a single percentage): 54.745578024

This value indicates the fix does work, but that some other change between 39e2091a2e20 and 2f28c73810cd has also introduced or perpetuated the regression.
Whiteboard: [gfx]
Kats, is there a reason this patch hasn't landed yet?  Are you concerned it has other negative impacts?
The patch landed already; the bug was reopened after that. There was some other change between 39e2091a2e20 and 2f28c73810cd which also introduced a regression, but which was masked. In order to find out which one, we'll need to bisect between those two changes with 2f28c73810cd applied on top. I started doing that but was trying to think of a faster way to go about it. I can't think of anything, so I'll continue bisecting.
So while bisecting I happened to run the same build twice on eideticker. On the first run I got a checkerboard value of 66.0632250463 and on the second run I got a value of 184.689369693. Given the wide range in values on a single build I'm not sure if our drawing/checkerboarding is deterministic enough to be reliable here. It might be worthwhile to modify eideticker to run each build through a few times and report the mean and stddev to at least know how wide of a range we're dealing with.
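The mean/stddev reporting suggested here is straightforward with Python's standard library; a minimal sketch, using the two scores quoted above as example inputs:

```python
import statistics

def summarize_runs(scores):
    """Mean and sample standard deviation over repeated eideticker
    runs of the same build, to quantify run-to-run noise."""
    return statistics.mean(scores), statistics.stdev(scores)

# The two runs of the same build quoted above.
runs = [66.0632250463, 184.689369693]
mean, stdev = summarize_runs(runs)
print(f"mean={mean:.1f} stdev={stdev:.1f}")
```

With a standard deviation of the same order as the scores themselves, a single run per build clearly isn't enough to call a regression.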
Adding wlach for comment #15.
(In reply to Kartikaya Gupta (:kats) from comment #15)
> So while bisecting I happened to run the same build twice on eideticker. On
> the first run I got a checkerboard value of 66.0632250463 and on the second
> run I got a value of 184.689369693. Given the wide range in values on a
> single build I'm not sure if our drawing/checkerboarding is deterministic
> enough to be reliable here. It might be worthwhile to modify eideticker to
> run each build through a few times and report the mean and stddev to at
> least know how wide of a range we're dealing with.

I added an option to the get-metric-for-build tool which allows this:

https://github.com/mozilla/eideticker/commit/afa5289fcffecfdc8b80ecc52015e0537a1eff65

Nothing fancy but it gets the job done.
Spoke to kats about this, he will test to see if TCheckerboard or TCheckerboard2 show the same regression.
I triggered tcheckerboard on some older m-c tbpl pushes. Results:

Revision     | Push time       | Score
-------------+-----------------+-------
244991519f53 | Mar 27 18:34:18 | 0.71
a996d3df12a3 | Mar 28 04:54:29 | 0.72
fb23c30e3d60 | Mar 28 07:11:03 | 0.73
c3fd0768d46a | Mar 28 10:41:10 | 0.66 <-- regression?
183eed448b9d | Mar 28 17:33:00 | 0.67
f0a104d6cc36 | Mar 28 17:52:28 | 0.68
36d7b3b02016 | Mar 29 04:23:08 | 0.70
563ea372f000 | Mar 29 05:48:15 | 0.69
ff3521bc6559 | Mar 29 11:38:30 | 0.69

So there is a regression from 0.73 down to 0.66 in the middle there but it seems to fix itself. I can't seem to run tcheckerboard2 on those older builds.
Comments 10 and 19 seem to be contradictory. Should this be closed?
(In reply to Brad Lassey [:blassey] from comment #20)
> comments 10 and 19 seem to be contradictory. Should this be closed?

tcheckerboard is not measuring the same thing as taskjs.org (I believe it's testing checkerboarding on wikipedia.org).

Now that we're doing 5 runs on nightly, it's even more apparent/certain that there is a regression here:

http://wrla.ch/eideticker/dashboard/#/taskjs-scrolling/checkerboard

I'm not fully up to date on everything you guys are doing, but as far as I am aware we still don't know the exact cause of what happened here. I don't think we should close this bug until we do.
wlach, kats:  Killer work here.  Love seeing you guys going after regressions like this.  It's tedious, I know, but it's critical to our success.  When we see regressions, it's imperative we keep digging until resolved.  Otherwise we'll eventually bury ourselves in regressions with no clear way to dig ourselves out.  

Keep it up!  :)
(In reply to William Lachance (:wlach) from comment #21)
> I don't
> think we should close this bug until we do.

Yeah, I agree. It's just annoying because there's no easy way to debug the problem directly, and bisecting it means having to create builds manually and run them on the eideticker box which unfortunately breaks every so often :(
So :wlach suggested modifying tcheckerboard to use taskjs and bisect that instead. I did the modification and tried to verify I could reproduce the regression. I ran it on the Galaxy Tab with three builds (twice for each build):

39e2091a2e20 + fix: 0.4368016, 0.46569595  // regression was introduced at this cset
1e7736676d66 + fix: 0.496279, 0.460936     // random cset in the middle of the range
55ddaad4f602 + fix: 0.45075315, 0.44146767 // cset near the end of the range

Higher numbers are better with tcheckerboard, and these numbers seem to indicate mostly the same performance, with a little improvement in the middle. Based on this I'd say I can't reproduce the regression using tcheckerboard with taskjs.

kbrosnan said he would be willing to try and bisect this on the eideticker box using custom builds the way I was doing initially (but probably with multiple runs per build to control for variability in the results).

:kbrosnan, the range for the regression I'm looking for is roughly http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=08227c5bef9b&tochange=2f28c73810cd
The story, summarized, is that I introduced one regression in 39e2091a2e20 and fixed it in 2f28c73810cd. Applying the patch from 2f28c73810cd directly on top of 39e2091a2e20 and running it through eideticker verifies that it fixes the regression. However, something else that landed between these two csets introduced a second regression, which was masked by the first one. Therefore, upon fixing the first one in 2f28c73810cd, the eideticker numbers did not go down as anticipated. To bisect for the second regression, we basically need to take the patch from 2f28c73810cd, apply it on top of various csets in the regression range, and then use that to do the bisection. That way the masking effect of the first regression is removed.
Keywords: qawanted
blocking-fennec1.0: beta+ → +
My recent work in bug 748143 on letting eideticker work without HDMI capture might be useful for tracking this down, as it will allow you to more closely reproduce the taskjs test's actual behaviour (instead of using tcheckerboard, which works slightly differently). See also: http://wrla.ch/blog/2012/04/eideticker-with-less-eideticker/
Ok, so we all (me, kats, and kbrosnan) spent a few days being confused about the fact that there didn't seem to be a regression with newer builds. It turns out you need the old eideticker scrolling behaviour (see https://github.com/mozilla/eideticker/commit/55d63960dc0a5090cee00fe917a851db082ee0fd) to reproduce this one.

Here's a testrun of two builds (one from the 26th, the other from the 30th) with the new scrolling behaviour:

Old (26th): [181.9594292, 156.05508, 150.853626]
New (30th): [121.32609, 160.91015, 176.67826]

(arguably these are indistinguishable)

Here's a testrun of the same two builds with the old scrolling behaviour:

Old (26th): [65.3507153, 59.784056, 56.437683]
New (30th): [105.7579747, 59.385510000000004, 107.1115977]

To get the old scrolling behaviour, configure eideticker as normal then download this file...

https://raw.github.com/mozilla/eideticker/afa5289fcffecfdc8b80ecc52015e0537a1eff65/bin/devicecontroller.py

... and copy it into eideticker's bin/ directory.

So I think now we know how to test the bisection properly... just a matter of doing it now. Apologies for any confusion; it looks like this is a very subtle regression (which may diminish its importance, though that's not my call).
Additional testrun I did with the old scrolling behaviour (just to make sure the numbers weren't just a fluke):

Old(26th): [58.410141, 57.07101599999999, 64.479705, 59.827531, 55.210143]
New(30th): [109.5797106, 110.25797, 106.4376857, 108.3811577, 58.33913199999999]
Target Milestone: Firefox 14 → ---
Based on the information we've been able to gather, it doesn't look like there is a regression here.
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → INVALID
Keywords: qawanted