Closed Bug 767215 Opened 12 years ago Closed 12 years ago

Intermittent testAxisLocking, testLoad, testFlingCorrectness, testOverscroll, testPanCorrectness | Pixel at 0, 0 - Color rgba(255,255,255,255) not close enough to expected rgb(0,0,0)

Categories

(Firefox for Android Graveyard :: General, defect)

ARM
Android
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME
Firefox 18

People

(Reporter: philor, Unassigned)

References

Details

(Keywords: intermittent-failure)

Attachments

(2 files)

+++ This bug was initially created as a clone of Bug #767214 +++

https://tbpl.mozilla.org/php/getParsedLog.php?id=12875459&tree=Mozilla-Inbound
Android Tegra 250 mozilla-inbound opt test robocop on 2012-06-21 13:14:06 PDT for push 8ec957ae5ffd
slave: tegra-150

3 INFO TEST-UNEXPECTED-FAIL | testAxisLocking | Pixel at 0, 0 - Color rgba(255,255,255,255) not close enough to expected rgb(0,0,0)
Exception caught during test!
junit.framework.AssertionFailedError: 3 INFO TEST-UNEXPECTED-FAIL | testAxisLocking | Pixel at 0, 0 - Color rgba(255,255,255,255) not close enough to expected rgb(0,0,0)
	at junit.framework.Assert.fail(Assert.java:47)
	at org.mozilla.fennec.FennecMochitestAssert._logMochitestResult(FennecMochitestAssert.java:107)
	at org.mozilla.fennec.FennecMochitestAssert.ok(FennecMochitestAssert.java:137)
	at org.mozilla.fennec.FennecMochitestAssert.ispixel(FennecMochitestAssert.java:167)
	at org.mozilla.fennec.tests.PixelTest.checkScrollWithBoxes(PixelTest.java:78)
	at org.mozilla.fennec.tests.PixelTest.loadAndVerifyBoxes(PixelTest.java:94)
	at org.mozilla.fennec.tests.testAxisLocking.testAxisLocking(testAxisLocking.java:25)
	at java.lang.reflect.Method.invokeNative(Native Method)
	at java.lang.reflect.Method.invoke(Method.java:521)
	at android.test.InstrumentationTestCase.runMethod(InstrumentationTestCase.java:204)
	at android.test.InstrumentationTestCase.runTest(InstrumentationTestCase.java:194)
	at android.test.ActivityInstrumentationTestCase2.runTest(ActivityInstrumentationTestCase2.java:186)
	at org.mozilla.fennec.tests.BaseTest.runTest(BaseTest.java:80)
	at junit.framework.TestCase.runBare(TestCase.java:127)
	at junit.framework.TestResult$1.protect(TestResult.java:106)
	at junit.framework.TestResult.runProtected(TestResult.java:124)
	at junit.framework.TestResult.run(TestResult.java:109)
	at junit.framework.TestCase.run(TestCase.java:118)
	at android.test.AndroidTestRunner.runTest(AndroidTestRunner.java:169)
	at android.test.AndroidTestRunner.runTest(AndroidTestRunner.java:154)
	at android.test.InstrumentationTestRunner.onStart(InstrumentationTestRunner.java:520)
	at android.app.Instrumentation$InstrumentationThread.run(Instrumentation.java:1447)
4 INFO TEST-UNEXPECTED-FAIL | testAxisLocking | Exception caught - junit.framework.AssertionFailedError: 3 INFO TEST-UNEXPECTED-FAIL | testAxisLocking | Pixel at 0, 0 - Color rgba(255,255,255,255) not close enough to expected rgb(0,0,0)
I saw this a couple of times running robocop tests locally. Right after the URL is typed and the awesomebar is hidden, we show a blank white screen before the page is painted. The code that checks to see if the page is painted assumes the paint is done if there are no paint updates in 500ms - so if the blank white screen is up for more than 500ms it gets mistaken for the actual page and that causes this error. One fix is to increase the 500ms delay to give the actual page more time to paint. Patch forthcoming once I test it a bit more.
Assignee: nobody → bugmail.mozilla
Attached patch Increase timeoutSplinter Review
So far this seems to work pretty well.
Attachment #635870 - Flags: review?(jmaher)
This will also fix bug 767215 and bug 767266 since they are the same issue, just on different tests.
Comment on attachment 635870 [details] [diff] [review]
Increase timeout

Review of attachment 635870 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/base/tests/PixelTest.java.in
@@ +3,5 @@
>  
>  import @ANDROID_PACKAGE_NAME@.*;
>  
>  class PixelTest extends BaseTest {
> +    private static final long PAINT_CLEAR_DELAY = 1000; // milliseconds

this scares me as we are just increasing timeouts.  I know tegras are slow, but beagle board armv6 is even slower :(
Attachment #635870 - Flags: review?(jmaher) → review+
https://hg.mozilla.org/mozilla-central/rev/43363716c8a2
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
When marking dupes of [orange]s, please change the summaries so the right bug shows up in TBPL :-)
Summary: Intermittent testAxisLocking | Pixel at 0, 0 - Color rgba(255,255,255,255) not close enough to expected rgb(0,0,0) → Intermittent testAxisLocking, testLoad | Pixel at 0, 0 - Color rgba(255,255,255,255) not close enough to expected rgb(0,0,0)
:jmaher sees the same failure consistently on panda boards - bug 783639 - but I don't know if it's the same cause.
https://tbpl.mozilla.org/php/getParsedLog.php?id=14655963&tree=Firefox
Summary: Intermittent testAxisLocking, testLoad | Pixel at 0, 0 - Color rgba(255,255,255,255) not close enough to expected rgb(0,0,0) → Intermittent testAxisLocking, testLoad, testFlingCorrectness | Pixel at 0, 0 - Color rgba(255,255,255,255) not close enough to expected rgb(0,0,0)
https://tbpl.mozilla.org/php/getParsedLog.php?id=14824843&tree=Mozilla-Inbound
Summary: Intermittent testAxisLocking, testLoad, testFlingCorrectness | Pixel at 0, 0 - Color rgba(255,255,255,255) not close enough to expected rgb(0,0,0) → Intermittent testAxisLocking, testLoad, testFlingCorrectness, testOverscroll | Pixel at 0, 0 - Color rgba(255,255,255,255) not close enough to expected rgb(0,0,0)
https://tbpl.mozilla.org/php/getParsedLog.php?id=14830903&tree=Mozilla-Inbound
Summary: Intermittent testAxisLocking, testLoad, testFlingCorrectness, testOverscroll | Pixel at 0, 0 - Color rgba(255,255,255,255) not close enough to expected rgb(0,0,0) → Intermittent testAxisLocking, testLoad, testFlingCorrectness, testOverscroll, testPanCorrectness | Pixel at 0, 0 - Color rgba(255,255,255,255) not close enough to expected rgb(0,0,0)
I have reproduced these failures on try with additional logging and determined that the (255, 255, 255) pixels are actually returned by GLES20.glReadPixels(), and glError() is 0 afterwards -- that is, all indications are that the screen is actually white at that time.

Now I'm looking at the timing of the drawFinished events relative to glReadPixels ... maybe we are reading the screen before the paint is complete?
In a typical testLoad page load, on a tegra, there are 3 drawFinished events observed by the PaintExpecter. Sometimes there are only 2; sometimes 4. In the typical case, the first 2 drawFinished events occur within about 300 ms of the page load request and the 3rd one about 1000 ms after the first drawFinished -- very close to the time the blockUntilClear waits. If that 3rd drawFinished is received before the wait expires, the test waits an additional 1000 ms and succeeds; if that 3rd drawFinished is not received in time to reset the wait, the glReadPixels occurs 1000 ms earlier than normal -- and that's often okay, but not always. 

The failures that I have observed show only the first 2 drawFinished events received, then a period of 1000 ms without any additional drawFinished events. In these cases, I also notice that the throbber is still running (after "throbber start" / before "throbber stop") when glReadPixels is called.

Increasing the PaintExpecter no-events wait from 1000 ms to 3000 ms appears to eliminate the failures. (0 failures in 800 testLoads)


I'm not satisfied with the drawFinished mechanism, so I am spinning off a separate bug: 791411.
...meanwhile, let's workaround the failures by increasing the timeout again.
Assignee: bugmail.mozilla → gbrown
Attachment #661425 - Flags: review?(jmaher)
Comment on attachment 661425 [details] [diff] [review]
increase timeout again - to 3000 ms

Review of attachment 661425 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/base/tests/PixelTest.java.in
@@ +3,5 @@
>  
>  import @ANDROID_PACKAGE_NAME@.*;
>  
>  abstract class PixelTest extends BaseTest {
> +    private static final long PAINT_CLEAR_DELAY = 3000; // milliseconds

and before long we will be at 3141 for a timeout!
Attachment #661425 - Flags: review?(jmaher) → review+
https://hg.mozilla.org/mozilla-central/rev/79106468098b
Status: REOPENED → RESOLVED
Closed: 12 years ago12 years ago
Resolution: --- → FIXED
Target Milestone: Firefox 16 → Firefox 18
An unexpected consequence of this landing was a performance improvement in trobopan:

Message: 1
Date: Sat, 15 Sep 2012 19:40:52 -0000
From: nobody@cruncher.build.mozilla.org
To: dev-tree-management@lists.mozilla.org
Subject: Talos Improvement! Robocop Pan Benchmark decrease 36.5% on
        Android        2.2 (Native) Mozilla-Inbound
Message-ID:
        <20120915194052.BE2BF104477@cruncher.srv.releng.scl3.mozilla.com>
Content-Type: text/plain; charset="us-ascii"

Improvement! Robocop Pan Benchmark decrease 36.5% on Android 2.2 (Native) Mozilla-Inbound
-----------------------------------------------------------------------------------------
    Previous: avg 29523.273 stddev 3337.132 of 30 runs up to revision 6875465535eb
    New     : avg 18735.480 stddev 2168.792 of 5 runs since revision 79106468098b
    Change  : -10787.793 (36.5% / z=3.233)
    Graph   : http://mzl.la/PufJQv

Changeset range: http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=6875465535eb&tochange=79106468098b

Changesets:
  * http://hg.mozilla.org/integration/mozilla-inbound/rev/79106468098b
    : Geoff Brown <gbrown@mozilla.com> - Bug 767215 - Robocop: increase wait time for paint events; r=jmaher
    : http://bugzilla.mozilla.org/show_bug.cgi?id=767215

Bugs:
  * http://bugzilla.mozilla.org/show_bug.cgi?id=767215 - Intermittent testAxisLocking, testLoad, testFlingCorrectness, testOverscroll, testPanCorrectness | Pixel at 0, 0 - Color rgba(255,255,255,255) not close enough to expected rgb(0,0,0)

I note that the trobopan times were also much more variable before the increase in wait time. I suppose that the test was previously sometimes beginning to pan before the page load was complete.
ryanvm%gmail.com
https://tbpl.mozilla.org/php/getParsedLog.php?id=15435048&tree=Mozilla-Inbound
Android Armv6 Tegra 250 mozilla-inbound opt test robocop on 2012-09-21 22:28:19
slave: tegra-082

3 INFO TEST-UNEXPECTED-FAIL | testPanCorrectness | Pixel at 0, 0 - Color rgba(255,255,255,255) not close enough to expected rgb(0,0,0)
junit.framework.AssertionFailedError: 3 INFO TEST-UNEXPECTED-FAIL | testPanCorrectness | Pixel at 0, 0 - Color rgba(255,255,255,255) not close enough to expected rgb(0,0,0)
4 INFO TEST-UNEXPECTED-FAIL | testPanCorrectness | Exception caught - junit.framework.AssertionFailedError: 3 INFO TEST-UNEXPECTED-FAIL | testPanCorrectness | Pixel at 0, 0 - Color rgba(255,255,255,255) not close enough to expected rgb(0,0,0)
buildbot.slave.commands.TimeoutError: command timed out: 2400 seconds without output, killing pid 73337
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This is the #1 toporange of the last few days and as such is causing aggravation for sheriffs and undermining the drive to bring back people's confidence in android tests. 

Tests disabled for now:
https://hg.mozilla.org/integration/mozilla-inbound/rev/45ddecb22b2f
Whiteboard: [orange] → [orange][test disabled][leave open]
Thanks Ed - I was thinking of doing that.

I'll note that the last "fix" - increasing the timeout to 3s - appears to have been effective for several days. There were almost no m-i/m-c failures for this bug between the fix landing on Sept 15 and Sept 21.

The corresponding Sept 15 improvement in trobopan (comment 85) was lost on Sept 21, the regression blamed on bug 716575. Significant m-i/m-c failures for this bug resumed the following Monday, Sept 24.
(In reply to Ed Morley [:edmorley UTC+1] from comment #185)
> This is the #1 toporange of the last few days and as such is causing
> aggravation for sheriffs and undermining the drive to bring back people's
> confidence in android tests. 
> 
> Tests disabled for now:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/45ddecb22b2f

https://hg.mozilla.org/releases/mozilla-aurora/rev/b06d10b5f6e7
I tried increasing the timeout again to 7s, but failures continued with about the same frequency as at 3s...there's something else going wrong.
Bug 797942 is a plausible but unverified cause for these failures.
Depends on: 797942
(In reply to Geoff Brown [:gbrown] from comment #201)
> Bug 797942 is a plausible but unverified cause for these failures.

Bug 797942 was not to blame: these rc failures continued when the bug 797942 fix landed.
But now, I cannot reproduce these failures:

https://tbpl.mozilla.org/?tree=Try&rev=acfbf7dac35b

(15 android and 15 android-armv6 successful rc runs with these tests enabled).

Any objections to enabling:

    1.12 +[testLoad]
    1.15 +[testPanCorrectness]
    1.20 +[testFlingCorrectness]
    1.21 +[testOverscroll]
    1.22 +[testAxisLocking]
    1.25 +[testWebContentContextMenu]

?
Whiteboard: [orange][test disabled][leave open] → [orange][leave open]
I don't see any new failures on m-c/m-i. Probably best to leave open for now in case there is still an intermittent problem...
Assignee: gbrown → nobody
Status: REOPENED → RESOLVED
Closed: 12 years ago12 years ago
Resolution: --- → WORKSFORME
Whiteboard: [orange][leave open] → [orange]
Whiteboard: [orange]
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.