Closed
Bug 814282
Opened 12 years ago
Closed 12 years ago
Intermittent testVkbOverlap | testVkbOverlap - Found 0 green pixels after tapping; expected 2428
Categories
(Firefox for Android Graveyard :: Keyboards and IME, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
Firefox 22
People
(Reporter: RyanVM, Assigned: kats)
References
Details
(Keywords: intermittent-failure)
Attachments
(3 files)
https://tbpl.mozilla.org/php/getParsedLog.php?id=17251801&tree=Mozilla-Inbound
Android Armv6 Tegra 250 mozilla-inbound opt test robocop on 2012-11-21 12:44:20 PST for push 8f61af760950
slave: tegra-055
0 INFO SimpleTest START
1 INFO TEST-START | testVkbOverlap
2 INFO TEST-PASS | testVkbOverlap | Awesomebar URL typed properly - http://mochi.test:8888/tests/robocop/robocop_input.html should equal http://mochi.test:8888/tests/robocop/robocop_input.html
3 INFO TEST-PASS | testVkbOverlap | testInputVisible - Found 2428 green pixels after scrolling
4 INFO TEST-UNEXPECTED-FAIL | testVkbOverlap | testVkbOverlap - Found 0 green pixels after tapping; expected 2428
Exception caught during test!
junit.framework.AssertionFailedError: 4 INFO TEST-UNEXPECTED-FAIL | testVkbOverlap | testVkbOverlap - Found 0 green pixels after tapping; expected 2428
at junit.framework.Assert.fail(Assert.java:47)
at org.mozilla.fennec.FennecMochitestAssert._logMochitestResult(FennecMochitestAssert.java:107)
at org.mozilla.fennec.FennecMochitestAssert.ok(FennecMochitestAssert.java:136)
at org.mozilla.fennec.tests.testVkbOverlap.testVkbOverlap(testVkbOverlap.java:55)
at java.lang.reflect.Method.invokeNative(Native Method)
at java.lang.reflect.Method.invoke(Method.java:521)
at android.test.InstrumentationTestCase.runMethod(InstrumentationTestCase.java:204)
at android.test.InstrumentationTestCase.runTest(InstrumentationTestCase.java:194)
at android.test.ActivityInstrumentationTestCase2.runTest(ActivityInstrumentationTestCase2.java:186)
at org.mozilla.fennec.tests.BaseTest.runTest(BaseTest.java:120)
at junit.framework.TestCase.runBare(TestCase.java:127)
at junit.framework.TestResult$1.protect(TestResult.java:106)
at junit.framework.TestResult.runProtected(TestResult.java:124)
at junit.framework.TestResult.run(TestResult.java:109)
at junit.framework.TestCase.run(TestCase.java:118)
at android.test.AndroidTestRunner.runTest(AndroidTestRunner.java:169)
at android.test.AndroidTestRunner.runTest(AndroidTestRunner.java:154)
at android.test.InstrumentationTestRunner.onStart(InstrumentationTestRunner.java:520)
at android.app.Instrumentation$InstrumentationThread.run(Instrumentation.java:1447)
5 INFO TEST-UNEXPECTED-FAIL | testVkbOverlap | Exception caught - junit.framework.AssertionFailedError: 4 INFO TEST-UNEXPECTED-FAIL | testVkbOverlap | testVkbOverlap - Found 0 green pixels after tapping; expected 2428
6 INFO TEST-END | testVkbOverlap | finished in 47101ms
7 INFO TEST-START | Shutdown
8 INFO Passed: 2
9 INFO Failed: 2
10 INFO Todo: 0
11 INFO SimpleTest FINISHED
![]() |
||
Comment 1•12 years ago
|
||
Running locally on a tegra, I normally see 4 drawFinished notifications following the testInputVisible assertion:
11-23 21:23:06.261 17865 17873 I Robocop : 3 INFO TEST-PASS | testVkbOverlap | testInputVisible - Found 2428 green pixels after scrolling
11-23 21:23:06.261 17865 17873 D RobocopMotionEventHelper: Triggering down at (5.0,690.0)
11-23 21:23:06.261 17865 17873 D RobocopMotionEventHelper: Triggering up at (5.0,690.0)
11-23 21:23:06.291 17865 17922 D Robocop : Received drawFinished notification
11-23 21:23:06.301 17865 17877 I GeckoScreenshot: rect: 678.000000, 0.000000, 154.000000, 696.000000
11-23 21:23:06.614 17865 17922 D Robocop : Received drawFinished notification
11-23 21:23:06.614 17865 17877 I GeckoScreenshot: rect: 678.000000, 0.000000, 154.000000, 696.000000
11-23 21:23:06.634 17865 17922 D Robocop : Received drawFinished notification
11-23 21:23:06.861 1025 1309 D dalvikvm: GC_EXPLICIT freed 14542 objects / 680640 bytes in 43ms
11-23 21:23:06.891 17865 17922 D Robocop : Received drawFinished notification
11-23 21:23:06.891 1025 1026 D dalvikvm: GREF has decreased to 1099
11-23 21:23:07.391 17865 17873 D dalvikvm: GC_FOR_MALLOC freed 2775 objects / 3012792 bytes in 40ms
11-23 21:23:07.391 17865 17873 I dalvikvm-heap: Grow heap (frag case) to 8.310MB for 2846736-byte allocation
11-23 21:23:07.442 17865 17867 D dalvikvm: GC_FOR_MALLOC freed 108 objects / 4864 bytes in 45ms
11-23 21:23:11.401 17865 17873 I Robocop : 4 INFO TEST-PASS | testVkbOverlap | testVkbOverlap - Found 2352 green pixels after tapping; expected 2428
From the failure log in the Description, there is only 1 drawFinished:
11-21 13:36:30.873 I/Robocop (10710): 3 INFO TEST-PASS | testVkbOverlap | testInputVisible - Found 2428 green pixels after scrolling
11-21 13:36:30.873 D/RobocopMotionEventHelper(10710): Triggering down at (5.0,690.0)
11-21 13:36:30.873 D/RobocopMotionEventHelper(10710): Triggering up at (5.0,690.0)
11-21 13:36:30.903 D/Robocop (10710): Received drawFinished notification
11-21 13:36:30.903 I/GeckoScreenshot(10710): rect: 678.000000, 0.000000, 154.000000, 696.000000
11-21 13:36:31.393 I/dalvikvm-heap(10710): Grow heap (frag case) to 7.594MB for 2846736-byte allocation
11-21 13:36:31.493 I/GeckoScreenshot(10710): rect: 678.000000, 0.000000, 154.000000, 696.000000
11-21 13:36:35.595 I/Robocop (10710): 4 INFO TEST-UNEXPECTED-FAIL | testVkbOverlap | testVkbOverlap - Found 0 green pixels after tapping; expected 2428
I notice there is a delay of over 4000 ms before the failed assertion and I cannot think of why that would be (blockUntilClear(LESS_THAN_CURSOR_BLINK_PERIOD) should only wait for 450 ms after the drawFinished).
![]() |
||
Comment 2•12 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #1)
> I notice there is a delay of over 4000 ms before the failed assertion and I
> cannot think of why that would be
I see now -- that's normal and present in my "success" log above. countGreenPixels takes the majority of the time.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 6•12 years ago
|
||
Based on the log in comment 4, it looks like we might need to wait longer for the VKB to be displayed and the page to "settle" after simulating the tap. Here's my best guess as to what's happening:
> 12-05 15:56:40.226 I/Robocop (11861): 3 INFO TEST-PASS | testVkbOverlap | testInputVisible - Found 2428 green pixels after scrolling
Page with input field is done scrolling, and the input field is in view.
> 12-05 15:56:40.226 D/RobocopMotionEventHelper(11861): Triggering down at (5.0,690.0)
> 12-05 15:56:40.236 D/RobocopMotionEventHelper(11861): Triggering up at (5.0,690.0)
We simulate the tap on the input field
> 12-05 15:56:40.256 D/Robocop (11861): Received drawFinished notification
The page finishes some sort of draw, and our blockUntilClear(450) timer starts. However it looks like two screenshots also get queued up, maybe because of the focus highlight that appears/disappears on the text input field.
> 12-05 15:56:40.256 I/GeckoScreenshot(11861): rect: 678.000000, 0.000000, 154.000000, 696.000000
Screenshot the first. This takes a some time, preventing the page from processing all the input-related events and scrolling the input field into view.
> 12-05 15:56:40.756 I/dalvikvm-heap(11861): Grow heap (frag case) to 6.602MB for 2846736-byte allocation
In the meantime, our blockUntilClear timeout expires and we start grabbing the pixels from the page (this allocation is the pixel buffer).
> 12-05 15:56:40.839 I/GeckoScreenshot(11861): rect: 678.000000, 0.000000, 154.000000, 696.000000
Screenshot the second. We probably finish grabbing the pixels around here somewhere as well.
> 12-05 15:56:41.999 I/global (11861): Default buffer size used in BufferedOutputStream constructor. It would be better to be explicit if an 8k buffer is required.
> 12-05 15:56:45.849 I/Robocop (11861): 4 INFO TEST-UNEXPECTED-FAIL | testVkbOverlap | testVkbOverlap - Found 0 green pixels after tapping; expected 2428
We finish checking the pixels. The pixels were probably grabbed before the page scroll was done so there is no green.
Therefore I think inserting a delay between the tap and blockUntilClear might help with this as it will give the page time to settle into its new position and finish processing any focus change draws and/or screenshots that it's doing.
Assignee | ||
Comment 7•12 years ago
|
||
Compiled but untested.
Assignee: nobody → bugmail.mozilla
Attachment #689198 -
Flags: review?(gbrown)
![]() |
||
Comment 8•12 years ago
|
||
Comment on attachment 689198 [details] [diff] [review]
Patch
Review of attachment 689198 [details] [diff] [review]:
-----------------------------------------------------------------
nit: you could use mSolo.sleep, which doesn't throw the mostly-useless InterruptedException
Attachment #689198 -
Flags: review?(gbrown) → review+
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 10•12 years ago
|
||
Just to update the bug: we discussed an alternate solution to this racy behaviour during the mobile platform deep dive meeting today. The solution involves throwing things from the test thread onto the gecko main thread so that we can be sure they run after certain events in gecko. In this case we would queue it on some event such that the final check is guaranteed to run after the page scroll is done. This would allow us to eliminate the sleep and the test architecture would be more of a continuation-style. I plan to try this out but it will take some time so in the meantime I'll land this patch as is. This should help hopefully reduce robocop failures and also confirm my hypothesis from comment 6.
Assignee | ||
Comment 11•12 years ago
|
||
Whiteboard: [leave open]
Comment 12•12 years ago
|
||
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
![]() |
||
Comment 22•12 years ago
|
||
testVkbOverlap was failing consistently; that was fixed in bug 837318. It may have continued as an intermittent failure and went unnoticed until Panda rc was unhidden on 2013-03-05.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 82•12 years ago
|
||
kats, could you take a look at this top orange please? :-)
Flags: needinfo?(bugmail.mozilla)
Whiteboard: [leave open] → [disable-me 2013-04-01]
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 87•12 years ago
|
||
(In reply to Ed Morley [:edmorley UTC+0] from comment #82)
> kats, could you take a look at this top orange please? :-)
Looks like it only hits pandaboards, and it looks like it might be a legitimate failure in our code. Based on the log, after the input field is clicked on, the VKB does come up (as seen by the window resizing to 1280x236) but there are no green pixels on the screen. This could either happen because the page isn't scrolled correctly to move the input field into view, or it might be related to the "eglMakeCurrent:651 error 3009 (EGL_BAD_MATCH)" error output I see in the log. Either way it's something we should fix.
I pushed a try build to https://tbpl.mozilla.org/?tree=Try&rev=1b6c90ddaf1d with more logging to see if I can figure out what's going on.
Flags: needinfo?(bugmail.mozilla)
Comment 88•12 years ago
|
||
Thank you :-)
Assignee | ||
Comment 89•12 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #22)
> testVkbOverlap was failing consistently; that was fixed in bug 837318. It
> may have continued as an intermittent failure and went unnoticed until Panda
> rc was unhidden on 2013-03-05.
Although panda rc was unhidden on 2013-03-05, it was still running before that date. I wrote a script to scrape the inbound logs from ftp.mozilla.org and see if it was failing before that date (attached). The output shows that the first failure happened here:
FAIL 1362526821/ mozilla-inbound_panda_android_test-robocop-2-bm44-tests1-panda-build3.txt.gz
which is actually the same build reported in comment 15. Since then there have been 118 failures and 1844 passes, which is around a 6% failure rate. I'll retrigger the test on panda for some of the changesets prior to the one it first showed up on in an attempt to bisect the exact cset where this started happening. Note that we can do this even for csets where panda rc was hidden by using the build self-serve link and looking at the result log files in the ftp folder.
Assignee | ||
Comment 90•12 years ago
|
||
Assignee | ||
Comment 91•12 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #87)
> I pushed a try build to https://tbpl.mozilla.org/?tree=Try&rev=1b6c90ddaf1d
> with more logging to see if I can figure out what's going on.
That build didn't work, maybe because of the mozconfig changes I accidentally included. New try at https://tbpl.mozilla.org/?tree=Try&rev=e8b95e6e65e2
Assignee | ||
Comment 92•12 years ago
|
||
I found an earlier build on which I was able to trigger this failure via manual retriggers. Log at [1].
Because of the intermittentness it's tricky to definitively bisect the change that introduced this, so I'm going to hold off on that approach for now. For now I know that the change is at or before 6d8d36cce3e3.
The try push at [2] seems to indicate that the failures happen when we get two window-resize events instead of one. That is, for the success cases, when the input field is tapped, we get this:
Window-size changed to (1280,236)
but for the failure cases we get this:
Window-size changed to (1280,280)
Window-size changed to (1280,236)
In both cases I see the same GL error output so I don't think that is related. Will keep digging.
[1] https://tbpl.mozilla.org/php/getParsedLog.php?id=21061803&tree=Mozilla-Inbound
[2] https://tbpl.mozilla.org/?tree=Try&rev=e8b95e6e65e2
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 94•12 years ago
|
||
Yeah I think this is almost certainly related to the extra window-size event we get in some cases but not others. See [1] (success case) and [2] (failure case).
In the success case, the ZoomToRect message ends up animating to this viewport:
ImmutableViewportMetrics v=(0.0,18.0,1280.0,254.0) p=(0.0,0.0,1280.0,254.0) c=(0.0,0.0,1280.0,254.0) m=(0.0,0.0,0.0,0.0) z=1.0
but in the failure case, the first ZoomToRect animates to here:
ImmutableViewportMetrics v=(0.0,18.0,1280.0,298.0) p=(0.0,0.0,1280.0,298.0) c=(0.0,0.0,1280.0,298.0) m=(0.0,0.0,0.0,0.0) z=1.0
which seems fine given the window size but then the second ZoomToRect animates to here:
ImmutableViewportMetrics v=(65.51178,0.0,1345.5117,280.0) p=(0.0,0.0,1411.0236,280.0) c=(0.0,0.0,1280.0,254.0) m=(0.0,0.0,0.0,0.0) z=1.1023622
which is at y=0 and so doesn't keep the input field in view. This is certainly a bug in our code, but I don't know if it's a regression or not. It's certainly not nice of the panda boards to be sending different sets of events in some cases but I guess we should be handling it correctly regardless.
[1] https://tbpl.mozilla.org/php/getParsedLog.php?id=21108320&tree=Try&full=1
[2] https://tbpl.mozilla.org/php/getParsedLog.php?id=21108256&tree=Try&full=1
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 100•12 years ago
|
||
Based on the log (with even more debug output) at [1], I believe the fundamental problem in our code is that if the viewport size changes while we're in the middle of a bounce animation, the viewport size change gets clobbered. Our bounce animation always interpolates between mBounceStartMetrics and mBounceEndMetrics, which are calculated at the start of the bounce using the original viewport size.
In this case, we process the first viewport size change and our viewport is at v=(0.0,0.0,1280.0,280.0) when we get the first animatedZoomTo call. We do an animation to interpolate our way over to v=(0.0,18.0,1280.0,298.0). At some point during this animation the viewport size changes again, and the java code clobbers this size change because of the animation as described above. The JS code triggers a second animatedZoomTo, this time to v=(0.0,18.0,1280.0,254.0). This request arrives around 2/3 of the way through the first animation, at v=(0.0,12.667846,1280.0,292.66785).
This test is particular annoying in this case, because it so happens that the second viewport resize shrinks the viewport height from 280 down to 236, which means the page height shrinks from 298 down to 254 (since the page is 100% of screen height + height of input field). Since 254 is less than 280 (which is what the Java code still thinks the viewport height is), the java code then increases the zoom factor even though it shouldn't have needed to, and then that throws the rendering off.
I think the correct fix here is to somehow not drop the viewport size change that arrives in the middle of an animation, but I worry that may not fully fix the problems. Still, it's somewhere to start.
[1] https://tbpl.mozilla.org/php/getParsedLog.php?id=21210632&tree=Try&full=1#error0
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 102•12 years ago
|
||
I pushed a patch with my fix to https://tbpl.mozilla.org/?tree=Try&rev=f98f88263673 and it seems to be an improvement in that this "found 0 green pixels" error hasn't shown up yet. There's all sorts of other intermittents and restarts though. I'll monitor it some more and if it still looks good I'll clean up the patches and request review.
Assignee | ||
Comment 103•12 years ago
|
||
Landed bug 856039 which should fix this problem. Leaving open for now, but unassigning from myself since my work here is done. The test seems to be hidden now anyway (not sure if it was because of this bug or some other).
Assignee: bugmail.mozilla → nobody
Whiteboard: [disable-me 2013-04-01]
Reporter | ||
Comment 104•12 years ago
|
||
Agreed - the only orange I'm seeing since this landed is the Panda timeouts. Thanks Kats! Re-assigning it to you for posterity's sake since it was your work that fixed it.
Assignee: nobody → bugmail.mozilla
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 22
Updated•4 years ago
|
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•