Android 7.0 x86 geckoview-junit perma-fail: TEST-UNEXPECTED-TIMEOUT | runjunit.py | Timed out after 2400 seconds (after GeckoSessionTestRuleTest.noPendingCallbacks_withSpecificSession)
Categories
(GeckoView :: General, defect, P1)
Tracking
(geckoview64 wontfix, geckoview65 wontfix, geckoview66 fixed, firefox64 wontfix, firefox65 wontfix, firefox66 fixed)
People
(Reporter: gbrown, Assigned: mbrubeck)
References
(Blocks 1 open bug)
Details
(Keywords: intermittent-failure)
Attachments
(1 file)
+++ This bug was initially created as a clone of Bug #1506276 +++

After bug 1506276, geckoview-junit skips AccessibilityTest.testScroll so that it no longer hangs; now later geckoview-junit tests can be seen to be hanging, on Android 7.0 x86 only.

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=214411667&repo=mozilla-inbound&lineNumber=3439

[task 2018-11-28T17:03:16.631Z] 17:03:16 INFO - TEST-START | org.mozilla.geckoview.test.GeckoSessionTestRuleTest.noPendingCallbacks_withSpecificSession
[task 2018-11-28T17:42:54.854Z] 17:42:54 WARNING - TEST-UNEXPECTED-TIMEOUT | runjunit.py | Timed out after 2400 seconds

Note that the geckoview-junit tests on Android 7.0 x86 are running as tier 3 tasks currently (because they keep failing): That means they are hidden by default and not sheriffed. Failures are not starred by sheriffs, so they don't show up in the intermittent-failures reports -- but they are happening on every push! You can view the tasks on mozilla-central with:

https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&tier=1%2C2%2C3&searchStr=android%2C7.%2C0%2Cgeckoview-junit

:kats provided some analysis in https://bugzilla.mozilla.org/show_bug.cgi?id=1506276#c6:

> The noPendingCallback* tests look like they try to "open a session" which involves running the
> code at [1] on the main thread, which appears to be a blocking I/O call, maybe? So that will
> have the same problem in that will prevent vsync events from getting though and likely jam up
> stuff. The RDP connection code should probably be moved off the UI thread.
>
> [1] https://searchfox.org/mozilla-central/rev/c0b26c40769a1e5607a1ae8be37fe64df64fc55e/mobile/android/geckoview/src/androidTest/java/org/mozilla/geckoview/test/rule/GeckoSessionTestRule.java#1234
Reporter
Comment 1•6 years ago
Chris, can you find an owner for this?
Comment 2•6 years ago
(In reply to Geoff Brown [:gbrown] from comment #1)
> Chris, can you find an owner for this?

Sure. Setting this bug's priority to P1 so GV triage will see it.
Assignee
Updated•5 years ago
Assignee
Comment 3•5 years ago
The RDP code mentioned in comment 0 doesn't seem to be the problem. It only runs in `@WithDevToolsAPI` tests, so it doesn't run at all in the hanging test.

The `noPendingCallbacks_withSpecificSession` test does not hang when run on its own, only if it runs after certain other tests. By bisecting the test file, I was able to find that the smallest sequence of tests that hangs is:

- createClosedSession
- noPendingCallbacks_withSpecificSession
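For anyone repeating this kind of reduction, here is a minimal sketch of how an order-dependent hang could be narrowed down automatically. It uses a greedy one-test-at-a-time removal, which is not necessarily how the manual bisection above was done. Everything in it is an assumption made for illustration: the class `TestOrderReducer`, the `reduce` method, the placeholder test names `otherTestA` and `otherTestB`, and the stand-in predicate, which only pretends that a sequence hangs when `createClosedSession` runs before `noPendingCallbacks_withSpecificSession`. A real predicate would have to run the suite with a timeout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper, not part of the GeckoView test harness. */
final class TestOrderReducer {
    /**
     * Repeatedly drops one test at a time and keeps the shorter sequence
     * whenever it still "hangs" according to the supplied predicate.
     */
    static List<String> reduce(List<String> tests, Predicate<List<String>> hangs) {
        List<String> current = new ArrayList<>(tests);
        boolean shrunk = true;
        while (shrunk) {
            shrunk = false;
            for (int i = 0; i < current.size(); i++) {
                List<String> candidate = new ArrayList<>(current);
                candidate.remove(i); // remove by index
                if (hangs.test(candidate)) {
                    current = candidate;
                    shrunk = true;
                    break; // restart the scan on the shorter sequence
                }
            }
        }
        return current;
    }

    /** Stand-in for "run these tests and report whether they hang" (assumption). */
    static boolean standInHangs(List<String> sequence) {
        int closed = sequence.indexOf("createClosedSession");
        int pending = sequence.indexOf("noPendingCallbacks_withSpecificSession");
        return closed >= 0 && pending > closed;
    }

    public static void main(String[] args) {
        List<String> all = List.of(
                "otherTestA", // placeholder name
                "createClosedSession",
                "otherTestB", // placeholder name
                "noPendingCallbacks_withSpecificSession");
        System.out.println(reduce(all, TestOrderReducer::standInHangs));
    }
}
```

The greedy approach needs up to a quadratic number of runs, which matters when each failing run costs a full timeout, so a halving strategy may be preferable for long test files.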
Assignee
Comment 4•5 years ago
Running the tests in the debugger, the hanging test is looping forever in this `while` loop, with `index` set to 0 and `mCallRecords` empty:

> while (index >= mCallRecords.size()) {
>     UiThreadUtils.loopUntilIdle(mTimeoutMillis);
> }

At this point `methodCalls` is also empty, so it's just waiting for *any* call.

https://searchfox.org/mozilla-central/rev/c0b26c40769a1e5607a1ae8be37fe64df64fc55e/mobile/android/geckoview/src/androidTest/java/org/mozilla/geckoview/test/rule/GeckoSessionTestRule.java#1640
Assignee
Comment 5•5 years ago
(In reply to Matt Brubeck (:mbrubeck) from comment #3)
> The RDP code mentioned in comment 0 doesn't seem to be the problem. It only
> runs in `@WithDevToolsAPI` tests, so it doesn't run at all in the hanging
> test.

I may have been wrong about this. Even without `@WithDevToolsAPI`, GeckoSessionTestRule still runs the RDP code when creating the cached "default" session. The test no longer hangs after commenting out this code and some related code to prevent the RDP code from running:

https://searchfox.org/mozilla-central/rev/c0b26c40769a1e5607a1ae8be37fe64df64fc55e/mobile/android/geckoview/src/androidTest/java/org/mozilla/geckoview/test/rule/GeckoSessionTestRule.java#1191
Assignee
Comment 6•5 years ago
Moving the RDP connection to a separate thread is not trivial, because it already uses threading internally and passes messages using the Looper/Handler of the thread that creates it.
Updated•5 years ago
Reporter
Comment 7•5 years ago
Thanks for working on this, Matt! I'm eager to see x86 geckoview-junit running green, since running hidden-by-default is "wasting" test resources and the x86 version runs 20+ times faster than the armv7 version.
Are you stuck? Getting back to this soon?
Reporter
Comment 8•5 years ago
I tried skipping troublesome tests to get a green run:
but found I needed over a dozen tests skipped...and still I get an occasional timeout.
Assignee
Comment 9•5 years ago
The infinite loop (see comment 4) happens because `loopUntilIdle` keeps handling a constant stream of `android.view.Choreographer$FrameHandler` messages rather than timing out.

A solution would be to not reset the timeout runnable when this happens, so if it keeps happening for more than 1 second then it will be considered a timeout.
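To make the difference concrete, here is a minimal plain-Java simulation of the two policies. It is only a sketch under stated assumptions and does not use Android's `Looper`, `Handler`, or the real `UiThreadUtils`. The class `IdleLoopSketch`, the method `simulateWait`, the one-millisecond tick, the 16 ms frame interval, and the safety cap are all hypothetical and chosen for illustration.

```java
/** Plain-Java simulation; hypothetical names, no Android classes involved. */
final class IdleLoopSketch {
    /** Outcome of one simulated wait. */
    enum Result { TIMED_OUT, STILL_LOOPING }

    /**
     * Simulates a wait loop in 1 ms ticks.
     *
     * @param timeoutMs      timeout budget in milliseconds
     * @param frameEveryMs   a frame message arrives every N ms (0 means never)
     * @param resetOnMessage true: each handled message pushes the deadline back;
     *                       false: the deadline is fixed when the wait starts
     * @param maxTicks       safety cap so the simulation itself always terminates
     */
    static Result simulateWait(long timeoutMs, long frameEveryMs,
                               boolean resetOnMessage, long maxTicks) {
        long deadline = timeoutMs;
        for (long now = 0; now < maxTicks; now++) {
            if (now >= deadline) {
                return Result.TIMED_OUT;
            }
            boolean frameArrived = frameEveryMs > 0 && now % frameEveryMs == 0;
            if (frameArrived && resetOnMessage) {
                // Resetting here is what lets a steady stream of frame
                // messages postpone the timeout indefinitely.
                deadline = now + timeoutMs;
            }
        }
        return Result.STILL_LOOPING;
    }

    public static void main(String[] args) {
        // Assumed values: 1000 ms timeout, one frame message every 16 ms.
        System.out.println("resetting deadline: "
                + simulateWait(1000, 16, true, 100_000));  // STILL_LOOPING
        System.out.println("fixed deadline:     "
                + simulateWait(1000, 16, false, 100_000)); // TIMED_OUT
    }
}
```

With the resetting policy the deadline always stays ahead of the clock as long as frame messages arrive more often than the timeout, which matches the hang seen in comment 4. With the fixed deadline the wait ends after the timeout budget regardless of message traffic.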
Assignee
Comment 10•5 years ago
Assignee
Comment 11•5 years ago
Above try push was incorrect. Cancelled and pushed this one instead:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7b5d8a1966b1e199da4fcd320240384f3515bfc3
Assignee
Comment 12•5 years ago
Comment 13•5 years ago
Pushed by mbrubeck@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9eb9e58dc4bc
Fix infinite loop in tests waiting for pending callbacks. r=snorp
(In reply to Matt Brubeck (:mbrubeck) from comment #9)
> The infinite loop (see comment 4) happens because `loopUntilIdle` keeps handling a constant stream of `android.view.Choreographer$FrameHandler` messages rather than timing out.

Ah, ok. That is likely due to the change in bug 1432019. If the Compositor is paused (no surface), we shouldn't be getting those events. Most of our test sessions don't have a surface, so I think we may have a bug there. I'll file a followup.
Comment 16•5 years ago
bugherder
Reporter
Comment 17•5 years ago
Android 7.0 x86 geckoview-junit remains perma-fail (Timed out after 2400 seconds), typically hanging after an AccessibilityTest now. Is that a separate issue? New bug?
Comment 18•5 years ago
65=wontfix because we don't need to uplift this test fix.
Assignee
Comment 19•5 years ago
(In reply to Geoff Brown [:gbrown] from comment #17)
> Android 7.0 x86 geckoview-junit remains perma-fail (Timed out after 2400 seconds), typically hanging after an AccessibilityTest now. Is that a separate issue? New bug?

Let's file a separate bug, since each perma-failing test is going to require a separate fix.