Closed Bug 1510705 Opened 6 years ago Closed 5 years ago

Android 7.0 x86 geckoview-junit perma-fail: TEST-UNEXPECTED-TIMEOUT | runjunit.py | Timed out after 2400 seconds (after GeckoSessionTestRuleTest.noPendingCallbacks_withSpecificSession)

Categories

(GeckoView :: General, defect, P1)

x86
Android
defect

Tracking

(geckoview64 wontfix, geckoview65 wontfix, geckoview66 fixed, firefox64 wontfix, firefox65 wontfix, firefox66 fixed)

RESOLVED FIXED
mozilla66

People

(Reporter: gbrown, Assigned: mbrubeck)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

+++ This bug was initially created as a clone of Bug #1506276 +++

After bug 1506276, geckoview-junit skips AccessibilityTest.testScroll so that it no longer hangs; now later geckoview-junit tests can be seen to be hanging, on Android 7.0 x86 only.

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=214411667&repo=mozilla-inbound&lineNumber=3439

[task 2018-11-28T17:03:16.631Z] 17:03:16     INFO -  TEST-START | org.mozilla.geckoview.test.GeckoSessionTestRuleTest.noPendingCallbacks_withSpecificSession
[task 2018-11-28T17:42:54.854Z] 17:42:54  WARNING -  TEST-UNEXPECTED-TIMEOUT | runjunit.py | Timed out after 2400 seconds



Note that the geckoview-junit tests on Android 7.0 x86 currently run as tier 3 tasks because they keep failing. That means they are hidden by default and not sheriffed. Failures are not starred by sheriffs, so they don't show up in the intermittent-failures reports, but they are happening on every push!


You can view the tasks on mozilla-central with:

https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&tier=1%2C2%2C3&searchStr=android%2C7.%2C0%2Cgeckoview-junit


:kats provided some analysis in https://bugzilla.mozilla.org/show_bug.cgi?id=1506276#c6:

> The noPendingCallback* tests look like they try to "open a session" which involves running the
> code at [1] on the main thread, which appears to be a blocking I/O call, maybe? So that will
> have the same problem in that it will prevent vsync events from getting through and likely jam up
> stuff. The RDP connection code should probably be moved off the UI thread.
>
> [1] https://searchfox.org/mozilla-central/rev/c0b26c40769a1e5607a1ae8be37fe64df64fc55e/mobile/android/geckoview/src/androidTest/java/org/mozilla/geckoview/test/rule/GeckoSessionTestRule.java#1234
Chris, can you find an owner for this?
Flags: needinfo?(cpeterson)
Priority: P3 → --
Blocks: 1503299
(In reply to Geoff Brown [:gbrown] from comment #1)
> Chris, can you find an owner for this?

Sure. Setting this bug's priority to P1 so GV triage will see it.
Flags: needinfo?(cpeterson)
OS: Unspecified → Android
Priority: -- → P1
Hardware: Unspecified → x86
Assignee: nobody → mbrubeck
The RDP code mentioned in comment 0 doesn't seem to be the problem.  It only runs in `@WithDevToolsAPI` tests, so it doesn't run at all in the hanging test.

The `noPendingCallbacks_withSpecificSession` test does not hang when run on its own, only when it runs after certain other tests.

By bisecting the test file, I was able to find that the smallest sequence of tests that hangs is:
- createClosedSession
- noPendingCallbacks_withSpecificSession
Running the tests in the debugger, the hanging test is looping forever in this `while` loop, with `index` set to 0 and `mCallRecords` empty:

>            while (index >= mCallRecords.size()) {
>                UiThreadUtils.loopUntilIdle(mTimeoutMillis);
>            }

At this point `methodCalls` is also empty, so it's just waiting for *any* call.

https://searchfox.org/mozilla-central/rev/c0b26c40769a1e5607a1ae8be37fe64df64fc55e/mobile/android/geckoview/src/androidTest/java/org/mozilla/geckoview/test/rule/GeckoSessionTestRule.java#1640
(In reply to Matt Brubeck (:mbrubeck) from comment #3)
> The RDP code mentioned in comment 0 doesn't seem to be the problem.  It only
> runs in `@WithDevToolsAPI` tests, so it doesn't run at all in the hanging
> test.

I may have been wrong about this. Even without `@WithDevToolsAPI`, GeckoSessionTestRule still runs the RDP code when creating the cached "default" session.  The test no longer hangs after commenting out this code and some related code to prevent the RDP code from running:

https://searchfox.org/mozilla-central/rev/c0b26c40769a1e5607a1ae8be37fe64df64fc55e/mobile/android/geckoview/src/androidTest/java/org/mozilla/geckoview/test/rule/GeckoSessionTestRule.java#1191
Moving the RDP connection to a separate thread is not trivial, because it already uses threading internally and passes messages using the Looper/Handler of the thread that creates it.
Product: Firefox for Android → GeckoView

Thanks for working on this, Matt! I'm eager to see x86 geckoview-junit running green, since running hidden-by-default is "wasting" test resources and the x86 version runs 20+ times faster than the armv7 version.

Are you stuck? Getting back to this soon?

I tried skipping troublesome tests to get a green run:

https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=e729de2f73cd5e79116ea36a69ed79f23ab6ee4c

but found I needed to skip over a dozen tests, and I still get an occasional timeout.

The infinite loop (see comment 4) happens because loopUntilIdle keeps handling a constant stream of android.view.Choreographer$FrameHandler messages rather than timing out.

A solution would be to not reset the timeout runnable when this happens, so if it keeps happening for more than 1 second then it will be considered a timeout.
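
To illustrate the difference, here is a minimal, Android-free sketch of the idea. It is not the actual `UiThreadUtils` code: the class `IdleLoopSketch`, the simulated clock, the 16 ms cost per message, and the `maxMessages` safety cap are all assumptions made for illustration. A "frame" message that always reschedules itself stands in for the Choreographer stream. If the deadline is pushed out after every handled message, the loop never times out; if the deadline is fixed when the loop starts, it gives up after about 1 second.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simplified, Android-free model of a message loop with a timeout (all names hypothetical). */
final class IdleLoopSketch {
    enum Outcome { IDLE, TIMED_OUT, STILL_SPINNING }

    /** Simulated clock in milliseconds, so the run is deterministic. */
    private long nowMillis = 0;
    /** Pending messages, represented only by a label. */
    private final Deque<String> queue = new ArrayDeque<>();

    /** Handles one message; a "frame" message always schedules the next frame. */
    private void handle(String message) {
        nowMillis += 16; // assumption: roughly one 60 Hz frame per message
        if ("frame".equals(message)) {
            queue.add("frame");
        }
    }

    /**
     * Drains the queue until it is empty or the deadline passes.
     * maxMessages is a safety cap so the never-ending variant still returns here.
     */
    Outcome loopUntilIdle(long timeoutMillis, boolean resetOnMessage, int maxMessages) {
        long deadline = nowMillis + timeoutMillis;
        for (int handled = 0; !queue.isEmpty(); handled++) {
            if (nowMillis >= deadline) {
                return Outcome.TIMED_OUT;
            }
            if (handled >= maxMessages) {
                return Outcome.STILL_SPINNING;
            }
            handle(queue.poll());
            if (resetOnMessage) {
                // Problematic variant in this model: every handled message pushes the deadline out.
                deadline = nowMillis + timeoutMillis;
            }
        }
        return Outcome.IDLE;
    }

    /** Seeds one self-rescheduling frame message and runs the loop with a 1 second timeout. */
    static Outcome simulate(boolean resetOnMessage) {
        IdleLoopSketch loop = new IdleLoopSketch();
        loop.queue.add("frame");
        return loop.loopUntilIdle(1000, resetOnMessage, 10_000);
    }

    public static void main(String[] args) {
        System.out.println("reset on every message: " + simulate(true));
        System.out.println("fixed deadline:         " + simulate(false));
    }
}
```

In this model, `simulate(true)` reports `STILL_SPINNING` (it only stops because of the safety cap), while `simulate(false)` reports `TIMED_OUT` once the simulated clock passes the 1000 ms deadline.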

The above try push was incorrect. I cancelled it and pushed this one instead:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=7b5d8a1966b1e199da4fcd320240384f3515bfc3

Pushed by mbrubeck@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9eb9e58dc4bc
Fix infinite loop in tests waiting for pending callbacks. r=snorp

(In reply to Matt Brubeck (:mbrubeck) from comment #9)

> The infinite loop (see comment 4) happens because loopUntilIdle keeps handling a constant stream of android.view.Choreographer$FrameHandler messages rather than timing out.

Ah, ok. That is likely due to the change in bug 1432019. If the Compositor is paused (no surface), we shouldn't be getting those events. Most of our test sessions don't have a surface, so I think we may have a bug there. I'll file a followup.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66

Android 7.0 x86 geckoview-junit remains perma-fail (Timed out after 2400 seconds), typically hanging after an AccessibilityTest now. Is that a separate issue? New bug?

Flags: needinfo?(mbrubeck)

Setting 65 to wontfix because we don't need to uplift this test fix.

(In reply to Geoff Brown [:gbrown] from comment #17)

> Android 7.0 x86 geckoview-junit remains perma-fail (Timed out after 2400 seconds), typically hanging after an AccessibilityTest now. Is that a separate issue? New bug?

Let's file a separate bug, since each perma-failing test is going to require a separate fix.

Flags: needinfo?(mbrubeck)
Blocks: 1521195