Android 7.0 x86 geckoview-junit perma-fail: TEST-UNEXPECTED-TIMEOUT | runjunit.py | Timed out after 2400 seconds (after GeckoSessionTestRuleTest.noPendingCallbacks_withSpecificSession)

RESOLVED FIXED in Firefox 66

Status

defect
P1
normal
RESOLVED FIXED
6 months ago
4 months ago

People

(Reporter: gbrown, Assigned: mbrubeck)

Tracking

(Blocks 2 bugs, {intermittent-failure})

unspecified
mozilla66
x86
Android

Firefox Tracking Flags

(geckoview64 wontfix, geckoview65 wontfix, geckoview66 fixed, firefox64 wontfix, firefox65 wontfix, firefox66 fixed)

Details

Attachments

(1 attachment)

Reporter

Description

6 months ago
+++ This bug was initially created as a clone of Bug #1506276 +++

After bug 1506276, geckoview-junit skips AccessibilityTest.testScroll so that it no longer hangs; now later geckoview-junit tests can be seen to be hanging, on Android 7.0 x86 only.

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=214411667&repo=mozilla-inbound&lineNumber=3439

[task 2018-11-28T17:03:16.631Z] 17:03:16     INFO -  TEST-START | org.mozilla.geckoview.test.GeckoSessionTestRuleTest.noPendingCallbacks_withSpecificSession
[task 2018-11-28T17:42:54.854Z] 17:42:54  WARNING -  TEST-UNEXPECTED-TIMEOUT | runjunit.py | Timed out after 2400 seconds



Note that the geckoview-junit tests on Android 7.0 x86 are running as tier 3 tasks currently (because they keep failing): That means they are hidden by default and not sheriffed. Failures are not starred by sheriffs, so they don't show up in the intermittent-failures reports -- but they are happening on every push! 


You can view the tasks on mozilla-central with:

https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&tier=1%2C2%2C3&searchStr=android%2C7.%2C0%2Cgeckoview-junit


:kats provided some analysis in https://bugzilla.mozilla.org/show_bug.cgi?id=1506276#c6:

> The noPendingCallback* tests look like they try to "open a session" which involves running the
> code at [1] on the main thread, which appears to be a blocking I/O call, maybe? So that will
> have the same problem in that it will prevent vsync events from getting through and likely jam up
> stuff. The RDP connection code should probably be moved off the UI thread.
>
> [1] https://searchfox.org/mozilla-central/rev/c0b26c40769a1e5607a1ae8be37fe64df64fc55e/mobile/android/geckoview/src/androidTest/java/org/mozilla/geckoview/test/rule/GeckoSessionTestRule.java#1234
Reporter

Comment 1

6 months ago
Chris, can you find an owner for this?
Flags: needinfo?(cpeterson)
Priority: P3 → --
Reporter

Updated

6 months ago
Blocks: 1503299
(In reply to Geoff Brown [:gbrown] from comment #1)
> Chris, can you find an owner for this?

Sure. Setting this bug's priority to P1 so GV triage will see it.
Flags: needinfo?(cpeterson)
OS: Unspecified → Android
Priority: -- → P1
Hardware: Unspecified → x86
Assignee

Updated

5 months ago
Assignee: nobody → mbrubeck
Assignee

Comment 3

5 months ago
The RDP code mentioned in comment 0 doesn't seem to be the problem.  It only runs in `@WithDevToolsAPI` tests, so it doesn't run at all in the hanging test.

The `noPendingCallbacks_withSpecificSession` test does not hang when run on its own, only when it runs after certain other tests.

By bisecting the test file, I was able to find that the smallest sequence of tests that hangs is:
- createClosedSession
- noPendingCallbacks_withSpecificSession
Assignee

Comment 4

5 months ago
Running the tests in the debugger, the hanging test is looping forever in this `while` loop, with `index` set to 0 and `mCallRecords` empty:

>            while (index >= mCallRecords.size()) {
>                UiThreadUtils.loopUntilIdle(mTimeoutMillis);
>            }

At this point `methodCalls` is also empty, so it's just waiting for *any* call.

https://searchfox.org/mozilla-central/rev/c0b26c40769a1e5607a1ae8be37fe64df64fc55e/mobile/android/geckoview/src/androidTest/java/org/mozilla/geckoview/test/rule/GeckoSessionTestRule.java#1640
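To make the exit conditions of that loop explicit, here is a simplified, hypothetical model (not the actual GeckoSessionTestRule code; `WaitLoopSketch`, `IdleLooper` and `waitForRecord` are made-up names). The loop can only end in two ways: a new call record is added, or the idle-waiting helper throws. If the helper keeps returning normally while no record ever arrives, the loop spins forever.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeoutException;

public class WaitLoopSketch {
    /** Hypothetical stand-in for UiThreadUtils.loopUntilIdle(long). */
    interface IdleLooper {
        void loopUntilIdle(long timeoutMillis) throws TimeoutException;
    }

    /**
     * Same shape as the loop quoted above: pump the (simulated) UI queue until
     * the record at {@code index} exists. Apart from a record being added, the
     * only way out is an exception thrown by the looper.
     */
    static void waitForRecord(List<String> callRecords, int index,
                              IdleLooper looper, long timeoutMillis) throws TimeoutException {
        while (index >= callRecords.size()) {
            looper.loopUntilIdle(timeoutMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        // Case 1: a callback is recorded during the third pump, so the loop exits normally.
        List<String> records = new ArrayList<>();
        int[] pumps = {0};
        waitForRecord(records, 0, t -> {
            if (++pumps[0] == 3) {
                records.add("someCallback"); // hypothetical callback name
            }
        }, 1000);
        System.out.println("exited after " + pumps[0] + " pumps");

        // Case 2: no callback ever arrives, but the looper reports a timeout,
        // so the failure surfaces as an exception instead of a hang.
        try {
            waitForRecord(new ArrayList<>(), 0, t -> {
                throw new TimeoutException("timed out after " + t + " ms");
            }, 1000);
        } catch (TimeoutException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

In other words, the outer loop relies entirely on the helper to report a timeout; it has no deadline of its own.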
Assignee

Comment 5

5 months ago
(In reply to Matt Brubeck (:mbrubeck) from comment #3)
> The RDP code mentioned in comment 0 doesn't seem to be the problem.  It only
> runs in `@WithDevToolsAPI` tests, so it doesn't run at all in the hanging
> test.

I may have been wrong about this. Even without `@WithDevToolsAPI`, GeckoSessionTestRule still runs the RDP code when creating the cached "default" session.  The test no longer hangs after commenting out this code and some related code to prevent the RDP code from running:

https://searchfox.org/mozilla-central/rev/c0b26c40769a1e5607a1ae8be37fe64df64fc55e/mobile/android/geckoview/src/androidTest/java/org/mozilla/geckoview/test/rule/GeckoSessionTestRule.java#1191
Assignee

Comment 6

5 months ago
Moving the RDP connection to a separate thread is not trivial, because it already uses threading internally and passes messages using the Looper/Handler of the thread that creates it.

Updated

5 months ago
Product: Firefox for Android → GeckoView
Reporter

Comment 7

5 months ago

Thanks for working on this, Matt! I'm eager to see x86 geckoview-junit running green, since running it hidden by default is "wasting" test resources, and the x86 version runs 20+ times faster than the armv7 version.

Are you stuck? Getting back to this soon?

Reporter

Comment 8

5 months ago

I tried skipping troublesome tests to get a green run:

https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=e729de2f73cd5e79116ea36a69ed79f23ab6ee4c

but found that I needed to skip over a dozen tests, and even then I still get an occasional timeout.

Assignee

Comment 9

4 months ago

The infinite loop (see comment 4) happens because loopUntilIdle keeps handling a constant stream of android.view.Choreographer$FrameHandler messages rather than timing out.

A solution would be to not reset the timeout runnable when this happens, so that if the stream of messages continues for more than 1 second, it is treated as a timeout.
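A minimal sketch of the idea above, under stated assumptions: time is simulated, and a single message re-posts itself every 16 ms as a stand-in for the constant stream of `Choreographer$FrameHandler` messages (the 16 ms interval, `LoopUntilIdleSketch` and its method are hypothetical; this is not the actual patch or the real UiThreadUtils code). When the deadline is re-armed after every handled message, the helper never times out; with a fixed deadline, it reports a timeout shortly after 1 second.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class LoopUntilIdleSketch {
    /** Assumed frame interval (~60 Hz); purely illustrative. */
    static final long FRAME_INTERVAL_MS = 16;

    /** Whether the call timed out, and the simulated time at which it returned. */
    static final class Result {
        final boolean timedOut;
        final long elapsedMs;

        Result(boolean timedOut, long elapsedMs) {
            this.timedOut = timedOut;
            this.elapsedMs = elapsedMs;
        }
    }

    /**
     * Handles queued "frame" messages; each one schedules another, so the queue
     * never drains in this demo.
     *
     * @param resetTimeoutOnMessage true models the hanging behaviour (deadline
     *                              re-armed after every message); false models a fixed deadline
     * @param maxSimulatedMs        safety bound so the resetting mode terminates here
     */
    static Result loopUntilIdle(long timeoutMs, boolean resetTimeoutOnMessage, long maxSimulatedMs) {
        Queue<String> queue = new ArrayDeque<>();
        queue.add("frame");
        long now = 0;
        long deadline = now + timeoutMs;
        while (!queue.isEmpty()) {
            if (now >= deadline) {
                return new Result(true, now);
            }
            if (now >= maxSimulatedMs) {
                // Demo bound reached; without it this mode would keep spinning.
                return new Result(false, now);
            }
            queue.poll();              // handle one frame message
            now += FRAME_INTERVAL_MS;  // the next frame arrives one interval later
            queue.add("frame");        // the frame handler schedules another frame
            if (resetTimeoutOnMessage) {
                deadline = now + timeoutMs; // handling a message re-arms the timeout
            }
        }
        // Reached only if no message is pending (never the case in this demo).
        return new Result(false, now);
    }

    public static void main(String[] args) {
        Result reset = loopUntilIdle(1000, true, 60_000);
        Result fixed = loopUntilIdle(1000, false, 60_000);
        // prints "reset on message: timedOut=false after 60000 ms"
        System.out.println("reset on message: timedOut=" + reset.timedOut + " after " + reset.elapsedMs + " ms");
        // prints "fixed deadline: timedOut=true after 1008 ms"
        System.out.println("fixed deadline: timedOut=" + fixed.timedOut + " after " + fixed.elapsedMs + " ms");
    }
}
```

The key design point is that the deadline is computed once, when waiting starts, so a busy but never-idle queue still ends in a timeout instead of an unbounded wait.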

Assignee

Comment 11

4 months ago

The above try push was incorrect. I cancelled it and pushed this one instead:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=7b5d8a1966b1e199da4fcd320240384f3515bfc3

Comment 13

4 months ago
Pushed by mbrubeck@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9eb9e58dc4bc
Fix infinite loop in tests waiting for pending callbacks. r=snorp

(In reply to Matt Brubeck (:mbrubeck) from comment #9)
> The infinite loop (see comment 4) happens because loopUntilIdle keeps handling a constant stream of android.view.Choreographer$FrameHandler messages rather than timing out.

Ah, ok. That is likely due to the change in bug 1432019. If the Compositor is paused (no surface), we shouldn't be getting those events. Most of our test sessions don't have a surface, so I think we may have a bug there. I'll file a followup.

Comment 16

4 months ago
bugherder
Status: NEW → RESOLVED
Last Resolved: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
Reporter

Comment 17

4 months ago

Android 7.0 x86 geckoview-junit remains perma-fail (Timed out after 2400 seconds), typically hanging after an AccessibilityTest now. Is that a separate issue? New bug?

Flags: needinfo?(mbrubeck)

65=wontfix because we don't need to uplift this test fix.

Assignee

Comment 19

4 months ago

(In reply to Geoff Brown [:gbrown] from comment #17)
> Android 7.0 x86 geckoview-junit remains perma-fail (Timed out after 2400 seconds), typically hanging after an AccessibilityTest now. Is that a separate issue? New bug?

Let's file a separate bug, since each perma-failing test is going to require a separate fix.

Flags: needinfo?(mbrubeck)
Reporter

Updated

4 months ago
Blocks: 1521195