Closed Bug 1607878 Opened 5 years ago Closed 5 years ago

Intermittent gv-junit runjunit.py | The previous test failed because of an error in the test harness | org.mozilla.geckoview.test.rule#TestHarnessException: org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutException: Timed out after 30000ms

Categories

(GeckoView :: General, defect, P5)

Unspecified
All
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: intermittent-bug-filer, Assigned: bugzilla)

References

Details

(Keywords: intermittent-failure, Whiteboard: [geckoview:m80])

Filed by: aciure [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=284063816&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/ccSYunWJRUKXT7KJzFnwRg/runs/0/artifacts/public/logs/live_backing.log


[task 2020-01-08T18:55:53.345Z] 18:55:53 INFO - TEST-START | org.mozilla.geckoview.test.ExtensionActionTest.setBadgeBackgroundColor[#pageAction]
[task 2020-01-08T18:56:23.392Z] 18:56:23 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: numtests=622
[task 2020-01-08T18:56:23.392Z] 18:56:23 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: stream=
[task 2020-01-08T18:56:23.392Z] 18:56:23 INFO - org.mozilla.geckoview.test | Error in setBadgeBackgroundColor#pageAction:
[task 2020-01-08T18:56:23.393Z] 18:56:23 INFO - org.mozilla.geckoview.test | org.mozilla.geckoview.test.rule.TestHarnessException: org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutException: Timed out after 30000ms
[task 2020-01-08T18:56:23.396Z] 18:56:23 INFO - org.mozilla.geckoview.test | at org.mozilla.geckoview.test.rule.GeckoSessionTestRule$2.initTest(GeckoSessionTestRule.java:1285)
[task 2020-01-08T18:56:23.396Z] 18:56:23 INFO - org.mozilla.geckoview.test | at org.mozilla.geckoview.test.rule.GeckoSessionTestRule$2.lambda$evaluate$0$GeckoSessionTestRule$2(GeckoSessionTestRule.java:1297)
[task 2020-01-08T18:56:23.396Z] 18:56:23 INFO - org.mozilla.geckoview.test | at org.mozilla.geckoview.test.rule.-$$Lambda$GeckoSessionTestRule$2$sIbRNaZJgAu-QrUVWSGD8JbPSWM.run(lambda)
[task 2020-01-08T18:56:23.396Z] 18:56:23 INFO - org.mozilla.geckoview.test | at android.app.Instrumentation$SyncRunnable.run(Instrumentation.java:1950)
[task 2020-01-08T18:56:23.396Z] 18:56:23 INFO - org.mozilla.geckoview.test | at android.os.Handler.handleCallback(Handler.java:751)
[task 2020-01-08T18:56:23.396Z] 18:56:23 INFO - org.mozilla.geckoview.test | at android.os.Handler.dispatchMessage(Handler.java:95)
[task 2020-01-08T18:56:23.396Z] 18:56:23 INFO - org.mozilla.geckoview.test | at android.os.Looper.loop(Looper.java:154)
[task 2020-01-08T18:56:23.396Z] 18:56:23 INFO - org.mozilla.geckoview.test | at android.app.ActivityThread.main(ActivityThread.java:6077)
[task 2020-01-08T18:56:23.396Z] 18:56:23 INFO - org.mozilla.geckoview.test | at java.lang.reflect.Method.invoke(Native Method)
[task 2020-01-08T18:56:23.396Z] 18:56:23 INFO - org.mozilla.geckoview.test | at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:866)
[task 2020-01-08T18:56:23.396Z] 18:56:23 INFO - org.mozilla.geckoview.test | at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:756)
[task 2020-01-08T18:56:23.396Z] 18:56:23 INFO - org.mozilla.geckoview.test | Caused by: org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutException: Timed out after 30000ms
[task 2020-01-08T18:56:23.396Z] 18:56:23 INFO - org.mozilla.geckoview.test | at org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutRunnable.run(UiThreadUtils.java:58)
[task 2020-01-08T18:56:23.396Z] 18:56:23 INFO - org.mozilla.geckoview.test | at android.os.Handler.handleCallback(Handler.java:751)
[task 2020-01-08T18:56:23.396Z] 18:56:23 INFO - org.mozilla.geckoview.test | at android.os.Handler.dispatchMessage(Handler.java:95)
[task 2020-01-08T18:56:23.396Z] 18:56:23 INFO - org.mozilla.geckoview.test | at org.mozilla.geckoview.test.util.UiThreadUtils.waitForCondition(UiThreadUtils.java:161)
[task 2020-01-08T18:56:23.396Z] 18:56:23 INFO - org.mozilla.geckoview.test | at org.mozilla.geckoview.test.rule.GeckoSessionTestRule.openSession(GeckoSessionTestRule.java:1142)
[task 2020-01-08T18:56:23.396Z] 18:56:23 INFO - org.mozilla.geckoview.test | at org.mozilla.geckoview.test.rule.GeckoSessionTestRule.prepareStatement(GeckoSessionTestRule.java:1092)
[task 2020-01-08T18:56:23.396Z] 18:56:23 INFO - org.mozilla.geckoview.test | at org.mozilla.geckoview.test.rule.GeckoSessionTestRule$2.initTest(GeckoSessionTestRule.java:1281)
[task 2020-01-08T18:56:23.396Z] 18:56:23 INFO - org.mozilla.geckoview.test | ... 10 more
[task 2020-01-08T18:56:23.396Z] 18:56:23 INFO - org.mozilla.geckoview.test |
[task 2020-01-08T18:56:23.396Z] 18:56:23 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: id=AndroidJUnitRunner
[task 2020-01-08T18:56:23.396Z] 18:56:23 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: test=setBadgeBackgroundColor[#pageAction]
[task 2020-01-08T18:56:23.396Z] 18:56:23 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: class=org.mozilla.geckoview.test.ExtensionActionTest
[task 2020-01-08T18:56:23.396Z] 18:56:23 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: stack=org.mozilla.geckoview.test.rule.TestHarnessException: org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutException: Timed out after 30000ms
[task 2020-01-08T18:56:23.396Z] 18:56:23 INFO - SUITE-END | took 73s
[task 2020-01-08T18:56:23.396Z] 18:56:23 WARNING - TEST-UNEXPECTED-FAIL | runjunit.py | The previous test failed because of an error in the test harness | org.mozilla.geckoview.test.rule.TestHarnessException: org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutException: Timed out after 30000ms
[task 2020-01-08T18:56:23.396Z] 18:56:23 INFO - Stopping web server
[task 2020-01-08T18:56:23.397Z] 18:56:23 INFO - Stopping web socket server
[task 2020-01-08T18:56:23.417Z] 18:56:23 INFO - Stopping ssltunnel
[task 2020-01-08T18:56:26.139Z] 18:56:26 ERROR - Return code: 255
[task 2020-01-08T18:56:26.139Z] 18:56:26 ERROR - No tests run or test summary not found
[task 2020-01-08T18:56:26.140Z] 18:56:26 INFO - TinderboxPrint: geckoview-junit<br/><em class="testfail">T-FAIL</em>
[task 2020-01-08T18:56:26.140Z] 18:56:26 INFO - ##### geckoview-junit log ends
[task 2020-01-08T18:56:26.140Z] 18:56:26 WARNING - # TBPL WARNING #
[task 2020-01-08T18:56:26.140Z] 18:56:26 WARNING - setting return code to 1
[task 2020-01-08T18:56:26.140Z] 18:56:26 WARNING - The geckoview-junit suite: geckoview-junit ran with return status: WARNING
[task 2020-01-08T18:56:26.140Z] 18:56:26 INFO - Running post-action listener: _package_coverage_data
[task 2020-01-08T18:56:26.140Z] 18:56:26 INFO - Running post-action listener: _resource_record_post_action
[task 2020-01-08T18:56:26.140Z] 18:56:26 INFO - Running post-action listener: process_java_coverage_data
[task 2020-01-08T18:56:26.140Z] 18:56:26 INFO - Running post-action listener: stop_device

Agi:, this seems to start after Bug 1581345 got landed. Can you please take a look at this?

Regressed by: 1581345

That's on purpose. These test failures happen before the test even run, so the test code is not at fault, but the intermittent was billed against the test anyway. This causes a lot of one-off intermittents that are not actually intermittent. Now we have a new intermittent category for it.

No longer regressed by: 1581345

(In reply to :Agi | ⏰ PST | he/him from comment #2)

That's on purpose. These test failures happen before the test even run, so the test code is not at fault, but the intermittent was billed against the test anyway. This causes a lot of one-off intermittents that are not actually intermittent. Now we have a new intermittent category for it.

Hmm, I don't think that's working correctly. The log here shows that we start org.mozilla.geckoview.test.ExtensionActionTest.setBadgeBackgroundColor[#pageAction], but never finish it. Clearly in this case the timeout should be charged to that test.

Yes that's expected, after junit thinks the test is starting we run a bunch of stuff (create a session, wait for initial about:blank, etc) all that stuff doesn't depend on the test.

You can see the code here: https://searchfox.org/mozilla-central/rev/c52d5f8025b5c9b2b4487159419ac9012762c40c/mobile/android/geckoview/src/androidTest/java/org/mozilla/geckoview/test/rule/GeckoSessionTestRule.java#1258-1287 anything in that method is going to execute as part of the test but it's not really the test's fault if we thrown an exception there.

Another hint that this is not the test's fault is that the exception stacktrace originates in initTest which runs before the test.

There are 61 failures in the last 7 days. All of them in "android-em-7-0-x86_64 , debug".

Flags: needinfo?(etoop)
Whiteboard: [stockwell needswork]
Flags: needinfo?(etoop)

Looks like the spike that started from the 28th is caused by Bug 1618311. Range with retriggers.
Last 7 days view: https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2020-04-24&endday=2020-05-01&tree=trunk&bug=1607878
Ed, could you please have a look over what's going on here?
Recent failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=300411286&repo=autoland

[task 2020-05-01T17:59:30.401Z] 17:59:30     INFO -  TEST-START | org.mozilla.geckoview.test.SessionLifecycleTest.collectOpen
[task 2020-05-01T18:00:00.349Z] 18:00:00     INFO -  org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: numtests=716
[task 2020-05-01T18:00:00.352Z] 18:00:00     INFO -  org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: stream=
[task 2020-05-01T18:00:00.352Z] 18:00:00     INFO -  org.mozilla.geckoview.test | Error in collectOpen(org.mozilla.geckoview.test.SessionLifecycleTest):
[task 2020-05-01T18:00:00.352Z] 18:00:00     INFO -  org.mozilla.geckoview.test | org.mozilla.geckoview.test.rule.TestHarnessException: org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutException: Timed out after 30000ms
[task 2020-05-01T18:00:00.352Z] 18:00:00     INFO -  org.mozilla.geckoview.test | 	at org.mozilla.geckoview.test.rule.GeckoSessionTestRule$2.initTest(GeckoSessionTestRule.java:1271)
[task 2020-05-01T18:00:00.352Z] 18:00:00     INFO -  org.mozilla.geckoview.test | 	at org.mozilla.geckoview.test.rule.GeckoSessionTestRule$2.lambda$evaluate$0$GeckoSessionTestRule$2(GeckoSessionTestRule.java:1283)
[task 2020-05-01T18:00:00.352Z] 18:00:00     INFO -  org.mozilla.geckoview.test | 	at org.mozilla.geckoview.test.rule.-$$Lambda$GeckoSessionTestRule$2$sIbRNaZJgAu-QrUVWSGD8JbPSWM.run(lambda)
[task 2020-05-01T18:00:00.353Z] 18:00:00     INFO -  org.mozilla.geckoview.test | 	at android.app.Instrumentation$SyncRunnable.run(Instrumentation.java:1950)
[task 2020-05-01T18:00:00.353Z] 18:00:00     INFO -  org.mozilla.geckoview.test | 	at android.os.Handler.handleCallback(Handler.java:751)
[task 2020-05-01T18:00:00.353Z] 18:00:00     INFO -  org.mozilla.geckoview.test | 	at android.os.Handler.dispatchMessage(Handler.java:95)
[task 2020-05-01T18:00:00.353Z] 18:00:00     INFO -  org.mozilla.geckoview.test | 	at android.os.Looper.loop(Looper.java:154)
[task 2020-05-01T18:00:00.353Z] 18:00:00     INFO -  org.mozilla.geckoview.test | 	at android.app.ActivityThread.main(ActivityThread.java:6077)
[task 2020-05-01T18:00:00.353Z] 18:00:00     INFO -  org.mozilla.geckoview.test | 	at java.lang.reflect.Method.invoke(Native Method)
[task 2020-05-01T18:00:00.353Z] 18:00:00     INFO -  org.mozilla.geckoview.test | 	at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:866)
[task 2020-05-01T18:00:00.353Z] 18:00:00     INFO -  org.mozilla.geckoview.test | 	at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:756)
[task 2020-05-01T18:00:00.353Z] 18:00:00     INFO -  org.mozilla.geckoview.test | Caused by: org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutException: Timed out after 30000ms
[task 2020-05-01T18:00:00.353Z] 18:00:00     INFO -  org.mozilla.geckoview.test | 	at org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutRunnable.run(UiThreadUtils.java:52)
[task 2020-05-01T18:00:00.353Z] 18:00:00     INFO -  org.mozilla.geckoview.test | 	at android.os.Handler.handleCallback(Handler.java:751)
[task 2020-05-01T18:00:00.353Z] 18:00:00     INFO -  org.mozilla.geckoview.test | 	at android.os.Handler.dispatchMessage(Handler.java:95)
[task 2020-05-01T18:00:00.353Z] 18:00:00     INFO -  org.mozilla.geckoview.test | 	at org.mozilla.geckoview.test.util.UiThreadUtils.waitForCondition(UiThreadUtils.java:155)
[task 2020-05-01T18:00:00.353Z] 18:00:00     INFO -  org.mozilla.geckoview.test | 	at org.mozilla.geckoview.test.rule.GeckoSessionTestRule.openSession(GeckoSessionTestRule.java:1142)
[task 2020-05-01T18:00:00.353Z] 18:00:00     INFO -  org.mozilla.geckoview.test | 	at org.mozilla.geckoview.test.rule.GeckoSessionTestRule.prepareStatement(GeckoSessionTestRule.java:1090)
[task 2020-05-01T18:00:00.353Z] 18:00:00     INFO -  org.mozilla.geckoview.test | 	at org.mozilla.geckoview.test.rule.GeckoSessionTestRule$2.initTest(GeckoSessionTestRule.java:1267)
[task 2020-05-01T18:00:00.353Z] 18:00:00     INFO -  org.mozilla.geckoview.test | 	... 10 more
[task 2020-05-01T18:00:00.353Z] 18:00:00     INFO -  org.mozilla.geckoview.test |
[task 2020-05-01T18:00:00.353Z] 18:00:00     INFO -  org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: id=AndroidJUnitRunner
[task 2020-05-01T18:00:00.353Z] 18:00:00     INFO -  org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: test=collectOpen
[task 2020-05-01T18:00:00.353Z] 18:00:00     INFO -  org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: class=org.mozilla.geckoview.test.SessionLifecycleTest
[task 2020-05-01T18:00:00.353Z] 18:00:00     INFO -  org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: stack=org.mozilla.geckoview.test.rule.TestHarnessException: org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutException: Timed out after 30000ms
[task 2020-05-01T18:00:00.353Z] 18:00:00     INFO -  SUITE-END | took 663s
[task 2020-05-01T18:00:00.353Z] 18:00:00  WARNING -  TEST-UNEXPECTED-FAIL | runjunit.py | The previous test failed because of an error in the test harness | org.mozilla.geckoview.test.rule.TestHarnessException: org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutException: Timed out after 30000ms
[task 2020-05-01T18:00:00.353Z] 18:00:00     INFO -  Stopping web server
[task 2020-05-01T18:00:00.353Z] 18:00:00     INFO -  Stopping web socket server
[task 2020-05-01T18:00:00.373Z] 18:00:00     INFO -  Stopping ssltunnel
[task 2020-05-01T18:00:07.763Z] 18:00:07    ERROR - Return code: 255
[task 2020-05-01T18:00:07.763Z] 18:00:07    ERROR - No tests run or test summary not found
[task 2020-05-01T18:00:07.763Z] 18:00:07     INFO - TinderboxPrint: geckoview-junit<br/><em class="testfail">T-FAIL</em>
[task 2020-05-01T18:00:07.763Z] 18:00:07     INFO - ##### geckoview-junit log ends
[task 2020-05-01T18:00:07.763Z] 18:00:07  WARNING - # TBPL WARNING #
[task 2020-05-01T18:00:07.763Z] 18:00:07  WARNING - setting return code to 1
Flags: needinfo?(edilee)
Regressed by: 1618311

(In reply to Cosmin Sabou [:CosminS] from comment #16)

Looks like the spike that started from the 28th is caused by Bug 1618311. Range with retriggers.

Seems quite unlikely bug 1618311 regressed this. According to the range with retriggers, the previous commit is also failing the test frequently and one more previous is bug 1622060 that seems to have made gv-junit show up as gv-junit-e10s-multi and the test was also failing there too.

aklotz, would running the tests with dom.ipc.processCount=3 cause the intermittent timeouts to increase?

https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=junit%2Copt&tochange=5690715ef70efb9e6c835e0e0ae63371af1475f7&fromchange=92172b9cef6b88934c7d54beb7e3d311a69df127

Flags: needinfo?(edilee) → needinfo?(aklotz)
No longer regressed by: 1618311
See Also: → 1622060

Yes, it would; this is not a regression, but rather a reflection of the fact that we introduced a new test suite with a new configuration.

Flags: needinfo?(aklotz)

I'm looking at this failure: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=300412500&repo=autoland&lineNumber=11244 and it looks like our content process gets killed?

05-01 19:18:29.835  2806  2825 D GeckoViewProgress: observe: topic=oop-frameloader-crashed
05-01 19:18:29.835  2806  2825 D GeckoViewContent: observe: oop-frameloader-crashed
05-01 19:18:29.835  2806  2825 D GeckoViewProgress: observe: topic=oop-frameloader-crashed
05-01 19:18:29.835  2806  2825 D GeckoViewContent: observe: oop-frameloader-crashed
05-01 19:18:29.835  2806  2825 D GeckoViewProgress: observe: topic=oop-frameloader-crashed
05-01 19:18:29.836  2806  2825 D GeckoViewContent: observe: oop-frameloader-crashed
05-01 19:18:29.836  2806  2825 D GeckoViewProgress: observe: topic=oop-frameloader-crashed
05-01 19:18:29.836  2806  2825 D GeckoViewContent: observe: oop-frameloader-crashed

full log here: https://firefoxci.taskcluster-artifacts.net/B0WI9RMHSMS2FTu_FVPuCg/0/public/test_info//logcat-emulator-5554.log

I wonder if we can add some loginc in GeckoSessionTestRule to do something for this case.

Whiteboard: [stockwell unknown] → [stockwell unknown][geckoview:m78]

Aaron, Agi, is there anything that can be done here to reduce the failure rate?
It has 121 total failures in the last week: https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2020-05-02&endday=2020-05-09&tree=trunk&bug=1607878, 171 total failures since it was filed and it's on the disable-recommend bugs list.

Flags: needinfo?(aklotz)
Flags: needinfo?(agi)
Whiteboard: [geckoview:m78][stockwell disable-recommended] → [geckoview:m78][stockwell needswork:owner]

A couple of issues here:

  • This bug was filed for the gv-junit test suite. Now we've brought up the new gv-junit-e10s-multi test suite, where the same failures are occurring at a higher rate. I think this one should be kept for gv-junit and the gv-junit-e10s-multi failures should be tagged against a new, distinct bug from this one. I filed bug 1636720 for the latter.

  • gv-junit-e10-multi is a new test suite. I'm working to get that cleaned up, but I feel like we're being too hasty in trying to disable tests for a brand new suite. I don't see how I am supposed to be able to stabilize it in CI if I'm not allowed to run it in CI. Running gv-junit-e10s-multi as tier-2 is also not a very good option for us, as we'd like anything that permaoranges the suite the be immediately backed out.

Flags: needinfo?(aklotz)
Flags: needinfo?(agi)
Summary: Intermittent runjunit.py | The previous test failed because of an error in the test harness | org.mozilla.geckoview.test.rule.TestHarnessException: org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutException: Timed out after 30000ms → Intermittent gv-junit runjunit.py | The previous test failed because of an error in the test harness | org.mozilla.geckoview.test.rule.TestHarnessException: org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutException: Timed out after 30000ms

Removing this bug from the 78 sprint in favour of the e10s-multi variant.

Whiteboard: [geckoview:m78][stockwell needswork:owner] → [stockwell needswork:owner]

Failures here are actually bug 1636720.

See Also: → 1636720
Whiteboard: [stockwell disable-recommended]
Whiteboard: [stockwell disable-recommended]

No more failures since bug 1647470 landed. I think we can call this resolved!

Status: NEW → RESOLVED
Closed: 5 years ago
Depends on: 1647470
Resolution: --- → FIXED
Whiteboard: [geckoview:m80]
Assignee: nobody → aklotz

6c6f5da9-bf33-4082-a19d-7d99e8083d97 Mass rename intermittents due to Bug 1723034.

Summary: Intermittent gv-junit runjunit.py | The previous test failed because of an error in the test harness | org.mozilla.geckoview.test.rule.TestHarnessException: org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutException: Timed out after 30000ms → Intermittent gv-junit runjunit.py | The previous test failed because of an error in the test harness | org.mozilla.geckoview.test.rule#TestHarnessException: org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutException: Timed out after 30000ms
You need to log in before you can comment on or make changes to this bug.