Intermittent TEST-UNEXPECTED-TIMEOUT | runjunit.py | Timed out after 2400 seconds

RESOLVED FIXED in Firefox 67

Status

defect
P5
normal
RESOLVED FIXED
3 months ago
2 months ago

People

(Reporter: intermittent-bug-filer, Assigned: gbrown)

Tracking

({intermittent-failure, regression})

unspecified
mozilla67
Unspecified
All
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr60 unaffected, firefox65 unaffected, firefox66 unaffected, firefox67 fixed)

Details

Attachments

(1 attachment)

Reporter

Description

3 months ago
treeherder

#[markdown(off)]
Filed by: apavel [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=231709799&repo=autoland

https://queue.taskcluster.net/v1/task/XzFNwaJITVqITV_3M6IexA/runs/0/artifacts/public/logs/live_backing.log

[task 2019-03-04T20:09:32.664Z] 20:09:32 INFO - TEST-START | org.mozilla.geckoview.test.TextInputDelegateTest.inputConnection[#input]
[task 2019-03-04T20:17:17.665Z] 20:17:17 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: id=AndroidJUnitRunner
[task 2019-03-04T20:17:17.665Z] 20:17:17 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: current=62
[task 2019-03-04T20:17:17.665Z] 20:17:17 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: class=org.mozilla.geckoview.test.TextInputDelegateTest
[task 2019-03-04T20:17:17.666Z] 20:17:17 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: stream=.
[task 2019-03-04T20:17:17.666Z] 20:17:17 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: numtests=66
[task 2019-03-04T20:17:17.666Z] 20:17:17 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: test=inputConnection[#input]
[task 2019-03-04T20:17:17.666Z] 20:17:17 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS_CODE: 0
[task 2019-03-04T20:17:17.666Z] 20:17:17 INFO - TEST-PASS | org.mozilla.geckoview.test.TextInputDelegateTest.inputConnection[#input] | took 465003ms
[task 2019-03-04T20:17:17.766Z] 20:17:17 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: id=AndroidJUnitRunner
[task 2019-03-04T20:17:17.767Z] 20:17:17 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: current=63
[task 2019-03-04T20:17:17.767Z] 20:17:17 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: class=org.mozilla.geckoview.test.TextInputDelegateTest
[task 2019-03-04T20:17:17.767Z] 20:17:17 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: stream=
[task 2019-03-04T20:17:17.767Z] 20:17:17 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: numtests=66
[task 2019-03-04T20:17:17.768Z] 20:17:17 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: test=inputConnection[#textarea]
[task 2019-03-04T20:17:17.769Z] 20:17:17 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS_CODE: 1
[task 2019-03-04T20:17:17.769Z] 20:17:17 INFO - TEST-START | org.mozilla.geckoview.test.TextInputDelegateTest.inputConnection[#textarea]
[task 2019-03-04T20:20:51.603Z] 20:20:51 WARNING - TEST-UNEXPECTED-TIMEOUT | runjunit.py | Timed out after 2400 seconds
[task 2019-03-04T20:20:51.603Z] 20:20:51 INFO - Passed: 56
[task 2019-03-04T20:20:51.604Z] 20:20:51 INFO - Failed: 0
[task 2019-03-04T20:20:51.604Z] 20:20:51 INFO - Todo: 6
[task 2019-03-04T20:20:51.604Z] 20:20:51 INFO - SUITE-END | took 2400s
[task 2019-03-04T20:20:53.024Z] 20:20:53 INFO - Stopping web server
[task 2019-03-04T20:20:53.032Z] 20:20:53 INFO - Stopping web socket server
[task 2019-03-04T20:20:53.051Z] 20:20:53 INFO - Stopping ssltunnel
[task 2019-03-04T20:20:56.876Z] 20:20:56 INFO - Return code: 0
[task 2019-03-04T20:20:56.877Z] 20:20:56 INFO - TinderboxPrint: geckoview-junit<br/>56/0
[task 2019-03-04T20:20:56.877Z] 20:20:56 INFO - ##### geckoview-junit log ends
[task 2019-03-04T20:20:56.877Z] 20:20:56 WARNING - # TBPL WARNING #
[task 2019-03-04T20:20:56.877Z] 20:20:56 WARNING - setting return code to 1
[task 2019-03-04T20:20:56.877Z] 20:20:56 WARNING - The geckoview-junit suite: geckoview-junit ran with return status: WARNING

The code change that landed in bug 1510276 was for the Windows installer. So, there is no way this code runs on Android or for that matter junit.

Flags: needinfo?(robert.strong.bugs)

This is from 1482231, Geoff can you please take a look?

There are 23 total failures in the last 7 days on android-em-4-3-armv7-api16 opt and debug.

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=232414336&repo=autoland&lineNumber=1912

[task 2019-03-07T13:36:02.181Z] 13:36:02 INFO - TEST-START | org.mozilla.geckoview.test.GeckoSessionTestRuleTest.waitForPageStops_withSpecificSessions
[task 2019-03-07T14:12:59.299Z] 14:12:59 WARNING - TEST-UNEXPECTED-TIMEOUT | runjunit.py | Timed out after 2400 seconds
[task 2019-03-07T14:12:59.299Z] 14:12:59 INFO - Passed: 33
[task 2019-03-07T14:12:59.300Z] 14:12:59 INFO - Failed: 0
[task 2019-03-07T14:12:59.300Z] 14:12:59 INFO - Todo: 1
[task 2019-03-07T14:12:59.300Z] 14:12:59 INFO - SUITE-END | took 2400s
[task 2019-03-07T14:17:59.382Z] 14:17:59 INFO - Traceback (most recent call last):
[task 2019-03-07T14:17:59.382Z] 14:17:59 INFO - File "/builds/worker/workspace/build/tests/mochitest/runjunit.py", line 444, in run_test_harness
[task 2019-03-07T14:17:59.384Z] 14:17:59 INFO - result = runner.run_tests(options.test_filters)
[task 2019-03-07T14:17:59.384Z] 14:17:59 INFO - File "/builds/worker/workspace/build/tests/mochitest/runjunit.py", line 275, in run_tests
[task 2019-03-07T14:17:59.384Z] 14:17:59 INFO - if self.check_for_crashes():
[task 2019-03-07T14:17:59.385Z] 14:17:59 INFO - File "/builds/worker/workspace/build/tests/mochitest/runjunit.py", line 291, in check_for_crashes
[task 2019-03-07T14:17:59.385Z] 14:17:59 INFO - logcat = self.device.get_logcat()
[task 2019-03-07T14:17:59.385Z] 14:17:59 INFO - File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozdevice/adb.py", line 1569, in get_logcat
[task 2019-03-07T14:17:59.385Z] 14:17:59 INFO - lines = self.command_output(cmds, timeout=timeout).splitlines()
[task 2019-03-07T14:17:59.385Z] 14:17:59 INFO - File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozdevice/adb.py", line 1094, in command_output
[task 2019-03-07T14:17:59.385Z] 14:17:59 INFO - timeout=timeout)
[task 2019-03-07T14:17:59.386Z] 14:17:59 INFO - File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozdevice/adb.py", line 306, in command_output
[task 2019-03-07T14:17:59.386Z] 14:17:59 CRITICAL - raise ADBTimeoutError("%s" % adb_process)
[task 2019-03-07T14:17:59.386Z] 14:17:59 CRITICAL - ADBTimeoutError: args: adb-5554 wait-for-device logcat -v time -d dalvikvm:I ConnectivityService:S WifiMonitor:S WifiStateTracker:S wpa_supplicant:S NetworkStateTracker:S EmulatedCamera_Camera:S EmulatedCamera_Device:S EmulatedCamera_FakeCamera:S EmulatedCamera_FakeDevice:S EmulatedCamera_CallbackNotifier:S GnssLocationProvider:S Hyphenator:S BatteryStats:S, exitcode: None, stdout:
[task 2019-03-07T14:17:59.386Z] 14:17:59 INFO - runjunit.py | Received unexpected exception while running tests
[task 2019-03-07T14:17:59.386Z] 14:17:59 INFO - Stopping web server
[task 2019-03-07T14:17:59.396Z] 14:17:59 INFO - Stopping web socket server
[task 2019-03-07T14:17:59.416Z] 14:17:59 INFO - Stopping ssltunnel
[task 2019-03-07T14:22:59.480Z] 14:22:59 INFO - Traceback (most recent call last):
[task 2019-03-07T14:22:59.480Z] 14:22:59 INFO - File "/builds/worker/workspace/build/tests/mochitest/runjunit.py", line 128, in cleanup
[task 2019-03-07T14:22:59.480Z] 14:22:59 INFO - self.device.stop_application(self.options.app)
[task 2019-03-07T14:22:59.480Z] 14:22:59 INFO - File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozdevice/adb.py", line 3163, in stop_application
[task 2019-03-07T14:22:59.480Z] 14:22:59 INFO - timeout=timeout, root=root)
[task 2019-03-07T14:22:59.480Z] 14:22:59 INFO - File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozdevice/adb.py", line 1471, in shell_output
[task 2019-03-07T14:22:59.480Z] 14:22:59 CRITICAL - raise ADBTimeoutError("%s" % adb_process)
[task 2019-03-07T14:22:59.481Z] 14:22:59 CRITICAL - ADBTimeoutError: args: adb-5554 wait-for-device shell am force-stop org.mozilla.geckoview.test; echo adb_returncode=$?, exitcode: None, stdout:
[task 2019-03-07T14:22:59.481Z] 14:22:59 INFO - Caught and ignored an exception during cleanup
[task 2019-03-07T14:22:59.497Z] 14:22:59 ERROR - Return code: 1
[task 2019-03-07T14:22:59.498Z] 14:22:59 INFO - TinderboxPrint: geckoview-junit<br/>33/0
[task 2019-03-07T14:22:59.498Z] 14:22:59 INFO - ##### geckoview-junit log ends
[task 2019-03-07T14:22:59.498Z] 14:22:59 CRITICAL - # TBPL RETRY #
[task 2019-03-07T14:22:59.498Z] 14:22:59 WARNING - setting return code to 4
[task 2019-03-07T14:22:59.498Z] 14:22:59 CRITICAL - The geckoview-junit suite: geckoview-junit ran with return status: RETRY
[task 2019-03-07T14:22:59.498Z] 14:22:59 INFO - Running post-action listener: _package_coverage_data
[task 2019-03-07T14:22:59.499Z] 14:22:59 INFO - Running post-action listener: _resource_record_post_action
[task 2019-03-07T14:22:59.499Z] 14:22:59 INFO - Running post-action listener: process_java_coverage_data
[task 2019-03-07T14:22:59.499Z] 14:22:59 INFO - Running post-action listener: stop_device
[task 2019-03-07T14:27:59.545Z] 14:27:59 INFO - failed to pull /data/anr: ADBTimeoutError args: /builds/worker/workspace/build/android-sdk-linux/platform-tools/adb-5554 wait-for-device shell test -d /data/anr; echo adb_returncode=$?, exitcode: None, stdout:
[taskcluster:error] Task timeout after 3600 seconds. Force killing container.

Flags: needinfo?(gbrown)
Assignee

Updated

3 months ago
Depends on: 1533445
Assignee

Comment 4

3 months ago

Bug 1531611 probably transformed some retries-on-adb-timeout into failures reported here. Bug 1533445 should change those back to retries going forward. In this case that's not ideal, because it looks like the adb timeout might be caused by something happening in the test. I'll try to collect better logs from the anticipated retries.

Assignee: nobody → gbrown
Flags: needinfo?(gbrown)
Assignee

Updated

3 months ago
See Also: → 1482231

I think the timeouts happen simply if the actual tests (i.e. roughly the bits between SUITE-START and SUITE-END in the live logs) take more than 2400 s, because I'm running into this with https://treeherder.mozilla.org/#/jobs?repo=try&revision=4c33414bc4cf8ee894e15c773337da2754309be3, where word-wrapping adds several minutes to a test that happened to display loads of binary data as text and therefore takes debug-geckoview-junit-e10s-5 firmly into permafailing territory.

So the question is whether that behaviour (direct timeout even if a chunk is happily running, but the JUnit tests themselves simply take longer than 40 minutes) is intentional?

Although at the same time maybe we should have more chunks for debug builds in any case anyway, though.

Comment hidden (Intermittent Failures Robot)
Assignee

Comment 7

2 months ago

(In reply to Jan Henning [:JanH] from comment #5)

So the question is whether that behaviour (direct timeout even if a chunk is happily running, but the JUnit tests themselves simply take longer than 40 minutes) is intentional?

Yes, it is intentional. If a test harness does not time out on its own and eventually exceeds the task max-run-time, the task is killed and no artifacts are collected, which makes it really difficult to debug. geckoview-junit max-run-time is 60 minutes and includes some setup before the test harness runs, so I think 40 minutes is appropriate (Perhaps could be increased 5 or 10 minutes).

Although at the same time maybe we should have more chunks for debug builds in any case anyway, though.

We added more chunks earlier this month, and chunks other than debug gv-5 seem to be doing okay, so I am reluctant to increase the chunks again.

Assignee

Comment 8

2 months ago

Android 4.3/debug geckoview-junit-5 contains some long-running tests, like

TEST-PASS | org.mozilla.geckoview.test.TextInputDelegateTest.inputConnection[#input] | took 386263ms
TEST-PASS | org.mozilla.geckoview.test.TextInputDelegateTest.inputConnection[#textarea] | took 386855ms

I wonder if something is going wrong there.

Comment hidden (Intermittent Failures Robot)
Assignee

Comment 10

2 months ago

Disabling inputConnection on debug saves about 15 minutes of time, which helps
us avoid the timeouts seen in this bug. It continues running on opt, where it
runs much faster.

Comment 11

2 months ago
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c2beaac71305
Skip junit inputConnection test on debug only, to save lots of time; r=snorp

Comment 12

2 months ago
bugherder
Status: NEW → RESOLVED
Last Resolved: 2 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67
Comment hidden (Intermittent Failures Robot)
You need to log in before you can comment on or make changes to this bug.