Closed Bug 1482231 Opened Last year Closed 6 months ago

WebExtensionTest.registerWebExtension | Intermittent TEST-UNEXPECTED-TIMEOUT | runjunit.py | Timed out after 2400 seconds

Categories

(GeckoView :: General, defect, P2)

defect

Tracking

(firefox65 unaffected, firefox66 unaffected, firefox67 fixed)

RESOLVED FIXED
mozilla67
Tracking Status
firefox65 --- unaffected
firefox66 --- unaffected
firefox67 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: gbrown)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure, Whiteboard: [geckoview:fenix:m2])

Attachments

(1 file)

Filed by: csabou [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=193108682&repo=autoland

https://queue.taskcluster.net/v1/task/BfWa2jl6TuuKBqPuHyjZcw/runs/0/artifacts/public/logs/live_backing.log

[task 2018-08-09T18:53:26.169Z] 18:53:26     INFO -  TEST-START | org.mozilla.geckoview.test.SessionLifecycleTest.open_repeated
[task 2018-08-09T18:56:46.782Z] 18:56:46  WARNING -  TEST-UNEXPECTED-TIMEOUT | runjunit.py | Timed out after 2400 seconds
[task 2018-08-09T18:56:46.783Z] 18:56:46     INFO -  Passed: 70
[task 2018-08-09T18:56:46.783Z] 18:56:46     INFO -  Failed: 0
[task 2018-08-09T18:56:46.786Z] 18:56:46     INFO -  Todo: 3
[task 2018-08-09T18:56:46.786Z] 18:56:46     INFO -  SUITE-END | took 2400s
[task 2018-08-09T18:56:49.517Z] 18:56:49     INFO -  Stopping web server
[task 2018-08-09T18:56:49.541Z] 18:56:49     INFO -  Stopping web socket server
[task 2018-08-09T18:56:49.561Z] 18:56:49     INFO -  Stopping ssltunnel
[task 2018-08-09T18:56:55.035Z] 18:56:55     INFO - Return code: 0
[task 2018-08-09T18:56:55.035Z] 18:56:55     INFO - TinderboxPrint: geckoview-junit<br/>70/0
[task 2018-08-09T18:56:55.035Z] 18:56:55     INFO - ##### geckoview-junit log ends
[task 2018-08-09T18:56:55.036Z] 18:56:55  WARNING - # TBPL WARNING #
[task 2018-08-09T18:56:55.037Z] 18:56:55  WARNING - setting return code to 1
[task 2018-08-09T18:56:55.037Z] 18:56:55  WARNING - The geckoview-junit suite: geckoview-junit ran with return status: WARNING
[task 2018-08-09T18:56:55.038Z] 18:56:55     INFO - Running post-action listener: _package_coverage_data
[task 2018-08-09T18:56:55.044Z] 18:56:55     INFO - Running post-action listener: _resource_record_post_action
[task 2018-08-09T18:56:55.044Z] 18:56:55     INFO - Running post-action listener: process_java_coverage_data
[task 2018-08-09T18:56:55.044Z] 18:56:55     INFO - Running post-action listener: stop_emulator
[task 2018-08-09T18:56:55.054Z] 18:56:55     INFO - Killing every process called emulator64-arm
[task 2018-08-09T18:56:55.055Z] 18:56:55     INFO - Killing pid 905.
[task 2018-08-09T18:56:55.055Z] 18:56:55     INFO - [mozharness: 2018-08-09 18:56:55.055396Z] Finished run-tests step (success)
[task 2018-08-09T18:56:55.056Z] 18:56:55     INFO - Running post-run listener: _resource_record_post_run
[task 2018-08-09T18:56:55.300Z] 18:56:55     INFO - Total resource usage - Wall time: 2907s; CPU: 27.0%; Read bytes: 0; Write bytes: 1086660608; Read time: 0; Write time: 90924
[task 2018-08-09T18:56:55.300Z] 18:56:55     INFO - TinderboxPrint: CPU usage<br/>26.8%
[task 2018-08-09T18:56:55.301Z] 18:56:55     INFO - TinderboxPrint: I/O read bytes / time<br/>0 / 0
[task 2018-08-09T18:56:55.301Z] 18:56:55     INFO - TinderboxPrint: I/O write bytes / time<br/>1,086,660,608 / 90,924
[task 2018-08-09T18:56:55.304Z] 18:56:55     INFO - TinderboxPrint: CPU idle<br/>8,364.8 (72.9%)
[task 2018-08-09T18:56:55.304Z] 18:56:55     INFO - TinderboxPrint: CPU user<br/>3,069.5 (26.7%)
[task 2018-08-09T18:56:55.305Z] 18:56:55     INFO - TinderboxPrint: Swap in / out<br/>0 / 0
[task 2018-08-09T18:56:55.314Z] 18:56:55     INFO - verify-emulator - Wall time: 388s; CPU: 34.0%; Read bytes: 0; Write bytes: 495779840; Read time: 0; Write time: 33328
[task 2018-08-09T18:56:55.319Z] 18:56:55     INFO - install - Wall time: 103s; CPU: 33.0%; Read bytes: 0; Write bytes: 237273088; Read time: 0; Write time: 24664
[task 2018-08-09T18:56:55.364Z] 18:56:55     INFO - run-tests - Wall time: 2416s; CPU: 25.0%; Read bytes: 0; Write bytes: 353595392; Read time: 0; Write time: 32932
[task 2018-08-09T18:56:56.749Z] 18:56:56     INFO - Running post-run listener: copy_logs_to_upload_dir
[task 2018-08-09T18:56:56.750Z] 18:56:56     INFO - Copying logs to upload dir...
[task 2018-08-09T18:56:56.750Z] 18:56:56     INFO - mkdir: /builds/worker/workspace/build/upload/logs
[task 2018-08-09T18:56:56.753Z] 18:56:56     INFO - Copying logs to upload dir...
[task 2018-08-09T18:56:56.755Z] 18:56:56  WARNING - returning nonzero exit status 1
Summary: Intermittent TEST-UNEXPECTED-TIMEOUT | runjunit.py | Timed out after 2400 seconds - test.SessionLifecycleTest.open_repeated → Intermittent TEST-UNEXPECTED-TIMEOUT | runjunit.py | Timed out after 2400 seconds
There are 14 occurrences in the last 7 days, seems the failure rate increased starting with the 16th of November and could possibly be from bug 1507328. 

Jim can you take a look?
Flags: needinfo?(jimnchen+bmo)
Tests look okay. Might be time to split 4.3 arm debug into more chunks
Flags: needinfo?(jimnchen+bmo) → needinfo?(gbrown)
Yes, either more chunks or a longer run time is required here.
Assignee: nobody → gbrown
Flags: needinfo?(gbrown)
I considered increasing the 2400 s (40 minute) timeout, but there's not much room for increase since there's also the 60 minute task timeout, which should probably not be increased. Also, with Android 7.0 junit timing out regularly now, any increase in timeout would increase our packet.net times and costs.

So, increase chunks:

https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&selectedJob=212912376&revision=f26e13ad27e2c7d1fd258349c29101232f552cb2
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b4617999a4f3
Increase test chunks for Android 4.3/debug geckoview-junit; r=me,a=test-only
https://hg.mozilla.org/mozilla-central/rev/b4617999a4f3
Status: NEW → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 65
64=wontfix because (I am assuming) we don't need to uplift this test fix. Andreea pointed out in comment 9 that bug 1507328 is the likely cause. That bug fix landed in 65, not 64.
This is failing again: https://treeherder.mozilla.org/logviewer.html#?job_id=214074381&repo=mozilla-central&lineNumber=2315

[task 2018-11-27T10:44:11.718Z] 10:44:11     INFO -  TEST-START | org.mozilla.geckoview.test.SessionLifecycleTest.open_repeated
[task 2018-11-27T10:47:35.869Z] 10:47:35  WARNING -  TEST-UNEXPECTED-TIMEOUT | runjunit.py | Timed out after 2400 seconds
[task 2018-11-27T10:47:35.873Z] 10:47:35     INFO -  Passed: 57
[task 2018-11-27T10:47:35.873Z] 10:47:35     INFO -  Failed: 0
[task 2018-11-27T10:47:35.874Z] 10:47:35     INFO -  Todo: 2
[task 2018-11-27T10:47:35.874Z] 10:47:35     INFO -  SUITE-END | took 2400s
[task 2018-11-27T10:47:38.512Z] 10:47:38     INFO -  No code coverage file (/sdcard/tests/junit-coverage.ec) found on remote device
[task 2018-11-27T10:47:38.514Z] 10:47:38     INFO -  Stopping web server
[task 2018-11-27T10:47:38.532Z] 10:47:38     INFO -  Stopping web socket server
[task 2018-11-27T10:47:38.552Z] 10:47:38     INFO -  Stopping ssltunnel
[task 2018-11-27T10:47:44.535Z] 10:47:44    ERROR - Return code: 255
[task 2018-11-27T10:47:44.538Z] 10:47:44     INFO - TinderboxPrint: geckoview-junit<br/>57/0
[task 2018-11-27T10:47:44.538Z] 10:47:44     INFO - ##### geckoview-junit log ends
[task 2018-11-27T10:47:44.538Z] 10:47:44  WARNING - # TBPL WARNING #
[task 2018-11-27T10:47:44.538Z] 10:47:44  WARNING - setting return code to 1
[task 2018-11-27T10:47:44.538Z] 10:47:44  WARNING - The geckoview-junit suite: geckoview-junit ran with return status: WARNING
[task 2018-11-27T10:47:44.538Z] 10:47:44     INFO - Running post-action listener: _package_coverage_data
[task 2018-11-27T10:47:44.539Z] 10:47:44     INFO - Running post-action listener: _resource_record_post_action
[task 2018-11-27T10:47:44.539Z] 10:47:44     INFO - Running post-action listener: process_java_coverage_data
[task 2018-11-27T10:47:44.540Z] 10:47:44     INFO - Running post-action listener: stop_device
[task 2018-11-27T10:47:44.540Z] 10:47:44     INFO - Killing logcat pid 861.
[task 2018-11-27T10:47:44.542Z] 10:47:44     INFO - Killing every process called emulator64-arm
[task 2018-11-27T10:47:44.558Z] 10:47:44     INFO - Killing pid 455.
[task 2018-11-27T10:47:44.560Z] 10:47:44     INFO - [mozharness: 2018-11-27 10:47:44.559292Z] Finished run-tests step (success)
[task 2018-11-27T10:47:44.560Z] 10:47:44     INFO - Running post-run listener: _resource_record_post_run
[task 2018-11-27T10:47:44.884Z] 10:47:44     INFO - Total resource usage - Wall time: 2577s; CPU: 25.0%; Read bytes: 6012928; Write bytes: 666050560; Read time: 252; Write time: 78232
[task 2018-11-27T10:47:44.884Z] 10:47:44     INFO - TinderboxPrint: CPU usage<br/>25.4%
[task 2018-11-27T10:47:44.885Z] 10:47:44     INFO - TinderboxPrint: I/O read bytes / time<br/>6,012,928 / 252
[task 2018-11-27T10:47:44.885Z] 10:47:44     INFO - TinderboxPrint: I/O write bytes / time<br/>666,050,560 / 78,232
[task 2018-11-27T10:47:44.886Z] 10:47:44     INFO - TinderboxPrint: CPU idle<br/>7,548.9 (74.2%)
[task 2018-11-27T10:47:44.887Z] 10:47:44     INFO - TinderboxPrint: CPU user<br/>2,583.8 (25.4%)
[task 2018-11-27T10:47:44.888Z] 10:47:44     INFO - TinderboxPrint: Swap in / out<br/>0 / 0
[task 2018-11-27T10:47:44.891Z] 10:47:44     INFO - verify-device - Wall time: 10s; CPU: 33.0%; Read bytes: 0; Write bytes: 3649536; Read time: 0; Write time: 4004
[task 2018-11-27T10:47:44.898Z] 10:47:44     INFO - install - Wall time: 148s; CPU: 26.0%; Read bytes: 0; Write bytes: 460042240; Read time: 0; Write time: 62556
[task 2018-11-27T10:47:44.947Z] 10:47:44     INFO - run-tests - Wall time: 2419s; CPU: 25.0%; Read bytes: 6012928; Write bytes: 202342400; Read time: 252; Write time: 11668
[task 2018-11-27T10:47:46.546Z] 10:47:46  WARNING - returning nonzero exit status 1
[task 2018-11-27T10:47:46.585Z] cleanup
[task 2018-11-27T10:47:46.585Z] + cleanup
[task 2018-11-27T10:47:46.585Z] + local rv=1
[task 2018-11-27T10:47:46.585Z] + [[ -s /builds/worker/.xsession-errors ]]
[task 2018-11-27T10:47:46.585Z] + cp /builds/worker/.xsession-errors /builds/worker/artifacts/public/xsession-errors.log
[task 2018-11-27T10:47:46.588Z] + true
[task 2018-11-27T10:47:46.588Z] + cleanup_xvfb
[task 2018-11-27T10:47:46.589Z] pidof Xvfb
[task 2018-11-27T10:47:46.589Z] ++ pidof Xvfb
[task 2018-11-27T10:47:46.595Z] + local xvfb_pid=28
[task 2018-11-27T10:47:46.595Z] + local vnc=false
[task 2018-11-27T10:47:46.595Z] + local interactive=false
[task 2018-11-27T10:47:46.595Z] + '[' -n 28 ']'
[task 2018-11-27T10:47:46.595Z] + [[ false == false ]]
[task 2018-11-27T10:47:46.595Z] + [[ false == false ]]
[task 2018-11-27T10:47:46.595Z] + kill 28
[task 2018-11-27T10:47:46.595Z] + screen -XS xvfb quit
[task 2018-11-27T10:47:46.596Z] XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
[task 2018-11-27T10:47:46.596Z]       after 1066 requests (1066 known processed) with 0 events remaining.
[task 2018-11-27T10:47:46.596Z] compizconfig - Info: Backend     : ini
[task 2018-11-27T10:47:46.596Z] compizconfig - Info: Integration : true
[task 2018-11-27T10:47:46.596Z] compizconfig - Info: Profile     : default
[task 2018-11-27T10:47:47.018Z] No screen session found.
[task 2018-11-27T10:47:47.019Z] + true
[task 2018-11-27T10:47:47.019Z] + exit 1
[fetches 2018-11-27T10:47:47.019Z] removing /builds/worker/fetches
[fetches 2018-11-27T10:47:47.021Z] finished
[taskcluster 2018-11-27 10:47:47.452Z] === Task Finished ===
[taskcluster 2018-11-27 10:47:54.146Z] Unsuccessful task run with exit code: 1 completed in 2774.376 seconds
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: Firefox 65 → ---
Just one post-fix failure here so far -- in a ccov build.
Assignee: gbrown → nobody
Product: Firefox for Android → GeckoView
See Also: → 1521195

Over the last 7 days there are 39 failures on this bug. These happen only on android-em-4-3-armv7-api16.

Here is the latest failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=231043244&repo=autoland&lineNumber=2869

Flags: needinfo?(cpeterson)

Recent failures are during WebExtensionTest and started with the introduction of these tests, in bug 1518841.

:agi - Please have a look!

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=231043244&repo=autoland&lineNumber=2869

[task 2019-02-28T17:45:56.701Z] 17:45:56 INFO - TEST-START | org.mozilla.geckoview.test.WebExtensionTest.registerWebExtension
[task 2019-02-28T17:46:11.834Z] 17:46:11 WARNING - TEST-UNEXPECTED-TIMEOUT | runjunit.py | Timed out after 2400 seconds

Blocks: 1518841
Flags: needinfo?(cpeterson) → needinfo?(agi)
Priority: P5 → P2
Summary: Intermittent TEST-UNEXPECTED-TIMEOUT | runjunit.py | Timed out after 2400 seconds → WebExtensionTest.registerWebExtension | Intermittent TEST-UNEXPECTED-TIMEOUT | runjunit.py | Timed out after 2400 seconds

I think there are many timeouts that are being conflated in this bug. From what I can see most are not actually from WebExtensionTest. But I'll take a look at that one.

Flags: needinfo?(agi)
Assignee: nobody → agi

[geckoview:fenix:m2] if these might be WebExtensionTest failures since WebExtensions (bug 1518841) landed in M2.

Whiteboard: [geckoview:fenix:m2]

Yeah even looking at the logs this seems to be a problem in earlier test taking a long time (maybe it's ok, we just need to up the timeout or split it in more tests, not sure how that works).

[task 2019-02-28T17:45:56.701Z] 17:45:56     INFO -  TEST-START | org.mozilla.geckoview.test.WebExtensionTest.registerWebExtension
[task 2019-02-28T17:46:11.834Z] 17:46:11  WARNING -  TEST-UNEXPECTED-TIMEOUT | runjunit.py | Timed out after 2400 seconds

in the example above this test is taking 16s before timing out because the whole suite has been running for 2400s, bad but probably ok if the device is busy.

E.g. see this test in the same run:

[task 2019-02-28T17:36:55.552Z] 17:36:55     INFO -  TEST-START | org.mozilla.geckoview.test.TextInputDelegateTest.inputConnection[#textarea]
...
org.mozilla.geckoview.test.TextInputDelegateTest.inputConnection[#textarea] | took 351101ms

that takes 351s to run.

Ok I looked at more of these. In all of them org.mozilla.geckoview.test.TextInputDelegateTest.inputConnection takes a really long time pushing the suite over the edge of 2400s timeout. Let's see if I can figure out if this regressed recently.

See Also: → 1531797

I hadn't noticed before: It looks like all the Android 4.3 geckoview-junit test chunks are running in about 40 minutes -- close to the timeout. :agi, if you don't find any unusual regression in run time, re-assign to me, and I'll increase the timeout or number of jobs.

Yeah I think we just added a few more tests that increased the run time just enough to timeout.

Although I think we have an actual problem that I captured in Bug 1531797. Not a recent regression, it was introduced back in November.

Assignee: agi → gbrown

Increase test chunks for Android 4.3 /opt and /debug and ccov geckoview-junit.
Most chunks run in 20 to 30 minutes with this change:
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=a952f0dde8b508d8f87867757d98ca2642cbd48c

Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/290bdcdf2b63
Increase test chunks for geckoview-junit to avoid intermittent timeouts; r=jmaher
Status: REOPENED → RESOLVED
Closed: 9 months ago6 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67
See Also: → 1532435
You need to log in before you can comment on or make changes to this bug.