Intermittent TEST-UNEXPECTED-TIMEOUT | browser/base/content/test/webrtc/browser_devices_get_user_media_screen.js | application timed out after 370 seconds with no output

RESOLVED INCOMPLETE

Status

()

defect
P3
normal
Rank:
25
RESOLVED INCOMPLETE
9 months ago
4 months ago

People

(Reporter: intermittent-bug-filer, Assigned: pehrsons)

Tracking

({intermittent-failure})

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [stockwell disabled])

Attachments

(1 attachment)

Filed by: dvarga [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=209238396&repo=mozilla-inbound

https://queue.taskcluster.net/v1/task/KLiiqhp-Qy6bah4hoPnPVw/runs/0/artifacts/public/logs/live_backing.log

[task 2018-11-01T19:02:41.958Z] 19:02:41     INFO - TEST-PASS | browser/base/content/test/webrtc/browser_devices_get_user_media_screen.js | expected notification recording-window-ended - 
[task 2018-11-01T19:02:41.958Z] 19:02:41     INFO - Switching between menu options maintains correct main action state while window sharing
[task 2018-11-01T19:02:41.959Z] 19:02:41     INFO - Buffered messages logged at 18:56:31
[task 2018-11-01T19:02:41.959Z] 19:02:41     INFO - requesting devices
[task 2018-11-01T19:02:41.959Z] 19:02:41     INFO - Buffered messages finished
[task 2018-11-01T19:02:41.959Z] 19:02:41    ERROR - TEST-UNEXPECTED-TIMEOUT | browser/base/content/test/webrtc/browser_devices_get_user_media_screen.js | application timed out after 370 seconds with no output
[task 2018-11-01T19:02:41.959Z] 19:02:41    ERROR - Force-terminating active process(es).
[task 2018-11-01T19:02:41.959Z] 19:02:41     INFO - Determining child pids from psutil...
[task 2018-11-01T19:02:41.959Z] 19:02:41     INFO - [3082, 3109, 3131, 3017, 3034]
[task 2018-11-01T19:02:41.959Z] 19:02:41     INFO - ==> process 2930 launched child process 2949
[task 2018-11-01T19:02:41.959Z] 19:02:41     INFO - ==> process 2930 launched child process 3017
[task 2018-11-01T19:02:41.959Z] 19:02:41     INFO - ==> process 2930 launched child process 3034
[task 2018-11-01T19:02:41.960Z] 19:02:41     INFO - ==> process 2930 launched child process 3082
[task 2018-11-01T19:02:41.960Z] 19:02:41     INFO - ==> process 2930 launched child process 3109
[task 2018-11-01T19:02:41.960Z] 19:02:41     INFO - ==> process 2930 launched child process 3131
[task 2018-11-01T19:02:41.960Z] 19:02:41     INFO - Found child pids: set([3109, 3017, 3082, 3034, 3131, 2949])
[task 2018-11-01T19:02:41.960Z] 19:02:41     INFO - Failed to get child procs
[task 2018-11-01T19:02:41.960Z] 19:02:41     INFO - Killing process: 3109
[task 2018-11-01T19:02:41.960Z] 19:02:41     INFO - TEST-INFO | started process screentopng
[task 2018-11-01T19:19:21.976Z] 19:19:21     INFO - Automation Error: mozprocess timed out after 1000 seconds running ['/builds/worker/workspace/build/venv/bin/python', '-u', '/builds/worker/workspace/build/tests/mochitest/runtests.py', '--total-chunks', '16', '--this-chunk', '5', '--appname=/builds/worker/workspace/build/application/firefox/firefox', '--utility-path=tests/bin', '--extra-profile-file=tests/bin/plugins', '--symbols-path=https://queue.taskcluster.net/v1/task/PV2UxCdTQdOMa4qXhY2fAg/artifacts/public/build/target.crashreporter-symbols.zip', '--certificate-path=tests/certs', '--setpref=webgl.force-enabled=true', '--quiet', '--log-raw=/builds/worker/workspace/build/blobber_upload_dir/browser-chrome-chunked_raw.log', '--log-errorsummary=/builds/worker/workspace/build/blobber_upload_dir/browser-chrome-chunked_errorsummary.log', '--use-test-media-devices', '--screenshot-on-fail', '--cleanup-crashes', '--marionette-startup-timeout=180', '--sandbox-read-whitelist=/builds/worker/workspace/build', '--log-raw=-', '--flavor=browser', '--chunk-by-runtime']
[task 2018-11-01T19:19:21.986Z] 19:19:21    ERROR - timed out after 1000 seconds of no output
[task 2018-11-01T19:19:21.987Z] 19:19:21    ERROR - Return code: -15
[task 2018-11-01T19:19:21.989Z] 19:19:21    ERROR - No suite end message was emitted by this harness.
[task 2018-11-01T19:19:21.989Z] 19:19:21     INFO - TinderboxPrint: mochitest-browser-chrome-chunked<br/>894/0/1
[task 2018-11-01T19:19:21.990Z] 19:19:21    ERROR - # TBPL FAILURE #
[task 2018-11-01T19:19:21.991Z] 19:19:21  WARNING - setting return code to 2
[task 2018-11-01T19:19:21.991Z] 19:19:21    ERROR - The mochitest suite: browser-chrome-chunked ran with return status: FAILURE
There are 37 failures in the last 7 days.
linux64 - 32 failures
linux32 -  5 failures

I did some retriggers and the fail started from https://hg.mozilla.org/integration/mozilla-inbound/rev/f391ab62002bc2a28968696ee35eaf3a30ef372f

:anba , could you please take a look?
Flags: needinfo?(andrebargull)
(In reply to Narcis Beleuzu [:NarcisB] from comment #8)
> I did some retriggers and the fail started from
> https://hg.mozilla.org/integration/mozilla-inbound/rev/
> f391ab62002bc2a28968696ee35eaf3a30ef372f

Hmm, that changeset cannot cause any issues anywhere in a Firefox build, because the change in "build/autoconf/icu.m4" only affects external embedders who use the "--with-system-icu" configuration option. This option is never enabled for any in-tree configuration. And the change "js/src/tests/jstests.list" only enabled a test in a different test suite, so it also can't have any influence over this WebRTC test.
Flags: needinfo?(andrebargull) → needinfo?(nbeleuzu)
(In reply to André Bargull [:anba] from comment #9)
> (In reply to Narcis Beleuzu [:NarcisB] from comment #8)
> > I did some retriggers and the fail started from
> > https://hg.mozilla.org/integration/mozilla-inbound/rev/
> > f391ab62002bc2a28968696ee35eaf3a30ef372f
> 
> Hmm, that changeset cannot cause any issues anywhere in a Firefox build,
> because the change in "build/autoconf/icu.m4" only affects external
> embedders who use the "--with-system-icu" configuration option. This option
> is never enabled for any in-tree configuration. And the change
> "js/src/tests/jstests.list" only enabled a test in a different test suite,
> so it also can't have any influence over this WebRTC test.

I don`t know what caused it. I`m just saying that after the retriggers, this fail started from that push.
Retriggers link: 
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&resultStatus=success%2Ctestfailed%2Cbusted%2Cexception&tochange=f391ab62002bc2a28968696ee35eaf3a30ef372f&fromchange=2e4367fd26b8c176d2329e45c178a681821e9f8e&searchStr=linux%2Cx64%2Casan%2Cmochitests%2Cwith%2Ce10s%2Ctest-linux64-asan%2Fopt-mochitest-browser-chrome-e10s-5%2Cm-e10s%28bc5%29&selectedJob=209238396
Flags: needinfo?(nbeleuzu)
There are 34 total failures in the last 7 days and 117 total failures in the last 30 days on linux64 and linux32 all build types. 
Disabled the test on linux. Joel, should I have been more specific?
Assignee: nobody → apavel
Attachment #9033849 - Flags: review?(jmaher)
Attachment #9033849 - Flags: review?(jmaher) → review+
Whiteboard: [stockwell disabled]
Pushed by rmaries@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/59468cc2550d
disabled test on linux for frequent failures r=jmaher
Keywords: checkin-needed
Interesting that this started just a day before bug 1376873 landed the first time. Otherwise it would have been a prime suspect.

What's more annoying is that I was backed out for increasing the rate of a failure on this same test just before the holidays and when now trying to reproduce and fix the issue I'm not able to get any failures at all, because the test had been disabled for other reasons in the meantime.
jib, dminor, fyi this seems to me like a hang on gUM. Not the good old cubeb hang because this is only requesting video of type "window" [1]. Could be a deadlock somewhere, but I'm not sure offhand how to effectively debug as even though we're killing the parent we're not getting any stacks from it.


[1] https://searchfox.org/mozilla-central/rev/3e7afaa5b57b3f8ed100cd1f13bb0a93b9b2cb99/browser/base/content/test/webrtc/browser_devices_get_user_media_screen.js#537
This could possibly have the same cause as bug 1421724 - namely that the global webrtc indicator has stolen focus and thus the doorhanger doesn't appear. I'll try to re-enable this after it lands.
Maybe not. There are a bunch of these issues that have left behind minidumps, all with this stack:

> [task 2018-12-18T15:36:13.784Z] 15:36:13     INFO - Crash reason:  SIGSEGV /SEGV_MAPERR
> [task 2018-12-18T15:36:13.785Z] 15:36:13     INFO - Crash address: 0x10
> [task 2018-12-18T15:36:13.785Z] 15:36:13     INFO - Process uptime: not available
> [task 2018-12-18T15:36:13.786Z] 15:36:13     INFO - 
> [task 2018-12-18T15:36:13.786Z] 15:36:13     INFO - Thread 61 (crashed)
> [task 2018-12-18T15:36:13.787Z] 15:36:13     INFO -  0  libxul.so!webrtc::DesktopDeviceInfoX11::InitializeApplicationList() [shared_x_display.h:fdf40f8943dfa2ee8de5fe3e5ba50baff2c45955 : 54 + 0x0]
> [task 2018-12-18T15:36:13.788Z] 15:36:13     INFO -     rax = 0x00007f893d2765e8   rdx = 0x00007f8962c00020
> [task 2018-12-18T15:36:13.789Z] 15:36:13     INFO -     rcx = 0x0000000000000003   rbx = 0x00007f892ee78ba0
> [task 2018-12-18T15:36:13.789Z] 15:36:13     INFO -     rsi = 0x0000000000000000   rdi = 0x0000000000000000
> [task 2018-12-18T15:36:13.790Z] 15:36:13     INFO -     rbp = 0x00007f893d278650   rsp = 0x00007f893d276540
> [task 2018-12-18T15:36:13.791Z] 15:36:13     INFO -      r8 = 0x0000000000000000    r9 = 0x00000000ffffffff
> [task 2018-12-18T15:36:13.791Z] 15:36:13     INFO -     r10 = 0x00007f8942a016b8   r11 = 0x00007f8942a01670
> [task 2018-12-18T15:36:13.792Z] 15:36:13     INFO -     r12 = 0x00007f892ee777a0   r13 = 0x00000000ffffffff
> [task 2018-12-18T15:36:13.793Z] 15:36:13     INFO -     r14 = 0x0000000000000000   r15 = 0x00007f89574e023e
> [task 2018-12-18T15:36:13.794Z] 15:36:13     INFO -     rip = 0x00007f8954fab67b
> [task 2018-12-18T15:36:13.794Z] 15:36:13     INFO -     Found by: given as instruction pointer in context
> [task 2018-12-18T15:36:13.795Z] 15:36:13     INFO -  1  libxul.so!webrtc::DesktopDeviceInfoImpl::Init() [desktop_device_info.cc:fdf40f8943dfa2ee8de5fe3e5ba50baff2c45955 : 248 + 0x9]
> [task 2018-12-18T15:36:13.795Z] 15:36:13     INFO -     rbx = 0x00007f892ee78ba0   rbp = 0x00007f893d278670
> [task 2018-12-18T15:36:13.796Z] 15:36:13     INFO -     rsp = 0x00007f893d278660   r12 = 0x00007f892ee777a0
> [task 2018-12-18T15:36:13.797Z] 15:36:13     INFO -     r13 = 0x00000000ffffffff   r14 = 0x000000000000000b
> [task 2018-12-18T15:36:13.797Z] 15:36:13     INFO -     r15 = 0x00007f89574e023e   rip = 0x00007f8954fa8520
> [task 2018-12-18T15:36:13.798Z] 15:36:13     INFO -     Found by: call frame info
> [task 2018-12-18T15:36:13.799Z] 15:36:13     INFO -  2  libxul.so!webrtc::DesktopDeviceInfoImpl::Create() [desktop_device_info_x11.cc:fdf40f8943dfa2ee8de5fe3e5ba50baff2c45955 : 17 + 0x8]
> [task 2018-12-18T15:36:13.800Z] 15:36:13     INFO -     rbx = 0x00007f892ee78ba0   rbp = 0x00007f893d278690
> [task 2018-12-18T15:36:13.801Z] 15:36:13     INFO -     rsp = 0x00007f893d278680   r12 = 0x00007f892ee777a0
> [task 2018-12-18T15:36:13.801Z] 15:36:13     INFO -     r13 = 0x00000000ffffffff   r14 = 0x000000000000000b
> [task 2018-12-18T15:36:13.802Z] 15:36:13     INFO -     r15 = 0x00007f89574e023e   rip = 0x00007f8954fab532
> [task 2018-12-18T15:36:13.803Z] 15:36:13     INFO -     Found by: call frame info
> [task 2018-12-18T15:36:13.804Z] 15:36:13     INFO -  3  libxul.so!webrtc::DesktopCaptureImpl::CreateDeviceInfo(int, webrtc::CaptureDeviceType) [tuple:fdf40f8943dfa2ee8de5fe3e5ba50baff2c45955 : 0 + 0x5]
> [task 2018-12-18T15:36:13.805Z] 15:36:13     INFO -     rbx = 0x00007f892edcea10   rbp = 0x00007f893d2786b0
> [task 2018-12-18T15:36:13.805Z] 15:36:13     INFO -     rsp = 0x00007f893d2786a0   r12 = 0x00007f892ee777a0
> [task 2018-12-18T15:36:13.806Z] 15:36:13     INFO -     r13 = 0x00000000ffffffff   r14 = 0x000000000000000b
> [task 2018-12-18T15:36:13.807Z] 15:36:13     INFO -     r15 = 0x00007f89574e023e   rip = 0x00007f8953e3be03
> [task 2018-12-18T15:36:13.808Z] 15:36:13     INFO -     Found by: call frame info
> [task 2018-12-18T15:36:13.808Z] 15:36:13     INFO -  4  libxul.so!mozilla::camera::VideoEngine::GetOrCreateVideoCaptureDeviceInfo() [VideoEngine.cpp:fdf40f8943dfa2ee8de5fe3e5ba50baff2c45955 : 197 + 0x5]
> [task 2018-12-18T15:36:13.809Z] 15:36:13     INFO -     rbx = 0x00007f892ee777b8   rbp = 0x00007f893d2786e0
> [task 2018-12-18T15:36:13.810Z] 15:36:13     INFO -     rsp = 0x00007f893d2786c0   r12 = 0x00007f892ee777a0
> [task 2018-12-18T15:36:13.811Z] 15:36:13     INFO -     r13 = 0x00000000ffffffff   r14 = 0x00007f893d2786f8
> [task 2018-12-18T15:36:13.811Z] 15:36:13     INFO -     r15 = 0x00007f89574e023e   rip = 0x00007f8953e376d8
> [task 2018-12-18T15:36:13.812Z] 15:36:13     INFO -     Found by: call frame info
> [task 2018-12-18T15:36:13.813Z] 15:36:13     INFO -  5  libxul.so!mozilla::camera::CamerasParent::RecvNumberOfCaptureDevices(mozilla::camera::CaptureEngine const&)::$_3::operator()() const [CamerasParent.cpp:fdf40f8943dfa2ee8de5fe3e5ba50baff2c45955 : 461 + 0x8]
> [task 2018-12-18T15:36:13.814Z] 15:36:13     INFO -     rbx = 0x0000000000000000   rbp = 0x00007f893d278720
> [task 2018-12-18T15:36:13.815Z] 15:36:13     INFO -     rsp = 0x00007f893d2786f0   r12 = 0x00007f892edee280
> [task 2018-12-18T15:36:13.815Z] 15:36:13     INFO -     r13 = 0x00000000ffffffff   r14 = 0x00007f893a08fea0
> [task 2018-12-18T15:36:13.816Z] 15:36:13     INFO -     r15 = 0x0000000000000001   rip = 0x00007f8953e3f028
> [task 2018-12-18T15:36:13.817Z] 15:36:13     INFO -     Found by: call frame info
> [task 2018-12-18T15:36:13.818Z] 15:36:13     INFO -  6  libxul.so!mozilla::media::LambdaRunnable<mozilla::camera::CamerasParent::RecvNumberOfCaptureDevices(mozilla::camera::CaptureEngine const&)::$_3>::Run() [MediaUtils.h:fdf40f8943dfa2ee8de5fe3e5ba50baff2c45955 : 73 + 0x9]
> [task 2018-12-18T15:36:13.819Z] 15:36:13     INFO -     rbx = 0x0000000000000000   rbp = 0x00007f893d278730
> [task 2018-12-18T15:36:13.819Z] 15:36:13     INFO -     rsp = 0x00007f893d278730   r12 = 0x00007f892edee280
> [task 2018-12-18T15:36:13.820Z] 15:36:13     INFO -     r13 = 0x00000000ffffffff   r14 = 0x0000000000000001
> [task 2018-12-18T15:36:13.821Z] 15:36:13     INFO -     r15 = 0x0000000000000001   rip = 0x00007f8953e3efdd
> [task 2018-12-18T15:36:13.822Z] 15:36:13     INFO -     Found by: call frame info
> [task 2018-12-18T15:36:13.823Z] 15:36:13     INFO -  7  libxul.so!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:fdf40f8943dfa2ee8de5fe3e5ba50baff2c45955 : 1157 + 0x6]
> [task 2018-12-18T15:36:13.823Z] 15:36:13     INFO -     rbx = 0x0000000000000000   rbp = 0x00007f893d278c60
> [task 2018-12-18T15:36:13.824Z] 15:36:13     INFO -     rsp = 0x00007f893d278740   r12 = 0x00007f892edee280
> [task 2018-12-18T15:36:13.825Z] 15:36:13     INFO -     r13 = 0x00000000ffffffff   r14 = 0x0000000000000001
> [task 2018-12-18T15:36:13.826Z] 15:36:13     INFO -     r15 = 0x0000000000000001   rip = 0x00007f8955ad3428
> [task 2018-12-18T15:36:13.826Z] 15:36:13     INFO -     Found by: call frame info
I agree, it looks like we should check for null there and then bail early.

Seems strange that we haven't hit this problem before. I haven't found much documentation on why XOpenDisplay would fail... are we maybe leaking connections to the X server, or just trying to open too many concurrently?
Flags: needinfo?(dminor)
There's some other failure mode too, but I'm not getting a crash for this. It doesn't seem farfetched that these are related. Is the X server hosed somehow?

https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=219983803&revision=cbd104ab729347144a2e688b27b5068df7e35a72
These failed sessions also exhibit a screenshot of 0 bytes, and their logs contain this:

> [task 2019-01-07T03:46:19.784Z] 03:46:19     INFO - Killing process: 2339
> [task 2019-01-07T03:46:19.784Z] 03:46:19     INFO - TEST-INFO | started process screentopng
> [task 2019-01-07T04:02:59.818Z] 04:02:59     INFO - Automation Error: mozprocess timed out after 1000 seconds running ['/builds/worker/workspace/build/venv/bin/python', '-u', '/builds/worker/workspace/build/tests/mochitest/runtests.py', '--total-chunks', '16', '--this-chunk', '4', '--appname=/builds/worker/workspace/build/application/firefox/firefox', '--utility-path=tests/bin', '--extra-profile-file=tests/bin/plugins', '--symbols-path=/builds/worker/workspace/build/symbols', '--certificate-path=tests/certs', '--setpref=webgl.force-enabled=true', '--quiet', '--log-raw=/builds/worker/workspace/build/blobber_upload_dir/browser-chrome-chunked_raw.log', '--log-errorsummary=/builds/worker/workspace/build/blobber_upload_dir/browser-chrome-chunked_errorsummary.log', '--use-test-media-devices', '--screenshot-on-fail', '--cleanup-crashes', '--marionette-startup-timeout=180', '--sandbox-read-whitelist=/builds/worker/workspace/build', '--log-raw=-', '--flavor=browser', '--chunk-by-runtime']
> [task 2019-01-07T04:02:59.830Z] 04:02:59    ERROR - timed out after 1000 seconds of no output

and

> [task 2019-01-07T04:02:46.185Z] XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
> [task 2019-01-07T04:02:46.185Z]       after 7259 requests (7046 known processed) with 0 events remaining.


Perhaps the latter is a clue to someone.
Status: ASSIGNED → RESOLVED
Closed: 4 months ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.