Open Bug 1787039 Opened 3 years ago Updated 7 days ago

Intermittent dom/media/webrtc/tests/mochitests/test_getUserMedia_scarySources.html | single tracking bug

Categories

(Core :: WebRTC: Audio/Video, defect, P3)

ASSIGNED

People

(Reporter: jmaher, Assigned: bwc, NeedInfo)

Details

(Keywords: intermittent-failure, intermittent-testcase, leave-open, Whiteboard: [stockwell disabled])

Attachments

(2 files)

No description provided.

Additional information about this bug's failures and frequency patterns can be found by running: ./mach test-info failure-report --bug 1787039

I've dug into this some. We're seeing problems because the getUserMedia call is sometimes not giving us the browser window at all.

https://treeherder.mozilla.org/logviewer?job_id=437818274&repo=try&lineNumber=18381-18384

[task 2023-11-27T15:41:42.981Z] 15:41:42     INFO - Device: C:\Windows\system32\cmd.exe
[task 2023-11-27T15:41:42.981Z] 15:41:42     INFO - Device: Primary Monitor
[task 2023-11-27T15:41:42.982Z] 15:41:42     INFO - Buffered messages finished
[task 2023-11-27T15:41:42.983Z] 15:41:42     INFO - TEST-UNEXPECTED-FAIL | dom/media/webrtc/tests/mochitests/test_getUserMedia_scarySources.html | Found 0 of our own windows 

Any ideas?

Flags: needinfo?(jib)

Figured out the cause. getUserMedia is sometimes not giving us our own window because the OS thinks it is unresponsive:

https://treeherder.mozilla.org/jobs?repo=try&revision=00e8b3b4c734230eb1df29274c76900e623747d6

We're hitting this code, and bailing out:

https://searchfox.org/mozilla-central/source/third_party/libwebrtc/modules/desktop_capture/win/window_capture_utils.cc#76

I wonder why we're getting flagged as non-responsive. Maybe we're in the middle of a GC or something? I wonder what happens if we make an exactGC call at the beginning of this test.

Assignee: nobody → docfaraday
Status: NEW → ASSIGNED

Patch looks really promising, but since this is a rare intermittent, it might not completely do the trick. Marking leave-open for now.

https://treeherder.mozilla.org/jobs?repo=try&revision=6d44d627c14dee7290f3bd8d0b4e9d77e7af2671

https://treeherder.mozilla.org/jobs?repo=try&revision=e92be1c3f20a231ef119975314b59302155ef4ec

Keywords: leave-open

This is really weird. If I wait for an exactGC to finish, it makes the problem worse, but just calling exactGC and continuing with the test while it is running seems to work.

The core problem here is that this function is just not very reliable:

https://searchfox.org/mozilla-central/source/third_party/libwebrtc/modules/desktop_capture/win/window_capture_utils.cc#291-297

We could try to work around it in the test case, but I fear that this would just be randomly fiddling with it until the timing worked out. We could also try extending the timeout being used here, but that isn't ideal either. We could also try using IsHungAppWindow instead of this polling hack, but that has its own problems.

I still don't know why the UI event loop would take more than 50ms to notice that something was posted.

Extending that timeout slightly seems to work ok (lots of failures for other bugs to pick through, unfortunately):

https://treeherder.mozilla.org/jobs?repo=try&revision=724045988aaf2ff2857fc3434e4a51656465e25e

Hmm, still seeing failures. Maybe extending it more might work, but I get the impression that the SendMessageTimeout method is fundamentally unreliable.
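
To illustrate the failure mode discussed above, here is a minimal, portable sketch of a timed "ping" against an event loop. It is not the libwebrtc or Win32 code: FakeEventLoop and RespondsWithin are hypothetical names, and the loop is a plain std::thread standing in for a UI thread. The point is only that a loop which is healthy but briefly busy looks identical to a hung one when the probe timeout is shorter than the task it is currently running.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical stand-in for a UI thread's message loop.
// Tasks run strictly in the order they were posted.
class FakeEventLoop {
 public:
  FakeEventLoop() : worker_([this] { Run(); }) {}

  ~FakeEventLoop() {
    Post(nullptr);  // An empty task tells the loop to exit.
    worker_.join();
  }

  void Post(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !tasks_.empty(); });
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      if (!task) return;  // Exit marker posted by the destructor.
      task();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  std::thread worker_;  // Declared last so the members above exist before it starts.
};

// Posts a no-op "ping" and reports whether the loop handled it within the
// timeout. A false result means "did not answer in time", not "is hung".
bool RespondsWithin(FakeEventLoop& loop, std::chrono::milliseconds timeout) {
  auto pong = std::make_shared<std::promise<void>>();
  std::future<void> done = pong->get_future();
  loop.Post([pong] { pong->set_value(); });
  return done.wait_for(timeout) == std::future_status::ready;
}
```

With a 300ms task queued ahead of the ping, RespondsWithin(loop, 50ms) returns false even though the loop is fine, while a 2000ms timeout returns true. Lengthening the timeout only moves the threshold; any fixed value can still be exceeded by a sufficiently long task, which is consistent with the failures that remained after the timeout was extended.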

There is an r+ patch which didn't land and no activity in this bug for 2 weeks.
:bwc, could you have a look please?
If you still have some work to do, you can add an action "Plan Changes" in Phabricator.
For more information, please visit BugBot documentation.

Flags: needinfo?(na-g)
Flags: needinfo?(docfaraday)
Flags: needinfo?(na-g)
Flags: needinfo?(docfaraday)

There have been 48 total failures in the last 7 days.
There are:

  • 28 failures on Windows 11 x86 22H2 WebRender debug
  • 20 failures on Windows 11 x64 22H2 WebRender debug

Recent failure log.
Hi Byron! As the assignee of this bug, could you please take a look? Thanks!

Flags: needinfo?(docfaraday)
Whiteboard: [retriggered][stockwell disable-recommended] → [retriggered][stockwell disable-recommended][stockwell needswork:owner]

I know how to fix this, but it requires making a debatable modification to the third-party libwebrtc library. If we were to switch over to using IsHungAppWindow here, that would work:

https://searchfox.org/mozilla-central/source/third_party/libwebrtc/modules/desktop_capture/win/window_capture_utils.cc#298-304

However, IsHungAppWindow is not an officially supported API, despite existing in its current form since the WinXP days and being in widespread use.

Flags: needinfo?(docfaraday)

Byron, any updates on this one?
It still has 195 total failures in the last 30 days, all on Windows debug (32- and 64-bit): https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2024-11-29&endday=2024-12-29&tree=trunk&failurehash=all&bug=1787039
We should skip it until a fix is in place.

Flags: needinfo?(docfaraday)
Pushed by csabou@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9a88e7daa928 Temporarily disable test_getUserMedia_scarySources on Windows 11 debug for frequent failures. r=#intermittent-reviewers,ahal
Whiteboard: [retriggered][stockwell disable-recommended][stockwell needswork:owner] → [stockwell disabled]
Flags: needinfo?(docfaraday)