Closed Bug 1809354 Opened 1 year ago Closed 1 year ago

Assertion failure: false (MOZ_ASSERT_UNREACHABLE: Couldn't release allocated device), at /builds/worker/checkouts/gecko/dom/media/webrtc/MediaEngineRemoteVideoSource.cpp:188

Categories

(Core :: WebRTC, defect, P2)

defect

Tracking

()

RESOLVED FIXED
112 Branch
Tracking Status
firefox-esr102 --- wontfix
firefox108 --- wontfix
firefox109 --- wontfix
firefox110 --- wontfix
firefox111 --- wontfix
firefox112 --- fixed

People

(Reporter: tsmith, Assigned: karlt)

References

(Blocks 2 open bugs, Regression)

Details

(Keywords: assertion, regression, testcase, Whiteboard: [bugmon:confirmed])

Attachments

(3 files, 1 obsolete file)

Attached file testcase.html (obsolete) —

Found while fuzzing m-c 20221110-963d3b67f524 (--enable-debug --enable-fuzzing)

To reproduce via Grizzly Replay:

$ pip install fuzzfetch grizzly-framework
$ python -m fuzzfetch -d --fuzzing -n firefox
$ python -m grizzly.replay ./firefox/firefox testcase.html

Assertion failure: false (MOZ_ASSERT_UNREACHABLE: Couldn't release allocated device), at /builds/worker/checkouts/gecko/dom/media/webrtc/MediaEngineRemoteVideoSource.cpp:188

#0 0x7fe0d5a25c0f in mozilla::MediaEngineRemoteVideoSource::Deallocate() /builds/worker/checkouts/gecko/dom/media/webrtc/MediaEngineRemoteVideoSource.cpp:188:5
#1 0x7fe0d559b0e4 in operator() /builds/worker/checkouts/gecko/dom/media/MediaManager.cpp:3906:15
#2 0x7fe0d559b0e4 in mozilla::media::LambdaTask<mozilla::DeviceListener::Stop()::$_37>::Run() /builds/worker/workspace/obj-build/dist/include/mozilla/media/MediaTaskUtils.h:36:5
#3 0x7fe0d184c475 in mozilla::TaskQueue::Runner::Run() /builds/worker/checkouts/gecko/xpcom/threads/TaskQueue.cpp:259:20
#4 0x7fe0d18681ab in nsThreadPool::Run() /builds/worker/checkouts/gecko/xpcom/threads/nsThreadPool.cpp:310:14
#5 0x7fe0d185f3d8 in nsThread::ProcessNextEvent(bool, bool*) /builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp:1191:16
#6 0x7fe0d186585d in NS_ProcessNextEvent(nsIThread*, bool) /builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp:476:10
#7 0x7fe0d24589da in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) /builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp:300:20
#8 0x7fe0d237b6d8 in MessageLoop::RunInternal() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:381:10
#9 0x7fe0d237b5e1 in RunHandler /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:374:3
#10 0x7fe0d237b5e1 in MessageLoop::Run() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:356:3
#11 0x7fe0d185a8d7 in nsThread::ThreadFunc(void*) /builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp:383:10
#12 0x7fe0e4608c86 in _pt_root /builds/worker/checkouts/gecko/nsprpub/pr/src/pthreads/ptthread.c:201:5
#13 0x7fe0e4eb1b42 in start_thread nptl/pthread_create.c:442:8
#14 0x7fe0e4f439ff  misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Flags: in-testsuite?

Unable to reproduce bug 1809354 using build mozilla-central 20221110163712-963d3b67f524. Without a baseline, bugmon is unable to analyze this bug.
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.

Keywords: bugmon

I may need more info here, but I've tried the following:

python -m fuzzfetch -d --fuzzing -n firefox
python3 -m fuzzfetch --build 2022-11-10 -d --fuzzing -n firefox
python3 -m fuzzfetch --build 963d3b67f524 -d --fuzzing -n firefox

I've run:

python3 -m grizzly.replay ./firefox/firefox testcase.html
python3 -m grizzly.replay --repeat 300 ./firefox/firefox testcase.html

This is running on Ubuntu22.04 and I've had no luck reproducing a failure. All my output looks similar to:

[2023-01-10 23:23:03] Starting Grizzly Replay
[2023-01-10 23:23:03] Ignoring: log-limit, timeout
[2023-01-10 23:23:03] Using time limit: 30s, timeout: 45s
[2023-01-10 23:23:03] Repeat: 1, Minimum crashes: 1, Relaunch 1
[2023-01-10 23:23:09] Running test (1/1)...
[2023-01-10 23:23:13] No results detected
[2023-01-10 23:23:13] Shutting down...
[2023-01-10 23:23:13] Done.

I'm going to mark this p3/s4 until we have more info on how to repro. Tyson, if you have additional info here, I'd be happy to try again.

Severity: -- → S4
Flags: needinfo?(twsmith)
Priority: -- → P3
Attached file testcase.html

I think that was my fault. That test case didn't work for me locally either. I must have mistyped the timeout. I've updated the test case with the correct timeout and it works for me very reliably now, sorry about that.

The grizzly command looks good to me. The only thing I'd try differently is use --relaunch 1 when using --repeat #, since this appears to be shutdown related.

I can also try to grab a Pernosco session.

Attachment #9311466 - Attachment is obsolete: true
Flags: needinfo?(twsmith)
Keywords: bugmon

I can now reproduce locally - thank you Tyson!
I've walked back through versions with the following results:

2022-03-17 good
2022-03-18 bad

In the morning I'll see if I can narrow down the commit range further.

Testcase crashes using the initial build (mozilla-central 20221110163712-963d3b67f524) but not with tip (mozilla-central 20230111043919-66c42cef8816.)

Unable to bisect testcase (Start build didn't crash!):

Start: 963d3b67f5243687fcb5bc122a22a536caa9a9fd (20221110163712)
End: 66c42cef8816a3264963ad03ee2dc74cead27fef (20230111043919)
BuildFlags: BuildFlags(asan=False, tsan=False, debug=True, fuzzing=True, coverage=False, valgrind=False, no_opt=False, fuzzilli=False, nyx=False)

Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.

Keywords: bugmon
Whiteboard: [bugmon:bisected,confirmed]

(In reply to Bugmon [:jkratzer for issues] from comment #5)

Testcase crashes using the initial build (mozilla-central 20221110163712-963d3b67f524) but not with tip (mozilla-central 20230111043919-66c42cef8816.)
This is just intermittent enough that 2023-01-11 (tip) passed, but it fails quite often, so this has not been bisected.

Whiteboard: [bugmon:bisected,confirmed] → [bugmon:confirmed]

After looking at the push log for from 2022-03-17 to 2022-03-18 I was able to trace this to Bug 1757505. I built the commit prior to Bug 1757505 landing, 9140145f3d9d, and ran with 50 repeats and no failures. I built 4eaaabf753cd for last of Bug 1757505 commits and get failures.

Karl, would you mind taking a look?

Severity: S4 → S3
Flags: needinfo?(karlt)
Priority: P3 → P2
Regressed by: 1757505

Set release status flags based on info from the regressing bug 1757505

Set release status flags based on info from the regressing bug 1757505

Michael, is there something we can do here for the releases in flight? Thanks

Flags: needinfo?(mfroman)

Not until Karl has time to look into the issue.

Flags: needinfo?(mfroman)

I can reproduce this with python -m fuzzfetch -d --fuzzing -n firefox, but not with my own local --enable-fuzzing --enable-debug build.

Assignee: nobody → karlt
Status: NEW → ASSIGNED
Flags: needinfo?(karlt)
Attached file log_stderr.txt

This is log_stderr.txt from the -l option to grizzly.replay and MOZ_LOG=CamerasChild:4,CamerasParent:4 and MOZ_IPC_MESSAGE_LOG=PCamerasChild,PCamerasParent in the environment.
Salient lines below:

CamerasChild sends ReleaseCapture and CamerasParent receives it.

[Child 642518: MediaSupervisor #1]: D/CamerasChild int mozilla::camera::CamerasChild::ReleaseCapture(mozilla::camera::PCamerasChild::CaptureEngine, const int)
[time: 1675999320045626][642518->642272] [PCamerasChild] Sending  PCameras::Msg_ReleaseCapture
[time: 1675999320045721][642272<-642518] [PCamerasParent] Received  PCameras::Msg_ReleaseCapture
[Parent 642272: IPDL Background]: D/CamerasParent virtual mozilla::ipc::IPCResult mozilla::camera::CamerasParent::RecvReleaseCapture(const mozilla::camera::PCamerasParent::CaptureEngine &, const int &)
[Parent 642272: IPDL Background]: D/CamerasParent RecvReleaseCamera device nr 2

PCamerasParent::Send__delete__() is called. This is only triggered after profileBeforeChange. CamerasParent can't send or receive after this point.
The CamerasChild receives the associated ActorDestroy. This notifies the "MediaSupervisor #1" thread waiting for the reply from the parent, which then returns and error.
"Dispatch for IPC" did not actually fail, but no response was received because the CamerasParent is no longer an IPC Actor.
The assertion fails because it expects a response from ReleaseCapture.

[time: 1675999320740894][642272->642518] [PCamerasParent] Sending  PCameras::Msg___delete__
[Parent 642272: IPDL Background]: D/CamerasParent virtual void mozilla::camera::CamerasParent::ActorDestroy(mozilla::ipc::IProtocol::ActorDestroyReason)
[time: 1675999320741058][642518<-642272] [PCamerasChild] Received  PCameras::Msg___delete__
[Child 642518: Cameras IPC]: D/CamerasChild ActorDestroy
[Child 642518: MediaSupervisor #1]: D/CamerasChild Cameras dispatch for IPC failed in ReleaseCapture
Assertion failure: false (MOZ_ASSERT_UNREACHABLE: Couldn't release allocated device), at /builds/worker/checkouts/gecko/dom/media/webrtc/MediaEngineRemoteVideoSource.cpp:188

The CamerasParent bails out of responding to ReleaseCapture. "child not alive" just means that ActorDestroy() was called on the CamerasParent.

[Parent 642272: VideoCapture]: D/CamerasParent bool mozilla::camera::CamerasParent::SetupEngine(mozilla::camera::PCamerasParent::CaptureEngine)
[Parent 642272: VideoCapture]: D/CamerasParent void mozilla::camera::CamerasParent::CloseEngines()
[Parent 642272: IPDL Background]: D/CamerasParent RecvReleaseCapture: child not alive

This assertion failure would not have happened without the Send__delete__() call added in https://hg.mozilla.org/integration/autoland/rev/0dfbf71254ac.
However, the CamerasParent is shutting down and releasing its resources, so it is OK that the CameraChild does not get a response from the ReleaseCapture message.

The assertion is debug-only, but there is a similar MOZ_DIAGNOSTIC_ASSERT for the StopCapture message sent immediately before ReleaseCapture.

(In reply to Karl Tomlinson (:karlt) from comment #13)

This assertion failure would not have happened without the Send__delete__() call added in https://hg.mozilla.org/integration/autoland/rev/0dfbf71254ac.

Perhaps it could have happened due to the video capture thread shutting down.

The Send__delete__() made sure of it. Thank you for narrowing down the regression range, Michael.

Blocks: 1679799

Before https://hg.mozilla.org/integration/autoland/rev/32b3a1dd3940#l1.23, DispatchToParent() would return an old value ofmReplySuccessifCamerasChild` had been disconnected from IPC. That old value would typically be true from a previous call, and so the assertion would have usually thought the call had succeeded when no reply had been received.

Pushed by ktomlinson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/16687ff29b32
don't intentionally crash when CamerasParent does not reply to resource deallocation messages r=pehrsons
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 112 Branch

The patch landed in nightly and beta is affected.
:karlt, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox111 to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(karlt)

Doesn't affect release builds.

Flags: needinfo?(karlt)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: