Assertion failure: false (MOZ_ASSERT_UNREACHABLE: Couldn't release allocated device), at /builds/worker/checkouts/gecko/dom/media/webrtc/MediaEngineRemoteVideoSource.cpp:188
Categories
(Core :: WebRTC, defect, P2)
Tracking
()
People
(Reporter: tsmith, Assigned: karlt)
References
(Blocks 2 open bugs, Regression)
Details
(Keywords: assertion, regression, testcase, Whiteboard: [bugmon:confirmed])
Attachments
(3 files, 1 obsolete file)
Found while fuzzing m-c 20221110-963d3b67f524 (--enable-debug --enable-fuzzing)
To reproduce via Grizzly Replay:
$ pip install fuzzfetch grizzly-framework
$ python -m fuzzfetch -d --fuzzing -n firefox
$ python -m grizzly.replay ./firefox/firefox testcase.html
Assertion failure: false (MOZ_ASSERT_UNREACHABLE: Couldn't release allocated device), at /builds/worker/checkouts/gecko/dom/media/webrtc/MediaEngineRemoteVideoSource.cpp:188
#0 0x7fe0d5a25c0f in mozilla::MediaEngineRemoteVideoSource::Deallocate() /builds/worker/checkouts/gecko/dom/media/webrtc/MediaEngineRemoteVideoSource.cpp:188:5
#1 0x7fe0d559b0e4 in operator() /builds/worker/checkouts/gecko/dom/media/MediaManager.cpp:3906:15
#2 0x7fe0d559b0e4 in mozilla::media::LambdaTask<mozilla::DeviceListener::Stop()::$_37>::Run() /builds/worker/workspace/obj-build/dist/include/mozilla/media/MediaTaskUtils.h:36:5
#3 0x7fe0d184c475 in mozilla::TaskQueue::Runner::Run() /builds/worker/checkouts/gecko/xpcom/threads/TaskQueue.cpp:259:20
#4 0x7fe0d18681ab in nsThreadPool::Run() /builds/worker/checkouts/gecko/xpcom/threads/nsThreadPool.cpp:310:14
#5 0x7fe0d185f3d8 in nsThread::ProcessNextEvent(bool, bool*) /builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp:1191:16
#6 0x7fe0d186585d in NS_ProcessNextEvent(nsIThread*, bool) /builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp:476:10
#7 0x7fe0d24589da in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) /builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp:300:20
#8 0x7fe0d237b6d8 in MessageLoop::RunInternal() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:381:10
#9 0x7fe0d237b5e1 in RunHandler /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:374:3
#10 0x7fe0d237b5e1 in MessageLoop::Run() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:356:3
#11 0x7fe0d185a8d7 in nsThread::ThreadFunc(void*) /builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp:383:10
#12 0x7fe0e4608c86 in _pt_root /builds/worker/checkouts/gecko/nsprpub/pr/src/pthreads/ptthread.c:201:5
#13 0x7fe0e4eb1b42 in start_thread nptl/pthread_create.c:442:8
#14 0x7fe0e4f439ff misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Comment 1•1 year ago
Unable to reproduce bug 1809354 using build mozilla-central 20221110163712-963d3b67f524. Without a baseline, bugmon is unable to analyze this bug.
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.
Comment 2•1 year ago
I may need more info here, but I've tried the following:
python -m fuzzfetch -d --fuzzing -n firefox
python3 -m fuzzfetch --build 2022-11-10 -d --fuzzing -n firefox
python3 -m fuzzfetch --build 963d3b67f524 -d --fuzzing -n firefox
I've run:
python3 -m grizzly.replay ./firefox/firefox testcase.html
python3 -m grizzly.replay --repeat 300 ./firefox/firefox testcase.html
This is running on Ubuntu 22.04, and I've had no luck reproducing a failure. All my output looks similar to:
[2023-01-10 23:23:03] Starting Grizzly Replay
[2023-01-10 23:23:03] Ignoring: log-limit, timeout
[2023-01-10 23:23:03] Using time limit: 30s, timeout: 45s
[2023-01-10 23:23:03] Repeat: 1, Minimum crashes: 1, Relaunch 1
[2023-01-10 23:23:09] Running test (1/1)...
[2023-01-10 23:23:13] No results detected
[2023-01-10 23:23:13] Shutting down...
[2023-01-10 23:23:13] Done.
I'm going to mark this P3/S4 until we have more info on how to reproduce. Tyson, if you have additional info here, I'd be happy to try again.
Reporter
Comment 3•1 year ago
I think that was my fault. That test case didn't work for me locally either. I must have mistyped the timeout. I've updated the test case with the correct timeout and it works for me very reliably now, sorry about that.
The grizzly command looks good to me. The only thing I'd try differently is using --relaunch 1 when using --repeat #, since this appears to be shutdown-related.
I can also try to grab a Pernosco session.
Comment 4•1 year ago
I can now reproduce locally - thank you Tyson!
I've walked back through versions with the following results:
2022-03-17 good
2022-03-18 bad
In the morning I'll see if I can narrow down the commit range further.
Comment 5•1 year ago
Testcase crashes using the initial build (mozilla-central 20221110163712-963d3b67f524) but not with tip (mozilla-central 20230111043919-66c42cef8816).
Unable to bisect testcase (Start build didn't crash!):
Start: 963d3b67f5243687fcb5bc122a22a536caa9a9fd (20221110163712)
End: 66c42cef8816a3264963ad03ee2dc74cead27fef (20230111043919)
BuildFlags: BuildFlags(asan=False, tsan=False, debug=True, fuzzing=True, coverage=False, valgrind=False, no_opt=False, fuzzilli=False, nyx=False)
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.
Comment 6•1 year ago
(In reply to Bugmon [:jkratzer for issues] from comment #5)
> Testcase crashes using the initial build (mozilla-central 20221110163712-963d3b67f524) but not with tip (mozilla-central 20230111043919-66c42cef8816).
This is just intermittent enough that 2023-01-11 (tip) passed, but it fails quite often, so this has not been bisected.
Comment 7•1 year ago
After looking at the push log from 2022-03-17 to 2022-03-18, I was able to trace this to Bug 1757505. I built the commit prior to Bug 1757505 landing, 9140145f3d9d, and ran with 50 repeats with no failures. I built 4eaaabf753cd, the last of the Bug 1757505 commits, and got failures.
Karl, would you mind taking a look?
Comment 8•1 year ago
Set release status flags based on info from the regressing bug 1757505
Updated•1 year ago
Comment 9•1 year ago
Set release status flags based on info from the regressing bug 1757505
Comment 10•1 year ago
Michael, is there something we can do here for the releases in flight? Thanks
Updated•1 year ago
Assignee
Comment 12•1 year ago
I can reproduce this with python -m fuzzfetch -d --fuzzing -n firefox, but not with my own local --enable-fuzzing --enable-debug build.
Assignee
Comment 13•1 year ago
This is log_stderr.txt from the -l option to grizzly.replay, with MOZ_LOG=CamerasChild:4,CamerasParent:4 and MOZ_IPC_MESSAGE_LOG=PCamerasChild,PCamerasParent in the environment.
Salient lines below:
CamerasChild sends ReleaseCapture and CamerasParent receives it:
[Child 642518: MediaSupervisor #1]: D/CamerasChild int mozilla::camera::CamerasChild::ReleaseCapture(mozilla::camera::PCamerasChild::CaptureEngine, const int)
[time: 1675999320045626][642518->642272] [PCamerasChild] Sending PCameras::Msg_ReleaseCapture
[time: 1675999320045721][642272<-642518] [PCamerasParent] Received PCameras::Msg_ReleaseCapture
[Parent 642272: IPDL Background]: D/CamerasParent virtual mozilla::ipc::IPCResult mozilla::camera::CamerasParent::RecvReleaseCapture(const mozilla::camera::PCamerasParent::CaptureEngine &, const int &)
[Parent 642272: IPDL Background]: D/CamerasParent RecvReleaseCamera device nr 2
PCamerasParent::Send__delete__() is called. This is only triggered after profileBeforeChange. CamerasParent can't send or receive after this point.
The CamerasChild receives the associated ActorDestroy. This notifies the "MediaSupervisor #1" thread waiting for the reply from the parent, which then returns an error. "Dispatch for IPC" did not actually fail; rather, no response was received because the CamerasParent is no longer an IPC actor. The assertion fails because it expects a response to ReleaseCapture:
[time: 1675999320740894][642272->642518] [PCamerasParent] Sending PCameras::Msg___delete__
[Parent 642272: IPDL Background]: D/CamerasParent virtual void mozilla::camera::CamerasParent::ActorDestroy(mozilla::ipc::IProtocol::ActorDestroyReason)
[time: 1675999320741058][642518<-642272] [PCamerasChild] Received PCameras::Msg___delete__
[Child 642518: Cameras IPC]: D/CamerasChild ActorDestroy
[Child 642518: MediaSupervisor #1]: D/CamerasChild Cameras dispatch for IPC failed in ReleaseCapture
Assertion failure: false (MOZ_ASSERT_UNREACHABLE: Couldn't release allocated device), at /builds/worker/checkouts/gecko/dom/media/webrtc/MediaEngineRemoteVideoSource.cpp:188
The CamerasParent bails out of responding to ReleaseCapture. "child not alive" just means that ActorDestroy() was called on the CamerasParent:
[Parent 642272: VideoCapture]: D/CamerasParent bool mozilla::camera::CamerasParent::SetupEngine(mozilla::camera::PCamerasParent::CaptureEngine)
[Parent 642272: VideoCapture]: D/CamerasParent void mozilla::camera::CamerasParent::CloseEngines()
[Parent 642272: IPDL Background]: D/CamerasParent RecvReleaseCapture: child not alive
This assertion failure would not have happened without the Send__delete__() call added in https://hg.mozilla.org/integration/autoland/rev/0dfbf71254ac. However, the CamerasParent is shutting down and releasing its resources, so it is OK that the CamerasChild does not get a response to the ReleaseCapture message.
Assignee
Comment 14•1 year ago
The assertion is debug-only, but there is a similar MOZ_DIAGNOSTIC_ASSERT for the StopCapture message sent immediately before ReleaseCapture.
(In reply to Karl Tomlinson (:karlt) from comment #13)
> This assertion failure would not have happened without the Send__delete__() call added in https://hg.mozilla.org/integration/autoland/rev/0dfbf71254ac.
Perhaps it could have happened anyway due to the video capture thread shutting down; the Send__delete__() made sure of it. Thank you for narrowing down the regression range, Michael.
Assignee
Comment 15•1 year ago
The assertions date back to https://hg.mozilla.org/mozilla-central/rev/f30324f2c1a5036d656c38a460810aa394d3cbac
Assignee
Comment 16•1 year ago
Before https://hg.mozilla.org/integration/autoland/rev/32b3a1dd3940#l1.23, DispatchToParent() would return an old value of mReplySuccess if CamerasChild had been disconnected from IPC. That old value would typically be true from a previous call, so the assertion would usually have treated the call as successful even though no reply had been received.
Assignee
Comment 17•1 year ago
Comment 18•1 year ago
Pushed by ktomlinson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/16687ff29b32
Don't intentionally crash when CamerasParent does not reply to resource deallocation messages. r=pehrsons
Comment 19•1 year ago
bugherder
Comment 20•1 year ago
The patch landed in nightly and beta is affected.
:karlt, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set status-firefox111 to wontfix.
For more information, please visit auto_nag documentation.