Closed Bug 1576335 Opened 3 years ago Closed 6 months ago

Crash in [@ AsyncShutdownTimeout | profile-before-change | CamerasParent 1]

Categories

(Core :: WebRTC: Audio/Video, defect, P2)

Unspecified
Linux
defect

RESOLVED FIXED
96 Branch
Tracking Status
firefox-esr91 --- wontfix
firefox94 --- wontfix
firefox95 --- wontfix
firefox96 --- fixed

People

(Reporter: Usul, Assigned: pehrsons)

References

(Regression)

Details

(Keywords: crash, regression)

Attachments

(2 files)

This bug is for crash report bp-80a47a70-76ad-4df0-b800-460010190824.

Top 10 frames of crashing thread:

0 firefox-bin mozalloc_abort memory/mozalloc/mozalloc_abort.cpp:33
1 libxul.so Abort xpcom/base/nsDebugImpl.cpp:439
2 libxul.so NS_DebugBreak xpcom/string/nsSubstring.cpp
3 libxul.so nsDebugImpl::Abort xpcom/base/nsDebugImpl.cpp:133
4 libxul.so NS_InvokeByIndex 
5 libxul.so XPCWrappedNative::CallMethod js/xpconnect/src/XPCWrappedNative.cpp:1149
6 libxul.so XPC_WN_CallMethod js/xpconnect/src/XPCWrappedNativeJSOps.cpp:943
7 libxul.so js::InternalCallOrConstruct js/src/vm/Interpreter.cpp:539
8 libxul.so Interpret js/src/vm/Interpreter.cpp:594
9 libxul.so js::RunScript js/src/vm/Interpreter.cpp:424

jib, would you mind taking a look?

From the crash report, it seems Firefox aborts here.

Flags: needinfo?(jib)
Priority: -- → P2

Note the thread 0 stack merely points to the shutdown blocker timing out ("CamerasParent 1").

Clicking Show other threads and searching for "CamerasParent" reveals more info in thread 20 (IPDL Background):

1 libxul.so mozilla::camera::CamerasParent::DispatchToVideoCaptureThread(RefPtr<mozilla::Runnable>) dom/media/systemservices/CamerasParent.cpp:188
2 libxul.so mozilla::camera::CamerasParent::StopVideoCapture() dom/media/systemservices/CamerasParent.cpp:210
3 libxul.so mozilla::ipc::IProtocol::DestroySubtree(mozilla::ipc::IProtocol::ActorDestroyReason) ipc/glue/ProtocolUtils.cpp:572
4 libxul.so mozilla::ipc::IProtocol::DestroySubtree(mozilla::ipc::IProtocol::ActorDestroyReason) ipc/glue/ProtocolUtils.cpp:560
5 libxul.so mozilla::ipc::PBackgroundParent::OnChannelError() ipc/ipdl/PBackgroundParent.cpp:5987

...i.e. blocking on sThreadMonitor here:

nsresult CamerasParent::DispatchToVideoCaptureThread(RefPtr<Runnable> event) {
  // Don't try to dispatch if we're already on the right thread.
  // There's a potential deadlock because the sThreadMonitor is likely
  // to be taken already.
→ MonitorAutoLock lock(*sThreadMonitor);

...which is held by thread 66 (VideoCapture):

2 libxul.so rtc::PlatformThread::Stop() media/webrtc/trunk/webrtc/rtc_base/platform_thread.cc:220
3 firefox-bin arena_t::DallocSmall(arena_chunk_t*, void*, arena_chunk_map_t*) memory/build/mozjemalloc.cpp:3292
4 firefox-bin arena_t::DallocSmall(arena_chunk_t*, void*, arena_chunk_map_t*) memory/build/mozjemalloc.cpp:3242
5 firefox-bin arena_dalloc(void*, unsigned long, arena_t*) memory/build/mozjemalloc.cpp:3328
6 firefox-bin replace_free(void*) memory/replace/phc/PHC.cpp:1103
7 firefox-bin replace_free(void*) memory/replace/phc/PHC.cpp:1103
8 firefox-bin arena_t::DallocSmall(arena_chunk_t*, void*, arena_chunk_map_t*) memory/build/mozjemalloc.cpp:3242
9 libxul.so webrtc::videocapturemodule::DeviceInfoLinux::~DeviceInfoLinux() media/webrtc/trunk/webrtc/modules/video_capture/linux/device_info_linux.cc:215
10 libxul.so webrtc::videocapturemodule::DeviceInfoLinux::~DeviceInfoLinux() media/webrtc/trunk/webrtc/modules/video_capture/linux/device_info_linux.cc:210
11 libxul.so _fini
12 libxul.so std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() /builds/worker/fetches/clang/include/c++/6.4.0/bits/shared_ptr_base.h:150
13 libxul.so mozilla::camera::VideoEngine::~VideoEngine() dom/media/systemservices/VideoEngine.h:26
14 libxul.so _fini
15 libxul.so mozilla::camera::CamerasParent::CloseEngines() dom/media/systemservices/VideoEngine.h:34

...here:

void CamerasParent::StopVideoCapture() {
  LOG(("%s", __PRETTY_FUNCTION__));
  // We are called from the main thread (xpcom-shutdown) or
  // from PBackground (when the Actor shuts down).
  // Shut down the WebRTC stack (on the capture thread)
  RefPtr<CamerasParent> self(this);
  DebugOnly<nsresult> rv =
      DispatchToVideoCaptureThread(NewRunnableFrom([self]() {
        MonitorAutoLock lock(*(self->sThreadMonitor));
→       self->CloseEngines();

...but it appears blocked on some device info shutdown not happening here (Linux in this case):

DeviceInfoLinux::~DeviceInfoLinux() {
#ifdef WEBRTC_LINUX
    ++_isShutdown;

    if (_inotifyEventThread) {
→       _inotifyEventThread->Stop();
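
The three excerpts above describe a lock that is held across a thread join. The following is a minimal sketch of that pattern, not Firefox code. `monitor` stands in for sThreadMonitor, the `capture` thread for the VideoCapture thread running CloseEngines(), and `helper` for the inotify event thread. The function name `DispatcherGetsMonitor` and the promise that eventually releases the helper are assumptions, added only so the sketch terminates.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical sketch, not Firefox code. Returns true if a second thread
// manages to take `monitor` within `wait` while the "capture" thread holds it
// and is joining a helper thread that does not exit on its own.
bool DispatcherGetsMonitor(std::chrono::milliseconds wait) {
  assert(wait.count() >= 0);  // precondition of the sketch

  std::timed_mutex monitor;          // stands in for sThreadMonitor
  std::promise<void> releaseHelper;  // stands in for the event that never arrives
  std::future<void> helperGate = releaseHelper.get_future();
  std::promise<void> monitorTaken;
  std::future<void> monitorTakenGate = monitorTaken.get_future();

  std::thread capture([&] {
    std::lock_guard<std::timed_mutex> lock(monitor);
    monitorTaken.set_value();
    // Like ~DeviceInfoLinux(): stop a helper thread and wait for it to exit.
    std::thread helper([&] { helperGate.wait(); });
    helper.join();  // blocks with the monitor still held
  });

  // Make sure the capture thread owns the monitor before we try to take it.
  monitorTakenGate.wait();

  // Like DispatchToVideoCaptureThread(): try to take the same monitor.
  bool acquired = monitor.try_lock_for(wait);
  if (acquired) {
    monitor.unlock();
  }

  // Only the sketch can do this; in the crash report nothing released the helper.
  releaseHelper.set_value();
  capture.join();
  return acquired;
}
```

As long as the helper stays stuck, the dispatching side cannot take the monitor, so a call such as `DispatcherGetsMonitor(std::chrono::milliseconds(50))` returns false. In the real process nothing plays the role of `releaseHelper`, so the wait lasts until the shutdown blocker times out.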

Dan, are these the remaining crashes you're referring to in bug 1552755 comment 12? They seem to be in the same vicinity.

Flags: needinfo?(jib) → needinfo?(dminor)

To complicate matters, the Windows reports appear different. E.g., this one has the VideoCapture thread hanging on _apiLock here.

We should probably be careful when commenting to mention which platform we're referring to. Andreas, any ideas here?

Flags: needinfo?(apehrson)
Duplicate of this bug: 1575759

Duping bug 1575759 into this bug since the crash signature is more specific here. I did a similar analysis in that bug, though. In both bugs, the thread we're trying to join is sitting in read. I don't know why that would be, and I don't have time to dig into it this week.

Flags: needinfo?(apehrson)
Crash Signature: [@ AsyncShutdownTimeout | profile-before-change | CamerasParent 1] → [@ AsyncShutdownTimeout | profile-before-change | CamerasParent 1] [@ shutdownhang | libpthread-2.29.so@0x1193d]

Ludovic, is this reproducible for you? Would you be able to reproduce with some logging enabled? The modules I'm thinking of would be MOZ_LOG=timestamp,sync,MediaManager:5,webrtc_trace:5,CamerasParent:5.

Flags: needinfo?(ludovic)

(In reply to Andreas Pehrson [:pehrsons] from comment #6)

> Ludovic, is this reproducible for you? Would you be able to reproduce with some logging enabled? The modules I'm thinking of would be MOZ_LOG=timestamp,sync,MediaManager:5,webrtc_trace:5,CamerasParent:5.

It was happening quite frequently on update restarts. I'll set up logging and will monitor for the crash.

Flags: needinfo?(ludovic)

I'm sorry, I don't recall which signatures I was referring to in bug 1552755 comment 12. I wish I had linked to one of them at the time. The fix in bug 1552755 was for Windows code only, so I would have been talking about other Windows shutdown hangs, if that helps.

Flags: needinfo?(dminor)
Attached file Log from the console
Sorry it took a while. Here is what's visible on the console for the latest crash.

What are the steps you took leading to this log? What am I looking at?

It seems to start with a crash (but doesn't contain many clues about what happened during that session before the crash), then Firefox is relaunched and does not crash (which is most of the log). It would be interesting to know what those initial [Parent 616, Gecko_IOThread] WARNING: pipe error (1606): Connection reset by peer messages were.

Flags: needinfo?(ludovic)

(In reply to Andreas Pehrson [:pehrsons] from comment #10)

> What are the steps you took leading to this log? What am I looking at?

It is the output from the console, as MOZ_LOG=timestamp,sync,MediaManager:5,webrtc_trace:5,CamerasParent:5 didn't generate any files (or I didn't find them if any were generated).

Steps are pretty simple: wait for an update and click update -> crash on restart.

> It seems to start with a crash (but doesn't contain many clues about what happened during that session before the crash), then Firefox is relaunched and does not crash (which is most of the log). It would be interesting to know what those initial [Parent 616, Gecko_IOThread] WARNING: pipe error (1606): Connection reset by peer messages were.

No idea. I can't crash at will, but this crash comes up often when I restart for an update.

Flags: needinfo?(ludovic)

Not seeing any holes looking at the code. But it's very low level so I'm sure there are some gotchas I could be missing.

Can you recall whether you plugged or unplugged any cameras or audio devices while the instances that crash like this were running?

Do you have /dev/v4l/by-path/ and /dev/snd/by-path/ on your system? Do they look normal?

Flags: needinfo?(ludovic)

(In reply to Andreas Pehrson [:pehrsons] from comment #12)

> Not seeing any holes looking at the code. But it's very low level so I'm sure there are some gotchas I could be missing.
>
> Can you recall whether you plugged or unplugged any cameras or audio devices while the instances that crash like this were running?

I might have unplugged my jack-based headset.

> Do you have /dev/v4l/by-path/ and /dev/snd/by-path/ on your system? Do they look normal?

[root@saraan ~]# ls /dev/v4l/by-path/
pci-0000:00:14.0-usb-0:8:1.0-video-index0
pci-0000:00:14.0-usb-0:8:1.0-video-index1
[root@saraan ~]#

[root@saraan ~]# ls /dev/snd/by-path/
pci-0000:00:1f.3
[root@saraan ~]#

I'm not a specialist, but these look OK.

Flags: needinfo?(ludovic)

(In reply to Ludovic Hirlimann [:Usul] from comment #13)

> (In reply to Andreas Pehrson [:pehrsons] from comment #12)
>
> > Not seeing any holes looking at the code. But it's very low level so I'm sure there are some gotchas I could be missing.
> >
> > Can you recall whether you plugged or unplugged any cameras or audio devices while the instances that crash like this were running?
>
> I might have unplugged my jack-based headset.

It would be great to verify this and then reduce it to a minimal set of steps that you can reproduce. If you can get that far, I'd like to know whether we're looking at a regression or whether it's always been like this.

I'll also try to repro with some unplugging, but in my experience this generally works. I'm sorry I can't be of more help.

> > Do you have /dev/v4l/by-path/ and /dev/snd/by-path/ on your system? Do they look normal?
>
> [root@saraan ~]# ls /dev/v4l/by-path/
> pci-0000:00:14.0-usb-0:8:1.0-video-index0
> pci-0000:00:14.0-usb-0:8:1.0-video-index1
> [root@saraan ~]#
>
> [root@saraan ~]# ls /dev/snd/by-path/
> pci-0000:00:1f.3
> [root@saraan ~]#
>
> I'm not a specialist, but these look OK.

They look fine to me.

This is not a true regression from bug 1407415, since that bug just changed how the problem is exposed. It's good to know, however, that bug 1407415 is the reason this shows up so suddenly in 68.

Regressed by: 1407415
Crash Signature: [@ AsyncShutdownTimeout | profile-before-change | CamerasParent 1] [@ shutdownhang | libpthread-2.29.so@0x1193d] → [@ AsyncShutdownTimeout | profile-before-change | CamerasParent 1] [@ AsyncShutdownTimeout | profile-before-change | CamerasParent 1,CamerasParent 2] [@ shutdownhang | libpthread-2.29.so@0x1193d]

When I investigated bug 1692908, I happened to hit this crash on Firefox 86.0b9 on Windows.

https://crash-stats.mozilla.org/report/index/02077044-a36f-4d56-b869-296f10210218
https://crash-stats.mozilla.org/report/index/b88a1d98-e49d-40bf-9066-e044f0210218

Here are the repro steps: (You don't need an AVerMedia device)

  1. Go to https://www.avermedia.com/us/support/download
  2. Choose "Webcam" and "Live Streamer CAM 313 - PW313"
  3. Download "AVerMedia CamEngine (Win) v2.0.0.51" (AVerMedia_Engine_Installer_v2.0.0.51.exe)
  4. Install AVerMedia CamEngine (all default settings)
  5. Please make sure avmvirtualsource.ax exists in your install directory (default: C:\Program Files (x86)\AVerMedia\AVerMedia Engine)
  6. Open an elevated prompt and run regsvr32 <path to avmvirtualsource.ax>
  7. Go to https://webcamtests.com/
  8. If avmvirtualsource.ax is registered correctly, you will be able to choose "AVerMedia Cam Engine Source". Select it and start the test.
  9. Somehow the page continues to show the message "Waiting for your permission...". Wait some seconds and close Firefox.
  10. The content process continues to run, and crashes eventually. That's the repro.

This scenario loads AVerMedia's virtual camera device avmvirtualsource.ax, which may have a bug because avmvirtualsource.ax causes a crash (bug 1692908), but hopefully this info can be a hint.

QA Whiteboard: qa-not-actionable

Looking at this report, VideoCaptureThread is hanging on a lock in ~DeviceInfoImpl. Looking at how upstream uses this lock there's a path that can leave it hanging locked exclusively. A recipe for... well, let's just say it's a likely reason for this bug.

At least upstream has fixed this by swapping to a Mutex and avoiding all the manual unlock/lock operations. Let's cherry-pick that patch.

Assignee: nobody → apehrson
Status: NEW → ASSIGNED

This is a cherry-pick of upstream libwebrtc's
https://webrtc.googlesource.com/src/+/5b5de21accfd29e21cba2d6f38e3087e1f731be6

This gets rid of the path in DeviceInfoImpl::GetBestMatchedCapability that can
leave the _apiLock exclusively locked forever.
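
To illustrate the kind of leak described above, here is a minimal sketch that contrasts manual lock/unlock calls with a scoped guard. It is not the libwebrtc code and does not reproduce the upstream patch. `TrackedMutex`, `CapabilityPicker`, the width-matching logic, and `ReleaseLeakedLock` are all hypothetical names introduced for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdlib>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical sketch; none of these names come from libwebrtc.
// A mutex wrapper that records whether it is currently held.
class TrackedMutex {
 public:
  void lock() {
    mMutex.lock();
    mHeld = true;
  }
  void unlock() {
    assert(mHeld);  // unlocking a lock that is not held is a bug
    mHeld = false;
    mMutex.unlock();
  }
  bool held() const { return mHeld; }

 private:
  std::mutex mMutex;
  std::atomic<bool> mHeld{false};
};

class CapabilityPicker {
 public:
  explicit CapabilityPicker(std::vector<int> widths)
      : mWidths(std::move(widths)) {}

  // Manual locking: the early return skips unlock(), so the lock leaks.
  int BestMatchManual(int wanted) {
    mLock.lock();
    if (mWidths.empty()) {
      return -1;  // intentional bug: mLock is still held here
    }
    int best = Closest(wanted);
    mLock.unlock();
    return best;
  }

  // Scoped locking: the guard unlocks on every return path.
  int BestMatchScoped(int wanted) {
    std::lock_guard<TrackedMutex> guard(mLock);
    if (mWidths.empty()) {
      return -1;
    }
    return Closest(wanted);
  }

  bool LockHeld() const { return mLock.held(); }

  // Exists only so the sketch can clean up after the leaky path.
  void ReleaseLeakedLock() { mLock.unlock(); }

 private:
  // Returns the stored width closest to `wanted`; requires a non-empty list.
  int Closest(int wanted) const {
    int best = mWidths.front();
    for (int w : mWidths) {
      if (std::abs(w - wanted) < std::abs(best - wanted)) {
        best = w;
      }
    }
    return best;
  }

  std::vector<int> mWidths;
  TrackedMutex mLock;
};
```

With an empty capability list, `BestMatchManual` returns with the lock still held, so any later caller (including a destructor that takes the same lock) would wait forever. `BestMatchScoped` releases the lock on the same path because the guard's destructor runs on every return.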

Pushed by pehrsons@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/32edcae2b552
Fix DeviceInfoImpl::_apiLock leaks by cherry-pick. r=padenot
Status: ASSIGNED → RESOLVED
Closed: 6 months ago
Resolution: --- → FIXED
Target Milestone: --- → 96 Branch
Has Regression Range: --- → yes

(In reply to Olivier Crête from comment #22)

> I'm still seeing this in 96: https://crash-stats.mozilla.org/report/index/ab924bb2-2ffe-46b7-a399-7ab470220127

Thanks for the report! That is a different failure mode. I filed bug 1752326.
