Closed Bug 1227407 Opened 9 years ago Closed 9 years ago

Correct a logic error in camera code that could be causing a crash

Categories

(Core :: WebRTC: Audio/Video, defect, P1)

44 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla45
Tracking Status
firefox44 --- fixed
firefox45 --- fixed
b2g-v2.5 --- fixed

People

(Reporter: jib, Assigned: gcp)

References

Details

Attachments

(2 files)

See bug 1218799 comment 20.

The problem is already fixed in trunk, so we need to identify which other bug we need to uplift to 44.
[Tracking Requested - why for this release]:
I'm not able to reproduce the crash with the STR you gave.

Instead I found another one.

(gdb) bt
#0  0x00007fbe751e46fd in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fbe751e4594 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:137
#2  0x00007fbe79d9e05d in ah_crap_handler(int) (signum=signum@entry=11)
    at /home/morbo/hg/mozilla-aurora/toolkit/xre/nsSigHandlers.cpp:103
#3  0x00007fbe79d9e09d in child_ah_crap_handler(int) (signum=11)
    at /home/morbo/hg/mozilla-aurora/toolkit/xre/nsSigHandlers.cpp:115
#4  0x00007fbe7acaf62d in AsmJSFaultHandler(int, siginfo_t*, void*) (signum=<optimized out>, info=0x7ffc18aae630, context=0x7ffc18aae500) at /home/morbo/hg/mozilla-aurora/js/src/asmjs/AsmJSSignalHandlers.cpp:1162
#5  0x00007fbe7db038d0 in <signal handler called> () at /lib/x86_64-linux-gnu/libpthread.so.0
#6  0x00007fbe76c46058 in mozilla::AbstractThread::MainThread() ()
    at /home/morbo/hg/mozilla-aurora/xpcom/threads/AbstractThread.cpp:116
#7  0x00007fbe78b0eab0 in mozilla::MediaStreamGraphImpl::RunInStableState(bool) (this=<optimized out>, aSourceIsMSG=<optimized out>) at /home/morbo/hg/mozilla-aurora/dom/media/MediaStreamGraph.cpp:1514
#8  0x00007fbe78b0ec11 in mozilla::(anonymous namespace)::MediaStreamGraphStableStateRunnable::Run() (this=<optimized out>) at /home/morbo/hg/mozilla-aurora/dom/media/MediaStreamGraph.cpp:1319
#9  0x00007fbe76bd29b8 in mozilla::CycleCollectedJSRuntime::ProcessStableStateQueue() (this=this@entry=0x7fbe6b22b000)
    at /home/morbo/hg/mozilla-aurora/xpcom/base/CycleCollectedJSRuntime.cpp:1046
#10 0x00007fbe76bd3ba9 in mozilla::CycleCollectedJSRuntime::~CycleCollectedJSRuntime() (this=0x7fbe6b22b000, __in_chrg=<optimized out>) at /home/morbo/hg/mozilla-aurora/xpcom/base/CycleCollectedJSRuntime.cpp:457
#11 0x00007fbe775ea8b0 in XPCJSRuntime::~XPCJSRuntime() (this=this@entry=0x7fbe6b22b000, __in_chrg=<optimized out>)
    at /home/morbo/hg/mozilla-aurora/js/xpconnect/src/XPCJSRuntime.cpp:1555
#12 0x00007fbe775ea8f7 in XPCJSRuntime::~XPCJSRuntime() (this=0x7fbe6b22b000, __in_chrg=<optimized out>)
    at /home/morbo/hg/mozilla-aurora/js/xpconnect/src/XPCJSRuntime.cpp:1619
#13 0x00007fbe7760f111 in nsXPConnect::~nsXPConnect() (this=this@entry=0x7fbe6a261880, __in_chrg=<optimized out>)
    at /home/morbo/hg/mozilla-aurora/js/xpconnect/src/nsXPConnect.cpp:99
#14 0x00007fbe7760f13b in nsXPConnect::~nsXPConnect() (this=this@entry=0x7fbe6a261880, __in_chrg=<optimized out>)
    at /home/morbo/hg/mozilla-aurora/js/xpconnect/src/nsXPConnect.cpp:103
#15 0x00007fbe7760f252 in nsXPConnect::Release() (this=0x7fbe6a261880)
    at /home/morbo/hg/mozilla-aurora/js/xpconnect/src/nsXPConnect.cpp:39
#16 0x00007fbe7761025f in nsXPConnect::ReleaseXPConnectSingleton() ()
    at /home/morbo/hg/mozilla-aurora/js/xpconnect/src/nsXPConnect.cpp:147
#17 0x00007fbe775de301 in xpcModuleDtor() () at /home/morbo/hg/mozilla-aurora/js/xpconnect/src/XPCModule.cpp:22
#18 0x00007fbe79832600 in LayoutModuleDtor() () at /home/morbo/hg/mozilla-aurora/layout/build/nsLayoutModule.cpp:1386
#19 0x00007fbe76c3f276 in nsComponentManagerImpl::Shutdown() (this=0x7fbe6a669040, __in_chrg=<optimized out>)
    at /home/morbo/hg/mozilla-aurora/xpcom/components/nsComponentManager.h:242
#20 0x00007fbe76c3f276 in nsComponentManagerImpl::Shutdown() (this=0x7fbe6a6038e0, __in_chrg=<optimized out>)
    at ../../dist/include/nsAutoPtr.h:74
#21 0x00007fbe76c3f276 in nsComponentManagerImpl::Shutdown() (aE=0x7fbe6a6038e0) at ../../dist/include/nsTArray.h:520
#22 0x00007fbe76c3f276 in nsComponentManagerImpl::Shutdown() (aCount=64, aStart=0, this=0x7fbe6b2967c0)
    at ../../dist/include/nsTArray.h:1999
#23 0x00007fbe76c3f276 in nsComponentManagerImpl::Shutdown() (aCount=64, aStart=0, this=0x7fbe6b2967c0)
    at ../../dist/include/nsTArray.h:1641
#24 0x00007fbe76c3f276 in nsComponentManagerImpl::Shutdown() (this=0x7fbe6b2967c0)
    at ../../dist/include/nsTArray.h:1650
#25 0x00007fbe76c3f276 in nsComponentManagerImpl::Shutdown() (this=0x7fbe6b296690)
    at /home/morbo/hg/mozilla-aurora/xpcom/components/nsComponentManager.cpp:924
#26 0x00007fbe76c83221 in mozilla::ShutdownXPCOM(nsIServiceManager*) (aServMgr=<optimized out>)
    at /home/morbo/hg/mozilla-aurora/xpcom/build/XPCOMInit.cpp:986
#27 0x00007fbe76c83530 in NS_ShutdownXPCOM(nsIServiceManager*) (aServMgr=<optimized out>)
    at /home/morbo/hg/mozilla-aurora/xpcom/build/XPCOMInit.cpp:799
#28 0x00007fbe79d9c429 in XRE_TermEmbedding() () at /home/morbo/hg/mozilla-aurora/toolkit/xre/nsEmbedFunctions.cpp:209
#29 0x00007fbe7704a398 in mozilla::ipc::ScopedXREEmbed::Stop() (this=0x7fbe6b272a58)
    at /home/morbo/hg/mozilla-aurora/ipc/glue/ScopedXREEmbed.cpp:115
#30 0x00007fbe78f18632 in mozilla::dom::ContentProcess::CleanUp() (this=<optimized out>)
    at /home/morbo/hg/mozilla-aurora/dom/ipc/ContentProcess.cpp:97
#31 0x00007fbe79d9d3ae in XRE_InitChildProcess(int, char**, mozilla::gmp::GMPLoader*) (aArgc=<optimized out>, aArgv=<optimized out>, aGMPLoader=<optimized out>) at /home/morbo/hg/mozilla-aurora/toolkit/xre/nsEmbedFunctions.cpp:627
#32 0x00000000004082ab in content_process_main(int, char**) (argc=5, argv=0x7ffc18ab0428)
    at /home/morbo/hg/mozilla-aurora/ipc/app/../contentproc/plugin-container.cpp:237
#33 0x000000000040833a in main(int, char**) (argc=<optimized out>, argv=<optimized out>)
    at /home/morbo/hg/mozilla-aurora/ipc/app/MozillaRuntimeMain.cpp:11
(gdb) up
#1  0x00007fbe751e4594 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:137
137     ../sysdeps/unix/sysv/linux/sleep.c: No such file or directory.
(gdb) up
#2  0x00007fbe79d9e05d in ah_crap_handler (signum=signum@entry=11)
    at /home/morbo/hg/mozilla-aurora/toolkit/xre/nsSigHandlers.cpp:103
103       sleep(_gdb_sleep_duration);
(gdb) up
#3  0x00007fbe79d9e09d in child_ah_crap_handler (signum=11)
    at /home/morbo/hg/mozilla-aurora/toolkit/xre/nsSigHandlers.cpp:115
115       ah_crap_handler(signum);
(gdb) up
#4  0x00007fbe7acaf62d in AsmJSFaultHandler (signum=<optimized out>, info=0x7ffc18aae630, context=0x7ffc18aae500)
    at /home/morbo/hg/mozilla-aurora/js/src/asmjs/AsmJSSignalHandlers.cpp:1162
1162            sPrevSEGVHandler.sa_handler(signum);
(gdb) up
#5  <signal handler called>
(gdb) up
#6  0x00007fbe76c46058 in mozilla::AbstractThread::MainThread ()
    at /home/morbo/hg/mozilla-aurora/xpcom/threads/AbstractThread.cpp:116
116       MOZ_ASSERT(sMainThread);
(gdb) up
#7  0x00007fbe78b0eab0 in mozilla::MediaStreamGraphImpl::RunInStableState (this=<optimized out>, 
    aSourceIsMSG=<optimized out>) at /home/morbo/hg/mozilla-aurora/dom/media/MediaStreamGraph.cpp:1514
1514        AbstractThread::MainThread()->TailDispatcher().DrainDirectTasks();
(In reply to Jan-Ivar Bruaroey [:jib] from comment #0)
> See bug 1218799 comment 20.

  MutexAutoLock requestLock(mRequestMutex);

This is the first use of a CamerasChild member var in that callstack, so what's likely happening is that

  return Cameras()->StopCapture(aCapEngine, capture_id);

is being called on a nullptr.

The way Cameras() is constructed it must return a valid object or we *will* die. So this:

    if (NS_FAILED(rv)) {
      LOG(("Error launching IPC Thread"));
      return nullptr;
    }

...should actually just force-crash instantly. The whole InitializeIPCThread machinery would likely also fail in shutdown but you don't appear to be hitting those asserts. It also looks possible for Shutdown() to have PBackground clean up the object (as CamerasSingleton is a non-owning reference) making this a potential UAF.

Basically, I think I need to fix this code to make sure mCamerasMutex is held across any Cameras() call, because that's the only way to ensure the object is actually alive.

Given that this bug is likely showing up in situations where we're in Shutdown, we should probably also take into account that it may be impossible to still properly set up Cameras(). This kind of logic is there now on the parent side, but it looks the child still assumes it can set up a CamerasChild in all circumstances.
Summary: Find and uplift to 44 a fix to media shutdown crash in bug 1218799 comment 20 → Find and uplift to 44 a fix to media shutdown leak in bug 1218799 comment 30
Summary: Find and uplift to 44 a fix to media shutdown leak in bug 1218799 comment 30 → Find and uplift to 44 a fix to media shutdown crash in bug 1218799 comment 20
Attachment #8692530 - Flags: review?(rjesup)
Attachment #8692530 - Flags: feedback?(jib)
This isn't a fix to uplift, but it does correct a logic error (that still exists on Nightly) that could've caused the original crash.

I guess that the change that stopped this from crashing 45 as compared to 44 is that the scenario (StopCapture called after Shutdown, possibly due to concurrency) doesn't happen any more.
Comment on attachment 8692530 [details] [diff] [review]
Ensure Cameras is alive before calling through it

Review of attachment 8692530 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/systemservices/CamerasChild.cpp
@@ +631,1 @@
>    }

Nit: remove the else {}; the if() ends in return:
    return;
  }
  GetCamerasChild()->Shutdown();
Attachment #8692530 - Flags: review?(rjesup) → review+
When I run with the patch attached, it hangs on shutdown for me, but not without this patch. STR:

1. Open https://jsfiddle.net/srn9db4h and start camera.
2. Open new tab
3. Close the gum tab (observe two beeping asserts in bash)
4. Close browser (wait 30 seconds for shutdown timout)

Expected result: clean shutdown
Actual result: Hit MOZ_CRASH(Shutdown too long, probably frozen, causing a crash.)
Attachment #8692530 - Flags: feedback?(jib) → feedback-
backlog: --- → webrtc/webaudio+
Rank: 10
Priority: -- → P1
As mentioned in bug 1218799 comment 30, I can no longer reproduce the original bug, so lets re-purpose it for the patch gcp added in comment 7.
Assignee: nobody → gpascutto
Summary: Find and uplift to 44 a fix to media shutdown crash in bug 1218799 comment 20 → Correct a logic error in camera code that could be causing a crash
No luck in reproducing. I had to apply the patches from bug 1218799 to m-a to avoid crashing on the relevant assertion.
(In reply to Jan-Ivar Bruaroey [:jib] from comment #9)

> When I run with the patch attached, it hangs on shutdown for me, but not
> without this patch. STR:
> 
> 1. Open https://jsfiddle.net/srn9db4h and start camera.
> 2. Open new tab
> 3. Close the gum tab (observe two beeping asserts in bash)
> 4. Close browser (wait 30 seconds for shutdown timout)
> 
> Expected result: clean shutdown
> Actual result: Hit MOZ_CRASH(Shutdown too long, probably frozen, causing a
> crash.)

Are you able to:

1) Get a backtrace of the actual hang locations
2) Run with export NSPR_LOG_MODULES=MediaManager:4,CamerasParent:4,CamerasChild:4,timestamp:4

and attach all the resulting logfiles?
Flags: needinfo?(jib)
Resolved in #media. I no longer see a hang in nightly with this patch (or aurora with the patch-set in bug 1228064 comment 5).

I suspect what happened in comment 9 was I was on nightly, and hadn't pulled in last Thursday's fix from bug 1216972 yet, which seems to fix an existing shutdown hang. Sorry for the confusion.
Flags: needinfo?(jib)
Attachment #8692530 - Flags: feedback- → feedback+
Comment on attachment 8692530 [details] [diff] [review]
Ensure Cameras is alive before calling through it

Approval Request Comment: See bug 1228064 comment 7.

[Risks and why]: We believe the 5 patches (Bug 1227407, Bug 1216972 and Bug 1218799, as outlined in bug 1228064 comment 5) collectively fix the 30 second hang, debug assert-crash and leak that plagued the new shutdown behavior in trunk for a while, and that aurora is better off using this approach than what it currently has, which exhibits at least the debug crash, maybe more. Worst case it still crashes on shutdown, but it is much less likely with these patches, and we have not seen any sign of trouble with them.

1 of the patches are in this bug, which fixes a leak. Should land on m-c soon as well.
Attachment #8692530 - Flags: approval-mozilla-aurora?
GCP, once this lands on Nightly, could you please verify that the crash is gone? Thanks!
Flags: needinfo?(gpascutto)
https://hg.mozilla.org/mozilla-central/rev/7cca9ddc7f9a
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
No, I cannot, since I could never reproduce the initial crash. As far as I know jib can no longer do so either. The patch tightens an edge case that our current code no longer seems to hit - it's insurance.
Flags: needinfo?(gpascutto)
But more importantly, this patch is what ultimately made the MediaManager leak go away in the aurora patch set, so we know the improved logic is beneficial.
Comment on attachment 8692530 [details] [diff] [review]
Ensure Cameras is alive before calling through it

Taking it as it fixes memory leak and crash. Aurora44+
Attachment #8692530 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: