Closed Bug 827503 Opened 12 years ago Closed 12 years ago

Camera API JS callbacks are addref'd and release'd on non-main threads

Categories

(Core :: DOM: Device Interfaces, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla21
blocking-basecamp +
Tracking Status
firefox19 --- wontfix
firefox20 --- wontfix
firefox21 --- fixed
b2g18 --- fixed

People

(Reporter: cjones, Assigned: mikeh)

References

Details

Attachments

(2 files, 3 obsolete files)

If I'm reading things correctly, the various camera callback interfaces like nsICameraErrorCallback [1] are automagically wrapped by XPCOM, over JS content implementations.  The camera backend takes these callbacks and addrefs/releases on non-main threads, for example [2] and [3].

Is this kosher?  I.e., will we use atomic refcounting for these callbacks, and will the off-main-thread addref/releasing play well with GC/CC?

bb? -> investigate.

[1] http://mxr.mozilla.org/mozilla-central/source/dom/camera/nsIDOMCameraManager.idl#251
[2] http://mxr.mozilla.org/mozilla-central/source/dom/camera/CameraControlImpl.cpp#334
[3] http://mxr.mozilla.org/mozilla-central/source/dom/camera/CameraControlImpl.h#451
JS, XPConnect, and the GC/CC are all main thread only. Any non-main thread use will eventually crash.
If it makes a difference, the callback interfaces aren't actually called on non-main threads.  They're AddRef()/Release()'d on non-main threads.
Ah, I missed that detail. Simply addref/release'ing should be ok since XPCWrappedJS uses atomic refcount ops (though, since XPConnect is now single threaded, I guess this could change some day).

Calling the wrapped functions is definitely not ok though. As long as that doesn't happen I think we're alright.
The current machinery for addref/releasing XPCWrappedJS off-main-thread is highly problematic, because releasing calls into the cycle collector, which calls into XPC maps, which means that _all_ access to common XPC data structures needs to be lock-protected. This is the only reason for that, and given how erratic associated bugs are likely to be, I wouldn't be surprised if there were parts of the machinery that weren't threadsafe.

Many months ago, I added an API to allow people to hold strong refs off-main-thread without needing to directly addref them: bug 771074. It's basically just a layer of indirection.

bug 773610 exists to convert all consumers to that API. I've got a massive patch queue that does 90% of what needs to be done, but I've never found the time to finish it.

We've actually had bizarre crashes that seem related to this stuff, and the solution has been to just convert stuff to the new API, which seems to make the problems go away. So if you're experiencing crashes here, don't dismiss this as a cause just because it's theoretically kosher for now. Slaughtering all the necessary XPConnect animals correctly here is very error-prone. ;-)
If this is deemed bb+ I can make the fix.
It's not clear that this is a problem still...

The real issue is an AddRef call when the reference count jumps from 1 to 2. The JS engine always gets the first reference, and then the second would be given to some C++ caller. There could, of course, be some series of AddRef/Release calls that happen later, and each time the 1 -> 2 transition is crucial. If that always happens on the main thread then we're probably ok. If we can't guarantee that then we should fix this now.

This is obviously fragile and we should fix no matter what in the long run. Above is just trying to outline the case that we should fix in the short term.
(In reply to ben turner [:bent] from comment #1)
> JS, XPConnect, and the GC/CC are all main thread only. Any non-main thread
> use will eventually crash.

Does this mean any nsCOMPtr<> must be created on the Main Thread?  If so, how do I create the nsIRunnables that get punted from my private thread back to the Main Thread?
(In reply to Mike Habicher [:mikeh] from comment #7)
> Does this mean any nsCOMPtr<> must be created on the Main Thread?

Creating nsCOMPtrs is ok modulo comment 4, if all you're doing is addref/release. But we want to move away from it.

> If so,
> how do I create the nsIRunnables that get punted from my private thread back
> to the Main Thread?

Using the nsMainThreadPtr{Handle,Holder} API I landed in bug 771074.
Depends on: 827679
(In reply to Bobby Holley (:bholley) from comment #8)
> (In reply to Mike Habicher [:mikeh] from comment #7)
> > Does this mean any nsCOMPtr<> must be created on the Main Thread?
> 
> Creating nsCOMPtrs is ok modulo comment 4, if all you're doing is
> addref/release. But we want to move away from it.
>
> > If so,
> > how do I create the nsIRunnables that get punted from my private thread back
> > to the Main Thread?
> 
> Using the nsMainThreadPtr{Handle,Holder} API I landed in bug 771074.

I've got a patch that uses that to hold references to existing JS callbacks in my cross-thread objects; but it's still not clear to me how to create new nsCOMPtr<nsIRunnable>s off the Main Thread using this API, assuming (as you say above, that regardless of comment 4) we want to move away from this.
(In reply to Mike Habicher [:mikeh] from comment #9)
> I've got a patch that uses that to hold references to existing JS callbacks
> in my cross-thread objects; but it's still not clear to me how to create new
> nsCOMPtr<nsIRunnable>s off the Main Thread using this API, assuming (as you
> say above, that regardless of comment 4) we want to move away from this.

There's a misunderstanding here. Off-main-thread addref and release of references to non-JS objects are fine (as long as those objects implement thread-safe AddRef and Release, which implies they aren't involved in cycle collection).
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #10)
> 
> There's a misunderstanding here. Off-main-thread addref and release of
> references to non-JS objects are fine (as long as those objects implement
> thread-safe AddRef and Release, which implies they aren't involved in cycle
> collection).

Okay, that makes more sense.

Back to nsRunnables (sorry)--I see they include NS_DECL_ISUPPORTS, but my subclasses of them don't have their own AddRef or Release implementation defined.  (I actually don't immediately see any implementations defined in the superclasses of these runnables, threadsafe or otherwise, but I assume they're there and not(?) threadsafe.)
(They also don't implement cycle-collection; is that sufficient to allow me to twiddle these objects off of the main thread?)
(In reply to Mike Habicher [:mikeh] from comment #11)
> Back to nsRunnables (sorry)--I see they include NS_DECL_ISUPPORTS, but my
> subclasses of them don't have their own AddRef or Release implementation
> defined.

nsThreadUtils.cpp has:
NS_IMPL_THREADSAFE_ISUPPORTS1(nsRunnable, nsIRunnable)

(In reply to Mike Habicher [:mikeh] from comment #12)
> (They also don't implement cycle-collection; is that sufficient to allow me
> to twiddle these objects off of the main thread?)

Yes, you can addref and release nsRunnable-derived objects from the main thread.

I guess there's nothing preventing an nsRunnable from participating in cycle collection, but ... don't do that :-).
(In reply to Mike Habicher [:mikeh] from comment #11)
> Back to nsRunnables (sorry)

Mike, I'm here in Berlin this week so I'd be happy to answer any questions you have in person!
Blocking, and over to Mike.
Assignee: nobody → mhabicher
blocking-basecamp: ? → +
This proxies the XPConnect objects through the nsMainThreadPtr objects introduced in bug 771074.

Unfortunately I can't get it to work without SIGSEGV-ing (423 is the camera process ID, and 0x1a7 is the main thread):

I(  423:0x1a7) 1075221752[42407160]: virtual nsresult GetCameraResult::Run() : this=45e312e0 -- BEFORE CALLBACK
I(  423:0x1a7) 1075221752[42407160]: mozilla::DOMCameraCapabilities::DOMCameraCapabilities(mozilla::ICameraControl*):27 : this=449ad670
I(  423:0x1a7) 1075221752[42407160]: mozilla::GetPreviewStreamTask::GetPreviewStreamTask(mozilla::CameraControlImpl*, mozilla::dom::CameraSize, nsICameraPreviewStreamCallback*, nsICameraErrorCallback*):229 : this=45e30760 - onSuccess=449ad7c0
I(  423:0x1a7) 1075221752[42407160]: virtual nsresult GetCameraResult::Run() : this=45e312e0 -- AFTER CALLBACK
I(  423:0x1a7) 1075221752[42407160]: virtual nsresult mozilla::GetPreviewStreamResult::Run():385 : this=449d9970
I(  423:0x1a7) 1075221752[42407160]: virtual nsresult mozilla::GetPreviewStreamResult::Run():394 : this=449d9970 - onSuccess=449ad7c0
I(  423:0x1a7) 1075221752[42407160]: mozilla::DOMCameraPreview::DOMCameraPreview(mozilla::ICameraControl*, uint32_t, uint32_t, uint32_t):147 : this=4474afa0 : mWidth=480, mHeight=320, mFramesPerSecond=30
I(  423:0x1a7) 1075221752[42407160]: DOMCameraPreviewListener::DOMCameraPreviewListener(mozilla::DOMCameraPreview*):83 : this=449ad8d0
I(  423:0x1a7) 1075221752[42407160]: virtual nsresult mozilla::GetPreviewStreamResult::Run():399 : this=449d9970
I(  423:0x1a7) 1075221752[42407160]: virtual mozilla::GetPreviewStreamResult::~GetPreviewStreamResult():204 : this=449d9970
F(  423:0x1a7) Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1)

I've verified that the JS callback (0x449ad7c0) is invoked properly and exits.  The SIGSEGV happens here (captured from a different run, so pointers may be different):

Program received signal SIGSEGV, Segmentation fault.
nsHtml5RefPtrReleaser<nsHtml5StreamParser>::Run (this=<value optimized out>)
    at /home/mikeh/dev/mozilla/btg019/gecko/parser/html/nsHtml5RefPtr.h:22
22	          mPtr->Release();
(gdb) bt
#0  nsHtml5RefPtrReleaser<nsHtml5StreamParser>::Run (this=<value optimized out>)
    at /home/mikeh/dev/mozilla/btg019/gecko/parser/html/nsHtml5RefPtr.h:22
#1  0x4116255e in nsThread::ProcessNextEvent (this=0x42507240, mayWait=<value optimized out>, 
    result=0xbef46de7) at /home/mikeh/dev/mozilla/btg019/gecko/xpcom/threads/nsThread.cpp:620
#2  0x41129d88 in NS_ProcessNextEvent_P (thread=0x0, mayWait=false)
    at /home/mikeh/dev/mozilla/btg019/objdir-gecko-debug/xpcom/build/nsThreadUtils.cpp:236
#3  0x40fb1572 in mozilla::ipc::MessagePump::Run (this=0x425022e0, aDelegate=0xbef478dc)
    at /home/mikeh/dev/mozilla/btg019/gecko/ipc/glue/MessagePump.cpp:82
#4  0x40fb16fa in mozilla::ipc::MessagePumpForChildProcess::Run (this=0x425022e0, aDelegate=0xbef478dc)
    at /home/mikeh/dev/mozilla/btg019/gecko/ipc/glue/MessagePump.cpp:231
#5  0x4119790a in MessageLoop::RunInternal (this=0xbef478dc)
    at /home/mikeh/dev/mozilla/btg019/gecko/ipc/chromium/src/base/message_loop.cc:215
#6  0x4119796a in MessageLoop::RunHandler (this=0xbef478dc)
    at /home/mikeh/dev/mozilla/btg019/gecko/ipc/chromium/src/base/message_loop.cc:208
#7  MessageLoop::Run (this=0xbef478dc)
    at /home/mikeh/dev/mozilla/btg019/gecko/ipc/chromium/src/base/message_loop.cc:182
#8  0x40eeeda2 in nsBaseAppShell::Run (this=0x431ddfa0)
    at /home/mikeh/dev/mozilla/btg019/gecko/widget/xpwidgets/nsBaseAppShell.cpp:163
#9  0x4040632e in XRE_RunAppShell ()
    at /home/mikeh/dev/mozilla/btg019/gecko/toolkit/xre/nsEmbedFunctions.cpp:646
#10 0x40fb1664 in mozilla::ipc::MessagePumpForChildProcess::Run (this=0x425022e0, aDelegate=0xbef478dc)
    at /home/mikeh/dev/mozilla/btg019/gecko/ipc/glue/MessagePump.cpp:198
#11 0x4119790a in MessageLoop::RunInternal (this=0xbef478dc)
    at /home/mikeh/dev/mozilla/btg019/gecko/ipc/chromium/src/base/message_loop.cc:215
#12 0x4119796a in MessageLoop::RunHandler (this=0xbef478dc)
    at /home/mikeh/dev/mozilla/btg019/gecko/ipc/chromium/src/base/message_loop.cc:208
#13 MessageLoop::Run (this=0xbef478dc)
    at /home/mikeh/dev/mozilla/btg019/gecko/ipc/chromium/src/base/message_loop.cc:182
#14 0x40406b5c in XRE_InitChildProcess (aArgc=3, aArgv=0x4252c800, aProcess=GeckoProcessType_Content)
    at /home/mikeh/dev/mozilla/btg019/gecko/toolkit/xre/nsEmbedFunctions.cpp:485
#15 0x00008580 in main (argc=5, argv=0xbef47a44)
    at /home/mikeh/dev/mozilla/btg019/gecko/ipc/app/MozillaRuntimeMain.cpp:48

I assume I'm using the proxy objects incorrectly, but I've been staring at this code since early this morning and don't see anything obvious.  Can you please take a look?
Attachment #699108 - Flags: feedback?(bobbyholley+bmo)
Another bt, different crash site:

Program received signal SIGSEGV, Segmentation fault.
nsProxyReleaseEvent::Run (this=<value optimized out>)
    at /home/mikeh/dev/mozilla/btg019/objdir-gecko-debug/xpcom/build/nsProxyRelease.cpp:19
19	        mDoomed->Release();
(gdb) bt
#0  nsProxyReleaseEvent::Run (this=<value optimized out>)
    at /home/mikeh/dev/mozilla/btg019/objdir-gecko-debug/xpcom/build/nsProxyRelease.cpp:19
#1  0x410f86ce in nsThread::ProcessNextEvent (this=0x42407240, mayWait=<value optimized out>, 
    result=0xbeacede7) at /home/mikeh/dev/mozilla/btg019/gecko/xpcom/threads/nsThread.cpp:620
#2  0x410bfee0 in NS_ProcessNextEvent_P (thread=0x0, mayWait=false)
    at /home/mikeh/dev/mozilla/btg019/objdir-gecko-debug/xpcom/build/nsThreadUtils.cpp:236
#3  0x40f476ca in mozilla::ipc::MessagePump::Run (this=0x424022e0, aDelegate=0xbeacf8dc)
    at /home/mikeh/dev/mozilla/btg019/gecko/ipc/glue/MessagePump.cpp:82
#4  0x40f47852 in mozilla::ipc::MessagePumpForChildProcess::Run (this=0x424022e0, aDelegate=0xbeacf8dc)
    at /home/mikeh/dev/mozilla/btg019/gecko/ipc/glue/MessagePump.cpp:231
#5  0x4112da7a in MessageLoop::RunInternal (this=0xbeacf8dc)
    at /home/mikeh/dev/mozilla/btg019/gecko/ipc/chromium/src/base/message_loop.cc:215
#6  0x4112dada in MessageLoop::RunHandler (this=0xbeacf8dc)
    at /home/mikeh/dev/mozilla/btg019/gecko/ipc/chromium/src/base/message_loop.cc:208
#7  MessageLoop::Run (this=0xbeacf8dc)
    at /home/mikeh/dev/mozilla/btg019/gecko/ipc/chromium/src/base/message_loop.cc:182
#8  0x40e84efa in nsBaseAppShell::Run (this=0x431ddfa0)
    at /home/mikeh/dev/mozilla/btg019/gecko/widget/xpwidgets/nsBaseAppShell.cpp:163
#9  0x4039c32e in XRE_RunAppShell ()
    at /home/mikeh/dev/mozilla/btg019/gecko/toolkit/xre/nsEmbedFunctions.cpp:646
#10 0x40f477bc in mozilla::ipc::MessagePumpForChildProcess::Run (this=0x424022e0, aDelegate=0xbeacf8dc)
    at /home/mikeh/dev/mozilla/btg019/gecko/ipc/glue/MessagePump.cpp:198
#11 0x4112da7a in MessageLoop::RunInternal (this=0xbeacf8dc)
    at /home/mikeh/dev/mozilla/btg019/gecko/ipc/chromium/src/base/message_loop.cc:215
#12 0x4112dada in MessageLoop::RunHandler (this=0xbeacf8dc)
    at /home/mikeh/dev/mozilla/btg019/gecko/ipc/chromium/src/base/message_loop.cc:208
#13 MessageLoop::Run (this=0xbeacf8dc)
    at /home/mikeh/dev/mozilla/btg019/gecko/ipc/chromium/src/base/message_loop.cc:182
#14 0x4039cb5c in XRE_InitChildProcess (aArgc=3, aArgv=0x4242c800, aProcess=GeckoProcessType_Content)
    at /home/mikeh/dev/mozilla/btg019/gecko/toolkit/xre/nsEmbedFunctions.cpp:485
#15 0x00008580 in main (argc=5, argv=0xbeacfa44)
    at /home/mikeh/dev/mozilla/btg019/gecko/ipc/app/MozillaRuntimeMain.cpp:48
Null pointer?

(gdb) p mDoomed
Cannot access memory at address 0xc

(This is the same error I saw for |p mPtr| in comment 16.)
(gdb) frame 2
#2  0x410bfee0 in NS_ProcessNextEvent_P (thread=0x0, mayWait=false)
    at /home/mikeh/dev/mozilla/btg019/objdir-gecko-debug/xpcom/build/nsThreadUtils.cpp:236
236	  return NS_SUCCEEDED(thread->ProcessNextEvent(mayWait, &val)) && val;
(gdb) list
231	    NS_ENSURE_TRUE(current, false);
232	    thread = current.get();
233	  }
234	#endif
235	  bool val;
236	  return NS_SUCCEEDED(thread->ProcessNextEvent(mayWait, &val)) && val;
237	}
238	
239	#ifndef XPCOM_GLUE_AVOID_NSPR
240	
(gdb) p thread
$3 = (class nsIThread *) 0x0
Null pointer confirmed:

I(  431:0x1af) virtual nsresult nsProxyReleaseEvent::Run() : mDoomed=0x0
Depends on: 827841
Attachment #699108 - Attachment is obsolete: true
Attachment #699108 - Flags: feedback?(bobbyholley+bmo)
Attachment #699256 - Flags: review?(bobbyholley+bmo)
Comment on attachment 699256 [details] [diff] [review]
use nsMainThreadPtrHolders when passing XPCWrappedJSes around off main thread

Review of attachment 699256 [details] [diff] [review]:
-----------------------------------------------------------------

I don't know the camera code at all, but this generally looks like textbook usage of the handle API. r=bholley with comments addressed.

Also, if you have any ideas about how to make the API less verbose, I'm all ears. :-)

::: dom/camera/CameraControlImpl.cpp
@@ +256,5 @@
>  void
>  CameraControlImpl::OnShutterInternal()
>  {
>    DOM_CAMERA_LOGI("** SNAP **\n");
> +  if (mOnShutterCb.get()) {

I guess we don't have implicit boolean conversion for nsMainThreadPtrHandle? Can you file a bug about adding one? All the .get()s are kind of lame.

@@ +389,5 @@
>     */
>    MOZ_ASSERT(NS_IsMainThread());
>  
> +  nsCOMPtr<nsICameraPreviewStreamCallback> onSuccess = mOnSuccessCb.get();
> +  if (onSuccess.get() && nsDOMCameraManager::IsWindowStillActive(mWindowId)) {

You don't need to test .get() on nsCOMPtrs. Just test the pointer itself. It has an implicit conversion to boolean.

::: dom/camera/CameraControlImpl.h
@@ +350,5 @@
>  protected:
> +  uint8_t* mData;
> +  uint64_t mLength;
> +  nsString mMimeType;
> +  nsMainThreadPtrHandle<nsICameraTakePictureCallback> mOnSuccessCb;

All makes sense to me except this part. Why can't you just make mImage an nsMainThreadPtrHandle?

If there's a reason and you need to keep it, please have this part reviewed by somebody who actually knows the camera code.
Attachment #699256 - Flags: review?(bobbyholley+bmo) → review+
Attached patch patch for landing (obsolete) — Splinter Review
Carrying reviews forward, verbal r=jst for :bholley's question re: replacing the nsIDOMBlob with simple types and building the nsDOMMemoryFile on the main thread.
Attachment #699256 - Attachment is obsolete: true
Attachment #699473 - Flags: review+
Rebased, carrying reviews forward
Attachment #699473 - Attachment is obsolete: true
Attachment #699491 - Flags: review+
Status: NEW → ASSIGNED
https://hg.mozilla.org/mozilla-central/rev/c48542b11078
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
Depends on: 828306
Backing out the change because the main thread assertion is causing crashes.

https://hg.mozilla.org/integration/mozilla-inbound/rev/eee51daa483e
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 828502
(In reply to Mike Habicher [:mikeh] from comment #27)
> Backing out the change because the main thread assertion is causing crashes.
> 
> https://hg.mozilla.org/integration/mozilla-inbound/rev/eee51daa483e

Merge of backout:
https://hg.mozilla.org/mozilla-central/rev/eee51daa483e
Comment on attachment 700116 [details] [diff] [review]
patch for landing (rebased to inbound-src), part 2

r=jst
Attachment #700116 - Flags: review?(jst) → review+
Per discussion with Mike we'll fold the two parts into one when we re-land this so that anyone who bisects or what not won't pull in between these two patches.
https://hg.mozilla.org/mozilla-central/rev/bb7c082b477c
Status: REOPENED → RESOLVED
Closed: 12 years ago12 years ago
Resolution: --- → FIXED
Depends on: 829525
Did this fix the crashes? If so, it'd be good to know, so that we can prioritize moving other consumers to the API.
Flags: needinfo?(mhabicher)
(In reply to Bobby Holley (:bholley) from comment #37)
>
> Did this fix the crashes? If so, it'd be good to know, so that we can
> prioritize moving other consumers to the API.

Bobby, it's difficult to say.  We landed about a half-dozen camera patches during the work-week where I applied this change, several of which were crash-y.  Overall, the camera is now much more stable than it was when we started.

The nsMainThreadPtr classes were handy if, for no other reason, they forced us to remove potentially iffy nsCOMPtr<JS> usage patterns.

Usage-wise, bug 828047 remains outstanding.
Flags: needinfo?(mhabicher)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: