Closed Bug 337492 Opened 15 years ago Closed 14 years ago

xpcom proxies may release proxied object on random threads

Categories

(Core :: XPCOM, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla1.9alpha5

People

(Reporter: darin.moz, Assigned: benjamin)

References

Details

Attachments

(4 files, 4 obsolete files)

xpcom proxies may release proxied object on random threads

the proxy object should use NS_ProxyRelease whenever it needs to release the object for which it is a proxy.  otherwise, xpcom proxies are useless for objects that do not support threadsafe release or that do not like being destroyed on background threads.
So the way darin ran into this is that ~nsProxyEventObject does:

449 nsCOMPtr<nsISupports> rootObject=do_QueryInterface(mProxyObject->mRealObject);

So the simple solution is for mProxyObject to store the canonical nsISupports pointer (either in mRealObject or in a new member).
Attached patch v1 patch (obsolete) — Splinter Review
I decided to go ahead with a direct patch for this bug because it was causing grief for safe browsing (the URL classifier thread communicates to JS objects via xpcom proxies).
Attachment #221876 - Flags: review?(benjamin)
-> me
Assignee: nobody → darin
I think it's important to port this fix to the 1.8 branch as well.
Severity: normal → critical
Status: NEW → ASSIGNED
Flags: blocking1.8.1?
Priority: -- → P1
Target Milestone: --- → mozilla1.8.1alpha3
darin: this won't be good enough. if your last release comes from js gc and the object was created on a non main thread, then the xpconnect release will come from the gc (main) thread.
> darin: this won't be good enough. if your last release comes from js gc and the
> object was created on a non main thread, then the xpconnect release will come
> from the gc (main) thread.

Why wouldn't the final release be proxied to the non-main thread?  So long as the proxy was created with the non-main thread as the event target of the proxy, then the release will happen on that thread.  If the release cannot be proxied, then the object will simply be leaked.  We should make sure that final GC happens after xpcom-shutdown but before we start tearing down threads (or something like that).
Comment on attachment 221876 [details] [diff] [review]
v1 patch

Why can't you use nsProxyReleaseCOMPtr for nsProxyObject::mTarget? Is that just because you don't want to store the extra word?
Do you mean mRealObject?  I could have done that, but nsProxyReleaseCOMPtr is really not something that I'm ready to propogate (it's incomplete).  It is a temp solution.... I'm thinking that you are going to wack all of this code anyways with your upcoming changes, so I wanted to keep things simple.
Comment on attachment 221876 [details] [diff] [review]
v1 patch

Yeah, that's what I meant. I'll add it to the list for the proxy rework.
Attachment #221876 - Flags: review?(benjamin) → review+
fixed-on-trunk

I want to create a 1.8 branch version of this as well.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Attached patch v1 patch (branch version) (obsolete) — Splinter Review
Here's the 1.8 branch equivalent.
Attachment #222249 - Flags: review?(benjamin)
Attachment #222249 - Flags: approval-branch-1.8.1?(benjamin)
reopening .. this caused lots of tinderbox builds to crash or hang during test phases.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Blocks: 338248
Attachment #222249 - Flags: review?(benjamin)
Attachment #222249 - Flags: approval-branch-1.8.1?(benjamin)
ok, i've changed my mind, it is possible with this bug fixed to work around the fact that xpconnect is hampered wrt gc by dom.

darin: premise
1. you have a javascript component which implements nsIRunnable (old world, you can port my problem to your new world).

2. you create a thread (i don't care how this is done)

3. you pass the javascript component to the thread and ask the thread to run the object (i don't care how this is done, so long as the js from 1 runs on the thread from 2).

4. the js component tries to create a thread unsafe object. this is obviously going to be running on the thread from 2 and not the main thread.

5. the js component for some insane reason decides it needs to create a proxy for that object because after all, it isn't thread safe. as long as the last reference to the object is from js and not from the proxy, it'll get destroyed as part of gc on the main thread.

6. nothing matters past here. the thread unsafe object is still screwed. assuming you actually try to do anything from js it's not going to work right anyway. well, actually, i suppose w/ a fix to this proxy bug if the js code makes sure that it gets rid of its native references before it ever gets rid of the proxy reference, but that's amazingly gross :). i suppose the alternate hack is for xpconnect to temporarily auto wrapperize *all* objects that aren't threadsafe and aren't created on the main thread, it gives js a non trivial perf disadvantage, but perhaps that is safe with this bug fixed.

--
i actually had a significant reworking of the proxy code to provide classinfo somewhere :(.
Here's the crash stack:

#9  0x0804a550 in ?? ()
#10 0xb7d7dbc1 in nsProxyObjectCallInfo::RefCountInInterfacePointers (this=0x8fe8400, addRef=0) at /builds/moz-trunk/mozilla/xpcom/proxy/src/nsProxyEvent.cpp:288
#11 0xb7d7da79 in ~nsProxyObjectCallInfo (this=0x8fe8400) at /builds/moz-trunk/mozilla/xpcom/proxy/src/nsProxyEvent.cpp:258
#12 0xb7d7f94b in nsProxyCallEvent::Run (this=0x9131120) at /builds/moz-trunk/mozilla/xpcom/proxy/src/nsProxyEvent.cpp:164
#13 0xb7d74bf3 in nsThread::ProcessNextEvent (this=0x8082f00, mayWait=1, result=0xbf9e5260) at /builds/moz-trunk/mozilla/xpcom/threads/nsThread.cpp:482
#14 0xb7cfbb59 in NS_ProcessNextEvent_P (thread=0x8082f00, mayWait=1) at nsThreadUtils.cpp:225
#15 0xb6968fed in nsBaseAppShell::Run (this=0x83fecb8) at /builds/moz-trunk/mozilla/widget/src/xpwidgets/nsBaseAppShell.cpp:153
#16 0xb5487e46 in nsAppStartup::Run (this=0x842a320) at /builds/moz-trunk/mozilla/toolkit/components/startup/src/nsAppStartup.cpp:170
#17 0xb7f79843 in XRE_main (argc=3, argv=0xbf9e5694, aAppData=0x8049860) at /builds/moz-trunk/mozilla/toolkit/xre/nsAppRunner.cpp:2343
#18 0x0804865c in main (argc=3, argv=0xbf9e5694) at /builds/moz-trunk/mozilla/browser/app/nsBrowserApp.cpp:61

It crashes calling Release on a bad nsISupports pointer.
This isn't critical for FF2 now that bug 338248 is fixed.
Target Milestone: mozilla1.8.1alpha3 → mozilla1.9alpha
Moving blocking request 1.8.1 -> 1.9
Flags: blocking1.8.1? → blocking1.9a1?
Attached patch prototype patch (obsolete) — Splinter Review
here's a prototype patch that gets it right.  the trick is to store mRootObj on mProxyObject instead of on the nsProxyEventObject.  I don't fully understand why that matters yet.  This also includes a testcase.
Attached patch v2 patchSplinter Review
OK, here we go.  It turns out that the nsProxyReleaseCOMPtr stuff was unnecessary because we can assume that the caller of GetProxyForObject has an owning reference to the object they pass in.  Therefore, we can safely call Release on the given object inside GetProxyForObject.
Attachment #221876 - Attachment is obsolete: true
Attachment #222249 - Attachment is obsolete: true
Attachment #223403 - Attachment is obsolete: true
Attachment #223408 - Flags: review?(benjamin)
Comment on attachment 223408 [details] [diff] [review]
v2 patch

amen, brother
Attachment #223408 - Flags: review?(benjamin) → review+
fixed-on-trunk
Status: REOPENED → RESOLVED
Closed: 15 years ago15 years ago
Resolution: --- → FIXED
Blocks: 339754
Attached patch patch for branchSplinter Review
Same as the trunk patch except removing an extra nsProxyObject constructor that isn't called anywhere.

Since one of the locks causing the deadlock in bug 353962 is in the QI during the release of the nsProxyObject, this should work around the deadlock for now.
Assignee: darin.moz → tony
Status: RESOLVED → ASSIGNED
Attachment #244476 - Flags: review?
Resolution: FIXED → ---
Attachment #244476 - Flags: review? → review?(benjamin)
Comment on attachment 244476 [details] [diff] [review]
patch for branch

This is going to take some heavy-duty review, probably won't be done until late next week. Every time we touch this code we end up with weird headaches.
For what it's worth, please don't reopen bugs for branch work... This bug is fixed...
Status: ASSIGNED → RESOLVED
Closed: 15 years ago14 years ago
Resolution: --- → FIXED
Attachment #244476 - Flags: review?(benjamin) → review+
As it turns out, the NS_ProxyRelease() in the nsProxyObject destructor will often  be ineffective due to a race condition: 
The nsProxyObject destructor gets called from the last nsProxyEventObject destructor. With MSVC the ~nsProxyObject call happens *before* the nsProxyEventObject's mRealInterface reference is released. Thus, in some cases the real object will get deleted on the wrong thread.

Patch coming up.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch Fix destructor race condition (obsolete) — Splinter Review
Attachment #260284 - Flags: review?
Attachment #260284 - Flags: review? → review?(benjamin)
Comment on attachment 260284 [details] [diff] [review]
Fix destructor race condition

Hrm, could we do this by just reordering the members in the class declaration? This makes me sad.
Flags: blocking1.9a1?
Comment on attachment 260284 [details] [diff] [review]
Fix destructor race condition

Please reorder the members or let me know why you cannot and re-request review.
Attachment #260284 - Flags: review?(benjamin) → review-
Flags: blocking1.9?
Flags: blocking1.9? → blocking1.9+
Perhaps having explicit makes it less likely it will be screwed up again in future.
Sorry, this dropped off my radar (as so many other things :-( ).
Attachment #260284 - Attachment is obsolete: true
Attachment #263456 - Flags: review?
Attachment #263456 - Flags: review? → review?(bsmedberg)
When checked in this should have a warning comment, at the least:

  // mRealInterface must come after mProxyObject (see bug 337492 comment 24)
Assignee: tony → benjamin
Status: REOPENED → ASSIGNED
Attachment #263591 - Flags: review?(dbaron)
Target Milestone: mozilla1.9alpha1 → mozilla1.9alpha5
Comment on attachment 263591 [details] [diff] [review]
Reorder class members with correct initialization and comment

sr=dbaron (but I don't feel qualified to review this, so redirecting review request to darin).  I think the original patch that used null assignments in the destructor was much clearer, though.

(And what's up with nsISomeInterface being a typedef to nsISupports?)
Attachment #263591 - Flags: superreview+
Attachment #263591 - Flags: review?(dbaron)
Attachment #263591 - Flags: review?(darinf)
Attachment #263591 - Flags: review?(darinf) → review?(darin.moz)
Comment on attachment 263591 [details] [diff] [review]
Reorder class members with correct initialization and comment

>Index: xpcom/proxy/src/nsProxyEventPrivate.h

>+    // Be careful about member ordering. mRealInterface should be released in
>+    // the destructor *before* mProxyObject to avoid race conditions and
>+    // ensure that the last release of the proxied object occurs on the target
>+    // thread.  This means we list it later in the class declaration. See
>+    // bug 337492.
>     nsProxyEventClass          *mClass;
>     nsCOMPtr<nsProxyObject>     mProxyObject;
>+    nsCOMPtr<nsISomeInterface>  mRealInterface;

Depending on the order of the member variables seems fragile to me (even
with this big comment).  Why not just explicitly null-out mRealInterface in
our destructor instead?  Then you could have a comment in the destructor
explaining why mRealInterface needs to be explicitly nulled.
Ok, I landed something like Alex's original patch with an explicit null and a big comment.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago14 years ago
Resolution: --- → FIXED
Flags: in-testsuite?
Attachment #263591 - Flags: review?(darin.moz) → review-
Attachment #263456 - Flags: review?(bsmedberg)
You need to log in before you can comment on or make changes to this bug.