Last Comment Bug 684938 - Proxy nsXPCWrappedJS::Release to the main thread
: Proxy nsXPCWrappedJS::Release to the main thread
Status: RESOLVED FIXED
[sg:critical?][qa?]
:
Product: Core
Classification: Components
Component: XPConnect (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla10
Assigned To: Peter Van der Beken [:peterv]
:
:
Mentors:
Depends on:
Blocks: 684711
  Show dependency treegraph
 
Reported: 2011-09-06 11:38 PDT by Peter Van der Beken [:peterv]
Modified: 2015-10-16 11:41 PDT (History)
8 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
-
wontfix
-
wontfix
-
wontfix
+
fixed
+
fixed
unaffected


Attachments
v1 (7.06 KB, patch)
2011-09-22 13:26 PDT, Peter Van der Beken [:peterv]
mrbkap: review+
jst: approval‑mozilla‑beta-
Details | Diff | Splinter Review

Description Peter Van der Beken [:peterv] 2011-09-06 11:38:50 PDT
Running the release on a thread other than the main thread can crash because we add nsXPCWrappedJS objects to the cycle collector, which assumes objects can't be refcounted while it runs. Both AddRef and Release are problematic, but Release will lead to crashes for sure whereas AddRef can lead to premature unlinking which shouldn't cause crashes, but might. I'll add a patch here to fix the crashes due to Release, bug 684711 will deal with the whole issue.
Comment 1 Peter Van der Beken [:peterv] 2011-09-22 13:26:55 PDT
Created attachment 561860 [details] [diff] [review]
v1
Comment 2 Johnny Stenback (:jst, jst@mozilla.com) 2011-09-29 13:25:53 PDT
Peter, is this patch ready to be reviewed? If so, please request review asap. Thanks!
Comment 3 Johnny Stenback (:jst, jst@mozilla.com) 2011-10-20 13:30:31 PDT
Not going to take this fix this late for 8. And I don't know if we can write a testcase for this, but if we can that would be awesome.
Comment 4 Andrew McCreight [:mccr8] 2011-10-21 09:27:46 PDT
This doesn't really count, but I'd wager that some of the intermittent oranges I linked to in Bug 684711 will go away when this lands, so presumably if they come back that is kind of like a test...
Comment 5 Peter Van der Beken [:peterv] 2011-10-28 01:24:14 PDT
Comment on attachment 561860 [details] [diff] [review]
v1

This seems to pass on try.
Comment 6 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-10-28 09:25:53 PDT
Comment on attachment 561860 [details] [diff] [review]
v1

Review of attachment 561860 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/xpconnect/src/xpcwrappedjs.cpp
@@ +206,5 @@
> +    {
> +        // We'd like to abort here, but this can happen if someone uses a proxy
> +        // for the nsXPCWrappedJS.
> +        nsCOMPtr<nsIThread> mainThread = do_GetMainThread();
> +        NS_ProxyRelease(mainThread, static_cast<nsIXPConnectWrappedJS*>(this));

Hm, I'm not 100% sure that this is safe... I guess we should never be able to get here after the thread manager is shutdown, so do_GetMainThread should always succeed, but you should probably assert mainThread here. NS_ProxyReleae will just go ahead and release the object it if you pass null for target (which seems pretty dumb to me).
Comment 7 Peter Van der Beken [:peterv] 2011-10-31 09:19:53 PDT
I guess we can also just return early if the mainThread has been shut down. Leaking isn't the end of the world in that case.
Comment 8 Blake Kaplan (:mrbkap) 2011-11-04 15:14:12 PDT
Comment on attachment 561860 [details] [diff] [review]
v1

r=me with bent's observation fixed.
Comment 9 Peter Van der Beken [:peterv] 2011-11-07 09:20:40 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/e8e8396802f2
Comment 10 Peter Van der Beken [:peterv] 2011-11-07 22:58:00 PST
https://hg.mozilla.org/mozilla-central/rev/e8e8396802f2
Comment 11 Andrew McCreight [:mccr8] 2011-11-17 14:03:13 PST
This landed the day before the switchover, so it should already be on Aurora.

I looked over the blocked bugs in bug 684711, which is a list of bugs with stacks involving the cycle collector calling into nonsense stacks.  All but 2 haven't been seen recently.  I looked more closely at the remaining two, which have been seen in the last few months with fairly high frequency.

Bug 675605: This involves nsXPCWrappedJS::cycleCollection::Traverse, which is something this patch should fix.  It looks like bug 675605 went away on trunk on 10/07.  On Aurora, it went away on 11/07, which is the day before the crossover.  Since then it has only shown up once, on beta.

Bug 635480: This involve nsXPCWrappedJS::cycleCollection::Traverse calling into CheckForRightISupports calling into a random SVG code, which sounds like an instance of something this bug would fix.  The last instance of this on m-c is 9/29, which is vaguely around the date the previous one went away on trunk.  Like the previous bug, the last instance of this on Aurora was on 11/07, right before the branch.

Both of these went away too early on trunk to have been fixed by this particular patch.  Perhaps some of the ongoing work on removing XPCOM proxies (bug 675221) removed the particular off-thread activity that was causing these crashes.  Looking at that bug, the patch "Implement Runnable that can be waited on synchronously, for removing XPCOM proxies" landed on 10/07, so maybe that fixed 675605?  It doesn't really sound like it, but who knows.
Comment 12 Johnny Stenback (:jst, jst@mozilla.com) 2011-11-17 14:33:51 PST
Comment on attachment 561860 [details] [diff] [review]
v1

Thanks Andrew!
Comment 13 Johnny Stenback (:jst, jst@mozilla.com) 2011-11-17 14:36:32 PST
And also, given the fact that this is a race condition that's likely very hard to reliably trigger from a webpage I don't think we need to push to get this in on beta from a security point of view, but I would like to know if the collective number of cycle collector crashes that are likely fixed by this is large enough to warrant the risk of taking this for beta.
Comment 14 Andrew McCreight [:mccr8] 2011-11-17 14:49:05 PST
It isn't clear to me.  The crashes in those bugs I linked were fairly common, but I can't see anything on crash-stats that looks like it.  But as 635480 shows, the top frame could be anything.  I think the most noticeable thing to find would be a crash with nsXPCWrappedJS::cycleCollection::Traverse anywhere on the stack, or probably even just in the second frame.  I'm not sure how one would go about doing that.
Comment 15 Johnny Stenback (:jst, jst@mozilla.com) 2011-11-17 14:51:22 PST
Comment on attachment 561860 [details] [diff] [review]
v1

I looked into the volume of the relevant crashes and they're fairly low volume, so we won't be pushing for this in beta.
Comment 16 Johnny Stenback (:jst, jst@mozilla.com) 2011-11-17 14:52:40 PST
Marking fixed for 10 and 11, and wontfix for 9.
Comment 17 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2011-11-21 17:13:59 PST
Is there something QA can do to verify this fix?
Comment 18 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-02-01 13:39:59 PST
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #17)
> Is there something QA can do to verify this fix?

bump

Note You need to log in before you can comment on or make changes to this bug.