Last Comment Bug 712448 - Slim wrappers do not use deferred release mechanism, causing GC crashes
: Slim wrappers do not use deferred release mechanism, causing GC crashes
Status: RESOLVED FIXED
[qa-]
:
Product: Core
Classification: Components
Component: XPConnect (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla12
Assigned To: Ben Turner (not reading bugmail, use the needinfo flag!)
:
: Andrew Overholt [:overholt]
Mentors:
Depends on:
Blocks: 517196 673925 711794
  Show dependency treegraph
 
Reported: 2011-12-20 14:32 PST by Ben Turner (not reading bugmail, use the needinfo flag!)
Modified: 2012-07-18 00:58 PDT (History)
19 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
+
fixed
+
fixed


Attachments
Patch, v1 (909 bytes, patch)
2011-12-20 14:32 PST, Ben Turner (not reading bugmail, use the needinfo flag!)
bent.mozilla: review+
christian: approval‑mozilla‑aurora+
christian: approval‑mozilla‑beta-
Details | Diff | Splinter Review

Description Ben Turner (not reading bugmail, use the needinfo flag!) 2011-12-20 14:32:24 PST
Created attachment 583301 [details] [diff] [review]
Patch, v1

See bug 711794 comment 53. The slim wrapper code is calling release directly during GC so any finalization code that runs JS will crash.

Attached patch has r=mrbkap.
Comment 1 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-12-20 14:43:49 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/84201b200ef3
Comment 2 Andrew McCreight [:mccr8] 2011-12-20 14:52:59 PST
Is this being done on the main thread or on the background finalization thread?  The latter would be even worse.
Comment 3 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-12-20 14:56:25 PST
We're still on the main thread.
Comment 4 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-12-20 14:58:11 PST
https://hg.mozilla.org/try/rev/7e663b376d3a is against mozilla-release tip.
https://hg.mozilla.org/try/rev/d5fcf215a7db is against mozilla-beta tip.
Comment 5 Olli Pettay [:smaug] 2011-12-20 15:00:46 PST
Anyone who can reproduce bug 711794, could you try the beta builds, just in case to verify whether
the patch fixes bug 711794.
(We don't want to take the patch to FF9)
Comment 6 Jonas Sicking (:sicking) No longer reading bugmail consistently 2011-12-20 16:26:20 PST
so I think we should take this at least on aurora.

If the patch looks good on aurora after a few days we should also consider it for beta IMHO.
Comment 7 Steven Michaud [:smichaud] (Retired) 2011-12-20 16:27:12 PST
(In reply to comment #5)

I'll do it tomorrow, when the beta build's finished.
Comment 8 Marcia Knous [:marcia - use ni] 2011-12-20 16:51:29 PST
I haven't been able to crash with the Nectar toolbar using http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bturner@mozilla.com-d5fcf215a7db/. I am running on 10.7.3. Will try on 10.6 as well.

(In reply to Olli Pettay [:smaug] from comment #5)
> Anyone who can reproduce bug 711794, could you try the beta builds, just in
> case to verify whether
> the patch fixes bug 711794.
> (We don't want to take the patch to FF9)
Comment 9 Mats Palmgren (:mats) 2011-12-20 17:06:27 PST
I can confirm that d5fcf215a7db fixes my OS X Fx9 debug build.

Before when the crash occurs the last thing in the terminal window is:
Assertion failure: !cx->runtime->gcRunning, at js/src/jsgcinlines.h:346
That assertion does not occur with the fix.

There's still a couple of remaining assertions with (or without) the fix:

###!!! ASSERTION: borked mCurrentNode: 'mCurrentNode->IsElement()', file content/xslt/src/xslt/txMozillaXMLOutput.cpp, line 308
###!!! ASSERTION: Want to fire DOMNodeRemoved event, but it's not safe: 'aChild->IsNodeOfType(nsINode::eCONTENT) && static_cast<nsIContent*>(aChild)-> IsInNativeAnonymousSubtree() || IsSafeToRunScript() || sDOMNodeRemovedSuppressCount', file content/base/src/nsContentUtils.cpp, line 3336
Comment 10 Mats Palmgren (:mats) 2011-12-20 18:03:18 PST
(filed bug 712520 and bug 712521 for the remaining assertions)
Comment 11 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-12-20 19:58:24 PST
mozilla-incoming and try builds all look good. I think we win!
Comment 12 Ed Morley [:emorley] 2011-12-21 04:30:26 PST
https://hg.mozilla.org/mozilla-central/rev/84201b200ef3
Comment 13 Steven Michaud [:smichaud] (Retired) 2011-12-21 08:41:05 PST
For what's worth, I just tested the opt beta build of this patch:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bturner@mozilla.com-d5fcf215a7db/try-macosx64/firefox-9.0.en-US.mac.dmg

Like Marcia and Mats, I didn't have any trouble.  I tested with both the Nectar toolbar (from http://www.nectar.com/collect/toolbar/home.points) and the Dallas Cowboys toolbar (from http://www.dallascowboys.com/toolbar/), on OS X 10.6.8.  I followed the STR at bug 711794 comment #35.
Comment 14 Asa Dotzler [:asa] 2011-12-22 14:55:14 PST
Tracking for 12 to make sure this doesn't cause regressions. Tracking 11 in case it's worth uplifting into 11. Tracking for 10 to make sure that we backed out the thing that caused us to respin 9.
Comment 15 Olli Pettay [:smaug] 2011-12-22 14:59:05 PST
I think we should consider taking this to 10.
The thing that caused respin 9 just accidentally triggered the crash. Something else may do the same,
and the fix is in this bug.
Comment 16 Jonas Sicking (:sicking) No longer reading bugmail consistently 2011-12-22 15:20:12 PST
Asa: The thing that caused 9 to respin is a two year old patch (slimwrappers). There's no way we can back that out at this point. Doing that (if at all possible) would introduce far too much risk.

When respinning 9 we put in some wallpaper (strong parent pointers). That wallpaper has always been in 10.

Based on the data we have so far I definitely think we should take the patch in this bug for 11/12.

The main question at this point is if we feel ok with just the wallpaper for 10, or if we should also take this patch.

I'd really like to get peterv's, bent's and mrbkap's input on that.
Comment 17 Alex Keybl [:akeybl] 2012-01-06 13:47:33 PST
(In reply to Jonas Sicking (:sicking) from comment #16)
> The main question at this point is if we feel ok with just the wallpaper for
> 10, or if we should also take this patch.

How/when will this evaluation be made? There are only 3 betas left, with a code freeze on 1/20.
Comment 18 Johnny Stenback (:jst, jst@mozilla.com) 2012-01-06 13:55:08 PST
I say we take the patch. This is not new stuff, just something that was overlooked in this one place, taking this patch makes this code work like the code worked before slimwrappers, which is how this has been working for 8 years before then.
Comment 19 Jonas Sicking (:sicking) No longer reading bugmail consistently 2012-01-06 14:47:34 PST
We've had this on m-c for a while now, right? With no known issues.

If that's the case then I think it's a no-brainer to take for aurora.

It also seems like we have enough days of beta remaining that the added benefit outweighs the added risk. But please let's land it ASAP!
Comment 20 Jonas Sicking (:sicking) No longer reading bugmail consistently 2012-01-06 14:54:42 PST
Comment on attachment 583301 [details] [diff] [review]
Patch, v1

[Approval Request Comment]
Regression caused by (bug #): Slimwrappers patch which landed in 2009.

User impact if declined: 
Unknown. Firefox 9 ended up crashing a lot because we didn't have this fixed. We managed to wallpaper over it in 9.0.1, but that was just a wallpaper and the problem is likely to crop up elsewhere.

Testing completed (on m-c, etc.): 
Has been on m-c since Dec 21st (see comment 12) with no known issues.

Risk to taking this patch (and alternatives if risky):
It is changing a very central part of the code. And since the change is to CC code, any issues that it cases could be very intermittent since CC is in nature very intermittent.

But the applied fix makes us call into very well tested code, and just makes us replicate a pattern used elsewhere. This significantly mitigates the risk.

That in combination with that it will receive a fair amount of testing (on m-c, aurora and beta, where the code will run extensively as soon as you start the browser) should make the risk moderate (though not low).

There aren't really any alternatives.
Comment 21 Peter Van der Beken [:peterv] 2012-01-09 09:59:04 PST
Is there a reason why the same change wasn't done to XPC_WN_Helper_Finalize?
Comment 22 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-01-09 10:13:59 PST
(In reply to Peter Van der Beken [:peterv] from comment #21)
> Is there a reason why the same change wasn't done to XPC_WN_Helper_Finalize?

No, looks like that needs fixing too. I didn't look for other cases, figured this would be the only one :-/
Comment 23 Olli Pettay [:smaug] 2012-01-09 10:22:15 PST
See https://bugzilla.mozilla.org/show_bug.cgi?id=714725
Comment 24 christian 2012-01-10 15:02:42 PST
Comment on attachment 583301 [details] [diff] [review]
Patch, v1

[triage comment]
We're going to approve this for aurora. If the CC/GC pauses in Fx10 are caused by bug 335998, we'd need to back that out. If we need to back that out, we run into the startup crash that required 9.0.1 if we don't take this fix. Although we don't know if we are backing out bug 335998 in Fx10, we will take this on aurora to get more testing just in case we later need to take it in beta.

I'm going to deny it for beta (for now) until information we have changes.
Comment 25 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-01-10 15:35:18 PST
https://hg.mozilla.org/releases/mozilla-aurora/rev/5f75391180f9
Comment 26 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-02-01 12:51:12 PST
How can QA verify this fix? It seems like bug 711794 comment 35 might indicate a reproducible case but that isn't clear to me.

Note You need to log in before you can comment on or make changes to this bug.