Created attachment 583301 [details] [diff] [review]
See bug 711794 comment 53. The slim wrapper code is calling release directly during GC so any finalization code that runs JS will crash.
Attached patch has r=mrbkap.
Is this being done on the main thread or on the background finalization thread? The latter would be even worse.
We're still on the main thread.
https://hg.mozilla.org/try/rev/7e663b376d3a is against mozilla-release tip.
https://hg.mozilla.org/try/rev/d5fcf215a7db is against mozilla-beta tip.
Anyone who can reproduce bug 711794, could you try the beta builds, just in case to verify whether
the patch fixes bug 711794.
(We don't want to take the patch to FF9)
so I think we should take this at least on aurora.
If the patch looks good on aurora after a few days we should also consider it for beta IMHO.
(In reply to comment #5)
I'll do it tomorrow, when the beta build's finished.
I haven't been able to crash with the Nectar toolbar using http://email@example.com/. I am running on 10.7.3. Will try on 10.6 as well.
(In reply to Olli Pettay [:smaug] from comment #5)
> Anyone who can reproduce bug 711794, could you try the beta builds, just in
> case to verify whether
> the patch fixes bug 711794.
> (We don't want to take the patch to FF9)
I can confirm that d5fcf215a7db fixes my OS X Fx9 debug build.
Before when the crash occurs the last thing in the terminal window is:
Assertion failure: !cx->runtime->gcRunning, at js/src/jsgcinlines.h:346
That assertion does not occur with the fix.
There's still a couple of remaining assertions with (or without) the fix:
###!!! ASSERTION: borked mCurrentNode: 'mCurrentNode->IsElement()', file content/xslt/src/xslt/txMozillaXMLOutput.cpp, line 308
###!!! ASSERTION: Want to fire DOMNodeRemoved event, but it's not safe: 'aChild->IsNodeOfType(nsINode::eCONTENT) && static_cast<nsIContent*>(aChild)-> IsInNativeAnonymousSubtree() || IsSafeToRunScript() || sDOMNodeRemovedSuppressCount', file content/base/src/nsContentUtils.cpp, line 3336
(filed bug 712520 and bug 712521 for the remaining assertions)
mozilla-incoming and try builds all look good. I think we win!
For what's worth, I just tested the opt beta build of this patch:
Like Marcia and Mats, I didn't have any trouble. I tested with both the Nectar toolbar (from http://www.nectar.com/collect/toolbar/home.points) and the Dallas Cowboys toolbar (from http://www.dallascowboys.com/toolbar/), on OS X 10.6.8. I followed the STR at bug 711794 comment #35.
Tracking for 12 to make sure this doesn't cause regressions. Tracking 11 in case it's worth uplifting into 11. Tracking for 10 to make sure that we backed out the thing that caused us to respin 9.
I think we should consider taking this to 10.
The thing that caused respin 9 just accidentally triggered the crash. Something else may do the same,
and the fix is in this bug.
Asa: The thing that caused 9 to respin is a two year old patch (slimwrappers). There's no way we can back that out at this point. Doing that (if at all possible) would introduce far too much risk.
When respinning 9 we put in some wallpaper (strong parent pointers). That wallpaper has always been in 10.
Based on the data we have so far I definitely think we should take the patch in this bug for 11/12.
The main question at this point is if we feel ok with just the wallpaper for 10, or if we should also take this patch.
I'd really like to get peterv's, bent's and mrbkap's input on that.
(In reply to Jonas Sicking (:sicking) from comment #16)
> The main question at this point is if we feel ok with just the wallpaper for
> 10, or if we should also take this patch.
How/when will this evaluation be made? There are only 3 betas left, with a code freeze on 1/20.
I say we take the patch. This is not new stuff, just something that was overlooked in this one place, taking this patch makes this code work like the code worked before slimwrappers, which is how this has been working for 8 years before then.
We've had this on m-c for a while now, right? With no known issues.
If that's the case then I think it's a no-brainer to take for aurora.
It also seems like we have enough days of beta remaining that the added benefit outweighs the added risk. But please let's land it ASAP!
Comment on attachment 583301 [details] [diff] [review]
[Approval Request Comment]
Regression caused by (bug #): Slimwrappers patch which landed in 2009.
User impact if declined:
Unknown. Firefox 9 ended up crashing a lot because we didn't have this fixed. We managed to wallpaper over it in 9.0.1, but that was just a wallpaper and the problem is likely to crop up elsewhere.
Testing completed (on m-c, etc.):
Has been on m-c since Dec 21st (see comment 12) with no known issues.
Risk to taking this patch (and alternatives if risky):
It is changing a very central part of the code. And since the change is to CC code, any issues that it cases could be very intermittent since CC is in nature very intermittent.
But the applied fix makes us call into very well tested code, and just makes us replicate a pattern used elsewhere. This significantly mitigates the risk.
That in combination with that it will receive a fair amount of testing (on m-c, aurora and beta, where the code will run extensively as soon as you start the browser) should make the risk moderate (though not low).
There aren't really any alternatives.
Is there a reason why the same change wasn't done to XPC_WN_Helper_Finalize?
(In reply to Peter Van der Beken [:peterv] from comment #21)
> Is there a reason why the same change wasn't done to XPC_WN_Helper_Finalize?
No, looks like that needs fixing too. I didn't look for other cases, figured this would be the only one :-/
Comment on attachment 583301 [details] [diff] [review]
We're going to approve this for aurora. If the CC/GC pauses in Fx10 are caused by bug 335998, we'd need to back that out. If we need to back that out, we run into the startup crash that required 9.0.1 if we don't take this fix. Although we don't know if we are backing out bug 335998 in Fx10, we will take this on aurora to get more testing just in case we later need to take it in beta.
I'm going to deny it for beta (for now) until information we have changes.
How can QA verify this fix? It seems like bug 711794 comment 35 might indicate a reproducible case but that isn't clear to me.