Last Comment Bug 454276 - Crash [@ nsStyleContext::Destroy()] with DOMAttrModified removing window, mathml:apply, iframe and window {display: none} style
: Crash [@ nsStyleContext::Destroy()] with DOMAttrModified removing window, mat...
Status: VERIFIED FIXED
[sg:nse][no longer sg:crit due to fix...
: crash, fixed1.9.0.9, regression, testcase, verified1.9.1
Product: Core
Classification: Components
Component: Layout (show other bugs)
: 1.9.0 Branch
: x86 Windows XP
: P2 critical (vote)
: mozilla1.9.1b3
Assigned To: David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch)
:
Mentors:
Depends on: 475128
Blocks: 477775
  Show dependency treegraph
 
Reported: 2008-09-08 13:21 PDT by Martijn Wargers [:mwargers] (not working for Mozilla)
Modified: 2011-06-13 10:01 PDT (History)
13 users (show)
roc: blocking1.9.1+
dveditz: blocking1.9.0.9+
dveditz: wanted1.9.0.x+
dveditz: wanted1.8.1.x-
jruderman: in‑testsuite-
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
testcase (1.36 KB, text/html)
2008-09-08 13:21 PDT, Martijn Wargers [:mwargers] (not working for Mozilla)
no flags Details
stacktrace from opt. build from yesterday (5.94 KB, text/plain)
2008-09-08 13:22 PDT, Martijn Wargers [:mwargers] (not working for Mozilla)
no flags Details
Stack for rule node destruction and the crash (31.05 KB, text/html)
2009-01-07 18:21 PST, Mats Palmgren (:mats)
no flags Details
wip (1.40 KB, patch)
2009-01-07 19:11 PST, Mats Palmgren (:mats)
no flags Details | Diff | Review
Patch rev. 1 (13.97 KB, patch)
2009-01-19 20:46 PST, Mats Palmgren (:mats)
dbaron: review-
dbaron: superreview-
Details | Diff | Review
Patch rev. 1 for 1.9.0 branch (16.09 KB, patch)
2009-01-19 20:47 PST, Mats Palmgren (:mats)
no flags Details | Diff | Review
alternative approach (25.67 KB, patch)
2009-02-20 23:34 PST, David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch)
no flags Details | Diff | Review
alternative approach (26.09 KB, patch)
2009-02-21 10:29 PST, David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch)
bzbarsky: review+
bzbarsky: superreview+
Details | Diff | Review
alternative approach, merged to 1.9.0 (24.60 KB, patch)
2009-02-25 16:36 PST, David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch)
dveditz: approval1.9.0.9+
Details | Diff | Review

Description Martijn Wargers [:mwargers] (not working for Mozilla) 2008-09-08 13:21:19 PDT
Created attachment 337512 [details]
testcase

See testcase, which normally crashes current trunk build very quickly after loading.
It also crashes Firefox 3, but not Firefox 2, it seems.
If wanted, I could look for the regression range.

The iframe source consists of this:
<?xml-stylesheet href="chrome://browser/skin/" type="text/css"?>
<window xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul" xmlns:html="http://www.w3.org/1999/xhtml" xmlns:mathml="http://www.w3.org/1998/Math/MathML">
<mathml:apply>
  <html:iframe/>
</mathml:apply>
<html:style>window {display: none;}</html:style>
<html:marquee/>
<wizardpage/>

<script xmlns="http://www.w3.org/1999/xhtml">
document.documentElement.removeAttributeNode(document.documentElement.attributes[0]);
window.addEventListener('DOMAttrModified', function(e) {window.frameElement.parentNode.removeChild(window.frameElement) }, true);
</script>
</window>
Comment 1 Martijn Wargers [:mwargers] (not working for Mozilla) 2008-09-08 13:22:56 PDT
Created attachment 337513 [details]
stacktrace from opt. build from yesterday

 	053ec3af()	
>	xul.dll!nsCOMPtr<nsISelection>::nsCOMPtr<nsISelection>(nsISelection * aRawPtr=0x05485118)  Line 572	C++
 	xul.dll!nsStyleContext::Destroy()  Line 936	C++
 	xul.dll!nsStyleContext::Release()  Line 93	C++
 	xul.dll!nsComputedDOMStyle::GetPropertyCSSValue(const nsAString_internal & aPropertyName={...}, nsIDOMCSSValue * * aReturn=0x0012e108)  Line 379	C++
 	xul.dll!nsComputedDOMStyle::GetPropertyValue(const nsAString_internal & aPropertyName={...}, nsAString_internal & aReturn={...})  Line 289 + 0x1f bytes	C++
 	xul.dll!nsComputedDOMStyle::GetPropertyValue(nsCSSProperty aPropID=eCSSProperty_width, nsAString_internal & aValue={...})  Line 233 + 0x2a bytes	C++
 	xul.dll!CSS2PropertiesTearoff::GetWidth(nsAString_internal & aValue={...})  Line 515 + 0x16 bytes	C++
 	xul.dll!NS_InvokeByIndex_P(nsISupports * that=0x000000ef, unsigned int methodIndex=1, unsigned int paramCount=1237648, nsXPTCVariant * params=0x053030f4)  Line 102	C++
 	xul.dll!XPCWrappedNative::CallMethod(XPCCallContext & ccx={...}, XPCWrappedNative::CallMode mode=239)  Line 2393 + 0x1f bytes	C++
 	xul.dll!XPCWrappedNative::CallMethod(XPCCallContext & ccx={...}, XPCWrappedNative::CallMode mode=CALL_GETTER)  Line 2393 + 0x1f bytes	C++
etc..
Comment 2 Martijn Wargers [:mwargers] (not working for Mozilla) 2008-10-01 16:33:08 PDT
Worksforme, using:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b1pre) Gecko/20081001 Minefield/3.1b1pre
Comment 3 Martijn Wargers [:mwargers] (not working for Mozilla) 2008-10-01 16:50:14 PDT
Ria, would you like to find out when this regressed and when it became worksforme?
Comment 4 Daniel Veditz [:dveditz] 2008-10-15 10:33:03 PDT
Martijn: could you do the regression-window honors here? Thanks.
Comment 5 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2008-10-29 16:23:53 PDT
Seems to have been fixed between Linux x86_64 nightlies 2008-09-11-02-mozilla-central and 2008-09-12-02-mozilla-central.
Comment 6 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2008-10-29 16:26:22 PDT
My guess would be that it was fixed by bug 454361; the question is whether that's the real problem or just something that happens to fix this testcase.
Comment 7 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2008-10-29 21:43:13 PDT
(In reply to comment #6)
> My guess would be that it was fixed by bug 454361; the question is whether
> that's the real problem or just something that happens to fix this testcase.

Hmm.  I tried testing this locally by removing the code added there, and it actually didn't seem to be responsible, although I might have done something wrong...
Comment 8 Samuel Sidler (old account; do not CC) 2008-11-10 11:53:31 PST
qawanted: Al, can you verify the regression window in comment 5?
Comment 9 [On PTO until 6/29] 2008-11-11 17:13:59 PST
Crashes in Ubuntu with Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1b1pre) Gecko/20080911020347 Minefield/3.1b1pre. With Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1b1pre) Gecko/20080912020354 Minefield/3.1b1pre, it doesn't crash but takes my cpu between 70% and 100% and keeps it pegged there while the page does it's thing. This is what last night's build does as well.

So, something either fixed the bug or masked the behavior between the daily builds from 9/11 and 9/12. 

If we have hourly builds from then, I could try to narrow it down further but I haven't found any that old.
Comment 10 Samuel Sidler (old account; do not CC) 2008-11-12 11:36:11 PST
Mats, can you look at this?
Comment 11 Samuel Sidler (old account; do not CC) 2009-01-05 07:50:55 PST
Mats, if you're not able to work on this, who would you recommend?
Comment 12 Mats Palmgren (:mats) 2009-01-07 18:21:58 PST
Created attachment 355909 [details]
Stack for rule node destruction and the crash

Getting a property value flushes layout which recreates the rule tree.

nsComputedDOMStyle has a mStyleContextHolder member, it's a strong ref on
a mStyleContext.  I think it's illegal to hold such references over a
flush point according to the comment in nsStyleSet::EndReconstruct():
http://hg.mozilla.org/mozilla-central/annotate/ee25fdabc9fe/layout/style/nsStyleSet.cpp?mark=162-163#l157
(style contexts does not hold a strong ref on mRuleNode)
Comment 13 Mats Palmgren (:mats) 2009-01-07 18:27:24 PST
Substitute /a mStyleContext/a nsStyleContext/ above

The code looks the same on trunk so I think we need to fix it there too.
Comment 14 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-01-07 18:34:49 PST
(In reply to comment #12)
> (style contexts does not hold a strong ref on mRuleNode)

Well, they do most of the time (since rule nodes are GCed, and the style contexts are roots), but not across a rule tree rebuild.
Comment 15 Mats Palmgren (:mats) 2009-01-07 19:11:27 PST
Created attachment 355911 [details] [diff] [review]
wip

Changing Flush_Style to Flush_Layout seems to be sufficient for me on Linux.
Comment 16 Mats Palmgren (:mats) 2009-01-07 19:19:18 PST
(In reply to comment #14)
Ok.  Do you agree this could potentially occur also on trunk?
Comment 17 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-01-07 20:33:24 PST
What frame is still holding on to an old style context, and why?
Comment 18 Mats Palmgren (:mats) 2009-01-07 20:51:28 PST
Not sure I understand the question; there's no frame holding an old
style context, it's nsComputedDOMStyle::mStyleContextHolder.
Comment 19 Mats Palmgren (:mats) 2009-01-07 20:55:13 PST
... so when the frame has released it, this member becomes the last ref
and it's destroyed at the end of GetPropertyCSSValue() when we set
the strong ref to null.
Comment 20 Mats Palmgren (:mats) 2009-01-19 20:44:28 PST
The style context in this case comes from the frame, it's shared
and not a root (ie. not in nsStyleSet::mRoots).

I suspect that the "wip" patch could have a performance penalty.
Also, I'm not sure if a Flush_Layout there *guaranties* that no rule nodes
are deleted while mStyleContextHolder is alive; there's lot's of
FlushPendingReflows() sprinkled in this file.

Otherwise, protecting the rule nodes reachable from mStyleContextHolder
from being destroyed and not leaking them looks tricky...
In nsStyleSet::NotifyStyleContextDestroyed
http://mxr.mozilla.org/mozilla-central/source/layout/style/nsStyleSet.cpp?mark=908-910,916#883
we need to protect them from being GC'ed and we can't simply Destroy
the old rule tree in nsStyleSet::EndReconstruct():
http://mxr.mozilla.org/mozilla-central/source/layout/style/nsStyleSet.cpp?mark=181#158
we have to delay it while there are "external" style contexts alive.
I tried that and seemed to fixed the crash in this bug... but I don't think
it's enough... in nsStyleSet::Shutdown() (called from nsPresShell::Destroy())
http://mxr.mozilla.org/mozilla-central/source/layout/style/nsStyleSet.cpp?mark=875#869
we unconditionally destroy the rule tree and it seems hard to avoid that.
nsPresShell::Destroy() cancels reflow callbacks *before* calling
nsStyleSet::Shutdown() though, so GetPropertyCSSValue() could
post a reflow callback just to detect that and null out mStyleContextHolder
before the rule tree is destroyed. This is rather messy though...

A simpler (less risky) approach is to re-resolve mStyleContextHolder,
mInnerFrame etc, in FlushPendingReflows() and add null-checks after all calls.
I think that needs to be done anyway, eg.
http://mxr.mozilla.org/mozilla-central/source/layout/style/nsComputedDOMStyle.cpp?mark=2881-2883#2880
I think 'mInnerFrame' could have been destroyed there.
Comment 21 Mats Palmgren (:mats) 2009-01-19 20:46:25 PST
Created attachment 357758 [details] [diff] [review]
Patch rev. 1

The re-resolve and null-check approach.
Comment 22 Mats Palmgren (:mats) 2009-01-19 20:47:18 PST
Created attachment 357759 [details] [diff] [review]
Patch rev. 1 for 1.9.0 branch
Comment 23 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-01-21 10:34:38 PST
(In reply to comment #20)
> The style context in this case comes from the frame, it's shared
> and not a root (ie. not in nsStyleSet::mRoots).

Not sure what you mean by this; mRoots is just the set of style contexts that have no parent context.

> Otherwise, protecting the rule nodes reachable from mStyleContextHolder
> from being destroyed and not leaking them looks tricky...
> In nsStyleSet::NotifyStyleContextDestroyed
> http://mxr.mozilla.org/mozilla-central/source/layout/style/nsStyleSet.cpp?mark=908-910,916#883
> we need to protect them from being GC'ed and we can't simply Destroy
> the old rule tree in nsStyleSet::EndReconstruct():
> http://mxr.mozilla.org/mozilla-central/source/layout/style/nsStyleSet.cpp?mark=181#158
> we have to delay it while there are "external" style contexts alive.
> I tried that and seemed to fixed the crash in this bug... but I don't think
> it's enough... in nsStyleSet::Shutdown() (called from nsPresShell::Destroy())
> http://mxr.mozilla.org/mozilla-central/source/layout/style/nsStyleSet.cpp?mark=875#869
> we unconditionally destroy the rule tree and it seems hard to avoid that.
> nsPresShell::Destroy() cancels reflow callbacks *before* calling
> nsStyleSet::Shutdown() though, so GetPropertyCSSValue() could
> post a reflow callback just to detect that and null out mStyleContextHolder
> before the rule tree is destroyed. This is rather messy though...
> 
> A simpler (less risky) approach is to re-resolve mStyleContextHolder,
> mInnerFrame etc, in FlushPendingReflows() and add null-checks after all calls.
> I think that needs to be done anyway, eg.
> http://mxr.mozilla.org/mozilla-central/source/layout/style/nsComputedDOMStyle.cpp?mark=2881-2883#2880
> I think 'mInnerFrame' could have been destroyed there.
Comment 24 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-01-21 11:40:30 PST
This seems a little too piecemeal, and runs the risk both of people messing this up for new code, and the risk of a whole bunch of null-dereference crashes since I think it breaks our guaranteeing of the non-nullness of mStyleContextHolder in the getter functions.  It also has the performance cost of doing this work twice.

I think I might prefer a solution where:
 * we make all the getter functions static, and separate them into two types:
   + some take just a style context
   + some take a style context and the inner and outer frames
   and then when we're calling the getter type that takes frames, we flush layout before getting the style context in the first place.

Note that this would require eliminating mAppUnitsPerInch and mPresShell (which is trivial, they can be gotten from the style context) and it requires that GetROCSSPrimitiveValue take either a pres context or the app units per inch number (at least until we get rid of that API), and it would require that all the helper functions be made static as well (though I think a bunch of them already should be, like GetROCSSValueList).

I'm curious what bzbarsky thinks of this idea, though.
Comment 25 Boris Zbarsky [:bz] (Out June 25-July 6) 2009-01-21 17:52:27 PST
That sounds fine to me.
Comment 26 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-01-22 09:13:05 PST
Comment on attachment 357758 [details] [diff] [review]
Patch rev. 1

I'd prefer to do what I described in comment 24 instead.  Let me know if you want to do it or you want me to do it.
Comment 27 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-01-24 21:22:25 PST
The patch in bug 475128 will likely fix this by changing the underlying problem from a crash into a correctness bug; we should still do something to fix the correctness bug like comment 24.
Comment 28 Jesse Ruderman 2009-01-30 12:09:18 PST
The testcase WFM, afaict.
Comment 29 Martijn Wargers [:mwargers] (not working for Mozilla) 2009-01-31 06:28:33 PST
It's not wfm, using
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.6pre) Gecko/2009010606 GranParadiso/3.0.6pre (.NET CLR 3.5.30729)

Dan Veditz specifically reopened this on 2008-10-06 for the 1.9.0.x branch, I believe.
Comment 30 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-01-31 10:07:23 PST
And in any case we should get a form of this patch in; it's correcting a real error, so if this bug is closed we should get a new bug open on that problem.
Comment 31 Martijn Wargers [:mwargers] (not working for Mozilla) 2009-02-04 12:18:20 PST
This seems to be wfm, using:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7pre) Gecko/2009020405 GranParadiso/3.0.7pre (.NET CLR 3.5.30729)

Olli mentioned he landed the script blocker patches yesterday that might have fixed this.
Comment 32 Daniel Veditz [:dveditz] 2009-02-04 16:19:35 PST
"fixed" or merely masked? From a security POV users are safe either way (for now anyway), but if there's a real issue here we should fix it.

dbaron: did you want the changes described in comment 24 on the trunk as well, or is that specific to the 1.9.0 branch?
Comment 33 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-02-04 16:32:13 PST
The changes in comment 24 need to be on the trunk.

Bug 475128 landed on 1.9.0, so this should no longer be sg:critical.

It would be a lot easier if I didn't have to re-file all the dependencies of bug 475128, since they're all real bugs, just not sg:critical and blocking anymore.  But I'm not quite sure how to go about doing that...
Comment 34 Daniel Veditz [:dveditz] 2009-02-09 11:12:18 PST
That means this bug has morphed to a non-critical arch change?

qawanted: Al, please verify that the testcase no longer crashes
Comment 35 [On PTO until 6/29] 2009-02-19 11:37:03 PST
Crashed 1.9.0.6 on XP but does not crash the nightly 1.9.0 build from last night: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7pre) Gecko/2009021705 GranParadiso/3.0.7pre (.NET CLR 3.5.30729),
Comment 36 Daniel Veditz [:dveditz] 2009-02-20 11:39:42 PST
Removing "wanted1.9.0" flag based on comment 33, but if we do still want the remaining fixes on the older versions please renominate.
Comment 37 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2009-02-20 21:25:00 PST
Upgrading to blocking since the patch here fixes blocker bug 477775
Comment 38 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-02-20 22:47:20 PST
Taking bug given lack of response to comment 26.  I think it might be easier to do a hybrid approach that's halfway and still leaves them as member functions, though.
Comment 39 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-02-20 23:34:01 PST
Created attachment 363458 [details] [diff] [review]
alternative approach

Here's the simple version of what I was thinking.  Not tested yet.
Comment 40 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-02-21 10:29:07 PST
Created attachment 363493 [details] [diff] [review]
alternative approach
Comment 41 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-02-21 11:57:57 PST
Comment on attachment 363493 [details] [diff] [review]
alternative approach

Full mochitests, chrome mochitests, and browser mochitests, and crashtests pass (still running reftests, but they pass so far).
Comment 42 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-02-21 12:01:41 PST
The basic idea of this patch is that rather than call FlushPendingReflows in the middle of computed style computation, we do it at the beginning for those properties that need it.  Then if the layout flush causes frames to be destroyed, we don't end up with dangling frame or style context pointers, since we get the frames and style contexts after the flush.
Comment 43 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-02-24 20:51:41 PST
http://hg.mozilla.org/mozilla-central/rev/3e9d90ebeaf0
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/41404019f754
Comment 44 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-02-25 16:36:15 PST
Created attachment 364213 [details] [diff] [review]
alternative approach, merged to 1.9.0
Comment 45 Daniel Veditz [:dveditz] 2009-03-06 11:34:15 PST
Comment on attachment 364213 [details] [diff] [review]
alternative approach, merged to 1.9.0

Approved for 1.9.0.8, a=dveditz for release-drivers
Comment 46 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-03-08 12:18:08 PDT
Checked in to CVS trunk for 1.9.0.8, 2009-03-08 12:16 -0700.
Comment 47 [On PTO until 6/29] 2009-03-19 16:09:32 PDT
The attached testcase doesn't crash 1.9.0.7, which makes it a bit hard to verify the fix for 1.9.0.8. The behavior appears the same in both 1.9.0.7 and the 1.9.0.8 nightly.
Comment 48 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-03-19 17:51:07 PDT
Bug 477775 might, though.
Comment 49 [On PTO until 6/29] 2009-03-23 14:53:42 PDT
Apparent bug 477775 doesn't either (as comments there point out). Any additional ideas on how to verify this fix?
Comment 50 Jesse Ruderman 2009-04-26 23:23:12 PDT
This testcase is too slow to be a crashtest.
Comment 51 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-04-29 09:36:48 PDT
That doesn't mean we couldn't write another one.
Comment 52 Aakash Desai [:aakashd] 2009-07-21 11:38:06 PDT
verified FIXED using attached testcase on builds:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090721
Minefield/3.6a1pre ID:20090721044139

and

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.1pre) Gecko/20090720
Shiretoko/3.5.1pre ID:20090720042942

Note You need to log in before you can comment on or make changes to this bug.