Closed Bug 454276 Opened 16 years ago Closed 16 years ago

Crash [@ nsStyleContext::Destroy()] with DOMAttrModified removing window, mathml:apply, iframe and window {display: none} style

Categories

(Core :: Layout, defect, P2)

1.9.0 Branch
x86
Windows XP
defect

Tracking

()

VERIFIED FIXED
mozilla1.9.1b3

People

(Reporter: martijn.martijn, Assigned: dbaron)

References

Details

(5 keywords, Whiteboard: [sg:nse][no longer sg:crit due to fix for 475128] see comment 24 [needs review])

Crash Data

Attachments

(7 files, 2 obsolete files)

Attached file testcase β€”
See testcase, which normally crashes current trunk build very quickly after loading.
It also crashes Firefox 3, but not Firefox 2, it seems.
If wanted, I could look for the regression range.

The iframe source consists of this:
<?xml-stylesheet href="chrome://browser/skin/" type="text/css"?>
<window xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul" xmlns:html="http://www.w3.org/1999/xhtml" xmlns:mathml="http://www.w3.org/1998/Math/MathML">
<mathml:apply>
  <html:iframe/>
</mathml:apply>
<html:style>window {display: none;}</html:style>
<html:marquee/>
<wizardpage/>

<script xmlns="http://www.w3.org/1999/xhtml">
document.documentElement.removeAttributeNode(document.documentElement.attributes[0]);
window.addEventListener('DOMAttrModified', function(e) {window.frameElement.parentNode.removeChild(window.frameElement) }, true);
</script>
</window>
053ec3af()	
>	xul.dll!nsCOMPtr<nsISelection>::nsCOMPtr<nsISelection>(nsISelection * aRawPtr=0x05485118)  Line 572	C++
 	xul.dll!nsStyleContext::Destroy()  Line 936	C++
 	xul.dll!nsStyleContext::Release()  Line 93	C++
 	xul.dll!nsComputedDOMStyle::GetPropertyCSSValue(const nsAString_internal & aPropertyName={...}, nsIDOMCSSValue * * aReturn=0x0012e108)  Line 379	C++
 	xul.dll!nsComputedDOMStyle::GetPropertyValue(const nsAString_internal & aPropertyName={...}, nsAString_internal & aReturn={...})  Line 289 + 0x1f bytes	C++
 	xul.dll!nsComputedDOMStyle::GetPropertyValue(nsCSSProperty aPropID=eCSSProperty_width, nsAString_internal & aValue={...})  Line 233 + 0x2a bytes	C++
 	xul.dll!CSS2PropertiesTearoff::GetWidth(nsAString_internal & aValue={...})  Line 515 + 0x16 bytes	C++
 	xul.dll!NS_InvokeByIndex_P(nsISupports * that=0x000000ef, unsigned int methodIndex=1, unsigned int paramCount=1237648, nsXPTCVariant * params=0x053030f4)  Line 102	C++
 	xul.dll!XPCWrappedNative::CallMethod(XPCCallContext & ccx={...}, XPCWrappedNative::CallMode mode=239)  Line 2393 + 0x1f bytes	C++
 	xul.dll!XPCWrappedNative::CallMethod(XPCCallContext & ccx={...}, XPCWrappedNative::CallMode mode=CALL_GETTER)  Line 2393 + 0x1f bytes	C++
etc..
Flags: blocking1.9.1?
Flags: wanted1.9.0.x+
Flags: blocking1.9.0.4?
Worksforme, using:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b1pre) Gecko/20081001 Minefield/3.1b1pre
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → WORKSFORME
Ria, would you like to find out when this regressed and when it became worksforme?
Status: RESOLVED → REOPENED
Flags: blocking1.9.1? → wanted1.8.1.x-
Resolution: WORKSFORME → ---
Whiteboard: [sg:critical?] 1.9.0-only, wfm on moz-central
Version: Trunk → 1.9.0 Branch
Flags: blocking1.9.0.4? → blocking1.9.0.4+
Martijn: could you do the regression-window honors here? Thanks.
Flags: blocking1.9.0.4+ → blocking1.9.0.5+
Seems to have been fixed between Linux x86_64 nightlies 2008-09-11-02-mozilla-central and 2008-09-12-02-mozilla-central.
My guess would be that it was fixed by bug 454361; the question is whether that's the real problem or just something that happens to fix this testcase.
(In reply to comment #6)
> My guess would be that it was fixed by bug 454361; the question is whether
> that's the real problem or just something that happens to fix this testcase.

Hmm.  I tried testing this locally by removing the code added there, and it actually didn't seem to be responsible, although I might have done something wrong...
qawanted: Al, can you verify the regression window in comment 5?
Keywords: qawanted
Crashes in Ubuntu with Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1b1pre) Gecko/20080911020347 Minefield/3.1b1pre. With Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1b1pre) Gecko/20080912020354 Minefield/3.1b1pre, it doesn't crash but takes my cpu between 70% and 100% and keeps it pegged there while the page does it's thing. This is what last night's build does as well.

So, something either fixed the bug or masked the behavior between the daily builds from 9/11 and 9/12. 

If we have hourly builds from then, I could try to narrow it down further but I haven't found any that old.
Mats, can you look at this?
Assignee: nobody → mats.palmgren
Flags: blocking1.9.0.5+ → blocking1.9.0.6+
Mats, if you're not able to work on this, who would you recommend?
Whiteboard: [sg:critical?] 1.9.0-only, wfm on moz-central → [sg:critical?] [needs 1.9.0 patch] 1.9.0-only, wfm on moz-central
Flags: blocking1.9.0.6+ → blocking1.9.0.7+
Getting a property value flushes layout which recreates the rule tree.

nsComputedDOMStyle has a mStyleContextHolder member, it's a strong ref on
a mStyleContext.  I think it's illegal to hold such references over a
flush point according to the comment in nsStyleSet::EndReconstruct():
http://hg.mozilla.org/mozilla-central/annotate/ee25fdabc9fe/layout/style/nsStyleSet.cpp?mark=162-163#l157
(style contexts does not hold a strong ref on mRuleNode)
Substitute /a mStyleContext/a nsStyleContext/ above

The code looks the same on trunk so I think we need to fix it there too.
(In reply to comment #12)
> (style contexts does not hold a strong ref on mRuleNode)

Well, they do most of the time (since rule nodes are GCed, and the style contexts are roots), but not across a rule tree rebuild.
Attached patch wip (obsolete) β€” β€” Splinter Review
Changing Flush_Style to Flush_Layout seems to be sufficient for me on Linux.
(In reply to comment #14)
Ok.  Do you agree this could potentially occur also on trunk?
What frame is still holding on to an old style context, and why?
Not sure I understand the question; there's no frame holding an old
style context, it's nsComputedDOMStyle::mStyleContextHolder.
... so when the frame has released it, this member becomes the last ref
and it's destroyed at the end of GetPropertyCSSValue() when we set
the strong ref to null.
The style context in this case comes from the frame, it's shared
and not a root (ie. not in nsStyleSet::mRoots).

I suspect that the "wip" patch could have a performance penalty.
Also, I'm not sure if a Flush_Layout there *guaranties* that no rule nodes
are deleted while mStyleContextHolder is alive; there's lot's of
FlushPendingReflows() sprinkled in this file.

Otherwise, protecting the rule nodes reachable from mStyleContextHolder
from being destroyed and not leaking them looks tricky...
In nsStyleSet::NotifyStyleContextDestroyed
http://mxr.mozilla.org/mozilla-central/source/layout/style/nsStyleSet.cpp?mark=908-910,916#883
we need to protect them from being GC'ed and we can't simply Destroy
the old rule tree in nsStyleSet::EndReconstruct():
http://mxr.mozilla.org/mozilla-central/source/layout/style/nsStyleSet.cpp?mark=181#158
we have to delay it while there are "external" style contexts alive.
I tried that and seemed to fixed the crash in this bug... but I don't think
it's enough... in nsStyleSet::Shutdown() (called from nsPresShell::Destroy())
http://mxr.mozilla.org/mozilla-central/source/layout/style/nsStyleSet.cpp?mark=875#869
we unconditionally destroy the rule tree and it seems hard to avoid that.
nsPresShell::Destroy() cancels reflow callbacks *before* calling
nsStyleSet::Shutdown() though, so GetPropertyCSSValue() could
post a reflow callback just to detect that and null out mStyleContextHolder
before the rule tree is destroyed. This is rather messy though...

A simpler (less risky) approach is to re-resolve mStyleContextHolder,
mInnerFrame etc, in FlushPendingReflows() and add null-checks after all calls.
I think that needs to be done anyway, eg.
http://mxr.mozilla.org/mozilla-central/source/layout/style/nsComputedDOMStyle.cpp?mark=2881-2883#2880
I think 'mInnerFrame' could have been destroyed there.
Attached patch Patch rev. 1 β€” β€” Splinter Review
The re-resolve and null-check approach.
Attachment #355911 - Attachment is obsolete: true
Attachment #357758 - Flags: superreview?(dbaron)
Attachment #357758 - Flags: review?(dbaron)
(In reply to comment #20)
> The style context in this case comes from the frame, it's shared
> and not a root (ie. not in nsStyleSet::mRoots).

Not sure what you mean by this; mRoots is just the set of style contexts that have no parent context.

> Otherwise, protecting the rule nodes reachable from mStyleContextHolder
> from being destroyed and not leaking them looks tricky...
> In nsStyleSet::NotifyStyleContextDestroyed
> http://mxr.mozilla.org/mozilla-central/source/layout/style/nsStyleSet.cpp?mark=908-910,916#883
> we need to protect them from being GC'ed and we can't simply Destroy
> the old rule tree in nsStyleSet::EndReconstruct():
> http://mxr.mozilla.org/mozilla-central/source/layout/style/nsStyleSet.cpp?mark=181#158
> we have to delay it while there are "external" style contexts alive.
> I tried that and seemed to fixed the crash in this bug... but I don't think
> it's enough... in nsStyleSet::Shutdown() (called from nsPresShell::Destroy())
> http://mxr.mozilla.org/mozilla-central/source/layout/style/nsStyleSet.cpp?mark=875#869
> we unconditionally destroy the rule tree and it seems hard to avoid that.
> nsPresShell::Destroy() cancels reflow callbacks *before* calling
> nsStyleSet::Shutdown() though, so GetPropertyCSSValue() could
> post a reflow callback just to detect that and null out mStyleContextHolder
> before the rule tree is destroyed. This is rather messy though...
> 
> A simpler (less risky) approach is to re-resolve mStyleContextHolder,
> mInnerFrame etc, in FlushPendingReflows() and add null-checks after all calls.
> I think that needs to be done anyway, eg.
> http://mxr.mozilla.org/mozilla-central/source/layout/style/nsComputedDOMStyle.cpp?mark=2881-2883#2880
> I think 'mInnerFrame' could have been destroyed there.
This seems a little too piecemeal, and runs the risk both of people messing this up for new code, and the risk of a whole bunch of null-dereference crashes since I think it breaks our guaranteeing of the non-nullness of mStyleContextHolder in the getter functions.  It also has the performance cost of doing this work twice.

I think I might prefer a solution where:
 * we make all the getter functions static, and separate them into two types:
   + some take just a style context
   + some take a style context and the inner and outer frames
   and then when we're calling the getter type that takes frames, we flush layout before getting the style context in the first place.

Note that this would require eliminating mAppUnitsPerInch and mPresShell (which is trivial, they can be gotten from the style context) and it requires that GetROCSSPrimitiveValue take either a pres context or the app units per inch number (at least until we get rid of that API), and it would require that all the helper functions be made static as well (though I think a bunch of them already should be, like GetROCSSValueList).

I'm curious what bzbarsky thinks of this idea, though.
That sounds fine to me.
Comment on attachment 357758 [details] [diff] [review]
Patch rev. 1

I'd prefer to do what I described in comment 24 instead.  Let me know if you want to do it or you want me to do it.
Attachment #357758 - Flags: superreview?(dbaron)
Attachment #357758 - Flags: superreview-
Attachment #357758 - Flags: review?(dbaron)
Attachment #357758 - Flags: review-
The patch in bug 475128 will likely fix this by changing the underlying problem from a crash into a correctness bug; we should still do something to fix the correctness bug like comment 24.
The testcase WFM, afaict.
It's not wfm, using
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.6pre) Gecko/2009010606 GranParadiso/3.0.6pre (.NET CLR 3.5.30729)

Dan Veditz specifically reopened this on 2008-10-06 for the 1.9.0.x branch, I believe.
And in any case we should get a form of this patch in; it's correcting a real error, so if this bug is closed we should get a new bug open on that problem.
Flags: blocking1.9.0.7+ → blocking1.9.0.8?
This seems to be wfm, using:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7pre) Gecko/2009020405 GranParadiso/3.0.7pre (.NET CLR 3.5.30729)

Olli mentioned he landed the script blocker patches yesterday that might have fixed this.
"fixed" or merely masked? From a security POV users are safe either way (for now anyway), but if there's a real issue here we should fix it.

dbaron: did you want the changes described in comment 24 on the trunk as well, or is that specific to the 1.9.0 branch?
The changes in comment 24 need to be on the trunk.

Bug 475128 landed on 1.9.0, so this should no longer be sg:critical.

It would be a lot easier if I didn't have to re-file all the dependencies of bug 475128, since they're all real bugs, just not sg:critical and blocking anymore.  But I'm not quite sure how to go about doing that...
Depends on: 475128
Flags: blocking1.9.0.8? → blocking1.9.0.7+
Whiteboard: [sg:critical?] [needs 1.9.0 patch] 1.9.0-only, wfm on moz-central → [sg:critical?] [needs 1.9.0.7 verification that sg:crit crash is gone] see comment 24
That means this bug has morphed to a non-critical arch change?

qawanted: Al, please verify that the testcase no longer crashes
Flags: blocking1.9.0.7+ → blocking1.9.0.8?
Keywords: qawanted
Crashed 1.9.0.6 on XP but does not crash the nightly 1.9.0 build from last night: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7pre) Gecko/2009021705 GranParadiso/3.0.7pre (.NET CLR 3.5.30729),
Keywords: qawanted
Removing "wanted1.9.0" flag based on comment 33, but if we do still want the remaining fixes on the older versions please renominate.
Flags: wanted1.9.0.x+
Flags: blocking1.9.0.8?
Whiteboard: [sg:critical?] [needs 1.9.0.7 verification that sg:crit crash is gone] see comment 24 → [sg:nse][no longer sg:crit due to fix for 475128] see comment 24
Upgrading to blocking since the patch here fixes blocker bug 477775
Flags: wanted1.9.1+ → blocking1.9.1+
Taking bug given lack of response to comment 26.  I think it might be easier to do a hybrid approach that's halfway and still leaves them as member functions, though.
Assignee: mats.palmgren → dbaron
Status: REOPENED → ASSIGNED
Attached patch alternative approach (obsolete) β€” β€” Splinter Review
Here's the simple version of what I was thinking.  Not tested yet.
Attached patch alternative approach β€” β€” Splinter Review
Attachment #363458 - Attachment is obsolete: true
Comment on attachment 363493 [details] [diff] [review]
alternative approach

Full mochitests, chrome mochitests, and browser mochitests, and crashtests pass (still running reftests, but they pass so far).
Attachment #363493 - Flags: superreview?(bzbarsky)
Attachment #363493 - Flags: review?(bzbarsky)
The basic idea of this patch is that rather than call FlushPendingReflows in the middle of computed style computation, we do it at the beginning for those properties that need it.  Then if the layout flush causes frames to be destroyed, we don't end up with dangling frame or style context pointers, since we get the frames and style contexts after the flush.
Whiteboard: [sg:nse][no longer sg:crit due to fix for 475128] see comment 24 → [sg:nse][no longer sg:crit due to fix for 475128] see comment 24 [needs review]
Whiteboard: [sg:nse][no longer sg:crit due to fix for 475128] see comment 24 [needs review] → [needs review][sg:nse][no longer sg:crit due to fix for 475128] see comment 24 [needs review]
Attachment #363493 - Flags: superreview?(bzbarsky)
Attachment #363493 - Flags: superreview+
Attachment #363493 - Flags: review?(bzbarsky)
Attachment #363493 - Flags: review+
http://hg.mozilla.org/mozilla-central/rev/3e9d90ebeaf0
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/41404019f754
Status: ASSIGNED → RESOLVED
Closed: 16 years ago16 years ago
Keywords: fixed1.9.1
Resolution: --- → FIXED
Whiteboard: [needs review][sg:nse][no longer sg:crit due to fix for 475128] see comment 24 [needs review] → [sg:nse][no longer sg:crit due to fix for 475128] see comment 24 [needs review]
Target Milestone: --- → mozilla1.9.1b3
Flags: wanted1.9.0.x+
Flags: blocking1.9.0.8+
Attachment #364213 - Flags: approval1.9.0.8?
Comment on attachment 364213 [details] [diff] [review]
alternative approach, merged to 1.9.0

Approved for 1.9.0.8, a=dveditz for release-drivers
Attachment #364213 - Flags: approval1.9.0.8? → approval1.9.0.8+
Checked in to CVS trunk for 1.9.0.8, 2009-03-08 12:16 -0700.
Keywords: fixed1.9.0.8
The attached testcase doesn't crash 1.9.0.7, which makes it a bit hard to verify the fix for 1.9.0.8. The behavior appears the same in both 1.9.0.7 and the 1.9.0.8 nightly.
Bug 477775 might, though.
Apparent bug 477775 doesn't either (as comments there point out). Any additional ideas on how to verify this fix?
Group: core-security
This testcase is too slow to be a crashtest.
Flags: in-testsuite-
That doesn't mean we couldn't write another one.
verified FIXED using attached testcase on builds:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090721
Minefield/3.6a1pre ID:20090721044139

and

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.1pre) Gecko/20090720
Shiretoko/3.5.1pre ID:20090720042942
Status: RESOLVED → VERIFIED
Crash Signature: [@ nsStyleContext::Destroy()]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: