Crash [@ nsStyleContext::Destroy()] with DOMAttrModified removing window, mathml:apply, iframe and window {display: none} style

VERIFIED FIXED in mozilla1.9.1b3

Status

()

Core
Layout
P2
critical
VERIFIED FIXED
9 years ago
6 years ago

People

(Reporter: Martijn Wargers (dead), Assigned: dbaron)

Tracking

(5 keywords)

1.9.0 Branch
mozilla1.9.1b3
x86
Windows XP
crash, fixed1.9.0.9, regression, testcase, verified1.9.1
Points:
---
Dependency tree / graph
Bug Flags:
blocking1.9.1 +
blocking1.9.0.9 +
wanted1.9.0.x +
wanted1.8.1.x -
in-testsuite -

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [sg:nse][no longer sg:crit due to fix for 475128] see comment 24 [needs review], crash signature)

Attachments

(7 attachments, 2 obsolete attachments)

(Reporter)

Description

9 years ago
Created attachment 337512 [details]
testcase

See testcase, which normally crashes current trunk build very quickly after loading.
It also crashes Firefox 3, but not Firefox 2, it seems.
If wanted, I could look for the regression range.

The iframe source consists of this:
<?xml-stylesheet href="chrome://browser/skin/" type="text/css"?>
<window xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul" xmlns:html="http://www.w3.org/1999/xhtml" xmlns:mathml="http://www.w3.org/1998/Math/MathML">
<mathml:apply>
  <html:iframe/>
</mathml:apply>
<html:style>window {display: none;}</html:style>
<html:marquee/>
<wizardpage/>

<script xmlns="http://www.w3.org/1999/xhtml">
document.documentElement.removeAttributeNode(document.documentElement.attributes[0]);
window.addEventListener('DOMAttrModified', function(e) {window.frameElement.parentNode.removeChild(window.frameElement) }, true);
</script>
</window>
(Reporter)

Comment 1

9 years ago
Created attachment 337513 [details]
stacktrace from opt. build from yesterday

 	053ec3af()	
>	xul.dll!nsCOMPtr<nsISelection>::nsCOMPtr<nsISelection>(nsISelection * aRawPtr=0x05485118)  Line 572	C++
 	xul.dll!nsStyleContext::Destroy()  Line 936	C++
 	xul.dll!nsStyleContext::Release()  Line 93	C++
 	xul.dll!nsComputedDOMStyle::GetPropertyCSSValue(const nsAString_internal & aPropertyName={...}, nsIDOMCSSValue * * aReturn=0x0012e108)  Line 379	C++
 	xul.dll!nsComputedDOMStyle::GetPropertyValue(const nsAString_internal & aPropertyName={...}, nsAString_internal & aReturn={...})  Line 289 + 0x1f bytes	C++
 	xul.dll!nsComputedDOMStyle::GetPropertyValue(nsCSSProperty aPropID=eCSSProperty_width, nsAString_internal & aValue={...})  Line 233 + 0x2a bytes	C++
 	xul.dll!CSS2PropertiesTearoff::GetWidth(nsAString_internal & aValue={...})  Line 515 + 0x16 bytes	C++
 	xul.dll!NS_InvokeByIndex_P(nsISupports * that=0x000000ef, unsigned int methodIndex=1, unsigned int paramCount=1237648, nsXPTCVariant * params=0x053030f4)  Line 102	C++
 	xul.dll!XPCWrappedNative::CallMethod(XPCCallContext & ccx={...}, XPCWrappedNative::CallMode mode=239)  Line 2393 + 0x1f bytes	C++
 	xul.dll!XPCWrappedNative::CallMethod(XPCCallContext & ccx={...}, XPCWrappedNative::CallMode mode=CALL_GETTER)  Line 2393 + 0x1f bytes	C++
etc..
(Reporter)

Updated

9 years ago
Flags: blocking1.9.1?
Flags: wanted1.9.0.x+
Flags: blocking1.9.0.4?
Keywords: regressionwindow-wanted
(Reporter)

Comment 2

9 years ago
Worksforme, using:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b1pre) Gecko/20081001 Minefield/3.1b1pre
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → WORKSFORME
(Reporter)

Comment 3

9 years ago
Ria, would you like to find out when this regressed and when it became worksforme?
Status: RESOLVED → REOPENED
Flags: blocking1.9.1? → wanted1.8.1.x-
Resolution: WORKSFORME → ---
Whiteboard: [sg:critical?] 1.9.0-only, wfm on moz-central
Version: Trunk → 1.9.0 Branch
Flags: wanted1.9.1+
Flags: blocking1.9.0.4? → blocking1.9.0.4+
Martijn: could you do the regression-window honors here? Thanks.
Flags: blocking1.9.0.4+ → blocking1.9.0.5+
(Assignee)

Comment 5

9 years ago
Seems to have been fixed between Linux x86_64 nightlies 2008-09-11-02-mozilla-central and 2008-09-12-02-mozilla-central.
(Assignee)

Comment 6

9 years ago
My guess would be that it was fixed by bug 454361; the question is whether that's the real problem or just something that happens to fix this testcase.
(Assignee)

Comment 7

9 years ago
(In reply to comment #6)
> My guess would be that it was fixed by bug 454361; the question is whether
> that's the real problem or just something that happens to fix this testcase.

Hmm.  I tried testing this locally by removing the code added there, and it actually didn't seem to be responsible, although I might have done something wrong...
qawanted: Al, can you verify the regression window in comment 5?
Keywords: qawanted
Crashes in Ubuntu with Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1b1pre) Gecko/20080911020347 Minefield/3.1b1pre. With Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1b1pre) Gecko/20080912020354 Minefield/3.1b1pre, it doesn't crash but takes my cpu between 70% and 100% and keeps it pegged there while the page does it's thing. This is what last night's build does as well.

So, something either fixed the bug or masked the behavior between the daily builds from 9/11 and 9/12. 

If we have hourly builds from then, I could try to narrow it down further but I haven't found any that old.
Keywords: qawanted, regressionwindow-wanted
Mats, can you look at this?
Assignee: nobody → mats.palmgren
Flags: blocking1.9.0.5+ → blocking1.9.0.6+
Mats, if you're not able to work on this, who would you recommend?
Whiteboard: [sg:critical?] 1.9.0-only, wfm on moz-central → [sg:critical?] [needs 1.9.0 patch] 1.9.0-only, wfm on moz-central
Flags: blocking1.9.0.6+ → blocking1.9.0.7+
Created attachment 355909 [details]
Stack for rule node destruction and the crash

Getting a property value flushes layout which recreates the rule tree.

nsComputedDOMStyle has a mStyleContextHolder member, it's a strong ref on
a mStyleContext.  I think it's illegal to hold such references over a
flush point according to the comment in nsStyleSet::EndReconstruct():
http://hg.mozilla.org/mozilla-central/annotate/ee25fdabc9fe/layout/style/nsStyleSet.cpp?mark=162-163#l157
(style contexts does not hold a strong ref on mRuleNode)
Substitute /a mStyleContext/a nsStyleContext/ above

The code looks the same on trunk so I think we need to fix it there too.
(Assignee)

Comment 14

9 years ago
(In reply to comment #12)
> (style contexts does not hold a strong ref on mRuleNode)

Well, they do most of the time (since rule nodes are GCed, and the style contexts are roots), but not across a rule tree rebuild.
Created attachment 355911 [details] [diff] [review]
wip

Changing Flush_Style to Flush_Layout seems to be sufficient for me on Linux.
(In reply to comment #14)
Ok.  Do you agree this could potentially occur also on trunk?
(Assignee)

Comment 17

9 years ago
What frame is still holding on to an old style context, and why?
Not sure I understand the question; there's no frame holding an old
style context, it's nsComputedDOMStyle::mStyleContextHolder.
... so when the frame has released it, this member becomes the last ref
and it's destroyed at the end of GetPropertyCSSValue() when we set
the strong ref to null.
The style context in this case comes from the frame, it's shared
and not a root (ie. not in nsStyleSet::mRoots).

I suspect that the "wip" patch could have a performance penalty.
Also, I'm not sure if a Flush_Layout there *guaranties* that no rule nodes
are deleted while mStyleContextHolder is alive; there's lot's of
FlushPendingReflows() sprinkled in this file.

Otherwise, protecting the rule nodes reachable from mStyleContextHolder
from being destroyed and not leaking them looks tricky...
In nsStyleSet::NotifyStyleContextDestroyed
http://mxr.mozilla.org/mozilla-central/source/layout/style/nsStyleSet.cpp?mark=908-910,916#883
we need to protect them from being GC'ed and we can't simply Destroy
the old rule tree in nsStyleSet::EndReconstruct():
http://mxr.mozilla.org/mozilla-central/source/layout/style/nsStyleSet.cpp?mark=181#158
we have to delay it while there are "external" style contexts alive.
I tried that and seemed to fixed the crash in this bug... but I don't think
it's enough... in nsStyleSet::Shutdown() (called from nsPresShell::Destroy())
http://mxr.mozilla.org/mozilla-central/source/layout/style/nsStyleSet.cpp?mark=875#869
we unconditionally destroy the rule tree and it seems hard to avoid that.
nsPresShell::Destroy() cancels reflow callbacks *before* calling
nsStyleSet::Shutdown() though, so GetPropertyCSSValue() could
post a reflow callback just to detect that and null out mStyleContextHolder
before the rule tree is destroyed. This is rather messy though...

A simpler (less risky) approach is to re-resolve mStyleContextHolder,
mInnerFrame etc, in FlushPendingReflows() and add null-checks after all calls.
I think that needs to be done anyway, eg.
http://mxr.mozilla.org/mozilla-central/source/layout/style/nsComputedDOMStyle.cpp?mark=2881-2883#2880
I think 'mInnerFrame' could have been destroyed there.
Created attachment 357758 [details] [diff] [review]
Patch rev. 1

The re-resolve and null-check approach.
Attachment #355911 - Attachment is obsolete: true
Attachment #357758 - Flags: superreview?(dbaron)
Attachment #357758 - Flags: review?(dbaron)
Created attachment 357759 [details] [diff] [review]
Patch rev. 1 for 1.9.0 branch
(Assignee)

Comment 23

9 years ago
(In reply to comment #20)
> The style context in this case comes from the frame, it's shared
> and not a root (ie. not in nsStyleSet::mRoots).

Not sure what you mean by this; mRoots is just the set of style contexts that have no parent context.

> Otherwise, protecting the rule nodes reachable from mStyleContextHolder
> from being destroyed and not leaking them looks tricky...
> In nsStyleSet::NotifyStyleContextDestroyed
> http://mxr.mozilla.org/mozilla-central/source/layout/style/nsStyleSet.cpp?mark=908-910,916#883
> we need to protect them from being GC'ed and we can't simply Destroy
> the old rule tree in nsStyleSet::EndReconstruct():
> http://mxr.mozilla.org/mozilla-central/source/layout/style/nsStyleSet.cpp?mark=181#158
> we have to delay it while there are "external" style contexts alive.
> I tried that and seemed to fixed the crash in this bug... but I don't think
> it's enough... in nsStyleSet::Shutdown() (called from nsPresShell::Destroy())
> http://mxr.mozilla.org/mozilla-central/source/layout/style/nsStyleSet.cpp?mark=875#869
> we unconditionally destroy the rule tree and it seems hard to avoid that.
> nsPresShell::Destroy() cancels reflow callbacks *before* calling
> nsStyleSet::Shutdown() though, so GetPropertyCSSValue() could
> post a reflow callback just to detect that and null out mStyleContextHolder
> before the rule tree is destroyed. This is rather messy though...
> 
> A simpler (less risky) approach is to re-resolve mStyleContextHolder,
> mInnerFrame etc, in FlushPendingReflows() and add null-checks after all calls.
> I think that needs to be done anyway, eg.
> http://mxr.mozilla.org/mozilla-central/source/layout/style/nsComputedDOMStyle.cpp?mark=2881-2883#2880
> I think 'mInnerFrame' could have been destroyed there.
(Assignee)

Comment 24

9 years ago
This seems a little too piecemeal, and runs the risk both of people messing this up for new code, and the risk of a whole bunch of null-dereference crashes since I think it breaks our guaranteeing of the non-nullness of mStyleContextHolder in the getter functions.  It also has the performance cost of doing this work twice.

I think I might prefer a solution where:
 * we make all the getter functions static, and separate them into two types:
   + some take just a style context
   + some take a style context and the inner and outer frames
   and then when we're calling the getter type that takes frames, we flush layout before getting the style context in the first place.

Note that this would require eliminating mAppUnitsPerInch and mPresShell (which is trivial, they can be gotten from the style context) and it requires that GetROCSSPrimitiveValue take either a pres context or the app units per inch number (at least until we get rid of that API), and it would require that all the helper functions be made static as well (though I think a bunch of them already should be, like GetROCSSValueList).

I'm curious what bzbarsky thinks of this idea, though.
That sounds fine to me.
(Assignee)

Comment 26

9 years ago
Comment on attachment 357758 [details] [diff] [review]
Patch rev. 1

I'd prefer to do what I described in comment 24 instead.  Let me know if you want to do it or you want me to do it.
Attachment #357758 - Flags: superreview?(dbaron)
Attachment #357758 - Flags: superreview-
Attachment #357758 - Flags: review?(dbaron)
Attachment #357758 - Flags: review-
(Assignee)

Comment 27

9 years ago
The patch in bug 475128 will likely fix this by changing the underlying problem from a crash into a correctness bug; we should still do something to fix the correctness bug like comment 24.

Comment 28

9 years ago
The testcase WFM, afaict.
(Reporter)

Comment 29

9 years ago
It's not wfm, using
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.6pre) Gecko/2009010606 GranParadiso/3.0.6pre (.NET CLR 3.5.30729)

Dan Veditz specifically reopened this on 2008-10-06 for the 1.9.0.x branch, I believe.
(Assignee)

Comment 30

9 years ago
And in any case we should get a form of this patch in; it's correcting a real error, so if this bug is closed we should get a new bug open on that problem.
Flags: blocking1.9.0.7+ → blocking1.9.0.8?
(Reporter)

Comment 31

9 years ago
This seems to be wfm, using:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7pre) Gecko/2009020405 GranParadiso/3.0.7pre (.NET CLR 3.5.30729)

Olli mentioned he landed the script blocker patches yesterday that might have fixed this.
"fixed" or merely masked? From a security POV users are safe either way (for now anyway), but if there's a real issue here we should fix it.

dbaron: did you want the changes described in comment 24 on the trunk as well, or is that specific to the 1.9.0 branch?
(Assignee)

Comment 33

9 years ago
The changes in comment 24 need to be on the trunk.

Bug 475128 landed on 1.9.0, so this should no longer be sg:critical.

It would be a lot easier if I didn't have to re-file all the dependencies of bug 475128, since they're all real bugs, just not sg:critical and blocking anymore.  But I'm not quite sure how to go about doing that...
Depends on: 475128
Flags: blocking1.9.0.8? → blocking1.9.0.7+
Whiteboard: [sg:critical?] [needs 1.9.0 patch] 1.9.0-only, wfm on moz-central → [sg:critical?] [needs 1.9.0.7 verification that sg:crit crash is gone] see comment 24
That means this bug has morphed to a non-critical arch change?

qawanted: Al, please verify that the testcase no longer crashes
Flags: blocking1.9.0.7+ → blocking1.9.0.8?
Keywords: qawanted
Crashed 1.9.0.6 on XP but does not crash the nightly 1.9.0 build from last night: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7pre) Gecko/2009021705 GranParadiso/3.0.7pre (.NET CLR 3.5.30729),
(Reporter)

Updated

9 years ago
Keywords: qawanted
Removing "wanted1.9.0" flag based on comment 33, but if we do still want the remaining fixes on the older versions please renominate.
Flags: wanted1.9.0.x+
Flags: blocking1.9.0.8?
Whiteboard: [sg:critical?] [needs 1.9.0.7 verification that sg:crit crash is gone] see comment 24 → [sg:nse][no longer sg:crit due to fix for 475128] see comment 24
(Assignee)

Updated

9 years ago
Blocks: 477775
Upgrading to blocking since the patch here fixes blocker bug 477775
Flags: wanted1.9.1+ → blocking1.9.1+
Priority: -- → P2
(Assignee)

Comment 38

9 years ago
Taking bug given lack of response to comment 26.  I think it might be easier to do a hybrid approach that's halfway and still leaves them as member functions, though.
Assignee: mats.palmgren → dbaron
(Assignee)

Updated

9 years ago
Status: REOPENED → ASSIGNED
(Assignee)

Comment 39

9 years ago
Created attachment 363458 [details] [diff] [review]
alternative approach

Here's the simple version of what I was thinking.  Not tested yet.
(Assignee)

Comment 40

9 years ago
Created attachment 363493 [details] [diff] [review]
alternative approach
Attachment #363458 - Attachment is obsolete: true
(Assignee)

Comment 41

9 years ago
Comment on attachment 363493 [details] [diff] [review]
alternative approach

Full mochitests, chrome mochitests, and browser mochitests, and crashtests pass (still running reftests, but they pass so far).
Attachment #363493 - Flags: superreview?(bzbarsky)
Attachment #363493 - Flags: review?(bzbarsky)
(Assignee)

Comment 42

9 years ago
The basic idea of this patch is that rather than call FlushPendingReflows in the middle of computed style computation, we do it at the beginning for those properties that need it.  Then if the layout flush causes frames to be destroyed, we don't end up with dangling frame or style context pointers, since we get the frames and style contexts after the flush.
Whiteboard: [sg:nse][no longer sg:crit due to fix for 475128] see comment 24 → [sg:nse][no longer sg:crit due to fix for 475128] see comment 24 [needs review]
(Assignee)

Updated

9 years ago
Whiteboard: [sg:nse][no longer sg:crit due to fix for 475128] see comment 24 [needs review] → [needs review][sg:nse][no longer sg:crit due to fix for 475128] see comment 24 [needs review]
Attachment #363493 - Flags: superreview?(bzbarsky)
Attachment #363493 - Flags: superreview+
Attachment #363493 - Flags: review?(bzbarsky)
Attachment #363493 - Flags: review+
(Assignee)

Comment 43

9 years ago
http://hg.mozilla.org/mozilla-central/rev/3e9d90ebeaf0
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/41404019f754
Status: ASSIGNED → RESOLVED
Last Resolved: 9 years ago9 years ago
Keywords: fixed1.9.1
Resolution: --- → FIXED
Whiteboard: [needs review][sg:nse][no longer sg:crit due to fix for 475128] see comment 24 [needs review] → [sg:nse][no longer sg:crit due to fix for 475128] see comment 24 [needs review]
Target Milestone: --- → mozilla1.9.1b3
(Assignee)

Updated

9 years ago
Attachment #363493 - Flags: approval1.9.0.8?
Flags: wanted1.9.0.x+
Flags: blocking1.9.0.8+
(Assignee)

Comment 44

9 years ago
Created attachment 364213 [details] [diff] [review]
alternative approach, merged to 1.9.0
Attachment #364213 - Flags: approval1.9.0.8?
(Assignee)

Updated

9 years ago
Attachment #363493 - Flags: approval1.9.0.8?
Comment on attachment 364213 [details] [diff] [review]
alternative approach, merged to 1.9.0

Approved for 1.9.0.8, a=dveditz for release-drivers
Attachment #364213 - Flags: approval1.9.0.8? → approval1.9.0.8+
(Assignee)

Comment 46

9 years ago
Checked in to CVS trunk for 1.9.0.8, 2009-03-08 12:16 -0700.
Keywords: fixed1.9.0.8
The attached testcase doesn't crash 1.9.0.7, which makes it a bit hard to verify the fix for 1.9.0.8. The behavior appears the same in both 1.9.0.7 and the 1.9.0.8 nightly.
(Assignee)

Comment 48

9 years ago
Bug 477775 might, though.
Apparent bug 477775 doesn't either (as comments there point out). Any additional ideas on how to verify this fix?
Group: core-security

Comment 50

8 years ago
This testcase is too slow to be a crashtest.
Flags: in-testsuite-
(Assignee)

Comment 51

8 years ago
That doesn't mean we couldn't write another one.
verified FIXED using attached testcase on builds:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090721
Minefield/3.6a1pre ID:20090721044139

and

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.1pre) Gecko/20090720
Shiretoko/3.5.1pre ID:20090720042942
Status: RESOLVED → VERIFIED
Keywords: fixed1.9.1 → verified1.9.1
Crash Signature: [@ nsStyleContext::Destroy()]
You need to log in before you can comment on or make changes to this bug.