Bug 323394 (ffoxdie)

Crash due to too much recursion in frame construction (ffoxdie)

NEW
Unassigned

Status

()

defect
--
critical
14 years ago
11 months ago

People

(Reporter: jruderman, Unassigned)

Tracking

({crash, testcase})

Trunk
Points:
---
Dependency tree / graph
Bug Flags:
wanted1.9 -
wanted1.8.1.x +
wanted1.8.0.x +

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [sg:dos], )

Attachments

(2 attachments)

 
Exception:  EXC_BAD_ACCESS (0x0001)
Codes:      KERN_INVALID_ADDRESS (0x0001) at 0xbf7ffff0

Thread 0 Crashed:
0   non-virtual thunk [nv:-480] to nsHTMLDocument::QueryCommandValue(nsAString_internal const&, nsAString_internal&) + 360
1   PL_DHashTableOperate + 120
2   RuleHash::EnumerateTagRules(nsIAtom*, void (*)(nsICSSStyleRule*, nsCSSSelector*, void*), void*) + 36
3   nsCSSRuleProcessor::RulesMatching(PseudoRuleProcessorData*) + 56
4   nsStyleSet::ResolveStyleForNonElement(nsStyleContext*) + 212
5   nsStyleSet::FileRules(int (*)(nsIStyleRuleProcessor*, void*), RuleProcessorData*) + 56
6   nsStyleSet::ProbePseudoStyleFor(nsIContent*, nsIAtom*, nsStyleContext*) + 224
7   nsCSSFrameConstructor::CreateGeneratedContentFrame(nsFrameConstructorState&, nsIFrame*, nsIContent*, nsStyleContext*, nsIAtom*, nsIFrame**) + 132
8   nsCSSFrameConstructor::ProcessInlineChildren(nsFrameConstructorState&, nsIContent*, nsIFrame*, int, nsFrameItems&, int*) + 132

9   nsCSSFrameConstructor::ConstructInline(nsFrameConstructorState&, nsStyleDisplay const*, nsIContent*, nsIFrame*, nsStyleContext*, int, nsIFrame*) + 168
10  nsCSSFrameConstructor::ConstructFrameByDisplayType(nsFrameConstructorState&, nsStyleDisplay const*, nsIContent*, int, nsIAtom*, nsIFrame*, nsStyleContext*, nsFrameItems&, int) + 1516
11  nsCSSFrameConstructor::ConstructFrameInternal(nsFrameConstructorState&, nsIContent*, nsIFrame*, nsIAtom*, int, nsStyleContext*, nsFrameItems&, int) + 1568
12  nsCSSFrameConstructor::ConstructFrame(nsFrameConstructorState&, nsIContent*, nsIFrame*, nsFrameItems&) + 308
13  nsCSSFrameConstructor::ProcessInlineChildren(nsFrameConstructorState&, nsIContent*, nsIFrame*, int, nsFrameItems&, int*) + 412

14  nsCSSFrameConstructor::ConstructInline(nsFrameConstructorState&, nsStyleDisplay const*, nsIContent*, nsIFrame*, nsStyleContext*, int, nsIFrame*) + 168
15  nsCSSFrameConstructor::ConstructFrameByDisplayType(nsFrameConstructorState&, nsStyleDisplay const*, nsIContent*, int, nsIAtom*, nsIFrame*, nsStyleContext*, nsFrameItems&, int) + 1516
16  nsCSSFrameConstructor::ConstructFrameInternal(nsFrameConstructorState&, nsIContent*, nsIFrame*, nsIAtom*, int, nsStyleContext*, nsFrameItems&, int) + 1568
17  nsCSSFrameConstructor::ConstructFrame(nsFrameConstructorState&, nsIContent*, nsIFrame*, nsFrameItems&) + 308
18  nsCSSFrameConstructor::ProcessInlineChildren(nsFrameConstructorState&, nsIContent*, nsIFrame*, int, nsFrameItems&, int*) + 412

...
I get a different crash, stack overflow recursing in nsHTMLDocument::RegisterNamedItems, TB13941953

http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/content/html/document/src/nsHTMLDocument.cpp&mark=3158&rev=MOZILLA_1_8_0_BRANCH#3158

Probably doesn't like a 10,000-deep call stack.
I get a stack overflow in TB13958161Q Win98

nsPseudoFrameData::nsPseudoFrameData  [c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/layout/base/nsCSSFrameConstructor.cpp, line 890]

code below repeated for a total of 12 times:

nsCSSFrameConstructor::ConstructInline  [c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/layout/base/nsCSSFrameConstructor.cpp, line 12892]
nsCSSFrameConstructor::ConstructFrameByDisplayType  [c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/layout/base/nsCSSFrameConstructor.cpp, line 7010]
nsCSSFrameConstructor::ConstructFrameInternal  [c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/layout/base/nsCSSFrameConstructor.cpp, line 8131]
nsCSSFrameConstructor::ConstructFrame  [c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/layout/base/nsCSSFrameConstructor.cpp, line 7956]
nsCSSFrameConstructor::ProcessInlineChildren  [c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/layout/base/nsCSSFrameConstructor.cpp, line 13080]

... stack ending here:

nsCSSFrameConstructor::ConstructInline  [c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/layout/base/nsCSSFrameConstructor.cpp, line 12892]
nsCSSFrameConstructor::ConstructFrameByDisplayType  [c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/layout/base/nsCSSFrameConstructor.cpp, line 7010]
nsCSSFrameConstructor::ConstructFrameInternal  [c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/layout/base/nsCSSFrameConstructor.cpp, line 8131]
Attachment #208471 - Attachment description: testcase → testcase (10000 <x>s followed by 10000 </x>s, in XHTML namespace)
Note that in HTML we somewhat limit content model depth, exactly to prevent such things from happening....
Why doesn't MAX_FRAME_DEPTH prevent this too-much-recursion crash?
MAX_FRAME_DEPTH/MAX_REFLOW_DEPTH is only checked for reflow, not frame construction.

It would be possible to do something similar for frame construction, probably by keeping a counter on the frame constructor state.  The question is what limit to set....

And we'd probably want similar checks in ReResolveStyleContext, by the way.  And various DOM methods that recurse.  And anything else in general that recurses down the content tree.  :(
Flags: blocking1.9?
Fixing this would make it easier to figure out a claimed security hole, bug 348514.
(In reply to comment #7)
> 
> It would be possible to do something similar for frame construction, probably
> by keeping a counter on the frame constructor state.  The question is what
> limit to set....

What would be the right place(s) to stop too deep frame construction?

> 
> And we'd probably want similar checks in ReResolveStyleContext, by the way. 
> And various DOM methods that recurse.  And anything else in general that
> recurses down the content tree.  :(

Why do we need to add a check in /various/ DOM methods and not just DOM construction methods? Wouldn't limiting at construction time, limit all other recursions too?
If you're trying to limit content tree depth, you could do the DOM construction methods....  Generally speaking, that will break pages.  But yes, that would be a little simpler.  You'd have to change the XML parser, various nsINode methods, XBL, XFT and a few other things, probably...
(In reply to comment #10)
> methods....  Generally speaking, that will break pages.  But yes, that would be
> a little simpler.  You'd have to change the XML parser, various nsINode

Simpler than what? I don't see the alternate approach outlined here. What's the idea ... how can we tackle this without breaking pages that nest that deep?
> Simpler than what?

Simpler than putting checks in various recursive methods.

There is no way to fix this without breaking very deeply nested pages ... except by rewriting Gecko to avoid the use of recursion. That would take a while.
I think limiting the depth allowed by the XML parser, just like the HTML parser does, would be a good start.
This bug crashes Fx 1.5.x to 2.0 RC1 at: http://lcamtuf.coredump.cx/ffoxdie.html
*** Bug 356404 has been marked as a duplicate of this bug. ***
*** Bug 357002 has been marked as a duplicate of this bug. ***
Test cases ffoxdie.html and ffoxdie2.html (from http://lcamtuf.coredump.cx/) both affect 1.5.0.7 (tried on FC5 and RHEL4), however ffoxdie3.html doesn't.

The backtraces of both contain tens of thousands of calls to
NSGetModule () from /usr/lib64/firefox-1.5.0.7/components/libgklayout.so

That is quite different from what both Jesse and Hermann reported.
This is more than 32000 lines long, that's why I gzipped it.
OS: Mac OS X 10.2 → All
Hardware: Macintosh → All
> The backtraces of both contain tens of thousands of calls to
> NSGetModule () from /usr/lib64/firefox-1.5.0.7/components/libgklayout.so

That indicates that you're using a stripped optimized build, so the stack is pretty much useless.
Summary: Crash due to too much recursion in frame construction → Crash due to too much recursion in frame construction (ffoxdie)
are we thinking about roc's suggestion in https://bugzilla.mozilla.org/show_bug.cgi?id=323394#c14 for 1.9 or the branches?

any other ideas on fixes?
Limiting in the parser is only going to get us partly there since you can still use the DOM to create any depth you want. And even if we plug that there's XSLT, and most likely other things (templates? overlays?)

What we could do is to limit it in nsINode::InsertChildAt. That could be done efficiently (time-wise) by letting each node keep track of its own nesting depth. We will have quite a few bits left over in mFlagsOrSlots, so if we grab, say, 15 bits there we could allow trees up to 32000 nesting levels (or lower depending on where we want to limit it).

(Fwiw, such information could be used to create much faster implementations of nsContentUtils::ComparePosition)
That should say: We *still* have quite a few bits...
(In reply to comment #21)
> are we thinking about roc's suggestion in
> https://bugzilla.mozilla.org/show_bug.cgi?id=323394#c14 for 1.9 or the
> branches?
> 
> any other ideas on fixes?

JS engine uses a generic way to check for too-much recursion and treats the native stack overflow roughly like out-of-memory, see bug 192414. The same mechanics can be used in the rest of the browser where recursion is possible.
Flags: blocking1.8.1.1?
Would be great to get a fix here. Even if we don't think it's exploitable it's causing a great deal of grief.
Flags: blocking1.8.1.1?
Flags: blocking1.8.1.1+
Flags: blocking1.8.0.9+
Could someone add [@nsIFrame::SetStyleContext] in the summary. See crash #19 on http://talkback-public.mozilla.org/reports/firefox/FF20/index.html .

A good bunch of crash are coming from http://lcamtuf.coredump.cx/ffoxdie_orig.html,  	http://lcamtuf.coredump.cx/ffoxdie.html .
I don't think that would be right.  Any given too-much-recursion crash bug can have many things at the "top" of the stack.
This is just a crash, and not exploitable.  Still looking for a fix.
Flags: wanted1.8.1.x+
Flags: wanted1.8.0.x+
Flags: blocking1.8.1.1+
Flags: blocking1.8.0.9+
(In reply to comment #28)
> This is just a crash, and not exploitable.

However. This bug was reported on many IT-sites (e.g. the german it magazine www.heise.de). If the next version of FF will arrive and there is the information, that this bug is still not fixed it sounds to the people like MS.
It would be a better publicity for FF/Mozilla if this bug is fixed in the next version.
All true, but given the choice between fixing security bugs and fixing issues that are unfortunate, not security bugs, but blown out of proportion by the press, we would rather do the former.  Security is more important than the appearance of security...
Alias: ffoxdie
Flags: blocking1.8.1.2+
Flags: blocking1.8.0.10+
Flags: blocking1.8.1.2+
Flags: blocking1.8.0.10+
One thing we would definitely like for 1.9 is to limit the depth of frame trees in the frame constructor. This would prevent a lot of problems.

Limiting the content tree depth would also be good but it sounds a bit harder, especially in terms of what you do when content tree depth is exceeded by someone manipulating the DOM.
Whiteboard: [wanted-1.9]
The JS engine has a stack address limit set by DOM code based on a low stack, and assumption that the stack grows down.  Many recursive JS engine functions monitor the address of a stack dummy variable and throw when this limit is reached or exceeded.  It might be better to have more Gecko code use this mechanism, or one like it, rather than counting recursion levels and limiting them with brute-force numbers, which are hard to relate to stack memory use.

/be
I imagine the two strategies would have different implications for how hard it is to maintain Gecko's key invariants.  They might even require changes to the invariants themselves.
(In reply to comment #31)
> Limiting the content tree depth would also be good but it sounds a bit harder,
> especially in terms of what you do when content tree depth is exceeded by
> someone manipulating the DOM.

The large content tree depth is a problem only for recursive code, right? Then just checking for the stack depth during the recursion should be enough to cover that.
(In reply to comment #32)
> The JS engine has a stack address limit set by DOM code based on a low stack,
> and assumption that the stack grows down.

It is xpconnect code, not DOM, that these days sets the limit.
(In reply to comment #33)
> I imagine the two strategies would have different implications for how hard it
> is to maintain Gecko's key invariants.  They might even require changes to the
> invariants themselves.

I don't think this is the case.  There are no "key invariants" that depend on magic recursion depth limits, as far as I can see.  The issue is simply DOS attack or unintended stack overflow crash prevention, which means a hard quota of some kind on stack growth.  Ad-hoc and varying recursion depth limits are a poor way to implement such a quota.  Address sampling is better.

(In reply to comment #35)
> (In reply to comment #32)
> > The JS engine has a stack address limit set by DOM code based on a low stack,
> > and assumption that the stack grows down.
> 
> It is xpconnect code, not DOM, that these days sets the limit.

Right, thanks (XPConnect, DOM; Tomato, Tomahtoe ;-).

/be
> There are no "key invariants" that depend on magic recursion depth limits, as
> far as I can see.

Well.  There sort of are.  Frame destruction is recursive, and we absolutely depend on the invariant that Destroy() is called on all frames in a presshell before ~nsPresShell finishes.

That's off the top of my head, of course.  There may be other similar things.
(In reply to comment #37)
> > There are no "key invariants" that depend on magic recursion depth limits, as
> > far as I can see.
> 
> Well.  There sort of are.  Frame destruction is recursive, and we absolutely
> depend on the invariant that Destroy() is called on all frames in a presshell
> before ~nsPresShell finishes.

Ah, I see (and stand corrected).  But really, if we must limit stack consumption to avoid DOS and unintended overflow crashes, then we should revise destruction to use an iterative algorithm if possible.

One question: is the stack consumption from destruction bounded by prior recursive construction or traversal, so that if we have a frame tree of height H, we know we can destroy it without stack overflow?  This depends on depth of stack at destruction time vs. at creation or traversal, so may be hard to bound.

> That's off the top of my head, of course.  There may be other similar things.

It would be good to know of others.

/be
I think it'd be hard to limit the recursion itself while maintaining all invariants that we have. For example the BindToTree implementation does some stuff that is really important that we do and we simply can't stop the recursion there safely. Same thing when for example reparenting wrappers when nodes are moved around in the DOM (especially when moved between documents using .adoptNode)
> One question: is the stack consumption from destruction bounded by prior
> recursive construction or traversal,

Thanks to the wonders of DOM events being reentrant, any operation that can be triggered from script (and frame destruction is amongst those) may begin when the stack is arbitrarily deep.

Same thing for UnbindFromTree and so forth.

Of course if we take this approach we should stack-limit DOM event dispatch such that it can't get us too far into trouble.

But we have a _lot_ of recursive algorithms in our code, thanks to the whole "the data structure is a tree" business.  :(
How about using the JS-maintained thread stack limit, but instead of failing, switch to an iterative algorithm such as Deutsch-Schorr-Waite?  It would be best to have a single tree-walking higher-order function, e.g.

traverse(node, doit) {
    doit(node);
    for (each kid in node)
        traverse(kid, doit);
}

that we could extend via D-S-W, but I suspect (someone pleas confirm) that we have not used a HOF to consolidate tree-walking recursion.

/be
Yeah, we simply inline all the tree walking. The problem with the above approach is that for many of our recursive algorithms we keep state that is either passed on to child calls, or return state from child calls. Though maybe that can be solved using closure parameters.

And this still wouldn't solve the problem when we go in and out of javascript. That part would still need to throw an exception.

In any case, it will be *a lot* of work doing this. I think the time spent on trying to convert all our recursive algorithms into iterative ones would be better spent on other things. These crashes are after all not exploitable.

I suspect a much easier approach would be to limit the depth of the DOM trees and put limits on places where we can recurse over the tree multiple times, such as for mutation events. 
What sicking said. Unwinding all our recursive treewalkers into code with explicit closures would be a ton of work.
Flags: wanted1.9+
Whiteboard: [wanted-1.9]
Flags: wanted1.9+ → wanted1.9-
Whiteboard: [sg:dos]
note: this http://lcamtuf.coredump.cx/ffoxdie_orig.html is still crashing opt-builds.
Duplicate of this bug: 505350
Recent stack of this crash

Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_PROTECTION_FAILURE at 0x00000000bf7ffffc
Crashed Thread:  0

Thread 0 Crashed:
0   libxpcom_core.dylib               0x0051f935 NS_LogCtor_P + 5
(nsTraceRefcntImpl.cpp:1032)
1   libgklayout.dylib                 0x12338b35 nsCSSRect::nsCSSRect() + 69
2   libgklayout.dylib                 0x12338b9c nsCSSDisplay::nsCSSDisplay() +
76
3   libgklayout.dylib                 0x123a4f0c
nsRuleNode::GetVisibilityData(nsStyleContext*) + 28 (nsRuleNode.cpp:1361)
4   libgklayout.dylib                 0x123a50d8
nsRuleNode::GetStyleVisibility(nsStyleContext*, int) + 264
(nsStyleStructList.h:103)
5   libgklayout.dylib                 0x123a6947
nsStyleContext::GetStyleVisibility() + 55
6   libgklayout.dylib                 0x12209c44
nsFrame::DidSetStyleContext(nsStyleContext*) + 340 (nsFrame.cpp:574)
7   libgklayout.dylib                 0x12212ce5 nsFrame::Init(nsIContent*,
nsIFrame*, nsIFrame*) + 293 (nsIFrame.h:2159)
8   libgklayout.dylib                 0x12291298
nsSplittableFrame::Init(nsIContent*, nsIFrame*, nsIFrame*) + 40
(nsSplittableFrame.cpp:55)
9   libgklayout.dylib                 0x121fafac
nsContainerFrame::Init(nsIContent*, nsIFrame*, nsIFrame*) + 44
(nsContainerFrame.cpp:93)
10  libgklayout.dylib                 0x1211ebed
nsCSSFrameConstructor::InitAndRestoreFrame(nsFrameConstructorState const&,
nsIContent*, nsIFrame*, nsIFrame*, nsIFrame*, int) + 221 (nsIFrame.h:1095)
11  libgklayout.dylib                 0x12136f8e
nsCSSFrameConstructor::ConstructInline(nsFrameConstructorState&,
nsCSSFrameConstructor::FrameConstructionItem&, nsIFrame*, nsStyleDisplay
const*, nsFrameItems&, nsIFrame**) + 142 (nsCSSFrameConstructor.cpp:10740)
...

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre)
Gecko/20090720 Minefield/3.6a1pre
If we crash due to excessive recursion, the interesting part of the stack is the repeating part, not the part at the very top.
Duplicate of this bug: 456954
Duplicate of this bug: 160228
Blocks: 468240
There are several more recent duplicates of this, some claim to be somewhat "fixed" (like https://bugzilla.mozilla.org/show_bug.cgi?id=557180 which had only one testcase, but I get the crash in current Minefield with another input file). Would another example input help here?
> Would another example input help here?

It depends.

It's pretty easy to trigger this crash if you're _trying_ to.  If your example input is just a real-life page that accidentally triggers it, then it would be helpful to see it.
Duplicate of this bug: 592532
Duplicate of this bug: 621850
In bug 644526, we're considering capping the DOM tree depth for DOMs generated by the XML parser, in order to work around these layout recursion issues.
Product: Core → Core Graveyard
Component: Layout: Misc Code → Layout
Product: Core Graveyard → Core
You need to log in before you can comment on or make changes to this bug.