Closed Bug 637242 Opened 9 years ago Closed 5 years ago

Crash due to too much recursion in nsRuleNode::Sweep

Categories

(Core :: CSS Parsing and Computation, defect, critical)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla33

People

(Reporter: jruderman, Assigned: mats)

References

(Blocks 1 open bug)

Details

(Keywords: crash, testcase)

Attachments

(4 files, 3 obsolete files)

The recursive call is
http://hg.mozilla.org/mozilla-central/file/35b6854b0159/layout/style/nsRuleNode.cpp#l6582

Testcase based on layout/style/crashtests/439184-1.html
The reason I didn't find this bug earlier is that I only recently started including "skip" crashtests as starting points for fuzzing.  layout/style/crashtests/439184-1.html passes, but is marked as "skip" because it is slow.
There were 148 crash reports in the last week that match this bug, in that they were EXCEPTION_STACK_OVERFLOW and [@ nsRuleNode::Sweep()].  So this is not a topcrash, but also not fuzz-only.
Assignee: nobody → matspal
OS: Windows 7 → All
Hardware: x86 → All
I found an unrelated bug.  We should subtract the old child count
from the current to get the number of children that were removed.
This leads to integer overflow, but a few lines down we do
"mRefCnt -= childrenDestroyed;" which underflows so the net effect
is that we currently add the number of removed children to mRefCnt
which doesn't seem right.

layout/reftests/table-anonymous-boxes/162063-1.xhtml
triggers this.
Attachment #520860 - Flags: review?(dbaron)
(optional patch)
If all children were removed we might as well delete
the hash table.

This could easily be extended to check kMaxChildrenInList
or some lower number to convert the hash to a list (we
already do the walk for the sweep so we could build the
list at the same time).
Attachment #520861 - Flags: review?(dbaron)
Attached patch Part 3, non-recursive sweep (obsolete) — Splinter Review
Attachment #520863 - Flags: review?(dbaron)
In my local debug builds: 2^16 crashes on XP, 2^17 on OSX, 2^18 on Linux.
I chose 2^17 which takes 2.4 seconds in a Linux debug build (fast Core i7).
Attachment #520864 - Flags: review?(dbaron)
Comment on attachment 520860 [details] [diff] [review]
Part 1, ref counting bug

r=dbaron
Attachment #520860 - Flags: review?(dbaron) → review+
Comment on attachment 520861 [details] [diff] [review]
Part 2, delete hash table (optional)

I don't think this is worth the bother.  We don't otherwise convert from hashtable to list storage.
Attachment #520861 - Flags: review?(dbaron) → review-
Whiteboard: [needs review]
Comment on attachment 520863 [details] [diff] [review]
Part 3, non-recursive sweep

This could use a few comments to explain what it's doing.  In
particular, it should explain (in nsRuleNode::SweepHashEntry) that when
a rule node has its children hashed, this code uses the otherwise unused
mNextSibling linkage.

s/survivers/survivors/ throughout.

>-
>-  // Clear our mark, for the next time around.

better to leave this.

nsRuleNode::Sweep should assert IsRoot() at the start of the method.
(And it should assert !mNextSibling.)

You need to fix the comments above the Mark() and Sweep() methods in
nsRuleNode.h to reflect these changes.

Please put braces around single-line if bodies.

In nsRuleNode::Sweep, avoid code duplication by pushing |this| onto
the sweep queue and processing |this| the same way you process all
other elements.  (You can remove the outer HaveChildren() check

In nsRuleNode::SweepChildren, you can simplify the survivorsWithChildren
handling substantially by just setting survivorsWithChildren at the end
of the loop, to ChildrenList().

Given that I suggested not taking patch 2, this patch should neither
remove nor re-add the code that patch 2 adds.

With those changes I think this looks good.

review- since I'd like to look at the revised patch.
Attachment #520863 - Flags: review?(dbaron) → review-
Comment on attachment 520864 [details] [diff] [review]
Part 3, crash test

This looks fine assuming you've tested that it crashes without the
patch.
Attachment #520864 - Flags: review?(dbaron) → review+
Attachment #520861 - Attachment is obsolete: true
Attachment #520864 - Attachment description: Part 4, crash test → Part 3, crash test
Attached patch Part 2, non-recursive sweep (obsolete) — Splinter Review
(In reply to David Baron [:dbaron] from comment #9)
> nsRuleNode::Sweep should assert IsRoot() at the start of the method.

ok

> (And it should assert !mNextSibling.)

That exposes an existing bug that mNextSibling is never initialized and
has an undefined value -- 'operator new' says it zeroes the allocated
memory but that's a lie.

> You need to fix the comments above the Mark() and Sweep() methods in
> nsRuleNode.h to reflect these changes.

ok, I added a note saying that mNextSibling on the child nodes is
temporarily used internally by Sweep.

> Please put braces around single-line if bodies.

ok (it seemed like the prevailing style in this file though)

> In nsRuleNode::Sweep, avoid code duplication by pushing |this| onto
> the sweep queue and processing |this| the same way you process all
> other elements.

ok

> In nsRuleNode::SweepChildren, you can simplify the survivorsWithChildren
> handling substantially by just setting survivorsWithChildren at the end
> of the loop, to ChildrenList().

ok  (the intention was to skip over leaf children for as long as possible
to avoid iterating over them again)
Attachment #520863 - Attachment is obsolete: true
Attachment #575030 - Flags: review?(dbaron)
(In reply to David Baron [:dbaron] from comment #10)
> This looks fine assuming you've tested that it crashes without the
> patch.

Yes, it crashes without the fix.
Comment on attachment 575030 [details] [diff] [review]
Part 2, non-recursive sweep

>Bug 637242 - Crash due to too much recursion in nsRuleNode::Sweep.  Patch 2 of 3, r=dbaron

The patch description should be different from the bug summary.  How about:

Bug 637242, patch 2 of 3: Make nsRuleNode::Sweep nonrecursive to avoid stack exhaustion crashes.  r=dbaron

>+PLDHashOperator
>+nsRuleNode::SweepHashEntry(PLDHashTable *table, PLDHashEntryHdr *hdr,
>+                           PRUint32 number, void *arg)
>+{
>+  ChildrenHashEntry *entry = static_cast<ChildrenHashEntry*>(hdr);
>+  nsRuleNode* node = entry->mRuleNode;
>+  if (node->DestroyIfNotMarked()) {
>+    return PL_DHASH_REMOVE; // implies NEXT, unless |ed with STOP
>+  }
>+  if (node->HaveChildren()) {
>+    // When children are hashed mNextSibling is not normally used but we use it
>+    // here to build a list of children that needs to be swept.
>+    nsRuleNode*** tailQ = static_cast<nsRuleNode***>(arg);
>+    **tailQ = node;
>+    *tailQ = &node->mNextSibling;

I don't see any need for this list to be built in traversal order (as opposed to reverse-traversal order).  It might even be better for the CPU cache to do it in reverse-traversal order.  The reason I point this out is that it's simpler:  you can make arg an nsRuleNode**, and just append nodes to the beginning of the list and set the new node's mNextSibling to the old value.  (Just pass &survivorsWithChildren as arg; no need for tailQ.)

r=dbaron with that
Attachment #575030 - Flags: review?(dbaron) → review+
time for checkin? 

would this solve  nsRuleNode::Sweep() 
eg bp-9c5ccdfd-9d1a-418c-aad3-a60292131116
Flags: needinfo?(matspal)
Whiteboard: [needs review]
Patch with the review comments in comment 13 addressed as suggested.
Attachment #575030 - Attachment is obsolete: true
Attachment #8455009 - Flags: review+
(In reply to Wayne Mery (:wsmwk) from comment #14)
> would this solve  nsRuleNode::Sweep() 
> eg bp-9c5ccdfd-9d1a-418c-aad3-a60292131116

Yes, that looks like it's caused by this bug.
You need to log in before you can comment on or make changes to this bug.