Closed Bug 306116 Opened 17 years ago Closed 17 years ago

Crash when loading glozy.com [@ nsContainerFrame::SyncFrameViewProperties]

Categories

(Core :: Layout, defect)

x86
Windows XP
defect
Not set
critical

Tracking

()

VERIFIED FIXED

People

(Reporter: martijn.martijn, Unassigned)

References

()

Details

(Keywords: regression, testcase, verified1.8)

Attachments

(2 files)

Reported here:
http://forums.mozillazine.org/viewtopic.php?t=310574&sid=865f4d9c92685cdb349c036c756002c4

With my debug build, I get this, which seems rather typical, in my console
before I crash:
nsLineLayout: Inline(font)(0)@10B7F8C0 metrics=1390350,195!
nsLineLayout: Inline(font)(1)@10B7F83C metrics=1390350,195!
nsLineLayout: Inline(font)(0)@10B7F6CC metrics=1390350,240!
nsLineLayout: Inline(font)(0)@10B7F648 metrics=1390350,195!
nsLineLayout: Inline(font)(0)@10B7F5C4 metrics=1390350,195!
nsLineLayout: Inline(font)(0)@10B7F540 metrics=1390350,195!
nsLineLayout: Inline(font)(1)@10B7F4BC metrics=1390350,195!
nsLineLayout: Inline(font)(0)@10B7F8C0 metrics=3765360,195!
nsLineLayout: Inline(font)(1)@10B7F83C metrics=3765360,195!
nsLineLayout: Inline(font)(0)@10B7F6CC metrics=3765360,240!
nsLineLayout: Inline(font)(0)@10B7F648 metrics=3765360,195!
nsLineLayout: Inline(font)(0)@10B7F5C4 metrics=3765360,195!
nsLineLayout: Inline(font)(0)@10B7F540 metrics=3765360,195!
nsLineLayout: Inline(font)(1)@10B7F4BC metrics=3765360,195!
--DOMWINDOW == 6
nsLineLayout: Inline(font)(0)@10B7F8C0 metrics=15300630,195!
nsLineLayout: Inline(font)(1)@10B7F83C metrics=15300630,195!
nsLineLayout: Inline(font)(0)@10B7F6CC metrics=15300630,240!
nsLineLayout: Inline(font)(0)@10B7F648 metrics=15300630,195!
nsLineLayout: Inline(font)(0)@10B7F5C4 metrics=15300630,195!
nsLineLayout: Inline(font)(0)@10B7F540 metrics=15300630,195!
nsLineLayout: Inline(font)(1)@10B7F4BC metrics=15300630,195!

I got a talkback ID, with a 20050824 trunk build: TB8751075W

I'm pretty sure the regression range is between the 2005-04-28 build and the
2005-04-29 build: 
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2005-04-28+06%3A00%3A00&maxdate=2005-04-29+01%3A00%3A00&cvsroot=%2Fcvsroot
Summary: Crash when loading glozy.com → Crash when loading glozy.com [@ nsContainerFrame::SyncFrameViewProperties]
I also get crashes, but with different signatures:

TB8761937Q trunk @nsContainerFrame::PushChildren
TB8762401G branch @nsFormControlFrame::nsFormControlFrame  
Attached file testcase
This testcase seems big, but the markup inside it is minimal.
It also crashes Mozilla 1.7, so this bug might in fact not be a recent
regression after all (just something that gets triggered under different
conditions now).

Talkback IDs I get when loading this testcase: TB8762679H TB8762636H
TB8762459Z TB8762371W
Mostly this:

nsContainerFrame::PushChildren 
[c:/builds/tinderbox/Fx-Trunk/WINNT_5.2_Depend/mozilla/layout/generic/nsContainerFrame.cpp,
line 1144]
nsContainerFrame::PushChildren 
[c:/builds/tinderbox/Fx-Trunk/WINNT_5.2_Depend/mozilla/layout/generic/nsContainerFrame.cpp,
line 1150]
nsContainerFrame::PushChildren 
[c:/builds/tinderbox/Fx-Trunk/WINNT_5.2_Depend/mozilla/layout/generic/nsContainerFrame.cpp,
line 1150]
Maybe it isn't a regression somewhere in 1.8, but this site certainly started to
crash with 1.8 builds.
Flags: blocking1.8b4?
Keywords: testcase
I can't seem to reproduce this crash (though rendering the page takes forever)
in a current Linux trunk build...  Is this branch-only at this point?
This worked much better in 1.0.x, so it would be nice to understand what changed
this. Martijn, can you narrow down a regression window?
I did that already in comment 0, but the testcase also crashes Mozilla1.7, so
the regression range probably doesn't really matter.
As far as the regression window goes, Bug 240276 landed during that time and
could be a good candidate for what caused this. Bug 289940 (make chrome event
handlers only received trusted events) doesn't look related at all but was a
very large checkin that came in during this window.

It looks like we get stuck inside an infinite loop inside
nsContainerFrame::FrameNeedsView. Although now I wonder how much to trust the
stack trace in the talkback report. It says FrameNeedsView is recursively
calling itself over and over again, but the actual code says that shouldn't be
possible (it never calls itself).
I think the talkback reports for this bug are somewhat bogus.

While trying to load this website in a trunk debug build from today, I get
several assertions in nsTableFrame::Reflow:

NS_ASSERTION(NS_UNCONSTRAINEDSIZE != aReflowState.availableWidth,
                   "this doesn't do anything");

I also see the following dumped out to the console (not sure if it matters):

nsLineLayout: Inline(font)(0)@031959CC metrics=17116230,195!

Shortly after skipping past the assertion, the application ends up in an
infinite loop until you eventually crash.

Stack trace for the assertion:

nsTableFrame::Reflow(nsTableFrame * const 0x0313ca70, nsPresContext *
0x02f14750, nsHTMLReflowMetrics & {...}, const nsHTMLReflowState & {...},
unsigned int & 0) line 1935 + 38 bytes
nsContainerFrame::ReflowChild(nsIFrame * 0x0313ca70, nsPresContext * 0x02f14750,
nsHTMLReflowMetrics & {...}, const nsHTMLReflowState & {...}, int 0, int 0,
unsigned int 3, unsigned int & 0) line 904 + 31 bytes
nsTableOuterFrame::OuterReflowChild(nsTableOuterFrame * const 0x0313c96c,
nsPresContext * 0x02f14750, nsIFrame * 0x0313ca70, const nsHTMLReflowState &
{...}, nsHTMLReflowMetrics & {...}, int 1073741824, nsSize & {...}, nsMargin &
{...}, nsMargin & {...}, nsMargin & {...}, nsReflowReason eReflowReason_Resize,
unsigned int & 0, int * 0x00000000) line 1314 + 47 bytes
nsTableOuterFrame::Reflow(nsTableOuterFrame * const 0x0313c96c, nsPresContext *
0x02f14750, nsHTMLReflowMetrics & {...}, const nsHTMLReflowState & {...},
unsigned int & 0) line 1967 + 79 bytes
nsBlockReflowContext::ReflowBlock(const nsRect & {...}, int 0,
nsCollapsingMargin & {...}, int 0, int 1, nsMargin & {...}, nsHTMLReflowState &
{...}, unsigned int & 0) line 605 + 42 bytes
nsBlockFrame::ReflowBlockFrame(nsBlockReflowState & {...}, nsLineList_iterator
{...}, int * 0x0012938c) line 3429 + 66 bytes
nsBlockFrame::ReflowLine(nsBlockReflowState & {...}, nsLineList_iterator {...},
int * 0x0012938c, int 0) line 2590 + 27 bytes
nsBlockFrame::ReflowDirtyLines(nsBlockReflowState & {...}, int 1) line 2242 + 31
bytes
nsBlockFrame::Reflow(nsBlockFrame * const 0x03141888, nsPresContext *
0x02f14750, nsHTMLReflowMetrics & {...}, const nsHTMLReflowState & {...},
unsigned int & 0) line 876 + 17 bytes
nsBlockReflowContext::ReflowBlock(const nsRect & {...}, int 1,
nsCollapsingMargin & {...}, int 0, int 1, nsMargin & {...}, nsHTMLReflowState &
{...}, unsigned int & 0) line 588 + 42 bytes
nsBlockFrame::ReflowBlockFrame(nsBlockReflowState & {...}, nsLineList_iterator
{...}, int * 0x00129fd8) line 3429 + 66 bytes
nsBlockFrame::ReflowLine(nsBlockReflowState & {...}, nsLineList_iterator {...},
int * 0x00129fd8, int 0) line 2590 + 27 bytes
nsBlockFrame::ReflowDirtyLines(nsBlockReflowState & {...}, int 1) line 2242 + 31
bytes
nsBlockFrame::Reflow(nsBlockFrame * const 0x0313c720, nsPresContext *
0x02f14750, nsHTMLReflowMetrics & {...}, const nsHTMLReflowState & {...},
unsigned int & 0) line 876 + 17 bytes
nsContainerFrame::ReflowChild(nsIFrame * 0x0313c720, nsPresContext * 0x02f14750,
nsHTMLReflowMetrics & {...}, const nsHTMLReflowState & {...}, int 0, int 0,
unsigned int 0, unsigned int & 0) line 904 + 31 bytes
nsTableCellFrame::Reflow(nsTableCellFrame * const 0x0313c6b8, nsPresContext *
0x02f14750, nsHTMLReflowMetrics & {...}, const nsHTMLReflowState & {...},
unsigned int & 0) line 858

Bug 240276 seems to be the only bug checked in during the regression window that
was modifying the table frame code, but that's just a guess on my part. 
Like bz, I don't crash with GTK2 on trunk. It eventually displays correctly.

The markup is basically a table containing a single cell, containing a
width:10px DIV, containing a 300K string. We start by putting the string on one
line, and measuring it, which is a) extremely slow and b) gives us a huge width
which nsLineLayout warns us about.
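
For anyone wanting to reproduce the shape of that markup, here is a rough sketch of a generator. It is not the attached testcase: the function name MakeTestcase, the 'x' filler, and the optional word breaks are all assumptions made for illustration, and the real page also involves font elements that are not modelled here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: builds HTML with the structure described above,
// i.e. a table with a single cell, holding a width:10px DIV, holding one
// long run of text. The filler character and word length are assumptions.
std::string MakeTestcase(std::size_t totalChars, std::size_t wordLen) {
    std::string text;
    text.reserve(totalChars);
    for (std::size_t i = 0; i < totalChars; ++i) {
        // Put a space after every wordLen characters; wordLen == 0 means
        // one unbroken string with no break opportunities at all.
        const bool isBreak = wordLen != 0 && (i + 1) % (wordLen + 1) == 0;
        text += isBreak ? ' ' : 'x';
    }
    return "<table><tr><td><div style=\"width:10px\">" + text +
           "</div></td></tr></table>";
}
```

Calling MakeTestcase(300000, 0) and writing the result to an .html file would give a page of roughly the size discussed here; whether it triggers the same crash in a given build is untested.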

Strangely, the stack shows that we are flowing nested table frames. There
shouldn't be any table frame nesting here. The stack gets really deep and that's
probably the cause of the crasher.
(In reply to comment #9)
> Strangely, the stack shows that we are flowing nested table frames.

Forget this part of the comment, it was incorrect.
I tried this in 1.0 and as far as I can tell it behaves in a very similar way.
Is this only a regression on Windows?
bz or roc, can you please test this on windows where it is worse than 1.0?
I won't be able to do that for at least a week -- need the machine doing other
stuff, and then after I reboot I'll have to set up a Windows build env...
I can reproduce this crash on Windows and Mac. So you don't necessarily need a
Windows box if that's easier :).
Scott, can you run Purify or something like that to check for memory corruption?
I don't have a Purify license. I keep trying to catch it in the debugger but the
stack gets hosed too quickly. I'll try to find a safe spot to set a breakpoint
earlier in the page load.
Here's the best look I've gotten at it so far. A random snapshot during the page
load.

While trying to process nsHTMLScrollFrame::ReflowScrolledFrame
we end up calling

nsInlineFrame::Reflow(nsInlineFrame * const 0x031dfa40, nsPresContext *
0x02fc63c0, nsHTMLReflowMetrics & {...}, const nsHTMLReflowState & {...},
unsigned int & 0) line 429 + 28 bytes
nsLineLayout::ReflowFrame(nsIFrame * 0x031dfa40, unsigned int & 0,
nsHTMLReflowMetrics * 0x00000000, int & 0) line 994 + 43 bytes
nsBlockFrame::ReflowInlineFrame(nsBlockReflowState & {...}, nsLineLayout &
{...}, nsLineList_iterator {...}, nsIFrame * 0x031dfa40, unsigned char *
0x001212ff) line 3999 + 22 bytes
nsBlockFrame::DoReflowInlineFrames(nsBlockReflowState & {...}, nsLineLayout &
{...}, nsLineList_iterator {...}, int * 0x001216b8, unsigned char * 0x00121403,
int 1, int 1) line 3838 + 32 bytes
nsBlockFrame::ReflowInlineFrames(nsBlockReflowState & {...}, nsLineList_iterator
{...}, int * 0x001216b8, int 1, int 1) line 3711 + 46 bytes
nsBlockFrame::ReflowLine(nsBlockReflowState & {...}, nsLineList_iterator {...},
int * 0x001216b8, int 1) line 2680
nsBlockFrame::ReflowDirtyLines(nsBlockReflowState & {...}, int 1) line 2239 + 31
bytes
nsBlockFrame::Reflow(nsBlockFrame * const 0x031df914, nsPresContext *
0x02fc63c0, nsHTMLReflowMetrics & {...}, const nsHTMLReflowState & {...},
unsigned int & 1187128) line 876 + 17 bytes
nsContainerFrame::ReflowChild(nsIFrame * 0x031df914, nsPresContext * 0x02fc63c0,
nsHTMLReflowMetrics & {...}, const nsHTMLReflowState & {...}, int 0, int 0,
unsigned int 3, unsigned int & 1187128) line 899 + 31 bytes

This leads us through 7 nested inline frames. After these 7 inline frames, we
find nested nsContainerFrames, 276(!!) levels deep at the spot where I broke
into the debugger. But each of the frames actually looked legit to me. The
pointers were valid. During ReflowFrame we end up walking down this tree
calling nsContainerFrame::DeleteNextInFlowChild 276 times in a row, each time on
a unique frame. 

Would it be normal for a page to have so many nested frames inside of it? 
There should not be frames nested that deep; there aren't on Linux.

But DeleteNextInFlowChild actually does a tail call on its next sibling frame,
and there certainly are more than 276 siblings. I'll give you a patch to test.
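
The general idea can be sketched as follows. This is not the attached patch and not Gecko code: Frame, nextInFlow, MakeChain, and both delete functions are hypothetical stand-ins, used only to show why a call that recurses once per chained frame can exhaust the stack while a loop over the same chain cannot.

```cpp
#include <cstddef>

// Hypothetical stand-in for a frame that links to its next continuation.
struct Frame {
    Frame* nextInFlow = nullptr;
};

// Recursive form: handles the rest of the chain first, then deletes this
// frame. It uses one stack frame per element, so stack depth grows with
// the length of the chain.
std::size_t DeleteChainRecursive(Frame* f) {
    if (!f) return 0;
    std::size_t n = DeleteChainRecursive(f->nextInFlow);
    delete f;
    return n + 1;
}

// Iterative form: walks the chain in a loop, saving the next pointer before
// deleting each frame. Stack depth stays constant however long the chain is.
// Note that it deletes head first, the opposite order to the recursive form.
std::size_t DeleteChainIterative(Frame* f) {
    std::size_t n = 0;
    while (f) {
        Frame* next = f->nextInFlow;
        delete f;
        f = next;
        ++n;
    }
    return n;
}

// Builds a chain of `length` frames and returns its head.
Frame* MakeChain(std::size_t length) {
    Frame* head = nullptr;
    for (std::size_t i = 0; i < length; ++i) {
        Frame* f = new Frame;
        f->nextInFlow = head;
        head = f;
    }
    return head;
}
```

With a chain of a few hundred thousand frames, DeleteChainIterative(MakeChain(n)) finishes normally, whereas the recursive form may overflow the stack depending on platform stack size and compiler optimizations. That would also be consistent with the crash showing up on some platforms and not others.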
That seems to work, no crash happens with the "fix?" (although loading is very
slow).
Yeah, it's slow. David's reflow-refactoring might help a lot here on this
particular testcase. But for me, it's about the same speed as on branch. So I
think we should just land the crash fix.
Comment on attachment 194609 [details] [diff] [review]
fix?

Fairly simple crash fix.
Attachment #194609 - Flags: superreview?(dbaron)
Attachment #194609 - Flags: review?(dbaron)
Yes, sorry, it was a bit of a needless complaint. 
Without patch -> slow and crashing
With patch -> slow
So definitely an improvement. Thanks!
Attachment #194609 - Flags: superreview?(dbaron) → superreview+
Attachment #194609 - Flags: review?(dbaron) → review+
Comment on attachment 194609 [details] [diff] [review]
fix?

please land and verify on the trunk and then we'll try to get this into the
branch. thanks.
Attachment #194609 - Flags: approval1.8b4?
Flags: blocking1.8b5? → blocking1.8b5+
Whiteboard: [needs approval asa]
If you want this in beta1 - you'll have to land on the trunk soon so we can verify.
checked into trunk.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
verified on Windows (and Linux for kicks) with trunk builds from 0906
Status: RESOLVED → VERIFIED
Attachment #194609 - Flags: approval1.8b4? → approval1.8b4+
checked in on branch.
Keywords: fixed1.8
Whiteboard: [needs approval asa]
Keywords: fixed1.8 → verified1.8