hang on load under WinXP and Linux

RESOLVED FIXED

Status

()

--
critical
RESOLVED FIXED
14 years ago
13 years ago

People

(Reporter: ihok, Assigned: roc)

Tracking

({hang, qawanted})

Trunk
x86
All
hang, qawanted
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(7 attachments, 4 obsolete attachments)

177.65 KB, text/html
Details
177.68 KB, text/plain
Details
10.49 KB, text/html
Details
167.55 KB, text/html
Details
2.53 KB, application/zip
Details
167.59 KB, text/html
Details
17.99 KB, patch
dbaron
: review+
dbaron
: superreview+
Details | Diff | Splinter Review
(Reporter)

Description

14 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040803 Firefox/0.9.3
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040803 Firefox/0.9.3

Don't click on
http://www.senate.state.ny.us/graduate.nsf/0/2c556e7ba00321f085256e540052328e?OpenDocument
'cuz it'll crash your browser. (It crashed Firefox 0.9.3 under XP and under FC2;
incidentally, it also crashed IE6, but IE rendered the page before crashing.)

It also crashes the browser if loaded locally; file attached for posterity's sake.

Reproducible: Always
Steps to Reproduce:
1. Load that page.

Actual Results:  
2. Watch it crash.
(Reporter)

Comment 1

14 years ago
Created attachment 156599 [details]
Source for the crashing page, saved for posterity.
Created attachment 156601 [details]
sourcefile in txt format

added the file as .txt - plain/text
It'll crash your browser if you open the source from the above attachement

Comment 3

14 years ago
Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.8a3) Gecko/20040716

confirming on Mozilla 1.8a3
I don´t see a crash, only slow reaction to events, iniate a scroll, wait for
four seconds to see it happen. This type of hang also was seen as the page was
loading, but it finished loading after a long time. Opening the context menu,
select 'close'tab, and closing the tab was also doen in time-consuming steps,
but could be done, and after this, mozilla reacts normally.
 
Component: General → Browser-General
Product: Firefox → Browser
Version: unspecified → Trunk

Comment 4

14 years ago
tried that page with Mozilla 1.4.2, hang, Opera 7.54, working.
Saved the source with Opera, got 178kb, 167kb thereof is one line of <DL> tags.
After removing this, page loads fine locally.
Assignee: firefox → parser
Status: UNCONFIRMED → NEW
Component: Browser-General → HTML: Parser
Ever confirmed: true
QA Contact: firefox.general

Updated

14 years ago
Keywords: crash
Why is this bug confirmed given that:

1)  A trunk build doesn't actually crash
2)  This bug does not have a minimal testcase?

Assigning to confirmer to sort those issues out.  Please feel free to reassign
to default owner again once you do.
Assignee: parser → hhschwab
Keywords: qawanted

Comment 6

14 years ago
Created attachment 157456 [details]
original, linefeeds inserted, <DL> removed, 11 kb, working

1st attachment: from a download with opera, eduted with wordpad, linefeeds
inserted between </p><p>, copy saved, then <DL>s removed.

2nd attachment: from the same file as the first, but this time all removed
between <body> and <DL>, and the HTML at the end removed (</form>)

Comment 7

14 years ago
Created attachment 157457 [details]
minimal testcase, 168 kb <DL> between body tags, hanging

Comment 8

14 years ago
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head><title>Bug 256311 DL only</title>
</head>
<body text="#000000" bgcolor="#FFFFFF">
<DL><DL>..............
</DL></DL>..............
<DL><DL>..............
<DL><DD>&nbsp;</DL>
</body></html>

the original file doesn´t contain linefeeds, pn wasn´t very good with handling
that file, so I tried to do it using wordpad.
There is one big block of <DL> tags, followed by a big block of </DL> tags,
maybe they are balanced, but I don´t want to check that manually.
A third big block of <DL> follows, unbalanced, as far as I can see.
I didn´t remove the <DL><DD>&nbsp;</DL> at the end, don´t assume that will make
a difference.
1st testcase is working for me, locally and loaded from bugzilla.
2nd testcase (DL only) is hanging my computer, I don´t care if it crashes, or I
do have to crash it.
Assignee: hhschwab → parser

Comment 9

14 years ago
1st block of <DL> has a length of 13181 tags = 52724 characters,
followed by <DD>&nbsp;
2nd block of </DL> has a length of 13181 tags = 65905 characters,
3rd block of <DL> has a length of 13181 tags = 52724 characters,
followed by <DD>&nbsp;</DL>

so first two blocks are deeply nested, but balanced,
third block is unbalanced.


Are better testcases needed, maybe 2000 tags deep nested balanced only?
I would do more tests, if somebody can tell me about the assumed limits,
and if I should test nested DL only, or unbalanced DL only.
Doesn´t make sense without programming environment to test from 100 tags up or
from 13000 tags down doing something like a binary search, waiting some minutes,
if it is supposed to hang, or maybe wait a minute longer hoping it finishes.
I know there are some magic binary numbers, reasonably assumed limits to speedup
the software. There is no whitespace in about three 64 kb blocks, but I don´t
think that matters. The </DL> block breaks the 64 kb limit with its number of
chars, but I don´t think the number of chars is relevant. If nesting 13k DL tags
doesn´t make problems, the unbalanced 13k DL tags shouldn´t make problems too,
as the end is near, of the <Body> ;-)
So is there a need for a testcase with unbalanced tags, or is the problem known,
similar to 
Bug 58917 [FIX] No rendering after tag nesting/TAGVL has been exceeded

I think I tried to use http://validator.w3.org/ to get some hints, but it
couldn´t process it. In my comment #3 it was (sort of) wfm, though waiting for a
minute per mouseclick or keypress to be processed couldn´t be named WFM, really.

I don´t see big differences between crashes and hangs, some hangs can be killed
by task manager, some others only by RESET, and the the result is the same as 
with a crash, windows demands to check the file system.
Maybe Bugzilla needs an own Priority 'Hang', where the hangs can be hanging,
until shoved into another category.


 
Hermann, testing with a shallow <dl> tree to see what the DOM looks like would
be a good start.  Do we end up creating a DOM tree that's too deep on the testcase?

In any case, now that we have a simple testcase I'll just profile this when I
get back into town.
Keywords: crash → hang
Summary: crash on load under WinXP and Linux → hang on load under WinXP and Linux

Comment 11

14 years ago
Created attachment 157733 [details]
3kb zipped testcases, 100/1000/5000/10000 tag blocks

<body text="#000000" bgcolor="#FFFFFF">
<DL><DL>..............
DD embedded in nnn deep nested DLs.
</DL></DL>..............
<DD>&nbsp;balanced block done</DL>
<DL><DL>..............
<DL><DD>&nbsp;DD preceded by nnn DLs</DL>
2nd block done
</body></html>

the ... denotes 100, 1000, 5000, or 10000 repeated tags, nnn shows that number

On an Athlon XP1600, Win98, 512k RAM last testcase took more than a minute.

Comment 12

14 years ago
Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.8a3) Gecko/20040826
Athlon XP1600+, Win98SE, 512MB RAM

Loaded the testcases from attachment 157733 [details] into tabs, and checked timing on Reload.

256311-100.HTM      ca. 1 second total time

256311-1000.HTM     ca. 2 seconds total time

256311-5000.HTM
ca.  3 seconds until window is cleared, 'start' is drawn
plus 20 seconds until 'balanced block done' and horizontal scrollbar is shown

256311-10000.HTM
ca.  3 seconds until window is cleared, 'start' is drawn
plus 30 seconds until horizontal scrollbar is shown
plus 42 seconds until 'balanced block done'

Comment 13

14 years ago
Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.8a3) Gecko/20040830
Celeron 333, Win98 SP1, 96 MB RAM

Loaded the testcases from attachment 157733 [details] into tabs, and checked timing on Reload.

256311-100.HTM      ca. 1 second total time

256311-1000.HTM     ca.  10 seconds total time
after 5 seconds the vertical scrollbar is cleared

256311-2000.HTM     ca.  27 seconds total time
ca.    6 seconds until window is cleared, 'start' is drawn
plus   7 seconds until 'balanced block done' and horizontal scrollbar is shown
plus  14 seconds until vertical scrollbar is shown and loading is done.


256311-2500.HTM     ca.  40 seconds total time
ca.   6 seconds until window is cleared, 'start' is drawn
plus 11 seconds until horizontal scrollbar  is shown
plus  5 seconds until 'balanced block done' is shown
plus 18 seconds until vertical scrollbar is shown and loading is done.


256311-5000.HTM     ca. 144 seconds total time
ca.   7 seconds until window is cleared, 'start' is drawn
plus 47 seconds until horizontal scrollbar  is shown
plus 12 seconds until 'balanced block done' is shown
plus 68 seconds until vertical scrollbar is shown and loading is done.

256311-10000.HTM    ca. 460 seconds total time
ca.   16 seconds until window is cleared, 'start' is drawn
plus 183 seconds until 'balanced block done' and horizontal scrollbar is shown
plus 262 seconds until vertical scrollbar is shown and loading is done.

So Celeron 333 MHz / 96 MB needs six times the time XP1600+ / 512 MB needs.

Comment 14

14 years ago
Created attachment 158554 [details]
This attachment shows the problem is really with rendering the height of nested empty tags

I think this hang (actually slow loading page) is caused by mozilla having to
determine the height of the EMPTY nested dl elements.

If you replace the <DL> tags with "<DL>TEXT" or add a min-height style for the
DL element, the page loads on my machine in about 3 seconds instead of 30. 

The page resumes taking 30 seconds if the min-height is changed below 0.0333px,
set to 0px, or removed.
First, this is not a parser bug.  The same issue happens with <div> instead of
<dl>, for example.

A profile looks like this:

Total hit count: 199176
Count %Total  Function Name
56655   28.4     nsBlockFrame::IsEmpty()
48356   24.3     nsLineBox::IsEmpty() const
31012   15.6     nsRuleNode::GetStyleData(nsStyleStructID, nsStyleContext*, int)
25187   12.6     nsStyleContext::GetStyleData(nsStyleStructID)
8164   4.1     nsStyleCoord::SetUnionValue(nsStyleUnion const&, nsStyleUnit)


The IsEmpty() functions call each other, with the "toplevel" caller of
nsBlockFrame::IsEmpty being nsBlockFrame::ShouldApplyTopMargin.

The problem is that if we really are empty, IsEmpty() has to walk all the kids
(O(N) in number of tags in this case) to determine that.  Since we have to end
up calling this on every single one of our blocks, we end up with O(N^2) layout.

Is there a reasonable way blocks can cache the IsEmpty() state or something?  It
seems to depend on the style of kids, which makes it hard...
Assignee: parser → nobody
Component: HTML: Parser → Layout: Misc Code
QA Contact: core.layout.misc-code
Perhaps the blocks could somehow cache it for the duration of reflow only?
I don't see how to do that without adding a post-reflow phase that scans over
the whole frame tree to unset some flag. And that would probably hurt Tp.
I guess we could have restyling walk up the frame tree from the restyle frame(s)
clearing any cached emptiness bits.

But we'd also need to take care of content changes changing a text frame from
empty to non-empty.
OK, it seems to me that we could fold clearing emptiness bits into the
generation of the incremental reflow tree, since everything that affects
emptiness will cause a reflow.
So... At this point we even _have_ caching of IsEmpty() at least on nsLineBox
(done in bug 269905).  But this bug is still present... the profile has:

Total hit count: 134394
Count %Total  Function Name
43150   32.1     nsLineBox::CachedIsEmpty()
15047   11.2     nsBlockFrame::ShouldApplyTopMargin(nsBlockReflowState&, nsLineBox*)
11966   8.9     nsLineBox::IsEmpty() const
9564   7.1     nsBlockFrame::IsSelfEmpty()
7964   5.9     nsBlockFrame::IsEmpty()

With nsLineBox::CachedIsEmpty looking like:

 42847 43150    61314 nsLineBox::CachedIsEmpty()
              18129 nsLineBox::IsEmpty() const

Note also that we have tons of calls to nsLineBox::IsEmpty from
nsBlockFrame::IsEmpty. Is there a reason not to use CachedIsEmpty() there?  I
guess we might not be in reflow...

In any case, it looks like the calls into IsEmpty() initially come from
nsBlockReflowContext::ComputeCollapsedTopMargin (calling IsEmpty() on a line)
and then spiral out of control.

The other big time sink here is what seems to be a _lot_ of calls to
nsLineBox::CachedIsEmpty from nsBlockFrame::ShouldApplyTopMargin (which
basically calls this on all lines, it looks like).

roc, any idea what we could do here?
Depends on: 269905
I think we should add CachedIsEmpty to nsIFrame, with an approriate *carefully
defined* specification. nsFrame::CachedIsEmpty would just call IsEmpty, but
nsBlockFrame::CachedIsEmpty would reimplement nsBlockFrame::IsEmpty but call
nsLineBox::CachedIsEmpty on its lines. Then nsLineBox::CachedIsEmpty could call
CachedIsEmpty on its frames and I *think* this problem would go away.
Created attachment 197463 [details] [diff] [review]
like so

Try this! It seems to help for me, although it doesn't solve the problem
completely.
That patch makes things a tad better... with it we have (for a random chunk of
time while we're frozen; we never rendered the page before I killed the browser)
on attachment 157457 [details]:

Total hit count: 240539
Count %Total  Function Name
112965   47.0     nsLineBox::CachedIsEmpty()
32960   13.7     nsBlockFrame::ShouldApplyTopMargin(nsBlockReflowState&, nsLineBox*)
11905   4.9     nsLineBox::IsEmpty() const
8428   3.5     nsBlockFrame::IsSelfEmpty()
7652   3.2     nsBlockFrame::IsEmpty()

The main caller of nsLineBox::CachedIsEmpty() is
nsBlockFrame::ShouldApplyTopMargin (over 99% of the calls to CachedIsEmpty come
from ShouldApplyTopMargin).  If I read that code correctly, if we have lots of
empty lines it's O(N^2) in the number of such (N walks, each of length O(N),
down the list).
Created attachment 197626 [details] [diff] [review]
better patch

Okay, this patch adds an optimization that remembers the last "adjacent to top"
line we encountered, and starts the search from there in ShouldApplyTopMargin.
This really seems to help a lot ... the testcase is still slow but it seems to
spend all its time in frame construction (based on my dumb profiling with gdb
and ctrl-C).
Attachment #197463 - Attachment is obsolete: true
Attachment #197626 - Attachment is obsolete: true
That last testcase does help a lot, but we're not into frame construction yet. 
;)  With that patch, I'm looking at:

Total hit count: 282078
Count %Total  Function Name
45204   16.0     nsLineBox::IsEmpty() const
32131   11.4     nsBlockFrame::IsSelfEmpty()
30146   10.7     nsBlockFrame::IsEmpty()
27067   9.6     nsRuleNode::GetStyleData(nsStyleStructID, nsStyleContext*, int)
25287   9.0     nsStyleContext::GetStyleData(nsStyleStructID)

176001 of the hits are under nsBlockReflowContext::ComputeCollapsedTopMargin,
which breaks down as:

             172954 nsLineBox::IsEmpty() const
               2332 nsHTMLReflowState::nsHTMLReflowState(nsPresContext*,
nsHTMLReflowState const&, nsIFrame*, nsSize const&, nsReflowReason, int)
                 89 nsStyleContext::GetStyleData(nsStyleStructID)

and small fry.
Created attachment 197942 [details] [diff] [review]
another try

Previous patch, but also adds some paths to avoid the call to IsEmpty() in
ComputeCollapsedTopMargin, by computing the emptiness of the block during the
ComputeCollapsedTopMargin recursion.
Attachment #197627 - Attachment is obsolete: true
Alright, that seems to be it.  For the first time ever, I actually had the
patience to wait till the page finished loading!  (It helped that this happened
reasonably quickly.)  The profile shows:

Total hit count: 126458
Count %Total  Function Name
15022   11.9     SelectorMatches(RuleProcessorData&, nsCSSSelector*, int,
nsIAtom*, signed char)
8144   6.4     RuleProcessorData::RuleProcessorData(nsPresContext*, nsIContent*,
nsRuleWalker*, nsCompatibility*)
7776   6.1     SelectorMatchesTree(RuleProcessorData&, nsCSSSelector*)
7695   6.1     RuleProcessorData::~RuleProcessorData()
7271   5.7     __libc_poll
4593   3.6     nsAttrAndChildArray::GetAttr(nsIAtom*, int) const
3932   3.1     nsGenericElement::QueryInterface(nsID const&, void**)

and the total number of hits under block reflow is only about 16000, with 504 of
those hits under nsBlockFrame::CachedIsEmpty and 775 under
nsLineBox::CachedIsEmpty (some hits are under both, of course).

The original testcase for this bug still takes somewhere around 5-10 seconds to
load on my p3-733, but that's not nearly as bad (and worth separate bugs, perhaps).
There's no way in the world this is branch material, and I need to study the
patch again to make sure it really makes sense :-). Someone remind me about this
in a few weeks, OK? :-)
I filed bug 310848 for the style system perf issue.
roc, it's been a few weeks.  ;)
Comment on attachment 197942 [details] [diff] [review]
another try

Alright, I reread the patch. Let's go for review.  Extra comments to be added to layout/generic/nsBlockReflowState.h:

+// Set when mLineAdjacentToTop is valid
+#define BRS_HAVELINEADJACENTOTOP  0x00000800


// When BRS_HAVELINEADJACENTOTOP is set, this refers to a line
// which we know is adjacent to the top of the block (in other words, all lines
// before it are empty and do not have clearance
+  nsLineList::iterator mLineAdjacentToTop;
Attachment #197942 - Flags: superreview?(dbaron)
Attachment #197942 - Flags: review?(dbaron)
For my sanity:  the summary of what this patch does is in comment 21, comment 24, and comment 27.  (Please correct me if I'm wrong.)
nsIFrame::CachedIsEmpty should have an
NS_PRECONDITION(!(GetStateBits() & NS_FRAME_IS_DIRTY), "...");

In nsBlockReflowContext::ComputeCollapsedTopMargin, please s/blockIsEmpty/aBlockIsEmpty/.

Why should this chunk not be a loop or two further out?:
+        if (!setBlockIsEmpty && blockIsEmpty) {
+          // The first time we reach here is when this is the first block
+          // and we have processed all its normal lines.
+          setBlockIsEmpty = PR_TRUE;
+          // All lines are empty, or we wouldn't be here!
+          *blockIsEmpty = aRS.frame->IsSelfEmpty();
         }
Please also fix the spelling of BRS_HAVELINEADJACENTOTOP -- it's missing a T.
I'm a little scared of caching line iterators because we have so many weird "redo" bits of code scattered all over, but I suppose that should get a little better with the reflow branch so I suppose I shouldn't be so scared.

I think those are all the comments I have, but I'm interested in an answer on the last one in comment 34.
Comment on attachment 197942 [details] [diff] [review]
another try

- because of the last bit of comment 34 (perhaps you can just explain this to me so it makes sense, but it doesn't currently make sense to me) and maybe the first bit of comment 36 (although if you've tested this a bit I'm more confident about that one)
Attachment #197942 - Flags: superreview?(dbaron)
Attachment #197942 - Flags: superreview-
Attachment #197942 - Flags: review?(dbaron)
Attachment #197942 - Flags: review-
Assignee: nobody → roc
(In reply to comment #34)
> Why should this chunk not be a loop or two further out?:
> +        if (!setBlockIsEmpty && blockIsEmpty) {
> +          // The first time we reach here is when this is the first block
> +          // and we have processed all its normal lines.
> +          setBlockIsEmpty = PR_TRUE;
> +          // All lines are empty, or we wouldn't be here!
> +          *blockIsEmpty = aRS.frame->IsSelfEmpty();
>          }

This block executes at the end of the first iteration of the outer 'for' loop. As the comment says, this the point where we have processed the normal lines of the first block, and no others; hence it's the right place to determine whether the first block is empty or not.
(In reply to comment #36)
> I'm a little scared of caching line iterators because we have so many weird
> "redo" bits of code scattered all over, but I suppose that should get a little
> better with the reflow branch so I suppose I shouldn't be so scared.

I only know of two kinds of 'redoing':
-- reiteration through the entire line list, as performed by shrinkwrap for example. This creates a new block reflow state.
-- repeated reflow of the current line.

In particular, I don't think we can go backwards in the same block reflow state. The cached line mLineAdjacentToTop will always be before the current line, so it should never be reflowed while set.
Created attachment 212997 [details] [diff] [review]
updated patch

Updated to comments.
Attachment #197942 - Attachment is obsolete: true
Attachment #212997 - Flags: superreview?(dbaron)
Attachment #212997 - Flags: review?(dbaron)
Comment on attachment 212997 [details] [diff] [review]
updated patch

I think the issue with the question I was asking is that it's not clearly defined what IsEmpty means for split frames -- my understanding was that IsEmpty really refers to whether the concatenation of all continuations is empty.  I guess I should have documented that better before.  Is that consistent with the way we use it?  Or do we use it in both ways?

Is a followup bug needed for this?  Or a change were I asked about which loop?  r+sr=dbaron anyway.
Attachment #212997 - Flags: superreview?(dbaron)
Attachment #212997 - Flags: superreview+
Attachment #212997 - Flags: review?(dbaron)
Attachment #212997 - Flags: review+
I see. That's a good question.

IsEmpty(), as you implemented it for blocks in 2001, only checks the lines of the given block frame. What I'm doing is consistent with that...

Whether that's the right thing, I'm not sure. I think I've always assumed it just tests the particular frame subtree. I think the way we use it is mostly consistent with that. In block-land we generally are testing the emptiness of previous lines which have already been reflowed (hence the way CachedIsEmpty works here) and therefore continuations are irrelevant --- blocks in previous lines must be complete, or we wouldn't be trying to reflow something new in the current line.

I certainly wouldn't be surprised if some of our code is confused on this issue, though.
checked in.
Status: NEW → RESOLVED
Last Resolved: 13 years ago
Resolution: --- → FIXED

Updated

13 years ago
Depends on: 328946
You need to log in before you can comment on or make changes to this bug.