Last Comment Bug 725376 - Firefox hangs when visiting http://en.wikipedia.org/wiki/Blue_Dog_Coalition
: Firefox hangs when visiting http://en.wikipedia.org/wiki/Blue_Dog_Coalition
Status: VERIFIED FIXED
[inbound]
: hang
Product: Core
Classification: Components
Component: Layout (show other bugs)
: Trunk
: All All
: -- critical (vote)
: mozilla13
Assigned To: Mats Palmgren (vacation)
:
Mentors:
http://en.wikipedia.org/wiki/Blue_Dog...
Depends on: 822053
Blocks:
  Show dependency treegraph
 
Reported: 2012-02-08 09:06 PST by Swarnava Sengupta (:Swarnava)
Modified: 2012-12-16 06:54 PST (History)
9 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
windbg stacktrace (130.11 KB, text/plain)
2012-02-08 11:33 PST, Matthias Versen [:Matti]
no flags Details
Saved source to reproduce the hang in case the page changes (79.56 KB, text/html)
2012-02-08 11:40 PST, Matthias Versen [:Matti]
no flags Details
wip (4.06 KB, patch)
2012-02-08 14:33 PST, Mats Palmgren (vacation)
roc: review+
Details | Diff | Splinter Review
log (13.91 KB, text/plain)
2012-02-08 15:44 PST, Mats Palmgren (vacation)
no flags Details
addendum (2.16 KB, patch)
2012-02-13 06:40 PST, Mats Palmgren (vacation)
roc: review+
Details | Diff | Splinter Review
max 2 nested balancing levels (1.03 KB, patch)
2012-02-16 20:18 PST, Mats Palmgren (vacation)
roc: review+
Details | Diff | Splinter Review

Description Swarnava Sengupta (:Swarnava) 2012-02-08 09:06:29 PST
Firefox seems to hang permanently when I load the Blue Dog Coalition page on Wikipedia. I tried loading it with JavaScript disabled and in Safe Mode, to no avail. It loads fine in IE8. I tried saving the page from IE8 and then opening the html file in Fx, freeze. 
i can reproduce this issue on Firefox 10 and 11 beta 1
Comment 1 Thomas Ahlblom 2012-02-08 11:13:57 PST
Reproduced:
Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.9.2.26) Gecko/20120128 Firefox/3.6.26
Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20100101 Firefox/10.0
Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20100101 Firefox/11.0
Mozilla/5.0 (X11; Linux x86_64; rv:12.0a2) Gecko/20120208 Firefox/12.0a2
Mozilla/5.0 (X11; Linux x86_64; rv:13.0a1) Gecko/20120208 Firefox/13.0a1
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7

WFM:
Opera/9.80 (X11; Linux x86_64; U; en) Presto/2.10.229 Version/11.61

Observation:
Firefox hangs with 100% on one CPU core, but the amount allocated memory does no gallop.
Comment 2 Matthias Versen [:Matti] 2012-02-08 11:32:10 PST
The windbg log points to Layout
Comment 3 Matthias Versen [:Matti] 2012-02-08 11:33:33 PST
Created attachment 595469 [details]
windbg stacktrace
Comment 4 Matthias Versen [:Matti] 2012-02-08 11:40:16 PST
Created attachment 595472 [details]
Saved source to reproduce the hang in case the page changes
Comment 5 Boris Zbarsky [:bz] 2012-02-08 11:47:05 PST
The page is using column styles, but only in Gecko and WebKit.  That's what causes the hang, so it's not surprising there's no hang in IE8: it's getting a different page.

WebKit hangs on this page just like we do.

Looking at a profile, the callstack shows that we're spending all our time deep inside reflow of a columnset.  This columnset has another columnset as its ancestor, and so on, at least 6 levels deep.  I will bet money all of these are auto-height-balancing.

Reflow of a height-balancing of a columnset is O(N log(K)), iirc where N is the number of kids and K is the height (because we have to do a binary search to find the right height, basically).  Once you start nesting columnsets you can easily get to situations where the innermost block is being reflowed a huge number of times.  In this case that reflow is particularly expensive because list renumbering is involved; that's 30% of the profile right there.

roc, thoughts?
Comment 6 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-02-08 14:03:57 PST
We could probably optimize this by caching stuff on column frames --- say a map of (available height, available width) to balance height --- but it would be complex and fragile and I'm struggling to care.
Comment 7 Mats Palmgren (vacation) 2012-02-08 14:33:18 PST
Created attachment 595541 [details] [diff] [review]
wip

Perhaps we should simply refuse to balance more than N nested column sets?
Comment 8 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-02-08 14:49:18 PST
Comment on attachment 595541 [details] [diff] [review]
wip

Review of attachment 595541 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/generic/nsColumnSetFrame.cpp
@@ +375,5 @@
> +        ++cnt;
> +      }
> +    }
> +    if (cnt == MAX_NESTED_COLUMN_BALANCING) {
> +      numColumns = 1;

Why don't you just set isBalancing to false?
Comment 9 Mats Palmgren (vacation) 2012-02-08 15:44:49 PST
Created attachment 595562 [details]
log

> Why don't you just set isBalancing to false?

I tried three versions:
1. numColumns = 1
2. isBalancing = false
3. isBalancing = false; numColumns = 1;

The page layout looks normal with 1.
2. also fixes the bug, but it looks like the deep columns have zero width
so they have a single word per line for the affected content.
3. does not fix the bug

I've attached the output from the printf at the end of ChooseColumnStrategy
for comparison of the three versions.

Note that for 1. colWidth=71940 or thereabout,
for 2. some have colWidth=1 or very low,
for 3. we have colHeight=0
Comment 10 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-02-08 17:29:43 PST
Comment on attachment 595541 [details] [diff] [review]
wip

Review of attachment 595541 [details] [diff] [review]:
-----------------------------------------------------------------

ok
Comment 12 Phil Ringnalda (:philor, back in August) 2012-02-12 14:17:35 PST
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/0c1fdf386ac1 since I wasn't sure which reftest failures and which crashtest assertion count failures went with which cset in the push.
Comment 13 Mats Palmgren (vacation) 2012-02-13 05:50:27 PST
Thanks philor, sorry about the orange.
Comment 14 Mats Palmgren (vacation) 2012-02-13 06:40:35 PST
Created attachment 596673 [details] [diff] [review]
addendum

I forgot to initialize 'mIsColumnBalancing' in the ctor.
(I'll fold this into the earlier patch before landing but I figured it be
easier to review separately)

Try run is green this time:
https://tbpl.mozilla.org/?tree=Try&rev=a8e4006a6a88
Comment 15 Mats Palmgren (vacation) 2012-02-13 16:40:13 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/748efe64d073

QA: wikipedia has now fixed the page, use the "Saved source to reproduce the hang..."
attachment to verify.  (Good thinking, Matti!)
Comment 16 Marco Bonardo [::mak] (Away 6-20 Aug) 2012-02-14 02:34:49 PST
https://hg.mozilla.org/mozilla-central/rev/748efe64d073
Comment 17 Stefan 2012-02-14 10:45:29 PST
(In reply to Mats Palmgren [:mats] from comment #15)
rv. 62695656d7bd hangs on both
Using http://en.wikipedia.org/w/index.php?title=Blue_Dog_Coalition&oldid=475569463
and
the saved source.
Comment 18 Stefan 2012-02-14 10:47:05 PST
It takes about 30 s until Fx regains responsiveness.
Comment 19 Mats Palmgren (vacation) 2012-02-14 12:38:52 PST
> It takes about 30 s until Fx regains responsiveness.

On which platform?  On what kind of hardware?  I assume you're testing
with Mozilla Nightly builds?
Comment 20 Stefan 2012-02-14 13:20:19 PST
(In reply to Mats Palmgren [:mats] from comment #19)
> On which platform?  On what kind of hardware?

Linux x86_64

>  I assume you're testing with Mozilla Nightly builds?

No.  I have compiled the sources of the m-c tree revision 62695656d7bd.  I just confirmed it using a fresh build of revision 748efe64d073.
Comment 21 Matthias Versen [:Matti] 2012-02-14 13:56:33 PST
I also still see the hang using  Mozilla/5.0 (Windows NT 6.1; rv:13.0a1) Gecko/20120214 Firefox/13.0a1 SeaMonkey/2.10a1, Built from http://hg.mozilla.org/mozilla-central/rev/60edf587f4af
Comment 22 Mats Palmgren (vacation) 2012-02-14 14:25:13 PST
(In reply to Stefan from comment #20)
> (In reply to Mats Palmgren [:mats] from comment #19)
> > On which platform?  On what kind of hardware?
> 
> Linux x86_64

Thanks, but I was unclear, I need to know if this a phone/laptop/desktop/super-computer?
which brand/model of CPU(s), memory etc is it using?
"30 seconds" have no meaning otherwise.

> No.  I have compiled the sources of the m-c tree revision 62695656d7bd.  I
> just confirmed it using a fresh build of revision 748efe64d073.

OK, I appreciate the feedback, but I would prefer numbers from official Mozilla Nightly
builds so I know how the binary was built.   http://nightly.mozilla.org/
Comment 23 Mats Palmgren (vacation) 2012-02-14 14:25:46 PST
(In reply to Matthias Versen (Matti) from comment #21)
> I also still see the hang using  Mozilla/5.0 (Windows NT 6.1; rv:13.0a1)
> Gecko/20120214 Firefox/13.0a1 SeaMonkey/2.10a1, Built from
> http://hg.mozilla.org/mozilla-central/rev/60edf587f4af

That build doesn't have the fix as far as I can tell, it's built from mozilla-central
cset 86714, the fix is cset 86760.

If you re-test later, I'd like to know: For how long does it hang? on what hardware?
Comment 24 Stefan 2012-02-14 14:58:28 PST
(In reply to Mats Palmgren [:mats] from comment #22)
> Thanks, but I was unclear, I need to know if this a
> phone/laptop/desktop/super-computer?

Desktop

> which brand/model of CPU(s), memory etc is it using?

intel E6300, 4 GiB core

> > No.  I have compiled the sources of the m-c tree revision 62695656d7bd.  I
> > just confirmed it using a fresh build of revision 748efe64d073.
> 
> OK, I appreciate the feedback, but I would prefer numbers from official
> Mozilla Nightly
> builds so I know how the binary was built.   http://nightly.mozilla.org/

The latest nightly seems to be

http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/02/2012-02-14-03-12-27-mozilla-central/

and is build from revision 60edf587f4af. This does not seem to contain the 748efe64d073 changeset, right?
Comment 25 Mats Palmgren (vacation) 2012-02-14 15:22:16 PST
> intel E6300, 4 GiB core

Thanks, that gives me a data point I can use.

> and is build from revision 60edf587f4af

Correct, so it's not in Nightly yet.  If you can verify the result when the
fix becomes available in Nightly (probably the next one) that would be great.
Comment 27 Matthias Versen [:Matti] 2012-02-15 10:49:39 PST
I get a 20s hang with a Core i3 with 2.1Ghz CPU and todays nightly
Comment 28 Mats Palmgren (vacation) 2012-02-15 12:36:52 PST
I get a 14s hang using a Linux x86-64 Nightly on a Core i7 870 @ 2.93 GHz.
Hmm, perhaps it's still a bit too long, especially on lower-end CPUs...

MAX_NESTED_COLUMN_BALANCING=4 in a local build cuts the time in half, as expected.

I think the goal we want is to recover from the hang before the user kills the
application...  I think less than 10s is probably good enough for that purpose.

roc, what's the minimum nesting level we must support in your opinion?
Comment 29 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-02-15 13:18:00 PST
I've never seen anyone use nested columns in a real way. So I think supporting 2 would be enough.
Comment 30 Mats Palmgren (vacation) 2012-02-16 20:18:21 PST
Created attachment 598116 [details] [diff] [review]
max 2 nested balancing levels

OK, let's see if someone complains ;-)

Reftests are green on Try:
https://tbpl.mozilla.org/?tree=Try&rev=448fc22cd215
Comment 32 Ed Morley [:emorley] 2012-02-18 09:56:47 PST
https://hg.mozilla.org/mozilla-central/rev/f0ac2d720c7b
Comment 33 Matthias Versen [:Matti] 2012-02-18 20:39:33 PST
verified fixed with  Mozilla/5.0 (Windows NT 6.1; rv:13.0a1) Gecko/13.0a1 Firefox/13.0a1 SeaMonkey/2.10a1
The rendering is instant now on my system. I hope that nobody complains about the limit :-)

Mats: Thank you very much for this fix !

Note You need to log in before you can comment on or make changes to this bug.