Linux Mozilla allocates 324MB to load a page

Status: VERIFIED FIXED

Product: Core
Component: HTML: Parser
Priority: P3
Severity: major
Opened: 18 years ago; last modified: 18 years ago

People: Reporter: tenthumbs; Assigned: rickg

Tracking: Trunk; Points: ---
Firefox Tracking Flags: (Not tracked)

Details: (Whiteboard: [PDT+], URL)

Attachments: 2

Description (Reporter)

18 years ago
The page is about 377K, yet Linux Mozilla allocates 324MB and actually uses 311MB.
It's just a manual. There doesn't seem to be anything odd about it.

Comment 1

18 years ago
Looks like a dup of bug 21637 ("Large/long pages won't load").

Comment 2 (Reporter)

18 years ago
No, it's bad HTML. There are tons of <A NAME="foo"> tags with no </A> tags. I
guess Mozilla is nesting each anchor inside the previous one, with insane
results.

Feeding the page to this perl script:

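#!/usr/bin/perl
use strict;
use warnings;

# Echo every line; after a line containing <A NAME=, also echo the
# next line and then emit the missing </A>.
while (my $line = <>)
{
    print $line;
    if ($line =~ /<A NAME=/)
    {
        my $next = <>;
        print $next if defined $next;   # guard against EOF
        print "</A>\n";
    }
}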

fixes the problem.

I guess this is a quirks issue because I can't find a current browser that fails
so spectacularly.

Comment 3

18 years ago
This is not a layout bug; it's a parser problem (marking as such).

Here's what happens.

The reason that mozilla can't handle this page is that the parser is
building an incorrect and very much overcomplicated content model.

This is basically what the gnuplot page contains (simplified testcase):

<HTML>
<BODY>
<A NAME="35">
<PRE>
  data
</PRE>
<p>text
<p>more text
<A NAME="135">
<PRE>
  data
</PRE>
<p>text
<p>more text
<A NAME="235">
<PRE>
  data
</PRE>
<p>text
<p>more text
<A NAME="335">
<PRE>
  data
</PRE>
<p>text
<p>more text
</BODY>
</HTML>

There seem to be two major problems with how the parser handles
this. First, the parser doesn't close tags correctly in this case (PRE
is one of the problem tags on the gnuplot page): if you cut the gnuplot
page down to something like 50k, you'll see (after waiting a while) that
the page loads, but everything is displayed as PRE text. Second, because
each <A NAME="xxx"> tag is opened but never closed with </A>, we end up
with a residual style stack that grows deeper with every <A NAME="xxx">
the parser sees.

So, the content model for the sample HTML above is this:

docshell=0x81fcb80
html@0x823c564 refcount=5<
  head@0x81ce3e4 refcount=2<
  >
  Text@0x83a8f38 refcount=3<\n>
  body@0x83b6994 refcount=3<
    Text@0x84124a0 refcount=3<\n>
    a@0x8412dd4 name=35 refcount=3<
      Text@0x83300c8 refcount=3<\n>
    >
    pre@0x833003c refcount=3<
      a@0x825feb4 name=35 refcount=4<
        Text@0x8264098 refcount=4<  data\n\n>
      >
      p@0x82640ec refcount=3<
        a@0x8264154 name=35 refcount=3<
          Text@0x8264290 refcount=3<text\n>
        >
      >
      p@0x8264324 refcount=3<
        a@0x8264364 name=35 refcount=3<
          Text@0x83896a0 refcount=3<more text\n>
        >
        a@0x8389704 name=35 refcount=3<
          a@0x838984c name=135 refcount=3<
            Text@0x83899b0 refcount=3<\n>
          >
        >
      >
    >
    pre@0x8365124 refcount=3<
      a@0x8365164 name=35 refcount=4<
        a@0x836528c name=135 refcount=4<
          Text@0x83653c8 refcount=4<  data\n\n>

...

    pre@0x82673a4 refcount=3<
      a@0x826c864 name=35 refcount=4<
        a@0x826c98c name=135 refcount=4<
          a@0x826cab4 name=235 refcount=4<
            a@0x826cbfc name=335 refcount=4<
              Text@0x826cd38 refcount=4<  data\n\n>
            >
          >
        >
      >
      p@0x826cd9c refcount=3<
        a@0x826cddc name=35 refcount=3<
          a@0x826cf24 name=135 refcount=3<
            a@0x826d06c name=235 refcount=3<
              a@0x826d1b4 name=335 refcount=3<
                Text@0x826d2f0 refcount=3<text\n>
              >
            >
          >
        >
      >
      p@0x826d394 refcount=3<
        a@0x826d3d4 name=35 refcount=3<
          a@0x826d51c name=135 refcount=3<
            a@0x826d664 name=235 refcount=3<
              a@0x826d7ac name=335 refcount=3<
                Text@0x83fd908 refcount=3<more text\n>
              >
            >
          >
        >
      >
    >
  >
>

The further into the document we go, the more tags end up looking like
<p><a><a>..., with as many <a>'s as there are <A NAME="xxx"> tags
before that point in the file. A quick grep through the file shows
~350 of them, so we end up chewing up loads of memory for the frames
of all these <a> tags when viewing this page.
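A rough count under the same toy-model assumption: each of the k block elements in the i-th section carries a chain of i re-opened anchors, plus one anchor per <A NAME> tag itself. Here k = 3 matches the simplified testcase and n = 350 is the approximate grep count above; the real page's k is not known, so this is an illustrative estimate, not a measurement.

```latex
A(n) = n + k\sum_{i=1}^{n} i = n + \frac{k\,n(n+1)}{2}
\qquad
A(350)\big|_{k=3} = 350 + \frac{3 \cdot 350 \cdot 351}{2} = 350 + 184\,275 = 184\,625
```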

The cool thing is that the layout system seems to be able to handle
this if you have the resources...

As a quick-hack solution to this problem I created a patch that
closes all <A> tags *without* an HREF attribute immediately after they've
been opened (hey, that wouldn't actually be such a bad thing for the parser,
would it?). I'll attach the patch.
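The idea behind the quick hack (close every <A> that has no HREF right after it opens) can be illustrated at the text level. The actual patch changes the parser itself; the Python filter below is a hypothetical preprocessing sketch of the same rule, not that patch. It assumes no ">" appears inside quoted attribute values.

```python
import re

# Matches an <a> start tag (any case); group 1 is the attribute text, if any.
A_START = re.compile(r"<a(\s[^>]*)?>", re.IGNORECASE)
# An href attribute preceded by whitespace, so e.g. data-href does not match.
HREF_ATTR = re.compile(r"(?:^|\s)href\s*=", re.IGNORECASE)


def close_named_anchors(html: str) -> str:
    """Append </A> right after every <A ...> start tag that has no HREF."""
    def fix(match):
        attrs = match.group(1) or ""
        if HREF_ATTR.search(attrs):
            return match.group(0)          # link anchors are left alone
        return match.group(0) + "</A>"     # target-only anchors close at once
    return A_START.sub(fix, html)
```

One consequence of this rule: an anchor the author did close, such as <A NAME="x">text</A>, becomes <A NAME="x"></A>text</A>, leaving a stray end tag that browsers typically ignore.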
Assignee: troy → rickg
Component: Layout → Parser
OS: Linux → All
QA Contact: petersen → janc
Hardware: PC → All

Comment 4

18 years ago
Created attachment 5705 [details] [diff] [review]
Quick hack fix.

Comment 5

18 years ago
Created attachment 5708 [details]
"pre close problem demo"

Comment 6

18 years ago
The "pre close demo" attachment contains this HTML:

<HTML>
<BODY>
<A HREF="foo"><PRE>pre text</PRE> normal text</A>
</BODY>
</HTML>

Loading this in Mozilla shows that the parser doesn't close the PRE tag in
this case either (look at the font). Here's the content model generated from the
above HTML:

docshell=0x81fc548
html@0x8240104 refcount=5<
  head@0x82d11f4 refcount=2<
  >
  Text@0x8242f88 refcount=3<\n>
  body@0x83a0d5c refcount=3<
    Text@0x83c5688 refcount=3<\n>
    a@0x83bca24 href=foo refcount=3<
    >
    pre@0x83bccac refcount=3<
      a@0x83bccec href=foo refcount=3<
        Text@0x83e16a8 refcount=3<pre text normal text>
      >
      Text@0x82436e0 refcount=3<\n>
    >
  >
>

Hopefully this will help in solving the problem.

Comment 7

18 years ago
Pretty severe results if you try to load this page on Linux.
Here are my test results.

Yes, bad juju on the gnuplot page that seamonkey does not like.
http://www.ucc.ie/gnuplot/gnuplot.html

On the current win32 and Linux builds I didn't see the large
memory hogging, but CPU use went to 100% for a very long
time. I eventually had to kill the Mozilla process with the
Win95 task manager because it was not responding,
and I actually had to hit the old power switch on my Linux box
because loading the page seemed to have driven my
whole system into the ground (HP Vectra, 200 MHz).

Equivalent or larger, but simpler, pages generated with
http://komodo.mozilla.org/buster/mkpg.cgi?lines=160000
seem to load OK.
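The mkpg.cgi pages are large but simple, so they are not expected to hit the residual-style path. A stress page of the problematic shape can be produced by repeating the simplified testcase from comment 3; make_page below is a hypothetical helper for illustration, not an existing tool.

```python
def make_page(sections: int) -> str:
    """Return an HTML page with `sections` copies of the simplified-testcase
    pattern: an <A NAME> that is never closed, followed by PRE and two Ps."""
    parts = ["<HTML>\n<BODY>\n"]
    for i in range(sections):
        parts.append(
            f'<A NAME="{100 * i + 35}">\n'   # names 35, 135, 235, ... as in the testcase
            "<PRE>\n  data\n</PRE>\n"
            "<p>text\n<p>more text\n"
        )
    parts.append("</BODY>\n</HTML>\n")
    return "".join(parts)
```

Writing make_page(350) to a file gives a page with roughly as many unclosed anchors as the gnuplot page, while keeping every other construct trivial.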

beta1?
Keywords: beta1
Comment 8 (Reporter)

18 years ago
Just to note that closing all the <A NAME="..."> tags and loading the document
drops Mozilla's footprint down to ~28MB which is just about what it usually is.

Comment 9

18 years ago
rickg, need an estimate please.
Whiteboard: [NEED INFO]
Comment 10 (Assignee)

18 years ago
Ok -- there's a moderate chance that this is the same problem as bug 3944. I'm
trying to land that tonight (or tomorrow) -- and once I do I'll revisit this. 
It's likely a 1 to 2 day project, given my schedule.
Status: NEW → ASSIGNED

Comment 11

18 years ago
Closing unterminated <A HREF...> tags was described in bug 2406; could the fix
for this one be similar?
Or rather, should the fix for that one close all anchor tags, instead of just
the HREF ones?
Comment 12 (Assignee)

18 years ago
It looks like I have a fix, but I need to spend a bit more time testing. I'll 
give a new status tomorrow.

Comment 13

18 years ago
PDT+
Whiteboard: [NEED INFO] → [PDT+]
Comment 14 (Assignee)

18 years ago
This bug is related to bug 3944, which I've landed a fix for (which corrected the 
residual style bug). I'll do more testing to see what that fix does in this 
case in terms of memory.

Comment 15

18 years ago
I just loaded the gnuplot page and things look normal now; Mozilla used ~40MB
of memory when loading that page after doing some other surfing. I'd say this is
fixed.

Comment 16

18 years ago
Works for me too, the page loaded in 11.5 secs with build 2000-02-27-16 on NT,
which seems pretty good considering the filesize.
Comment 17 (Assignee)

18 years ago
I believe this is fixed per my checkin this weekend.
Status: ASSIGNED → RESOLVED
Last Resolved: 18 years ago
Resolution: --- → FIXED
Comment 18 (Reporter)

18 years ago
Certainly seems to be fixed. The bad HTML page takes just a little longer to
load and seems to use a little more space compared to a fixed page, but it's not
serious.

Comment 19

18 years ago
loads fine for me, so based on this and comments of others I'm marking this 
verified.
Status: RESOLVED → VERIFIED