Open Bug 112738 Opened 23 years ago Updated 2 years ago

large page with unescaped '<' in email addresses takes 137 seconds to load - locks browser until loaded (nested unclosed tags), plaintext served as HTML

Categories

(Core :: Layout, defect)

defect

Tracking

()

Future

People

(Reporter: jrspm, Unassigned)

References

()

Details

(Keywords: perf, Whiteboard: [WONTFIX 12/31/2002?][reflow-refactor])

Attachments

(4 files, 2 obsolete files)

Using build 2001112914 on Win2k(SP2)

I'm not sure that Parser is the right component for this, but I don't think it 
is layout either.

The page in the URL field took 137 seconds to load.  I tried it in IE5.5(SP2) 
and it took about 3 to 4 seconds to load.  BTW, I am on a cable modem.

The page has two tags:
<html>
<pre>

It has no other tags and no end tags for the tags that it has.  The two tags 
take up the first 3 lines of the file.  The rest of the file is plain text.  
The size of the file is about 311KB.

Steps to reproduce:
1. load file: http://www.theos.com/deraadt/coremail.html
2. Wait a loooooooooooong time (about 137 seconds) with all mozilla windows 
freezing during that time.
3. Finally view file.

Expected:
File should load in 3-4 seconds like IE5.5

Actual:
File takes 137 seconds to load

Jake
Since this page has unterminated tags, I'm guessing possible dupe of bug 84466.
This is _so_ layout based on the profile
Assignee: harishd → attinasi
Component: Parser → Layout
Keywords: perf
OS: Windows 2000 → All
QA Contact: moied → petersen
Hardware: PC → All
See also 73889.
I found the problem.  I removed all < and > from the document and time to 
load dropped from 54 seconds (Netscape 6.2) to less than a second.  When you have
a <pre> tag all further scanning of anything other than a </pre> should be
suspended.
No.  You can use <b>, <i>, <span style="font-color:red"> and so forth in the
<pre>....  So yes, the "<" should be removed (there is an evangelism bug on
that) but that is the responsibility of the site in this case.
The fact that IE can do the page 40-50 times faster than Mozilla suggests that
the responsibility should not be the site but the browser.

responsibility for correct markup is always the site.

this bug is open solely because we _should_ be able to do it faster.
Target Milestone: --- → mozilla1.1
Blocks: 56854
I accessed the URL:http://www.theos.com/deraadt/coremail.html ,
but not succeed.

Who can tell me where is the testcase now?
Because the bug cannot recur again,I think it should be closed.

Btw I create a html file(>320KB),which include only two tags: <html> and <pre> 
and a big block of plain text,but the phenomena didn't appear again.
I reproduced the bug.

Even though I download the whole file from the website,and opened it form local
machine, the time costed still is same to open the file from original website.
The time consumed is about 22s.

I tested the file using IE and netscape4.76,the time is only 2-3 seconds.
BTW, reproduced the bug using netscape 7.0 and mozilla 1.0 trunk20020618
I just reprofiled this and it looks about the same as my orginal profile.  The
complete lack of closing tags is causing us to create what is essentially a
linear content model, with a very large depth.  Since reflow has to walk down
the tree to the node being reflown and then back up to the root, construction of
such a content model will be O(n^2) in the number of nodes if you keep reflowing
it as you construct it.

We could probably make this fast by ignoring open tags inside <pre> if they were
not known elements.... but I question whether this one edge case is really worth it.
I have got a logfile about the reflow process of this file.
But it is too huge(>360M) to upload 
:(

you can produce the logfile according to
http://lxr.mozilla.org/seamonkey/source/layout/doc/frame_reflow_debug.html
As a note, I talked to Harish and ignoring these unknown elements is simply not
an option... Apparently sites actually depend on this sort of thing and ignoring
the elements would break them.
I think a simplified testcase is necessary to resolve this bug.
Can anyone provide one?
:)
After analyzed the testcase,I find the source of bug is the invalid tag,such
as<sys/syslog.h> and <Chris_G_Demetriou@LAGAVULIN.PDL.CS.CMU.EDU>.

we can verify it by deleting those tags.

I think the bug is related to the parser.
Because there exist many invalid tags,So change some of them into
eHTMLTag_text, especially such tags: <aaa.bbb.c@ee.ff.com> or <sys/file.h>.

The time consumed when	loading the testcase is almost same to IE now.
one idea: if parser meet <pre>, then it will skip all '<'  inside it.
just take '<' as one char, not begin of another html tag.
Not an option.  See comment 6.  In fact, read the whole bug, please.
Comment 19 would not work since markup is allowed within PRE.

harishd could probably comment better on the patch in comment 18, although I
suspect there are reasons we act as we do.
Attachment #92520 - Attachment is obsolete: true
a little modification
Isn't this going to create a bunch of non-existent tags in the page DOM? I think
we'd be better off just sending this to Tech Evangelism.
Assignee: attinasi → harishd
Component: Layout → Parser
QA Contact: petersen → moied
This bug was found relatively late, and there are no duplicates. Only one
non-top100 page exhibits this problem. The fix is not universal, and would slow
us down in certain cases. Based on this I also think this should be handled by
Evangelism. Reassigning.
Assignee: harishd → doron
Component: Parser → US General
Product: Browser → Tech Evangelism
QA Contact: moied → zach
Target Milestone: mozilla1.1alpha → ---
Version: other → unspecified
We already have an evang bug on this site -- bug 112802.  If we're not planning
to fix the core problem, whatever we determine that to be, we should just
wontfix this.
Assignee: doron → harishd
Component: US General → Parser
Product: Tech Evangelism → Browser
QA Contact: zach → moied
Version: unspecified → other
I vote for WONTFIX. heikki?
Hmm, let's make a compromise. I'll future this for now, and will wontfix this at
the end of the year if we don't find more sites with this problem. If we do find
more sites, we'll think of the fix again. That will also let Harish have a look
at the proposed fix (he is on sabbatical now).
Whiteboard: [WONTFIX 12/31/2002?]
Target Milestone: --- → Future
Should we push Evangelism team to take a look at this bug and bug 112802? It
seems that there are many such issues and fix them can give us big performance
improvement.
1) Because testcase(http://www.theos.com/deraadt/coremail.html) often is
unavialble, I upload the orignal one.

2) Becuase the limitation to the attachment is 300k,and the original testcase
is 305k,so I delete some pure text line in orginal testcase.

3) The attachment is the modified testcase.
How to do with this bug now?
new idea? or old one?
Comment on attachment 107551 [details] [diff] [review]
Changed some invalid tags into eHTMLTag_text (format is correct)

hi,harishd
The patch have been exist for 4 months, can you review it?
Attachment #107551 - Flags: review?(harishd)
Leon, your patch basically masks the real problem ( pointed out by boris ).
Anyway, based on your patch tag-like-syntax that contain '@', '/', or '.' will
be treated as text but what if my tag-like-syntax contained special characters
that are different from '@','/', or '.'?

Note: '.' and '-' are legal characters in a tag.
*** Bug 190267 has been marked as a duplicate of this bug. ***
Comment on attachment 107551 [details] [diff] [review]
Changed some invalid tags into eHTMLTag_text (format is correct)

This is scary. How much testing have you done? I really don't like the idea of
searching for a few special characters just to fix this bug. IMO, this is not a
fix but just a hack.
Attachment #107551 - Flags: review?(harishd) → review-
then, who can give a better solution? 
parser or layout should do something to do with this issue, I still prefer parser.
*** Bug 107733 has been marked as a duplicate of this bug. ***
Note that bug 107733 has this same issue with tags like <control> that could not
possibly be screened out in the parser....

Perhaps the real solution is to change the incremental loading logic to reflow
less often in cases like this....
Blocks: 61684, 81993
Summary: large plain text page takes 137 seconds to load - locks browser until loaded → large plain text page takes 137 seconds to load - locks browser until loaded (nested unclosed tags)
*** Bug 265619 has been marked as a duplicate of this bug. ***
*** Bug 239330 has been marked as a duplicate of this bug. ***
*** Bug 298422 has been marked as a duplicate of this bug. ***
Guys, this bug is one cause of many bad user experiences of Firefox sucking up
RAM and bringing systems to their knees "for no reason."  It appears that
Mozilla needs  logic to better handle bad web pages, of which there are many! 
Just surf with the HTML Validator extension and see for yourselves.

Considering that this issue has been abandoned, is there no desire to make
Firefox immune to these problems - in the form of more complete exception
handling (by which I mean being able to deal with anything that does not obey
the rules, rather than expecting the world-at-large to behave themselves) - the
root of most performance problems and exploits?
What exact solution are you proposing here?  Ignoring markup on the page?  Or what?
> What exact solution are you proposing here?  Ignoring markup on the page?  Or
what?

Well, it would be nice to have Firefox use much less memory to render documents,
as is apparent Microsoft was able to do with Internet Explorer.
Yeah, we could remove support for all sorts of DOM and CSS stuff (like
explorer), and then we could use a lot less memory to get the much more limited
functionality...
It's not like you would have to eliminate functionality.  Why allocate memory
for stuff you are not rendering?  Couldn't firefox use some sort of flag or
tag-based architecture that tells other modules not to allocate memory for stuff
that isn't there?  I'm not sure it makes much sense to expend 130+ MB of RAM
just to display text.

JMHO
> Why allocate memory for stuff you are not rendering?

Because you have to render it, because it may affect the positions of other
things that would end up in the viewport.  Look up CSS relative positioning, please.

> I'm not sure it makes much sense to expend 130+ MB of RAM just to display text.

Consider that you have to allocate various DOM objects for all that text, per
the DOM spec requirements.
> Consider that you have to allocate various DOM objects for all that text, per
the DOM spec requirements.

Then, ignoring for a moment that Microsoft does not always obey standards, how
is it that they are able to render and properly display the exact same page
using far less time, CPU power, and memory.  Firefox is not even close.
Because you've ignored the reason that they can do it.
"a little modification to original testcase" loads in about 20 seconds for me on trunk, about 10 seconds on the reflow branch...

For comparison, Opera 8.5 is at about 2 seconds.
Whiteboard: [WONTFIX 12/31/2002?] → [WONTFIX 12/31/2002?][reflow-refactor]
This page, which is the one that caused me to log this bug: 239330
makes 1.5.0.5 start running away with resources. It does eventually load.
But since it consumes 185MB who wants to keep ff running after that.
http://www.lambdacs.com/cpt/FAQ.html

(not sure, but there could be some irony in the fact that the page
is a thread programming FAQ....)
Summary: large plain text page takes 137 seconds to load - locks browser until loaded (nested unclosed tags) → large page with unescaped '<' in email addresses takes 137 seconds to load - locks browser until loaded (nested unclosed tags), plaintext served as HTML
Blocks: 216418
Assignee: harishd → nobody
QA Contact: moied → parser
Problem persists in Fx 3.5.3. Test URLs yield 100% CPU load, unresponsiveness and crashes.
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.4) Gecko/20100503 Firefox/3.6.4

http://www.theos.com/deraadt/coremail.html and http://www.lambdacs.com/cpt/FAQ.html still cause hangs and 100% CPU usage.
I can't switch tabs while either uri of comment 55 is loading, using Firefox 4.0 Beta 7 on Windows 7 (32 bit).
No longer blocks: 216418
Sending this back to layout based on the profiles. Whenever the parser tries to protect layout from certain kinds of input, unintended consequences follow.
Component: HTML: Parser → Layout
QA Contact: parser → layout
Severity: normal → S3

The severity field for this bug is relatively low, S3. However, the bug has 4 duplicates.
:dholbert, could you consider increasing the bug severity?

For more information, please visit auto_nag documentation.

Flags: needinfo?(dholbert)

The last needinfo from me was triggered in error by recent activity on the bug. I'm clearing the needinfo since this is a very old bug and I don't know if it's still relevant.

Flags: needinfo?(dholbert)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: