Closed Bug 61842 Opened 24 years ago Closed 24 years ago

Performance degradation: HTML parsing of comments/scripts

Categories

(Core :: DOM: Core & HTML, defect, P2)

x86
Windows 95
defect

CLOSED DUPLICATE of bug 56624

People

(Reporter: bht237, Assigned: harishd)

References

Details

(Keywords: perf, testcase)

Attachments

(1 file)

This simple attached HTML file demonstrates a shocking bug in Netscape 6 and
Mozilla build 2000111704:

It appears that the HTML parser keeps scanning inside non-HTML portions
of the document, such as scripts and comments.

In practical terms, the "<" character and the '"' (double quote) character
cause considerable problems. I have not tested other cases.

On a 300MHz Windows 95 PC, this file takes 24,700 ms to reload
from a local disk.

In comparison, Navigator 4 needs 220ms, Internet Explorer 4 needs 50ms.

This test case is derived from a problem in a live web application.

In this application, Netscape 6/Mozilla typically shows a SUBSTANTIAL
performance degradation: it is about one order of magnitude slower
than Netscape 4.

This does of course depend on the content, but '"' and "<" characters
are heavily used in scripts and commented-out HTML.

As proof, replace "<abc>" with aaaaaaa: the reload then takes 270ms on a 300MHz PC.
Attached file Test case
Keywords: perf, testcase
Priority: P3 → P2
Didn't try changing the <abc>, but the attachment took 25000ms to load on Linux
build 2000120206.  Yikes!
//a="<abc>"

or

a="<abc>"

makes no difference, apart from the additional JavaScript processing, which is
minor compared with the bug-related delay. I think this test variation is good
proof of the real-life significance of the bug.

However:

a="<abc"

(closing ">" missing) does not cause the delay.
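For anyone who wants to try these variations without the attachment, here is a minimal sketch of a page generator. The page layout, the repeat count and the name makeTestPage are assumptions for illustration only; the attached test case is not reproduced in this report and may look different.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the attached test case): builds an HTML
// page whose <script> block repeats `line` exactly `count` times, one per row.
std::string makeTestPage(const std::string& line, std::size_t count) {
    std::string page = "<html><head><title>parser test</title>\n<script>\n";
    for (std::size_t i = 0; i < count; ++i) {
        page += line;   // e.g. //a="<abc>"  or  a="<abc>"  or  a="<abc"
        page += '\n';
    }
    page += "</script>\n</head><body>done</body></html>\n";
    return page;
}
```

Calling makeTestPage("//a=\"<abc>\"", 5000), makeTestPage("a=\"<abc>\"", 5000) and makeTestPage("a=\"<abc\"", 5000), writing each result to a local file and reloading it should make the three cases easy to compare. The count of 5000 is arbitrary.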

In general, the whole issue is very relevant for JavaScript document.write().

I noticed that Mozilla's JavaScript engine is not too bad, but until now I have
wondered why it performed so poorly in real-life scenarios.

This is a total knockout and I am looking forward to the real thing.
How much better could Mozilla be without this ??? ETA?

Blocks: 23187
Blocks: 29805
This is a parser performance problem, not a DOM problem. Reassigning to the
parser owner.
Assignee: jst → harishd
Status: UNCONFIRMED → NEW
Ever confirmed: true
Here is the one-line change to fix the performance problem:

Index: nsHTMLTokens.cpp
===================================================================
RCS file: /cvsroot/mozilla/htmlparser/src/nsHTMLTokens.cpp,v
retrieving revision 3.177
diff -u -w -r3.177 nsHTMLTokens.cpp
--- nsHTMLTokens.cpp	2000/11/02 22:20:28	3.177
+++ nsHTMLTokens.cpp	2000/12/05 21:41:46
@@ -642,6 +642,7 @@
           //theTermStrPos=theBuffer.RFind(aTerminalString,PR_TRUE,tempOffset,termStrLen+2);
           theTermStrPos=theBuffer.RFind(aTerminalString,PR_TRUE,tempOffset,tempOffset-(theCurrOffset-2)); //bug43513...
           if(theTermStrPos>-1) break;
+          theCurrOffset=tempOffset;
           tempOffset++;
         }
         else break;

Note: This exact change will not land. A lot of parser performance work that
addresses problems like this one will be landing soon, so the final fix will
not look the same.
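To illustrate why remembering the last scanned offset matters, here is a rough standalone model of the loop as the diff reads. It is not the parser code: findTerminator, makeBuffer and ScanResult are hypothetical names, std::string stands in for the parser's buffer class, and the model assumes the backward search is triggered by each ">" and covers everything between a lower bound and that ">".

```cpp
#include <cstddef>
#include <string>

// Hypothetical model, not Mozilla code.
struct ScanResult {
    std::size_t charsExamined;  // total size of all backward-search windows
    std::size_t terminatorPos;  // index where the terminator starts, or npos
};

// For every '>' in buf, search backward for `term` inside the window
// [lowerBound, position of '>']. With rememberOffset == false the lower bound
// never moves (assumed pre-patch behaviour); with true it advances to the
// last '>' seen, mirroring the added line theCurrOffset=tempOffset.
ScanResult findTerminator(const std::string& buf, const std::string& term,
                          bool rememberOffset) {
    ScanResult r{0, std::string::npos};
    std::size_t lowerBound = 0;
    for (std::size_t gt = buf.find('>'); gt != std::string::npos;
         gt = buf.find('>', gt + 1)) {
        const std::size_t windowLen = gt - lowerBound + 1;
        r.charsExamined += windowLen;
        const std::size_t hit = buf.substr(lowerBound, windowLen).rfind(term);
        if (hit != std::string::npos) {
            r.terminatorPos = lowerBound + hit;
            return r;
        }
        if (rememberOffset) lowerBound = gt;  // skip what was already searched
    }
    return r;
}

// Script-like content shaped after the report: n lines of a="<abc>" followed
// by the terminator </script>.
std::string makeBuffer(std::size_t n) {
    std::string buf;
    for (std::size_t i = 0; i < n; ++i) buf += "a=\"<abc>\"\n";
    buf += "</script>";
    return buf;
}
```

Comparing findTerminator(makeBuffer(1000), "</script>", false).charsExamined with the same call using true shows the difference: in this model the first grows roughly with the square of the number of ">" characters, the second roughly linearly. A line such as a="<abc" contains no ">" and so triggers no backward search at all, which is consistent with the observation above that the variant without the closing ">" shows no delay.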
Status: NEW → ASSIGNED

*** This bug has been marked as a duplicate of 56624 ***
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → DUPLICATE
Verifying as a duplicate of 56624 (general bug for poor parser performance)
'needs a lot of time to load the page'
Status: RESOLVED → VERIFIED
Works like a charm. VERY COOL!
Status: VERIFIED → CLOSED