Closed Bug 525960 Opened 15 years ago Closed 14 years ago

[HTML5] HTML5 parser doesn't reconstruct active formatting elements properly

Categories

(Core :: DOM: HTML Parser, defect, P2)

x86
Windows XP
defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: lists.bugzilla, Unassigned)

References

()

Details

Attachments

(2 files)

User-Agent:       Opera/9.80 (Windows NT 5.1; U; en) Presto/2.2.15 Version/10.00
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.4) Gecko/20091016 Firefox/3.5.4

The page in question, when parsed by the htmlparser-1.2.1.jar (available from http://about.validator.nu/htmlparser/) generates a document containing 1154329 font elements (as determined by document.getElementsByTagName('font').length). This is mostly because there's a big pile of font tags that are left open as the active formatting elements, and the big pile has to be cloned thousands of times because of all the div tags and whatnot.

However, when I load this page in FF with the html5.enable about:config set to true, and then do alert(document.getElementsByTagName('font').length), it displays "1890". The HTML5 parser in FF doesn't seem to be doing the whole reconstruction of active formatting elements the same way that the validator.nu parser does and that the spec prescribes.

Reproducible: Always

Steps to Reproduce:
1. Load http://saintjohnchurchmiddletown.com/default.aspx
2. Load javascript:alert(document.getElementsByTagName('font').length)
Actual Results:  
1890

Expected Results:  
1154329
Summary: HTML5 parser doesn't reconstruct active formatting elements properly → [HTML5] HTML5 parser doesn't reconstruct active formatting elements properly
I have no idea why this happens. Does the DOM itself have protections against this kind of thing regardless of what the parser tries to do?
Do you have a test case that can be traced back to spec text? Currently the parser doesn't treat whitespace in the AAA correctly according to the html5lib test suite.
Priority: -- → P1
Here you go. I played around with the number of font/p repetitions and noticed some very interesting behavior. In Firefox there was a turning point where it appears to have hit some hard limit and switched from a polynomial to a linear increase:

Number of reps  |  font tags in FF  |  font tags per HTML5 parser
----------------+-------------------+------------------------------
   153          |      11934        |           11781
   154          |      12089        |           11935
   155          |      12245        |           12090
   156          |      12402        |           12246
   157          |      12403        |           12403
   158          |      12404        |           12561
> Does the DOM itself have protections against this kind of thing

Not that I know of.

Need to figure out what's going on here.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Is the middle column in the table in comment 3 with or without HTML5 parser enabled?

(I don't care what the behavior is with the old gecko parser is, but of course we want the new HTML5 parser to follow the spec)
With the HTML5 parser enabled.
I'm going to claim this doesn't need to be researched and addressed in order to be able to flip the pref by default. It would be nice to figure out what's going on by release time, though.
Priority: P1 → P2
It would even be possible to ship with this, IMO, although it would of course be preferable to figure this out before shipping.
Priority: P2 → P3
Blocks: 580091
Priority: P3 → P2
No longer blocks: 580091
(In reply to comment #0)
> Actual Results:  
> 1890
> 
> Expected Results:  
> 1154329

With all the changes pending in my tree, I get 609115. This number is rather insane. 

It makes me wonder if we should change the spec to have limits on the growth of the list of formatting elements. Say, if the list length exceeds n (where n would be a carefully chosen number between 10 and 100), assume the page is never going to close active formatting elements and stop adding formatting elements to the list if there's already an element with same name and attributes on the list.

With the attached test case, the parser seems to hit the deep nesting protection code.
There should definitely be a limit to the size of reconstructed elements. 
The attached Testcase evades the deep nesting protection and causes ~100mb allocated memory out of just a 34kb file. Enlarging the Testcase shows a quadratic increase in memory usage.

Tested in 4.0pre7
(In reply to comment #10)
> There should definitely be a limit to the size of reconstructed elements. 
> The attached Testcase evades the deep nesting protection and causes ~100mb
> allocated memory out of just a 34kb file. Enlarging the Testcase shows a
> quadratic increase in memory usage.

The formatting element growth protection is meant to protect against author incompetence. It's not even supposed to protect against deliberate resource exhaustion attacks.

Have you seen the pattern that the test case shows occur on the Web out of author incompetence?

I'm resolving this bug as WFM, because the parser now behaves per spec. If additional protections against deliberate attacks are deemed necessary, let's open new bugs on those.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.