Closed
Bug 525960
Opened 15 years ago
Closed 14 years ago
[HTML5] HTML5 parser doesn't reconstruct active formatting elements properly
Categories
(Core :: DOM: HTML Parser, defect, P2)
Tracking
()
RESOLVED
WORKSFORME
People
(Reporter: lists.bugzilla, Unassigned)
References
()
Details
Attachments
(2 files)
User-Agent: Opera/9.80 (Windows NT 5.1; U; en) Presto/2.2.15 Version/10.00
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.4) Gecko/20091016 Firefox/3.5.4
The page in question, when parsed by the htmlparser-1.2.1.jar (available from http://about.validator.nu/htmlparser/) generates a document containing 1154329 font elements (as determined by document.getElementsByTagName('font').length). This is mostly because there's a big pile of font tags that are left open as the active formatting elements, and the big pile has to be cloned thousands of times because of all the div tags and whatnot.
However, when I load this page in FF with the html5.enable about:config set to true, and then do alert(document.getElementsByTagName('font').length), it displays "1890". The HTML5 parser in FF doesn't seem to be doing the whole reconstruction of active formatting elements the same way that the validator.nu parser does and that the spec prescribes.
Reproducible: Always
Steps to Reproduce:
1. Load http://saintjohnchurchmiddletown.com/default.aspx
2. Load javascript:alert(document.getElementsByTagName('font').length)
Actual Results:
1890
Expected Results:
1154329
Updated•15 years ago
|
Summary: HTML5 parser doesn't reconstruct active formatting elements properly → [HTML5] HTML5 parser doesn't reconstruct active formatting elements properly
Comment 1•15 years ago
|
||
I have no idea why this happens. Does the DOM itself have protections against this kind of thing regardless of what the parser tries to do?
Comment 2•15 years ago
|
||
Do you have a test case that can be traced back to spec text? Currently the parser doesn't treat whitespace in the AAA correctly according to the html5lib test suite.
Priority: -- → P1
Reporter | ||
Comment 3•15 years ago
|
||
Here you go. I played around with the number of font/p repetitions and noticed some very interesting behavior. In Firefox there was a turning point where it appears to have hit some hard limit and switched from a polynomial to a linear increase:
Number of reps | font tags in FF | font tags per HTML5 parser
----------------+-------------------+------------------------------
153 | 11934 | 11781
154 | 12089 | 11935
155 | 12245 | 12090
156 | 12402 | 12246
157 | 12403 | 12403
158 | 12404 | 12561
![]() |
||
Comment 4•15 years ago
|
||
> Does the DOM itself have protections against this kind of thing
Not that I know of.
Need to figure out what's going on here.
Is the middle column in the table in comment 3 with or without HTML5 parser enabled?
(I don't care what the behavior is with the old gecko parser is, but of course we want the new HTML5 parser to follow the spec)
Reporter | ||
Comment 6•15 years ago
|
||
With the HTML5 parser enabled.
Comment 7•15 years ago
|
||
I'm going to claim this doesn't need to be researched and addressed in order to be able to flip the pref by default. It would be nice to figure out what's going on by release time, though.
Priority: P1 → P2
Comment 8•15 years ago
|
||
It would even be possible to ship with this, IMO, although it would of course be preferable to figure this out before shipping.
Priority: P2 → P3
Updated•14 years ago
|
Priority: P3 → P2
Comment 9•14 years ago
|
||
(In reply to comment #0)
> Actual Results:
> 1890
>
> Expected Results:
> 1154329
With all the changes pending in my tree, I get 609115. This number is rather insane.
It makes me wonder if we should change the spec to have limits on the growth of the list of formatting elements. Say, if the list length exceeds n (where n would be a carefully chosen number between 10 and 100), assume the page is never going to close active formatting elements and stop adding formatting elements to the list if there's already an element with same name and attributes on the list.
With the attached test case, the parser seems to hit the deep nesting protection code.
Comment 10•14 years ago
|
||
There should definitely be a limit to the size of reconstructed elements.
The attached Testcase evades the deep nesting protection and causes ~100mb allocated memory out of just a 34kb file. Enlarging the Testcase shows a quadratic increase in memory usage.
Tested in 4.0pre7
Comment 11•14 years ago
|
||
(In reply to comment #10)
> There should definitely be a limit to the size of reconstructed elements.
> The attached Testcase evades the deep nesting protection and causes ~100mb
> allocated memory out of just a 34kb file. Enlarging the Testcase shows a
> quadratic increase in memory usage.
The formatting element growth protection is meant to protect against author incompetence. It's not even supposed to protect against deliberate resource exhaustion attacks.
Have you seen the pattern that the test case shows occur on the Web out of author incompetence?
I'm resolving this bug as WFM, because the parser now behaves per spec. If additional protections against deliberate attacks are deemed necessary, let's open new bugs on those.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•