Closed Bug 525960 Opened 15 years ago Closed 14 years ago

[HTML5] HTML5 parser doesn't reconstruct active formatting elements properly

Tracking

()

Status:

RESOLVED WORKSFORME

People

(Reporter: lists.bugzilla, Unassigned)

References

(
URL
)

Details

Attachments

(2 files)

A test case that should be simple to walk through using the spec. 15 years ago lists.bugzilla 6.14 KB, text/html		Details
Testcase which triggers quadratic memory usage 14 years ago Markus Kull 33.21 KB, text/html		Details

lists.bugzilla

Reporter

Description

•

15 years ago

User-Agent: Opera/9.80 (Windows NT 5.1; U; en) Presto/2.2.15 Version/10.00 Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.4) Gecko/20091016 Firefox/3.5.4 The page in question, when parsed by the htmlparser-1.2.1.jar (available from http://about.validator.nu/htmlparser/) generates a document containing 1154329 font elements (as determined by document.getElementsByTagName('font').length). This is mostly because there's a big pile of font tags that are left open as the active formatting elements, and the big pile has to be cloned thousands of times because of all the div tags and whatnot. However, when I load this page in FF with the html5.enable about:config set to true, and then do alert(document.getElementsByTagName('font').length), it displays "1890". The HTML5 parser in FF doesn't seem to be doing the whole reconstruction of active formatting elements the same way that the validator.nu parser does and that the spec prescribes. Reproducible: Always Steps to Reproduce: 1. Load http://saintjohnchurchmiddletown.com/default.aspx 2. Load javascript:alert(document.getElementsByTagName('font').length) Actual Results: 1890 Expected Results: 1154329

Henri Sivonen (:hsivonen)

Updated

•

15 years ago

Summary: HTML5 parser doesn't reconstruct active formatting elements properly → [HTML5] HTML5 parser doesn't reconstruct active formatting elements properly

Henri Sivonen (:hsivonen)

Comment 1

•

15 years ago

I have no idea why this happens. Does the DOM itself have protections against this kind of thing regardless of what the parser tries to do?

Henri Sivonen (:hsivonen)

Comment 2

•

15 years ago

Do you have a test case that can be traced back to spec text? Currently the parser doesn't treat whitespace in the AAA correctly according to the html5lib test suite.

Priority: -- → P1

lists.bugzilla

Reporter

Comment 3

•

15 years ago

Attached file A test case that should be simple to walk through using the spec. — Details

Here you go. I played around with the number of font/p repetitions and noticed some very interesting behavior. In Firefox there was a turning point where it appears to have hit some hard limit and switched from a polynomial to a linear increase: Number of reps | font tags in FF | font tags per HTML5 parser ----------------+-------------------+------------------------------ 153 | 11934 | 11781 154 | 12089 | 11935 155 | 12245 | 12090 156 | 12402 | 12246 157 | 12403 | 12403 158 | 12404 | 12561

Boris Zbarsky [:bzbarsky]

Comment 4

•

15 years ago

> Does the DOM itself have protections against this kind of thing Not that I know of. Need to figure out what's going on here.

Blocks: html5-parsing

Status: UNCONFIRMED → NEW

Ever confirmed: true

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 5

•

15 years ago

Is the middle column in the table in comment 3 with or without HTML5 parser enabled? (I don't care what the behavior is with the old gecko parser is, but of course we want the new HTML5 parser to follow the spec)

lists.bugzilla

Reporter

Comment 6

•

15 years ago

With the HTML5 parser enabled.

Henri Sivonen (:hsivonen)

Comment 7

•

15 years ago

I'm going to claim this doesn't need to be researched and addressed in order to be able to flip the pref by default. It would be nice to figure out what's going on by release time, though.

Priority: P1 → P2

Henri Sivonen (:hsivonen)

Comment 8

•

15 years ago

It would even be possible to ship with this, IMO, although it would of course be preferable to figure this out before shipping.

Priority: P2 → P3

Henri Sivonen (:hsivonen)

Updated

•

15 years ago

Blocks: 580091

Henri Sivonen (:hsivonen)

Updated

•

14 years ago

Priority: P3 → P2

Henri Sivonen (:hsivonen)

Updated

•

14 years ago

No longer blocks: 580091

Henri Sivonen (:hsivonen)

Comment 9

•

14 years ago

(In reply to comment #0) > Actual Results: > 1890 > > Expected Results: > 1154329 With all the changes pending in my tree, I get 609115. This number is rather insane. It makes me wonder if we should change the spec to have limits on the growth of the list of formatting elements. Say, if the list length exceeds n (where n would be a carefully chosen number between 10 and 100), assume the page is never going to close active formatting elements and stop adding formatting elements to the list if there's already an element with same name and attributes on the list. With the attached test case, the parser seems to hit the deep nesting protection code.

Markus Kull

Comment 10

•

14 years ago

Attached file Testcase which triggers quadratic memory usage — Details

There should definitely be a limit to the size of reconstructed elements. The attached Testcase evades the deep nesting protection and causes ~100mb allocated memory out of just a 34kb file. Enlarging the Testcase shows a quadratic increase in memory usage. Tested in 4.0pre7

Henri Sivonen (:hsivonen)

Comment 11

•

14 years ago

(In reply to comment #10) > There should definitely be a limit to the size of reconstructed elements. > The attached Testcase evades the deep nesting protection and causes ~100mb > allocated memory out of just a 34kb file. Enlarging the Testcase shows a > quadratic increase in memory usage. The formatting element growth protection is meant to protect against author incompetence. It's not even supposed to protect against deliberate resource exhaustion attacks. Have you seen the pattern that the test case shows occur on the Web out of author incompetence? I'm resolving this bug as WFM, because the parser now behaves per spec. If additional protections against deliberate attacks are deemed necessary, let's open new bugs on those.

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → WORKSFORME

You need to log in before you can comment on or make changes to this bug.

Bugzilla

[HTML5] HTML5 parser doesn't reconstruct active formatting elements properly

Categories

(Core :: DOM: HTML Parser, defect, P2)

Tracking

()

People

(Reporter: lists.bugzilla, Unassigned)

References

(
URL
)

Details

Crash Data

Security

(public)

User Story

Attachments

(2 files)

Description

Updated

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Updated

Updated

Updated

Comment 9

Comment 10

Comment 11

Attachment

General

Description

File Name

Content Type