Open Bug 556305 Opened 15 years ago Updated 2 years ago

Firefox cpu usage goes to 100% when presented with mismatched html tables

Categories

(Core :: Layout: Tables, defect)

x86
All
defect

Tracking

()

Tracking Status
blocking2.0 --- -
blocking1.9.2 --- -

People

(Reporter: simonft, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: hang, regression, testcase)

Attachments

(6 files)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.3a4pre) Gecko/20100328 Minefield/3.7a4pre (.NET CLR 3.5.30729) Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.3a4pre) Gecko/20100328 Minefield/3.7a4pre (.NET CLR 3.5.30729) Going to the page linked here: http://www.gregmerideth.net/?p=87 (http://www.gregmerideth.net/html/iecrash.html) freezes firefox so that it has to be killed manually. Reproducible: Always Steps to Reproduce: 1. Go to http://www.gregmerideth.net/html/iecrash.html Actual Results: Firefox CPU went to 100 percent, had to be killled. Expected Results: An error, or some rendering of the page.
Confirmed: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.3a4pre) Gecko/20100331 Minefield/3.7a4pre Note that this page rendered properly in an earlier nightly. I'm chasing down the regression now.
Severity: normal → major
OS: Windows XP → All
Bug 54542 (and its dependencies) related?
Confirming. You need QA for anything else? Regression range: works: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2a1pre) Gecko/20090623 Minefield/3.6a1pre http://hg.mozilla.org/mozilla-central/rev/c575412d976a broken: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2a1pre) Gecko/20090624 Minefield/3.6a1pre http://hg.mozilla.org/mozilla-central/rev/5fe89f2c22f0 http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=c575412d976a&tochange=5fe89f2c22f0
Status: UNCONFIRMED → NEW
Ever confirmed: true
Component: General → Layout: Tables
Product: Firefox → Core
QA Contact: general → layout.tables
Awesome, thank you very much! I'm spinning up a debug build to find the offending changeset in that window. Bug 484121 looks promising, but I don't want to point fingers just yet ;)
Keywords: qawanted
Glad I didn't point any fingers! http://hg.mozilla.org/mozilla-central/rev/c6c685aa0379 was the changeset that caused this bug.
Nominating to block 1.9.2 because its a hang and crashes (and hangs) have been cracked down on recently. These all hang: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.3a5pre) Gecko/20100520 Minefield/3.7a5pre Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.5pre) Gecko/20100526 Namoroka/3.6.5pre
blocking1.9.2: --- → ?
Owen or Matt: can you post a testcase as an attachment on this bug, so we've got some immutable snapshot that is known to reproduce the issue?
Version: unspecified → Trunk
Attached file original testcase
Done.
Keywords: testcase
Interesting... For me, that testcase initially works fine -- it only hangs for a few seconds on the first load. But after a shift-reload, it hangs indefinitely (~90 seconds so far and still going). Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.3a5pre) Gecko/20100526 Minefield/3.7a5pre
In the interest of being thorough: I got the "original testcase" using "Save Link As..." because the page kept crashing firefox. I got this from opera because it wasn't crashing. I assume they are basically the same though.
Attached file smaller testcase
Slightly smaller than before and run through tidyHTML.
Happens with well-formed tables too. Minimal testcase that triggers the bug on my system.
Which builds? I can get recent 1.9.2 nightlies to break but haven't gotten trunk or 3.6.3 to break yet.
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.3a5pre) Gecko/20100529 Minefield/3.7a5pre Reload several times! (more often to trigger if I test from local disk) Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 Seems affected more often than trunk, especially when testing locally
Okay. Here is the range where it seems to have been fixed for me. Something might have landed on Windows 7 but not XP to fix it. Tentative partial fixed range - for attachment 448284 [details] in comment 12: broken: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.3a1pre) Gecko/20091002 Minefield/3.7a1pre http://hg.mozilla.org/mozilla-central/rev/0aeac787a86d works: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.3a1pre) Gecko/20091003 Minefield/3.7a1pre http://hg.mozilla.org/mozilla-central/rev/0a7dd88dbe67 http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=0aeac787a86d&tochange=0a7dd88dbe67
Would be nice to get a fix in 1.9.2, but doesn't appear to be a common enough case to "block" a release. Please request branch approval if you come up with a patch.
Blocks: 495385
blocking1.9.2: ? → -
Severity: major → critical
I'm confused. Comment 6 says this happens on mozilla-central, but comment 15 says it doesn't. Could somebody clarify?
Summary: Comment 6 is correct. Comment 15 is about the testcase in comment 12 which uses well-formed instead of mismatched tables. Longer breakdown: Comment 6: Found it hangs on trunk. Comment 11: Smaller testcase than original that reproduces original issue (should have mismatched tables like the original). Comment 12: A similar testcase but with a slightly different issue (he mentions well formed tables instead of mismatched). Comment 15: Tried to reproduce j.j.'s results using __his__ testcase (from comment 12) but trunk seemed to work fine for me. So I marked the range where __his__ testcase stopped working for me.
AMD CodeAnalyst profile on the following build for smaller testcase. Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6
Comment 18 was just summarizing what has already been said. Currently, the bug doesn't appear on trunk but does on 1.9.2. Fixed range: broken: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.3a5pre) Gecko/20100603 Minefield/3.7a5pre http://hg.mozilla.org/mozilla-central/rev/7c3924df9f92 works: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.3a5pre) Gecko/20100604 Minefield/3.7a5pre http://hg.mozilla.org/mozilla-central/rev/6dbc5341b490 http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=7c3924df9f92&tochange=6dbc5341b490
Bisection shows this changeset fixes the bug on mozilla-central: http://hg.mozilla.org/mozilla-central/rev/80e47aae2681 changeset: 43088:80e47aae2681 user: Henri Sivonen <hsivonen@iki.fi> date: Thu May 20 12:03:11 2010 +0300 summary: Bug 561874 - Make the HTML5 parser clip deep trees (similar to the old HTML parser) to avoid crashes in recursive code elsewhere. r=bzbarsky
That seems like it just covered up something somewhere... So if I understand this right: 1) The change to not create frames for whitespace textnodes caused the problem to appear. 2) In the modified testcase of comment 12 something (reflow interruption changes?) made the bug disappear. 3) The HTML5 parser's cutting off trees made the bug disappear. But #3 shouldn't be relevant for the observed issue on 1.9.2, since the old parser had such a cutoff and we don't use the html5 parser on that branch...
OK. I just tried a 1.9.2 branch build on "smaller testcase", which caused it to more or less hang. Stopping in a debugger, we seem to be in reflow. We also seem to be about 600 frames deep into the callstack. The testcase goes 164 tables deep, so about 164*4 (table, tbody, tr, td) nodes deep. Henri, where is the new parser cutoff? Does it apply to tables? The old one parser only applied a cutoff to non-structural elements, iirc...
So it seems like we should be having special height reflows all over the place here, right? In fact, 2^164 table reflows should happen or so? If not, what keeps it from happening, exactly? I don't see why this used to work back when we created whitespace textframes...
I should note that the special height reflow commentage seems to be out of sync with the code. There's no mSpecialHeightInitiator or IsPrematureSpecialHeightReflow() around, which sounded like they were designed to avoid the problem from comment 24...
Special height reflows changed pretty drastically around and after the reflow branch landed; I think that was code from before the changes, which I think made pretty much the same optimization that IsPrematureSpecialHeightReflow did, but in a different way (using the mVResize optimization) -- I don't think it was precisely equivalent, though. That said, I think table height code needs to be completely redesigned, and I'd actually sort of like to do it before IE7 disappears from the Web, because IE7-compatibility would actually probably be easier in a better model than compatibility with most other browsers. Bug 359481 has my thoughts on doing this.
Hmm. So what's the new setup for avoiding multiple special height reflows on nested kids? Am I just missing it?
Bug 438509 was where I added the optimizations that I think were mostly equivalent to IsPrematureSpecialHeightReflow, but maybe not completely. I removed IsPrematureSpecialHeightReflow on the reflow branch at least partly because I could no longer figure out all cases of whether we'd need a special-height reflow before the main reflow; we sometimes determine it afterwards. We might be able to add back a concept of special-height reflow initiator, but we'd need to be more careful with it than the pre-reflow-branch code because today's special-height reflows are not guaranteed to reflow all descendants (and in most cases, don't). So we'd only propagate it in cases where there was a height dependency.
(In reply to comment #27) > Hmm. So what's the new setup for avoiding multiple special height reflows on > nested kids? Am I just missing it? The basic answer is, I think, that we're not as good at avoiding them, for the benefit of making special height reflows being much cheaper in the general case (particularly when not all descendants are height-dependent). Though we could probably do both.
(In reply to comment #23) > Henri, where is the new parser cutoff? At tree builder stack depth of 200. > Does it apply to tables? It tries to avoid cutting the tree at table, tfoot, thead, tbody, tr, or colgroup, so it should be waiting until it gets into a td before it cuts. (Where cutting means throwing away elements but not text.) Note that the depth limit code has a known DoS problem that I thought was not worthwhile to fix: Once the tree builder throws elements away, it keeps the text content but leaves the coalescing of that text to the main thread where the value of the most recent text node keeps extended multiple times. In my testing, this could become really slow in pathological cases (i.e. a multimegabyte run of <tag>text<tag>text<tag>text...), but at least on fast machines, the event loop kept spinning to allow the user to press the stop button, so I figured it was not worth optimizing for this case. http://hg.mozilla.org/mozilla-central/rev/80e47aae2681#l4.55
Picture shows the difference between builds before and after the changeset in comment 21 landed. I thought it might be helpful to some.
QA Whiteboard: qa-not-actionable

This and bug 792508 (and bug 1725781) are likely all the same root issue.

Severity: critical → S3
status2.0: wanted → ---
See Also: → 792508
See Also: → 1725781
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: