Closed Bug 477495 Opened 16 years ago Closed 3 years ago

Hang [@ nsBlockFrame::DoReflowInlineFrames] loading Tinderbox build log with reftest failures

Categories

(Core :: Layout, defect, P3)

x86
All
defect

Tracking

()

RESOLVED INCOMPLETE
Tracking Status
blocking2.0 --- -
status2.0 --- wanted

People

(Reporter: jruderman, Unassigned)

References

()

Details

(Keywords: dogfood, hang)

Attachments

(1 file)

Attached file sample
Steps to reproduce: 1. Start loading http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1234111896.1234114660.22461.gz in one tab. 2. Try to browse the web with another tab. Result: After about 30 seconds, Firefox suddenly becomes completely unresponsive and must be force-quit. Waiting for it to finish does not seem to help. Expected: Firefox remains reasonably responsive until the page is finished loading.
Flags: blocking1.9.1?
Yeah, I've been seeing this for a while on Linux and meaning to investigate; I think it's specific to logs that have reftest failures in them.
Summary: Hang [@ nsBlockFrame::DoReflowInlineFrames] loading Tinderbox build log → Hang [@ nsBlockFrame::DoReflowInlineFrames] loading Tinderbox build log with reftest failures
For me (Mac debug), it's slow, but it doesn't hang.
I've hit situations with real logs where it takes 2-3 minutes and then becomes responsive again. (Linux debug)
Flags: blocking1.9.1? → blocking1.9.1+
Priority: -- → P2
Do you get equivalent hangs if you save the file locally and then load it? What if you load it, wait for it to unhang, and then zoom --- does that cause a comparable length hang?
For me, zooming --- which tests pure reflow of the whole document --- hangs for about 10 seconds, which sucks, but we spend all our time in reflow and textrun construction, and most of the latter is poking the textrun word cache and calling ATSUI. I don't think we're doing anything particularly stupid there, other than using ATSUI. If I load from a local file, which adds the parser to the mix, the content streams in more slowly than I'd like but we stay quite responsive the whole time. If I don't try to interact with it then it takes about 12 seconds to load. If I load from the URL, the content streams in even more slowly but we still seem to be responsive the whole time. It takes a couple of minutes to fully load the content and we reflow a very large number of times. But it takes curl 42 seconds to load that data from here, currently, so we're at worst 2-3x slower than the pure download rate. I'm testing responsiveness by holding down the spacebar in the window as it loads. We keep scrolling down, with short interruptions but never for too long. This is all with my debug build. I'm not sure how to proceed...
Oh, I'd also like to know whether a shift-reload behaves any differently from a normal load.
So, oddly enough, I can reproduce this in my main Firefox instance but not in a clean profile. I need to figure out whether that's because my main Firefox has 30 other windows open or whether it's something about the profile...
(In reply to comment #7) > So, oddly enough, I can reproduce this in my main Firefox instance but not in a > clean profile. I need to figure out whether that's because my main Firefox has > 30 other windows open or whether it's something about the profile... So I tried copying my preferences and sessionstore into a clean profile, and I still don't see the problem. I'm wondering if it has something to do with how long I've been running Firefox.
... and then I magically got my clean profile into a state where it shows the bug too, but I'm not sure what did it. I'll try deleting things from that profile until it goes away.
It stopped happening when I removed the prefs.js, but then it didn't start again when I put it back. Maybe there's something random outside the browser, like the sequencing / separation of packets coming from the tinderbox server.
So I think the bug happens more reliably if I load from a local file. The pref responsible is: user_pref("intl.charset.default", "UTF-8"); So my steps to reproduce the hang with a clean profile are: 1. save the tinderbox log in comment 0 to a file 2. create an empty profile directory, run ./firefox -profile <dir> and then quit to populate it, and then add the above user_pref line to the end of <dir>/prefs.js 3. run ./firefox -profile <dir> again, and load the file saved in step 1.
Or, alternatively: 1. save the tinderbox log in comment 0 to a file 2. edit the file and add <meta charset="UTF-8"> right after <HEAD> and then you have a testcase that hangs a clean profile.
And, I would further note: * the log *is* actually valid UTF-8, but the server sends it with no encoding header * it contains some Chinese: *** 7238 INFO TEST-PASS | /tests/content/base/test/test_bug416317-1.html | Element: ID selector using UTF8 (#root3 #台北Táiběi) *** 7239 INFO TEST-PASS | /tests/content/base/test/test_bug416317-1.html | Element: Multiple ID selectors using UTF8 (#root3 #台北Táiběi, #root3 #台北) *** 7240 INFO TEST-PASS | /tests/content/base/test/test_bug416317-1.html | Element: Descendant ID selector using UTF8 (#root3 div #台北) *** 7241 INFO TEST-PASS | /tests/content/base/test/test_bug416317-1.html | Element: Child ID selector using UTF8 (#root3 form &amp;gt; #台北)
When I load http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1234111896.1234114660.22461.gz Firefox gets halfway down the page, decides it is UTF-8 instead of ISO-8859-1, and *reloads* the page as UTF-8. Then it hangs for a while. I have auto-detect set to "Universal" (the default is off).
(Once bug 474657 is fixed, they will be sent as UTF-8, and everyone will get to enjoy this hang.)
#3 0x0227b2aa in MakeTextRun (aText=0xfba5008 "*** 20462 INFO TEST-PASS | /tests/content/html/content/test/test_bug375003-1.html | td5.scrollTop\n*** 20463 INFO TEST-PASS | /tests/content/html/content/test/test_bug375003-1.html | td5.scrollWidth\n**"..., aLength=513819, aFontGroup=0x971c660, aParams=0xbfff79bc, aFlags=16779392) at /Users/roc/mozilla-trunk/layout/generic/nsTextFrameThebes.cpp:430 This isn't good. The textruns should be limited to individual lines termined by hard linebreaks.
We're getting hammered by bidi resolution. I think that's why UTF-8 matters, it's giving us characters that enable bidi. We lay out the part of the document we've got, creating one textframe per line, with one textrun per line too. Then new content arrives, and because bidi is enabled, we have to do bidi resolution on the block that's got new inline content --- the one block that contains most of the text in the file. Bidi resolution basically goes through our lines and joins all the frames back together into one frame per text node (for LTR text, which is almost all of it) (there are many text nodes because of the 4096 byte parser limit, but that's not a problem here). This also means the textruns are now no good, because we can't have multiple textruns for a single frame, so we throw them out. Then of course we reflow the lines and we have to build new textruns. This is all O(N^2), and expensive O(N^2) too.
It should be possible to fix the bidi resolution code so it doesn't unnecessarily join up all the text frames. But I don't understand the bidi resolution code. Simon, do you think you could do this?
This might also be a good time to revisit the question of whether the bidi algorithm's scope should extend across hard newlines. IIRC IE doesn't, but we do to conform to the letter of the law?
(In reply to comment #19) > This might also be a good time to revisit the question of whether the bidi > algorithm's scope should extend across hard newlines. IIRC IE doesn't, but we > do to conform to the letter of the law? I think you're confusing two issues here. "IE doesn't, but we do to conform to the letter of the law" is an accurate answer to the question whether the bidi algorithm's scope extends across <br>s. To the question whether it extends across hard newlines in preformatted text, the answer is "IE doesn't, but we do because we suck" (bug 263359). I had a bash at fixing that some years ago, but got bogged down. I'll attack it again.
Depends on: 263359
(In reply to comment #18) > It should be possible to fix the bidi resolution code so it doesn't > unnecessarily join up all the text frames. But I don't understand the bidi > resolution code. Simon, do you think you could do this? That's also something I worked on in the past - bug 332655.
Depends on: 33265
Depends on: 332655
No longer depends on: 33265
Which one do you think is the best way to tackle this?
Hard call. Bug 263559 is likely to bring more performance gains, but getting to a working patch in bug 332655 is probably a lot easier.
On the assumption that we'll eventually want both, let's do 332655 then.
OS: Mac OS X → All
Whiteboard: [depends on 332655]
It's Simon's work that will fix this.
Assignee: roc → smontagu
Jesse, others: now that 332655 is fixed, we should be a lot faster here. Is it good enough to close the bug or at least take it off the blocker list?
WFM. Debug-build perf on this newer log containing a reftest failure is acceptable. http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1240012614.1240019653.8120.gz&fulltext=1
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → WORKSFORME
Doesn't work for me... now I end up hanging in AdvanceLineIteratorToFrame for a few minutes instead.
Reopening - the fix in bug 332655 wasn't sufficient. (I'm still seeing long hangs on linux.) See: http://groups.google.com/group/mozilla.dev.tech.layout/browse_thread/thread/9c9caee654762853#
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Whiteboard: [depends on 332655]
This hurts a fair bit for developers, and might be showing up elsewhere on the web too. Should we aim for fixing this for 2.0?
blocking2.0: --- → ?
It's pretty hard to fix, it means incrementalizing our bidi algorithm.
Depends on: 646359
Moving to p3 because no activity for at least 1 year(s). See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information
Priority: P2 → P3
QA Whiteboard: qa-not-actionable

The bug assignee didn't login in Bugzilla in the last 7 months.
:dholbert, could you have a look please?
For more information, please visit auto_nag documentation.

Assignee: smontagu → nobody
Flags: needinfo?(dholbert)

Looks like we don't have a testcase / affected URL at this point. The URLs posted earlier here are for tinderbox.mozilla.org which isn't around anymore.

The good news is that fission helps out here, by isolating the hang to only affect tabs for the origin in question.

--> Anyway, closing as INCOMPLETE given that we don't have functional STR at this point.

Status: REOPENED → RESOLVED
Closed: 15 years ago3 years ago
Flags: needinfo?(dholbert)
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: