477495 - Hang [@ nsBlockFrame::DoReflowInlineFrames] loading Tinderbox build log with reftest failures

Reporter

Description

•

16 years ago

Attached file sample — Details

Steps to reproduce: 1. Start loading http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1234111896.1234114660.22461.gz in one tab. 2. Try to browse the web with another tab. Result: After about 30 seconds, Firefox suddenly becomes completely unresponsive and must be force-quit. Waiting for it to finish does not seem to help. Expected: Firefox remains reasonably responsive until the page is finished loading.

Flags: blocking1.9.1?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 1

•

16 years ago

Yeah, I've been seeing this for a while on Linux and meaning to investigate; I think it's specific to logs that have reftest failures in them.

Summary: Hang [@ nsBlockFrame::DoReflowInlineFrames] loading Tinderbox build log → Hang [@ nsBlockFrame::DoReflowInlineFrames] loading Tinderbox build log with reftest failures

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 2

•

16 years ago

For me (Mac debug), it's slow, but it doesn't hang.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 3

•

16 years ago

I've hit situations with real logs where it takes 2-3 minutes and then becomes responsive again. (Linux debug)

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

16 years ago

Assignee: nobody → roc

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

16 years ago

Flags: blocking1.9.1? → blocking1.9.1+

Priority: -- → P2

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 4

•

16 years ago

Do you get equivalent hangs if you save the file locally and then load it? What if you load it, wait for it to unhang, and then zoom --- does that cause a comparable length hang?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 5

•

16 years ago

For me, zooming --- which tests pure reflow of the whole document --- hangs for about 10 seconds, which sucks, but we spend all our time in reflow and textrun construction, and most of the latter is poking the textrun word cache and calling ATSUI. I don't think we're doing anything particularly stupid there, other than using ATSUI. If I load from a local file, which adds the parser to the mix, the content streams in more slowly than I'd like but we stay quite responsive the whole time. If I don't try to interact with it then it takes about 12 seconds to load. If I load from the URL, the content streams in even more slowly but we still seem to be responsive the whole time. It takes a couple of minutes to fully load the content and we reflow a very large number of times. But it takes curl 42 seconds to load that data from here, currently, so we're at worst 2-3x slower than the pure download rate. I'm testing responsiveness by holding down the spacebar in the window as it loads. We keep scrolling down, with short interruptions but never for too long. This is all with my debug build. I'm not sure how to proceed...

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 6

•

16 years ago

Oh, I'd also like to know whether a shift-reload behaves any differently from a normal load.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 7

•

16 years ago

So, oddly enough, I can reproduce this in my main Firefox instance but not in a clean profile. I need to figure out whether that's because my main Firefox has 30 other windows open or whether it's something about the profile...

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 8

•

16 years ago

(In reply to comment #7) > So, oddly enough, I can reproduce this in my main Firefox instance but not in a > clean profile. I need to figure out whether that's because my main Firefox has > 30 other windows open or whether it's something about the profile... So I tried copying my preferences and sessionstore into a clean profile, and I still don't see the problem. I'm wondering if it has something to do with how long I've been running Firefox.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 9

•

16 years ago

... and then I magically got my clean profile into a state where it shows the bug too, but I'm not sure what did it. I'll try deleting things from that profile until it goes away.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 10

•

16 years ago

It stopped happening when I removed the prefs.js, but then it didn't start again when I put it back. Maybe there's something random outside the browser, like the sequencing / separation of packets coming from the tinderbox server.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 11

•

16 years ago

So I think the bug happens more reliably if I load from a local file. The pref responsible is: user_pref("intl.charset.default", "UTF-8"); So my steps to reproduce the hang with a clean profile are: 1. save the tinderbox log in comment 0 to a file 2. create an empty profile directory, run ./firefox -profile <dir> and then quit to populate it, and then add the above user_pref line to the end of <dir>/prefs.js 3. run ./firefox -profile <dir> again, and load the file saved in step 1.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 12

•

16 years ago

Or, alternatively: 1. save the tinderbox log in comment 0 to a file 2. edit the file and add <meta charset="UTF-8"> right after <HEAD> and then you have a testcase that hangs a clean profile.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 13

•

16 years ago

And, I would further note: * the log *is* actually valid UTF-8, but the server sends it with no encoding header * it contains some Chinese: *** 7238 INFO TEST-PASS | /tests/content/base/test/test_bug416317-1.html | Element: ID selector using UTF8 (#root3 #台北Táiběi) *** 7239 INFO TEST-PASS | /tests/content/base/test/test_bug416317-1.html | Element: Multiple ID selectors using UTF8 (#root3 #台北Táiběi, #root3 #台北) *** 7240 INFO TEST-PASS | /tests/content/base/test/test_bug416317-1.html | Element: Descendant ID selector using UTF8 (#root3 div #台北) *** 7241 INFO TEST-PASS | /tests/content/base/test/test_bug416317-1.html | Element: Child ID selector using UTF8 (#root3 form &gt; #台北)

Jesse Ruderman

Reporter

Comment 14

•

16 years ago

When I load http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1234111896.1234114660.22461.gz Firefox gets halfway down the page, decides it is UTF-8 instead of ISO-8859-1, and *reloads* the page as UTF-8. Then it hangs for a while. I have auto-detect set to "Universal" (the default is off).

Jesse Ruderman

Reporter

Comment 15

•

16 years ago

(Once bug 474657 is fixed, they will be sent as UTF-8, and everyone will get to enjoy this hang.)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 16

•

16 years ago

#3 0x0227b2aa in MakeTextRun (aText=0xfba5008 "*** 20462 INFO TEST-PASS | /tests/content/html/content/test/test_bug375003-1.html | td5.scrollTop\n*** 20463 INFO TEST-PASS | /tests/content/html/content/test/test_bug375003-1.html | td5.scrollWidth\n**"..., aLength=513819, aFontGroup=0x971c660, aParams=0xbfff79bc, aFlags=16779392) at /Users/roc/mozilla-trunk/layout/generic/nsTextFrameThebes.cpp:430 This isn't good. The textruns should be limited to individual lines termined by hard linebreaks.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 17

•

16 years ago

We're getting hammered by bidi resolution. I think that's why UTF-8 matters, it's giving us characters that enable bidi. We lay out the part of the document we've got, creating one textframe per line, with one textrun per line too. Then new content arrives, and because bidi is enabled, we have to do bidi resolution on the block that's got new inline content --- the one block that contains most of the text in the file. Bidi resolution basically goes through our lines and joins all the frames back together into one frame per text node (for LTR text, which is almost all of it) (there are many text nodes because of the 4096 byte parser limit, but that's not a problem here). This also means the textruns are now no good, because we can't have multiple textruns for a single frame, so we throw them out. Then of course we reflow the lines and we have to build new textruns. This is all O(N^2), and expensive O(N^2) too.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 18

•

16 years ago

It should be possible to fix the bidi resolution code so it doesn't unnecessarily join up all the text frames. But I don't understand the bidi resolution code. Simon, do you think you could do this?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 19

•

16 years ago

This might also be a good time to revisit the question of whether the bidi algorithm's scope should extend across hard newlines. IIRC IE doesn't, but we do to conform to the letter of the law?

Simon Montagu :smontagu

Comment 20

•

16 years ago

(In reply to comment #19) > This might also be a good time to revisit the question of whether the bidi > algorithm's scope should extend across hard newlines. IIRC IE doesn't, but we > do to conform to the letter of the law? I think you're confusing two issues here. "IE doesn't, but we do to conform to the letter of the law" is an accurate answer to the question whether the bidi algorithm's scope extends across <br>s. To the question whether it extends across hard newlines in preformatted text, the answer is "IE doesn't, but we do because we suck" (bug 263359). I had a bash at fixing that some years ago, but got bogged down. I'll attack it again.

Depends on: 263359

Simon Montagu :smontagu

Comment 21

•

16 years ago

(In reply to comment #18) > It should be possible to fix the bidi resolution code so it doesn't > unnecessarily join up all the text frames. But I don't understand the bidi > resolution code. Simon, do you think you could do this? That's also something I worked on in the past - bug 332655.

Depends on: 33265

Simon Montagu :smontagu

Updated

•

16 years ago

Depends on: 332655
No longer depends on: 33265

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 22

•

16 years ago

Which one do you think is the best way to tackle this?

Simon Montagu :smontagu

Comment 23

•

16 years ago

Hard call. Bug 263559 is likely to bring more performance gains, but getting to a working patch in bug 332655 is probably a lot easier.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 24

•

16 years ago

On the assumption that we'll eventually want both, let's do 332655 then.

Karl Tomlinson (:karlt)

Updated

•

16 years ago

OS: Mac OS X → All

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

16 years ago

Whiteboard: [depends on 332655]

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 25

•

16 years ago

It's Simon's work that will fix this.

Assignee: roc → smontagu

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 26

•

15 years ago

Jesse, others: now that 332655 is fixed, we should be a lot faster here. Is it good enough to close the bug or at least take it off the blocker list?

Jesse Ruderman

Reporter

Comment 27

•

15 years ago

WFM. Debug-build perf on this newer log containing a reftest failure is acceptable. http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1240012614.1240019653.8120.gz&fulltext=1

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → WORKSFORME

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 28

•

15 years ago

Doesn't work for me... now I end up hanging in AdvanceLineIteratorToFrame for a few minutes instead.

dwitte@gmail.com

Comment 29

•

15 years ago

Reopening - the fix in bug 332655 wasn't sufficient. (I'm still seeing long hangs on linux.) See: http://groups.google.com/group/mozilla.dev.tech.layout/browse_thread/thread/9c9caee654762853#

Status: RESOLVED → REOPENED

Resolution: WORKSFORME → ---

Whiteboard: [depends on 332655]

Johnny Stenback (:jst)

Comment 31

•

14 years ago

This hurts a fair bit for developers, and might be showing up elsewhere on the web too. Should we aim for fixing this for 2.0?

blocking2.0: --- → ?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 32

•

14 years ago

It's pretty hard to fix, it means incrementalizing our bidi algorithm.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Updated

•

14 years ago

blocking2.0: ? → -

status2.0: --- → wanted

Simon Montagu :smontagu

Updated

•

14 years ago

Depends on: 646359

Sylvestre Ledru [:Sylvestre]

Comment 33

•

6 years ago

Moving to p3 because no activity for at least 1 year(s). See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information

Priority: P2 → P3

RaresB

Updated

•

3 years ago

QA Whiteboard: qa-not-actionable

BugBot [:suhaib / :marco/ :calixte]

Comment 34

•

3 years ago

The bug assignee didn't login in Bugzilla in the last 7 months.
:dholbert, could you have a look please?
For more information, please visit auto_nag documentation.

Assignee: smontagu → nobody

Flags: needinfo?(dholbert)

Daniel Holbert [:dholbert]

Comment 35

•

3 years ago

Looks like we don't have a testcase / affected URL at this point. The URLs posted earlier here are for tinderbox.mozilla.org which isn't around anymore.

The good news is that fission helps out here, by isolating the hang to only affect tabs for the origin in question.

--> Anyway, closing as INCOMPLETE given that we don't have functional STR at this point.

Status: REOPENED → RESOLVED

Closed: 15 years ago → 3 years ago

Flags: needinfo?(dholbert)

Resolution: --- → INCOMPLETE