Closed Bug 430332 Opened 12 years ago Closed 11 years ago

Hang in nsLineBreaker::AppendText with text-transform:uppercase and a paragraph full of links

Categories

(Core :: Layout: Text and Fonts, defect, P2, critical)

defect

Tracking

()

VERIFIED FIXED
mozilla1.9.2a1

People

(Reporter: dsmith, Assigned: roc)

References

Details

(4 keywords)

Attachments

(4 files, 4 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9pre) Gecko/2008042106 Minefield/3.0pre
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9pre) Gecko/2008042106 Minefield/3.0pre

When loading the page, the browser 'hangs' for a period of time proportional to the number of links contained within a malformed attempt to hide said links.  Hang time appears to grow at a rate greater than O(n) for the number of links.

In addition, when mousing over a link after the page has loaded the browser hangs again, usually longer than the initial page load hang.

Requires the malformed HTML (as seen in the attachment), a large number of links, and CSS rules that apply text-transform or text-decoration.

Reproducible: Always

Steps to Reproduce:
1. Will add attachment of sample file.

Actual Results:  
Browser hangs/is unresponsive for a period of time.

Expected Results:  
Browser should not hang.
Attached file Test case (obsolete) —
Note: when building the test file it caused my browser to hang for 35 seconds with the full 1500 links when loaded from a local file, but when loading the test case from the above attachment the browser hung indefinitely (killed process after 6 minutes).
Status: UNCONFIRMED → NEW
Ever confirmed: true
Hang with what stacks? You should be able to get some over the 35 seconds, even by hand using a debugger or stack-dumping tool. Poor man's profiling as it were.

/be
Flags: blocking-firefox3?
Flags: blocking-firefox3?
Product: Firefox → Core
QA Contact: general → general
(Restoring Brendan's blocking request, but I don't see how we could block on something like this at this stage - need more information.)
Flags: blocking1.9?
Severity: normal → critical
Keywords: hang, perf, testcase
Seems to be hanging in nsLineBreaker::AppendText, with 2/3 of that time spent in BuildTextRunsScanner::BreakSink::SetBreaks and 1/3 in nsLineBreaker::FlushCurrentWord.
Attachment #317082 - Attachment is obsolete: true
Component: General → Layout: Fonts and Text
OS: Windows XP → All
QA Contact: general → layout.fonts-and-text
Hardware: PC → All
Summary: Slow performance on web page with malformed HTML and lots of links → Hang in nsLineBreaker::AppendText with text-transform:uppercase and a paragraph full of links
Assignee: nobody → roc
Regression range:
http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&date=explicit&mindate=1182366840&maxdate=1182369239
Is a noticeable delay here on Vista with a big hourglass and gray-out but only for a few seconds.
On XP it is a more serious hang.
Is your XP machine slower?
(In reply to comment #9)
> Is your XP machine slower?
>
Yes, a lot. 

If it helps with any platform-specific stuff:

Using the simplified test case on Windows XP on a system with a Core2Duo 6750 (so not a slow machine), locally loaded it spikes the CPU for 6 seconds; loaded from the Bugzilla page it hangs for at least several minutes (killed process when I got tired of waiting).

That may be relevant to the difference Ria is seeing, though it's pretty much a repeat of what I said above on the original testcase.

It would also explain why I was considering it a complete program hang when I encountered the original web page, but not when I was trying to isolate the issue manually.
Trying to decide blocking status here.  How frequently is something like this used in the wild?  Is it a normal thing to do?  Need to know how bad this is from a UE point of view.
I don't think this happens frequently.  The testcase comes from the same hacked blog as bug 429110.  The hacker injected a bunch of "hidden" spam links, and the blog happened to have both a stylesheet specifying text-transform:uppercase and a markup typo that made the spam links visible.
In most normal pages, I'd say it's a non-issue.  Simply putting all the links into an unordered list (just added <li> before each link, surrounded by ul, and put the CSS rule in the header matched against li), there's no performance hit (loaded locally, so no guarantee there aren't additional network issues).  Any standard page with a long list of links is probably going to do something like that, and even if they have it in simple paragraph form it's the text-transform stuff that really kills it, which itself is not used terribly often. 

The fact that there's basicly no performance hit for having all the links in an unordered list, but a significant hit when in a paragraph, indicates something's not being handled properly in there somewhere; transforming the full page as straight text (back to the p page, removed <a prefixes) took slightly longer than the list version (I can actually see a CPU spike, where there was nothing notable for the list version), though is still a pretty minor hit on performance. 

It's somewhat annoying that someone could basically DOS the browser with such a simple page, but probably not worth blocking RC1 on. Will likely want to get it fixed before final, though.

No, we're not scheduling fixes for after RC -- RC means "release candidate", and so it has all the known issues addressed that we plan to address.  Defects found in RC-scale testing might cause us to effectively "pull the bits off the wire", but an RC isn't another beta, so not-for-RC means "not until 3.0.1 at the earliest".
What's happening is that we're appending a lot of nodes to the nsLineBreaker. Each text node results in a call to SetPotentialLineBreaks on the textrun. Because it's an nsTextRunTransformations textrun, we're rebuilding the entire textrun for each such call, giving classic O(N^2) performance.
So text-transform is a necessary part of this testcase, so we shouldn't worry about this much.
Flags: wanted-next+
Flags: blocking1.9?
Flags: blocking1.9-
Version: unspecified → Trunk
I've noticed this issue in a real-life situation. I've created a web application you can see at http://ww2.bloople.net . As far as I know, the HTML is valid, and the page doesn't hide/show content apart from adding more content. This is a huge blocker for me, on my Core 2 Duo 2.2ghz machine it hangs the entire browser (all windows etc.) for 3-10 seconds. Just my 2 cents.
You should be able to work around the bug by not using text-transform or avoiding having a really long paragraph.
(In reply to comment #19)
> You should be able to work around the bug by not using text-transform or
> avoiding having a really long paragraph.
> 

Thankyou. However I still believe this is a bug in Firefox, and workarounds should not be required when writing standards compliant pages. There's enough workarounds required for IE 6 already, I don't think requiring workarounds for Firefox is a good idea.
Yeah, it's definitely a bug in Firefox.  I didn't mean to imply otherwise.
(In reply to comment #21)
> Yeah, it's definitely a bug in Firefox.  I didn't mean to imply otherwise.
> 

Thanks again. Sorry if I had a go at you, the comments re bug/not bug were more directed at the earlier comments for this ticket.
Nobody is arguing that it's not a bug, anywhere that I can see.  People are discussing the severity of it, in the context of "all the bugs we could work on" and "whether to force a fix into the very late states of FF3".

If it weren't a bug, it would have been closed as INVALID or WONTFIX.
Sorry, I probably misunderstood the comments above a bit. I do hope this can be fixed before the 3.0 comes out. I'd look into it myself, but I'm unfamilier with the Mozilla codebase and my C++ skills are a bit rusty.
This isn't going to be fixed for 3.0, unfortunately.  3.0 is code-frozen at this point.
Blocks: 449046
Flags: blocking1.9.1?
Any chance this will be fixed for 3.1?  IE loads the testcase instantaneously, but Fx takes close to a minute.
Flags: blocking1.9.1? → wanted1.9.1+
I have a series of patches that fix this. One of them is my patch for bug 455826, which avoids reconstructing textruns in some situations when we're deleting empty continuations.
The biggest thing we need to do here is avoid rebuilding nsTransformedTextRuns too much. This patch does a few things:

1) Gut SetLineBreaks. Don't bother trying to transmit "actual line break" information to the underlying textruns. We currently don't use it for shaping so it's just a waste of time. If/when we get around to doing that, we'll need to resurrect this code and make it smarter to avoid full rebuilds.

2) Make SetCapitalization and SetPotentialLineBreaks not immediately do a rebuild, just set a flag so that after the linebreaker has done all its work, we call FinishSettingProperties to rebuild the nsTransformedTextRuns in one go.
Attachment #349153 - Flags: review?(smontagu)
Filed bug 465913 with a patch (somewhat riskier) that further reduces textrun reconstructions when empty nsContinuingTextFrames are deleted.
Depends on: 465913
Filed bug 465928 with a patch (also a bit risky) that reduces textrun reconstructions by avoiding SetInvalidateTextRun calls when removing empty frames from lines because a prev-in-flow is complete.

With these four patches combined, loading this testcase is pretty much instantaneous, and so is zooming in/out. Bug 449046 is also fixed.
Depends on: 465928
Whiteboard: [needs review]
In particular, loading the testcase results in exactly one call to nsCaseTransformTextRunFactory::RebuildTextRun for the one textrun that contains all the text. Ditto for zooming it.
Comment on attachment 349153 [details] [diff] [review]
avoid nsTransformedTextRun rebuilds

>-  virtual void SetCapitalization(PRUint32 aStart, PRUint32 aLength,
>-                                 PRPackedBool* aCapitalization,
>-                                 gfxContext* aRefContext);
>+  void SetCapitalization(PRUint32 aStart, PRUint32 aLength,
>+                         PRPackedBool* aCapitalization,
>+                         gfxContext* aRefContext);

Why this change?
Attachment #349153 - Flags: review?(smontagu) → review+
There's no need for it to be virtual. It's not defined anywhere else.
Whiteboard: [needs review] → [needs landing]
Upgrading to blocking since it blocks a blocking bug.
Flags: wanted1.9.1+ → blocking1.9.1+
Priority: -- → P2
Attached file valgrind warning
I believe this bug's patch was the one responsible for the reliable Linux pageload test crashes and intermittent cross-platform mochitest failures that caused the hg bundle containing this patch to get backed out.

Here's a valgrind warning I got when loading http://www.cnn.com/ using the revision that was crashy.  I believe this warning implicates this patch as the problematic one; I've thus just relanded the other four.
The line numbers in the warning above are from a tree with a clean build of
http://hg.mozilla.org/mozilla-central/rev/026147c91538 .
Whiteboard: [needs landing] → [needs new patch]
Attached patch new patch (obsolete) — Splinter Review
The problem was with textruns that we create for a short time just to work with the linebreaker, as controlled by mSkipIncompleteTextRuns. These textruns don't get attached to frames. We were deleting these textruns too early; we can't delete them until the linebreaker has been fully flushed and thus doesn't have a reference to a break sink that has a reference to one of these textruns.

This patch fixes that by deferring deletion of these temporary textruns until we've finished clearing out the break sinks. It also adds some code to tweak textrun flags on destruction so that a dead textrun will be detected via an assertion in BreakSink::Finish. Setting all possible flags also sends us down the transformed-textrun path so we crash hard if we have a dead textrun there.
Attachment #349153 - Attachment is obsolete: true
Attachment #350908 - Flags: review?(smontagu)
Whiteboard: [needs new patch] → [needs review]
Attachment #350908 - Flags: review?(smontagu) → review+
Bug 467150 still has to land before this is 100% fixed, but for now, I'm closing it since the major patch here has landed on trunk.
Status: NEW → RESOLVED
Closed: 11 years ago
Depends on: 467150
Resolution: --- → FIXED
Whiteboard: [needs review] → [needs 191 landing]
Backed out again. This time it was a mochitest crash and reftest crash on all platforms. E.g.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1229074833.1229077873.25966.gz#err0
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [needs 191 landing] → [needs new patch]
Builds with this seem to be extremely crash-happy.  It might be advisable to re-spin the nightlies.
The crashes can be stopped by turning 'Off'  JIT.content in about:config
(In reply to comment #42)
> The crashes can be stopped by turning 'Off'  JIT.content in about:config

The JIT setting have no impact on the crashes I am seeing.

Both my webmail and http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/ crash for me with both JIT settings set to false.
(In reply to comment #43)
> (In reply to comment #42)
> > The crashes can be stopped by turning 'Off'  JIT.content in about:config
> 
> The JIT setting have no impact on the crashes I am seeing.
> 
> Both my webmail and http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/
> crash for me with both JIT settings set to false.

I guess I lied.  I only see the crash on the nightly if I load it form stage.mozilla.org.  I have not been able to duplicate this on the mirror sites.

In any event visiting http://stage.mozilla.org/pub/mozilla.org/firefox/nightly/ with both JIT options set to false produced 

http://crash-stats.mozilla.com/report/index/00842cfe-be7b-49a4-bb5d-8b7172081212?p=1
My crash with squirellmail is this:

http://crash-stats.mozilla.com/report/index/a8c64d49-7296-4bf5-84ea-4e6362081212?p=1

which looks completely different
There are JIT-related crashers in the nightly that I think are a completely different bug (or set of bugs). The patch here definitely caused crashes of its own --- backing it out fixed Tinderbox.
the nightly I woke up to today was crashing every 5 minutes on this, specifically 

BuildTextRunsScanner::FlushFrames  	 layout/generic/nsTextFrameThebes.cpp:1183

I just got a nightly stability update, and the problem seems to have gone away.
Attached patch fix v4 (obsolete) — Splinter Review
I'm dumb. I was missing a mTextRunsToDelete.Clear() after we've iterated through mTextRunsToDelete deleting all the textruns.
Attachment #350908 - Attachment is obsolete: true
Attachment #353333 - Flags: review?(smontagu)
Whiteboard: [needs new patch] → [needs review]
Attachment #353333 - Flags: review?(smontagu) → review+
Whiteboard: [needs review] → [needs landing]
Pushed dcd1ad7a918e
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Whiteboard: [needs landing] → [needs 191 landing]
Backed out again due to test failure on "talos mozilla-central fast qm-plinux-fast03" --- crash in Tp.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1229565751.1229568451.18590.gz
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [needs 191 landing] → [needs new patch]
The crash was here:

NOISE: Cycle 2: loaded http://localhost/page_load_test/pages/www.aol.com/index.html (next: http://localhost/page_load_test/pages/www.apple.com/index.html)
Failed tp: 
		Stopped Wed, 17 Dec 2008 18:49:43
FAIL: Busted: tp
FAIL: browser crash
Attached patch fix v5Splinter Review
OK, so the problem was that some textruns were not being removed from the cache in the loop over mTextRunsToDelete. In fact we were sometimes destroying the scanner while mTextRunsToDelete was non-empty, so I added some assertions in the destructor that mTextRunsToDelete and the other arrays should be empty.

That was happening because BuildTextRuns sometimes exits without calling FlushFrames(PR_TRUE, ...) --- in particular, when "if (linesAfterStartLine >= NUM_LINES_TO_BUILD_TEXT_RUNS && scanner.CanStopOnThisLine())". In that case we intend to exit without building the final textrun, but we still need to do the flush-line-breaker cleanup. So I moved that cleanup to a separate function so we can call it from there.

I also added a call to SetUserData(nsnull) on the textruns that go into the mTextRunsToDelete array. This isn't strictly necessary but it's a good idea to avoid dangling pointers. This would have masked this crash, but the assertions I've added should detect similar problems in the future.
Attachment #353333 - Attachment is obsolete: true
Attachment #354641 - Flags: review?
Attachment #354641 - Flags: review+ → review?(smontagu)
I reran reftests, crashtests and the Tp2 page cycler on Mac and things look fine --- no crashes, no failures, and the new assertions didn't fire.
Whiteboard: [needs new patch] → [needs review]
Attachment #354641 - Flags: review?(smontagu) → review+
Whiteboard: [needs review] → [needs landing]
Pushed b6a4a9673190
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Whiteboard: [needs landing]
Whiteboard: [needs 191 landing]
Depends on: 472909
Regarding this being "[needs 191 landing]" -- this actually can't land yet.  Justification:
 - bug 465913's patch needs to land before this does [bug 465913 comment 3]
 - bug 465913's patch can't land until bug 470272 is fixed [bug 465913 comment 8]
 - bug 470272 isn't yet fixed.

So, we need to fix bug 470272 before this can land. (assuming my reasoning above is correct)
This bug and its related blocker bugs are all performance problems and not very common, so I'm going to take them off the blocker list.
Flags: blocking1.9.1+ → blocking1.9.1-
Whiteboard: [needs 191 landing]
Actually, I take that back. This patch *can* and should land even if we don't land the fix for bug 465913. It does help with performance on this testcase, even without bug 465913 fixed. (We do need the fix for bug 472909 of course.)
Flags: blocking1.9.1- → blocking1.9.1+
Whiteboard: [needs 191 landing]
I'm glad this will land, because it isn't just a perf problem; on my own machine, it's been a 30 second total browser hang (incl. chrome etc.). Couldn't a nefarious piece of JS do an update, wait 30 seconds or so, then do another update, to cause a continuous series of hangs? Would the "Script is taking a long time" dialog even come up in that case? If it's exploitable in this way, I think it's very important to fix.
(In reply to comment #58)
> Actually, I take that back. This patch *can* and should land 

Ok. Pushed to 1.9.1 branch:
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/5fd8f2c2303b

I landed the fix for bug 472909 in a separate changeset, in the same push.
Keywords: fixed1.9.1
Whiteboard: [needs 191 landing]
Verified on trunk and 1.9.1 with:

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090201 Minefield/3.2a1pre ID:20090201020604

Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2a1pre) Gecko/20090131 Minefield/3.2a1pre ID:20090131034630

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b3pre) Gecko/20090201 Shiretoko/3.1b3pre ID:20090201020520

Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1b3pre) Gecko/20090201 Shiretoko/3.1b3pre ID:20090201033606
Status: RESOLVED → VERIFIED
Target Milestone: --- → mozilla1.9.2a1
You need to log in before you can comment on or make changes to this bug.