Closed Bug 1014252 Opened 6 years ago Closed 6 years ago

Firefox 28+ hangs on a specific combination of DOM manipulation

Categories

(Core :: Layout: Text and Fonts, defect)

29 Branch
x86_64
Linux
defect
Not set

Tracking

()

VERIFIED FIXED
mozilla32
Tracking Status
firefox31 + verified
firefox32 --- verified

People

(Reporter: martinez.novo+bugzilla, Assigned: roc)

References

Details

Attachments

(2 files, 1 obsolete file)

Attached file FirefoxHangBug.zip
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:29.0) Gecko/20100101 Firefox/29.0 (Beta/Release)
Build ID: 2014042500

Steps to reproduce:

On a web application using ASP.Net developed in our company, I've seen a strange combination that causes Firefox to hang.

Our application, when the user clicks on a button, ti shows a in-browser-popup window (not a window, but an absolute positioned DOM element inside the page that looks like it) with a textarea filled with text. That "popup" is generated on an Ajax call. But the ajax is not really relevant here, since apparently just the JavaScript that makes the DOM element look like a "popup" is what makes Firefox to hang.

There's a test case attached. I've removed all the sensitive information and left with the minimal requirements to work, just some scripts that are for core ASP.Net JS and AjaxControlToolkit, all in debug mode (non-minimized) required to reproduce the bug.

Hit the button, it should make the DOM element that's at the end of the page to be positioned at the center of the screen, but on Firefox 29, and 28 it causes Firefox to hang indefinitely, while on Firefox 27 it works instantaneously, so it seems to be a regression since 28. This happens in my machine running Windows 7, which is a bit slow. There, it doesn't show the dialog stating that there's a busy script, with the option to stop the script or continue. Just hangs forever.

Now I'm testing on 1.29 in linux, on a very fast machine, and it hangs during 10 seconds until it displays the dialog about unresponsive script. If I hit "continue", it ends the execution and leaving the DOM as if it never hanged.

Notable things:
- If I reduce the amount of text in the textarea (it's near the end of the page) before hitting the button, it doesn't hang.
- With the JavaScript debugger, adding some breakpoints in the _layout function of ModalPopup.ModalPopupBehavior.debug.js (around line 495), so it stops on them, and resuming the execution, doesn't cause the hang. Makes me think of some race condition when manipulating CSS properties.

Analyzing that part of code, I don't see any loop that could cause an infinite loop or something like this.


Actual results:

Firefox hangs (on a slow machine without displaying the dialog about busy script)


Expected results:

Script should execute without hanging the browser, since it doesn't seem to be an expensive operation or long-running task
Component: Untriaged → DOM
Product: Firefox → Core
All the time is spent creating textruns during a reflow triggered by getting .offsetHeight.
Component: DOM → Layout: Text
Status: UNCONFIRMED → NEW
Ever confirmed: true
The first bad revision is:
changeset:   153326:046d7a70ef24
user:        Robert O'Callahan <robert@ocallahan.org>
date:        Mon Sep 09 17:08:41 2013 -0700
summary:     Bug 745485. Optimize positioning offset changes whenever the computed size does not change. r=dholbert
Flags: needinfo?(roc)
Attachment #8426550 - Attachment mime type: application/x-zip-compressed → application/java-archive
Attached patch fix (obsolete) — Splinter Review
This works for now and should fix all platforms.

I wonder how to handle this in a future world where we only have Moz2D. I think we probably need to expose a method DrawTarget::ShouldSnapToDevicePixels which tells the caller whether to use snapping or not.
Assignee: nobody → roc
Attachment #8427646 - Flags: review?(matt.woodrow)
Comment on attachment 8427646 [details] [diff] [review]
fix

Sorry, attached to the wrong bug.
Attachment #8427646 - Attachment is obsolete: true
Attachment #8427646 - Flags: review?(matt.woodrow)
We lay out the element once, which works fine, and then something changes (CSS or DOM, I'm not sure what) and we lay it out again, apparently with more width. The element has a 300K chunk of text with no newlines. which has been broken into lots of textframes in lots of lines. The second reflow is incredibly slow because the block frame is dirty, so we have to reflow all lines, and we recreate the (single) textrun every few lines. So, why are we doing that?

After reflowing line N, in nsTextFrame::SetLength we often see that the frame for line N+1 is no longer needed because all the text in line N+1 got pulled into line N (because the element is getting wider, I guess). So we end up calling RemoveInFlows(text-frame-N+1, text-frame-N+2). Which calls text-frame-N+1->ClearTextRuns(), which blows away the textrun being used by all the lines :-(.

I don't know what this has to do with the fix for bug 745485, but it removed some reflows which may help the frame tree get into the state it's now in.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #7)
> ...Which calls
> text-frame-N+1->ClearTextRuns(), which blows away the textrun being used by
> all the lines :-(.

That seems like a (separate) bug to me.  Several of the call sites seems to
assume 'frame->ClearTextRuns()' will clear the text run only for 'frame' and
its next-continuations, not all frames using the same text run.
The documentation even says so:
"Clears out |mTextRun| ... from all frames that hold a
reference to it, starting at |aStartContinuation|, or if it's
nullptr, starting at |this|."
http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsTextFrame.h?rev=edfbdecd9e82#478

That said, we apparently have some consumer(s) that depend on the current
behavior:
https://tbpl.mozilla.org/?tree=Try&rev=5a4eb0c6f723

We should probably sort this out in a separate bug though.
Comment on attachment 8428490 [details] [diff] [review]
1014252-1

>+static bool
>+TextFrameIsMentionedInTextRunUserData(nsTextFrame* aFrame)

I think this would be better as a method in nsTextFrame.h:
  bool IsInTextRunUserData() const { ... }

(dropping "Mentioned" since it just makes me puzzled what that
means in terms of code. It also results in a name similar to
the frame bit name: TEXT_IN_TEXTRUN_USER_DATA)


>+  NS_ASSERTION(!TextFrameIsMentionedInTextRunUserData(this),

I'd prefer MOZ_ASSERT to make sure it's noticed.
Attachment #8428490 - Flags: review?(matspal) → review+
I think we should land this on Aurora. It's kind of a bad performance problem/regression.
https://hg.mozilla.org/mozilla-central/rev/f7d67e4e0aa6
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
Comment on attachment 8428490 [details] [diff] [review]
1014252-1

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Some combination of 745485 and 791601
User impact if declined: Breakage on a few sites, slowdowns on other sites
Testing completed (on m-c, etc.): none, though we have a lot of automated test coverage in this area
Risk to taking this patch (and alternatives if risky): patches in this area are usually risky. Hence requesting only Aurora approval
String or IDL/UUID changes made by this patch: none
Attachment #8428490 - Flags: approval-mozilla-aurora?
Comment on attachment 8428490 [details] [diff] [review]
1014252-1

Approved for Aurora.
Attachment #8428490 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0

Reproduced the hang and then the unresponsive script dialog on Firefox 29 .

Using Firefox 31 beta 4 and latest Aurora 32.0a2 20140623004002 the DOM element is instantly placed at the center of the screen and can be scrolled, no hangs or other issues. Marking as verified.
Status: RESOLVED → VERIFIED
Keywords: verifyme
QA Contact: petruta.rasa
Thanks for the quick fix!
You need to log in before you can comment on or make changes to this bug.