Closed Bug 84645 Opened 24 years ago Closed 23 years ago

Lockup on DOM processing with large TD widths

Categories

(Core :: Layout, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla0.9.5

People

(Reporter: swh, Assigned: waterson)

References

()

Details

(Keywords: hang, Whiteboard: QAHP)

Attachments

(18 files, 1 obsolete file)

12.76 KB, text/plain
Details
13.11 KB, text/plain
Details
2.28 KB, text/rdf
Details
1.04 KB, patch
Details | Diff | Splinter Review
3.60 KB, text/plain
Details
17.55 KB, text/html
Details
20.61 KB, text/html
Details
1.20 KB, text/html
Details
853 bytes, patch
Details | Diff | Splinter Review
50.43 KB, patch
Details | Diff | Splinter Review
50.44 KB, patch
Details | Diff | Splinter Review
128.85 KB, patch
Details | Diff | Splinter Review
142.53 KB, patch
Details | Diff | Splinter Review
145.67 KB, patch
Details | Diff | Splinter Review
107.39 KB, patch
Details | Diff | Splinter Review
138.03 KB, patch
Details | Diff | Splinter Review
138.81 KB, patch
Details | Diff | Splinter Review
100.36 KB, patch
dbaron
: review+
Details | Diff | Splinter Review
From Bugzilla Helper: User-Agent: Mozilla/5.0 (X11; U; Linux 2.4.2-2smp i686; en-US; rv:0.9.1) Gecko/20010607 BuildID: 2001060713 When the DOM is modified in a certain way on the example page the browser goes into a tight loop and stops responding to mouse events etc. Reducing the WIDTH attribute of the TD element on line 148 removes this problem. eg. http://inanna.ecs.soton.ac.uk/~swh/moz-java-nocrash.html Unfortunatly I'm having problems reporducing the lockup outside of my software, so it requires some installation to reproduce. Reproducible: Always Steps to Reproduce: 1. Go to http://cohse.semanticweb.org/downloads/xul-dls/ follow instructions and install 2. Open the sidebar and select the DLS tab. 3. Go to http://inanna.ecs.soton.ac.uk/~swh/moz-java-crash.html 4. Select "Settings | Generic Links" in the sidebar 5. Press reload Actual Results: Several debug messages will go out on stdout, ending with "Added N links" (probably 6), the browser will then go into a tight loop. Expected Results: After the "Added N links" message the browser should be idle and there will be N small L icons in the document. I have been developing this software for some time and this is the only page I have experienced this bug on. Unfortunatly it is critical for me as I am developing an Open Hypermedia system for java.sun.com and many important pages on the site trigger this bug.
What is mozilla doing when it locks up, can you stop in a debugger and see what it's up to?
Attached file Stack trace
Attached file Stack trace
I tried stopping mozilla twice when it locks up, and I got to different traces, though both in the reflow code. Stack traces attached.
The attached stacktraces show that mozilla is off in layout land when it "locks up", reassigning to layout.
Assignee: jst → karnaze
Component: DOM Core → Layout
QA Contact: desale → petersen
Reassigning to ftang.
Assignee: karnaze → ftang
Confirming in June 13th build.
Status: UNCONFIRMED → NEW
Ever confirmed: true
The stack trace look very very similar to 85487. reassign this to jbetak. mark it moz0.9.2 and P1
Assignee: ftang → jbetak
Priority: -- → P1
Target Milestone: --- → mozilla0.9.2
I can't mark dependency on bug 85487 just yet - there seems to be a bit of a trouble with DLS sidebar installation in my private build...
Might have been a server problem. Did you get the DLS sidebar header and a spinner on white? If so, leave this for up to 30 seconds or so (incase the ontology server needs to do a cold start). However the instalation does use a hack, the chrome:// url, as there is no way to add yourself to the sidebar from install.js, so it might not work 100% If there is anyhting I can do to help, please ask. Attaching working panels.rdf
but hang as the keyword
Keywords: hang
can we reproduce this one now?
Not yet - I'll try again.
Frank, I used swh's panels.rdf to bypass the sidebar panel installation and it still locks up (it doesn't crash) - even with our new patch for bug 85487. I'll investigate some more.
Status: NEW → ASSIGNED
OK, it seems that the while loop in nsTextFrame.cpp never ends. The line below keeps returning the same frame. Maybe a null check ((nsnull != aNextFrame)) is not enough, how about making sure we don't end up in an endless loop by comparing the new frame to the old one? 5350 // Move on to the next frame in the text-run 5351 aNextFrame = aLineLayout.FindNextText(aPresContext, aNextFrame);
this seems to fix the lock-up. Good idea, bad idea?
uhm, I'm just looking at nsLineLayout::FindNextText to see, whether its frame walking logic is kosher. Seems like it can return the same frame we pass in instead of the next adjacent frame. ccing ftang, hyatt and waterson
Can you construct a frame hierarchy where FindNextText() will return the same frame?
waterson: I'll follow-up with swh and try to simplify the current test case. At this point it involves his sidebar panel, which is kinda nasty. Stephen: thanks for suypplying the panels.rdf - that did the trick for me. Chris Waterson might help us to figure out, why the layout code is indefinitely looping through one and the same frame, when using your sidebar panel. While perusing through your code in dls.jar, I'm trying to understand, what exactly happens, when reloading the test page with "Generic Links" turned on. I'm trying to see, how we could simplify this into a lightweight test case and possibly attach itin the form of a zip archive to the bug.
waterson, per ftang's suggestion, I'm sending this over to you. The bug is reproducible per Stephen Harris's instructions and I can help to set his panel up, if needed. I'm also actively working on the simplification of this test scenario, so please bear with us.
Assignee: jbetak → waterson
Status: ASSIGNED → NEW
since this is a separate issue from bug 85487, changig target to 0.9.3
Target Milestone: mozilla0.9.2 → mozilla0.9.3
Sorry, yes, the code is very complicated, there is lots of extra stuff that isn't stricly neccesary to this bug. I tried writing a simple test case but couldn't reproduce it. It seems to be very sensitive to the size of the reinserted text relative to the size of the removed text. I will try a closer simulation. The algorithm is: decend DOM tree if text node foreach generic link (word or phrase) store end position(s) in text node foreach reverse sorted postion split text node, insert [L] icon PS Any ideas about what causes the panels.rdf failure? Is it likly to cause me problems in releases?
OK, I have a much simplified, non-sidebar testcase: http://inanna.ecs.soton.ac.uk/~swh/mozbugs/generic-link.html Clicking on the html links loads the apropriate page into the frame, clicking "Apply Links" tries to add links to the document. The source is in http://inanna.ecs.soton.ac.uk/~swh/mozbugs/generic-link.js and downloadable versions are in http://inanna.ecs.soton.ac.uk/~swh/mozbugs/ I haven't run a dubugger but the symptoms seem the same (100% cpu use on one cpu), browser locked up.
adding myself to the cc list
Stephen, thanks for your follow-up - this is great help! I'm still working on a simple one-page text case, which I could attach here. Please disregard my attempts so far, it's not quite finished.
OK, there is a one-page version at http://inanna.ecs.soton.ac.uk/~swh/mozbugs/one-page.html [attached], but the HTML is not minimal.
Status: NEW → ASSIGNED
Attached file minimized test case
So what's happening here is that we're getting a ContentInserted() that ends up being tageted at an inline frame in a continued frame; i.e., something like this: td< line< inline(font)@A next=B< text< " " > > > line< inline(font)@A prev=B< inline(a)< text< blah > > > > > In this heirarchy, if we try to insert something into the <font> after the <a>, we'll end up screwing up the frame heirarchy because parentFrame will point to inline(font)@A! From reading the code, it looks like someone put the comments here to make the right thing happen, but forgot to write the code? (I'll do a bit of archaeology to see what happened...)
Keywords: patch
Ok, looks like I'll be reverting some of the changes that hyatt made in r1.563 of nsCSSFrameConstructor.cpp (which it looks like I sr='d for bug 69142). hyatt, got any complaints?
Wait. Do we know why my changes caused a regression?
hyatt: I think my comments on 2001-06-26 16:16 explain why this regression was caused. The code that I want to ``put back'' (actually, I wrote the code before I realized I was simply putting back code you took out) makes sure that |parentFrame| is properly set if we end up in a continued frame. In the case that we're inserting before/after a frame that's in a continuation, we need |parentFrame| to be the geometric parent, not the primary frame.
FWIW, I don't think my changes will undo the Mac menu goodness. It seems like the primary point of that patch was to avoid grovelling around for a parent if the frame had no parent to start with. (In retrospect, perhaps this particular bit of code got nuked because we collectively managed to forget about continuing frames, and just thought it was extraneous?)
Well, if the patch doesn't regress bug 69142, I say [s]r=attinasi (if you want me to check the Mac lemme know...)
r=rbs (marc, please go ahead and check the mac so that we can all sleep well -- you know how tricky and sneaky layout is :-)
Yeah, I think it'll be a bit difficult to notice if this regresses the Mac; cc'ing pinkerton who discovered bug 69142 in the first place...
I just talked with hyatt about this and he is okay with it too. Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
This should probably be considered for the NS6.1 branch.
Keywords: nsBranch
we love crash fixes on the branch... reopening to get this one there.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: PDT+
Fixed on branch, too, r1.602.2.2
Status: REOPENED → RESOLVED
Closed: 24 years ago24 years ago
Resolution: --- → FIXED
The patch for this bug caused a regression resulting in mostfreq bug 88073. Reopening.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Frames does not work.After working with moz 2001071103 and the fix of bug 88073, and the reopening of 84645, the /Debug/../frames does not work. Looks like a resize problem. Some URL´s does not display frames...
alberto, have a test url? thanks.
Is this fixed on the branch? What's left to do? (We're running out of time!)
I don't currently have a good patch for this bug. I've removed the ``patch'' keyword, but am actively working on this.
Status: REOPENED → ASSIGNED
Keywords: patch
OS: Linux → All
Hardware: PC → All
I'm going to remove the pdt+ from this bug because no fix in hand. If we come up with something interesting and low risk send mail to pdt2@netscape.com for a look at getting on the branch.
Whiteboard: PDT+
*** Bug 90521 has been marked as a duplicate of this bug. ***
Target Milestone: mozilla0.9.3 → mozilla0.9.4
Attached patch fix-in-progressSplinter Review
I am somewhat ashamed of the above patch, but I thought I'd attach it for posterity's sake. The basic idea is to remove the XBL insertion point logic from the frame manager, and to fold it in with the CSS frame constructor's logic so that insertion points and continuation frames can exist together peacefully. I moved the |GetInsertionPoint()| method from nsIFrameManager to nsIStyleSet for consumers (like menus) that need to ask about insertion points. The primary problem with the patch is the horrific way in which I've integrated insertion points into the frame constructor. I've missed several critical code paths, and really only have enough working to bring up the current Seamonkey chrome. It'll take a bit more time to fold it in with the frame constructor ``nicely''. Also, some debugging code sprinkled liberally throughout.
(Sigh, don't apply the above patch at home. It won't compile.)
Was the previous patch ever really shown to be in error? N.B. I am using a nightly from before that patch was backed off, and it is only when using the dropdown triangle icon that the url bar expands, if I use the cursor down key it works as expected.
Hatred. Seething hatred. I've regressed the URL bar (wrong style context parent), and this is still uglier than an elephant's ass.
Target Milestone: mozilla0.9.4 → mozilla0.9.5
Okay, so I think that the last patch is pretty much working. (Famous last words, I know.) Anyway, I'm at a point where I'd like to solicit a bit of feedback to see if I'm off in the weeds. Here's a summary of what's I've done. 1. Removed the insertion point fixup out of the frame manager and into the frame constructor, where it can cooperate with the other frame fixup logic, most notably that for continuations. See (5), below. 2. Doing that required some API shuffling; specifically, moving GetInsertionPoint() from nsIFrameManager to nsIStyleSet, so that the few frames that need to get at it still can. 3. Added some APIs over in XBL so that we could ``sniff'' for XBL children without worrying about faulting in the DOM slots on content nodes that don't have them. 4. In the frame constructor, I rewrote hyatt's ChildIterator to take advantage of some new APIs that I added, so now uses the binding manager to ``peek'' at the content: if it has XBL children, then the iterator will grab a DOMNodeList and iterate those; otherwise, it'll just use the nsIContent interface. 5. Re-wrote Find[Previous|Next]Sibling to use the ChildIterator so that it can be XBL-savvy. Butchered Content(Inserted|Appended|Removed) to use the XBL insertion point (rather than the parent frame) as the place to insert and remove frames. 6. Misc cleanup in the frame constructor; e.g., making routines |static| and moving the generated content iterator out into its own file. Part (5) is where the work-in-progress is still happening. I need to spend some time figuring out where the original parent frame is appropriate, and where the XBL insertion point is appropriate. Then, I need to try to do get this all to work without copying code all over hell. Does this seem reasonable? (hyatt, especially?)
Had a look at how things are coming along, and they look okay in spirit. Do you mind throwing one of your helpful analyses here following any further understanding you might have had -- although it takes a cup of coffee to understand some of these layout issues that are hapenning :-) In particular, it would help to share some follow-up info that you might have had as to how exactly the XBL insertion logic led to the core issue you described in "Additional Comments From Chris Waterson 2001-07-10 15:37" in bug 88073.
Attached patch working patchSplinter Review
Ok, I'll resummarize what's going on here (I have a hard time remembering myself). Way back when, hyatt fixed a performance problem with menus on Mac (bug 69142). He removed what we thought to be silly code that was grovelling around the frame tree trying to find a parent frame. If you don't know much about continuing frames (which hyatt, attinasi, and I didn't at the time), this code did seem silly, since you've got the parent frame already. Specifically, what happens here is that we find the frame that will be the previous sibling to the new frames we're about to create and insert. The code that hyatt removed reset the parent pointer (which is used to parent the new frames) from the previous sibling's parent frame. This is important in the case of continuing frames; for example, if we have this content model: <td><font> <a href="">blah</a></td> and the <td> is constrained such that ``blah'' will be forced to its own line, we end up with a frame model like this: td< line< inline(font)@A next=(nil) nextInFlow=B< text@T1<" "> next=C > > line< inline(font)@B next=(nil) prevInFlow=A< inline(a)@C next=(nil) < text@T2 next=(nil)<"blah"> > > > > Now, let's say we try to insert content into the <font> after the <a> tag. ContentInserted() looks for <a>'s parent in the content model, and then calls GPFF() to find the frame labelled `A' (above) as the parent frame. The frame creation code proceeds to insert frames beneath `A', when in fact, they should be inserted beneath A's continuing frame, `B'! In other words, we get something like: td< line< inline(font)@A next=(nil) nextInFlow=B< text@T1 next=C<" "> text@T3 next=(nil)<"new!"> !!!BAD!!! > > line< inline(font)@B next=(nil) prevInFlow=A< inline(a)@C next=T3 < text@T2 next=(nil)<"blah"> > > > > Instead of: td< line< inline(font)@A next=(nil) nextInFlow=B< text@T1<" "> next=C > > line< inline(font)@B next=(nil) prevInFlow=A< inline(a)@C next=T3 < text@T2 next=(nil)<"blah"> > text@T3 next=(nil) !!!GOOD!!! > > > This leads to crashes and infinite loops because we've screwed up the frame model. In my first attempt to fix the bug, I simply put the parent pointer fixup code back. This caused a pretty serious regression, bug 88073, which led to my changes being backed out. There is a detailed analysis of what the problem was (see my comments in that bug on 2001-07-10 15:37); however, I can summarize here. The <menupopup> code upon which the URL bar is based uses XBL insertion points, which ``fix up'' the parent frame pointer under which new frames should be inserted. The logic that worked so well for continuing frames happens to royally screw up the otherwise pristine parent pointer which the XBL code expected to receive, and frames end up in bizarre locations _again_! At this point, after talking with dbaron, hixie, and hyatt, I realized that XBL insertion points and continuing frames simply /could not/ play nice together unless the logic for both was handled in the frame constructor, which is what this last patch does. The fix itself is pretty straight-forward. Instead of doing the XBL insertion point fixup ``after the fact'' in the frame manager, we now do it ``up front'' in the frame constructor, before we've even created the new frames. Once we've got the parent frame for insertion using GPFF(), we immediately use the XBL-savvy GetInsertionPoint() method to fix up the parent pointer. This is done in ContentAppended(), ContentInserted(), and ContentRemoved(). Granted, the patch is a bit more daunting than that. Here's why. 1. GetInsertionPoint() is actually a public API on frame manager, used by the menu code, for example. This logic was moved into the frame constructor, and required updating nsIStyleSet and nsICSSFrameConstructor interfaces, as well as modifying the client logic that used it. 2. Removed a method from nsIBindingManager that didn't need to be there, HasContentListFor(). Also doc-commented several of the methods on this interface. In CSS frame constructor, I went a bit overboard (perhaps) and 3. Moved the nsGenericedContentIterator object into its own file. 4. Cleaned up duplicate logic between the four routines that grovel around the content model to find sibling frames, FindPreviousSibling(), FindNextSibling(), FindPreviousAnonymousSibling(), and FindNextAnonymousSibling(). Specifically, I moved a large portion of this logic into hyatt's ChildIterator class, and made it so that the iterator itself could go forwards, backwards, and provide random access into the child list. 5. Hid file-local methods by qualifying them as ``static''. 6. Re-grouped related functions together. 7. Replaced naked QI() calls with CallQueryInterface(). 8. Removed dead code. 9. nsCOMPtr'ized. 10. Cleaned up kipp's overly nested if/else constructs, using early return instead; notably in ContentInserted() and ContentAppended(). I could try to re-do the patch to minimize the cleanup if people are averse to having cleanup mixed with a bug fix. (I know I am, on principle.) However, seeing as that would probably take me several hours to do, I'm loathe to do it unless general consensus is that I should.
Keywords: patch
*** Bug 98544 has been marked as a duplicate of this bug. ***
Keywords: nsbeta1
Whiteboard: QAHP
sr=hyatt. just test the hell out of this please. ;)
A few comments (I haven't read the whole thing yet): The stuff in ChildIterator that returns nsCOMPtrs is evil -- but that can be fixed later (I think). See the nsCOMPtr guide. (Return already_AddRefed<T> if you want, but never nsCOMPtr<T>.) Or perhaps not, since it might cause a lot more refcounting than you need... I'm confused about GetContentListFor and HasContentListFor. The comment in nsIBindingManager says the latter gets real children, but the comment in the function says it gets anonymous children. It looks more like GetContentListFor returns anonymous children if there are any, and otherwise real children, which seems weird (although I could be misreading it). Also, HasContentListFor doesn't seem equivalent to GetContentListFor returning non-null, so I'm not really sure what the changes in nsGenericElement.cpp are doing. Or is the latter check in GetContentListFor just paranoia? (If so, there should be an NS_ASSERTION.) At the very least, I think some comments need to be corrected. I'll read through the rest after dinner, although really I should just be happy with it if hyatt likes it. :-)
Actually, hold up. I want to look at that in more detail.
WRT returning nsCOMPtr's, I'll clean that up. IIRC, I _thought_ I could remove HasContentListFor() (which I put in with the patch for bug 92190, adding debug output that lists anonymous content), because it was not needed. Let me double-check that: perhaps having that check avoids extremely verbose output. hyatt, please do help me, esp. with the commentary in nsIBindingManager. I'm adrift when it comes to the delicacies of anonymous content semantics.
It turns out that HasContentFor() is necessary to keep the debug output terse, so I've reverted that change. Hyatt helped me clean up the comments in nsIBindingManager, as well as some of the stuff in nsCSSFrameConstructor.cpp.
Still need to clean up nsCOMPtr returns though...patch anon.
I'm going to have to attach one more version of this patch. It turns out there is already an nsGeneratedContentIterator living in the content .so; my content iterator changes (moving it into its own file) collide with this causing the static build to fail. I've renamed the version in layout to nsFrameContentIterator, as it iterates content by navigating the frame tree.
Attachment #48626 - Attachment is obsolete: true
Looking at the new code cleaning up the nsCOMPtr returns -- I think it will leak if operator* and operator-> are used -- are they? I suspect they wouldn't work since you'd need a .get() on the already_AddRefed<nsIContent>. Can they just be removed? Other than that, I'm happy with it if hyatt is, although I don't understand the details of insertion points.
Comment on attachment 48889 [details] [diff] [review] as above, but diff -w ok, r=dbaron if you remove the unused (and non-functional, I hope) operator->, and if hyatt does the super-review
Attachment #48889 - Flags: review+
hyatt gave me verbal sr= on this.
sr=hyatt
Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago23 years ago
Resolution: --- → FIXED
can this fix have caused bug 99443?
*** Bug 90528 has been marked as a duplicate of this bug. ***
*** Bug 97636 has been marked as a duplicate of this bug. ***
*** Bug 94217 has been marked as a duplicate of this bug. ***
*** Bug 101969 has been marked as a duplicate of this bug. ***
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: