Closed Bug 724235 Opened 13 years ago Closed 10 years ago

crash in nsStyleSet::ReparentStyleContext

Categories

(Core :: XBL, defect)

17 Branch
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla36

People

(Reporter: cbook, Assigned: MatsPalmgren_bugz)

References

()

Details

(Keywords: crash, regression, reproducible)

Crash Data

Attachments

(5 files, 1 obsolete file)

noticed on crashstats for Firefox 11a2 - Crash Report [@ nsStyleSet::ReparentStyleContext(nsStyleContext*, nsStyleContext*, mozilla::dom::Element*) ] overview report: https://crash-stats.mozilla.com/report/list?signature=nsStyleSet%3A%3AReparentStyleContext%28nsStyleContext%2A%2C+nsStyleContext%2A%2C+mozilla%3A%3Adom%3A%3AElement%2A%29 example crash report: https://crash-stats.mozilla.com/report/index/d0390db3-3011-4b1b-a65b-926ba2120203 Crashing Thread Frame Module Signature [Expand] Source 0 @0xe 1 xul.dll nsStyleSet::ReparentStyleContext layout/style/nsStyleSet.cpp:1454 2 xul.dll nsFrameManager::ReResolveStyleContext layout/base/nsFrameManager.cpp:1249 3 xul.dll nsFrameManager::ReResolveStyleContext layout/base/nsFrameManager.cpp:1598 4 xul.dll nsFrameManager::ReResolveStyleContext layout/base/nsFrameManager.cpp:1598 5 xul.dll nsFrameManager::ReResolveStyleContext layout/base/nsFrameManager.cpp:1598 6 xul.dll nsFrameManager::ReResolveStyleContext layout/base/nsFrameManager.cpp:1598 7 xul.dll nsFrameManager::ReResolveStyleContext layout/base/nsFrameManager.cpp:1598 8 xul.dll nsFrameManager::ReResolveStyleContext layout/base/nsFrameManager.cpp:1598 9 xul.dll nsFrameManager::ReResolveStyleContext layout/base/nsFrameManager.cpp:1598 10 xul.dll nsFrameManager::ComputeStyleChangeFor layout/base/nsFrameManager.cpp:1684 11 xul.dll mozilla::css::RestyleTracker::DoProcessRestyles layout/base/RestyleTracker.cpp:242 12 xul.dll PresShell::FlushPendingNotifications layout/base/nsPresShell.cpp:4073 13 xul.dll nsRefreshDriver::Notify layout/base/nsRefreshDriver.cpp:396 14 xul.dll nsTimerImpl::Fire xpcom/threads/nsTimerImpl.cpp:431 15 xul.dll nsTimerEvent::Run xpcom/threads/nsTimerImpl.cpp:524 16 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:660 17 nspr4.dll PR_Unlock nsprpub/pr/src/threads/combined/prulock.c:347 18 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:201 19 xul.dll _SEH_epilog4 20 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:175 21 xul.dll nsImageBoxFrame::OnStopDecode layout/xul/base/src/nsImageBoxFrame.cpp:608 22 xul.dll nsBaseAppShell::Run widget/src/xpwidgets/nsBaseAppShell.cpp:189 23 @0x2991d5f
Depends on: 698469
There's a spike in crashes (about 50 crashes per hour!) starting in 16.0a1/20120613. The regression range for the spike is: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=131961e5e0d1&tochange=964b11fea7f1
Crash Signature: [@ nsStyleSet::ReparentStyleContext(nsStyleContext*, nsStyleContext*, mozilla::dom::Element*) ] → [@ nsStyleSet::ReparentStyleContext(nsStyleContext*, nsStyleContext*, mozilla::dom::Element*) ] [@ nsStyleSet::ReparentStyleContext ]
Keywords: topcrash
OS: Windows 7 → All
Hardware: x86 → All
Summary: Crash Report [@ nsStyleSet::ReparentStyleContext(nsStyleContext*, nsStyleContext*, mozilla::dom::Element*) ] → crash in nsStyleSet::ReparentStyleContext
Steps to reproduce that work reliably for me: - Have NoScript installed, start with the default start page. - View -> Toolbars -> Addon bar. - Click the NoScript icon in the addon bar. Boom.
Might it be a fallout of bug 539356 or bug 736695?
Comment 1 through Comment 3 are a new regression in today's nightly, which just happens to crash in the function this bug was filed on. I filed bug 764591 on the new regression -- let's take the discussion over there.
Depends on: 764591
Keywords: topcrash
bp-98a866da-4ac3-4488-a7f7-309392120616 http://hg.mozilla.org/mozilla-central/rev/4e3362864fbd Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/16.0 Firefox/16.0a1 ID:20120616030538 STR 1. Start Nightly nightly with clean profile 2. Install NoScript https://addons.mozilla.org/ja/firefox/addon/noscript/ And Restart 3. Enable Add-on Bar 4. Open Customize Toolbar by right click toolbar 5. Drag and Drop NoScript Icon from Navigation Toolbar to Add-on Bar And Done 6. Mouse hover over the NoScript Icon on the Add-on Bar.
Comment 5 and Comment 2 are both bug 764591 (which should hopefully be fixed in tomorrow's nightly).
(In reply to Alice0775 White from comment #5) > bp-98a866da-4ac3-4488-a7f7-309392120616 > http://hg.mozilla.org/mozilla-central/rev/4e3362864fbd > Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/16.0 Firefox/16.0a1 > ID:20120616030538 > > STR > 1. Start Nightly nightly with clean profile > 2. Install NoScript https://addons.mozilla.org/ja/firefox/addon/noscript/ > And Restart > 3. Enable Add-on Bar > 4. Open Customize Toolbar by right click toolbar > 5. Drag and Drop NoScript Icon from Navigation Toolbar to Add-on Bar And Done > 6. Mouse hover over the NoScript Icon on the Add-on Bar. This is bug 765502
which itself is a dupe of bug 765502.
Removing topcrash as bug 764591 is fixed.
Keywords: topcrash
(In reply to Bob Clary [:bc:] from comment #12) > http://www.rozmari.blogfa.com/category/8 I doesn't crash for me.
(In reply to Scoobidiver from comment #13) > (In reply to Bob Clary [:bc:] from comment #12) > > http://www.rozmari.blogfa.com/category/8 > I doesn't crash for me. using? How long did you let it sit there? May take a little while.
(In reply to Bob Clary [:bc:] from comment #14) > using? How long did you let it sit there? May take a little while. On 64-bit Windows 7 with 22.0a1/20130228. Enough long.
hmm. pretty reproducible for me but I'm using vms. including the app notes from the crash reports. win61i64 (esxi vm) bp-5e6bd1e2-bedf-425d-9554-1cd2a2130301 AdapterVendorID: 0x0000, AdapterDeviceID: 0x0000, AdapterSubsysID: 00000000, AdapterDriverVersion: Has dual GPUs. GPU #2: AdapterVendorID2: 0x15ad, AdapterDeviceID2: 0x0405, AdapterSubsysID2: 040515ad, AdapterDriverVersion2: 7.14.1.1070D3 win61i32 (esxi vm) bp-ea496b2a-162c-467f-beb6-939972130301 AdapterVendorID: 0x0000, AdapterDeviceID: 0x0000, AdapterSubsysID: 00000000, AdapterDriverVersion: Has dual GPUs. GPU #2: AdapterVendorID2: 0x15ad, AdapterDeviceID2: 0x0405, AdapterSubsysID2: 040515ad, AdapterDriverVersion2: 7.14.1.1070D3
The following steps are 100% reproducible on Linux64: 1. load http://www.abhinethri.com/Actress/T/Taapsee_Pannu/Tapsee_01.html 2. click the "Photosession" link bp-93b3cab1-4b31-4cba-b6e5-deec92130528
Attached file stacks
It's also reproducible in a debug build -- there's an assertion before the crash (stack included): ###!!! ASSERTION: Unexpected document; this will lead to incorrect behavior!: 'aElement->GetCurrentDoc() == Document()', file layout/base/RestyleTracker.cpp, line 260
Regression window for the STR in comment 17: 2012-08-14-03-05-21 -- 2012-08-15-03-05-56 http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=22288130fea2&tochange=86ee4deea55b So I guess that might be a separate bug. Or possibly that on Linux there was a second change required to trigger the crash.
Keywords: regression
STR of comment 12: bp-bc163701-65bc-4c3c-857e-903cd2130528 STR of comment 17: bp-c5bb9de2-3215-4ced-b9c0-0ac012130528 They seem similar with mozilla::css::RestyleTracker::DoProcessRestyles in the stack trace.
Version: 11 Branch → 17 Branch
(In reply to Mats Palmgren [:mats] from comment #19) > Regression window for the STR in comment 17: > 2012-08-14-03-05-21 -- 2012-08-15-03-05-56 > http://hg.mozilla.org/mozilla-central/ > pushloghtml?fromchange=22288130fea2&tochange=86ee4deea55b > > So I guess that might be a separate bug. Or possibly that on Linux there was > a second change required to trigger the crash. bug 775965 is the most suspicious thing I see in that range. (In reply to Mats Palmgren [:mats] from comment #18) > Created attachment 754639 [details] > stacks > > It's also reproducible in a debug build -- there's an assertion before > the crash (stack included): > ###!!! ASSERTION: Unexpected document; this will lead to incorrect > behavior!: 'aElement->GetCurrentDoc() == Document()', file > layout/base/RestyleTracker.cpp, line 260 That's because aElement->GetCurrentDoc() is null.
And, in particular, we're looking at an nsXULScrollFrame (display:box) that's still in the frame tree but whose mContent is an HTMLDivElement that's not IsInDoc(), and whose mSubtreeRoot is itself. Its frame parent is display:inline-block, and has a normal content node. (gdb) up #2 0x00007fd19c5cad12 in nsFrameManager::ReResolveStyleContext (this=0x36a2050, aPresContext=0x42485c0, aFrame=0x44be170, aParentContent=0x4115950, aChangeList=0x7fffd721e560, aMinChange=nsChangeHint_RepaintFrame, aParentFrameHintsNotHandledForDescendants=0, aRestyleHint=0, aRestyleTracker=..., aDesiredA11yNotifications=nsFrameManager::eSendAllNotifications, aVisibleKidsOfHiddenElement=..., aTreeMatchContext=...) at /home/dbaron/builds/ssd/mozilla-central/mozilla/layout/base/nsFrameManager.cpp:1068 1068 if (aRestyleTracker.GetRestyleData(content->AsElement(), &restyleData)) { (gdb) p aFrame $19 = (nsXULScrollFrame *) 0x44be170 (gdb) p aFrame->mParent $20 = (nsBlockFrame *) 0x1ea0a70 (gdb) p $20->mContent $21 = (mozilla::dom::HTMLDivElement *) 0x4115950 (gdb) p $->mParent $22 = (mozilla::dom::HTMLAnchorElement *) 0x51c95c0 (gdb) p $->mParent $23 = (mozilla::dom::HTMLElement *) 0x4115dd0 (gdb) p $->mParent $24 = (mozilla::dom::HTMLBodyElement *) 0x3695d40 (gdb) p $->mParent $25 = (mozilla::dom::HTMLSharedElement *) 0x278f340 (gdb) p $->mParent $26 = (nsHTMLDocument *) 0x40fb210 (gdb) p aFrame->mContent $27 = (mozilla::dom::HTMLDivElement *) 0x48bf820 (gdb) p $->mSubtreeRoot $28 = (mozilla::dom::HTMLDivElement *) 0x48bf820 (gdb) p $20->mContent->mPrimaryFrame $29 = (nsBlockFrame *) 0x1ea0a70 (note that mSubtreeRoot and mPrimaryFrame are a union) We crash because we try to reparent the style context of that nsXULScrollFrame to have a null parent, and we dereference null.
(I'm not planning to do any more with this bug right now.)
For the STR in comment 17, I get this bad changeset: changeset: 102269:bcac58cbf328 user: Chris Pearce <cpearce@mozilla.com> date: Tue Aug 14 16:06:44 2012 +1200 summary: Bug 775965 - Ensure presentation persists across nsSubDocumentFrame reframes. r=roc Specifically, if I comment out this flush the crash does not occur: http://hg.mozilla.org/mozilla-central/annotate/18fc62fd8dcc/layout/generic/nsSubDocumentFrame.cpp#l788 I don't see anything wrong with flushing there per se, I think it just triggers a bug that already existed.
Attached file stacks + frame dumps
Restyling the <marquee> creates a new style context for it the "Block(marquee)(1)@0x7fffc0277c90" frame. Its child, the XULscroll needs to have its style context reparented, but nsFrame::DoGetParentStyleContextFrame mistakenly thinks this is the root frame and returns null: 7214 if (mContent && !mContent->GetParent() && 7215 !StyleContext()->GetPseudo()) { 7216 // we're a frame for the root. We have no style context parent. 7217 return nullptr; 7218 } http://hg.mozilla.org/mozilla-central/annotate/18fc62fd8dcc/layout/generic/nsFrame.cpp#l7212 because the XULscroll content has mParent == null (it's the root of an anonymous subtree) and does not have a pseudo. So it's a (safe) null-pointer crash, at least for this testcase.
Attached patch Like so? (obsolete) — Splinter Review
I did verify in a debugger that the XULscroll style context's parent was indeed changed to the new style context of its parent (green in the attached frame dump) https://tbpl.mozilla.org/?tree=Try&rev=bed97bdddca3
Attachment #757041 - Flags: review?(bzbarsky)
Comment on attachment 757041 [details] [diff] [review] Like so? Being root of anonymous subtree should not mean null mParent. Why is the parent actually null here?
Attached file more stacks
> Being root of anonymous subtree should not mean null mParent. OK, it's non-null at the time the frame tree is created by the flush in nsHideViewer::Run. > Why is the parent actually null here? It's nulled out by UnbindFromTree from a RemoveFromBindingManagerRunnable (first stack) that was posted as a result of the parser doing a RemoveChildAt (second stack). I guess as a result of some "adoption" operation due to the unclosed <a> ?
Attachment #757041 - Attachment is obsolete: true
Attachment #757041 - Flags: review?(bzbarsky)
The nsHideViewer and RemoveFromBindingManagerRunnable are created as a result of the same nsINode::doRemoveChildAt call. First we do nsNodeUtils::ContentRemoved which destroys the frame tree which posts the nsHideViewer. Then we call UnbindFromTree which posts the RemoveFromBindingManagerRunnable. http://hg.mozilla.org/mozilla-central/annotate/295a11e1efef/content/base/src/nsINode.cpp#l1432 So I guess we could delay the nsHideViewer so it occurs after all other script runners. Maybe just use NS_DispatchToCurrentThread instead?
The NS_DispatchToCurrentThread idea didn't work: https://tbpl.mozilla.org/?tree=Try&rev=ecbcb0a1e683 Trying a "delayed script runner" approach instead: https://tbpl.mozilla.org/?tree=Try&rev=9d7ec2ca7d38
Attached patch fix+testSplinter Review
Use a dummy script runner that appends the nsHideViewer runner at the end, thereby "stepping over" any other runners that was added after it, e.g. RemoveFromBindingManagerRunnables. https://tbpl.mozilla.org/?tree=Try&rev=e88b26425fc9
Assignee: nobody → matspal
Attachment #757355 - Flags: review?(bzbarsky)
Comment on attachment 757355 [details] [diff] [review] fix+test > It's nulled out by UnbindFromTree from a RemoveFromBindingManagerRunnable Why is this not killing the frame? That seems like the real bug here: unbound things should not have frames, ever. We should add an assert to that effect in UnbindFromTree! I don't think messing with scriptrunners is the right way to go here, at first glance...
Attachment #757355 - Flags: review?(bzbarsky) → review-
It does! as a result of nsNodeUtils::ContentRemoved http://hg.mozilla.org/mozilla-central/annotate/295a11e1efef/content/base/src/nsINode.cpp#l1432 during which nsSubDocumentFrame::DestroyFrom posts a nsHideViewer. Then we do the UnbindFromTree which posts the RemoveFromBindingManagerRunnable. After all that is done, with no frame tree, we start running script runners, first nsHideViewer which *creates a new frame tree* with its flush, then RemoveFromBindingManagerRunnable...
The point is RemoveFromBindingManagerRunnable is calling UnbindFromTree on a node that has a frame, no? That's not an OK thing to do. It needs to kill off that frame first.
(In reply to Boris Zbarsky (:bz) from comment #35) > The point is RemoveFromBindingManagerRunnable is calling UnbindFromTree on a > node that has a frame, no? Yes, but that frame did not exist when the RemoveFromBindingManagerRunnable instance was created. It was created later, by the flush in nsHideViewer. > That's not an OK thing to do. Yes, of course. That's what my patch is trying to avoid. > It needs to kill off that frame first. I really don't think we want an ongoing UnbindFromTree operation to mess with the *new* frame tree that did not exist at the time UnbindFromTree began. It seems to me that "kill off that frame first" will lead to the wrong result (a Frankenstein frame tree). We should instead makes sure the UnbindFromTree has completed before creating a new frame tree. Yes, my patch mostly just wallpapers the problem by delaying the nsHideViewer to create the separation. Alternatively, I can fix the underlying problem in UnbindFromTree if you want but that's a bit more work. Instead of using script runners to do the RemovedFromDocumentInternal part, I imagine we could collect those in an array and run them at the end of the top-most UnbindFromTree. Would that be an acceptable solution to you?
(In reply to Boris Zbarsky (:bz) from comment #33) > unbound things should not have frames, ever. We should add an assert to > that effect in UnbindFromTree! I tried that, MOZ_ASSERT(!mPrimaryFrame) fails because of the way nsXBLBinding::InstallAnonymousContent works -- it always calls UnbindFromTree: (gdb) bt #0 nsIContent::UnbindFromTree #1 nsXBLBinding::InstallAnonymousContent #2 nsXBLBinding::GenerateAnonymousContent #3 nsXBLService::LoadBindings #4 nsCSSFrameConstructor::AddFrameConstructionItemsInternal ... Relaxing it to MOZ_ASSERT(!GetPrimaryFrame()) fixed that, but it fails in AnonymousContentDestroyer::Run for the <input>.setAttribute call in content/html/content/crashtests/571428-1.html (gdb) bt #0 nsIContent::UnbindFromTree #1 AnonymousContentDestroyer::Run #2 nsContentUtils::RemoveScriptBlocker #3 nsAutoScriptBlocker::~nsAutoScriptBlocker #5 mozilla::dom::Element::SetAttr ... Let me know if you want bugs filed for those issues.
> It was created later, by the flush in nsHideViewer. Sure. > That's what my patch is trying to avoid. But it only avoids it for the one case here. Nothing prevents other script runners having the same problem... > I really don't think we want an ongoing UnbindFromTree operation to mess > with the *new* frame tree that did not exist at the time UnbindFromTree > began. There are two different UnbindFromTree here, no? There's the one on the non-anonymous content and the one one the anonymous content. The latter is not making sure that the anon content has no frames first, as far as I can tell, which is the bug. > MOZ_ASSERT(!mPrimaryFrame) fails Because mPrimaryFrame is in a union, sure. > but it fails in AnonymousContentDestroyer::Run Yes, because imo AnonymousContentDestroyer is buggy, no?
(In reply to Boris Zbarsky (:bz) from comment #38) > > That's what my patch is trying to avoid. > > But it only avoids it for the one case here. Nothing prevents other script > runners having the same problem... Which I already admitted to, and suggested an alternative solution which would fix all such cases since the top-most UnbindFromTree would be fully complete when it returns. Did you see that part? (comment 36, last two paragraphs) > There are two different UnbindFromTree here, no? There's the one on the > non-anonymous content and the one one the anonymous content. Correct. The latter runs from a RemoveFromBindingManagerRunnable. > The latter is not making sure that the anon content has no frames first, > as far as I can tell, which is the bug. No, there should not be any frames there in the first place since they were already destroyed earlier. What happens is: The parser decides it wants to move a node from one place to another so it does a Remove followed by an Append: 1. nsINode::doRemoveChildAt 1a. destroys the frame tree (fully), one of the frames posts a nsHideViewer 1b. calls UnbindFromTree, one of the nodes posts a RemoveFromBindingManagerRunnable 2. nsINode::AppendChildTo 2a. calls BindToTree then we execute the script runners: 3. nsHideViewer flushes which creates frames for all the content, including anon content. 4. RemoveFromBindingManagerRunnable calls UnbindFromTree to complete the operation in 1b, but since 2a and 3 occurred it's messing up stuff in the new tree. I've suggested a wallpaper and a alternative fix that would solve the problem completely. Not sure what more I can do here...
(In reply to Boris Zbarsky (:bz) from comment #38) > Because mPrimaryFrame is in a union, sure. Oh, OK. I didn't get the memo on that ;-) > Yes, because imo AnonymousContentDestroyer is buggy, no? Probably, but I didn't dive into the details on how it's supposed to work.
I did see comment 36 last part. I guess it's not obvious to me that this problem is limited to the RemoveFromBindingManagerRunnable as posted from UnbindFromTree, but maybe it is.... > then we execute the script runners: Hrm. The fact that #4 happens after #2 is just wrong; now I seem to recall it coming up before... :( Blake, Jonas, can we just fix that ordering?
It does indeed seem like it would be nice if we could call RemovedFromDocumentInternal synchronously rather than off a script runner. That looks possible if we change nsXBLBinding::ChangeDocument to do all of it's script jazz of a script runner rather than synchronously. We would also need to call nsXBLPrototypeBinding::BindingDetached from a scriptrunner rather than synchronously. It'd be nice to change this teardown stuff so that all script-running is located in a single function, and that that function is called from a scriptrunner. And all other teardown is done synchronously. We should probably wait until the other xbl refactoring has landed before making such a big change though.
The underlying problem is in content/ code so reassigning.
Assignee: matspal → nobody
Component: Layout → XBL
I can't reproduce this crash in any build. It seems to have been fixed in bug 930250; if I revert that change then I can reproduce the crash loading the attached testcase.
Depends on: 930250
Flags: in-testsuite?
Flags: in-testsuite? → in-testsuite+
Assignee: nobody → mats
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: