Closed Bug 521426 Opened 15 years ago Closed 14 years ago

Crash [@ PL_DHashTableEnumerate] called from nsRootPresContext::GetPluginGeometryUpdates because nsPresContext::RootPresContext() can return non-nsRootPresContext object

Categories

(Core :: Layout, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla1.9.3a1
Tracking Status
blocking2.0 --- beta1+
blocking1.9.2 --- needed
status1.9.2 --- .2-fixed
status1.9.1 --- unaffected

People

(Reporter: mayhemer, Assigned: MatsPalmgren_bugz)

References

Details

(Keywords: crash, regression, verified1.9.2)

Crash Data

Attachments

(6 files, 5 obsolete files)

nsRootPresContext* nsPresContext::RootPresContext() may apparently return nsPresContext instance (not nsRootPresContext that inherits that). The static cast is wrong in that function. 

mPresContext->RootPresContext()->UpdatePluginGeometry(target) then may crash during mRegisteredPlugins.EnumerateEntries call, the hash table instance is a foreign object memory.

I have to find a simple STR.

Backtrace on windows:

 	xpcom_core.dll!PL_DHashTableEnumerate(PLDHashTable * table=0x05051930, PLDHashOperator (PLDHashTable *, PLDHashEntryHdr *, unsigned int, void *)* etor=0x01b2eb80, void * arg=0x0012e1d8)  Line 743 + 0x24 bytes	C
 	gklayout.dll!nsTHashtable<nsPtrHashKey<nsObjectFrame> >::EnumerateEntries(PLDHashOperator (nsPtrHashKey<nsObjectFrame> *, void *)* enumFunc=0x01b2cbe0, void * userArg=0x0012e4d8)  Line 241 + 0x13 bytes	C++
>	gklayout.dll!nsRootPresContext::GetPluginGeometryUpdates(nsIFrame * aChangedSubtree=0x06eb6098, nsTArray<nsIWidget::Configuration> * aConfigurations=0x0012e544)  Line 2347	C++
 	gklayout.dll!nsRootPresContext::UpdatePluginGeometry(nsIFrame * aChangedSubtree=0x06eb6098)  Line 2395	C++
 	gklayout.dll!PresShell::DoReflow(nsIFrame * target=0x06eb6098, int aInterruptible=0x00000000)  Line 7362	C++
 	gklayout.dll!PresShell::ProcessReflowCommands(int aInterruptible=0x00000000)  Line 7428 + 0x10 bytes	C++
 	gklayout.dll!PresShell::FlushPendingNotifications(mozFlushType aType=Flush_Layout)  Line 4892 + 0x12 bytes	C++
 	gklayout.dll!nsEditor::EndUpdateViewBatch()  Line 4397	C++
 	gklayout.dll!nsEditor::EndPlaceHolderTransaction()  Line 1002	C++
 	gklayout.dll!nsAutoPlaceHolderBatch::~nsAutoPlaceHolderBatch()  Line 65 + 0x31 bytes	C++
 	gklayout.dll!nsPlaintextEditor::DeleteSelection(short aAction=0x0000)  Line 765 + 0x26 bytes	C++
 	gklayout.dll!nsTextControlFrame::SetValue(const nsAString_internal & aValue={...})  Line 2692	C++
 	gklayout.dll!nsTextControlFrame::SetFormProperty(nsIAtom * aName=0x00d00160, const nsAString_internal & aValue={...})  Line 1902 + 0xf bytes	C++
 	gklayout.dll!nsHTMLInputElement::SetValueInternal(const nsAString_internal & aValue={...}, nsITextControlFrame * aFrame=0x00000000, int aUserInput=0x00000000)  Line 1057	C++
 	gklayout.dll!nsHTMLInputElement::SetValue(const nsAString_internal & aValue={...})  Line 922	C++
 	gklayout.dll!nsHTMLInputElement::Reset()  Line 2466 + 0x1c bytes	C++
 	gklayout.dll!nsDocument::Sanitize()  Line 6807	C++
 	gklayout.dll!DocumentViewerImpl::Destroy()  Line 1493 + 0x1e bytes	C++
 	gklayout.dll!DocumentViewerImpl::Show()  Line 1882	C++
 	gklayout.dll!nsPresContext::EnsureVisible()  Line 1592	C++
 	gklayout.dll!PresShell::UnsuppressAndInvalidate()  Line 4632 + 0xb bytes	C++
 	gklayout.dll!PresShell::UnsuppressPainting()  Line 4675	C++
 	gklayout.dll!DocumentViewerImpl::LoadComplete(unsigned int aStatus=0x00000000)  Line 1064	C++
 	docshell.dll!nsDocShell::EndPageLoad(nsIWebProgress * aProgress=0x0535a5a4, nsIChannel * aChannel=0x00da7b28, unsigned int aStatus=0x00000000)  Line 5748	C++
 	docshell.dll!nsDocShell::OnStateChange(nsIWebProgress * aProgress=0x0535a5a4, nsIRequest * aRequest=0x00da7b28, unsigned int aStateFlags=0x00020010, unsigned int aStatus=0x00000000)  Line 5626	C++
 	docshell.dll!nsDocLoader::FireOnStateChange(nsIWebProgress * aProgress=0x0535a5a4, nsIRequest * aRequest=0x00da7b28, int aStateFlags=0x00020010, unsigned int aStatus=0x00000000)  Line 1315	C++
 	docshell.dll!nsDocLoader::doStopDocumentLoad(nsIRequest * request=0x00da7b28, unsigned int aStatus=0x00000000)  Line 937	C++
 	docshell.dll!nsDocLoader::DocLoaderIsEmpty(int aFlushLayout=0x00000001)  Line 804	C++
 	docshell.dll!nsDocLoader::OnStopRequest(nsIRequest * aRequest=0x053712b0, nsISupports * aCtxt=0x00000000, unsigned int aStatus=0x00000000)  Line 700	C++
 	necko.dll!nsLoadGroup::RemoveRequest(nsIRequest * request=0x053712b0, nsISupports * ctxt=0x00000000, unsigned int aStatus=0x00000000)  Line 680 + 0x2e bytes	C++
 	imglib2.dll!imgRequestProxy::RemoveFromLoadGroup(int releaseLoadGroup=0x00000001)  Line 195	C++
 	imglib2.dll!imgRequestProxy::OnStopRequest(nsIRequest * request=0x064674c0, nsISupports * ctxt=0x00000000, unsigned int statusCode=0x00000000, int lastPart=0x00000001)  Line 648	C++
 	imglib2.dll!imgRequest::OnStopRequest(nsIRequest * aRequest=0x064674c0, nsISupports * ctxt=0x00000000, unsigned int status=0x00000000)  Line 939	C++
 	imglib2.dll!ProxyListener::OnStopRequest(nsIRequest * aRequest=0x064674c0, nsISupports * ctxt=0x00000000, unsigned int status=0x00000000)  Line 1784	C++
 	necko.dll!nsStreamListenerTee::OnStopRequest(nsIRequest * request=0x064674c0, nsISupports * context=0x00000000, unsigned int status=0x00000000)  Line 66	C++
 	necko.dll!nsHttpChannel::OnStopRequest(nsIRequest * request=0x06de3600, nsISupports * ctxt=0x00000000, unsigned int status=0x00000000)  Line 5256	C++
 	necko.dll!nsInputStreamPump::OnStateStop()  Line 577	C++
 	necko.dll!nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream * stream=0x064e56c8)  Line 401 + 0xb bytes	C++
 	xpcom_core.dll!nsInputStreamReadyEvent::Run()  Line 113	C++
 	xpcom_core.dll!nsThread::ProcessNextEvent(int mayWait=0x00000001, int * result=0x0012f838)  Line 527 + 0x19 bytes	C++
 	xpcom_core.dll!NS_ProcessNextEvent_P(nsIThread * thread=0x00cee6a0, int mayWait=0x00000001)  Line 230 + 0x16 bytes	C++
 	gkwidget.dll!nsBaseAppShell::Run()  Line 170 + 0xc bytes	C++
 	toolkitcomps.dll!nsAppStartup::Run()  Line 182 + 0x1c bytes	C++
 	xul.dll!XRE_main(int argc=0x00000005, char * * argv=0x00ce9be8, const nsXREAppData * aAppData=0x00ce1a08)  Line 3420 + 0x25 bytes	C++
 	firefox.exe!NS_internal_main(int argc=0x00000005, char * * argv=0x00ce9be8)  Line 156 + 0x12 bytes	C++
 	firefox.exe!wmain(int argc=0x00000005, wchar_t * * argv=0x00cbfce0)  Line 110 + 0xd bytes	C++
 	firefox.exe!__tmainCRTStartup()  Line 594 + 0x19 bytes	C
 	firefox.exe!wmainCRTStartup()  Line 414	C
 	kernel32.dll!7c817077() 	
 	[Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]
Version: 1.9.1 Branch → Trunk
Keywords: crash
(In reply to comment #0)
> nsRootPresContext* nsPresContext::RootPresContext() may apparently return
> nsPresContext instance (not nsRootPresContext that inherits that). The static
> cast is wrong in that function. 

Yes, this will happen if PresShell::Destroy (which calls mPresContext->SetShell(nsnull)) has been called on any of the pres shells up the chain to the root, since then pc->mShell will be null.
Summary: Crash [@ PL_DHashTableEnumerate] (not duplicate of bug 491197) → Crash [@ PL_DHashTableEnumerate] called from nsRootPresContext::GetPluginGeometryUpdates (not duplicate of bug 491197)
Summary: Crash [@ PL_DHashTableEnumerate] called from nsRootPresContext::GetPluginGeometryUpdates (not duplicate of bug 491197) → Crash [@ PL_DHashTableEnumerate] called from nsRootPresContext::GetPluginGeometryUpdates because nsPresContext::RootPresContext() can return non-nsRootPresContext object
I think we should store a separate parent nsPresContext chain so that RootPresContext never fails.
Thanks for the hint Rob. I'll try to go that way.
Status: NEW → ASSIGNED
And to confirm, yes, this is happening when clicking (hovering) in a web page load in the sidebar.
(In reply to comment #3)
> I think we should store a separate parent nsPresContext chain so that
> RootPresContext never fails.

In nsPresContext::SetShell we assign mShell. Should we at this moment get the parent using the algorithm from nsPresContext::RootPresContext and keep it in a new weak member?

According to what I know about pres contexts, there is a tree of pres contexts and when a pres context with leafs goes away, all leafs (and their leafs and so on...) are destroyed with it, right?

Can be an existing pres context moved to a different parent? If so, is nsPresContext::SetShell invoked in that case to update the parent?
(In reply to comment #7)
> (In reply to comment #3)
> > I think we should store a separate parent nsPresContext chain so that
> > RootPresContext never fails.
> 
> In nsPresContext::SetShell we assign mShell. Should we at this moment get the
> parent using the algorithm from nsPresContext::RootPresContext and keep it in a
> new weak member?

Sounds reasonable to me, except I'm not sure if there's a risk of the pointer becoming a dangling pointer.  (And if we made it a strong pointer, I'm not sure whether there's a risk of leaks.)

roc might also have thoughts here, though.
It's a tricky one. I think we need to hold a strong reference, since a prescontext can be held onto from various places and could outlive the root. I don't think that would cause a cycle, but I'm not sure!
Dose the RC contains some fixes for this?
No. Honza, are you still working on this?
(In reply to comment #11)
> No. Honza, are you still working on this?

It dropped a bit down in my todo list, but it's still on my radar. If anyone has time to work on this feel free to take it. I will provide some simple STR.
I dont want do bother you, but its a worst case scenario when this bug gets to 3.6 RTM. 

I should not be able to kill Firefox with a simple click to a link even when its in the sidebar.
OK, what are the steps to reproduce?
Mats, I might need you to look at this...
Ive described all in a duplicated bug above. https://bugzilla.mozilla.org/show_bug.cgi?id=533685
Thanks.

The best fix might be to change RootPresContext() to GetRootPresContext(), allowing it to return null when the prescontext has been disconnected from the hierarchy of presentations. Then we can check for null in the right places.
Attached patch wip1 (obsolete) — Splinter Review
GetRootPresContext() + null-checks.

I can't reproduce the crash on Linux, XP or OSX,
so I need help testing that it fixes the crash.
Trunk builds with the patch are here:
https://build.mozilla.org/tryserver-builds/mpalmgren@mozilla.com-521426-wip1/
Attachment #422059 - Flags: review?(roc)
Unfortunatelly I am not able to reproduce either, with m-c nor 3.6rc1.  It was always quit rare to crash and actually didn't happen for quit a long time.

Assigning to Mats.
Assignee: honzab.moz → matspal
@Honza: Hmm i dont know what you did, but i need exactly 5 seconds with an new rc1 installation to bring FF to crash. 

It seems to fix the issue. 

Ive created a new portable Firefox 3.6 rc1 installation with an empty profile. Did a bookmark on "http://www.google.com/ie". Enable it to open in sidebar. Open it and search for "test". Did 3 random clicks on links on that page short after another and firefox crashes.
Replaced all Firefox files with the one from the given build above and tested it with the same profile, but could not bring it to crashes even with 100 clicks. 

Good work!
Thanks for the updated STR, I can now reproduce the crash on Windows.
I think the suggested patch isn't quite enough... I'm seeing that
GetRootPresContext() would also do a bogus cast in some cases.
I'm investigating the details...
Attachment #422059 - Flags: review?(roc)
The reason GetRootPresContext() can't find the root pres context is that
DocumentViewerImpl::Destroy() removed the root view from the view tree so
nsLayoutUtils::GetCrossDocParentFrame fails to walk up the view ancestor.
http://hg.mozilla.org/mozilla-central/annotate/c63b24afe2c0/layout/base/nsDocumentViewer.cpp#l1480

At the end of DocumentViewerImpl::Destroy() there is also a
DetachContainerRecurse() that nulls out the context's mContainer
and set up a mForwardingContainer on the pres-shell (which we use
to retarget events in some cases).

If I add a !mContainer check in GetRootPresContext() then there are no
bogus casts, but it will return null occasionally.  I also tried using
mForwardingContainer (when mContainer is null) to get hold of the
parent pres-shell and continue the walk from there... it works for
this bug but I'm not sure that's the right thing to do generally
and I haven't tested it much...
Attached patch wip2 (obsolete) — Splinter Review
GetRootPresContext() + null checks as before, with the addition that
it returns null when "mContainer" is null.  Added an assertion as well
to catch a bogus type cast.

Starting to use "mForwardingContainer" when "mContainer" is null
seemed to risky to me so I skipped that part.

One more thing to consider:  I did see some nsRefreshDriver::Notify() calls
to presShell->FlushPendingNotifications(Flush_Style) when the pres-context
was already detached...  The patch makes the code robust against that
but I think we probably shouldn't do that flush.  Thoughts?
Attachment #422059 - Attachment is obsolete: true
Attachment #422165 - Flags: review?(roc)
Attached patch wip2 (obsolete) — Splinter Review
The right file this time...
Attachment #422165 - Attachment is obsolete: true
Attachment #422165 - Flags: review?(roc)
Attachment #422166 - Flags: review?(roc)
Flags: blocking1.9.2?
I am nominating this bug to block 1.9.2 based off the comments from bug 539983.  If that bug is actually a dupe of this bug, then this issue is causing the Firefox 3.6 to crash when using an add-on that I support.  There are probably other add-ons that will also become unusable in 3.6 if this issue is not resolved first.  Thanks!
blocking1.9.2: --- → ?
Comment on attachment 422166 [details] [diff] [review]
wip2

Additional testing (on 1.9.2) makes it clear this is not enough.
The GetRootPresContext() call in nsObjectFrame::Destroy() really
must not fail -- if it does there's a stale frame pointer in the
root pres context we couldn't find, which will cause a crash
eventually.

Using the mForwardingContainer hack mentioned in comment 22
seems to work in this particular case, but I'm not confident
it would *always* work.

I think we need to store a nsWeakFrame in mRegisteredPlugins
instead of a raw pointer.  Or enumerating all pres contexts
and trying to unregister the frame in all.  Or something...
Attachment #422166 - Attachment is obsolete: true
Attachment #422166 - Flags: review?(roc)
Flags: blocking1.9.2?
blocking1.9.2: ? → needed
Blocks: 539983
Attached patch wip3 (obsolete) — Splinter Review
This patch is the same as "wip2" with the addition that I moved the
UnregisterPluginForGeometryUpdates() call from nsObjectFrame::DestroyFrom
to StopPluginInternal.  DestroyFrom calls StopPluginInternal, but it's
also called from StopPlugin, so, now also get here through:
DocumentViewerImpl::Destroy()
  PresShell::Freeze()
    FreezeElement
      objectFrame->StopPlugin()

and the pres shell freeze occurs before we unhook the view:
http://mxr.mozilla.org/mozilla-central/source/layout/base/nsDocumentViewer.cpp#1512
so we should be able to find our root pres context to unregister the frame.
I'm intentionally leaving the null-pointer crash if we don't - I assume
that's preferred over leaving a stale frame pointer around.

mozilla-central test builds:
https://build.mozilla.org/tryserver-builds/mpalmgren@mozilla.com-521426-mc-wip3/
Attachment #422762 - Flags: review?(roc)
I tried to use Adobe ClickMap with the test build in Comment 29.  Firefox still crashed, but it was caught by the Windows XP crash reporter instead of the Mozilla crash reporter.  I have attached a screenshot of what the crash looked like as well as the xml file that the Windows XP crash reporter generated.  I tried to also get all the "Error Report Contents" shown in the screenshot, but the window doesn't allow me to select or copy.
Comment on attachment 422762 [details] [diff] [review]
wip3

Great work.

+  if (rootPresContext)
+    rootPresContext->PresShell()->GetViewManager()->
+      GetRootWidget(getter_AddRefs(event.widget));
+  else
+    event.widget = nsnull;

{} around these statements.
Attachment #422762 - Flags: review?(roc) → review+
(In reply to comment #30)
> I tried to use Adobe ClickMap with the test build ... still crashed

Thanks for testing.  I can reproduce this crash locally.  With "wip3"
there's a "ASSERTION: top-most pres context isn't a root context" before
the crash.  Investigating...
Attached patch wip4Splinter Review
Bug 539983 (crash using Adobe ClickMap) is closely related so let's fix
that here too.  The problem there is that the pres shell in the ancestor
is Destroy'd, so GetRootPresContext() fails to look further up due to
the non-existent root frame in that shell.  So, in "wip3" the assert
fails and we do a bogus cast...

I don't believe we can deduce from just looking at the
frame/view/shell/container if we have a true root pres context or not,
we need to use something like IsRoot() that I used for the assert
in the last patch.

Changes compared to "wip3":
  *  use virtual IsRoot() to guarantee a correct cast
  *  fix {} nit
  *  make it explicit in the doc comment for GetRootPresContext()
     that it can return null

mozilla-central test builds:
https://build.mozilla.org/tryserver-builds/mpalmgren@mozilla.com-521426-mc-wip4/
Attachment #422762 - Attachment is obsolete: true
Attachment #422764 - Attachment is obsolete: true
Attachment #422980 - Flags: review?(roc)
This bug (and bug 539983) doesn't affect 1.9.1 or earlier, AFAICT.
That worked!  I tested on the wip4 for 1.9.2 test build and Firefox no longer crashes when I login to ClickMap.  I spent some time using the add-on and everything appeared to work normally.

There was one oddity that may or may not be related to this issue.  When I login to ClickMap, before the next page loads, the "Login to SiteCatalyst" button disappears for a second.  This doesn't happen in Firefox 3.5.  It's a minor issue, but I thought I would bring it up just in case.
Sorry, I guess the first screenshot didn't have the button, either.
Attachment #422997 - Attachment description: Notice the missing login button (compare with my previous screenshot) → Notice the missing login button (compare with my next screenshot)
Comment on attachment 422982 [details] [diff] [review]
wip4 for 1.9.2

Thanks a lot Mats!
Attachment #422982 - Flags: review?(roc) → review+
http://hg.mozilla.org/mozilla-central/rev/795803409745
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
OS: Windows XP → All
Hardware: x86 → All
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.3a1
Don't forget to request approval1.9.2.1 on the appropriate patch.
Ok.  But we still require a minimum 2 day baking period before making
that request, right?
I dunno. It seems to me that people don't always wait for two days.
Depends on: 542608
blocking2.0: --- → beta1
Priority: -- → P2
Will wait another day or two to allow the regression fix to bake.
Comment on attachment 422982 [details] [diff] [review]
wip4 for 1.9.2

Request approval together with the regression fix in bug 542608.
Attachment #422982 - Flags: approval1.9.2.1?
Attachment #422982 - Flags: approval1.9.2.2? → approval1.9.2.2+
I have verified that 539983 is fixed on XP (and crashes in Firefox 3.6.0) with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.2pre) Gecko/20100225 Namoroka/3.6.2pre (.NET CLR 3.5.30729). Is that enough to verify this fix for 1.9.2.2?

I cannot reproduce the problem with the steps in comment 20.
Al, the STR in comment 20 was 100% reproducible for me, the crucial bit
is "Enable it to open in sidebar".  Try opening/closing some tabs intermixed
with choosing that bookmark and clicking links in it.
Flags: wanted1.9.2?
Keywords: regression
I was finally able to reproduce the problem with Firefox 3.6 and it is fixed in 3.6.2: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2 (.NET CLR 3.5.30729).
Keywords: verified1.9.2
Crash Signature: [@ PL_DHashTableEnumerate]
Product: Core → Core Graveyard
Component: Layout: Misc Code → Layout
Product: Core Graveyard → Core
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: