Last Comment Bug 683059 - Crash [@ nsObjectFrame::IsOpaque] with applet and changing style onunload
: Crash [@ nsObjectFrame::IsOpaque] with applet and changing style onunload
Status: RESOLVED FIXED
: crash, regression, testcase
Product: Core
Classification: Components
Component: Plug-ins (show other bugs)
: Trunk
: x86 Windows 7
: -- critical (vote)
: mozilla13
Assigned To: Jim Mathies [:jimm]
:
Mentors:
Depends on: 724355
Blocks: 532972 90268
  Show dependency treegraph
 
Reported: 2011-08-29 17:53 PDT by Martijn Wargers [:mwargers] (not working for Mozilla)
Modified: 2012-03-30 12:13 PDT (History)
8 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
testcase (194 bytes, text/html)
2011-08-29 17:53 PDT, Martijn Wargers [:mwargers] (not working for Mozilla)
no flags Details
possible fix v1.0 (689 bytes, text/plain)
2012-02-01 09:18 PST, Josh Aas
no flags Details
fix plus original patch (1.42 KB, patch)
2012-02-01 15:34 PST, Jim Mathies [:jimm]
no flags Details | Diff | Splinter Review
test crash stack (7.22 KB, text/plain)
2012-02-01 18:27 PST, Josh Aas
no flags Details
patch (1.30 KB, patch)
2012-02-02 12:21 PST, Jim Mathies [:jimm]
jaas: review+
Details | Diff | Splinter Review

Description Martijn Wargers [:mwargers] (not working for Mozilla) 2011-08-29 17:53:34 PDT
Created attachment 556724 [details]
testcase

See testcase, after the fix for bug 90268 this started crashing.

https://crash-stats.mozilla.com/report/index/bp-35505502-ece0-4a18-bc10-bf63d2110829
0 	xul.dll 	nsQueryReferent::operator 	obj-firefox/xpcom/build/nsWeakReference.cpp:88
1 	xul.dll 	xul.dll@0xbbfbdb 	
2 	xul.dll 	nsCaret::GetCaretVisible 	
3 	xul.dll 	nsObjectFrame::IsOpaque 	layout/generic/nsObjectFrame.cpp:1118
4 	xul.dll 	nsDisplayPlugin::GetOpaqueRegion 	layout/generic/nsObjectFrame.cpp:1021
5 	xul.dll 	nsDisplayList::ComputeVisibilityForSublist 	layout/base/nsDisplayList.cpp:519
6 	xul.dll 	nsDisplayClip::ComputeVisibility 	layout/base/nsDisplayList.cpp:2073
7 	xul.dll 	nsDisplayList::ComputeVisibilityForSublist 	layout/base/nsDisplayList.cpp:516
8 	xul.dll 	nsDisplayWrapList::ComputeVisibility 	layout/base/nsDisplayList.cpp:1565
9 	xul.dll 	nsDisplayList::ComputeVisibilityForSublist 	layout/base/nsDisplayList.cpp:516
10 	xul.dll 	nsDisplayClip::ComputeVisibility 	layout/base/nsDisplayList.cpp:2073
11 	xul.dll 	nsDisplayList::ComputeVisibilityForSublist 	layout/base/nsDisplayList.cpp:516
12 	xul.dll 	nsDisplayClip::ComputeVisibility 	layout/base/nsDisplayList.cpp:2073
13 	xul.dll 	nsDisplayList::ComputeVisibilityForSublist 	layout/base/nsDisplayList.cpp:516
14 	xul.dll 	nsDisplayList::ComputeVisibilityForRoot 	layout/base/nsDisplayList.cpp:428
15 	xul.dll 	nsRootPresContext::GetPluginGeometryUpdates 	layout/base/nsPresContext.cpp:2491
16 	xul.dll 	nsRootPresContext::UpdatePluginGeometry 	
17 	xul.dll 	PresShell::FlushPendingNotifications 	layout/base/nsPresShell.cpp:4813
18 	xul.dll 	nsDocument::FlushPendingNotifications 	content/base/src/nsDocument.cpp:6325
19 	xul.dll 	nsDocument::FlushPendingNotifications 	content/base/src/nsDocument.cpp:6320
20 	xul.dll 	nsObjectLoadingContent::NotifyStateChanged 	content/base/src/nsObjectLoadingContent.cpp:
etc..

I've also seen crashes like these:
https://crash-stats.mozilla.com/report/index/bp-fe7128cf-25cf-440a-8911-c05782110829
https://crash-stats.mozilla.com/report/index/bp-c6eb52a4-1e62-43f0-99c7-eb69e2110829
https://crash-stats.mozilla.com/report/index/bp-bd4686e4-6d3d-4cd0-a368-b9a5f2110829
https://crash-stats.mozilla.com/report/index/bp-b1e72091-f2e0-42b2-bdab-340dc2110829
Comment 1 Martijn Wargers [:mwargers] (not working for Mozilla) 2012-02-01 07:49:50 PST
This is crashing again now that bug 90268 has relanded.
Comment 2 Josh Aas 2012-02-01 09:18:48 PST
Created attachment 593478 [details]
possible fix v1.0

D'oh, sorry I missed this.

In looking at the code I see one place where we don't NULL check but the crash doesn't seem to be on a NULL address so I'm not sure this will fix the problem.
Comment 3 Josh Aas 2012-02-01 15:07:45 PST
My possible fix does not fix this problem, though I still think we should take the patch.

I have a new theory about this crash, will look into it later today.
Comment 4 Jim Mathies [:jimm] 2012-02-01 15:34:49 PST
Created attachment 593634 [details] [diff] [review]
fix plus original patch

This seems to fix it, although I'm not 100% on this as I'm not 100% familiar with what's going on here post the landing of bug 90268.
Comment 5 Jim Mathies [:jimm] 2012-02-01 15:42:24 PST
https://tbpl.mozilla.org/?noignore=0&tree=Try&rev=0d0a6a6d0fed

Posted to try for a full set of builds and tests.
Comment 6 Josh Aas 2012-02-01 16:42:13 PST
Comment on attachment 593634 [details] [diff] [review]
fix plus original patch

Review of attachment 593634 [details] [diff] [review]:
-----------------------------------------------------------------

If this passes try it should at least be good enough for now. In theory this reference is supposed to be cleaned up by the code releasing the owning ref to the instance owner, so we might want to go back later and find out why that isn't happening. Even if we figure it out though, this is probably a worthwhile safety net solution.

Thanks for the patch Jim!
Comment 7 Josh Aas 2012-02-01 18:27:51 PST
Created attachment 593696 [details]
test crash stack

A crash stack from the try server failures. Will look into this later tonight.
Comment 8 Mozilla RelEng Bot 2012-02-01 21:30:17 PST
Try run for 0d0a6a6d0fed is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=0d0a6a6d0fed
Results (out of 207 total builds):
    success: 174
    warnings: 32
    failure: 1
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jmathies@mozilla.com-0d0a6a6d0fed
Comment 9 Jim Mathies [:jimm] 2012-02-02 05:51:58 PST
(In reply to Josh Aas (Mozilla Corporation) from comment #7)
> Created attachment 593696 [details]
> test crash stack
> 
> A crash stack from the try server failures. Will look into this later
> tonight.

That's on retrieving the pres context pointer from the dtor. Unfortunately I can't reproduce the test crash locally. I could check the result of PresContext(), but that call makes a number of similar calls to other pointers internally, so I'm not sure it would fix the problem.
Comment 10 Jim Mathies [:jimm] 2012-02-02 05:58:30 PST
https://hg.mozilla.org/try/rev/075c8c0ef44e

Pushed this added patch to try for a debug oth run.
Comment 11 Josh Aas 2012-02-02 08:39:46 PST
On Mac OS X you can reproduce the crashing test with:

TEST_PATH=content/base/test/chrome/test_bug391728.html make -C obj-x86_64-apple-darwin11.3.0/ mochitest-chrome

I don't think PresContext() is supposed to be able to return a NULL value there but I can test.
Comment 12 Jim Mathies [:jimm] 2012-02-02 08:49:31 PST
(In reply to Jim Mathies [:jimm] from comment #10)
> https://hg.mozilla.org/try/rev/075c8c0ef44e
> 
> Pushed this added patch to try for a debug oth run.

This didn't fix the test failures. :/
Comment 13 Jim Mathies [:jimm] 2012-02-02 09:35:37 PST
Comment on attachment 593634 [details] [diff] [review]
fix plus original patch

This isn't a good fix. The pres context here (during a cycle collection) has already been blown away. All the data in it is invalid. We need to set this instance owner to null earlier.
Comment 14 Jim Mathies [:jimm] 2012-02-02 09:52:57 PST
New test patch pushed to try:

https://hg.mozilla.org/try/rev/90d6d697d6b5

This moves the SetInstanceOwner(nsnull) call up into Destroy. Locally this passes mochitests and the test case posted here.
Comment 15 Jim Mathies [:jimm] 2012-02-02 12:21:29 PST
Created attachment 593938 [details] [diff] [review]
patch

This is passing on Linux oth and other tests, so so far this is looking good.
Comment 16 Jim Mathies [:jimm] 2012-02-02 14:48:43 PST
So doing a little logging, I see the following order - 

nsPluginInstanceOwner::Destroy()
nsPluginInstanceOwner::~nsPluginInstanceOwner()
nsObjectFrame::~nsObjectFrame()

on that test case. That jibes with the test runs succeeding as well, since if ~nsObjectFrame() occurred before ~nsPluginInstanceOwner(), that mObjectFrame->SetInstanceOwner(nsnull) call would crash.
Comment 18 Jim Mathies [:jimm] 2012-02-03 04:55:43 PST
Doing some more debugging this morning, I'm still experiencing problems where the pres context is trashed:

 	xul.dll!nsStyleContext::GetRuleNode()  Line 224 + 0xa bytes	C++
 	xul.dll!nsIFrame::PresContext()  Line 551 + 0xf bytes	C++
>	xul.dll!nsObjectFrame::SetInstanceOwner(nsPluginInstanceOwner * aOwner=0x00000000)  Line 790 + 0x8 bytes	C++
 	xul.dll!nsPluginInstanceOwner::Destroy()  Line 2737	C++
 	xul.dll!nsObjectLoadingContent::DoStopPlugin(nsPluginInstanceOwner * aInstanceOwner=0x0b8918a8, bool aDelayedStop=false)  Line 2037	C++
 	xul.dll!nsObjectLoadingContent::StopPluginInstance()  Line 2063 + 0x16 bytes	C++

Here, the object frame is still in good shape, but the pres context it requests isn't. This case should start showing up in a new crash signature after this lands.

Also, while debugging bug 723523, I hit a case where a java applet hung on shutdown and the object frame ended up getting destroyed before the Destroy call. :/
Comment 19 Jim Mathies [:jimm] 2012-02-03 05:06:43 PST
This is pretty wacky:

0xB1B2AA0 nsPluginInstanceOwner::SetFrame(0x0)
0x6B54140 nsObjectFrame::SetInstanceOwner(0x0)
0xB1B2AA0 nsPluginInstanceOwner::Destroy()
0xB1B2AA0 nsPluginInstanceOwner::~nsPluginInstanceOwner()
^^ dtor 0xB1B2AA0
++DOMWINDOW == 18 (0F2E19B0) [serial = 19] [outer = 0B0FEB70]
0x6B54140 nsObjectFrame::DestroyFrom()
0x6B54140 nsObjectFrame::SetInstanceOwner(0x0)
0x6B54140 nsObjectFrame::~nsObjectFrame()
0x69F9558 nsPluginInstanceOwner::Init(0x102F45E0)
0x69F9558 nsPluginInstanceOwner::SetFrame(0x102F45E0)
0x102F45E0 nsObjectFrame::SetInstanceOwner(0x69F9558)
For application/x-java-vm found plugin npjp2.dll
0xB1B2AA0 nsPluginInstanceOwner::Destroy()
^^ huh?
0x43 nsObjectFrame::SetInstanceOwner(0x0)
Comment 20 Jim Mathies [:jimm] 2012-02-03 05:27:24 PST
(In reply to Jim Mathies [:jimm] from comment #19)
> This is pretty wacky:
> 
> 0xB1B2AA0 nsPluginInstanceOwner::SetFrame(0x0)
> 0x6B54140 nsObjectFrame::SetInstanceOwner(0x0)
> 0xB1B2AA0 nsPluginInstanceOwner::Destroy()
> 0xB1B2AA0 nsPluginInstanceOwner::~nsPluginInstanceOwner()
> ^^ dtor 0xB1B2AA0
> ++DOMWINDOW == 18 (0F2E19B0) [serial = 19] [outer = 0B0FEB70]
> 0x6B54140 nsObjectFrame::DestroyFrom()
> 0x6B54140 nsObjectFrame::SetInstanceOwner(0x0)
> 0x6B54140 nsObjectFrame::~nsObjectFrame()
> 0x69F9558 nsPluginInstanceOwner::Init(0x102F45E0)
> 0x69F9558 nsPluginInstanceOwner::SetFrame(0x102F45E0)
> 0x102F45E0 nsObjectFrame::SetInstanceOwner(0x69F9558)
> For application/x-java-vm found plugin npjp2.dll
> 0xB1B2AA0 nsPluginInstanceOwner::Destroy()
> ^^ huh?
> 0x43 nsObjectFrame::SetInstanceOwner(0x0)

These are coming from InDocCheckEvent::Run() events in nsObjectLoadingContent.cpp. 

http://mxr.mozilla.org/mozilla-central/source/content/base/src/nsObjectLoadingContent.cpp#146
Comment 21 Jim Mathies [:jimm] 2012-02-03 05:45:35 PST
http://hg.mozilla.org/mozilla-central/annotate/af9f286b38fa/content/base/src/nsObjectLoadingContent.cpp#l1517

Josh, you added this code with your landing, this appears to be the root of a lot of evil. Can you shed some light on why we delay shutting down the plugin when the underlying document is being destroyed?
Comment 23 Ed Morley [:emorley] 2012-02-03 11:06:41 PST
https://hg.mozilla.org/mozilla-central/rev/a7087f91031e

Note You need to log in before you can comment on or make changes to this bug.