Closed Bug 626343 Opened 9 years ago Closed 9 years ago

Crash [@ nsPluginInstanceOwner::CreateWidget() ]

Categories

(Core :: Plug-ins, defect, critical)

x86
Windows XP
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- final+

People

(Reporter: scoobidiver, Assigned: tnikkel)

Details

(Keywords: crash, topcrash, Whiteboard: [hardblocker])

Crash Data

Attachments

(2 files)

This crash signature is supposed to be fixed in 4.0b9pre/20101225 after the fixing of bug 584251 but there has been no crash stop. I don't reopen it as it has a patch.
It is #7 top crasher in 4.0b9 over the last week.

Signature	nsPluginInstanceOwner::CreateWidget()
UUID	ae94955a-bed7-43f4-9917-436492110116
Time 	2011-01-16 13:53:43.216190
Uptime	1834
Last Crash	1118682 seconds (1.8 weeks) before submission
Install Age	1834 seconds (30.6 minutes) since version was first installed.
Product	Firefox
Version	4.0b10pre
Build ID	20110116030325
Branch	2.0
OS	Windows NT
OS Version	5.1.2600 Service Pack 3
CPU	x86
CPU Info	GenuineIntel family 6 model 22 stepping 1
Crash Reason	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address	0x0
App Notes 	AdapterVendorID: 8086, AdapterDeviceID: 2772

Frame 	Module 	Signature [Expand] 	Source
0 	xul.dll 	nsPluginInstanceOwner::CreateWidget 	layout/generic/nsObjectFrame.cpp:6323
1 	xul.dll 	nsPluginHost::DoInstantiateEmbeddedPlugin 	modules/plugin/base/src/nsPluginHost.cpp:1101
2 	xul.dll 	nsPluginHost::InstantiateEmbeddedPlugin 	modules/plugin/base/src/nsPluginHost.cpp:967
3 	xul.dll 	nsObjectFrame::InstantiatePlugin 	layout/generic/nsObjectFrame.cpp:1043

More reports at:
http://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=exact&query=&range_value=4&range_unit=weeks&hang_type=any&process_type=any&plugin_field=&plugin_query_type=&plugin_query=&do_query=1&admin=&signature=nsPluginInstanceOwner%3A%3ACreateWidget%28%29
blocking2.0: --- → ?
In nsPluginInstanceOwner::CreateWidget the mObjectFrame->CreateWidget call is inside an "if (!view || !mWidget)" where view is the (outer) view of the object frame. Although it seems quite unlikely that we wouldn't have a view, it seems wrong to proceed in that case. If view is null and mWidget is not then proceeding is certainly be the wrong thing to do. And if both are null I don't think its the right thing to do either. We could try taking the !view part out of that if, or just null check parentWidget at the crashing site. Mats, what do you think?
Looking at the code history the view null check stems from a time when nsObjectFrame::CreateWidget created the view for the nsObjectFrame and it looks like we didn't have inner views at all for object frames (?). It actually used to just be a "if (!view)" check, the !widget part was added later.
This is a high ranking regression since 3.6.x.  Its been around for the last few betas and is up to nearly 400 crashes per day. It should block 4.0 final.

         nsPluginInstanceOwner::CreateWidget..
date     total    breakdown by build
         crashes  count build, count build, ...

20110117 386  	243 4.0b92011011019, 
        		78 4.0b82010121417, 	32 4.0b72010110414, 
        		11 4.0b62010091408, 	9 4.0b22010072019, 
        		3 4.0b52010083108, 	2 4.0b32010080519, 
        		2 4.0b10pre2011011603, 	2 4.0b10pre2011011503, 
        		2 3.6.132010120307, 	1 4.0b42010081813, 
        		1 4.0b10pre2011011703,
Assignee: nobody → tnikkel
blocking2.0: ? → final+
Whiteboard: [hardblocker]
Let's try this, if it has no effect on the crash numbers we can look elsewhere.
Attachment #504788 - Flags: review?(roc)
Landed that patch
http://hg.mozilla.org/mozilla-central/rev/dd10fd6a8b24

We'll need to evaluate if this had any effect once we get new nightlies and enough data.
looks like 250-400 crashes per day on the betas, but much lower volume on trunk so we might need to wait for a beta to see full impact.  here are recent numbers.

         nsPluginInstanceOwner::CreateWidget..

date     total    breakdown by build
         crashes  count build, count build, ...

20110117 386  	243 4.0b92011011019, 
        		78 4.0b82010121417, 	32 4.0b72010110414, 
        		11 4.0b62010091408, 	9 4.0b22010072019, 
        		3 4.0b52010083108, 	2 4.0b32010080519, 
        		2 4.0b10pre2011011603, 	2 4.0b10pre2011011503, 
        		2 3.6.132010120307, 	1 4.0b42010081813, 
        		1 4.0b10pre2011011703,
3 crashes so far on the 2011-01-19 nightly, so we can probably conclude that patch didn't fix it.
I noticed that in nsObjectFrame::Instantiate the InstantiatePlugin call (which eventually leads to our crash site) is followed by a weak frame check. So that means the object frame can go away, perhaps it goes away somewhere and we need to check that somewhere.
If my 'frame is destroyed' theory is correct, we would need to figure out where it can be destroyed so we know where to add weak frame checks.
Any time you make an NPAPI call (NPP_New, NPP_SetWindow, and many others) you should assume that the plugin may do scripting which may destroy the frame. Plugins do crazy things.
Enough beta10 data to conclude this had no effect now?
To see if my "frame is destroyed" theory is true at all we could do something ugly and check if a weak frame we init in nsObjectFrame::InstantiatePlugin and store somewhere accessible by nsPluginInstanceOwner::CreateWidget is alive and then crash by accessing an invalid and sentinel address and then see if we get that in the crash stats.
I have less faith in that theory as if the frame is destroyed the pointer stored in nsPluginInstanceOwner would get cleared, so it wouldn't be stale.
nsWindow::Create can fail and return a failure code, we don't get this. It can fail if ::CreateWindowEx fails for some reason. I'm not sure if this is happening, but we should probably do this anyway.
Attachment #508326 - Flags: review?(roc)
http://hg.mozilla.org/mozilla-central/rev/73f7643d522d

And I've got more ideas on how to justify another wallpaper patch if that doesn't work.
That added some debugging code by accident, backed that part out:
http://hg.mozilla.org/mozilla-central/rev/6fbc3c08894e
I've been operating under the theory that this crash is caused by using the parent widget at the crashing location where we didn't before. We started using the parent widget with retained layers (landed mid July 2010) and the first bug on this crash (bug 584251) was filed August 3, 2010.
We have no crashes so far in builds that have the last patch. Given the frequency before that it is pretty good evidence that the patch fixed the crash.

So that means that the call to ::CreateWindowExW in nsWindow::Create fails. Why would that fail? Maybe nsWindow::OnDestroy has fired on the parent widget and so it's mWnd is null?
(In reply to comment #18)
> We have no crashes so far in builds that have the last patch. Given the
> frequency before that it is pretty good evidence that the patch fixed the
> crash.
> 
> So that means that the call to ::CreateWindowExW in nsWindow::Create fails. Why
> would that fail? Maybe nsWindow::OnDestroy has fired on the parent widget and
> so it's mWnd is null?

Yep, if the parent hwnd is null and you're creating a child, it'll fail. We have a warning in there on this that you should see.
If the reason for this crash is that OnDestroy has been called on the parent widget then I think the correct fix here is just to correctly handle that error case.
Adding [has patch] to whiteboard to indicate that we've got a fix for this issue and make some bugzilla accounting a bit easier. If you decide to go for a better fix in this bug, feel free to remove this whiteboard addition.
Whiteboard: [hardblocker] → [hardblocker][has patch][wallpaper fix landed]
I disagree that it is wallpaper. This seems to be a valid state and we need to deal with errors.

Before resolving this I'm going to wait for beta 11 data. I will do any further work in a followup.
Whiteboard: [hardblocker][has patch][wallpaper fix landed] → [hardblocker][waiting for more crash data before resolving]
I'm sorry for mis-classifying as wallpaper. I wasn't making any judgement, I just misread one of your comments about the patch. I've removed that bit from the whiteboard but am keeping [has patch] since this bug does have a patch and any future patching will, if I understand it correctly, happen in a follow-up bug. Sorry also for the interruption and confusion.
Whiteboard: [hardblocker][waiting for more crash data before resolving] → [hardblocker][has patch][waiting for more crash data before resolving]
Calling this fixed. If it turns out its not, we can reopen or a new bug.

The followup mentioned in comment 22 was just to check the return value in callers of CreateWidget. Which caused orange on try server, so maybe I'll just stop poking the sleeping-plugin-bear and call this a win.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Whiteboard: [hardblocker][has patch][waiting for more crash data before resolving] → [hardblocker]
(FTR the the try server failure in comment 24 was layout/reftests/bugs/599476.html on mac.)
Hmm, still a few crashes in beta 11. The volume is vastly reduced though.
We got one crash in a win64 build, and this one actually has a deeper stack. And the result is interesting because we are thawing a document when it comes out of the bfcache. But still I don't see how it could fail. The view hierarchy etc should be intact at this point. I just wanted to preserve this somewhere in case I tackle this crash signature again.

https://crash-stats.mozilla.com/report/index/ddc73da6-d950-44fd-9681-9a4912110218

0 	xul.dll 	nsPluginInstanceOwner::CreateWidget 	layout/generic/nsObjectFrame.cpp:6627
1 	xul.dll 	nsPluginHost::DoInstantiateEmbeddedPlugin 	modules/plugin/base/src/nsPluginHost.cpp:1104
2 	mozcrt19.dll 	arena_dalloc 	obj-firefox/memory/jemalloc/crtsrc/jemalloc.c:4291
3 	mozcrt19.dll 	arena_dalloc 	obj-firefox/memory/jemalloc/crtsrc/jemalloc.c:4291
4 	mozcrt19.dll 	memcmp 	obj-firefox/memory/jemalloc/crtsrc/memcmp.c:60
5 	xul.dll 	SearchTable 	obj-firefox/xpcom/build/pldhash.c:439
6 	xul.dll 	mozilla::FramePropertyTable::Get 	layout/base/FramePropertyTable.cpp:105
7 	xul.dll 	nsACString_internal::Equals 	xpcom/string/src/nsTSubstring.cpp:600
8 	xul.dll 	nsComponentManagerImpl::GetService 	xpcom/components/nsComponentManager.cpp:1425
9 	xul.dll 	nsDocument::QueryInterface 	content/base/src/nsDocument.cpp:1730
10 	xul.dll 	nsPluginHost::InstantiateEmbeddedPlugin 	modules/plugin/base/src/nsPluginHost.cpp:970
11 	xul.dll 	nsCOMPtr_base::assign_from_qi 	obj-firefox/xpcom/build/nsCOMPtr.cpp:96
12 	xul.dll 	nsObjectFrame::InstantiatePlugin 	layout/generic/nsObjectFrame.cpp:1141
13 	xul.dll 	nsCOMPtr_base::assign_with_AddRef 	obj-firefox/xpcom/build/nsCOMPtr.cpp:88
14 	xul.dll 	nsObjectFrame::Instantiate 	layout/generic/nsObjectFrame.cpp:2692
15 	xul.dll 	nsObjectLoadingContent::Instantiate 	content/base/src/nsObjectLoadingContent.cpp:1893
16 	xul.dll 	matchPrefEntry 	modules/libpref/src/prefapi.cpp:111
17 	xul.dll 	NS_TableDrivenQI 	obj-firefox/xpcom/build/nsISupportsImpl.cpp:49
18 	xul.dll 	nsObjectLoadingContent::EnsureInstantiation 	content/base/src/nsObjectLoadingContent.cpp:917
19 	xul.dll 	nsCOMPtr_base::assign_from_qi 	obj-firefox/xpcom/build/nsCOMPtr.cpp:96
20 	xul.dll 	ThawElement 	layout/base/nsPresShell.cpp:7628
21 	xul.dll 	xul.dll@0x364b8f 	
22 	xul.dll 	EnumWalkAllRules 	content/xbl/src/nsBindingManager.cpp:1350
23 	xul.dll 	PL_DHashTableEnumerate 	obj-firefox/xpcom/build/pldhash.c:754
24 	xul.dll 	nsIDocument::EnumerateFreezableElements 	content/base/src/nsDocument.cpp:8040
25 	xul.dll 	xul.dll@0x364b8f 	
26 	xul.dll 	xul.dll@0x363993 	
27 	xul.dll 	xul.dll@0x62c83f 	
28 	xul.dll 	PresShell::Thaw 	layout/base/nsPresShell.cpp:7651
29 	xul.dll 	nsDocShell::RestoreFromHistory 	docshell/base/nsDocShell.cpp:7347
Note that Win64 stacks are slightly less trustworthy, since I never actually fixed bug 548035, so they rely exclusively on stack scanning to walk the stack.
Bug 507926 seems related.
Crash Signature: [@ nsPluginInstanceOwner::CreateWidget() ]
You need to log in before you can comment on or make changes to this bug.