Last Comment Bug 793370 - crash in nsWebShellWindow::Initialize
: crash in nsWebShellWindow::Initialize
Status: RESOLVED FIXED
[startupcrash]
: crash, topcrash
Product: Core
Classification: Components
Component: Document Navigation (show other bugs)
: 16 Branch
: All Windows XP
: -- critical (vote)
: ---
Assigned To: Bobby Holley (:bholley) (busy with Stylo)
:
: Andrew Overholt [:overholt]
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-09-21 23:52 PDT by Scoobidiver (away)
Modified: 2014-01-10 10:40 PST (History)
11 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
fixed
+
verified


Attachments
Null-check SSM. v1 (1.93 KB, patch)
2012-09-28 08:33 PDT, Bobby Holley (:bholley) (busy with Stylo)
bzbarsky: review+
bugs: review+
akeybl: approval‑mozilla‑aurora+
akeybl: approval‑mozilla‑beta+
Details | Diff | Splinter Review
beta (1.78 KB, patch)
2012-10-01 15:32 PDT, Olli Pettay [:smaug]
no flags Details | Diff | Splinter Review

Description Scoobidiver (away) 2012-09-21 23:52:23 PDT
There are a few startup crashes across all versions.

Signature 	nsWebShellWindow::Initialize(nsIXULWindow*, nsIXULWindow*, nsIURI*, int, int, bool, nsWidgetInitData&) More Reports Search
UUID	657104ce-1d23-4005-bbca-959272120922
Date Processed	2012-09-22 00:12:03
Uptime	0
Last Crash	16 seconds before submission
Install Age	0 seconds since version was first installed.
Install Time	2012-09-22 00:11:51
Product	Firefox
Version	16.0
Build ID	20120919065210
Release Channel	beta
OS	Windows NT
OS Version	6.1.7601 Service Pack 1
Build Architecture	x86
Build Architecture Info	AuthenticAMD family 15 model 107 stepping 2
Crash Reason	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address	0x0
User Comments	
Processor Notes 	WARNING: JSON file missing Add-ons
EMCheckCompatibility	False
Adapter Vendor ID	
Adapter Device ID	
Total Virtual Memory	2147352576
Available Virtual Memory	2013954048
System Memory Use Percentage	32
Available Page File	5375729664
Available Physical Memory	2163609600

Frame 	Module 	Signature 	Source
0 	xul.dll 	nsWebShellWindow::Initialize 	xpfe/appshell/src/nsWebShellWindow.cpp:215
1 	xul.dll 	nsAppShellService::JustCreateTopWindow 	xpfe/appshell/src/nsAppShellService.cpp:351
2 	xul.dll 	nsAppShellService::CreateHiddenWindow 	xpfe/appshell/src/nsAppShellService.cpp:105
3 	xul.dll 	nsAppStartup::CreateHiddenWindow 	toolkit/components/startup/nsAppStartup.cpp:226
4 	xul.dll 	XREMain::XRE_mainRun 	toolkit/xre/nsAppRunner.cpp:3722
5 	xul.dll 	XREMain::XRE_main 	toolkit/xre/nsAppRunner.cpp:3871
6 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3947
7 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:100
8 	firefox.exe 	__tmainCRTStartup 	crtexe.c:552
9 	kernel32.dll 	BaseThreadInitThunk 	
10 	ntdll.dll 	__RtlUserThreadStart 	
11 	ntdll.dll 	_RtlUserThreadStart

More reports at:
https://crash-stats.mozilla.com/report/list?signature=nsWebShellWindow%3A%3AInitialize%28nsIXULWindow*%2C+nsIXULWindow*%2C+nsIURI*%2C+int%2C+int%2C+bool%2C+nsWidgetInitData%26%29
Comment 1 Scoobidiver (away) 2012-09-24 03:24:37 PDT
It's #19 top browser crasher in 16.0b4.

There are no obvious correlations (no extensions loaded when the crash happens on startup) but it started spiking around September 20 so likely related to an extension update.
Comment 2 Alex Keybl [:akeybl] 2012-09-24 16:05:45 PDT
(In reply to Scoobidiver from comment #1)
> It's #19 top browser crasher in 16.0b4.
> 
> There are no obvious correlations (no extensions loaded when the crash
> happens on startup) but it started spiking around September 20 so likely
> related to an extension update.

I'm not sure whether this could be caused by an in-product change or external SW update. On one hand, crashes across multiple build IDs (as far back as FF14/20120612164001) all started around 9/22. On the other hand, I'd expect the prevalence of crashes in beta 3 to be about as high as beta 4 if we didn't have a hand in the issue... and FF15.0.1 is really underrepresented in crashes.

At the very least, this is much more common in FF16 than in FF15. Is there any other actionable info (DLL correlations, probably not URLs) that could help the engineering investigation?
Comment 3 Olli Pettay [:smaug] 2012-09-24 16:11:36 PDT
Based on blame, Bug 789773 might have something to do with this.
Comment 4 Scoobidiver (away) 2012-09-25 00:11:08 PDT
(In reply to Alex Keybl [:akeybl] from comment #2)
> Is there any other actionable info (DLL correlations, probably not URLs) that
> could help the engineering investigation?
I checked crash reports that haven't happened too soon and there's no correlation to an extension. There's also no correlation to non-Windows DLLs.
I suspected MS12-063 (http://technet.microsoft.com/en-us/security/bulletin/ms12-063) but crashes happen also on Windows 8 which doesn't have the hotfix.

(In reply to Olli Pettay [:smaug] from comment #3)
> Based on blame, Bug 789773 might have something to do with this.
I don't think so because in Aurora (resp. Beta) it first appeared in 17.0a2/20120918 (resp. 16.0b3) while the patch of bug 789773 landed in 17.0a2/20120919 (resp. 16.0b4).
Comment 5 Marcia Knous [:marcia - use ni] 2012-09-25 07:38:29 PDT
Only one URL shows at the moment: http://www.youtube.com/watch?v=Q9_d-90GZZc&feature=relmfu
Comment 6 Olli Pettay [:smaug] 2012-09-26 17:09:49 PDT
Any crash ids for crashes before bug 789773 ?
Comment 7 Olli Pettay [:smaug] 2012-09-26 17:15:53 PDT
https://crash-stats.mozilla.com/report/index/efe1463f-279c-4f4d-8fd9-98c3e2120921
seems to be older, but it has 
http://hg.mozilla.org/releases/mozilla-aurora/annotate/006174be2306/xpfe/appshell/src/nsWebShellWindow.cpp#l209
which is from Bug 789773.

Can anyone find any crash report from a build which doesn't have that code?
Comment 8 Olli Pettay [:smaug] 2012-09-26 17:21:22 PDT
Based on all the crash reports I've seen ssm is null.
Is it possible that we for some odd reason have started the shutdown so that
services aren't available anymore?
Or do we run this code before NS_SCRIPTSECURITYMANAGER_CONTRACTID has been registered?
Comment 9 Bobby Holley (:bholley) (busy with Stylo) 2012-09-27 01:25:23 PDT
(In reply to Olli Pettay [:smaug] from comment #8)
> Or do we run this code before NS_SCRIPTSECURITYMANAGER_CONTRACTID has been
> registered?

Probably that one. When I wrote the initial patch, I tried using nsContentUtils::GetSecurityManager, but it was actually null, because nsContentUtils::Init hadn't been called yet. Given that we're creating the hidden DOM window, this is probably pretty early on in startup.

I think a simple null check for the SSM is probably the way to go here. Smaug, do you agree?
Comment 10 Olli Pettay [:smaug] 2012-09-27 01:49:54 PDT
And if ssm is null, then what?
Why are we running this code so early?
Comment 11 Bobby Holley (:bholley) (busy with Stylo) 2012-09-27 01:56:06 PDT
(In reply to Olli Pettay [:smaug] from comment #10)
> And if ssm is null, then what?

Then we don't eagerly create the initial about:blank document, and let the original machinery do its thing. Presumably, if we're running early enough that the SSM is null, then we also aren't running any addon scripts that might add themselves as nsIWebProgressListeners and trigger bug 789773.
Comment 12 Olli Pettay [:smaug] 2012-09-27 02:00:43 PDT
What would guarantee that?

Why do we call nsWebShellWindow::Initialize so early?
Could you try to use nsContentUtils::GetSecurityManager and if it is returning null, 
check the stack trace.
Comment 13 Bobby Holley (:bholley) (busy with Stylo) 2012-09-27 02:40:53 PDT
(In reply to Olli Pettay [:smaug] from comment #12)
> What would guarantee that?

Because I can't imagine that we could run script before the entity that generates principals (the SSM) is accessible.

> Why do we call nsWebShellWindow::Initialize so early?
> Could you try to use nsContentUtils::GetSecurityManager and if it is
> returning null, 
> check the stack trace.

Oh, nevermind - looking back at bug 749535, the issue was that this was causing the standalone TestAppShellSteadyState to break, because that test never calls nsContentUtils::Init. So scratch that data point.
Comment 14 Bobby Holley (:bholley) (busy with Stylo) 2012-09-28 08:33:51 PDT
Created attachment 665919 [details] [diff] [review]
Null-check SSM. v1

I think this is better than crashing.
Comment 15 Olli Pettay [:smaug] 2012-09-28 09:18:13 PDT
Comment on attachment 665919 [details] [diff] [review]
Null-check SSM. v1

This is horrible but ok to me as a *temporary* fix.
But please check why the method gets called too early
(or too late).

bz should comment too since he reviewed Bug 789773
Comment 16 Bobby Holley (:bholley) (busy with Stylo) 2012-09-28 09:26:54 PDT
(In reply to Olli Pettay [:smaug] from comment #15)
> Comment on attachment 665919 [details] [diff] [review]
> Null-check SSM. v1
> 
> This is horrible but ok to me as a *temporary* fix.
> But please check why the method gets called too early
> (or too late).

Until QA can reproduce the crash, I don't think we can easily do that. Unless we understand by inspection how a hidden DOM window can be created before the service manager is set up? I know very little about startup...
Comment 17 Olli Pettay [:smaug] 2012-09-28 09:35:32 PDT
Yes, that is what I hope someone could check. How do we create this stuff during startup.
Is there some path where we call this stuff during component init/registration?
Comment 18 Boris Zbarsky [:bz] (still a bit busy) 2012-09-28 10:30:46 PDT
Comment on attachment 665919 [details] [diff] [review]
Null-check SSM. v1

Yeah, I can live with this.
Comment 19 Scoobidiver (away) 2012-09-28 10:37:06 PDT
There are no crashes after 18.0a1/20120925 and 17.0a2/20120926, but there are crashes in 16.0b5.
Comment 20 Alex Keybl [:akeybl] 2012-09-28 14:41:32 PDT
Comment on attachment 665919 [details] [diff] [review]
Null-check SSM. v1

[Triage Comment]
If we think this is low risk enough to take for our final beta, please land on m-c and m-a as soon as possible.

(this m-a pre-approval is contingent upon a green m-c landing)
Comment 21 Bobby Holley (:bholley) (busy with Stylo) 2012-09-28 16:44:48 PDT
I'm about to go AFK for a couple of days - if someone else could take over landing this I'd be grateful. :-)
Comment 22 Olli Pettay [:smaug] 2012-09-30 12:13:07 PDT
Landing...
Comment 24 Alex Keybl [:akeybl] 2012-10-01 10:20:57 PDT
(In reply to Olli Pettay [:smaug] from comment #23)
> https://hg.mozilla.org/mozilla-central/rev/a4ea3c56646b
> https://hg.mozilla.org/releases/mozilla-aurora/rev/307c0b460b7e
> 
> Do we need this to beta too?

Yes please - this is our top crash with what we hope is a very low risk fix (a null check).
Comment 25 Olli Pettay [:smaug] 2012-10-01 14:00:10 PDT
Comment on attachment 665919 [details] [diff] [review]
Null-check SSM. v1

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 789773
User impact if declined: crashes
Testing completed (on m-c, etc.): landed to m-c and aurora yesterday 
Risk to taking this patch (and alternatives if risky): null check, but I don't
know if this can regress Bug 789773 in some way. bholley?
String or UUID changes made by this patch: NA
Comment 26 Bobby Holley (:bholley) (busy with Stylo) 2012-10-01 14:16:14 PDT
(In reply to Olli Pettay [:smaug] from comment #25)
> Risk to taking this patch (and alternatives if risky): null check, but I
> don't
> know if this can regress Bug 789773 in some way. bholley?

I don't believe it can, per comment 13.
Comment 27 Alex Keybl [:akeybl] 2012-10-01 14:57:01 PDT
Comment on attachment 665919 [details] [diff] [review]
Null-check SSM. v1

[Triage Comment]
Thanks for getting this ready for Beta landing - approving. Please land asap (go to build is in hours).
Comment 28 Olli Pettay [:smaug] 2012-10-01 15:32:00 PDT
Created attachment 666744 [details] [diff] [review]
beta

The previous patch needed some fuzz to apply to beta because of 
nsnull -> nullptr change.
Comment 30 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-10-16 15:44:40 PDT
Please verify by checking Socorro.
Comment 31 Paul Silaghi, QA [:pauly] 2012-10-17 07:24:55 PDT
I don't see any crashes on FF 17b1, but there are still some crashes on FF 16.0.1. So, I wouldn't say it's fixed on 16?
Comment 32 Olli Pettay [:smaug] 2012-10-17 09:18:50 PDT
Paul, do you have any crash ids for the FF16.* crashes ?
Comment 33 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-10-17 11:41:25 PDT
Marking this verified fixed for Firefox 17 based on comment 31.
Comment 34 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-10-17 11:43:14 PDT
(In reply to Olli Pettay [:smaug] from comment #32)
> Paul, do you have any crash ids for the FF16.* crashes ?

I only see one Firefox 16.0.1 crash in the last week with this signature:
https://crash-stats.mozilla.com/report/index/8234e55f-9c4c-48ec-8f0d-174d12121017
Comment 35 Tracy Walker [:tracy] 2014-01-10 10:40:59 PST
mass remove verifyme requests greater than 4 months old

Note You need to log in before you can comment on or make changes to this bug.