Closed Bug 798446 Opened 13 years ago Closed 13 years ago

startup crash in nsGlobalWindow::nsGlobalWindow

Categories

(Core :: DOM: Core & HTML, defect)

17 Branch
defect
Not set
critical

Tracking


RESOLVED FIXED
mozilla20
Tracking Status
firefox18 + fixed
firefox19 + fixed
firefox20 + verified
firefox-esr17 + verified

People

(Reporter: scoobidiver, Assigned: mounir)

Details

(Keywords: crash, regression, topcrash, Whiteboard: [native-crash][startupcrash])

Attachments

(1 file)

It started spiking in 17.0a1/20120803 and is #21 top browser crasher w/o hangs in 17.0a2. The regression range might be: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=588424024294&tochange=89dcadd42ec4

Signature: PL_DHashTableOperate | nsGlobalWindow::nsGlobalWindow(nsGlobalWindow*)
UUID: e883fc2d-86fc-43ed-bc4e-ce2382121005
Date Processed: 2012-10-05 13:48:05
Uptime: 0
Last Crash: 4 seconds before submission
Install Age: 5 seconds since version was first installed.
Install Time: 2012-10-05 13:47:47
Product: Firefox
Version: 18.0a1
Build ID: 20121005030609
Release Channel: nightly
OS: Windows NT
OS Version: 6.1.7601 Service Pack 1
Build Architecture: x86
Build Architecture Info: GenuineIntel family 6 model 15 stepping 13
Crash Reason: EXCEPTION_ACCESS_VIOLATION_READ
Crash Address: 0x0
User Comments:
Processor Notes: WARNING: JSON file missing Add-ons
EMCheckCompatibility: False
Total Virtual Memory: 2147352576
Available Virtual Memory: 1984593920
System Memory Use Percentage: 42
Available Page File: 3053518848
Available Physical Memory: 1236635648

Frame Module Signature Source
0 xul.dll PL_DHashTableOperate obj-firefox/xpcom/build/pldhash.cpp:577
1 xul.dll nsGlobalWindow::nsGlobalWindow dom/base/nsGlobalWindow.cpp:786
2 xul.dll nsGlobalChromeWindow::nsGlobalChromeWindow dom/base/nsGlobalWindow.h:1127
3 xul.dll NS_NewScriptGlobalObject dom/base/nsGlobalWindow.h:1186
4 xul.dll nsDocShell::EnsureScriptEnvironment docshell/base/nsDocShell.cpp:11243
5 xul.dll nsDocShell::GetInterface docshell/base/nsDocShell.cpp:941
6 xul.dll nsGetInterface::operator obj-firefox/xpcom/build/nsIInterfaceRequestorUtils.cpp:19
7 xul.dll nsCOMPtr_base::assign_from_helper obj-firefox/xpcom/build/nsCOMPtr.cpp:110
8 xul.dll nsAppShellService::UnregisterTopLevelWindow xpfe/appshell/src/nsAppShellService.cpp:523
9 xul.dll nsXULWindow::Destroy xpfe/appshell/src/nsXULWindow.cpp:432
10 xul.dll nsXULWindow::~nsXULWindow xpfe/appshell/src/nsXULWindow.cpp:124
11 xul.dll nsWebShellWindow::`scalar deleting destructor'
12 xul.dll nsWebShellWindow::Release xpfe/appshell/src/nsWebShellWindow.cpp:102
13 xul.dll nsRefPtr<nsAsyncDOMEvent>::~nsRefPtr<nsAsyncDOMEvent> obj-firefox/dist/include/nsAutoPtr.h:874
14 xul.dll nsAppShellService::JustCreateTopWindow xpfe/appshell/src/nsAppShellService.cpp:257
15 xul.dll nsAppShellService::CreateHiddenWindow xpfe/appshell/src/nsAppShellService.cpp:105
16 xul.dll nsAppStartup::CreateHiddenWindow toolkit/components/startup/nsAppStartup.cpp:259
17 xul.dll XREMain::XRE_mainRun toolkit/xre/nsAppRunner.cpp:3716
18 xul.dll XREMain::XRE_main toolkit/xre/nsAppRunner.cpp:3848
19 xul.dll XRE_main toolkit/xre/nsAppRunner.cpp:3923
20 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:105
21 firefox.exe __tmainCRTStartup crtexe.c:552
22 kernel32.dll BaseThreadInitThunk
23 ntdll.dll __RtlUserThreadStart
24 ntdll.dll _RtlUserThreadStart

More reports at: https://crash-stats.mozilla.com/report/list?signature=PL_DHashTableOperate+|+nsGlobalWindow%3A%3AnsGlobalWindow%28nsGlobalWindow*%29
The only way I see for that to happen is if nsLayoutStatics::Initialize() isn't called before the nsGlobalWindow ctor. I do not know our startup code well enough to tell whether that's actually possible. Actually, I wonder why we do not call nsLayoutStatics::Initialize() as soon as it is AddRef'd. Is that deliberately avoided as a performance optimization? As a caller, I would assume that something AddRef'd is alive and working.
I would think that nsLayoutStatics::Initialize is called before any other code in the layout module runs...
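To make the eager-initialization idea from the previous comment concrete: the sketch below (C++17) ties a module's static setup to its reference count, so the first AddRef() runs Initialize() and the last Release() runs Shutdown(). It is a minimal illustration under assumptions; `StaticsHolder` and its members are hypothetical names, not Mozilla's actual nsLayoutStatics implementation.

```cpp
#include <cassert>

// Hypothetical stand-in for module-wide static state (not Mozilla's
// nsLayoutStatics): the statics come up on the first AddRef() and go
// away on the last Release().
class StaticsHolder {
 public:
  static void AddRef() {
    if (sRefCnt++ == 0) {
      Initialize();  // eager: the first reference initializes everything
    }
  }

  static void Release() {
    assert(sRefCnt > 0 && "Release() without matching AddRef()");
    if (--sRefCnt == 0) {
      Shutdown();  // the last reference tears everything down
    }
  }

  static bool IsInitialized() { return sInitialized; }
  static int RefCount() { return sRefCnt; }

 private:
  static void Initialize() { sInitialized = true; }  // real setup would go here
  static void Shutdown() { sInitialized = false; }   // real teardown would go here

  static inline int sRefCnt = 0;
  static inline bool sInitialized = false;
};
```

With this design a caller holding a reference can never observe uninitialized statics; the trade-off is that the initialization cost is paid at the first reference rather than at a point the module chooses.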
Crash Signature: [@ PL_DHashTableOperate | nsGlobalWindow::nsGlobalWindow(nsGlobalWindow*)] → [@ PL_DHashTableOperate | nsGlobalWindow::nsGlobalWindow(nsGlobalWindow*)] [@ PL_DHashTableOperate | nsBaseHashtable<nsUint64HashKey, nsGlobalWindow*, nsGlobalWindow*>::Put]
OS: Windows 7 → All
Hardware: x86 → All
Whiteboard: [startupcrash] → [native-crash][startupcrash]
I do not see any reason why |sWindowsById| could be null except if |nsGlobalWindow::Init()| wasn't called... unless nsGlobalWindow::ShutDown() was called in the meantime?
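The lifecycle in question can be modeled as a static table pointer that is only non-null between Init() and ShutDown(). This is a hypothetical C++17 sketch (`WindowRegistry`, `Table`, and `CanRegister()` are invented for illustration, not the real nsGlobalWindow members) showing the two windows in which a constructor would find the pointer null: before Init() runs and after ShutDown() clears it.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

class Window;  // only pointers are stored, so a declaration suffices

// Hypothetical model of the lifecycle discussed above: the id table exists
// only between Init() and ShutDown().
class WindowRegistry {
 public:
  using Table = std::unordered_map<std::uint64_t, Window*>;

  static void Init() {
    if (!sWindowsById) {
      sWindowsById = new Table();
    }
  }

  static void ShutDown() {
    delete sWindowsById;     // deleting nullptr is a no-op
    sWindowsById = nullptr;  // constructors running after this see null
  }

  // True only in the state where a constructor may safely insert.
  static bool CanRegister() { return sWindowsById != nullptr; }

 private:
  static inline Table* sWindowsById = nullptr;  // also null before Init()
};
```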
This showed up on the explosive report today. Here are the recent module correlations; there are no add-on correlations showing in today's manual report:

PL_DHashTableOperate | nsGlobalWindow::nsGlobalWindow(nsGlobalWindow*)|EXCEPTION_ACCESS_VIOLATION_READ (169 crashes)
  62% (104/169) vs. 6% (2044/31616) browsemngr.dll
  76% (128/169) vs. 34% (10712/31616) wshtcpip.dll
  76% (128/169) vs. 34% (10717/31616) hnetcfg.dll
  66% (111/169) vs. 24% (7689/31616) MSCTF.dll
  76% (128/169) vs. 38% (12091/31616) ws2help.dll
  76% (128/169) vs. 38% (12091/31616) iphlpapi.dll
  75% (127/169) vs. 38% (11941/31616) comres.dll
  99% (168/169) vs. 63% (20037/31616) browsercomps.dll
  100% (169/169) vs. 64% (20250/31616) firefox.exe
  100% (169/169) vs. 64% (20266/31616) xpcom.dll
  51% (87/169) vs. 16% (5050/31616) MSCTFIME.IME
  100% (169/169) vs. 65% (20489/31616) dbghelp.dll
  100% (169/169) vs. 72% (22848/31616) mswsock.dll
  76% (129/169) vs. 51% (16156/31616) secur32.dll
  62% (105/169) vs. 48% (15231/31616) imagehlp.dll
  7% (11/169) vs. 0% (134/31616) SC2Hook.dll

Module versions:

PL_DHashTableOperate | nsGlobalWindow::nsGlobalWindow(nsGlobalWindow*)|EXCEPTION_ACCESS_VIOLATION_READ (169 crashes)
  62% (104/169) vs. 6% (2044/31616) browsemngr.dll
    0% (0/169) vs. 0% (4/31616) 2.2.565.25
    0% (0/169) vs. 0% (8/31616) 2.2.630.40
    0% (0/169) vs. 0% (33/31616) 2.2.643.41
    2% (3/169) vs. 0% (156/31616) 2.3.762.17
    2% (3/169) vs. 1% (275/31616) 2.3.765.24
    14% (23/169) vs. 3% (797/31616) 2.3.787.43
    44% (75/169) vs. 2% (771/31616) 2.3.796.11

browsemngr.dll has been problematic before and is cited in numerous other bugs. The one URL in the list was http://start.funmoods.com/?f=1&a=Cmiwbst, which points to the funmoods toolbar being involved.
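Each correlation row compares two frequencies: the share of crashes with this signature that loaded a module versus the share of all sampled crashes that loaded it. A small C++ sketch of that arithmetic, assuming percentages are rounded to the nearest integer (which matches every row in the report, e.g. 11/169 ≈ 6.5% printed as 7%); the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>

// One row of a module correlation: how often a module is loaded in crashes
// with this signature vs. in all crashes of the sample.
struct Correlation {
  int inSignature;     // crashes with the signature that loaded the module
  int signatureTotal;  // all crashes with the signature
  int inAll;           // all sampled crashes that loaded the module
  int allTotal;        // all sampled crashes
};

// Percentage rounded to the nearest integer, as printed in the report.
int RoundedPercent(int part, int total) {
  assert(total > 0);
  return static_cast<int>(std::lround(100.0 * part / total));
}

// Formats a row in the same layout as the report.
std::string Format(const Correlation& c, const std::string& module) {
  char buf[128];
  std::snprintf(buf, sizeof buf, "%d%% (%d/%d) vs. %d%% (%d/%d) %s",
                RoundedPercent(c.inSignature, c.signatureTotal),
                c.inSignature, c.signatureTotal,
                RoundedPercent(c.inAll, c.allTotal), c.inAll, c.allTotal,
                module.c_str());
  return buf;
}

// How many times more common the module is among these crashes than overall.
double Ratio(const Correlation& c) {
  return (static_cast<double>(c.inSignature) / c.signatureTotal) /
         (static_cast<double>(c.inAll) / c.allTotal);
}
```

For browsemngr.dll, 104/169 vs. 2044/31616 gives a ratio of roughly 9.5, i.e. the module is about nine and a half times more common in these crashes than in the sample as a whole.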
browsemngr.dll is also known to come bundled with Babylon (see bug 782706#c3). A browser manager extension was recently blocked (see https://addons.mozilla.org/firefox/blocked/i167), but not the DLL, which is loaded independently of the extension.
It's #18 top browser crasher w/o hangs in 17.0b6, #22 in 18.0a2 and #66 in 19.0a1. It's no longer correlated to browsemngr.dll.
Crash Signature: [@ PL_DHashTableOperate | nsGlobalWindow::nsGlobalWindow(nsGlobalWindow*)] [@ PL_DHashTableOperate | nsBaseHashtable<nsUint64HashKey, nsGlobalWindow*, nsGlobalWindow*>::Put] → [@ PL_DHashTableOperate | nsGlobalWindow::nsGlobalWindow(nsGlobalWindow*)] [@ PL_DHashTableOperate | nsBaseHashtable<nsUint64HashKey, nsGlobalWindow*, nsGlobalWindow*>::Put] [@ PL_DHashTableOperate | nsTHashtable<nsBaseHashtableET<nsUint64HashKey nsGlob…
Keywords: topcrash
(In reply to Scoobidiver from comment #6)
> It's #18 top browser crasher w/o hangs in 17.0b6, #22 in 18.0a2 and #66 in
> 19.0a1.
>
> It's no longer correlated to browsemngr.dll.

Are there any other correlations or URLs that QA should be specifically exploring?

(In reply to Mounir Lamouri (:mounir) from comment #3)
> I do not see any reason why |sWindowsById| could be null except if
> |nsGlobalWindow::Init()| wasn't called... unless nsGlobalWindow::ShutDown()
> was called in the meantime?

Is there anything we can do to guard against this, even if we don't understand the root cause at this point?
Assignee: nobody → mounir
Keywords: needURLs
We have almost no URLs for this one (probably because we're too early in startup):
2 http://www.seznam.cz/
1 http://www.google.hu/
1 http://adpica.mediaweb.co.kr/RealMedia/ads/adstream_sx.ads/ID_push/start_browser...
1 http://eursapp01.eur.galderma.com:8000/sap(bD1lbg==)/public/bsp/sap/system/sessi...

The signature from comment #0 sits at topcrash rank #18 for 17.0 release. 99% of those crashes are within the first minute of running, so this is very clearly a startup crash. The comments only tell us that people have no idea why "Firefox doesn't open". Correlations don't give away anything really useful.
Keywords: needURLs
Attached patch Patch
Let's try that. I'm not sure how |sWindowsById| could be null during a normal startup, but not adding the id to the list wouldn't be that bad, and it might fix the crash. I kept the ASSERTIONs outside of the condition on purpose, so a debug build will still crash and, hopefully, we will be able to understand what is happening.
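The guard described in the patch comment can be sketched as follows. This is a hypothetical C++17 illustration, not the landed patch: `WindowSketch` and `sAssertionFailures` are invented, and the debug check only counts failures so the example stays runnable, whereas a real debug-build assertion would report or abort. The point is the ordering: the check sits outside the null guard, so debug builds still flag the bad state while release builds simply skip registration.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical sketch of the guarded registration (not the landed patch).
class WindowSketch {
 public:
  using Table = std::unordered_map<std::uint64_t, WindowSketch*>;

  static inline Table* sWindowsById = nullptr;  // set up by some Init() step
  static inline int sAssertionFailures = 0;     // stand-in for debug asserts

  explicit WindowSketch(std::uint64_t id) : mWindowID(id) {
    // Debug check deliberately OUTSIDE the null guard, so the bad state
    // is still flagged. It only counts here; a real debug build would
    // report or abort instead.
    DebugCheck(sWindowsById != nullptr);

    // Release-build guard: skip registration instead of dereferencing null.
    if (sWindowsById) {
      (*sWindowsById)[mWindowID] = this;
    }
  }

  ~WindowSketch() {
    if (sWindowsById) {
      sWindowsById->erase(mWindowID);  // removal tolerates a missing table too
    }
  }

  WindowSketch(const WindowSketch&) = delete;
  WindowSketch& operator=(const WindowSketch&) = delete;

 private:
  static void DebugCheck(bool ok) {
    if (!ok) {
      ++sAssertionFailures;
    }
  }

  std::uint64_t mWindowID;
};
```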
Attachment #685667 - Flags: review?(bzbarsky)
Attachment #685667 - Flags: review?(bzbarsky) → review+
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla20
Attachment #685667 - Flags: checkin+
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
(In reply to Mounir Lamouri (:mounir) from comment #11)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/cdb336b77f97

Please nominate for uplift as soon as you're comfortable with the change's bake time on m-c.
Comment on attachment 685667
Patch

Asking for approval given that there is no reason to have this bake in m-c. However, I would like some feedback from QA to make sure that the fix actually prevents the crash.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): no idea
User impact if declined: startup crashes
Risk to taking this patch (and alternatives if risky): the patch is not risky at all per se, because it only checks that a pointer isn't null before dereferencing it. However, skipping that call might create bugs (though I think it is unlikely: most of the code related to that pointer assumes it can be null, because it is null during shutdown). Anyway, as long as we consider a crash worse than a runtime bug, I think the risk/benefit ratio is pretty low. We should take this on branches.
String or UUID changes made by this patch: no
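The risk argument rests on readers of the table already treating a null table as empty. A hypothetical sketch of what that means (the free functions `RegisterWindow()` and `GetWindowById()` are invented for illustration, not Mozilla APIs): when registration is skipped, a later lookup by id returns null instead of crashing, which is the kind of runtime bug the request weighs against a startup crash.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

struct Window {
  std::uint64_t id;
};

using Table = std::unordered_map<std::uint64_t, Window*>;

// Registration that is skipped when the table does not exist.
void RegisterWindow(Table* table, Window* window) {
  assert(window != nullptr);
  if (table) {
    (*table)[window->id] = window;
  }
}

// Null-tolerant lookup: a missing table and a missing id both yield nullptr.
Window* GetWindowById(const Table* table, std::uint64_t id) {
  if (!table) {
    return nullptr;
  }
  auto it = table->find(id);
  return it == table->end() ? nullptr : it->second;
}
```

If the table comes into existence after a registration was skipped, lookups for that window still return nullptr: a functional bug rather than a crash, which is the trade-off the approval request describes.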
Attachment #685667 - Flags: approval-mozilla-beta?
Attachment #685667 - Flags: approval-mozilla-aurora?
If this really fixes the crash, I'd be happy to get this on beta, as it's still the #22 crash in 18.0b1 and 99% of those happening at startup.
For approval considerations: Unfortunately I can't tell from crash-stats if the fix worked or not, because this seems to happen extremely rarely with the low number of users we have on Nightly and Aurora. We'll need it in beta to verify the fix.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #16)
> For approval considerations: Unfortunately I can't tell from crash-stats if
> the fix worked or not because this seems to happen extremely rarely with the
> low level of users we have on Nightly and Aurora, we'll need it in beta to
> verify the fix.

Thanks KaiRo, we'll move forward with landing this given the risk evaluation in Comment 14.
Attachment #685667 - Flags: approval-mozilla-beta?
Attachment #685667 - Flags: approval-mozilla-beta+
Attachment #685667 - Flags: approval-mozilla-aurora?
Attachment #685667 - Flags: approval-mozilla-aurora+
Just as a note, this is #11 in 18.0b2 right now. It should have landed in time for b3, so we should verify there whether it's gone.
(In reply to Mounir Lamouri (:mounir) from comment #18)
> https://hg.mozilla.org/releases/mozilla-aurora/rev/adf3fc9e7c16
> https://hg.mozilla.org/releases/mozilla-beta/rev/910cfe5dbba5
>
> Should we push that to esr if it happens to fix the crash?

It's just outside the top crash range, but if you wouldn't mind, it would be a good idea to land there as well (once verified).
Comment on attachment 685667
Patch

[Approval Request Comment]
According to the previous comment, we should land this on esr.
Attachment #685667 - Flags: approval-mozilla-esr17?
Attachment #685667 - Flags: approval-mozilla-esr17? → approval-mozilla-esr17+
No crashes in the last 4 weeks on FF 20 and ESR > 17.0. Verified fixed
Component: DOM → DOM: Core & HTML