Closed Bug 709860 Opened 13 years ago Closed 12 years ago

OOM crash in nsIDNService::Init @ mozilla::Preferences::GetBranch

Categories

(Core :: Networking: DNS, defect)

9 Branch
x86
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox13 + affected
firefox14 + wontfix

People

(Reporter: marcia, Assigned: benjamin)

References

Details

(Keywords: crash, regression, topcrash, Whiteboard: startupcrash)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is report bp-69c1b2ed-cbb4-4cd8-85fd-d3b782111210 .
============================================================= 

This ranks as the #33 crash in Firefox 9b5 data but did not have a bug associated with it. https://crash-stats.mozilla.com/report/list?signature=mozalloc_abort%28char%20const*%20const%29%20|%20mozalloc_handle_oom%28%29%20|%20mozilla::Preferences::GetBranch%28char%20const*,%20nsIPrefBranch**%29

Happened in small volume in 8.0 but has increased in Firefox 9 and happens up to b5. Will look for some correlations.

Frame 	Module 	Signature 	Source
0 	mozalloc.dll 	mozalloc_abort 	memory/mozalloc/mozalloc_abort.cpp:77
1 	mozalloc.dll 	mozalloc_handle_oom 	memory/mozalloc/mozalloc_oom.cpp:54
2 	xul.dll 	mozilla::Preferences::GetBranch 	modules/libpref/src/Preferences.cpp:563
3 	xul.dll 	nsIDNService::Init 	netwerk/dns/nsIDNService.cpp:88
4 	xul.dll 	nsIDNServiceConstructor 	netwerk/build/nsNetModule.cpp:352
5 	xul.dll 	mozilla::GenericFactory::CreateInstance 	obj-firefox/xpcom/build/GenericFactory.cpp:48
6 	xul.dll 	nsComponentManagerImpl::CreateInstanceByContractID 	xpcom/components/nsComponentManager.cpp:1299
7 	xul.dll 	nsComponentManagerImpl::GetServiceByContractID 	xpcom/components/nsComponentManager.cpp:1701
8 	xul.dll 	nsCOMPtr_base::assign_from_gs_contractid 	obj-firefox/xpcom/build/nsCOMPtr.cpp:132
9 	xul.dll 	nsDNSService::Init 	netwerk/dns/nsDNSService2.cpp:431
10 	xul.dll 	nsDNSServiceConstructor 	netwerk/build/nsNetModule.cpp:85
11 	xul.dll 	mozilla::GenericFactory::CreateInstance 	obj-firefox/xpcom/build/GenericFactory.cpp:48
12 	xul.dll 	nsComponentManagerImpl::CreateInstanceByContractID 	xpcom/components/nsComponentManager.cpp:1299
13 	xul.dll 	nsComponentManagerImpl::GetServiceByContractID 	xpcom/components/nsComponentManager.cpp:1701
14 	xul.dll 	nsCOMPtr_base::assign_from_gs_contractid_with_error 	obj-firefox/xpcom/build/nsCOMPtr.cpp:141
15 	xul.dll 	nsIOService::Init 	netwerk/base/src/nsIOService.cpp:199
16 	xul.dll 	nsIOService::GetInstance 	netwerk/base/src/nsIOService.cpp:321
17 	xul.dll 	nsIOServiceConstructor 	netwerk/build/nsNetModule.cpp:82
18 	xul.dll 	mozilla::GenericFactory::CreateInstance 	obj-firefox/xpcom/build/GenericFactory.cpp:48
19 	xul.dll 	nsComponentManagerImpl::CreateInstanceByContractID 	xpcom/components/nsComponentManager.cpp:1299
20 	xul.dll 	nsComponentManagerImpl::GetServiceByContractID 	xpcom/components/nsComponentManager.cpp:1701
21 	xul.dll 	nsCOMPtr_base::assign_from_gs_contractid 	obj-firefox/xpcom/build/nsCOMPtr.cpp:132
22 	xul.dll 	mozilla::services::GetIOService 	xpcom/build/ServiceList.h:8
23 	xul.dll 	nsChromeRegistryChrome::ManifestResource 	chrome/src/nsChromeRegistryChrome.cpp:1044
24 	xul.dll 	ParseManifestCommon 	xpcom/components/ManifestParser.cpp:649
25 	xul.dll 	ParseManifest 	xpcom/components/ManifestParser.cpp:687
26 	xul.dll 	nsComponentManagerImpl::RegisterJarManifest 	xpcom/components/nsComponentManager.cpp:583
27 	xul.dll 	XRE_AddJarManifestLocation 	xpcom/components/nsComponentManager.cpp:2131
28 	xul.dll 	LoadExtensionDirectories 	
29 	xul.dll 	nsXREDirProvider::LoadExtensionBundleDirectories 	toolkit/xre/nsXREDirProvider.cpp:557
30 	xul.dll 	nsXREDirProvider::DoStartup 	toolkit/xre/nsXREDirProvider.cpp:734
31 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3429
32 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:107
33 	firefox.exe 	firefox.exe@0x4033 	
34 	firefox.exe 	__tmainCRTStartup 	crtexe.c:594
35 	firefox.exe 	_SEH_epilog4 	
36 	kernel32.dll 	kernel32.dll@0x51113 	
37 	ntdll.dll 	__RtlUserThreadStart 	
38 	kernel32.dll 	kernel32.dll@0x62acc 	
39 	kernel32.dll 	kernel32.dll@0x62acc 	
40 	ntdll.dll 	LdrpGetShimEngineInterface 	
41 	ntdll.dll 	_RtlUserThreadStart 	
42 	firefox.exe 	pre_c_init 	crtexe.c:304
43 	firefox.exe 	pre_c_init 	crtexe.c:304
44 		@0x7ffd5fff
It's #121 top crasher in 9.0.1, #14 in 11.0a2, and #31 in 12.0a1.
Component: General → Networking: DNS
Product: Firefox → Core
QA Contact: general → networking.dns
Summary: Firefox startup crash mozalloc_abort → OOM crash in nsIDNService::Init @ mozilla::Preferences::GetBranch
Whiteboard: startupcrash
Crash Signature: [@ mozalloc_abort(char const* const) | mozalloc_handle_oom() | mozilla::Preferences::GetBranch(char const*, nsIPrefBranch**)] → [@ mozalloc_abort(char const* const) | mozalloc_handle_oom() | mozilla::Preferences::GetBranch(char const* nsIPrefBranch**)] [@ mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | mozilla::Preferences::GetBranch(char con…
A user's computer crashed and since then Firefox opens with Crash Reporter ( https://support.mozilla.org/en-US/questions/918384 )

Crash Report -> https://crash-stats.mozilla.com/report/index/9bc3a57b-aa15-4581-80b3-4bb122120210
It's now #3 top browser crasher in 9.0.1 and #25 in 11.0b2.
Keywords: topcrash
Depends on: 733261
Does this have the same regression range as bug 725280?
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #4)
> Does this have the same regression range as bug 725280?
It's the same bug indeed. It's only the signature that has changed since 12.0.
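To illustrate how one underlying crash can show up under several signatures, here is a minimal sketch. It assumes a signature is formed by joining the leading stack frames with " | ", continuing past any frame listed in a set of "prefix" frames and stopping at the first frame that is not in that set. The function name BuildSignature and the contents of the prefix set are hypothetical and are not taken from Socorro's code.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Join leading frames with " | ". A frame found in prefixFrames is
// considered uninformative on its own, so the next frame is appended too.
// The first frame that is not a prefix frame ends the signature.
// (Hypothetical helper; the prefix set is an assumption for illustration.)
std::string BuildSignature(const std::vector<std::string>& frames,
                           const std::set<std::string>& prefixFrames) {
    assert(!frames.empty());
    std::string sig;
    for (const std::string& frame : frames) {
        if (!sig.empty()) {
            sig += " | ";
        }
        sig += frame;
        if (prefixFrames.count(frame) == 0) {
            break;  // first informative frame reached
        }
    }
    return sig;
}
```

Under these assumptions, a build in which an extra allocator frame such as moz_xmalloc appears between mozalloc_handle_oom and mozilla::Preferences::GetBranch yields a different signature string for the same crash, which would be consistent with the signature changing between versions.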
Crash Signature: nsIPrefBranch**)] [@ mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | mozilla::Preferences::GetBranch(char const*, nsIPrefBranch**)] → nsIPrefBranch**)] [@ mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | mozilla::Preferences::GetBranch(char const* nsIPrefBranch**)] [@ mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xreal…
Do we have a nightly regression range for this signature?
Marking this as tracking FF12.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #7)
> Do we have a nightly regression range for this signature?
It was a residual signature in release versions but it started in Nightly from 9.0a1/20110923. The regression range might be:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=4495e1f795c2&tochange=259d1556c221
Keywords: regression
That's a much more useful regression range than bug 725280! The two reasonable candidates here are:

the *backout* of bug 477578 (originally landed 2011-09-13, backed out 2011-09-22)
Bug 687722 - Make swapping two nsAutoTArrays preserve their auto-ness when possible

khuey, I know this was long ago, but do you remember whether bug 477578 was a straight backout or whether there were any fuzz/merge issues?

I don't see any autoarray swapping in the code leading up to this crash, but since we're probably looking at a memory corruption bug with the jemalloc 72-byte sizeclass, this could be a bug in code which runs well before the current stack trace. It's also possible that there is some thread race, although I haven't found a crash report with any other interesting threads running, and I don't think we've even started the I/O thread or the DNS threadpool when this crash occurs.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #10)
> khuey, I know this was long ago, but do you remember whether bug 477578 was
> a straight backout or whether there were any fuzz/merge issues?

I don't recall.  Sorry.
Given that this is not a new regression in Firefox 12, do you still want to track on it? I think there's something interesting going on, but unless this is really one of the high topcrashers I'm not sure it's worth tracking specifically on.
bsmedberg, no I don't see a reason to track it for 12 so I will remove the flag. Looking at the signature summary report for the last week. Most of these are on 9.0.1. It's not really in any significant volume for 10.0.2. I will leave it as a top crash for now.
This has been rising to #10 on Firefox 11.* in yesterday's data.
Crash Signature: nsIPrefBranch**)] [@ mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xrealloc | nsIDNServiceConstructor] → nsIPrefBranch**)] [@ mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xrealloc | nsIDNServiceConstructor] [@ mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | nsIDNServiceConstructor]
It's #158 top crasher in 12.0, #12 in 13.0b6, #17 in 14.0a2.
(In reply to Scoobidiver from comment #15)
> It's #158 top crasher in 12.0, #12 in 13.0b6, #17 in 14.0a2.

The most recent spike does appear new to FF13, but not to 13.0b6 specifically. We need to get URLs and a correlation report asap.
Several of the signatures such as mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | mozilla::Preferences::GetBranch(char const*, nsIPrefBranch**) show no URLs. The uptime is nonexistent so I guess it is not able to grab any kind of URL before the crash happens. Our best next step is to look at module correlations and comments, which I will do next.
Some module correlations for the first signature. I will attach the longer detailed version breakdown in an attachment.

mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | mozilla::Preferences::GetBranch(char const*, nsIPrefBranch**)|EXCEPTION_BREAKPOINT (335 crashes)
     99% (333/335) vs.  73% (32514/44497) browsercomps.dll
    100% (335/335) vs.  74% (33037/44497) firefox.exe
    100% (335/335) vs.  74% (33074/44497) xpcom.dll
    100% (335/335) vs.  75% (33337/44497) dbghelp.dll
     99% (330/335) vs.  74% (33033/44497) softokn3.dll
     32% (106/335) vs.  16% (6984/44497) snxhk.dll
     26% (88/335) vs.  14% (6341/44497) aswJsFlt.dll
     63% (212/335) vs.  54% (24148/44497) comres.dll
     63% (212/335) vs.  55% (24570/44497) ws2help.dll
     63% (212/335) vs.  55% (24573/44497) iphlpapi.dll
      7% (24/335) vs.   0% (114/44497) zvexescn.dll
      7% (24/335) vs.   0% (115/44497) ZVFORT.DLL
Some percentage of the users have Avast installed: 26% (88/335) vs.  14% (6341/44497) aswJsFlt.dll. The problem version seems to be specifically 7.0.1426.0: 21% (70/335) vs.  12% (5284/44497).
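For reading the correlation rows, a minimal sketch of one way to rank modules: divide the share of crashes with this signature that loaded a module by the share of all crashes that loaded it. The struct and function names (ModuleRow, Lift, RankByLift) are hypothetical, and this ratio is only one possible measure, not necessarily what the correlation report itself computes.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// One row of a module-correlation report.
struct ModuleRow {
    std::string module;
    int sigHits;   // crashes with this signature that loaded the module
    int sigTotal;  // all crashes with this signature
    int allHits;   // crashes overall that loaded the module
    int allTotal;  // all crashes in the sample
};

// Share among signature crashes divided by share among all crashes.
// Values well above 1.0 mean the module is over-represented.
double Lift(const ModuleRow& r) {
    assert(r.sigTotal > 0 && r.allHits > 0 && r.allTotal > 0);
    double sigShare = static_cast<double>(r.sigHits) / r.sigTotal;
    double allShare = static_cast<double>(r.allHits) / r.allTotal;
    return sigShare / allShare;
}

// Return the rows sorted from most to least over-represented.
std::vector<ModuleRow> RankByLift(std::vector<ModuleRow> rows) {
    std::sort(rows.begin(), rows.end(),
              [](const ModuleRow& a, const ModuleRow& b) {
                  return Lift(a) > Lift(b);
              });
    return rows;
}
```

Fed the rows above, modules that nearly every crash loads (firefox.exe, xpcom.dll) end up near the bottom, while modules such as snxhk.dll and aswJsFlt.dll rank higher. Rows with very small counts, such as zvexescn.dll, produce large ratios from few crashes and should be read with care.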
I installed Avast 7.0.1426.0 on a Win XP lab machine, but so far have not been able to generate a crash.
This signature seems to have taken a nose-dive since 5/30 and is no longer on the topcrash list for 14.0a2 - can someone take a look and see if something landed that could have addressed this crash?
With combined signatures, it's #5 top browser crasher in 13.0b7 and #9 in 14.0a2 over the last 3 days.
I know Marcia is already looking into this, but adding qawanted for accuracy.
Keywords: qawanted
http://tinyurl.com/cy53rak is the signature in FF 13 Beta 7 that seems to have the most crashes, and it is still a startup crash.

Still seems to be correlated to Avast:

mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | mozilla::Preferences::GetBranch(char const*, nsIPrefBranch**)|EXCEPTION_BREAKPOINT (340 crashes)
    100% (340/340) vs.  70% (21303/30293) firefox.exe
    100% (340/340) vs.  70% (21312/30293) xpcom.dll
    100% (340/340) vs.  71% (21532/30293) dbghelp.dll
     89% (304/340) vs.  69% (20929/30293) browsercomps.dll
     36% (123/340) vs.  17% (5032/30293) snxhk.dll
     89% (303/340) vs.  70% (21263/30293) softokn3.dll
     34% (115/340) vs.  15% (4563/30293) aswJsFlt.dll
     39% (133/340) vs.  29% (8821/30293) DWrite.dll
      9% (29/340) vs.   2% (512/30293) protector.dll
      6% (22/340) vs.   1% (370/30293) asoehook.dll
I have tried turning on various Avast settings, running in Sandboxed mode, etc., and still haven't generated a crash. Because it is a startup crash there are no URLs to go on, which makes it a bit difficult.
So my understanding is that this is a likely memory corruption bug causing malloc to barf in necko, but not necessarily a necko bug (if the regression range in comment 9 is correct, it's very unlikely to be in necko).   Is it time to ping the authors/reviewers of the bugs in the regression range and ask them to look over their commits for possible pointer errors?  I'm not sure how we handle these sorts of bugs, but I can't think of a better strategy in the absence of steps to repro.
We don't have URLs as this is very early in startup when we don't have even the browser UI loaded yet.
Keywords: needURLs
For 13, the specific signature that showed up in the betas and 13 final is: mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | mozilla::Preferences::GetBranch(char const*, nsIPrefBranch**). http://tinyurl.com/7a6v6qo is the link to those crashes.

From what I can gather from the signature summary data, this signature has dropped off in volume a bit since Beta 7. Currently a 13 specific query is showing 300 crashes.
jlebar, I'm pretty certain now that this is a memory corruption bug specific to the 72-byte jemalloc size class. I'm wondering if there is release-mode checking we can add *just* for this sizeclass that we could get into Fx14 to get a better handle on this? For example sanity-checking free() calls more aggressively, or memory-poisoning the data just in this sizeclass to detect the location of the double-free or read/write-after-free? If you're not the jemalloc expert to ask, please let me know who is ;-)
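The poisoning idea can be sketched roughly as follows. This is a toy stand-in under stated assumptions, not jemalloc code: the watched size (72), the poison byte (0xE5), the one-slot quarantine, and the names CheckedAlloc, CheckedFree and IsStillPoisoned are all hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Illustrative values only; not taken from jemalloc.
constexpr std::size_t kWatchedSize = 72;
constexpr unsigned char kPoisonByte = 0xE5;

// One-slot quarantine: the most recently freed block is held back
// (still allocated, filled with poison) until the next free.
static unsigned char* gQuarantined = nullptr;

// True if every byte of the block still holds the poison value.
bool IsStillPoisoned(const unsigned char* p) {
    for (std::size_t i = 0; i < kWatchedSize; ++i) {
        if (p[i] != kPoisonByte) {
            return false;
        }
    }
    return true;
}

unsigned char* CheckedAlloc() {
    return static_cast<unsigned char*>(std::malloc(kWatchedSize));
}

// Poison the block and quarantine it instead of releasing it at once.
// Returns false when a double free of the quarantined block is seen, or
// when the previously quarantined block was written to after its free.
bool CheckedFree(unsigned char* p) {
    assert(p != nullptr);
    if (p == gQuarantined) {
        return false;  // double free of the quarantined block
    }
    bool ok = true;
    if (gQuarantined != nullptr) {
        ok = IsStillPoisoned(gQuarantined);  // write-after-free check
        std::free(gQuarantined);
    }
    std::memset(p, kPoisonByte, kWatchedSize);
    gQuarantined = p;
    return ok;
}
```

In a release build the failed check would presumably abort so that the crash report points at the free() call that noticed the damage. A one-slot quarantine only catches misuse that happens before the next free of the same size, so a real implementation would need a longer delay queue, at some memory cost.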
FF 13 query shows 894 crashes in the last week in this stack in mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | mozilla::Preferences::GetBranch(char const*, nsIPrefBranch**) signature.
> For example sanity-checking free() calls more aggressively, or memory-poisoning the data just in 
> this sizeclass to detect the location of the double-free or read/write-after-free?

I think we could probably do that.  I'll get back to you in a few hours, once I have a look through the code.
I've landed a patch on m-i to add these assertions.  I can land on aurora/beta once I get approval.

No guarantees that it'll work, but it's worth a shot...
I know there's nothing actionable right now, but it appears you're heading up the engineering investigation Benjamin (hopefully bug 764192 helps). Sending over to you.
Assignee: nobody → benjamin
This sounds a little like the native Fennec crashes in bug 759674, bug 759675 and bug 759680.  Those crashes look to be caused by smallish mallocs (500 or 2000 bytes) failing.  Of course, it is mobile, so maybe they are legitimately failing, but it is odd we're seeing a ton of crashes right in XPCWrappedNativeScope creation, but not elsewhere.
Mozilla/5.0 (Windows NT 6.1; rv:13.0) Gecko/20100101 Firefox/13.0

Also tried to reproduce this with Avast antivirus installed (Free, 30 days protection). Unsuccessful so far. Changed some settings related to browser protection, blocked some URLs, played a bit with the third-party add-on installed with Avast, and upgraded from 12 to 13. No crash.
This may be fixed by bug 766173.
Depends on: 766173
Alex asked me to follow up with rank - this crash still appears in 13.0.1 crash data in the #31 spot with 1248 crashes. Now that Bug 766173 landed we should keep an eye on Aurora data.
I believe this signature morphed into bug 718575 on aurora and possibly trunk.
Is there still any need for the qawanted keyword here? Any new information which we can use to reproduce? I could not reproduce in comment 36 with the Avast antivirus installed.
Removing the qawanted keyword for the moment, as there is no new information based on which we could reproduce.
Keywords: qawanted
We shipped FF13 with this crash, and sadly do not have more actionable data at this time. Wontfixing for FF14.
There are no crashes in 15.0.1 and above.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME