Closed Bug 736522 Opened 10 years ago Closed 3 years ago

OOM crash in ReadCookieDBListener::HandleResult

Categories

(Core :: General, defect)

14 Branch
ARM
Android
defect
Not set
critical

Tracking

()

RESOLVED WONTFIX
Tracking Status
blocking-fennec1.0 --- -
fennec - ---

People

(Reporter: scoobidiver, Unassigned)

References

Details

(Keywords: crash, regression, Whiteboard: [native-crash][startupcrash])

Crash Data

It first appeared in 14.0a1/20120315064903. The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=c71845b3b2a6&tochange=082d016c341f

Signature 	TouchBadMemory
UUID	f962722d-cbef-4e71-8d7b-03bb12120316
Date Processed	2012-03-16 15:04:04
Uptime	4
Last Crash	1.4 minutes before submission
Install Age	3.3 minutes since version was first installed.
Install Time	2012-03-16 14:58:33
Product	FennecAndroid
Version	14.0a1
Build ID	20120316031151
Release Channel	nightly
OS	Linux
OS Version	0.0.0 Linux 2.6.35.7-g98a0a06 #1 SMP PREEMPT Sat Dec 17 10:33:25 PST 2011 armv7l
Build Architecture	arm
Build Architecture Info	
Crash Reason	SIGSEGV
Crash Address	0x0
App Notes 	
EGL? EGL+ AdapterVendorID: mapphone_umts, AdapterDeviceID: MB865.
AdapterDescription: 'Android, Model: 'MB865', Product: 'edison_att_us', Manufacturer: 'motorola', Hardware: 'mapphone_umts''.
GL Context? GL Context+ GL Layers? GL Layers- 
motorola MB865
MOTO/edison_att_us/edison:2.3.6/5.5.1-175_EDMR1.25/5.51.175.25:user/release-keys

Frame 	Module 	Signature 	Source
0 	libmozalloc.so 	TouchBadMemory 	memory/mozalloc/mozalloc_abort.cpp:68
1 	libmozalloc.so 	mozalloc_abort 	memory/mozalloc/mozalloc_abort.cpp:89
2 	libmozalloc.so 	moz_xmalloc 	memory/mozalloc/mozalloc.cpp:105
3 	libxul.so 	nsCookie::Create 	mozalloc.h:229
4 	libxul.so 	ReadCookieDBListener::HandleResult 	netwerk/cookie/nsCookieService.cpp:1893
5 	libxul.so 	mozilla::storage::::CallbackResultNotifier::Run 	storage/src/mozStorageAsyncStatementExecution.cpp:109
6 	libxul.so 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:657
7 	libxul.so 	NS_ProcessNextEvent_P 	obj-firefox/xpcom/build/nsThreadUtils.cpp:245
8 	libxul.so 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:134
9 	libxul.so 	MessageLoop::RunInternal 	ipc/chromium/src/base/message_loop.cc:208
10 	libxul.so 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:201
11 	libxul.so 	nsBaseAppShell::Run 	widget/xpwidgets/nsBaseAppShell.cpp:189
12 	libxul.so 	nsAppStartup::Run 	toolkit/components/startup/nsAppStartup.cpp:295
13 	libxul.so 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3703
14 	libxul.so 	GeckoStart 	toolkit/xre/nsAndroidStartup.cpp:109
15 	libmozglue.so 	__res_nsend 	other-licenses/android/res_send.c:1086
...

More reports at:
https://crash-stats.mozilla.com/report/list?product=FennecAndroid&signature=TouchBadMemory
Blocks: 719373
Crash Signature: [@ TouchBadMemory] → [@ TouchBadMemory] [@ TouchBadMemory | mozalloc_abort | moz_xmalloc | nsCookie::Create]
Crash Signature: [@ TouchBadMemory] [@ TouchBadMemory | mozalloc_abort | moz_xmalloc | nsCookie::Create] → [@ TouchBadMemory] [@ TouchBadMemory | mozalloc_abort | moz_xmalloc | nsCookie::Create ]
Whiteboard: [native-crash] → [native-crash][startupcrash]
8th top crash in 3 day list for fennec (since other crashes have been fixed)
Keywords: topcrash
It's currently #4 top crasher in 14.0a2 over the last 3 days.
blocking-fennec1.0: --- → ?
Assignee: nobody → joshmoz
blocking-fennec1.0: ? → +
Mark - did you mean to assign this to me?
Jason - you are the only networking person I spotted in the regression range. Thoughts?
Assignee: joshmoz → nobody
All my changesets in the regression range are just the same patch that got landed and backed out twice, so no changes from me here.

This is an OOM error:  the cookie code is calling moz_xmalloc and it's failing and calling mozalloc_handle_oom, which aborts. 

hg log -r a0df03570b26:082d016c341f netwerk/cookie is not showing me any changes to the cookie code in the revision interval.  So I suspect that the cookie code winds up being the malloc that hits OOM, but the culprit is something else:  either some other code in the revision range that's eating more RAM, or possibly even just a change in fennec users' browsing histories that's causing them to hit OOM more often.   IIRC we put limits on how long cookies can be, so it seems unlikely (?) that we're actually allocating so much cookie memory that it's really to blame here.

Sorry not to be of more help.  Do we have any sort of tracking of memory usage with commit history, or memory usage in crash reports?  That could certainly be useful here.
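The abort-on-failure behaviour described in this comment can be sketched as below. This is a minimal illustration under stated assumptions, not the mozalloc source: `xmalloc_sketch` is a hypothetical name, and the real moz_xmalloc reports the failure through mozalloc_handle_oom as noted above.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for an infallible allocator such as moz_xmalloc.
// It never returns nullptr: if the underlying malloc fails, the process
// is terminated on the spot.
void* xmalloc_sketch(std::size_t size) {
    // malloc(0) is allowed to return nullptr, so always request at least 1 byte.
    void* ptr = std::malloc(size != 0 ? size : 1);
    if (ptr == nullptr) {
        std::fprintf(stderr, "out of memory: %zu bytes requested\n", size);
        std::abort();  // the caller's frame is what shows up in the crash stack
    }
    return ptr;
}
```

With an allocator of this shape, the crash stack names whichever caller happened to ask for memory when none was left, which is why the cookie code can appear in the signature without being the code that consumed the RAM.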
Component: Networking: Cookies → General
QA Contact: networking.cookies → general
Maybe the strings are filled in incorrectly here: http://hg.mozilla.org/mozilla-central/annotate/a0356446a56a/netwerk/cookie/nsCookieService.cpp#l1872 (maybe the DB is corrupted and the object in the row is in a bad state?).  It seems like nsCookie::Create cannot allocate because the sum of the string lengths is some 4GB or so.
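One way to picture the theory above is a size computation that refuses implausible totals before any allocation is attempted. The helper below is a hypothetical sketch, not the actual nsCookie::Create logic: the function name, the four-field layout, and the one terminator byte per field are all assumptions made for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: sums four field lengths plus one terminator byte
// per field. The addition is done in 64-bit arithmetic so a bogus length
// read from a corrupt row cannot silently wrap around.
// Returns false (and leaves *out untouched) if the total does not fit in 32 bits.
bool cookie_string_bytes(std::uint32_t name_len, std::uint32_t value_len,
                         std::uint32_t host_len, std::uint32_t path_len,
                         std::uint32_t* out) {
    const std::uint64_t total =
        std::uint64_t{name_len} + value_len + host_len + path_len + 4;
    if (total > UINT32_MAX) {
        return false;  // lengths like these point at a corrupt row
    }
    *out = static_cast<std::uint32_t>(total);
    return true;
}
```

A caller would skip the row when the function returns false instead of passing the lengths on to the allocator.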
Honza:  so we grab a really big value for 'name' and/or 'value', and when 

Well, Honza's theory explains why the failure would happen so often at this particular line of code, so that's promising.

  hg log -r a0df03570b26:082d016c341f db/sqlite3 is not showing any changes in the regression range.   Do we know of any other bugs that might be corrupting sqlite data?
meh--hit return too fast.  First sentence should have been:

So we grab a really big value for 'name' and/or 'value', and then nsCookie::Create fails while trying to allocate the cookie with those values?  This would require that the name/value is not quite big enough to OOM until it's allocated twice?
(In reply to Jason Duell (:jduell) from comment #5)
> Sorry not to be of more help.  Do we have any sort of tracking of memory
> usage with commit history, or memory usage in crash reports?  That could
> certainly be useful here.
OOM Allocation Size is around 100. See https://crash-stats.mozilla.com/report/list?product=FennecAndroid&signature=TouchBadMemory%20|%20mozalloc_abort%20|%20moz_xmalloc%20|%20nsCookie%3A%3ACreate
(In reply to Josh Aas (Mozilla Corporation) from comment #3)
> Mark - did you mean to assign this to me?

More of a "ping" for you. Mobile beta triage is focusing on crashes and stability.
Looking at the last 25 crashes, nearly all are Motorola hardware.

This is now #3 crash over the past 3 days on 14.0a2
One from a XPERIA pro a.k.a. MK16a FWIW

bp-590b4185-c4c1-40a0-bc0b-7c6142120505
Alex, did you crash prior to this crash?  If so can you place the link for the crash here as well?


On a side note, since these are startup crashes, they unfortunately don't provide useful URLs...
URLS:
about:empty
about:crashes
about:home
The crash right before these two above was reported as bug 752229 FWIW.
I can cause an OOM that doesn't bring up the crash reporter, but I can't seem to reproduce this specific crash.

1. go to http://myownplayground.atspace.com/cookietest.html
2. in the first field, enter è
3. in the next field, enter 30000
4. hit the execute button

Expected: to catch the bad cookie/oom and prevent it from crashing/operating
Actual: OOM crash w/o crash reporter.
Keywords: reproducible
Brian - Can you try to get this to happen in a debugger?
Assignee: nobody → bnicholson
Summary: crash in ReadCookieDBListener::HandleResult @ TouchBadMemory → OOM crash in ReadCookieDBListener::HandleResult
The steps in comment 16 freeze up the browser and cause OOM crashes, but these don't produce the same stack trace as the one in this bug.

Since the allocation size is ~100, I think this means something else is consuming the RAM, and it's probably not a corrupt DB as suggested in comment 6.
Keywords: reproducible → qawanted
Jason - Can you look at comment 16 and make any headway on this?
Assignee: bnicholson → jduell.mcbugs
I've got a head cold and my android device is in the shop with a cracked screen (waiting for part to arrive by mail), so I'm not going to get to this in the next few days at least.  Honza, can you take it?
Assignee: jduell.mcbugs → honzab.moz
Assignee: honzab.moz → joshmoz
Let's take a look at the crash reporter and see if we can get anywhere new. Right now it is not clear how to reproduce this.
I don't see this showing up in b3.  Fixed by something else like Vlad or Chris' patches last week?  Coincidental that it started around the maple merge?
(In reply to JP Rosevear [:jpr] from comment #22)
> I don't see this showing up in b3.
I see currently 34 crashes in 14.0b3 (0.7% of crashes): https://crash-stats.mozilla.com/report/list?signature=TouchBadMemory+|+mozalloc_abort+|+moz_xmalloc+|+nsCookie%3A%3ACreate
(In reply to Scoobidiver from comment #23)
> (In reply to JP Rosevear [:jpr] from comment #22)
> > I don't see this showing up in b3.
> I see currently 34 crashes in 14.0b3 (0.7% of crashes):
> https://crash-stats.mozilla.com/report/
> list?signature=TouchBadMemory+|+mozalloc_abort+|+moz_xmalloc+|+nsCookie%3A%3A
> Create

Oops, indeed.
We have exhausted manual steps to try and repro this. Will continue to watch for changes to this bug that could help us with repro steps.
Keywords: qawanted
Assignee: joshmoz → jduell.mcbugs
Jason, we need an assessment in the next 24 hours of how bad this is and whether we should be blocking on it.
> we need an assessment in the next 24 hours of how bad this is

I'm not the right person to give that assessment.  I assume comment 25 and/or comment 23 give the most recent data on how prevalent it is.

Update: my plan was to look into whether we can detect an overly large cookie and drop it on the floor rather than hit OOM.  But as comment 18 points out, the allocation size in the crash reports is actually small (generally around 100 bytes), so too large a cookie is not our issue.  I talked with bsmedberg about this a bit on IRC, and he suspects that memory corruption (rather than actually being OOM) is causing the allocator to fail, similarly to bug 709860.  
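For reference, the "drop it on the floor" plan mentioned above could look roughly like the sketch below. It is only an illustration of that abandoned idea: `RowSketch`, `keep_reasonable_rows`, and the 4096-byte limit are hypothetical names and values, not taken from nsCookieService.cpp.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical row as read back from the cookie database.
struct RowSketch {
    std::string name;
    std::string value;
};

// Assumed per-cookie size limit, chosen for illustration only.
const std::size_t kMaxCookieBytesSketch = 4096;

// Returns the rows whose name + value fit within the limit and counts
// the rest in *dropped (if non-null) instead of trying to allocate them.
std::vector<RowSketch> keep_reasonable_rows(const std::vector<RowSketch>& rows,
                                            std::size_t* dropped) {
    std::vector<RowSketch> kept;
    std::size_t skipped = 0;
    for (const RowSketch& row : rows) {
        // Compare without adding the two sizes, so the check cannot overflow.
        const bool too_big =
            row.name.size() > kMaxCookieBytesSketch ||
            row.value.size() > kMaxCookieBytesSketch - row.name.size();
        if (too_big) {
            ++skipped;  // oversized row: skip it rather than allocate
            continue;
        }
        kept.push_back(row);
    }
    if (dropped != nullptr) {
        *dropped = skipped;
    }
    return kept;
}
```

As the comment itself concludes, a filter like this would not help with the reported crashes, since the failing allocations are only around 100 bytes.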

Sorry to pass the hot potato, but without steps to reproduce I have no idea how to get any traction on this :(
Assignee: jduell.mcbugs → nobody
Malloc is returning null when asked to allocate ~100 bytes.  I'm not clear on how bad/corrupt cookie values could be responsible for that.  It could certainly be a bug in the cookie code somewhere (though it hasn't changed very much IIRC recently).  But it could also be literally any other code in the browser with a bad pointer access.
#9 top crash, no STR and no traction with the assigned developer so renom'ing
blocking-fennec1.0: + → ?
tracking-fennec: --- → 15+
blocking-fennec1.0: ? → -
device : 
Sony Ericsson X10S
HTC Desire HD A9191 	23
Sony Ericsson LT18i 	21
HTC ADR6300 	19
Sony Ericsson R800a 	18
HTC ADR6350 	14
Sony Ericsson R800x 	10
HTC Glacier 	9
HTC Desire S 	9
Sony Ericsson ST15i 	6
Sony Ericsson WT19i 	6
HTC ADR6400L 	6
Sony Ericsson ST18i 	5
Sony Ericsson R800i 	4
Sony Ericsson LT26i 	4
Sony Ericsson MT15a 	4
Sony Ericsson SK17i 	4
Sony Ericsson MT11i 	4
HTC Desire HD 	3
Sony Ericsson SK17a 	3
Sony Ericsson MT11a 	3
Sony Ericsson MK16i 	3
HTC Sensation XL with Beats Audio X315e 	3
Samsung GT-N7000 	2
Sony Ericsson SO-01C 	2
HTC PH44100 	2
HTC Incredible S 	2
Sony Ericsson ST17i 	1
Sony Ericsson WT19a 	1
Sony Ericsson ST18a 	1
Sony Ericsson LT15a 	1
Samsung GT-I9000 	1
Samsung GT-I9100 	1
HTC T-Mobile G2 	1
HTC PC36100 	1
HTC ADR6330VW 	1
Samsung SCH-I405 	1
Samsung SGH-I717 	1
Sony Ericsson LT18a 	1
Sony Ericsson LT15i 	1
Samsung SPH-M930BST 	1
Samsung SGH-I727 	1
Sony Ericsson MT15i
There are only 7 crashes in 14.0b7. The crash volume drops build after build, probably because users who hit this crash can't start Fennec and therefore can't update it.
Keywords: topcrash
It's only #71 top crasher in 14.0.
tracking-fennec: 15+ → -
Closing because no crashes have been reported for 12 weeks.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX