Last Comment Bug 761249 - abort if map creation fails during XPCWrappedNativeScope creation
: abort if map creation fails during XPCWrappedNativeScope creation
Status: RESOLVED FIXED
[MemShrink]
:
Product: Core
Classification: Components
Component: XPConnect (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla16
Assigned To: Andrew McCreight (PTO-ish through 6-29) [:mccr8]
:
Mentors:
Depends on:
Blocks: 759674 759675 759680
  Show dependency treegraph
 
Reported: 2012-06-04 11:35 PDT by Andrew McCreight (PTO-ish through 6-29) [:mccr8]
Modified: 2012-06-18 11:35 PDT (History)
3 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
fixed
fixed
-


Attachments
hacky WIP at trying to recover from hashtable allocation failure (7.99 KB, patch)
2012-06-04 11:52 PDT, Andrew McCreight (PTO-ish through 6-29) [:mccr8]
no flags Details | Diff | Review
abort if allocation of ClassInfo2WrappedNativeProtoMap or Native2WrappedNativeMap fails (1.51 KB, patch)
2012-06-04 17:27 PDT, Andrew McCreight (PTO-ish through 6-29) [:mccr8]
bobbyholley: review+
akeybl: approval‑mozilla‑aurora+
Details | Diff | Review

Description Andrew McCreight (PTO-ish through 6-29) [:mccr8] 2012-06-04 11:35:35 PDT
Bug 759680, bug 759675, and bug 759674 appear to be caused by ClassInfo2WrappedNativeProtoMap and Native2WrappedNativeMap ending up as NULL.  I think this can only happen in the constructor for XPCWrappedNativeScope, if JS_NewDHashTable fails, causing their respective newMap to return NULL.  I think the only reason that will happen for these hash tables is if malloc fails.

bholley suggested that we could turn these into infallible allocations.  This won't solve the crashes, but I guess we'll at least see the state of the browser when things go awry.  It will also be a little safer because we won't end up running the browser in a weird state, but any places we might fail are probably fairly controlled, because it is just a NULL pointer.

Another approach would be to try to propagate these failures out a bit, and hope that if we do that enough, we'll get to some point where the failure can be handled in a sane way without blowing up the browser.  For instance, if you open a tab, maybe opening the tab just fails rather than just losing everything.

I've started looking at the latter approach, and it isn't too hard to propogate it outwards to a failure in XPCWrappedNativeScope::GetNewOrUsed.  There are three uses of that, and only one place doesn't check for null.  I added a check there, and return a failure, but unfortunately that does not make everything magically happy, so I'll have to investigate that some more.
Comment 1 Bobby Holley (busy) 2012-06-04 11:39:21 PDT
These aren't huge allocations though - we're talking a few hundred bytes, I'd think. It seems like if we're OOMing, the browser is just about to fall flat on its face.
Comment 2 Andrew McCreight (PTO-ish through 6-29) [:mccr8] 2012-06-04 11:45:04 PDT
(In reply to Bobby Holley (:bholley) from comment #1)
> These aren't huge allocations though - we're talking a few hundred bytes,
> I'd think. It seems like if we're OOMing, the browser is just about to fall
> flat on its face.

Yeah, definitely.  I do find it very odd, though, that we have two top ten mobile crashes from apparent OOMs during creation of XPCWrappedNativeScope, and not scattered around in a bunch of other places near allocations, especially given how small these apparently are (there are 16 and 64 entries to start with for these hash tables).  Maybe my diagnosis is wrong.

Another idea I had was we could tweak the size of these hash tables, but as you say, we're probably doomed already if that's what is happening here.
Comment 3 Andrew McCreight (PTO-ish through 6-29) [:mccr8] 2012-06-04 11:45:31 PDT
(This isn't really a MemShrink issue, but I tagged it so I can remember to bring it up during the meeting.)
Comment 4 Andrew McCreight (PTO-ish through 6-29) [:mccr8] 2012-06-04 11:52:42 PDT
Created attachment 629867 [details] [diff] [review]
hacky WIP at trying to recover from hashtable allocation failure
Comment 5 Andrew McCreight (PTO-ish through 6-29) [:mccr8] 2012-06-04 14:55:42 PDT
Per Bobby's suggestion, I'll try out infallible allocation and then we can see how that goes from there.

On my 65-bit OSX system, Native2WrappedNativeMap is 2104 bytes and  ClassInfo2WrappedNativeProtoMap is 456 bytes.
Comment 6 Andrew McCreight (PTO-ish through 6-29) [:mccr8] 2012-06-04 14:56:28 PDT
(In reply to Andrew McCreight [:mccr8] from comment #5)
> On my 65-bit OSX system,

My machine has an extra bit for awesomeness...
Comment 7 Andrew McCreight (PTO-ish through 6-29) [:mccr8] 2012-06-04 17:27:52 PDT
Created attachment 630007 [details] [diff] [review]
abort if allocation of ClassInfo2WrappedNativeProtoMap or Native2WrappedNativeMap fails

I didn't change the allocations to real infallible mallocs, because that's deep in JSDHash, which I obviously don't want to change.  Instead, I use the existing failing-catching path to do a runtime abort.

https://tbpl.mozilla.org/?tree=Try&rev=9e394ec6678d
Comment 8 Bobby Holley (busy) 2012-06-05 01:27:10 PDT
Comment on attachment 630007 [details] [diff] [review]
abort if allocation of ClassInfo2WrappedNativeProtoMap or Native2WrappedNativeMap fails

Please add a comment explaining why we're doing this.
Comment 9 Andrew McCreight (PTO-ish through 6-29) [:mccr8] 2012-06-05 11:49:20 PDT
Thanks for the review.  I added a comment.  We can leave any sort of nicer handling of these problems to a followup.

https://hg.mozilla.org/integration/mozilla-inbound/rev/d9b3c4577c5a
Comment 10 Ed Morley [:emorley] 2012-06-06 08:42:50 PDT
https://hg.mozilla.org/mozilla-central/rev/d9b3c4577c5a
Comment 11 Andrew McCreight (PTO-ish through 6-29) [:mccr8] 2012-06-13 09:06:39 PDT
Comment on attachment 630007 [details] [diff] [review]
abort if allocation of ClassInfo2WrappedNativeProtoMap or Native2WrappedNativeMap fails

I guess we should land this where the crashes are happening.  I didn't see many on Nightly when I looked.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): This may help ferret out some weird Fennec Native top crashes by making them crash closer to where things go bad.
User impact if declined: well, it will impede crash investigation.
Testing completed (on m-c, etc.): its been on m-c for about a week.
Risk to taking this patch (and alternatives if risky): low.  If we reach this code, we're doomed anyways.
String or UUID changes made by this patch: none.
Comment 12 Alex Keybl [:akeybl] 2012-06-15 15:30:05 PDT
Comment on attachment 630007 [details] [diff] [review]
abort if allocation of ClassInfo2WrappedNativeProtoMap or Native2WrappedNativeMap fails

Seeing if we want this for FN14/14.0.1, but already approving for Aurora.
Comment 13 Andrew McCreight (PTO-ish through 6-29) [:mccr8] 2012-06-15 15:35:01 PDT
https://hg.mozilla.org/releases/mozilla-aurora/rev/dc2a16b97890
Comment 14 Andrew McCreight (PTO-ish through 6-29) [:mccr8] 2012-06-17 08:51:20 PDT
Comment on attachment 630007 [details] [diff] [review]
abort if allocation of ClassInfo2WrappedNativeProtoMap or Native2WrappedNativeMap fails

scoobidiver says these crashes have gone away in 14.0b7, so no need to land this patch in beta.
Comment 15 Brad Lassey [:blassey] (use needinfo?) 2012-06-18 11:35:39 PDT
Already in Aurora, which is all we want

Note You need to log in before you can comment on or make changes to this bug.