Bug 759680, bug 759675, and bug 759674 appear to be caused by ClassInfo2WrappedNativeProtoMap and Native2WrappedNativeMap ending up as NULL. I think this can only happen in the constructor for XPCWrappedNativeScope, if JS_NewDHashTable fails, causing their respective newMap to return NULL. I think the only reason that will happen for these hash tables is if malloc fails. bholley suggested that we could turn these into infallible allocations. This won't solve the crashes, but I guess we'll at least see the state of the browser when things go awry. It will also be a little safer because we won't end up running the browser in a weird state, but any places we might fail are probably fairly controlled, because it is just a NULL pointer. Another approach would be to try to propagate these failures out a bit, and hope that if we do that enough, we'll get to some point where the failure can be handled in a sane way without blowing up the browser. For instance, if you open a tab, maybe opening the tab just fails rather than just losing everything. I've started looking at the latter approach, and it isn't too hard to propogate it outwards to a failure in XPCWrappedNativeScope::GetNewOrUsed. There are three uses of that, and only one place doesn't check for null. I added a check there, and return a failure, but unfortunately that does not make everything magically happy, so I'll have to investigate that some more.
These aren't huge allocations though - we're talking a few hundred bytes, I'd think. It seems like if we're OOMing, the browser is just about to fall flat on its face.
(In reply to Bobby Holley (:bholley) from comment #1) > These aren't huge allocations though - we're talking a few hundred bytes, > I'd think. It seems like if we're OOMing, the browser is just about to fall > flat on its face. Yeah, definitely. I do find it very odd, though, that we have two top ten mobile crashes from apparent OOMs during creation of XPCWrappedNativeScope, and not scattered around in a bunch of other places near allocations, especially given how small these apparently are (there are 16 and 64 entries to start with for these hash tables). Maybe my diagnosis is wrong. Another idea I had was we could tweak the size of these hash tables, but as you say, we're probably doomed already if that's what is happening here.
(This isn't really a MemShrink issue, but I tagged it so I can remember to bring it up during the meeting.)
Created attachment 629867 [details] [diff] [review] hacky WIP at trying to recover from hashtable allocation failure
Per Bobby's suggestion, I'll try out infallible allocation and then we can see how that goes from there. On my 65-bit OSX system, Native2WrappedNativeMap is 2104 bytes and ClassInfo2WrappedNativeProtoMap is 456 bytes.
(In reply to Andrew McCreight [:mccr8] from comment #5) > On my 65-bit OSX system, My machine has an extra bit for awesomeness...
Created attachment 630007 [details] [diff] [review] abort if allocation of ClassInfo2WrappedNativeProtoMap or Native2WrappedNativeMap fails I didn't change the allocations to real infallible mallocs, because that's deep in JSDHash, which I obviously don't want to change. Instead, I use the existing failing-catching path to do a runtime abort. https://tbpl.mozilla.org/?tree=Try&rev=9e394ec6678d
Comment on attachment 630007 [details] [diff] [review] abort if allocation of ClassInfo2WrappedNativeProtoMap or Native2WrappedNativeMap fails Please add a comment explaining why we're doing this.
Thanks for the review. I added a comment. We can leave any sort of nicer handling of these problems to a followup. https://hg.mozilla.org/integration/mozilla-inbound/rev/d9b3c4577c5a
Comment on attachment 630007 [details] [diff] [review] abort if allocation of ClassInfo2WrappedNativeProtoMap or Native2WrappedNativeMap fails I guess we should land this where the crashes are happening. I didn't see many on Nightly when I looked. [Approval Request Comment] Bug caused by (feature/regressing bug #): This may help ferret out some weird Fennec Native top crashes by making them crash closer to where things go bad. User impact if declined: well, it will impede crash investigation. Testing completed (on m-c, etc.): its been on m-c for about a week. Risk to taking this patch (and alternatives if risky): low. If we reach this code, we're doomed anyways. String or UUID changes made by this patch: none.
Comment on attachment 630007 [details] [diff] [review] abort if allocation of ClassInfo2WrappedNativeProtoMap or Native2WrappedNativeMap fails Seeing if we want this for FN14/14.0.1, but already approving for Aurora.
Comment on attachment 630007 [details] [diff] [review] abort if allocation of ClassInfo2WrappedNativeProtoMap or Native2WrappedNativeMap fails scoobidiver says these crashes have gone away in 14.0b7, so no need to land this patch in beta.
Already in Aurora, which is all we want