Closed Bug 1037214 Opened 5 years ago Closed 5 years ago
crash in mozalloc
_abort(char const* const) | NS _Debug Break | mozilla::dom::Fragment Or Element::Get Text Content Internal(ns AString _internal&)
This bug was filed from the Socorro interface and is report bp-8f3bc16c-96cb-45a9-90b7-be9ef2140704. ============================================================= 0 mozalloc.dll mozalloc_abort(char const * const) memory/mozalloc/mozalloc_abort.cpp 1 xul.dll NS_DebugBreak xpcom/base/nsDebugImpl.cpp 2 xul.dll mozilla::dom::FragmentOrElement::GetTextContentInternal(nsAString_internal &) content/base/src/FragmentOrElement.cpp 3 xul.dll mozilla::dom::NodeBinding::get_textContent obj-firefox/dom/bindings/NodeBinding.cpp 4 xul.dll mozilla::dom::GenericBindingGetter(JSContext *,unsigned int,JS::Value *) dom/bindings/BindingUtils.cpp 5 mozjs.dll js::Invoke(JSContext *,JS::CallArgs,js::MaybeConstruct) js/src/vm/Interpreter.cpp 6 xul.dll xpc::XrayWrapper<js::CrossCompartmentWrapper,xpc::DOMXrayTraits>::getPropertyDescriptor(JSContext *,JS::Handle<JSObject *>,JS::Handle<jsid>,JS::MutableHandle<JSPropertyDescriptor>) js/xpconnect/wrappers/XrayWrapper.cpp 7 xul.dll xpc::XrayWrapper<js::CrossCompartmentWrapper,xpc::XPCWrappedNativeXrayTraits>::get(JSContext *,JS::Handle<JSObject *>,JS::Handle<JSObject *>,JS::Handle<jsid>,JS::MutableHandle<JS::Value>) js/xpconnect/wrappers/XrayWrapper.cpp 8 mozjs.dll js::jit::DoGetPropFallback js/src/jit/BaselineIC.cpp 9 @0x81000000 ============================================================= More reports: https://crash-stats.mozilla.com/report/list?signature=mozalloc_abort%28char%20const*%20const%29%20|%20NS_DebugBreak%20|%20mozilla%3A%3Adom%3A%3AFragmentOrElement%3A%3AGetTextContentInternal%28nsAString_internal%26%29 This looks like a new crash in Firefox 31. I'm only seeing reports for Firefox 31 Beta, 32 Aurora, and 33 Nightly. Most of these crashes appear to be on Windows 7 32-bit. Two comments mention opening a new tab. The top URL is facebook.com but that might be a red herring. It's not really high volume right now (it doesn't even show up in the top 300). I'm only reporting it because it would seem to be a Firefox 31 regression and has a fairly high explosiveness rating. https://crash-analysis.mozilla.com/rkaiser/2014-07-09/2014-07-09.firefox.31.explosiveness.html
The code is: if(!nsContentUtils::GetNodeTextContent(this, true, aTextContent)) NS_RUNTIMEABORT("OOM"); and we're hitting the NS_RUNTIMEABORT. We used to just silently return the wrong textContent instead, before the changes in bug 950076. We should probably throw OOM to script here instead of aborting the process.
Hmm, outside of the actual fix here, if this is an OOM case, we should see to crash with an abort signature if we crash at all. I see this will probably fixed to not actually crash (yay) but do we have a similar pattern elsewhere (doing an OOM abort that doesn't get OOM signatures)? If so, we should see to change those places (ask bsmedberg what to do so we correctly detect the OOM in crash-stats).
It might be worthwhile discussing this with :jst, since he suggested to abort instead of returned things in a broken state in the first place ( https://bugzilla.mozilla.org/show_bug.cgi?id=950076#c22 ) (In reply to Boris Zbarsky [:bz] from comment #1) > The code is: > > if(!nsContentUtils::GetNodeTextContent(this, true, aTextContent)) > NS_RUNTIMEABORT("OOM"); > > and we're hitting the NS_RUNTIMEABORT. > > We used to just silently return the wrong textContent instead, before the > changes in bug 950076. > > We should probably throw OOM to script here instead of aborting the process.
We can try, but jst is on vacation for a bit here.
Actually, I just looked at what jst suggested. He suggested aborting in the nsXBLPrototypeBinding code, not in general, in bug 950076 comment 22. In cases when it's being called with script on the stack, as here, the right thing is to throw an exception.
Flags: needinfo?(jst) → needinfo?(alessio.placitelli)
I see. I'm sorry, I completely misunderstood :jst's suggestion. I can take care of the fix if you wish! What do you mean by throwing OOM to the script? Can you provide me with an example? Robert, the error propagation part of the patch in bug 950076 (https://bugzilla.mozilla.org/attachment.cgi?id=8394385) adds a NS_RUNTIMEABORT("OOM") in a couple more places. Should I ping :bsmedberg to understand what to do so we correctly detect the OOM in crash-stats?
(In reply to Alessio Placitelli [:Dexter] from comment #6) > Robert, the error propagation part of the patch in bug 950076 > (https://bugzilla.mozilla.org/attachment.cgi?id=8394385) adds a > NS_RUNTIMEABORT("OOM") in a couple more places. Should I ping :bsmedberg to > understand what to do so we correctly detect the OOM in crash-stats? Yes, I think something should be done there that actually 1) reports some OOM signature, 2) reports the size of the failing allocation. That said, Benjamin is out atm, but :dmajor should be able to help as well. David?
> What do you mean by throwing OOM to the script? Can you provide me with an example? Sure. You mark the attrbute as [Throws] in Node.webidl (it's already SetterThrows, which you'll want to remove; [Throws] means both getter and setter throw). Then you add an ErrorResult& argument to nsINode::GetTextContent and nsINode::GetTextContentInternal. In the latter, you do aResult.Throw(NS_ERROR_OUT_OF_MEMORY) in the failure case. And then you fix any compile errors that arise from internal callers who need to pass in an ErrorResult now.
Thanks Boris and Robert for your replies. @Boris, did you mean something along the line of this patch? @Robert, perfect, I'll wait for :dmajor to come by and leave a comment then :)
Comment on attachment 8459301 [details] [diff] [review] bug_1037214.patch Yep, though you don't need the "mozilla::" prefix in nsGenericHTMLElement::GetItemValueText. r=me
I've only skimmed this bug and 950076, but I think the concern from the crash-stats team is about making sure we can properly categorize these in crash reports. Generally the options for error handling are, from best to worst: 1. Handle the OOM gracefully and safely 2. Crash very close to the failure, with an OOM signature and a size 3. Crash in some other way These NS_RUNTIMEABORT("OOM") are a case of #3. An improvement would be NS_ABORT_OOM(size) but only if you know the size of the failed allocation. If you have a pattern where the code doing the null-check is several stack frames away from the allocator, then your size information is probably lost. Or, said another way: the decision to crash should be made at a point where you still know the size. Any code above that should pass down its preference as a fallible_t parameter.
Thanks for the review Boris. As for the "mozilla::" prefix, I just kept it to be locally consistent with the one in nsGenericHTMLElement::SetItemValueText. Do you think it would be better to remove it (from nsGenericHTMLElement::GetItemValueText ?). (In reply to Boris Zbarsky [:bz] from comment #10) > Comment on attachment 8459301 [details] [diff] [review] > bug_1037214.patch > > Yep, though you don't need the "mozilla::" prefix in > nsGenericHTMLElement::GetItemValueText. > > r=me
ni(bz) for comment 12. Does Alessio's patch address your concerns in comment 11, David?
Assignee: nobody → alessio.placitelli
Probably not, since there are other GetNodeTextContent callers not affected by this patch that abort if it fails.
Alessio, please post a link to a try run and if it's all green, set the checkin-needed keyword.
(In reply to Andrew Overholt [:overholt] from comment #13) > Does Alessio's patch address your concerns in comment 11, David? This patch is an improvement over the current tree, in that it removes one NS_RUNTIMEABORT("OOM"). I don't like that there are still a bunch of other aborts from bug 950076 still remaining. Those other aborts should either be handled or replaced with NS_ABORT_OOM(size). If that's better done in another bug, I'd be happy to open it. Or if you tell me that the others are so rare that I'll never see them in reports, I'd be fine with that as well.
(In reply to Andrew Overholt [:overholt] from comment #15) > Alessio, please post a link to a try run and if it's all green, set the > checkin-needed keyword. Andrew, I'm afraid there's still work that needs to be done!
How can I know if there's a script on the stack or not? Or, in other words, is there any other place where the applied by the attached patch would be useful? Unfortunately I'm pretty much new in this area, so I'm not quite sure what's the best way to tackle the other |NS_RUNTIMEABORT("OOM")| without introducing other bugs. Does anybody have any suggestion? Would it make sense to have two versions of |nsContentUtils::GetNodeTextContent|, one which is fallible and the other which is not? (i.e., we need to add the infallible version).
> How can I know if there's a script on the stack or not? I don't think you should care. The relevant question is more "is there a sane thing the caller can do if I return a failure here?" Script can: it can throw. For other callers you have to look at it on a case-by-case basis. > I'm not quite sure what's the best way to tackle the other |NS_RUNTIMEABORT("OOM")| I would claim the best way is in a separate bug. .textContent is likely to be far and away the most likely way this code is called and the most likely way for it to end up with a large string that triggers OOM. > Would it make sense to have two versions of |nsContentUtils::GetNodeTextContent| That really depends on the callers. We should look through them and decide what makes sense for them, but in a separate bug.
> I just kept it to be locally consistent with the one in > nsGenericHTMLElement::SetItemValueText I don't think the prefix should be there either.
I did a try push without that prefix (but on top of some of my local patches that I also wanted to test) at https://tbpl.mozilla.org/?tree=Try&rev=5a9308fe2b00
Looks like the patch needs to also fix the caller in accessible/mac/mozHTMLAccessible.mm
This takes care of the problem in accessible/mac/mozHTMLAccessible.mm. This is a new try run, I just enabled the tests for Mac OSX since the others were all green in the previous test run and I just modified Mac code here: https://tbpl.mozilla.org/?tree=Try&rev=adb15dd886ce
Attachment #8459301 - Attachment is obsolete: true
(In reply to On vacation Aug 5-18. Please do not request review. from comment #19) > > I'm not quite sure what's the best way to tackle the other |NS_RUNTIMEABORT("OOM")| > > I would claim the best way is in a separate bug. Thanks, I've just opened a new bug to discuss about how to fix the other aborts (Bug 1048012). Does that mean I can flag this bug as checkin-needed?
Looks like it has xpcshell failures? https://tbpl.mozilla.org/php/getParsedLog.php?id=45140119&tree=Try
As per IRC chat with :RyanVM, I'm flagging this again as checkin-needed. This new patch contains no changes, it was just rebased against latest m-c. The test was failing due to bug 1045421.
Attachment #8466778 - Attachment is obsolete: true
The relative try run: https://tbpl.mozilla.org/?tree=Try&rev=bfc7f06a9833
Follow-up for non-unified bustage: https://hg.mozilla.org/integration/mozilla-inbound/rev/cf5d28075da4
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
There are zero crashes reported with Firefox 34.0 built later than 2014-08-05.
Alessio, can we get an uplift request for aurora (33) and beta (32)? Thanks
Comment on attachment 8467740 [details] [diff] [review] bug_1037214.patch Approval Request Comment [Feature/regressing bug #]: bug 950076 [User impact if declined]: The handled OOM condition would make Firefox crash with the wrong crash signature. [Describe test coverage new/current, TBPL]: This landed on m-c almost two weeks ago, no problems have been raised about it. [Risks and why]: None [String/UUID change made/needed]: None
Attachment #8467740 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8467740 [details] [diff] [review] bug_1037214.patch beta+
Attachment #8467740 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
https://hg.mozilla.org/releases/mozilla-aurora/rev/f0bdcf487ac4 https://hg.mozilla.org/releases/mozilla-beta/rev/353ade05d903 Is this bad enough to warrant esr31 consideration as well?
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #36) > https://hg.mozilla.org/releases/mozilla-aurora/rev/f0bdcf487ac4 > https://hg.mozilla.org/releases/mozilla-beta/rev/353ade05d903 > > Is this bad enough to warrant esr31 consideration as well? I don't think so. Based on crashstats, there are 17 reports of this signature for Firefox 31.0esr in the last week, ranking somewhere around #364.
No crashes on Firefox 32 beta or Aurora 33.0a2 later than 2014-08-17, marking as verified based on crash-stats.
You need to log in before you can comment on or make changes to this bug.