Closed Bug 1132499 Opened 5 years ago Closed 4 years ago

Large OOM in NS_ConvertUTF8toUTF16::NS_ConvertUTF8toUTF16

Categories

(Core :: DOM: Core & HTML, defect, critical)

36 Branch
x86
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox36 --- wontfix
firefox37 --- wontfix
firefox38 --- wontfix
firefox39 --- wontfix
firefox44 --- wontfix
firefox45 --- affected
firefox46 --- fixed
firefox47 --- fixed

People

(Reporter: alex_mayorga, Assigned: froydnj)

References

Details

(Keywords: crash)

Crash Data

Attachments

(3 files, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-8be10840-0330-494c-882c-ef0402150212.
=============================================================
Component: General → DOM: Core & HTML
Product: Firefox → Core
Trying to allocate a 23MB string for the .value of an input...
bp-edfd0dbf-5bef-4c1e-83e5-96e062150212
	12/02/2015	11:12 a.m.
38.0b9 is indeed affected...

bp-506ccc5c-92a6-402c-8775-93d702150501
	01/05/2015	10:48 a.m.
bp-836aeb44-c13b-4620-ab0a-ecd192150606
	06/06/2015	04:37 p.m.

Crashing Thread
Frame 	Module 	Signature 	Source
0 	xul.dll 	NS_ABORT_OOM(unsigned int) 	xpcom/base/nsDebugImpl.cpp
1 	xul.dll 	NS_ConvertUTF8toUTF16::NS_ConvertUTF8toUTF16(nsACString_internal const&) 	xpcom/string/nsString.h
These are coming from different sources; we'll need bug 1130960 in order to sort this out.
Depends on: 1130960
Bah, even after bug 1130960, a bunch of these are still getting the generic signature because of a failed stack walk.

From a quick glance the stragglers seem to be nsTextEditorState::GetValue like in comment 1.

bz, do you know anyone who could own this? I'm kinda worried it might be a lot of work; the UTF8 converters don't seem to have any notion of fallible currently.
Flags: needinfo?(bzbarsky)
Henri or Simon might have some ideas here.
Hmm.  The string code owned by dbaron with bsmedberg as the active peer.  I'm not sure either of them would have time offhand either, but worth checking with them to see whom they'd suggest.
Flags: needinfo?(bzbarsky)
(In reply to Olli Pettay [:smaug] from comment #7)
> Henri or Simon might have some ideas here.

This looks like expected behavior arising from "infallible" malloc. I guess we could re-introduce fallible variants for these conversions and use the fallible variants when persisting form state, since form state allows the user or Web content to cause a large allocation.
Crash Signature: [@ OOM | large | NS_ABORT_OOM(unsigned int) | NS_ConvertUTF8toUTF16::NS_ConvertUTF8toUTF16(nsACString_internal const&)] → [@ OOM | large | NS_ABORT_OOM(unsigned int) | NS_ConvertUTF8toUTF16::NS_ConvertUTF8toUTF16(nsACString_internal const&)] [@ OOM | large | NS_ABORT_OOM | NS_ConvertUTF8toUTF16::NS_ConvertUTF8toUTF16]
The view of Firefox menu just before a crash. The crash occurred after "Exit" button was pressed. Crash-report: https://crash-stats.mozilla.com/report/index/21d470ea-105c-449f-813e-6b7212151123
The visual artifacts during HTML rendering before these crashes are the as for crash [@ nsWindow::ClearTranslucentWindow ] https://crash-stats.mozilla.com/report/index/8a6a8e4c-220c-4a47-a261-ea2952151204  Should a separate bug be filed on the latest crash-report?
Ruvim, yes, please file a new report about 
https://crash-stats.mozilla.com/report/index/8a6a8e4c-220c-4a47-a261-ea2952151204 since that seems to be unrelated issue
¡Hola!

This happens on 44 as well.

Report ID 	Date Submitted
bp-4363e587-724f-4a90-b2e7-d72fe2151229
	28/12/2015	09:03 p.m.

¡Gracias!
Ugh. There are really bad crashes around here. Anything trying to allocate megabytes or more should surely be made fallible and fail gracefully, but this signature has crashes that even go into the hundreds of megabytes - I'm pretty sure that finding that much contigous space will be hard on most systems we run on, see e.g.:
bp-a8367b21-5d54-4712-a2d6-7c1572160226
bp-fb776bf8-1aad-4ae0-8a2c-e81442160226
bp-a284e16e-2bf4-453b-b2d0-ddbf62160226
bp-c6814f59-b13e-45c3-a156-04e8a2160226
bp-d43e90c3-bf98-4349-a080-240cc2160225
bp-42cf730a-67b2-4f98-ad5d-2f5ed2160225

And those are only from the first page of results I saw when taking a look.


We really need to find out what's up here and do something about it.
Summary: crash in OOM | large | NS_ABORT_OOM(unsigned int) | NS_ConvertUTF8toUTF16::NS_ConvertUTF8toUTF16(nsACString_internal const&) → Large OOM in NS_ConvertUTF8toUTF16::NS_ConvertUTF8toUTF16
Those are ... not useful stacks.  :(  In particular, they don't indicate which caller of ConvertUTF8toUTF16 would need to be doing a fallible conversion...
(In reply to Boris Zbarsky [:bz] from comment #16)
> Those are ... not useful stacks.  :(  In particular, they don't indicate
> which caller of ConvertUTF8toUTF16 would need to be doing a fallible
> conversion...

Yes, someone will need to load the stacks into WinDBG and find out if that can walk the stacks more usefully than minidump_stackwalk and can get decent stack traces for this.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #17)
> (In reply to Boris Zbarsky [:bz] from comment #16)
> > Those are ... not useful stacks.  :(  In particular, they don't indicate
> > which caller of ConvertUTF8toUTF16 would need to be doing a fallible
> > conversion...
> 
> Yes, someone will need to load the stacks into WinDBG and find out if that
> can walk the stacks more usefully than minidump_stackwalk and can get decent
> stack traces for this.

I will try to do that today if nobody else beats me to it.
For https://crash-stats.mozilla.com/report/index/c6814f59-b13e-45c3-a156-04e8a2160226 , the stack trace is kind of bogus (windbg thinks jemalloc is calling into NS_ConvertUTF8toUTF16), but examining stack memory, it looks like a call from nsTextEditorState::GetValue to NS_ConvertUTF8toUTF16 might have led to the OOM, which seems plausible:

https://dxr.mozilla.org/mozilla-central/rev/4ea7408b3eef059aa248f4b00328f8fdb4475112/dom/html/nsTextEditorState.cpp#1921

windbg reports that https://crash-stats.mozilla.com/report/index/a8367b21-5d54-4712-a2d6-7c1572160226 is calling NS_ConvertUTF8toUTF16 from somewhere in the networking code, but nsTextEditorState::GetValue is also living on the stack there.

The story is the same for https://crash-stats.mozilla.com/report/index/c679306d-81b7-4f6f-a3bb-a68a92160302 , though I think we're actually coming through FragmentOrElement::SaveSubtreeState, somehow.

So I'm going to tentatively say we should fix nsTextEditorState::GetValue to be fallible.  Which is unfortunate, as it's currently not. :(  And IIUC, fixing it to be fallible means we might not save the users's form state sometimes...which is not so great, but I suppose that's better than crashing.  We could just fix the assignment call to be fallible, without propagating that error out (ew)?
In other words, we'd need to do something like this, which has not been tested,
but does compile.  I have very little confidence in this being correct.  WDYT
of the overall idea?
Attachment #8726348 - Flags: feedback?(bzbarsky)
Also, why are we storing the value as a CString?  It looks like we have to do a conversion into CString and out of...and that's all we do with it.  We even store the CString as an allocated object!  Are/were we trying to save space or something?
Comment on attachment 8726348 [details] [diff] [review]
try to make nsTextEditorState::GetValue fallible

Going to defer to ehsan on the editor state stuff...
Attachment #8726348 - Flags: feedback?(bzbarsky) → feedback?(ehsan)
Comment on attachment 8726348 [details] [diff] [review]
try to make nsTextEditorState::GetValue fallible

Review of attachment 8726348 [details] [diff] [review]:
-----------------------------------------------------------------

Spoke with Nathan about this on IRC.  It looks like we should first convert nsTextEditorState::mValue to be an nsString or something and see how much of these issues that will avoid.  Correctly propagating the error out of GetValue is a huge pain specially since the BindToFrame/UnbindFromFrame machinery is not really designed to be fallible.
Attachment #8726348 - Flags: feedback?(ehsan)
This change is just a minor tidying; we need to distinguish between
"have a value" and "don't have a value" in nsTextEditorState::GetValue,
but we can do better than heap-allocating the string.
Attachment #8726746 - Flags: review?(ehsan)
Attachment #8726348 - Attachment is obsolete: true
I don't know the full history here; roc asked about this exact thing in
bug 534785 when reviewing the creation of nsTextEditorState.  The answer
then was "historical reasons" (the original code is hg@1)...maybe to try
and save some space?  Regardless, since the only thing we do here is
convert from/to incoming/outgoing nsStrings, which at least sometimes
appear to be causing OOMs, we might as well hold it as an nsString all
the time.  This change will ideally eliminate allocations, as we'll be
able to use nsString's buffer sharing underneath the hood.
Attachment #8726747 - Flags: review?(ehsan)
> the original code is hg@1

Don't let that stop you!  

That said, the char* mValue in nsHTMLInputElement was introduced in bug 34297.  The actual patch attached there has it as nsString*, afaict.  I have no idea why what landed has char*, and I doubt jst remembers...
Attachment #8726746 - Flags: review?(ehsan) → review+
Attachment #8726747 - Flags: review?(ehsan) → review+
https://hg.mozilla.org/mozilla-central/rev/abd001def2e8
https://hg.mozilla.org/mozilla-central/rev/80c48500bf8f
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Assignee: nobody → nfroyd
Little spike around 50% of increase in 45.0.1 between 2016-03-23 and 2016-03-28.
Liz, this is a crash spike on 45.0.1. Perhaps we should consider uplifting this to Beta 46.
Flags: needinfo?(lhenry)
Nathan, what do you think about uplift here? I would want some confirmation that the fix worked and isn't causing other problems, unless you think it's super safe.  We are heading into beta 7 so it's a bit late in the cycle.
Flags: needinfo?(lhenry) → needinfo?(nfroyd)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #31)
> Nathan, what do you think about uplift here? I would want some confirmation
> that the fix worked and isn't causing other problems, unless you think it's
> super safe.  We are heading into beta 7 so it's a bit late in the cycle.

Judging by:

https://crash-stats.mozilla.com/report/list?range_unit=days&range_value=28&signature=OOM+%7C+large+%7C+NS_ABORT_OOM+%7C+NS_ConvertUTF8toUTF16%3A%3ANS_ConvertUTF8toUTF16#tab-sigsummary

there are no more crashes from this signature in 47 or later, which suggests this patch has done some good.

I don't know how to show that there aren't other problems, but I think the patches themselves are straightforward and reasonably safe.
Flags: needinfo?(nfroyd)
Comment on attachment 8726746 [details] [diff] [review]
part 1 - convert nsTextEditorState::mValue to use Maybe instead of heap allocation

This request is for both of the patches in this bug, given that they seem to have fixed some OOM crashes.

Approval Request Comment
[Feature/regressing bug #]: bug 34297, bug 534785, I guess?
[User impact if declined]: OOM crashes
[Describe test coverage new/current, TreeHerder]: I know we have a fair number of editing-related tests; I don't know that we have any tests that definitively exercise the code these patches modify, as I'm not familiar with this code.  I could try to investigate, if need be.
[Risks and why]: Low risk; these patches simply remove a memory-allocating conversion.
[String/UUID change made/needed]: None.
Attachment #8726746 - Flags: approval-mozilla-beta?
Comment on attachment 8726746 [details] [diff] [review]
part 1 - convert nsTextEditorState::mValue to use Maybe instead of heap allocation

Should prevent some OOM crashes. OK for uplift for the beta 8 build.
Attachment #8726746 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.