Closed Bug 367474 Opened 18 years ago Closed 7 years ago

XPConnect out of memory handling needs to be rearchitected

Categories

(Core :: XPConnect, defect)

x86
All
defect
Not set
critical

Tracking

()

RESOLVED WONTFIX

People

(Reporter: timeless, Unassigned)

References

Details

(Keywords: crash)

firefox.exe is using 1,911,132 K of VM and 668,084 K of ram (not bad)

Unfortunately, it also crashed twice (well, it has 12 threads, so it could have crashed up to 12 times, 17% isn't bad, right?)

Thead 1 (0 based), the timer thread crashed:
ntdll!KiFastSystemCallRet
ntdll!NtRaiseHardError+0xc
kernel32!UnhandledExceptionFilter+0x653
MSVCR80D!_XcptFilter(unsigned long xcptnum = 0xe06d7363, struct _EXCEPTION_POINTERS * pxcptinfoptrs = 0x00e8f9c4)+0x61 [f:\rtm\vctools\crt_bld\self_x86\crt\src\winxfltr.c @ 237]
MSVCR80D!_callthreadstartex(void)+0x7a [f:\rtm\vctools\crt_bld\self_x86\crt\src\threadex.c @ 350]
MSVCR80D!_EH4_CallFilterFunc(void)+0x12
MSVCR80D!_except_handler4_common(unsigned int * CookiePointer = 0x10310974, <function> * CookieCheckFunction = 0x102150b0, struct _EXCEPTION_RECORD * ExceptionRecord = 0x00e8fad8, struct _EXCEPTION_REGISTRATION_RECORD * EstablisherFrame = 0x00e8ff98, struct _CONTEXT * ContextRecord = 0x00e8faf8, void * DispatcherContext = 0x00e8faac)+0xba
MSVCR80D!_except_handler4(struct _EXCEPTION_RECORD * ExceptionRecord = 0x00e8fad8, struct _EXCEPTION_REGISTRATION_RECORD * EstablisherFrame = 0x00e8ff98, struct _CONTEXT * ContextRecord = 0x00e8faf8, void * DispatcherContext = 0x00e8faac)+0x22
ntdll!ExecuteHandler2+0x26
ntdll!ExecuteHandler+0x24
ntdll!KiUserExceptionDispatcher+0xe
kernel32!RaiseException+0x53
MSVCR80D!_CxxThrowException(void * pExceptionObject = 0x00e8fe68, struct _s__ThrowInfo * pThrowInfo = 0x103037e4)+0x50 [f:\rtm\vctools\crt_bld\self_x86\crt\prebuild\eh\throw.cpp @ 166]
MSVCR80D!operator new(unsigned int size = 0x18)+0x75 [f:\rtm\vctools\crt_bld\self_x86\crt\src\new.cpp @ 64]
xpcom_core!nsTimerImpl::PostTimerEvent(void)+0x10 [c:\home\mozilla.org\mozilla\xpcom\threads\nstimerimpl.cpp @ 470]
xpcom_core!TimerThread::Run(void)+0x12d [c:\home\mozilla.org\mozilla\xpcom\threads\timerthread.cpp @ 284]
xpcom_core!nsThread::ProcessNextEvent(int mayWait = 1, int * result = 0x00e8ff18)+0x1a0 [c:\home\mozilla.org\mozilla\xpcom\threads\nsthread.cpp @ 483]
xpcom_core!NS_ProcessNextEvent_P(class nsIThread * thread = 0x00c6baa0, int mayWait = 1)+0x56 [c:\home\mozilla.org\mozilla\dbg-firefox-i686-pc-mingw32\xpcom\build\nsthreadutils.cpp @ 225]
xpcom_core!nsThread::ThreadFunc(void * arg = 0x00c6baa0)+0xce [c:\home\mozilla.org\mozilla\xpcom\threads\nsthread.cpp @ 251]
nspr4!_PR_NativeRunThread(void * arg = 0x10204767)+0xdb [c:\home\mozilla.org\mozilla\nsprpub\pr\src\threads\combined\pruthr.c @ 436]
xpcom_core!nsSubstring::~nsSubstring(void)+0xf
MSVCR80D!_threadstartex(void * ptd = 0x00c0bf48)+0x87 [f:\rtm\vctools\crt_bld\self_x86\crt\src\threadex.c @ 331]
kernel32!BaseThreadStart+0x37

Thread 0 (the main, ui thread) crashed:

kernel32!RaiseException+0x53
MSVCR80D!_CxxThrowException(void * pExceptionObject = 0x0012ec04, struct _s__ThrowInfo * pThrowInfo = 0x103037e4)+0x50 [f:\rtm\vctools\crt_bld\self_x86\crt\prebuild\eh\throw.cpp @ 166]
MSVCR80D!operator new(unsigned int size = 0x58)+0x75 [f:\rtm\vctools\crt_bld\self_x86\crt\src\new.cpp @ 64]
xpc3250!XPCConvert::JSErrorToXPCException(class XPCCallContext * ccx = 0x0012edec, char * message = 0x00579dac "out of memory", char * ifaceName = 0x00000000 "", char * methodName = 0x00000000 "", struct JSErrorReport * report = 0x0012ee94, class nsIException ** exceptn = 0x0012ede8)+0x83 [c:\home\mozilla.org\mozilla\js\src\xpconnect\src\xpcconvert.cpp @ 1526]
xpc3250!xpcWrappedJSErrorReporter(struct JSContext * cx = 0x0afe2040, char * message = 0x00579dac "out of memory", struct JSErrorReport * report = 0x0012ee94)+0xa0 [c:\home\mozilla.org\mozilla\js\src\xpconnect\src\xpcwrappedjsclass.cpp @ 704]
js3250!js_ReportOutOfMemory(struct JSContext * cx = 0x0afe2040)+0x118 [c:\home\mozilla.org\mozilla\js\src\jscntxt.c @ 899]
js3250!JS_ReportOutOfMemory(struct JSContext * cx = 0x0afe2040)+0xc [c:\home\mozilla.org\mozilla\js\src\jsapi.c @ 4731]
js3250!JS_malloc(struct JSContext * cx = 0x0afe2040, unsigned int nbytes = 0x10)+0x74 [c:\home\mozilla.org\mozilla\js\src\jsapi.c @ 1680]
js3250!js_Enumerate(struct JSContext * cx = 0x0afe2040, struct JSObject * obj = 0x0c57bb40, JSIterateOp enum_op = JSENUMERATE_INIT (0), long * statep = 0x0012ef40, long * idp = 0x00000000)+0x322 [c:\home\mozilla.org\mozilla\js\src\jsobj.c @ 4074]
js3250!InitNativeIterator(struct JSContext * cx = 0x0afe2040, struct JSObject * iterobj = 0x0cb6ffa0, struct JSObject * obj = 0x0c57bb40, unsigned int flags = 1)+0xd6 [c:\home\mozilla.org\mozilla\js\src\jsiter.c @ 146]
js3250!js_ValueToIterator(struct JSContext * cx = 0x0afe2040, unsigned int flags = 1, long * vp = 0x74bd064c)+0x1ee [c:\home\mozilla.org\mozilla\js\src\jsiter.c @ 416]
js3250!js_Interpret(struct JSContext * cx = 0x0afe2040, unsigned char * pc = 0x09adefea "g???", long * result = 0x0012f5a4)+0x1c3a [c:\home\mozilla.org\mozilla\js\src\jsinterp.c @ 2773]
js3250!js_Invoke(struct JSContext * cx = 0x0afe2040, unsigned int argc = 1, unsigned int flags = 2)+0xbb5 [c:\home\mozilla.org\mozilla\js\src\jsinterp.c @ 1367]
xpc3250!nsXPCWrappedJSClass::CallMethod(class nsXPCWrappedJS * wrapper = 0x74beab80, unsigned short methodIndex = 3, struct XPTMethodDescriptor * info = 0x01d39998, struct nsXPTCMiniVariant * nativeParams = 0x0012f90c)+0xf7f [c:\home\mozilla.org\mozilla\js\src\xpconnect\src\xpcwrappedjsclass.cpp @ 1419]
xpc3250!nsXPCWrappedJS::CallMethod(unsigned short methodIndex = 3, struct XPTMethodDescriptor * info = 0x01d39998, struct nsXPTCMiniVariant * params = 0x0012f90c)+0x41 [c:\home\mozilla.org\mozilla\js\src\xpconnect\src\xpcwrappedjs.cpp @ 531]
xpcom_core!PrepareAndDispatch(class nsXPTCStubBase * self = 0x74658d20, unsigned int methodIndex = 3, unsigned int * args = 0x0012f9cc, unsigned int * stackBytesToPop = 0x0012f9bc)+0x323 [c:\home\mozilla.org\mozilla\xpcom\reflect\xptcall\src\md\win32\xptcstubs.cpp @ 114]
xpcom_core!SharedStub(void)+0x16 [c:\home\mozilla.org\mozilla\xpcom\reflect\xptcall\src\md\win32\xptcstubs.cpp @ 142]
gklayout!nsXMLHttpRequest::NotifyEventListeners(class nsCOMArray<nsIDOMEventListener> * aListeners = 0x74658d20, class nsIDOMEvent * aEvent = 0x74bbffb8)+0xe7 [c:\home\mozilla.org\mozilla\content\base\src\nsxmlhttprequest.cpp @ 939]
gklayout!nsXMLHttpRequest::NotifyEventListeners(class nsCOMArray<nsIDOMEventListener> * aListeners = 0x0012fa30, class nsIDOMEvent * aEvent = 0x74bbffb8)+0xe7 [c:\home\mozilla.org\mozilla\content\base\src\nsxmlhttprequest.cpp @ 939]
gklayout!nsXMLHttpRequest::ChangeState(unsigned int aState = 0x10, int aBroadcast = 1, int aClearEventListeners = 1)+0x152 [c:\home\mozilla.org\mozilla\content\base\src\nsxmlhttprequest.cpp @ 1939]
gklayout!nsXMLHttpRequest::RequestCompleted(void)+0x14d [c:\home\mozilla.org\mozilla\content\base\src\nsxmlhttprequest.cpp @ 1472]
gklayout!nsXMLHttpRequest::OnStopRequest(class nsIRequest * request = 0x74896210, class nsISupports * ctxt = 0x00000000, unsigned int status = 0)+0x204 [c:\home\mozilla.org\mozilla\content\base\src\nsxmlhttprequest.cpp @ 1414]
necko!nsHTTPCompressConv::OnStopRequest(class nsIRequest * request = 0x74896210, class nsISupports * aContext = 0x00000000, unsigned int aStatus = 0)+0x23 [c:\home\mozilla.org\mozilla\netwerk\streamconv\converters\nshttpcompressconv.cpp @ 125]
necko!nsStreamListenerTee::OnStopRequest(class nsIRequest * request = 0x74896210, class nsISupports * context = 0x00000000, unsigned int status = 0)+0xa8 [c:\home\mozilla.org\mozilla\netwerk\base\src\nsstreamlistenertee.cpp @ 66]
necko!nsHttpChannel::OnStopRequest(class nsIRequest * request = 0x74b3d838, class nsISupports * ctxt = 0x00000000, unsigned int status = 0)+0x34e [c:\home\mozilla.org\mozilla\netwerk\protocol\http\src\nshttpchannel.cpp @ 4041]
necko!nsInputStreamPump::OnStateStop(void)+0xde [c:\home\mozilla.org\mozilla\netwerk\base\src\nsinputstreampump.cpp @ 572]
necko!nsInputStreamPump::OnInputStreamReady(class nsIAsyncInputStream * stream = 0x748040e8)+0x90 [c:\home\mozilla.org\mozilla\netwerk\base\src\nsinputstreampump.cpp @ 396]
xpcom_core!nsInputStreamReadyEvent::Run(void)+0x4a [c:\home\mozilla.org\mozilla\xpcom\io\nsstreamutils.cpp @ 112]
xpcom_core!nsThread::ProcessNextEvent(int mayWait = 1, int * result = 0x0012fbc4)+0x1a0 [c:\home\mozilla.org\mozilla\xpcom\threads\nsthread.cpp @ 483]
xpcom_core!NS_ProcessNextEvent_P(class nsIThread * thread = 0x00c00360, int mayWait = 1)+0x56 [c:\home\mozilla.org\mozilla\dbg-firefox-i686-pc-mingw32\xpcom\build\nsthreadutils.cpp @ 225]
gkwidget!nsBaseAppShell::Run(void)+0x5d [c:\home\mozilla.org\mozilla\widget\src\xpwidgets\nsbaseappshell.cpp @ 153]
tkitcmps!nsAppStartup::Run(void)+0x6b [c:\home\mozilla.org\mozilla\toolkit\components\startup\src\nsappstartup.cpp @ 171]
xul!XRE_main(int argc = 1, char ** argv = 0x00b6b788, struct nsXREAppData * aAppData = 0x004036b4)+0x1da2 [c:\home\mozilla.org\mozilla\toolkit\xre\nsapprunner.cpp @ 2513]
firefox!main(int argc = 1, char ** argv = 0x00b6b788)+0x16 [c:\home\mozilla.org\mozilla\browser\app\nsbrowserapp.cpp @ 61]
firefox!__tmainCRTStartup(void)+0x1a6 [f:\rtm\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 586]
firefox!mainCRTStartup(void)+0xd [f:\rtm\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 403]
kernel32!BaseProcessStart+0x23

What bothers me is that xpconnect was asked to report an out of memory situation and then proceeded to try a number of allocations hoping for the best.

The first heap allocation is:
XPCConvert::JSErrorToXPCException
        data = new nsScriptError();
        if(!data)
            return NS_ERROR_OUT_OF_MEMORY;
Which is where we are right now. We crashed because I'm using vc8 (instead of 6) and the vc8 crt throws exceptions when allocs fail for new. bsmedberg will eventually fix that, I hope.

It would then call
nsScriptError::Init
        data->Init(bestMessage.get(),
                   NS_ConvertASCIItoUTF16(report->filename).get(),
                   (const PRUnichar *)report->uclinebuf, report->lineno,
                   report->uctokenptr - report->uclinebuf, report->flags,
                   "XPConnect JavaScript");
which performs a number of string operations that would involve heap allocations.

Unfortunately, our string operations don't return messages indicating if they failed for OOM. Which means that if we did actually manage to succeed in creating the object, init would lose all the data anyway.
and currently we don't error check the init statement because it always returns NS_OK.

Then it would call
nsScriptError::ToString (nsCAutoString(
        nsCAutoString formattedMsg;
        data->ToString(formattedMsg);
this might be ok if the entire message is "out of memory", if it's longer, then you probably have heap allocations and silent dataloss assuming the string apis use malloc and don't choose to crash. but the function and its children don't/can't signal failure.

Finally it calls
XPCConvert::ConstructException
which calls
        if (NS_SUCCEEDED(errorObject->GetMessage(getter_Copies(xmsg)))) {
which is a NS_Alloc call

And then it calls:
            CopyUTF16toUTF8(xmsg, sxmsg);

Which will probably be ok, since the actual message is "out of memory" and will fit into the stack buffer.

now, in our case it will not call:
        msg = sz = JS_smprintf(format, msg, ifaceName, methodName);

which is good, because JS_smprintf calls malloc. But it's theoretically possible for this to be the first thing to fail.
If it does, we should actually use the old msg, not sz.

lastly, that tails to
    nsresult res = nsXPCException::NewException(msg, rv, nsnull, data, exceptn);

The first sign of hope is in nsXPCException::NewException which has thankfully some code to create one extra exception.
I'm hoping that instead of simply creating and destroying an exception, we have the module create an emergency exception (and all the other emergency bits) at module load time, and when we run out of memory, we would use them instead of asking the heap for more memory.

next, we have:
    nsXPCException* e = new nsXPCException();

which is i believe the first second chance we've had at actually throw()ing for oom.

the next problem would be here:
        nsIStackFrame* location;
            rv = xpc->GetCurrentJSStack(&location);

I'm not going to dig through that call frame, but i know that it allocates, thankfully the function allows the caller to pass an object, so we can keep a stack frame object available, even just a dummy that doesn't have anything.

in unrelated code auditing notes:
                nsCOMPtr<nsIStackFrame> caller;
                caller->QueryInterface(NS_GET_IID(nsIStackFrame), (void **)&location);
We seem to use QI instead of do_QI.

then we call
nsXPCException::Initialize
        rv = e->Initialize(aMessage, aResult, nsnull, location, aData, nsnull);

which uses NS_Alloc, twice.

And we're done.

The last bit is:
        ccx.GetXPCContext()->SetException(e);

Which i believe is a simple operation.

The next questions go to bz since if we're lucky, we can now bubble up to:
 nsXMLHttpRequest::NotifyEventListeners
which helpfully losses our critical error message:
      listener->HandleEvent(aEvent);

does it really not care about important failures?
In our stack, there are two of these frames, not sure why, but it loses the error message twice because it's void both times, and it also proceeds to do more work iterating over the arrays, which i think we could nicely choose not to do when we've run out of memory.

We then return to 
nsXMLHttpRequest::ChangeState

Which can fail for other reasons.

i.e., it sometimes returns a failure message to:
nsXMLHttpRequest::RequestCompleted

which chooses to ignore that failure:
  // Clear listeners here unless we're multipart
  ChangeState(XML_HTTP_REQUEST_COMPLETED, PR_TRUE,
              !(mState & XML_HTTP_REQUEST_MULTIPART));
and do work under the assumption that ChangeState must have succeeded...
  if (NS_SUCCEEDED(rv) && domevent) {

Anyway, this function can return failures, but again doesn't for us.

It returns to nsXMLHttpRequest::OnStopRequest, which ignores its results:
    RequestCompleted();

now we get to the good part:
  if (mScriptContext) {
    // Force a GC since we could be loading a lot of documents
    // (especially if streaming), and not doing anything that would
    // normally trigger a GC.
    mScriptContext->GC();
  }

oho, a point where we could get back some of that memory that i needed...

and then we return a possible failure rv, except not for RequestCompleted (even though that could fail):
  return rv;

We bubble up to:
nsHttpChannel::OnStopRequest, which ignores our failure:
        mListener->OnStopRequest(this, mListenerContext, status);

and returns:
    return NS_OK;

to: nsInputStreamPump::OnStateStop
    // if an error occured, we must be sure to pass the error onto the async
    // stream.  in some cases, this is redundant, but since close is idempotent,
    // this is OK.  otherwise, be sure to honor the "close-when-done" option.

At this point, we're at the top of the world, and I think it's actually ok that necko ignores the error message by the time it gets to nsInputStreamPump::OnStateStop, although it'd be kinda nice if it maybe thought about flushing caches...

-- thus ends the tale of my ui thread's crash --

that leaves the crash on the timer thread.

        timer = NS_STATIC_CAST(nsTimerImpl*, mTimers[0]);
        if (!TIMER_LESS_THAN(now, timer->mTimeout + mTimeoutAdjustment)) {
    next:
          // NB: AddRef before the Release under RemoveTimerInternal to avoid
          // mRefCnt passing through zero, in case all other refs than the one
          // from mTimers have gone away (the last non-mTimers[i]-ref's Release
          // must be racing with us, blocked in gThread->RemoveTimer waiting
          // for TimerThread::mLock, under nsTimerImpl::Release.

          NS_ADDREF(timer);
          RemoveTimerInternal(timer);

          // We are going to let the call to PostTimerEvent here handle the
          // release of the timer so that we don't end up releasing the timer
          // on the TimerThread instead of on the thread it targets.
          timer->PostTimerEvent();
We crashed in PostTimerEvent.
Unfortunately, we also leaked the timer because as per the comment, it's the job of PostTimerEvent to make sure that the timer is freed on the right thread eventually.
          timer = nsnull;


void nsTimerImpl::PostTimerEvent()
{
  nsTimerEvent* event = new nsTimerEvent(this, mGeneration);
  if (!event)
    return;

The place where we'd arrange to actually release the timer reference is here:
  mCallingThread->Dispatch(event, NS_DISPATCH_NORMAL);

So, that's good, we're running low on memory and we leak a timer and the object it referenced.

I bit of checking shows:
0:001> dt this mCallbackType
Local var @ 0xe8fe8c Type nsTimerImpl*
0x748fe498 
   +0x018 mCallbackType : 0x2 '' /* CALLBACK_TYPE_FUNC */
0:001> dt this mCallback.c
Local var @ 0xe8fe8c Type nsTimerImpl*
0x748fe498 
   +0x014 mCallback   : 
      +0x000 c           : 0x0253c9c0        gklayout!nsGlobalWindow::TimerCallback+0
0:001> dt this mClosure
Local var @ 0xe8fe8c Type nsTimerImpl*
0x748fe498 
   +0x010 mClosure : 0x749316c8 
0:001> dt gklayout!nsTimeout 0x749316c8 mWindow.mRawPtr
   +0x008 mWindow         : 
      +0x000 mRawPtr         : 0x142fd158 nsGlobalWindow

So, we're leaking a reference to a timer which holds a reference to a global window. that's a good thing to leak.

When a timer can't fire, do we want to lose it, or reschedule it? I'm assuming we want to reschedule it, otherwise if the timer relies on doing NS_RELEASE(closure) or similarly, the closure data will leak.

I'll probably file specific bugs for some of my observations (especially the timerimpl stuff), but i'm filing this bug now, because otherwise bugzilla will eat it like it did the last bug i tried to file.
> does it really not care about important failures?

Probably not too much, no.  ;)

In particular, if some JS listener throws we do NOT want to stop notifying listeners.  Imo.

First, what's the out of memory story for Gecko as a whole? Ideally what you want to have happen is in an out of memory condition attempt to free up some trash laying around. Is there anything like the in memory cache that can be dumped when an out of memory occurs? Until something like that can be done once you're out of memory there's not much more you can do than to terminate the process as gracefully as you can.

I think it would be more useful to have things register with the memory allocation system, things that could be thrown away safely. Then when it detects and out of memory condition it could start dropping things from the list. It would be nice if we could start a GC cycle when this happens, but I think that requires memory allocation. Maybe if we could free up some stuff then do the GC.

In the end, how common is this event? Are people losing work due to this? I know Windows seems to be more stable now a days when running low on memory. Use to be that it was a mute point because once the an app ran out of memory you were pretty much hosed across the bored most of the time.
i am, i'm crashing about twice a week because of out of memory.

our device will run out of memory much faster than i will.

the official story is that we have a memory flusher.

unfortunately, currently the memory flusher is probably at most flushing one of the following things:
 network cache
 image cache
 font cache
 string bundle cache
David, timeless' company is running Gecko on a portable device.  They have something like 64MB of memory, without a virtual memory system.  So I do fully expect them to be running out of memory.  Often.
just to point out that the memory flusher is probably not the right approach for small devices.  Instead you may want to trap malloc() and do something interesting it in (hold a reserve hand it out during low memory).
Blocks: 369823
in practice, the memory flusher is of course *not* tickled when we throw a(n uncaught) C++ exception for oom. but it will be once bsmedberg (or darin or whomever steps up) writes the replacement new which returns null.

If you happen to use NS_Alloc, then it's supposed to tickle the OOM handler, unfortunately of course, JS doesn't. There's a patch that brendan has been toying with that might enable js to switch its allocator. I think for now we should be able to hack xpconnect so that when it gets an OOM failure from JS it calls the OOM flusher first (it won't do much atm, but my people are supposed to be hooking it up to some of the things from comment 3 this month).

Anyway, "next, we have:" from comment 0 really can happen, I hit it yesterday - bug 369823.

I think that on average, my best chance of crashing because of OOM atm is in XPConnect because on average, I'm going to run out of memory in JS, which does survive, and then JS reports to XPConnect, which has as we can see in this bug a number of chances of crashing. (I think that I will probably average 2 crashes per oom, because one other thread will probably also run out of memory until we fix all of those crash points.)
fwiw, we do have high and low watermarks, handling for which is among the bits i've indicated. someone will hook it up so that it triggers the memory flusher.

as for holding a reserve, I think that's probably something we can change the memory flusher to do. probably in the form of allocating a 2mb block [or some other magic number that will safely cooperate with all the allocators and memory flags] and releasing it when it gets a low memory or oom signal. I'll probably also try to arrange for the system to reacquire that buffer when it feels it's safe.

as for hacking malloc, our core people yell and scream anytime the browser team talks about it. so that's pretty much a no go. (the standard allocator *sucks* for browsers, but the allocators that work well for browsers don't work well for anything else.)
OS: Windows XP → All
Since then, Gecko has moved to infallible malloc, so there's probably less need for a complicated OOM handling set up.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.