Closed Bug 590674 Opened 9 years ago Closed 9 years ago

[Win64] OOM Crash at 1.5 GB [@ mozalloc_abort(char const*) ] [@ mozalloc_handle_oom ]

Product / Component: Core :: Memory Allocator
Type: defect
Platform: x86, All
Priority: Not set
Status: RESOLVED INCOMPLETE
Reporter: bugzilla.mozilla.org
Assignee: Unassigned
Keywords: 64bit, crash
Attachments: 3 files

User-Agent:       Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b4) Gecko/20100818 Firefox/4.0b4
Build Identifier: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b4) Gecko/20100818 Firefox/4.0b4

I'm using ~270 open tabs and a long history (99 days) to keep the awesome bar fed with data. Many of these tabs contain high-resolution images or are simply image-heavy, which leads to Firefox consuming over 1.5GB of memory.

For me firefox reproducibly crashes once it reaches about 1.4 to 1.5GB of memory usage (6GB overall on this machine).

Currently I'm working around this issue by setting:
image.mem.decodeondraw = true
image.mem.discardable = true

But this only helps so much; eventually the memory usage climbs to 1.5GB again and Firefox crashes. In most cases it does not even open the "submit crash report" window, and when it does, it often displays the message "there was an error with submitting your crash report".

Broken crash reports:
bp-45379bee-19f2-446f-801f-5f74d2100825
bp-8ef75fd7-07aa-459d-b219-df27e2100825
bp-d6bd1eac-3692-4f81-841a-fd5552100825

This is the most recent one that did go through, but I'm not certain whether it's related to the OOM crash or not:
bp-cbe2a388-f2f5-4608-9aa0-c143b2100825

Reproducible: Always

Steps to Reproduce:
1. Set up Firefox for a "heavy user" configuration (99 days history, 1GB disk cache, 270+ image-heavy tabs open)
2. Watch memory usage climb as you open the tabs
3. Crash
Blocks: 590371
Please run Firefox under WinDbg and try to reproduce the issue:
https://developer.mozilla.org/en/How_to_get_a_stacktrace_for_a_bug_report

Then attach the windbg log.
Looking at the log, it seems the access violation is related to mozalloc, which confirms my suspicion that this is a memory-related issue.

Anyway, I hope this helps...
Version: unspecified → Trunk
mozalloc_abort is an irrelevant signature (see bug 588433 description).
Which function calls mozalloc_abort? That is the point.

Maybe your 1 GB cache size is not enough for 1.5 GB of memory usage?
Try setting your cache size to 2 GB, then report whether it crashes again.

Be aware that a 32-bit application on a 64-bit OS cannot use more than 2 GB of virtual address space:
http://msdn.microsoft.com/en-us/library/aa366778%28VS.85%29.aspx
Product: Firefox → Core
QA Contact: general → general
(In reply to comment #3)
> mozalloc_abort is an irrelevant signature (see bug 588433 description).
> Which function calls mozalloc_abort? That is the point.

Uh, look at the WinDbg log I provided. Also, I don't see how bug 588433 is relevant to this crash.

> Maybe your 1 GB cache size is not enough for 1.5 GB of memory usage?
> Try setting your cache size to 2 GB, then report whether it crashes again.

no, about:cache shows
Maximum storage size:  	1024000 KiB
Storage in use: 	275225 KiB

So there's plenty of room in it. Are you getting swap and cache mixed up?


> Be aware that a 32-bit application on a 64-bit OS cannot use more than 2 GB of
> virtual address space:
> http://msdn.microsoft.com/en-us/library/aa366778%28VS.85%29.aspx

Yes, but it's crashing at 1.5GB virtual memory size, not 2GB. Also, if FF had been linked with IMAGE_FILE_LARGE_ADDRESS_AWARE, it could use up to 4GB, since I'm on a 64-bit system.
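
To check whether a given 32-bit executable carries that flag, one can read the Characteristics field of its COFF header. The following is a minimal sketch, not a tool used in this bug: the function names are made up for illustration, and the 2 GiB / 4 GiB figures are the limits for a 32-bit process on 64-bit Windows described on the MSDN page linked above.

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020  # bit in the COFF Characteristics field


def is_large_address_aware(image: bytes) -> bool:
    """Return True if the PE image has the large-address-aware bit set."""
    if image[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ signature")
    # e_lfanew, the file offset of the PE signature, is stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")
    # Characteristics is the last 2-byte field of the 20-byte COFF header,
    # which starts right after the 4-byte PE signature.
    (characteristics,) = struct.unpack_from("<H", image, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)


def user_address_space_gib(large_address_aware: bool) -> int:
    """User-mode address space of a 32-bit process on 64-bit Windows."""
    return 4 if large_address_aware else 2
```

Typical use would be `is_large_address_aware(open(path, "rb").read())`, where `path` points at the executable in question (a placeholder here).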
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: OOM Crash at 1.5 GB → [Win64] OOM Crash at 1.5 GB [@ mozalloc_abort(char const*) ]
So, what actually matters is mozalloc_handle_oom: that's the frame that means you're OOM. The abort function is shared by all of our "we have to die now" callers.

Amusingly, you crashed on 3 threads.

#  4  Id: 720.774 Suspend: 1 Teb: 7efdd000 Unfrozen
mozalloc_handle_oom
nsHttpActivityDistributor::ObserveActivity(class nsISupports * aHttpChannel = 0x0bd1b250, unsigned int aActivityType = 2, unsigned int aActivitySubtype = 0x5001, int64 aTimestamp = 0n1283045391780000, unsigned int64 aExtraSizeData = 0, class nsACString_internal * aExtraStringData = 0x7edd2cd0)+0x49
nsHttpTransaction::Init+0x2a6adc
nsHttpChannel::SetupTransaction
nsHttpChannel::Connect(int firstTime = 0n0)
nsHttpChannel::AsyncOpen
nsScriptLoader::StartLoad(class nsScriptLoadRequest * aRequest = 0x797c8f70, class nsAString_internal * aType = 0x00000000)+0x201
nsScriptLoader::ProcessScriptElement+0x46c062
nsScriptElement::MaybeProcessScript(void)+0x8c
nsHTMLScriptElement::MaybeProcessScript(void)+0x1e
nsHTMLScriptElement::DoneAddingChildren(int aHaveNotified = 0n1851950917)+0xf
nsHtml5TreeOpExecutor::RunScript(class nsIContent * aScriptElement = 0xffffffff)+0x7e
nsHtml5TreeOpExecutor::FlushDocumentWrite+0x3a1795
nsHtml5Parser::Parse+0x32453d
nsHTMLDocument::WriteCommon(class nsAString_internal * aText = 0x0043c808, int aNewlineTerminate = 0n1)+0x117
nsHTMLDocument::Write(class nsAString_internal * aText = 0x15168400)+0x15
nsIDOMHTMLDocument_Write(struct JSContext * cx = 0x15168400, unsigned int argc = 1, unsigned int64 * vp = 0x035a0168)+0xa8
js::Interpret(struct JSContext * cx = 0x15168400)+0x48c
js::Execute(struct JSContext * cx = 0x15168400, struct JSObject * chain = 0x3daa1d80, struct JSScript * script = 0x7ecee160, struct JSStackFrame * down = 0x00000000, unsigned int flags = 0, class js::Value * result = 0x00000000)+0x19b
JS_EvaluateUCScriptForPrincipals(struct JSContext * cx = 0x15168400, struct JSObject * obj = 0x3daa1d80, struct JSPrincipals * principals = 0x6c7f8dd4, wchar_t * chars = 0x797ee008 ".(function(){.var a=0;.var b=0;.var r=/BackCompat/i;.if (r.test(document.compatMode)){.a=document.body.clientWidth;.b=document.body.clientHeight;.}.else{.a=document.documentElement.clientWidth;.b=document.documentElement.clientHeight;.}.if(a>=500&&b>=500){.document.write('<scr' + 'ipt ' + 'language=\'javascript\'' + 'charset=\'utf-8\'' + 'type=\'text/javascript\'' + 'src=\'http://pages.etology.com/cbjs2/41234.php\'>' + '</scr' + 'ipt>');.}})();.", unsigned int length = 0x1c1, char * filename = 0x085c3748 "http://gelbooru.com/index.php?page=post&s=list&tags=angel_wings", unsigned int lineno = 0x10d, unsigned int64 * rval = 0x00000000)+0x5b
nsJSContext::EvaluateString(class nsAString_internal * aScript = 0x0043cfec, void * aScopeObject = 0x3daa1d80, class nsIPrincipal * aPrincipal = 0x6c7f8dd0, char * aURL = 0x085c3748 "http://gelbooru.com/index.php?page=post&s=list&tags=angel_wings", unsigned int aLineNo = 0x10d, unsigned int aVersion = 0, class nsAString_internal * aRetValue = 0x00000000, int * aIsUndefined = 0x0043cf44)+0x183
nsScriptLoader::EvaluateScript(class nsScriptLoadRequest * aRequest = 0x00000000, class nsString * aScript = 0x0043cfec)+0x17c
nsScriptLoader::ProcessRequest(class nsScriptLoadRequest * aRequest = 0x75127690)+0xb2
nsScriptLoader::ProcessScriptElement(class nsIScriptElement * aElement = 0x00000000)+0x2f3
nsScriptElement::MaybeProcessScript
nsHTMLScriptElement::MaybeProcessScript
nsHTMLScriptElement::DoneAddingChildren(int aHaveNotified = 0n1847332657)+0xf
nsHtml5TreeOpExecutor::RunScript
nsHtml5TreeOpExecutor::RunFlushLoop
nsHtml5ExecutorReflusher::Run
nsThread::ProcessNextEvent
mozilla::ipc::MessagePump::Run
MessageLoop::RunInternal
MessageLoop::RunHandler
MessageLoop::Run
nsBaseAppShell::Run
nsAppShell::Run
nsAppStartup::Run
XRE_main
wmain(int argc = 0n1, wchar_t ** argv = 0x009190f0)+0x34c

#  6  Id: 720.152c Suspend: 1 Teb: 7efa6000 Unfrozen
ChildEBP RetAddr  
ntdll!RtlEnterCriticalSection+0x150
PR_Lock(struct PRLock * lock = 0x6e577cd1)
nsAutoLock::nsAutoLock
nsHttpActivityDistributor::ObserveActivity
nsHttpTransaction::OnTransportStatus+0x26cdf5
nsHttpConnection::OnTransportStatus
nsSocketTransport::SendStatus
0303fc18 6e317b96 xul!nsSocketInputStream::Read(char * buf = 0x79f9d004 "HTTP/1.1 200 OK..Date: Sun, 29 Aug 2010 01:29:52 GMT..Server: Apache..Set-Cookie: TD_UNIQUE_IMP=134283a5607509;expires=Mon, 29-Aug-2011 01:29:52 GMT;path=/;domain=.tradedoubler.com..Set-Cookie: TD_PIC=1198152*WoB*5gst*15TA4****1*1CUPlm*1CVk7m*%28%29;expires=Thu, 02-Sep-2010 01:29:52 GMT;path=/;domain=.tradedoubler.com..Set-Cookie: BT=z11zz12oz1riYR9zzzz6yUOcqHHH;expires=Mon, 29-Aug-2011 01:29:52 GMT;path=/;domain=.tradedoubler.com..Cache-Control: private, max-age=0..Pragma: no-cache..P3P: policyref="http://tracker.tradedoubler.com/w3c/p3p.xml",CP="NOI DSP COR NID CUR OUR NOR"..Content-Length: 787..Connection: close..Content-Type: application/x-javascript; charset=ISO-8859-1....document.writeln('<img src=\"http://portal.o2online.de/nws/img/postview.gif?x=1130981519&partnerId=pptrd\" width=\"1\" height=\"1\" border=\"0\"/><SCRIPT type=\"text/javascript\">\n\nfunction readCookie(name) {\n  var nameEQ = name + \"=\";\n  var ca = document.cookie.split(\';\');\n  for(var i=0;i < ca.length;i++) {\n    var c = ...

nsHttpConnection::OnWriteSegment
nsHttpTransaction::WritePipeSegment
nsPipeOutputStream::WriteSegments
nsHttpTransaction::WriteSegments
nsHttpConnection::OnSocketReadable
nsHttpConnection::OnInputStreamReady
nsSocketInputStream::OnSocketReady
nsSocketTransport::OnSocketReady
nsSocketTransportService::DoPollIteration
nsSocketTransportService::OnProcessNextEvent
nsThread::ProcessNextEvent
NS_ProcessNextEvent_P
nsSocketTransportService::Run
nsThread::ProcessNextEvent
nsThread::ThreadFunc

# 10  Id: 720.1064 Suspend: 1 Teb: 7ef9a000 Unfrozen
ChildEBP RetAddr  
mozalloc_handle_oom
nsTimerImpl::PostTimerEvent
TimerThread::Run
nsThread::ProcessNextEvent
nsThread::ThreadFunc

# 59  Id: 720.1550 Suspend: 1 Teb: 7ef5e000 Unfrozen
ChildEBP RetAddr  
mozalloc_handle_oom(void)+0xa
nsProxyEventKey::Clone(void)+0xb
nsHashtable::Put
nsProxyObjectManager::GetProxyForObject
KERNELBASE!WaitForSingleObjectEx+0xcb
nsThread::ProcessNextEvent
nsThread::ThreadFunc

From my perspective, the HTML parser needs to be more aware of memory pressure and give up parsing instead of running the browser into the ground (that's the main-thread crash). The other two crashes are in timers and proxies. I've left a bit of data from the network I/O thread because it's amusing.

Roughly, one of nsHtml5TreeOpExecutor::RunScript / nsScriptLoader::StartLoad needs to be responsible for not trying to do work when there isn't enough space to do work.
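
Because the abort and OOM helpers sit on top of every such stack, the interesting frame is the first one beneath them. A minimal sketch of that triage step follows; the skip set is taken from this bug's signatures and is illustrative only, not an official skip list, and the function name is hypothetical.

```python
# Frames shared by every "we have to die now" path (from this bug's signatures).
GENERIC_FRAMES = {"mozalloc_abort", "mozalloc_handle_oom"}


def first_meaningful_frame(stack):
    """Return the innermost frame that is not a generic abort/OOM helper.

    `stack` lists frames innermost first, as in the WinDbg dumps above.
    Argument lists and "+0x.." offsets are stripped before comparing.
    """
    for frame in stack:
        name = frame.split("(")[0].split("+")[0].strip()
        if name not in GENERIC_FRAMES:
            return name
    return None
```

Applied to thread 10 above, this picks out nsTimerImpl::PostTimerEvent as the caller that ran out of memory.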
Component: General → HTML: Parser
Keywords: crash
QA Contact: general → parser
Summary: [Win64] OOM Crash at 1.5 GB [@ mozalloc_abort(char const*) ] → [Win64] OOM Crash at 1.5 GB [@ mozalloc_abort(char const*) ] [@ mozalloc_handle_oom ]
Yeah well, the question is: why isn't there enough space? My system has free memory, and the process hasn't reached the 2GB barrier either, so is it some artificial internal limitation? If so, shouldn't it try to free up memory (caches or whatever) first and wait until garbage collection is finished?
We don't have an artificial internal limitation here that I know of...  Henri, is it possible we're trying to perform a 0.5GB allocation here?  Or that we're pretty badly fragmented?
Ah, I noticed that you were talking about the new HTML5 parser. I'll see if disabling it works as a workaround.
I tried with html5.enable = false and it still crashes. I'll see if I can produce another debug log.
OK, new crash log, this time without the HTML5 parser. It crashes somewhere between 1.4 and 1.5GB memory usage again.
If this helps: I'm reloading a session from an intact cache, which means it populates all tabs from the cache in rapid succession, since most HTTP responses are 304 Not Modified.
(In reply to comment #5)
> Roughly, one of nsHtml5TreeOpExecutor::RunScript / nsScriptLoader::StartLoad
> needs to be responsible for not trying to do work when there isn't enough space
> to do work.

The plan for infallible malloc was that when memory is low, the allocator code would take necessary measures including "stopping sinks". The HTML5 parser project was planned from the start with the assumption that infallible malloc would be done first.

Now we have a situation where malloc doesn't return null but malloc doesn't take the necessary low-memory measures. That's unfortunate, but since you can hit OOM anywhere, it doesn't really make sense to play whack-a-mole and add low-memory checks all over the place. That would defeat the point of the infallible malloc project.

The proper fix is to follow through with the original plan for infallible malloc and to make the infallible malloc stop page loads, run GC, etc. when it runs low on memory.
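
As a rough illustration of that plan (not Mozilla's actual allocator code), an infallible allocation wrapper could try a list of relief measures before giving up. All names below are hypothetical, and raising an exception stands in for the mozalloc_abort call.

```python
class OutOfMemory(Exception):
    """Raised when memory cannot be obtained even after relief measures."""


def infallible_alloc(size, try_alloc, relief_measures):
    """Allocate `size` bytes or raise; never return None.

    try_alloc(size) is a fallible allocator returning a buffer or None.
    relief_measures is an ordered list of callables (e.g. "stop page
    loads", "run GC"); each is run at most once, followed by a retry.
    """
    block = try_alloc(size)
    for relieve in relief_measures:
        if block is not None:
            break
        relieve()                # try to free memory...
        block = try_alloc(size)  # ...then retry the allocation
    if block is None:
        # Nothing helped; this is where the real code would abort.
        raise OutOfMemory(f"could not allocate {size} bytes")
    return block
```

The point of the design is that callers such as the parser never see a null result and need no low-memory checks of their own; the policy lives in one place.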

(In reply to comment #7)
> We don't have an artificial internal limitation here that I know of...  Henri,
> is it possible we're trying to perform a 0.5GB allocation here?  Or that we're
> pretty badly fragmented?

Not at the crash site, I think. However, before the crash, parsing can allocate arbitrarily much if attribute or element names, attribute values, text nodes, comments or doctype public and system ids are arbitrarily long.

FWIW, my previous attempts to hard-code an arbitrary size limit broke stuff. mozalloc_abort crashes aren't exploitable to run arbitrary code, so I think we shouldn't hard-code magic limits in the parser. A practical problem is authors intentionally packing long stuff in attribute values.

One problem is, though, that WOW64 users might expect to be able to use all their installed RAM but a 32-bit Firefox build only has 2 GB to work with.
(In reply to comment #11)
> One problem is, though, that WOW64 users might expect to be able to use all
> their installed RAM but a 32-bit Firefox build only has 2 GB to work with.

The thing is that my Firefox instance OOM-crashes before reaching the 2GB mark. It usually happens around the 1.5GB mark, ±100MB. That's why I asked if there was some artificial limit.
I experienced a crash with the signature [@ mozalloc_handle_oom ] on Mac OS X 10.6.4 (64-bit). I am, however, using the 32-bit version of Firefox (4.0b4), because the 64-bit version doesn't seem to play well with Flash.

Here's the crash report:
bp-16468e3f-5a96-42fa-b2e5-44d072100903

I also tend to have a number of tabs open, with Firefox taking up a lot of memory. However, recently, Firefox has also been using 100% CPU when I wake the computer up, and the only way to stop it from doing that is to kill it and restart it. I wonder if this is the same problem, and Firefox crashed instead?

For the record, I have 4 GB total RAM, but I don't know how much of it Firefox was using before it crashed. (1.5 GB sounds about right, though.)

Also, as can be seen in the crash report, the most recent non-libmozalloc signature was nsHtml5TreeBuilder::createElement, with nsHtml5TreeBuilder::appendToCurrentNodeAndPushFormattingElementMayFoster before it.
Not planning on addressing safe (as in not allowing arbitrary code execution) HTML parser OOM crashes for Firefox 4.0.
Priority: -- → P4
(In reply to comment #14)
> Not planning on addressing safe (as in not allowing arbitrary code execution)
> HTML parser OOM crashes for Firefox 4.0.

This isn't just about the HTML parser. The crash-at-1.5GB issue interests me more. I have plenty of RAM in my system, so I'd prefer Firefox to actually make use of it rather than crashing.
Well, I updated to FF4b5 now and my workaround stopped working (i.e. it's using even more memory than before).

Firefox now crashes every single time in the process of restoring my session. Even in safe-mode or by transplanting sessionrestore.js to a new profile.
In light of Comment 14 (basically "WONTFIX for HTML parser, for now"), I'm moving this to the jemalloc component, in the hopes that that'll get answers/investigation about why we're hitting OOM at 1.5GB when more RAM and address-space is available. (That's what the reporter is primarily concerned with, per comment 12 & comment 15.)
Component: HTML: Parser → jemalloc
QA Contact: parser → jemalloc
Severity: critical → normal
Priority: P4 → --
As suggested by someone on IRC, I've also tried this in safe mode (now with beta5); same issue.
I'm getting this repeatedly on Mac OS X—so often that I think it may be related to (or the same as) bug 593400.

My most recent two crashes had to do with NewURI and nsIOService::NewURI. Both of these crashes also occurred while the computer was asleep (and I was AFK):

bp-146f7a20-d530-4076-9a15-dd82a2100907
bp-ea23ae4b-97a2-4ac0-9e93-5006d2100908

This one also occurred while the computer was asleep, but it has a signature of [@ JSC::X86Assembler::pop_r ] and may be unrelated:

bp-69a7b5e5-3f22-40cb-a3fb-f79832100905
OS: Windows 7 → All
After some investigating, it seems that Firefox is not linked with /LARGEADDRESSAWARE, which limits the virtual address space to 2GB under Windows. Some additional address space is lost to linked libraries, which would explain the ~1.5GB limitation, especially if Firefox's allocator expects a contiguous block of memory.

So, fixing bug 556382 would probably solve my issue.
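
To make the contiguous-block point concrete, here is a small sketch of a 2048 MiB address space with libraries mapped in the middle. The layout numbers are invented for illustration and were not measured from Firefox; the function names are hypothetical as well.

```python
def free_regions(space_size, used):
    """Return (start, length) gaps in [0, space_size) not covered by `used`.

    `used` is a list of non-overlapping (start, length) reservations.
    """
    gaps, cursor = [], 0
    for start, length in sorted(used):
        if start > cursor:
            gaps.append((cursor, start - cursor))
        cursor = max(cursor, start + length)
    if cursor < space_size:
        gaps.append((cursor, space_size - cursor))
    return gaps


def can_allocate_contiguous(space_size, used, request):
    """True if at least one free gap is large enough for `request`."""
    return any(length >= request
               for _, length in free_regions(space_size, used))


# Hypothetical layout in MiB: two heap areas plus two mapped libraries,
# roughly 1.5 GiB in use out of 2 GiB.
USED_MIB = [(0, 700), (900, 50), (1100, 700), (1900, 48)]
```

With this layout 550 MiB are still free in total, but the largest single gap is 200 MiB, so `can_allocate_contiguous(2048, USED_MIB, 256)` is False even though the process is well below 2 GiB.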
Depends on: 556382
Depends on: 598466
As discussed in bug 556382 comments 10, 11 and 17, I filed an additional bug regarding the increased memory usage between FF3 and FF4.

I'm requesting blocker status on this bug instead, which can be solved either by fixing the virtual address space shortage for 32-bit builds or by decreasing the actual memory consumption.
blocking2.0: --- → ?
This bug isn't very useful; bug 556382 and bug 598466 are better places to deal with both sets of problems.
Status: NEW → RESOLVED
blocking2.0: ? → ---
Closed: 9 years ago
Resolution: --- → INCOMPLETE
Crash Signature: [@ mozalloc_abort(char const*) ] [@ mozalloc_handle_oom ]