Closed Bug 633848 Opened 9 years ago Closed 6 years ago

Crash reporter not triggered; Firefox vanishes silently

Categories

(Toolkit :: Crash Reporting, defect, major)

x86
Windows 7
defect
Not set
major

RESOLVED INCOMPLETE

People

(Reporter: mozilla, Unassigned)

Details

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Windows NT 6.1; rv:2.0b11) Gecko/20100101 Firefox/4.0b11
Build Identifier: Mozilla/5.0 (Windows NT 6.1; rv:2.0b11) Gecko/20100101 Firefox/4.0b11

I've been battling a mysterious bug (possibly in an extension) that causes Firefox (currently 4.0b11, though the same problem was visible in earlier builds) to start using 100% of a CPU core while its memory usage swells.

After this has gone on for a while, Firefox silently vanishes. No Windows crash dialog. No Mozilla crash reporter. This makes it a little hard to meaningfully file a bug report.

Reproducible: Always

Steps to Reproduce:
1. Wait for CPU usage issue to happen. 
2. Leave Firefox sitting there, happily guzzling CPU/memory.
3. Watch it go away.
Actual Results:  
Nothing.

Expected Results:  
Crash reporter dialog.
Version: unspecified → Trunk
Okay, I followed those instructions. After a while, Firefox (4.0b11) was throwing exceptions and effectively died. I have no idea if it's this particular bug, something else, etc. 

I'll attach the log and hope it's helpful...
Well, this is wonderful. I must have made a mistake somewhere, because the log file just isn't there. Sorry guys...
Attached file WinDBG log file
Okay, so I managed to find the log file from a few days ago!!!!!!!!! I've attached it above.

The problem reproduced itself twice today while I wasn't running WinDbg - in both cases, I left Firefox 4.0b11 running, and when I came back 8-10 hours later, it was just gone with no crash reporter. I've now updated this laptop to the current nightly, and intend to run it in WinDbg. If I can reproduce the problem I'll post a further log.
Attachment #513068 - Attachment mime type: application/octet-stream → text/plain
fwiw, you can use windbg /I to make windbg your JIT debugger, see http://www.codeproject.com/KB/debug/windbg_part1.aspx#_Toc64133667

unfortunately, all we can see is that you ran out of memory.

more specifically, the main thread was trying to GC (which is supposed to recover memory)
and another thread (5) failed to allocate memory while doing <something>. I think thread (6) was sending a timer event and it's possible that's what woke up thread 5 - not certain, but basically there wasn't any memory available for poor thread 5 and well, it killed itself.
unfortunately, the only interesting frame of thread 5 isn't really present, it's the caller of mozalloc_handle_oom, and for whatever reason, the frame there isn't really meaningful. By "not meaningful" I mean a frame whose offset from the labeled function is, say, >10000h:

> xul!nsGenericElement::cycleCollection::Traverse+0x39b70e <- this offset is way too big to actually be in the labeled function, thus we know very close to nothing.

#  5  Id: 137c.1e80 Suspend: 1 Teb: 7ffda000 Unfrozen
ChildEBP RetAddr  
02a9faf4 71e81a6c mozalloc!mozalloc_abort
02a9fafc 5466b2ee mozalloc!mozalloc_handle_oom(void)+0xa
02a9fcac 71ecb04b xul!nsGenericElement::cycleCollection::Traverse+0x39b70e
02a9fcc4 5445c15a nspr4!PR_WaitCondVar(struct PRCondVar * cvar = 0x542eb500, unsigned int timeout = 0x542ba1a0)+0x3b [e:\builds\moz2_slave\rel-cen-w32-bld\build\nsprpub\pr\src\threads\combined\prucv.c @ 547]
xul!mozilla::CondVar::Wait
xul!nsThread::ProcessNextEvent
xul!nsThread::ThreadFunc
nspr4!_PR_NativeRunThread(void * arg = 0x0052b280)
nspr4!pr_root
_callthreadstartex(void)
MOZCRT19!_threadstartex
kernel32!BaseThreadInitThunk+0xe
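
The offset heuristic described above can be sketched as a small script. This is only an illustration, not part of any Mozilla or WinDbg tooling: the function name `suspicious_frames`, the regular expression, and the exact 0x10000 cutoff (taken from the "say >10000h" remark) are all assumptions.

```python
import re

# Cutoff taken from the ">10000h" rule of thumb in the comment; an assumption.
SUSPICIOUS_OFFSET = 0x10000

# Matches "module!symbol", an optional "(args)" group, then "+0xOFFSET".
FRAME_RE = re.compile(
    r"(?P<symbol>\w+![^\s+(]+)(?:\([^)]*\))?\+0x(?P<offset>[0-9a-fA-F]+)"
)


def suspicious_frames(stack_text, threshold=SUSPICIOUS_OFFSET):
    """Return (symbol, offset) pairs whose offset exceeds the threshold."""
    flagged = []
    for line in stack_text.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue  # no "+0x..." offset on this frame, nothing to check
        offset = int(match.group("offset"), 16)
        if offset > threshold:
            flagged.append((match.group("symbol"), offset))
    return flagged
```

Fed the thread 5 stack above, this should flag only the xul!nsGenericElement::cycleCollection::Traverse frame; the small offsets (+0xa, +0x3b, +0xe) pass.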

offhand, my guess is that if we're really out of memory, it's possible that the exception handler code won't be able to run successfully and would trigger a double fault and just die. kinda unfortunate. we could chase that side of the problem, i guess, that's actually probably "easier" than trying to figure out what's going wrong to cause the first fault.
I've got some guesses as to what is causing the first fault... it helps that I'm seeing this problem fairly consistently on a number of systems, which share a common set of extensions/app tabs. I had opened another bug for this first fault, but marked it closed when I (and others) thought (wrongly) that Xmarks was to blame and it stopped happening for a day or two. I've since disabled Xmarks, and it started happening again anyway.

On my x64 desktop, since Minefield is compiled /largeaddressaware, it seems to have enough memory available to it to not die before I, frustrated with the horrible performance, kill the process. On my 32-bit laptop and netbook, this crash happens after 4.0b11 (and previous betas) or Minefield sits idle for 6-8 hours. So SOMETHING that is running while it idles must be leaking memory.
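
For anyone wanting to confirm whether a given build really is large-address-aware (which is what lets a 32-bit process use more than 2 GB of address space on 64-bit Windows), the flag lives in the PE header's COFF Characteristics field. A minimal sketch, assuming you have the executable's bytes in hand; `is_large_address_aware` is a made-up helper name, not an existing tool:

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020  # bit in the COFF Characteristics field


def is_large_address_aware(image_bytes):
    """Return True if a PE image's COFF header sets the large-address-aware bit."""
    if image_bytes[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ signature")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image_bytes, 0x3C)
    if image_bytes[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")
    # Characteristics is the last 2-byte field of the 20-byte COFF header,
    # which starts right after the 4-byte PE signature.
    (characteristics,) = struct.unpack_from("<H", image_bytes, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)
```

Reading the first few kilobytes of the .exe (for example with open(path, "rb").read(4096)) is normally enough, since the headers sit at the start of the file.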

Oh, I forgot to mention: on this laptop, there recently were a few crashes where the crash reporter was triggered in addition to the 'silent' crashes. Crash IDs:
bp-163d6e5e-5a84-41d4-bd51-c7da42110216 2/16/2011 1:47 PM
bp-ba0550d8-ce6a-4223-ab99-9fc0c2110212 2/12/2011 10:55 PM
bp-377ed79a-ced4-471c-94e7-6a91b2110211 2/11/2011 7:40 PM
When I tried to pull the crash reports, none of them could be found. Consistent with an out of memory state?

I intend to do some more experimenting, perhaps on another system with a clean profile. Add one app tab, go away for 12 hours, see what happens. My leading suspicion right now, based on... a few little things... is that Facebook's JavaScript is causing memory/CPU leaks over time, and that's leading to our first fault. If I can get the problem to reoccur with just Facebook open, I'll file another bug for that...
There's a crash-stats bug where it fails to display crash reports that were submitted as empty files. I thought it had been fixed at some point, but I guess not. If we OOM on Windows, often the minidump generation code produces a zero-byte file. It's possible that if you OOM badly enough, the whole process would fall down and be unable to even launch the crash reporter, although I've never personally seen that. bug 587729 should fix most of these issues.
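
To check locally whether the zero-byte minidump case applies, one could scan the crash report folder for empty .dmp files. A rough sketch only: `empty_minidumps` is a hypothetical helper, and the directory to pass in (wherever your profile keeps its pending/submitted crash dumps) is an assumption you need to fill in yourself.

```python
from pathlib import Path


def empty_minidumps(crash_dir):
    """Return sorted names of zero-byte .dmp files directly under crash_dir."""
    return sorted(
        p.name
        for p in Path(crash_dir).glob("*.dmp")
        if p.is_file() and p.stat().st_size == 0
    )
```

Any file names it returns would be dumps that were written out empty, which would fit the OOM explanation above.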
Ah, it's bug 607810, which just recently got fixed and should wind up in production soon.
First fault is now bug 635121.
Component: General → Breakpad Integration
Product: Firefox → Toolkit
QA Contact: general → breakpad.integration
vivienm, do you still see this?
Flags: needinfo?(mozilla)
Whiteboard: [closeme 2013-08-15]
I don't think so? 

Then again, after the underlying cause of the crashes was identified (see 635121), I wouldn't have had an opportunity to reproduce this. 

Really, it's a 2.5 year old bug, it should probably be closed...
Flags: needinfo?(mozilla)
Status: UNCONFIRMED → RESOLVED
Closed: 6 years ago
Resolution: --- → INCOMPLETE
Whiteboard: [closeme 2013-08-15]