When we crash because we're out of memory, we can't reliably create a backtrace because the backtracer library can't allocate any memory. This results in us creating an empty minidump, which we then don't send to Breakpad, so it's lost forever. To give us some idea of how bad our memory issues are, we should always submit these empty reports, and Breakpad should record as much information as it has about the machine (OS, build id, etc).
We should really get this done for Beta 5, especially considering that we now trap exceptions on 64-bit OSes, many of which can be out of memory.
If we write out an empty minidump (which is currently the case in these conditions), we do try to submit it; the submission just fails because the Windows sender code doesn't cope with zero-byte files (filed as bug 427446). However, Socorro currently relies on the minidump to provide the OS information, so the resulting crash report would not be very useful. If you want to add more information, we probably shouldn't do so in the crashed process, since it's in a really bad state if we fail like this. We could make the crash reporter client figure this out and add some information, I suppose. Socorro would need to be modified to know to use this additional information instead of relying on the minidump.
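A rough sketch of what "make the crash reporter client add some information" could look like, assuming the client already holds the submission fields as key/value pairs. The function name and the field names (`EmptyMinidump`, `ClientOSName`, `ClientOSVersion`) are made up for illustration; they are not fields Socorro is known to read.

```cpp
#include <cstddef>
#include <map>
#include <string>

using Annotations = std::map<std::string, std::string>;

// Hypothetical helper: when the minidump is empty, attach OS details gathered
// by the (healthy) crash reporter client, not by the crashed process.
Annotations AddFallbackInfo(Annotations fields,
                            std::size_t minidump_bytes,
                            const std::string& os_name,
                            const std::string& os_version) {
  if (minidump_bytes > 0) {
    return fields;  // A real dump already carries the OS information.
  }
  fields["EmptyMinidump"] = "1";  // Assumed marker for the server side.
  // emplace() keeps any value that is already present under these keys.
  fields.emplace("ClientOSName", os_name);
  fields.emplace("ClientOSVersion", os_version);
  return fields;
}
```

The server would still need a matching change to prefer these fields when the dump is empty.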
Presuming we can't get it done for b5 now
I don't think this is worthwhile blocking Firefox 4 on (or fixing at all, actually). We get empty crash reports on OS X/Linux, and the only thing they tell us is "we have problems". This isn't going to help us actually fix the problems. The main impetus for this bug was the D2D memory leak, which was fixed in bug 589809, so I don't think we're any worse off now than we were in previous versions. The right fix for all of this is bug 587729, which is probably too much effort/too risky to get into FF4 now, but I'd like to try to get it done for 4.next.
How much memory are we talking here? Would it be feasible to preallocate some or all of it at startup, so we can guarantee success (unless we OOM on startup!) at crash time?
Alternatively, we could fire the XPCOM memory pressure topic, and cross our fingers that it gives us back enough. (And if not, go make more services aware of it.)
No, we are not going to enter XPCOM from within a crash handler. AFAICT, the problem here is that we don't know exactly who is allocating the memory, so a reserve is very difficult to manage properly.
Yeah, never mind my comment about XPCOM. That didn't exactly make sense. :/
The Linux and OS X dump generators are careful not to allocate memory. On Windows, we call DbgHelp!MiniDumpWriteDump, which is apparently not as careful. I have no idea what kind of memory it allocates or how much, though.
We could just preallocate 1 or 2 MB, then free it before writing minidumps.
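For illustration, a minimal sketch of the preallocate-then-free idea. `EmergencyReserve` is a hypothetical helper, not anything in Breakpad, and the sketch assumes the reserve comes from the same heap the dump writer would allocate from, which nobody in this bug has confirmed.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: grabs a block at startup and gives it back right
// before the minidump is written.
class EmergencyReserve {
 public:
  explicit EmergencyReserve(std::size_t bytes)
      : block_(std::malloc(bytes)), size_(block_ ? bytes : 0) {
    // Write to the block so its pages are touched now, not at crash time.
    if (block_) std::memset(block_, 0xA5, bytes);
  }
  ~EmergencyReserve() { Release(); }
  EmergencyReserve(const EmergencyReserve&) = delete;
  EmergencyReserve& operator=(const EmergencyReserve&) = delete;

  // Frees the reserve. Returns true only if this call released memory,
  // so calling it twice is harmless.
  bool Release() {
    if (!block_) return false;
    std::free(block_);
    block_ = nullptr;
    size_ = 0;
    return true;
  }

  bool IsHeld() const { return block_ != nullptr; }
  std::size_t size() const { return size_; }

 private:
  void* block_;
  std::size_t size_;
};
```

The exception handler would call `Release()` on a process-wide instance just before writing the dump. Two caveats: calling `free()` from a crash handler is itself risky if the heap is corrupted or its lock is held, and memory returned to one heap is not necessarily available to an allocation made through a different one.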
I'm moving this off beta6, which Joe agrees with. I'd sort of like to not block at all - but Joe's gonna work on articulating why "knowing we crashed without any data about where or why" matters enough to justify that block.
That is certainly news to me, but I can do that! Right now we have absolutely no information about crashes in which we fail to generate a crash report. We don't know whether we hit OOM conditions (presuming that is the only time we generate empty crash reports) all the time, some of the time, or almost never. We also don't know whether any changes we have made make it worse or better. This is a sorry state of affairs, and we shouldn't have to deal with it!
Indeed -- the reason is that with the data that we get right now, we might see, let's say, a 2% daily crash rate out of 1000 users. But the /real/ rate could be like 10%. We'd behave very differently if that was the case, including doing things like prioritizing actually collecting more data here (e.g. out-of-process minidump writing and similar). But right now, whatever that delta is, it's just entirely lost; we never see it.
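To put numbers on that example (the 2% and 10% figures are the illustrative ones above, not measurements), a small sketch with hypothetical names:

```cpp
// Hypothetical illustration of the gap between reported and actual crashes.
struct CrashCounts {
  long reported;  // crashes we receive a report for
  long actual;    // crashes that really happened
  long unseen;    // crashes that never reach us
};

// Percentages are whole numbers (2 means 2%) to keep the arithmetic exact.
CrashCounts CountUnseenCrashes(long users, long reported_pct, long actual_pct) {
  long reported = users * reported_pct / 100;
  long actual = users * actual_pct / 100;
  return {reported, actual, actual - reported};
}
```

With 1000 users, a 2% reported rate and a 10% real rate, that is 20 reported crashes against 100 actual ones, so 80 crashes per day would go unseen.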
Okay. I can fix this (I believe the only actual bug here is that the submission code fails to handle zero byte minidumps on Windows), but I really think it's not going to be as useful as you think it is. (Take a look at all the (null signature) crashes we have on OS X, and the lack of progress there.)
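I haven't shown the actual failure in the sender here, but one common way this kind of bug arises is treating "read zero bytes" the same as "read failed". A minimal sketch of a reader that keeps the two apart, with a hypothetical helper name and standard C++ streams in place of the Windows APIs the real sender uses:

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: reads a whole file into *contents.
// Returns false only if the file cannot be opened or the read fails.
// A zero-byte file is a success that yields an empty string.
bool ReadFileAllowEmpty(const std::string& path, std::string* contents) {
  std::ifstream in(path, std::ios::binary);
  if (!in.is_open()) {
    return false;
  }
  contents->assign(std::istreambuf_iterator<char>(in),
                   std::istreambuf_iterator<char>());
  return !in.bad();
}
```

The caller can then upload the empty body as-is and let the server decide what to do with it.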
While I understand comment 14, I do think that comment 13 wins the day. We need to know the overall crashiness, as it will be a metric on which we base our release readiness.
Patch up for review upstream: http://breakpad.appspot.com/243001
Patch landed upstream, will land in m-c shortly: http://code.google.com/p/google-breakpad/source/detail?r=743
Pushed to m-c: http://hg.mozilla.org/mozilla-central/rev/68529f865a6e There's still a Socorro bug that makes it impossible to view individual reports from zero-byte minidumps (bug 607810), but they ought to show up in topcrash reports.