Closed Bug 527095 Opened 14 years ago Closed 8 months ago

Breakpad produces zero-byte or malformed dumps for some crashes [@ EMPTY: no crashing thread identified; corrupt dump]

Categories

(Toolkit :: Crash Reporting, defect)

1.9.1 Branch
x86
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 1360392
Tracking Status
blocking2.0 --- -

People

(Reporter: cbook, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Steps to reproduce:

Tried to generate crash report for the testcase from bug 526500 -> http://crash-stats.mozilla.com/report/index/e0d9138b-e18f-4857-b848-d33632091106 but failed to process:

/home/processor/stackwalk/bin/stackwalk.sh returned no header lines for reportid: 70229147; No thread was identified as the cause of the crash; No signature could be created because we do not know which thread crashed; /home/processor/stackwalk/bin/stackwalk.sh returned no frame lines for reportid: 70229147; /home/processor/stackwalk/bin/stackwalk.sh failed with return code 1 when processing dump e0d9138b-e18f-4857-b848-d33632091106
What platform was this on?
XP 3.5.5 Release Build
Ok. Odds are we're getting a malformed minidump here. On Windows we simply call the MiniDumpWriteDump library function, so I don't think there's anything we can do until we get out-of-process minidump writing, which probably won't happen until e10s. Can you reproduce the crash, but before submitting the crash report, go to %APPDATA%/Mozilla/Firefox/Crash Reports/pending, grab the .dmp file and email it to me? I can verify that this is the problem.
Tomcat says he's getting an empty dump. Since we're using an OS API to generate the dump on Windows, there isn't anything we can fix here, short of switching to out-of-process dump generation, which should be a lot safer. You're welcome to keep this bug open, but it's not going to be very useful. I've seen other cases similar to this, and I think OOP dump generation is the only way we're going to fix them all.
Depends on: 423745
Summary: Breakpad fails to process crash report - no header lines for reportid: 70229147 → Breakpad produces zero-byte or malformed dumps for some crashes
xref bug 427446.
Also, my zero-byte crash dump, which is in \pending, doesn't show up in about:crashes.
I can sort of understand why, as it's a useless dump, but it's a little confusing to know I crashed and not see the dump there.
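For anyone triaging files in the pending directory, a quick sanity check can separate zero-byte dumps from truncated or otherwise malformed ones. The following is a minimal Python sketch, not part of Breakpad or the crash-stats processor. It only checks the file length, the "MDMP" signature at the start of the MINIDUMP_HEADER, and that the stream directory fits inside the file. The names classify_dump and classify_file and the three labels are hypothetical.

```python
import struct

# MINIDUMP_HEADER is 32 bytes: Signature, Version, NumberOfStreams,
# StreamDirectoryRva, CheckSum, TimeDateStamp (all uint32), then Flags (uint64).
HEADER = struct.Struct("<IIIIIIQ")
MDMP_SIGNATURE = 0x504D444D  # the bytes "MDMP" read as a little-endian uint32
DIRECTORY_ENTRY_SIZE = 12    # MINIDUMP_DIRECTORY: StreamType, DataSize, Rva


def classify_dump(data: bytes) -> str:
    """Return 'empty', 'malformed', or 'plausible' for raw .dmp contents."""
    if len(data) == 0:
        return "empty"
    if len(data) < HEADER.size:
        return "malformed"
    signature, _version, n_streams, dir_rva, *_rest = HEADER.unpack_from(data)
    if signature != MDMP_SIGNATURE:
        return "malformed"
    # The stream directory must exist and lie entirely inside the file.
    if n_streams == 0 or dir_rva + n_streams * DIRECTORY_ENTRY_SIZE > len(data):
        return "malformed"
    return "plausible"


def classify_file(path: str) -> str:
    """Convenience wrapper: read a .dmp file from disk and classify it."""
    with open(path, "rb") as f:
        return classify_dump(f.read())
```

A result of "plausible" only means the header looks sane; it says nothing about whether a crashing thread can be identified, which still requires running the dump through the stackwalker.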
This seems to happen reliably when we run out of heap space. It makes debugging crashes caused by low/no memory difficult because we cannot identify or count the locations in the code that are causing the crash.
Yes, it's a known problem. The only real solution is going to be to switch to doing all minidump generation in a separate process (bug 587729).
Depends on: 587729
See also bug 615798, which might be another symptom of the problem(s) underlying our Windows Breakpad client.

This bug isn't specific enough on its own to be a blocker, but I'll nom this to track what was discussed in the OOM meeting today.

To summarize comments above in the context of other discussion today,
 - yes, we know how to pretty reliably generate minidumps from signal handlers in the face of heap corruption, address space exhaustion, etc.
 - breakpad's linux impl does just that
 - on windows, breakpad uses the OS interface MiniDumpWriteDump(), which we obviously don't control
 - based on empirical evidence, MiniDumpWriteDump() is not particularly reliable when memory is in a bad state

To fix this, options include
 - something like bug 615798 comment 6
 - bug 587729 comment 0
 - write our own minidump generator for windows, following the principles of the linux one, but substituting analogous windows debugging APIs

Listed in increasing order of time+difficulty, in my approximate estimation (also in decreasing order of fragility).  Ted could comment more authoritatively.
blocking2.0: --- → ?
I think out of process minidump generation is the best (and least fragile) solution, and I had planned to implement it anyway, since it should fix this entire class of problems. I don't think writing our own minidump generator for Windows is a good use of time, since it's probably more effort than implementing OOP dump generation (we already have a bunch of code to support that, at least, and we're doing it for OOPP already).

Allocating a chunk of memory and freeing it before the exception handler tries to write a dump might be a decent stopgap solution to ship with Firefox 4. I think OOP crash reporting in general is going to be an invasive change, and I'd be worried about trying to wedge it in at this point.
Reserve-memory would help in OOM cases, but not in heap-corruption cases.
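The shape of the reserve-memory stopgap can be sketched in a few lines. This is an illustration only, written in Python rather than in the C++ of the real exception handler, and the names MemoryReserve, handle_crash and write_dump are hypothetical. A real implementation would allocate and free the block with the platform allocator, and the right reserve size is unknown at this point.

```python
class MemoryReserve:
    """Hold a block of memory that can be released just before dump writing."""

    def __init__(self, size_bytes):
        # Allocate up front, at startup, while memory is still available.
        self._block = bytearray(size_bytes)

    @property
    def held(self):
        """Number of bytes currently held in reserve."""
        return len(self._block) if self._block is not None else 0

    def release(self):
        # Drop the only reference so the allocator can reuse the space.
        self._block = None


def handle_crash(reserve, write_dump):
    """Free the reserve first, then call the (hypothetical) dump writer."""
    reserve.release()
    return write_dump()
```

The ordering is the whole point: the reserve is released before the dump writer runs, so whatever the writer needs to allocate has some headroom even when the process has otherwise run out of memory.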
Based on my personal experiences with OOM crashes, I think bugs 427446 and 493779 are more important than this one. 427446 because "failed to submit report" after a crash makes for a poor user experience (especially if the user restores their session, and it then crashes and fails to submit again). 493779 because the stack traces likely won't do much good without the context of an OOM condition (who here would pick this out as an OOM crash? http://crash-stats.mozilla.com/report/index/bp-566759fe-da7a-43ac-8b66-a565b2101114  I know it is one because I submitted two dumps for it among my OOM crashes)


That said, I'll volunteer to run some tests to help estimate how much reserve memory would be needed. I'd use windbg to gather some memory details for each OOM crash, and then see if the minidump succeeds. Would the tail of '!address -summary' provide enough info? Sample output:
-------------------- Type SUMMARY --------------------------
    TotSize (      KB)   Pct(Tots)  Usage
   4f5c7000 ( 1300252) : 41.33%   : <free>
    5805000 (   90132) : 02.87%   : MEM_IMAGE
    5eb2000 (   96968) : 03.08%   : MEM_MAPPED
   65372000 ( 1658312) : 52.72%   : MEM_PRIVATE

-------------------- State SUMMARY --------------------------
    TotSize (      KB)   Pct(Tots)  Usage
   2a654000 (  694608) : 22.08%   : MEM_COMMIT
   4f5c7000 ( 1300252) : 41.33%   : MEM_FREE
   463d5000 ( 1150804) : 36.58%   : MEM_RESERVE

Largest free region: Base 36b6e000 - Size 12522000 (300168 KB)
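To compare many such summaries against whether the minidump succeeded, the numbers can be pulled out mechanically. Below is a rough Python sketch that assumes the output keeps exactly the layout of the sample above; parse_summary is a hypothetical helper, not a windbg or Breakpad tool.

```python
import re

# Matches rows such as "   4f5c7000 ( 1300252) : 41.33%   : <free>"
ROW = re.compile(
    r"^\s*([0-9a-fA-F]+)\s*\(\s*(\d+)\)\s*:\s*([\d.]+)%\s*:\s*(\S+)\s*$"
)
# Matches "Largest free region: Base 36b6e000 - Size 12522000 (300168 KB)"
LARGEST = re.compile(
    r"Largest free region: Base ([0-9a-fA-F]+) - Size ([0-9a-fA-F]+)"
)


def parse_summary(text):
    """Return ({usage: size_in_bytes}, largest_free_region_in_bytes or None)."""
    sizes, largest = {}, None
    for line in text.splitlines():
        row = ROW.match(line)
        if row:
            # The first column is the size in hex bytes; the KB column is
            # the same value divided by 1024, so only one needs storing.
            sizes[row.group(4)] = int(row.group(1), 16)
            continue
        found = LARGEST.search(line)
        if found:
            largest = int(found.group(2), 16)
    return sizes, largest
```

The largest free region is probably the more interesting figure for sizing a reserve, since a single contiguous allocation can fail even when the total free address space is large.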
bug 427446 is going to be fixed for Firefox 4, but I don't think it's going to help much, since that will just let us submit empty dumps that aren't going to give us any info about the crash, just the fact that we crashed.

It would be good to figure out if we can do a memory reserve to work around this problem for Firefox 4. I think we should probably write an OOM test extension using js-ctypes so that we can reliably and predictably OOM, and then see if we can get that case to work first, and then we can test to see if it fixes the real-world cases you're seeing.
Also, OOM crashes in third-party code are always going to be a pain. We can probably provide some extra info if we fix bug 493779, but I don't know how much impact it's going to have.
What kind of percentage of crashes are we going to see this with? Need to get an idea of the impact before we can evaluate if this needs to block
It's essentially impossible to say right now. Fixing bug 589955 should give us that info.
blocking2.0: ? → -
Blocks: 610551
(In reply to Dave Townsend (:Mossop) from comment #16)
> What kind of percentage of crashes are we going to see this with? Need to
> get an idea of the impact before we can evaluate if this needs to block

6.54% of v11.0 crashes - EMPTY: no crashing thread identified; corrupt dump
https://crash-stats.mozilla.com/report/list?range_value=7&range_unit=days&date=2012-04-02&signature=EMPTY%3A%20no%20crashing%20thread%20identified%3B%20corrupt%20dump&version=Firefox%3A11.0
Summary: Breakpad produces zero-byte or malformed dumps for some crashes → Breakpad produces zero-byte or malformed dumps for some crashes [@ EMPTY: no crashing thread identified; corrupt dump]
Blocks: 507876
Lots more analysis happening in bug 837835.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → DUPLICATE

(In reply to (not currently active) Ted Mielczarek from comment #20)
> Lots more analysis happening in bug 837835.

That bug got closed incomplete, so reopening this one.

Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
See Also: → tb-NoCrashReport

Closing in favor of bug 1360392 where we're tracking these crashes. We already know what issues are responsible for most of the empty minidumps and we'll progressively fix them in the oxidized writers. Several known issues leading to empty minidumps are already tracked in the minidump-writer crate issue tracker on GitHub.

Status: REOPENED → RESOLVED
Closed: 9 years ago → 8 months ago
Resolution: --- → DUPLICATE