Closed
Bug 517010
Opened 16 years ago
Closed 16 years ago
save a minidump per unique topcrash signature in order to investigate
Categories
(Socorro :: General, task, P2)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: ted, Unassigned)
References
Details
(Keywords: topcrash, Whiteboard: [notacrash][crashkill][crashkill-metrics])
We often get crashes that are really hard to figure out, and the stack trace isn't enough info, or doesn't look accurate. If we could preserve one or more minidumps for crashes in the topcrash list, and make them available to developers, they could get more info out of the minidump than is currently available, since the minidump contains the raw stack contents and register state.
Comment 2•16 years ago
This is mainly for Windows where you can load a minidump directly into a debugger, right? As far as I know there aren't any tools to do much with minidump files on Linux/Mac.
Comment 3•16 years ago
Also, the minidump will need to be protected so that only trusted people may access it, since it may contain private information.
| Reporter | ||
Comment 4•16 years ago
Breakpad has a "md2core" util for Linux nowadays:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/tools/linux/md2core/minidump-2-core.cc
I bet that could be adapted for Mac. But yes, we should only allow this for authorized users.
Where do I sign up?!
Comment 6•16 years ago
This would be very useful for figuring out what's going on with the crashes here:
http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.5.3&query_search=signature&query_type=exact&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=%400x0
All three of the top crashes don't have great stacks and this could really help us figure out what's going on here.
| Reporter | ||
Comment 7•16 years ago
(In reply to comment #2)
> This is mainly for Windows where you can load a minidump directly into a
> debugger, right? As far as I know there aren't any tools to do much with
> minidump files on Linux/Mac.
To reply again, in the worst case you could run Breakpad's "minidump_dump" utility, which will show you the raw contents of each entry in the minidump. This way you could get the raw stack contents + starting registers for each thread and unwind by hand.
Comment 8•16 years ago
Thinking about implementation of this:
Since the top crashes report is technically about things that happened in the past, it's too late to save a minidump. So we'd have to wait for a new crash that has the same characteristics as a crash that appears in the top crashes by signature report.
I'm imagining a cron job that runs daily and looks at the top crashes by signature table. For the top N signatures, it saves the product, version, os and signature in a "bounty" table. Meanwhile, the processors crank through crashes, but before they stamp a crash as completed, they cross check to "see if the sheriff has posted a bounty" for that crash. If so, they tag the crash and mark the entry in the bounty table as "apprehended" (probably by just saving the uuid). Some time later, the monitor, acting in its normal capacity in file system cleanup (the undertaker), saves (buries) tagged crashes in a special file system location (boot hill) for later analysis (exhumation).
Sorry about the metaphor, I've lately been reflecting on my tendencies to be a cowboy programmer.
How critical is the need for this feature? Should we drop everything and get something working for next week or mark it as a 09Q4 or 10Q1 goal?
There is a manual alternative that could be done in a really critical case. Monitor can be configured to save completed minidumps rather than delete them. Perhaps we could just save them for a day, and then using the UI, drill down until we can find a crash that is a top crasher from within the time period that we saved minidumps. Then make an IT request to fetch that uuid from the filesystem (I'd imagine that IT would hate this).
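The bounty idea above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not Socorro code: `BountyKey`, `BountyTable`, `post`, `claim` and the `max_per_signature` limit are all hypothetical names, and the real version would live in database tables rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BountyKey:
    """Hypothetical key: the four columns the cron job would save."""
    product: str
    version: str
    os_name: str
    signature: str


@dataclass
class BountyTable:
    """In-memory stand-in for the proposed "bounty" table (an assumption)."""
    max_per_signature: int = 1
    captured: dict = field(default_factory=dict)  # BountyKey -> list of uuids

    def post(self, top_signatures):
        """Daily job: post a bounty for each of the top N signatures."""
        for key in top_signatures:
            self.captured.setdefault(key, [])

    def claim(self, key, uuid):
        """Processor hook: True means tag this crash so its minidump is kept."""
        uuids = self.captured.get(key)
        if uuids is None or len(uuids) >= self.max_per_signature:
            return False  # no bounty posted, or enough dumps already saved
        uuids.append(uuid)  # mark the bounty as "apprehended"
        return True
```

A processor would call `claim()` just before stamping a crash as completed, and the monitor would skip deletion for any crash that was tagged.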
| Reporter | ||
Comment 9•16 years ago
(In reply to comment #8)
> Since the top crashes report is technically about things that happened in the
> past, it's too late to save a minidump. So we'd have to wait for a new crash
> that has the same characteristics as a crash that appears in the top crashes by
> signature report.
This is what I was thinking. Generally if a crash shows up on the topcrash list, we don't have a shortage of it, so it should work fine to grab a minidump from a later instance of it. Ideally we'd save a few minidumps per topcrash signature, as there may be a few distinct crashes per signature.
Comment 10•16 years ago
(In reply to comment #8)
> How critical is the need for this feature? Should drop everything and get
> something working for next week or mark it as a 09Q4 or 10Q1 goal?
Talking to johnath and beltzner, it sounds like fixing these crashes is pretty important and is one of the engineering goals for q4. So I think this at least needs to be done for early 09q4.
Updated•16 years ago
Flags: blocking1.9.2+
Priority: -- → P2
Summary: save a minidump per unique topcrash signature → save a minidump per unique topcrash signature [@ @0x0]
Comment 11•16 years ago
Also, is there something temporary we can do to get at least one of the minidumps for a crash at 0x0?
Comment 12•16 years ago
Perhaps this could be done more simply with a formalization of my "manual" alternative in Comment #8. As a policy, perhaps we should save the minidumps for all processed crashes for a configurable time period (24hrs?). Then offer an interface to download any minidump from that file system. That would use more filesystem storage, but could probably be implemented much faster than my other Comment #8 thoughts.
Offering access to that file system could be problematic. Since the minidumps and associated json files are raw, they contain "sensitive" information that could not be offered to the general public.
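A minimal sketch of the "keep everything for a configurable period" policy, assuming the dumps sit in a single directory as files ending in `.dump` and that file modification time is a good enough proxy for when the crash was processed. The function name, the `.dump` suffix and the flat directory layout are assumptions for illustration, not how the monitor actually works.

```python
import os
import time


def expire_minidumps(dump_dir, max_age_seconds=24 * 60 * 60, now=None):
    """Delete minidumps older than the retention window; return their names."""
    now = time.time() if now is None else now
    removed = []
    for entry in os.scandir(dump_dir):
        # Assumed naming convention: only touch regular files ending in .dump
        if not entry.is_file() or not entry.name.endswith(".dump"):
            continue
        if now - entry.stat().st_mtime > max_age_seconds:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Run periodically, this keeps storage bounded by roughly one retention window of crash volume. It does nothing about access control, which would still have to be handled separately.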
Comment 13•16 years ago
We're going to try to get some of these minidumps temporarily in bug 518927.
Comment 14•16 years ago
Yes, saving minidumps for 24 hours is likely to be good enough... even if we limit access to people with MoCo LDAP auth or inside one of the VPNs that's probably good enough for now.
| Reporter | ||
Comment 15•16 years ago
I filed this bug about the more general solution, since we're likely to hit crashes that are difficult to investigate again in the future. Let's not lose sight of that longer term goal.
Summary: save a minidump per unique topcrash signature [@ @0x0] → save a minidump per unique topcrash signature
Comment 16•16 years ago
An IT request has been submitted (Bug 519219) to save all raw minidump files for the next 24 hours. Once that is done and IT has figured out a way to get developers access to that file system, it will be possible to find the minidump if you know the UUID for the crash.
I'll post more on how to do this once IT responds to the save request.
Updated•16 years ago
Summary: save a minidump per unique topcrash signature → save a minidump per unique topcrash signature in order to investigate [@ 0x0]
Updated•16 years ago
Summary: save a minidump per unique topcrash signature in order to investigate [@ 0x0] → save a minidump per unique topcrash signature in order to investigate [@ @0x0]
Comment 17•16 years ago
Now that we have minidumps available, I tried to find one matching the 0x0 signature but I haven't had any luck yet.
Comment 18•16 years ago
I've found one: d6eddf4e-a71e-4efe-b796-924112090928
Comment 19•16 years ago
Now that we have a minidump to work with I'm going to track the work on the 0x0 crashes in bug 519616.
Summary: save a minidump per unique topcrash signature in order to investigate [@ @0x0] → save a minidump per unique topcrash signature in order to investigate
Updated•16 years ago
Whiteboard: [notacrash]
Comment 20•16 years ago
Sadly, I think we can ship without this. Or maybe I think we can't hold our ship schedule for this fix. Either way, blocking-, but very-very-very wanted+
Flags: wanted1.9.2+
Flags: blocking1.9.2-
Flags: blocking1.9.2+
Whiteboard: [notacrash] → [notacrash][crashkill]
| Reporter | ||
Comment 21•16 years ago
I actually think the fix in bug 523650 is probably sufficient. Developers have access to recent minidumps, and I haven't heard of any issues getting access to them lately.
Updated•16 years ago
Whiteboard: [notacrash][crashkill] → [notacrash][crashkill][crashkill-metrics]
| Reporter | ||
Comment 22•16 years ago
bug 466022 is actually even more awesome than this bug ever hoped to be.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → WONTFIX
| Assignee | ||
Updated•13 years ago
Component: Socorro → General
Product: Webtools → Socorro