Closed Bug 601312 Opened 14 years ago Closed 14 years ago

Useless crash reports on OS 10.6/x86-64 when crashing in method jit [@ JaegerShot ]

Categories

(Toolkit :: Crash Reporting, defect)

x86_64
macOS
defect
Not set
normal

Tracking


RESOLVED FIXED
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: bzbarsky, Assigned: ted)

References

Details

I crashed twice today and once yesterday. The reports are:

http://crash-stats.mozilla.com/report/index/bp-e4c18a23-f693-4979-a8bd-1c9f82101001
http://crash-stats.mozilla.com/report/index/bp-38135e53-55d3-4351-a560-ea4f72101001
http://crash-stats.mozilla.com/report/index/bp-c4ea3590-86a5-4b2e-aba8-5234f2100930

All three have no stack, and say "No proper signature could be created because no good data for the crashing thread (0) was found" in the "processor notes". A few more crashes like that were mentioned on IRC today. I just queried the crash CSV for 2010-09-30, and of the 48 64-bit Mac crashes it has, 35 look like this. The other 13 look like plausible crashes.
I've had this problem, too. (I mentioned them on IRC. :P) Crashed three times today:

http://crash-stats.mozilla.com/report/index/76903d35-8b5b-4766-981b-511182101001
http://crash-stats.mozilla.com/report/index/bp-7e54763b-2937-4133-9999-c32682101001
http://crash-stats.mozilla.com/report/index/bp-cbd899fd-0bac-4000-9127-ea1bb2101001

But I also had a similar crash on Sept. 27th:

http://crash-stats.mozilla.com/report/index/bp-312b6423-63e8-486a-8a0e-819eb2100927

I had another crash on the 27th that created a signature just fine:

http://crash-stats.mozilla.com/report/index/bp-6a4f63c4-db70-4e59-af3c-425392100927

Note that in between those two crashes on the 27th, the build ID changed. (I must have updated.) The good signature occurred in 20100926031017 and the bad signature showed up in 20100926215019. I believe those build IDs coincide with the 64-bit switch, as bz says, but I offer the hard evidence here, just in case it ever comes into question. :)
See bug 600412 comment 2 for the rationale. I've already fixed this upstream in Breakpad, it just needs to get rolled out to our production server (bug 601114).
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE
I'm reopening this. I just crashed three times in a row in a span of a few minutes, and every single one of the crashes had this problem. Note that these were in reasonably current nightlies:

http://crash-stats.mozilla.com/report/index/bp-c44b0682-d1d2-42aa-9769-2275b2101012
http://crash-stats.mozilla.com/report/index/bp-ba57e2e0-b1aa-4730-a913-f893d2101012
http://crash-stats.mozilla.com/report/index/bp-5407c8bc-d471-49c9-b46f-2c4d22101012

I'm running the nightly under a debugger now, since I can't rely on Breakpad, but it hasn't crashed yet....
Status: RESOLVED → REOPENED
blocking2.0: --- → ?
Resolution: DUPLICATE → ---
Er, that third one has a stack now (didn't when I filed). And I caught the most recent crash (http://crash-stats.mozilla.com/report/index/bp-3e67135b-9038-47e3-87d3-f516d2101012) in gdb, and _that_ couldn't go up the stack either, past saying that I was called from JaegerShot. So maybe the real issue here is that mjit is just really screwing up. But it'd be nice if breakpad got the JaegerShot part....
I agree. What do the stack bytes look like? The Breakpad stackwalker will only scan back 15 words of stack memory while looking for a return address: http://code.google.com/p/google-breakpad/source/browse/trunk/src/google_breakpad/processor/stackwalker.h#120
Summary: Lots of useless crash reports on OS 10.6 after 64-bit switch → Useless crash reports on OS 10.6/x86-64 when crashing in method jit [@ JaegerShot ]
Status: REOPENED → NEW
> What do the stack bytes look like?

How do I get that information?
In gdb, type "info frame" to find the stack address (Stack level X, frame at <address>:) then x/15xg <address> will show you all the stack memory that Breakpad's stackwalker would scan (in 8-byte words). If you don't see the address of the caller frame in there, then that's why Breakpad can't find it.
Thanks. I'll see if I can get this to happen in gdb again....
Looks like |x/15xg <address> - 15*8| is the right thing (the stack grows down). And <address> should be $rsp, which doesn't necessarily match what "info frame" says. Within those constraints, it looks like the return address was 18 words from $rsp.
Okay, so the simpler fix is probably just to bump that constant from 15 to 20 or so. However, if methodjit's stack frame grows by 2 or 3 words, that means we'd be back in this situation. I'm open to other ideas on how to improve Breakpad/JIT integration. We could, for example, have it provide a frame pointer on x86-64 and teach the stack walker how to look for valid frame pointers and follow them.
I think we should do a client stackwalk and save the eh-pointer memory areas on x86-64. I really don't think that blocks, though.
blocking2.0: ? → -
Well, the only question is how many such crashes we have now and whether we're properly collating them to get an idea of how crashy mjit is.... and whether we need to spend time de-crashifying it.
bsmedberg: will that help here? Does methodjit write out eh_frame data? (I honestly have no idea.)
Probably not, but the EH frame pointer should be pointing at the compiled caller and we'll pick that up.
The constant regulating how far Breakpad tries to scan should be set so that it's a good 2x or 3x the size of the typical stack frame. Increase that puppy, I say.
I'm going to bump this constant up, but I also talked to Jim about a possible API for letting our JITs inform Breakpad about JIT code pages and functions that call into them so it can have more useful data instead of fumbling around on the stack.
Assignee: nobody → ted.mielczarek
I filed bug 604725 on the API proposal.
(In reply to comment #11)
> I think we should do a client stackwalk and save the eh-pointer memory areas
> on x86-64. I really don't think that blocks, though.

Renominating. AIUI, this bug means that we're going to see a bunch of meaningless crash reports, doesn't it?
blocking2.0: - → ?
Depends on: 605798
I landed an upstream fix that should fix a lot of these cases (by doubling the size of the stack area we search, as suggested by jimb):

http://code.google.com/p/google-breakpad/source/detail?r=715

I just filed bug 605798 to get the production Socorro copy updated.
Ok, this is live in staging, and it seems to do the trick for some of these crashes, anyway. Compare:

http://crash-stats.stage.mozilla.com/report/index/8dc72891-cb8d-49c9-a456-0d41f2101020

which is the same exact dump as:

https://crash-stats.mozilla.com/report/index/1aeae141-c5d0-4a74-b4be-c17a72101020

resubmitted to production with the updated minidump_stackwalk. Is that a sane stack? I don't know, it looks kind of crazy to me, but I don't really know anything about Jaeger. It does eventually wind its way down to the event loop, so I guess it's not completely off the rails.
> Is that a sane stack?

For something like gmail or a site using jquery or prototype, yes. That long call() cascade is what your typical API entry point's implementation looks like for "modern" js libraries...
Should be fixed in production now.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
blocking2.0: ? → betaN+