Closed Bug 610947 Opened 14 years ago Closed 14 years ago

Lots of Camino crash reports have "corrupt" crashing thread 0

Categories

(Socorro :: General, task)

x86
macOS
task
Not set
major

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: alqahira, Unassigned)

Details

I started noticing this on a few crashes after the release of Camino 2.0.5 a couple of weeks ago, and at the time I thought it was just a particular crash and perhaps an anomaly with some of those reports.

Over time, it's gotten progressively worse :(

My next thought was maybe the 2.0.5 symbols had gotten corrupted somehow, but they looked OK, and now I've also been able to find current/recent crashes in older Camino versions like 2.0.4 and 2.0.3 that are now displaying this anomaly.

So my guess is that something changed/broke on the Socorro end a few weeks ago that started injecting bogus frames into many, but not all, crash reports (maybe one or more processors are going crazy?).

At this point I'm not sure what reports I can trust to be accurate (particularly for crashes I've not seen before) and which might be partly or wholly bogus.

Typically in thread 0 below "main" there should be two Camino@hex frames (these are start and _start in a Mac OS X crash report) and an optional "garbage" @0x1-type frame that has appeared in some crashes but not all since the dawn of Camino-Socorro interaction.

Instead, often crashes now have a frame from something related to our transient bar class, or some other wrong method, below those "end-of-thread" frames:

https://crash-stats.mozilla.com/report/index/16e3c7e9-0ca1-4f94-b0d6-70de82101105 (Nov 5 crash for 2.0.5; frame 40 doesn't belong)

https://crash-stats.mozilla.com/report/index/72419456-cfb4-4583-ba85-720ac2101109 (Nov 9 crash for 2.0.5; frame 28 is wrong)

https://crash-stats.mozilla.com/report/index/e3e80eac-d8d4-4dad-a30a-912192101109 (Nov 9 crash for 2.0.5; frame 37 is wrong)

https://crash-stats.mozilla.com/report/index/b0e39032-e028-496b-9c55-837052101101 (Nov 1 crash for 2.0.5; neither frame 31 nor 32 belongs)

https://crash-stats.mozilla.com/report/index/6ed59f47-e8fd-4ca6-9f0a-63d152101026 (Oct 26 crash for 2.0.4; neither frame 39 nor 40 belongs, but I also don't trust thread 0 there in general)

https://crash-stats.mozilla.com/report/index/ff57da58-feb7-44fb-bb75-d13c92101023 (Oct 23 crash for 2.0.3[!]; frames 31-36 look like they've been appended from several other places)
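
The expectation described above (below "main", two unsymbolized Camino@hex frames, then at most one raw @0x1-type frame) can be written down as a rough check. This is only a minimal sketch: the function name `suspicious_tail`, the exact signature strings, and the sample frames are assumptions made for illustration, not Socorro's actual output format or API.

```python
import re
from typing import List

# Assumed signature shapes, based on the "Camino@hex" and "@0x1-type"
# frames described in this report. Real signatures may differ.
UNSYMBOLIZED = re.compile(r"^Camino@0x[0-9a-f]+$", re.IGNORECASE)
RAW_ADDRESS = re.compile(r"^@0x[0-9a-f]+$", re.IGNORECASE)


def suspicious_tail(frames: List[str]) -> List[int]:
    """Return indices of frames below "main" that do not fit the expected tail.

    Expected tail: two unsymbolized Camino@hex frames (start and _start),
    optionally followed by a single raw @0x... frame. Anything else is flagged.
    """
    if "main" not in frames:
        return []
    # Use the last "main" in the list, i.e. the one nearest the end of the stack.
    main_index = len(frames) - 1 - frames[::-1].index("main")
    flagged = []
    for offset, signature in enumerate(frames[main_index + 1:]):
        if offset < 2:
            ok = bool(UNSYMBOLIZED.match(signature))
        elif offset == 2:
            ok = bool(RAW_ADDRESS.match(signature))
        else:
            ok = False
        if not ok:
            flagged.append(main_index + 1 + offset)
    return flagged


# Hypothetical thread 0 with one extra frame appended after the usual tail.
example = ["SomeCrashingFunction", "main", "Camino@0x2a3f", "Camino@0x2966",
           "@0x1", "-[HypotheticalBar show]"]
print(suspicious_tail(example))
```

A check like this only says that a frame is out of place at the end of the thread; it says nothing about whether the frames above "main" can be trusted.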

Not sure where this problem might be and who might be able to look, but CCing Ted and Lars with the hope they can at least point the right people at this.
This is probably just fallout from bug 601312 ( http://breakpad.appspot.com/215001/show ). I made it so the stackwalker will try even harder to scan for a return address, so sometimes it's going to produce false positives. If it's only ever occurring at the end of the stack it seems pretty harmless. (Aside from mild confusion, it's not like it's producing incorrect signatures or anything, right?)
(In reply to comment #1)
> This is probably just fallout from bug 601312 (
> http://breakpad.appspot.com/215001/show ). I made it so the stackwalker will
> try even harder to scan for a return address, so sometimes it's going to
> produce false positives.

There shouldn't even be frames there, though.  Where is it inventing those frames from?

I can't see real raw dumps, only the processed ones in the "Raw Dump" tab, so I can't tell for sure. Until now, though, we've never had anything below the two unresymbolized frames that represent start and _start, aside from the one raw hex frame in some crashes, and now we're getting one or more new frames below that.

For comment 1's explanation to be true, we would have had to be getting crashes with several garbage hex frames below "start", and Socorro would have had to be hiding those garbage frames in the web UI, right?  Otherwise, where is the stackwalker finding information that it's now attempting to convert into frames in thread 0?

> If it's only ever occurring at the end of the stack it
> seems pretty harmless. (Aside from mild confusion, it's not like it's producing
> incorrect signatures or anything, right?)

As I mentioned in comment 0, I don't trust the thread 0 in https://crash-stats.mozilla.com/report/index/6ed59f47-e8fd-4ca6-9f0a-63d152101026 at all.  I haven't had a lot of time lately to go through reports extensively (and find other things that seemed awry), so I'm not sure if there are places where there are incorrect signatures (if there are, it doesn't seem common, at least).
Can you tie this down to a timeline?  We haven't made a lot of changes - since September we've only done two minor updates, and these barely touched processing.  

How do you know those frames are bogus?  

Would getting access to the "true" raw dumps help?
(In reply to comment #2)
> (In reply to comment #1)
> > This is probably just fallout from bug 601312 (
> > http://breakpad.appspot.com/215001/show ). I made it so the stackwalker will
> > try even harder to scan for a return address, so sometimes it's going to
> > produce false positives.
> 
> There shouldn't even be frames there, though.  Where is it inventing those
> frames from?

It's just finding data on the stack that happens to be the address of a function, and deciding that that's the return address.

> For comment 1's explanation to be true, we'd have had to have been getting
> crashes with several garbage hex frames below "start", and Socorro would have
> had to have been not showing those garbage frames at all in the web UI, right? 

No, the stack would have ended because the stackwalker would have given up.

> Otherwise, where is the stackwalker finding information that it's now
> attempting to convert into frames in thread 0?

In the raw stack memory, which isn't displayed anywhere in Socorro. Socorro only gets the output of the stackwalker.
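
To make the explanation in this comment concrete, here is a minimal Python sketch of the general idea of scanning stack memory for a plausible return address. It is not Breakpad's implementation: the function names, the `max_words` limit, the code ranges, and the sample stack contents are all assumptions chosen for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical description of loaded code: (module name, start, end) ranges.
CodeRange = Tuple[str, int, int]


def find_code_range(address: int, ranges: List[CodeRange]) -> Optional[str]:
    """Return the module name if the address falls inside known code."""
    for name, start, end in ranges:
        if start <= address < end:
            return name
    return None


def scan_for_return_address(
    stack_words: List[int],
    start_index: int,
    ranges: List[CodeRange],
    max_words: int,
) -> Optional[Tuple[int, int, str]]:
    """Scan up to max_words stack words for one that points into known code.

    Returns (index, word, module) for the first match, or None when the
    scan gives up. A match only means the value looks like a code address;
    it may be stale data rather than a real return address.
    """
    stop = min(len(stack_words), start_index + max_words)
    for i in range(start_index, stop):
        module = find_code_range(stack_words[i], ranges)
        if module is not None:
            return i, stack_words[i], module
    return None


# Made-up stack: a leftover code pointer sits a few words past the last frame.
ranges = [("Camino", 0x1000, 0x9000)]
stack = [0x0, 0x1, 0xBFFF0000, 0x0, 0x2345, 0x0]

print(scan_for_return_address(stack, 0, ranges, max_words=2))  # gives up
print(scan_for_return_address(stack, 0, ranges, max_words=8))  # finds 0x2345
```

With the small window the scan gives up and the stack simply ends; with the larger window the same memory yields an extra frame. That matches the behaviour discussed here: a more persistent scan can turn leftover data at the end of the stack into frames that were never shown before.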
(In reply to comment #3)
> Can you tie this down to a timeline?  We haven't made a lot of changes - since
> September we've only done two minor updates, and these barely touched
> processing.

I first noticed it on the 26th, when I did my weekly pre-meeting crash report (the previous sweep would have been on the 19th, and I didn't notice anything awry then).

> How do you know those frames are bogus?  

We can't have a very specific browser UI feature method being called before "start" ;)

> Would getting access to the "true" raw dumps help?

I don't know what version of the dump gets stored, so I don't know.

However, the timeline and Ted's explanations make bug 601312/bug 605798/comment 1 seem like the likely source of these extra frames.
I don't think this is a particularly bad problem. At most it's mildly confusing, but it should never give a crash an incorrect signature or anything bad like that.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
Component: Socorro → General
Product: Webtools → Socorro