Closed Bug 1549717 Opened 6 years ago Closed 6 years ago

minidump-stackwalk fails assertion and crashes

Categories

(Socorro :: Processor, defect, P2)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

References

Details

Attachments

(1 file)

+++ This bug was initially created as a clone of Bug #1549201 +++

minidump-stackwalk is segfaulting in the processor container. It creates a core file and the core files are accumulating which leads to a full disk which then leads to processing crash reports incorrectly. The latter part is covered in bug #1549201.

This bug covers figuring out why it's segfaulting. It could be multiple reasons.

Pulling over relevant comments.

comment #3: Gabriele Svelto [:gsvelto] (https://bugzilla.mozilla.org/show_bug.cgi?id=1549201#c3)

Will, if you can provide me with on of the minidumps that are causing minidump-stackwalk to segfault I can have a look.

comment #5: Will Kahn-Greene [:willkg] (https://bugzilla.mozilla.org/show_bug.cgi?id=1549201#c5)

I found a crash report that reproduces the issue and it dumps core in the local dev environment docker container. I uploaded it via google drive with the crash id.

That gives you the core dump we're getting and the mini dumps.

minidump-stackwalk is crashing with both the new breakpad client (4d550cceca107f36c4bc1ea1126b7d32cc50f424) as well as the previous one we had (44384d80b32a5bb361c2ec3bee667f7ccee566d7).

Gabriele: Any help would be appreciated!

Current status is that Gabriele needs more crash reports to look at. Bug #1549540 covers fixing BreakpadStackwalkerRule2015 to make that easier. Making this bug block on that one.

Once I get that done and pushed to prod, I can build a list.

Assignee: nobody → willkg
Status: NEW → ASSIGNED
Type: task → defect
Depends on: 1549540
No longer depends on: 1549201
Summary: minidump-stackwalk is segfaulting → minidump-stackwalk is fails assertion and crashes

Updating this bug with bits that have been bandied about on IRC because otherwise I'll forget where I wrote it down.

I pushed out a fix to BreakpadStackwalkerRule2015 where it now adds "SIGABRT" to the processor notes making it possible for us to craft a supersearch that lists the crash reports that minidump-stackwalk failed assertions on.

John reprocessed all the crash reports with "EMPTY" in the signature from the last week. Now we have all the crash reports in the last week that kicked up a SIGABRT:

https://crash-stats.mozilla.com/search/?processor_notes=~SIGABRT&date=%3E%3D2019-05-01T00%3A00%3A00.000Z&date=%3C2019-05-08T23%3A59%3A00.000Z&_facets=signature&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports

I ran one of them through minidump-stackwalk and got this output:

blahblahblah
2019-05-08 16:38:04: basic_code_modules.cc:110: INFO: No module at 0x0
2019-05-08 16:38:04: minidump_processor.cc:326: INFO: Processed ./crashdata_mdsw_tmp/v1/dump/5c1c41de-9868-4b65-884c-55a1a0190508
2019-05-08 16:38:04: minidump.cc:5876: INFO: GetStream: type 16 not present
2019-05-08 16:38:04: minidump.cc:5876: INFO: GetStream: type 15 not present
terminate called after throwing an instance of 'std::out_of_range'
  what():  stoi
timeout: the monitored command dumped core
./scripts/run_mdsw.sh: line 46:    14 Aborted                 timeout -s KILL 600 "${STACKWALKER}" --raw-json $RAWCRASHFILE --symbols-url "https://s3-us-west-2.amazonaws.com/org.mozilla.crash-stats.symbols-public/v1" --symbols-cache /tmp/symbols/cache --symbols-tmp /tmp/symbols/tmp ${DATADIR}/v1/dump/$1

Grabriele said this:

willkg: that seems like it's failing to parse an integer, so it's not asserting per-se but it's the C++ runtime that's throwing an exception

I opened up bug #1550043 to improve building and debugging minidump-stackwalk in a Docker container and improve the docs. I'm working on that today and tomorrow. That'll make all of this a lot easier.

I threw together a rough minidump-stackwalk build/debug environment in bug #1550043 and landed that. Rough docs are here:

https://socorro.readthedocs.io/en/latest/stackwalk.html#getting-a-build-debug-shell-for-minidump-stackwalk

With that, I did some poking around. It's dying on this line:

https://github.com/mozilla-services/socorro/blob/be1ca53e40c1360f2dd7ff9cb42c59d7d86d80d4/minidump-stackwalk/stackwalker.cc#L893

I added some logging and the value there is "0xffffffff" in the case where it's failing.

I think that's a long (my C++ days are 20 years ago), so it's failing in stoi. We could add handling for this case by adding a stol call if stoi fails. We could just drop any non-int data. I don't know what values make sense for ["system_info"]["cpu_microcode_version"].

Gabriele: What makes sense to do here? Handle longs as well or junk the data figuring it's unhelpful?

Flags: needinfo?(gsvelto)

I think the right solution here would be to use stoul() since the firmware field is a 32-bit unsigned integer. The exception is being thrown because stoi() expects a signed 32-bit integer and 0xffffffff falls out of the range.

Flags: needinfo?(gsvelto)

I changed the stoi to stoul and then ran that through 400 crash reports to see if there are other SIGABRT cases. There's one other:

stackwalker: stackwalker.cc:570: bool {anonymous}::ConvertStackToJSON(const google_breakpad::ProcessState&, const google_breakpad::CallStack*, Json::Value&, int, bool): Assertion `!frame->module->code_file().empty()' failed.

It's this line:

https://github.com/mozilla-services/socorro/blob/be1ca53e40c1360f2dd7ff9cb42c59d7d86d80d4/minidump-stackwalk/stackwalker.cc#L570

Here's a crash report that causes minidump-stackwalk to fail with the assertion error:

bp-2618428f-934f-42b2-9707-9899a0190502

I think I'm going to land the fix for the stoi -> stoul now, then look into this new one next week.

(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #6)

I changed the stoi to stoul and then ran that through 400 crash reports to see if there are other SIGABRT cases. There's one other:

stackwalker: stackwalker.cc:570: bool {anonymous}::ConvertStackToJSON(const google_breakpad::ProcessState&, const google_breakpad::CallStack*, Json::Value&, int, bool): Assertion `!frame->module->code_file().empty()' failed.

It's this line:

https://github.com/mozilla-services/socorro/blob/be1ca53e40c1360f2dd7ff9cb42c59d7d86d80d4/minidump-stackwalk/stackwalker.cc#L570

Here's a crash report that causes minidump-stackwalk to fail with the assertion error:

bp-2618428f-934f-42b2-9707-9899a0190502

That's an easy one: the code_file can be empty on Windows. I believe this happens on Windows' equivalent of Linux anonymous mappings. Removing the assertion should fix the issue unless there's something downstream that relies on each module having a name.

Bah--I had a collision and didn't realize it.

Gabriele: Does this look like what you were suggesting?

https://github.com/mozilla-services/socorro/pull/4936/files#diff-1b37274ea66591122c5f6d8a539bc189R570

I processed a couple of crash reports and it seems fine with the other processor rules and looks fine in the webapp.

Flags: needinfo?(gsvelto)

Yes, looking good. For reference the minidump-analyzer in mozilla-central generates similar JSON but more compact because we remove all redundant data, including the name:

https://searchfox.org/mozilla-central/rev/b9da45f63cb567244933c77b2c7e827a057d3f9b/toolkit/crashreporter/minidump-analyzer/minidump-analyzer.cpp#152-177

minidump_stackwalk on the other hand would just print out the empty string in this case:

https://searchfox.org/mozilla-central/rev/b9da45f63cb567244933c77b2c7e827a057d3f9b/toolkit/crashreporter/google-breakpad/src/processor/stackwalk_common.cc#269

Flags: needinfo?(gsvelto)

I had checked stackwalk_common.cc but hadn't checked the minidump-analyzer code. That's really helpful!

I should look at switching from our minidump-stackwalk to minidump-analyzer. I think I have it in my list of things to do, but there's no bug for it. I'll create one now.

Thank you!

Bah! I really thought we had one, but I didn't think it'd be outside of the Socorro product. I'll dupe my new one to that.

Oh wait! That's a different problem. That's to replace the other other stackwalker--not the one that Socorro uses. So many stackwalkers.

Oh you're right, that's the third one! Or fourth?

I landed this last week and after landing it, there were no more -6 exit codes on stage.

I just pushed this to prod in 377.

This fixes two different sigabrt issues. If there are others, we'll open up new bugs. Marking this one as FIXED.

Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Summary: minidump-stackwalk is fails assertion and crashes → minidump-stackwalk fails assertion and crashes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: