Closed Bug 1744323 Opened 3 years ago Closed 3 years ago

analyze output difference between breakpad stackwalker and rust minidump-stackwalk

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

References

Details

Attachments

(2 files)

compare.sh 3 years ago Will Kahn-Greene [:willkg] ET needinfo? me 1.37 KB, application/x-shellscript		Details
dumpdiff.py 3 years ago Will Kahn-Greene [:willkg] ET needinfo? me 7.65 KB, text/x-python		Details

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Description

•

3 years ago

We've done some ad hoc analysis comparing the output of the two stackwalkers, but it's prudent to formalize that and produce a report.

This bug covers that.

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 1

•

3 years ago

•

Edited

Methodology

I wrote a script that fetches the processed crash from stage and prod. I wrote another script that diffs the json_dump portion of the processed crash.

I looked at 100 crash reports for Firefox and 20 crash reports for Fenix. They were whatever supersearch returned--I didn't spend time thinking about categories of crash reports (versions, channels, platforms, etc).

Analysis

Differences between rust minidump-stackwalk and breakpad stackwalker:

order of modules and unloaded modules

breakpad stackwalker reorders modules and unloaded modules by address.

rust minidump-stackwalk emits module and unloaded module information as it appears in the minidump stream. Further, can be duplicates in the lists.

The duplicates thing is covered in issue 332: https://github.com/luser/rust-minidump/issues/332

This affects the "modules", "unloaded_modules", and "main_module" fields. Socorro lists modules and unloaded_modules in the report view intereface.

We don't need to fix this before we switch.
thread_index vs. threads_index

This is my fault for not noticing ages ago when Aria produced the schema.

This is tracked in issue 331: https://github.com/luser/rust-minidump/issues/331

This should get fixed before we switch.
null vs nothing

There are a bunch of fields that breakpad stackwalker doesn't emit at all if there's no value, whereas rust minidump-stackwalk will always emit the field even if it's null.
- mac_crash_info
- lsb_release
- crash_info.assertion
- system_info.cpu_microcode_version
- crashing_thread.last_error_value
- crashing_thread.thread_name
- crashing_thread.frames.N.line
- crashing_thread.frames.N.file
- crashing_thread.frames.N.module
- crashing_thread.frames.N.module_offset
- crashing_thread.frames.N.function_offset
- crashing_thread.frames.N.function
- threads.N.last_error_value
- threads.N.thread_name
- threads.N.frames.N.file
- threads.N.frames.N.function
- threads.N.frames.N.function_offset
- threads.N.frames.N.line
- threads.N.frames.N.module
- threads.N.frames.N.module_offset
- modules.N.cert_subject
- modules.N.symbol_url
- unloaded_modules.N.cert_subject
This is expected. I wrote up some bugs in Socorro to fix the webapp to handle the new null values correctly.
false vs. nothing

Similar to null vs. nothing, there are a couple of fields where breakpad stackwalker emits nothing, but rust minidump-stackwalk emits false.
- modules.N.loaded_symbols
- modules.N.corrupt_symbols
This is expected. I wrote up some bugs in Socorro to fix the webapp to handle the new null values correctly.
ignored fields

There were a set of fields that I ignored when comparing output.

breakpad and minidump-stackwalk have different values for this:
- stackwalk_version
minidump-stackwalk doesn't output these because we decided not to support these fields:
- tiny_block_size
- largest_free_vm_block
- write_combine_size
minidump-stackwalk changed this to frame_count to be more self-consistent:
- crashing_thread.total_frames
These fields are ephemeral and change between stackwalker runs:
- modules.N.symbol_disk_cache_hit
- modules.N.symbol_fetch_time
- modules.N.missing_symbols
- threads.N.frames.N.missing_symbols
- crashing_thread.frames.N.missing_symbols
This differs between minidump-stackwalk and stackwalker, but that's fine:
- modules.N.version
This is fine--there's nothing we need to do here.
stackwalking differences

rust minidump-stackwalk and breakpad stackwalker do some different things when unwinding the stack.

Issue 330 covers one instance: https://github.com/luser/rust-minidump/issues/330

In that, rust minidump-stackwalk and breakpad stackwalker do different things in a situation where the stackwalker needs to guess. Jeff said he prefers what rust minidump-stackwalk does.

Differences in stackwalking output can affect crash signatures, but otherwise stackwalking is and always has been a "best effort" thing.

I'm sure there will be cases where stackwalking is different and rust minidump-stackwalk is "worse".

We don't have any known differences that need to be fixed at this time. We can fix those cases as they come up.

Summary

This should get fixed:

https://github.com/luser/rust-minidump/issues/331

I don't think this needs to get fixed because it doesn't affect how Crash Stats works or what it displays, but it's probably good to figure out at some point:

https://github.com/luser/rust-minidump/issues/330

rust minidump-stackwalk is behind a feature flag, so we can switch back and forth between the stackwalkers. Further, because we're now capturing which stackwalker processed the minidump and that information is searchable, we can identify crash reports we want to reprocess. I think this makes moving forward pretty low risk.

Once issue 331 gets fixed, I don't think there are any outstanding output-related issues blocking moving forward.

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 2

•

3 years ago

Attached file compare.sh — Details

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 3

•

3 years ago

Attached file dumpdiff.py — Details

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 4

•

3 years ago

Oops--issue 331 is fixed now. I just need to update Socorro.

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 5

•

3 years ago

I landed all that.

Then over the weekend, I thought about looking at signature generation changes. We had a series of things we knew would affect signatures already--mostly differences in stackwalking. However, I didn't look at changes stemming from other differences between the two stackwalkers.

I grabbed a few hundred crash ids from stage from today (which uses rust-minidump minidump-stackwalk) and then compared the signatures between stage (rust-minidump minidump-stackwalk) and prod (breakpad stackwalker) and discovered some exciting things!

Because rust-minidump minidump-stackwalk pads addresses with 0, crash signatures generated from rust-minidump minidump-stackwalk output has 0-padded addresses when there aren't symbols whereas the crash signatures generated from breakpad stackwalk won't.

Example:
```
app@socorro:/app$ python signature_compare.py 7688ccc3-3ff1-4773-ad94-e25020211213
Crash id: 7688ccc3-3ff1-4773-ad94-e25020211213
stage:    'libxul.so@0x0000000005ca281e | libxul.so@0x0000000004ffe0ac'
prod:     'libxul.so@0x5ca281e | libxul.so@0x4ffe0ac'
```
I wrote this up as bug #1745816.
rust-minidump minidump-stackwalk spits errors out to stderr and doesn't populate json_dump["status"]. breakpad stackwalk puts error codes in json_dump["status"]. The processor puts json_dump["status"] value in mdsw_status_string and signature generation looks at that value for error signatures and appends the error code to the end.

Example:
```
app@socorro:/app$ python signature_compare.py 086b7059-2f0a-41b5-8f31-fc2870211213
Crash id: 086b7059-2f0a-41b5-8f31-fc2870211213
stage:    'OOM | large | EMPTY: no crashing thread identified'
prod:     'OOM | large | EMPTY: no crashing thread identified; ERROR_NO_MINIDUMP_HEADER'
```
I wrote this up as bug #1745663.

Aria and I fixed both issues today. After landing fixes, I don't see any other new issues that aren't from differences in stackwalking.

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 6

•

3 years ago

The new stackwalker is in production now. Any new issues that come up will be addressed in new bugs. Marking this as FIXED.

Status: ASSIGNED → RESOLVED

Closed: 3 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.

Bugzilla

analyze output difference between breakpad stackwalker and rust minidump-stackwalk

Categories

(Socorro :: Processor, task, P2)

Tracking

(Not tracked)

People

(Reporter: willkg, Assigned: willkg)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(2 files)

Description

Comment 1

Methodology

Analysis

Summary

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Attachment

General

Description

File Name

Content Type