Bug 1744323 Comment 1 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

# Methdology

I wrote a script that fetches the processed crash from stage and prod. I wrote another script that diffs the `json_dump` portion of the processed crash.

I looked at 100 crash reports for Firefox and 20 crash reports for Fenix. They were whatever supersearch returned--I didn't spend time thinking about categories of crash reports (versions, channels, platforms, etc).

# Analysis

Differences between rust minidump-stackwalk and breakpad stackwalker:

1. order of modules and unloaded modules

   breakpad stackwalker reorders modules and unloaded modules by address.

   rust minidump-stackwalk emits module and unloaded module information as it
   appears in the minidump stream. Further, can be duplicates in the lists.

   The duplicates thing is covered in issue 332:
   https://github.com/luser/rust-minidump/issues/332

   This affects the "modules", "unloaded_modules", and "main_module" fields.
   Socorro lists modules and unloaded_modules in the report view intereface.
   
   We don't need to fix this before we switch.

2. thread_index vs. threads_index

   This is my fault for not noticing ages ago when Aria produced the schema.

   This is tracked in issue 331: https://github.com/luser/rust-minidump/issues/331

   This should get fixed before we switch.
   
3. null vs nothing

   There are a bunch of fields that breakpad stackwalker doesn't emit at all if
   there's no value, whereas rust minidump-stackwalk will _always_ emit the
   field even if it's null.

   * mac_crash_info
   * lsb_release
   * crash_info.assertion
   * system_info.cpu_microcode_version
   * crashing_thread.last_error_value
   * crashing_thread.thread_name
   * crashing_thread.frames.N.line
   * crashing_thread.frames.N.file
   * crashing_thread.frames.N.module
   * crashing_thread.frames.N.module_offset
   * crashing_thread.frames.N.function_offset
   * crashing_thread.frames.N.function
   * threads.N.last_error_value
   * threads.N.thread_name
   * threads.N.frames.N.file
   * threads.N.frames.N.function
   * threads.N.frames.N.function_offset
   * threads.N.frames.N.line
   * threads.N.frames.N.module
   * threads.N.frames.N.module_offset
   * modules.N.cert_subject
   * modules.N.symbol_url
   * unloaded_modules.N.cert_subject

   This is expected. I wrote up some bugs in Socorro to fix the webapp to handle
   the new null values correctly.

4. false vs. nothing

   Similar to null vs. nothing, there are a couple of fields where breakpad
   stackwalker emits nothing, but rust minidump-stackwalk emits false.
   
   * modules.N.loaded_symbols
   * modules.N.corrupt_symbols
   
   This is expected. I wrote up some bugs in Socorro to fix the webapp to handle
   the new null values correctly.

5. ignored fields

   There were a set of fields that I ignored when comparing output.

   breakpad and minidump-stackwalk have different values for this:
   
   * stackwalk_version
  
   minidump-stackwalk doesn't output these because we decided not to support
   these fields:
   
   * tiny_block_size
   * largest_free_vm_block
   * write_combine_size
  
   minidump-stackwalk changed this to frame_count to be more self-consistent:

   * crashing_thread.total_frames
  
   These fields are ephemeral and change between stackwalker runs:

   * modules.N.symbol_disk_cache_hit
   * modules.N.symbol_fetch_time
   * modules.N.missing_symbols
   * threads.N.frames.N.missing_symbols
   * crashing_thread.frames.N.missing_symbols
  
   This differs between minidump-stackwalk and stackwalker, but that's fine:

   * modules.N.version

   This is fine--there's nothing we need to do here.
   
6. stackwalking differences

   rust minidump-stackwalk and breakpad stackwalker do some different things
   when unwinding the stack.

   Issue 330 covers one instance:
   https://github.com/luser/rust-minidump/issues/330
   
   In that, rust minidump-stackwalk and breakpad stackwalker do different
   things in a situation where the stackwalker needs to guess. Jeff said he
   prefers what rust minidump-stackwalk does.
   
   Differences in stackwalking output can affect crash signatures, but
   otherwise stackwalking is and always has been a "best effort" thing.

   I'm sure there will be cases where stackwalking is different and rust
   minidump-stackwalk is "worse".

   We don't have any known differences that need to be fixed at this time. We
   can fix those cases as they come up.

# Summary

This should get fixed:

https://github.com/luser/rust-minidump/issues/331

I don't think this _needs_ to get fixed because it doesn't affect how Crash Stats works or what it displays, but it's probably good to figure out at some point:

https://github.com/luser/rust-minidump/issues/330

rust minidump-stackwalk is behind a feature flag, so we can switch back and forth between the stackwalkers. Further, because we're now capturing which stackwalker processed the minidump and that information is searchable, we can identify crash reports we want to reprocess. I think this makes moving forward pretty low risk.

Once issue 331 gets fixed, I don't think there are any outstanding output-related issues blocking moving forward.
# Methodology

I wrote a script that fetches the processed crash from stage and prod. I wrote another script that diffs the `json_dump` portion of the processed crash.

I looked at 100 crash reports for Firefox and 20 crash reports for Fenix. They were whatever supersearch returned--I didn't spend time thinking about categories of crash reports (versions, channels, platforms, etc).

# Analysis

Differences between rust minidump-stackwalk and breakpad stackwalker:

1. order of modules and unloaded modules

   breakpad stackwalker reorders modules and unloaded modules by address.

   rust minidump-stackwalk emits module and unloaded module information as it appears in the minidump stream. Further, can be duplicates in the lists.

   The duplicates thing is covered in issue 332: https://github.com/luser/rust-minidump/issues/332

   This affects the "modules", "unloaded_modules", and "main_module" fields. Socorro lists modules and unloaded_modules in the report view intereface.
   
   We don't need to fix this before we switch.

2. thread_index vs. threads_index

   This is my fault for not noticing ages ago when Aria produced the schema.

   This is tracked in issue 331: https://github.com/luser/rust-minidump/issues/331

   This should get fixed before we switch.
   
3. null vs nothing

   There are a bunch of fields that breakpad stackwalker doesn't emit at all if there's no value, whereas rust minidump-stackwalk will _always_ emit the field even if it's null.

   * mac_crash_info
   * lsb_release
   * crash_info.assertion
   * system_info.cpu_microcode_version
   * crashing_thread.last_error_value
   * crashing_thread.thread_name
   * crashing_thread.frames.N.line
   * crashing_thread.frames.N.file
   * crashing_thread.frames.N.module
   * crashing_thread.frames.N.module_offset
   * crashing_thread.frames.N.function_offset
   * crashing_thread.frames.N.function
   * threads.N.last_error_value
   * threads.N.thread_name
   * threads.N.frames.N.file
   * threads.N.frames.N.function
   * threads.N.frames.N.function_offset
   * threads.N.frames.N.line
   * threads.N.frames.N.module
   * threads.N.frames.N.module_offset
   * modules.N.cert_subject
   * modules.N.symbol_url
   * unloaded_modules.N.cert_subject

   This is expected. I wrote up some bugs in Socorro to fix the webapp to handle the new null values correctly.

4. false vs. nothing

   Similar to null vs. nothing, there are a couple of fields where breakpad stackwalker emits nothing, but rust minidump-stackwalk emits false.
   
   * modules.N.loaded_symbols
   * modules.N.corrupt_symbols
   
   This is expected. I wrote up some bugs in Socorro to fix the webapp to handle the new null values correctly.

5. ignored fields

   There were a set of fields that I ignored when comparing output.

   breakpad and minidump-stackwalk have different values for this:
   
   * stackwalk_version
  
   minidump-stackwalk doesn't output these because we decided not to support these fields:
   
   * tiny_block_size
   * largest_free_vm_block
   * write_combine_size
  
   minidump-stackwalk changed this to frame_count to be more self-consistent:

   * crashing_thread.total_frames
  
   These fields are ephemeral and change between stackwalker runs:

   * modules.N.symbol_disk_cache_hit
   * modules.N.symbol_fetch_time
   * modules.N.missing_symbols
   * threads.N.frames.N.missing_symbols
   * crashing_thread.frames.N.missing_symbols
  
   This differs between minidump-stackwalk and stackwalker, but that's fine:

   * modules.N.version

   This is fine--there's nothing we need to do here.
   
6. stackwalking differences

   rust minidump-stackwalk and breakpad stackwalker do some different things when unwinding the stack.

   Issue 330 covers one instance: https://github.com/luser/rust-minidump/issues/330
   
   In that, rust minidump-stackwalk and breakpad stackwalker do different things in a situation where the stackwalker needs to guess. Jeff said he prefers what rust minidump-stackwalk does.
   
   Differences in stackwalking output can affect crash signatures, but otherwise stackwalking is and always has been a "best effort" thing.

   I'm sure there will be cases where stackwalking is different and rust minidump-stackwalk is "worse".

   We don't have any known differences that need to be fixed at this time. We can fix those cases as they come up.

# Summary

This should get fixed:

https://github.com/luser/rust-minidump/issues/331

I don't think this _needs_ to get fixed because it doesn't affect how Crash Stats works or what it displays, but it's probably good to figure out at some point:

https://github.com/luser/rust-minidump/issues/330

rust minidump-stackwalk is behind a feature flag, so we can switch back and forth between the stackwalkers. Further, because we're now capturing which stackwalker processed the minidump and that information is searchable, we can identify crash reports we want to reprocess. I think this makes moving forward pretty low risk.

Once issue 331 gets fixed, I don't think there are any outstanding output-related issues blocking moving forward.

Back to Bug 1744323 Comment 1