Add support for CFI-based stack-walking to the minidump-analyzer tool

VERIFIED FIXED in Firefox 58

Status

()

defect
P1
normal
VERIFIED FIXED
2 years ago
a year ago

People

(Reporter: gsvelto, Assigned: ccorcoran)

Tracking

(Blocks 3 bugs)

unspecified
mozilla59
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr52 wontfix, firefox57 wontfix, firefox58 fixed, firefox59 fixed)

Details

Attachments

(4 attachments)

Reporter

Description

2 years ago
+++ This bug was initially created as a clone of Bug #1280477 +++

The current implementation of the minidump-analyzer (bug 1280477) can walk stacks only by following frame-pointers or by using stack-scanning. This means that the stack traces we currently get are less accurate than they could be, especially on Windows 64-bit where we don't have - and we'll never have IIUC - frame-pointers.

Fortunately this issue can be solved by adding support for CFI-based stack-walking which is already supported in breakpad. What needs to be added is a way for the minidump analyzer to retrieve the CFI information (which we ship in the Win64 build already AFAIK) and feed it into the breakpad code when analyzing a minidump.

Comment 1

2 years ago
My understanding is that even though win64 doesn't have "frame pointers", it does have stackwalk information built into the binaries. So I'm not convinced we need externally-fetched stackwalking: could we get it directly from the binaries instead? That would improve the situation with doing external fetching and dealing with bandwidth/scaling other issues inherent in that.
Flags: needinfo?(gsvelto)
Reporter

Comment 2

2 years ago
(In reply to Benjamin Smedberg [:bsmedberg] from comment #1)
> My understanding is that even though win64 doesn't have "frame pointers", it
> does have stackwalk information built into the binaries. So I'm not
> convinced we need externally-fetched stackwalking: could we get it directly
> from the binaries instead? That would improve the situation with doing
> external fetching and dealing with bandwidth/scaling other issues inherent
> in that.

Yes, I've worded the description poorly: my intention is to retrieve the CFI data directly from the binaries, not from an external source.
Flags: needinfo?(gsvelto)

Comment 3

2 years ago
Ah ok, I misunderstood. Carry on!
FWIW, the code in question is PDBSourceLineWriter::PrintFrameDataUsingEXE:
https://chromium.googlesource.com/breakpad/breakpad/+/master/src/common/windows/pdb_source_line_writer.cc#689

It's not really written in a way to make it easy to use for this purpose, but I think you should be able to dump a .sym file containing STACK CFI lines given a Win64 exe/dll and then feed that to the BasicSourceLineResolver machinery from the stackwalker.

I wrote the code for dumping the unwind tables originally, but it's been a while so I'm a little fuzzy on them.
To clarify: it's not really "CFI" on Windows, it's a set of unwind tables (which are basically the same thing):
https://msdn.microsoft.com/en-us/library/0kd71y96.aspx

On Linux/OS X there is actual CFI available in .eh_frame in the binaries that we should be able to use.
Reporter

Comment 6

2 years ago
Thanks for the pointers Ted!
Assignee

Updated

2 years ago
Assignee: nobody → ccorcoran
Assignee

Updated

2 years ago
Attachment #8861146 - Flags: review?(ted)
Reporter

Comment 8

2 years ago
I've begun reviewing this, I hope it's done by the end of today. I did some testing, causing crashes both with and without this patch and there's a night-and-day difference in the stack traces we get. CFI support reliably brings us all the way to the top frame, including frames within kernel32.dll and ntdll.dll. It's awesome! The sooner we land this the sooner we can rely on crash pings for augmenting Socorro.
Reporter

Comment 9

2 years ago
mozreview-review
Comment on attachment 8861146 [details]
Bug 1333126: use win64 PE unwind metadata to improve client-side stack walking;

https://reviewboard.mozilla.org/r/133096/#review140548

Overall this is looking good, there's a few things I'd like to see addressed though:
* Code formatting should match https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style#Control_Structures, this mostly involves wrapping lines to 80 characters and using the correct bracing and naming. I added some nits below but I didn't mark all the issues because that would be pointless, just make sure you adapt the whole files.
* We should avoid code duplication if possible, using Breakpad's `PDBSourceLineWriter` if possible instead of rolling our own code
* Implementation for the various classes should live in a .cpp file that's conditionally included in the build via the `moz.build` file rather than in the headers

After you've addressed these nits I'll have another look and there's a good chance that it will be ready to land.

::: commit-message-62a2d:1
(Diff revision 1)
> +bug 1333126: use unwind info from Win64 binaries to aide client-side stack walking; r=gsvelto

nit: The comment should be in this format

`Bug 1333126 - Use unwind info from Win64 binaries to aide client-side stack walking; r?gsvelto`

Note the question mark before the reviewers name. It's an important detail because of how mozreview works:

https://mozilla-version-control-tools.readthedocs.io/en/latest/mozreview/commits.html#specifying-reviewers-in-commit-messages

::: toolkit/crashreporter/minidump-analyzer/MinidumpAnalyzerUtils.h:5
(Diff revision 1)
> +/* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +

nit: Leave only one empty line, the same applies to all the other instances below where there's more than one.

::: toolkit/crashreporter/minidump-analyzer/MinidumpAnalyzerUtils.h:16
(Diff revision 1)
> +
> +namespace CrashReporter {
> +
> +
> +template <typename ...Args>
> +inline std::string stringWithFormat(const std::string& format, Args && ...args)

nit: Function arguments are prefixed with an 'a', like `aFormat`, `aArgs`, etc... See https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style#Prefixes for more details. Also, lines should not exceed 80 characters.

::: toolkit/crashreporter/minidump-analyzer/MinidumpAnalyzerUtils.h:20
(Diff revision 1)
> +template <typename ...Args>
> +inline std::string stringWithFormat(const std::string& format, Args && ...args)
> +{
> +  auto size = std::snprintf(nullptr, 0, format.c_str(), std::forward<Args>(args)...);
> +  std::string output(size + 1, '\0');// Ensure the null-terminator
> +  output.resize(size);// Ensure the reported length is correct.

Why are you resizing the string here? And more importantly why are you making it one character shorter than the original allocation in the line above?

::: toolkit/crashreporter/minidump-analyzer/MinidumpAnalyzerUtils.h:109
(Diff revision 1)
> +
> +
> +} // namespace
> +
> +
> +#endif// MinidumpAnalyzerUtils_h

nit: Add a space between the `#endif` directive and the comment

::: toolkit/crashreporter/minidump-analyzer/Win64ModuleUnwindMetadata.h:9
(Diff revision 1)
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +#ifndef Win64ModuleUnwindMetadata_h
> +#define Win64ModuleUnwindMetadata_h
> +
> +#if defined(_WIN64)

You should put the implementation for this file and the symbol supplier into two .cpp files which would be included conditionally via moz.build:

```
if CONFIG['OS_ARCH'] == 'WINNT':
    UNIFIED_SOURCES += [
        'Win64ModuleUnwindMetadata.cpp',
        'Win64ModuleUnwindMetadataSymbolSupplier.cpp',
    ]
```

Then you could remove this directive and include the header file conditionally in `minidump-analyzer.cpp`

::: toolkit/crashreporter/minidump-analyzer/Win64ModuleUnwindMetadata.h:12
(Diff revision 1)
> +#define Win64ModuleUnwindMetadata_h
> +
> +#if defined(_WIN64)
> +
> +
> +typedef unsigned __int8 UBYTE;

Why not use uint8_t directly instead?

::: toolkit/crashreporter/minidump-analyzer/Win64ModuleUnwindMetadata.h:14
(Diff revision 1)
> +#if defined(_WIN64)
> +
> +
> +typedef unsigned __int8 UBYTE;
> +
> +#include <windows.h>

nit: Split the includes between Windows-specific and standard C++ ones then sort them by name where possible.

::: toolkit/crashreporter/minidump-analyzer/Win64ModuleUnwindMetadata.h:82
(Diff revision 1)
> +};
> +
> +
> +
> +inline bool
> +DumpAsSYMCFI(const string& code_file_, std::ostream& output_) {

It seems to me that most of this code comes from Breakpad's `pdb_source_line_writer.cc` file. Would it be possible to use it directly instead of duplicating it here?

::: toolkit/crashreporter/minidump-analyzer/minidump-analyzer.cpp:36
(Diff revision 1)
>  
>  #endif
>  
> +#include "MinidumpAnalyzerUtils.h"
> +
> +#if defined(_WIN64)

Use `#if XP_WIN && HAVE_64BIT_BUILD` instead to check if we're on 64-bit Windows.

::: toolkit/crashreporter/minidump-analyzer/minidump-analyzer.cpp:388
(Diff revision 1)
>    if (!FileExists(dumpPath)) {
>      // The dump file does not exist
>      return 1;
>    }
>  
> +  // Calculate the location for storing generated symbols, or "" to not save symbols to disk.

Better put this into a separate function to make the intent clear.

::: toolkit/crashreporter/minidump-analyzer/minidump-analyzer.cpp:389
(Diff revision 1)
>      // The dump file does not exist
>      return 1;
>    }
>  
> +  // Calculate the location for storing generated symbols, or "" to not save symbols to disk.
> +#if defined(_WIN64)

nit: `#if XP_WIN && HAVE_64BIT_BUILD`

::: toolkit/crashreporter/minidump-analyzer/minidump-analyzer.cpp:391
(Diff revision 1)
>    }
>  
> +  // Calculate the location for storing generated symbols, or "" to not save symbols to disk.
> +#if defined(_WIN64)
> +  string symbolPath = dumpPath;
> +  {

Why the extra scope? Also, braces for control structures use the K&R style:

https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style#Control_Structures

::: toolkit/crashreporter/minidump-analyzer/minidump-analyzer.cpp:416
(Diff revision 1)
> +        symbolPath = "";// symbols dir is not a directory; not saving symbols to disk
> +      }
> +    }
> +  }
> +#else
> +  string symbolPath;

nit: Better to declary `symbolPath` only once outside of the preprocessor directives.

::: toolkit/crashreporter/minidump-analyzer/minidump-analyzer.cpp:418
(Diff revision 1)
> +    }
> +  }
> +#else
> +  string symbolPath;
> +#endif
> +  

nit: Trim all trailing space

::: toolkit/crashreporter/minidump-analyzer/moz.build:32
(Diff revision 1)
>      if CONFIG['OS_TARGET'] == 'Darwin':
>          DIST_SUBDIR = 'crashreporter.app/Contents/MacOS'
>  
> +if CONFIG['OS_TARGET'] == 'WINNT' and CONFIG['CPU_ARCH'] == 'x86_64':
> +    OS_LIBS += [
> +    'Dbghelp',

nit: Both lines should be intended by another 4 characters
Assignee

Updated

2 years ago
Blocks: 1372830
Assignee

Comment 10

2 years ago
Created bug 1372830, pointing out a couple things about the limitations of our current CFI generation.
Assignee

Comment 11

2 years ago
Please let me know if my approach below makes sense; I'm already in progress.

Knowing that we will need incremental improvements for accuracy, here is my current plan as a first releasable version. IOW, let me state the technical compromises I think we can accept for landing this.
> I highlight my proposed actions below like this.

My guiding principle is to guarantee never producing worse stack traces than previously.

This means we need to ensure that any time we don't consider CFI usable, we always fall back to stack scanning as before. So how do we detect when CFI is usable or not? Let me outline some scenarios I have identified that will lead to WORSE stack traces than before, and how we can handle it in the future.

The CFI we produce basically calculates the size of the stack for each function. From that description alone it's clear how some functions are not going to be represented accurately this way.


1) Functions that use alloca() or similar. Or more fundamentally, functions that emit UWOP_SET_FPREG and/or mess with RSP during the function body.

Unwind info + minidump can give us the correct stack size, however
> for V1 I propose that we don't generate CFI for functions that emit UWOP_SET_FPREG, or any other similar unsupported opcode


2) During a function prologue
Unwind info gives instruction-accurate information about tracking RSP through the function prologue. This would only be relevant if a stack frame's return address is within a function prologue. It's not completely rare, but...
> for V1 I propose ignoring function prologue. At the moment we're using a static stack size for the entire duration of the function.
> 
> It may be technically more accurate to use that stack size only AFTER the prologue, but I don't think it matters much either way. All bets are off during prologue.


3) Cases we'll probably never be able to handle:
- Corrupt unwind info that, by pure chance, produces a valid stack trace. We can't really detect this, but I suspect would be incredibly rare, not worth considering.
- Return addresses in the middle of a function epilogue. Unwind info doesn't contain anything about epilogue. Likely more rare than prologue calls though.
Reporter

Comment 12

2 years ago
Comment on attachment 8861146 [details]
Bug 1333126: use win64 PE unwind metadata to improve client-side stack walking;

As per our discussion on IRC, we might consider a slightly different approach so clearing the review flag for now.
Attachment #8861146 - Flags: review?(gsvelto)
Assignee

Comment 13

2 years ago
I want to propose a set of tests to start from, and collect feedback/opinions on the approach/depth. Of course during review we can dig deeper but for now I just want to get a feel.

First, an update about implementation. Instead of generating .sym files, I now parse unwind info directly into the final CFIFrameInfo data that Breakpoint uses. Processing minidumps down from 16sec to about 1sec, solving the performance issue, and removing any dependency on file system.

So there's a bit less to test now. Here's what I propose:

- Correctly measure stack size for all supported unwind codes and variants
  This amounts to about 8 individual tests at the moment. Maybe I should consolidate them?

- Correct handling of unknown or missing unwind codes
  This is currently a single test. Unknown opcode results in skipping CFI generation which causes stack scan.

- Correct handling of corrupt DLL. I don't know how detailed we want to get with this; we check failure in many places and handle it all the same way: by skipping CFI generation for that DLL. I'm inclined to just add a command line flag to minidump-analyzer telling it instead of loading xul.dll, load not_a_dll.txt instead. Is that sufficient?

- Correct handling of infinite chains in unwind info. There is a feature of unwind metadata where it can point to other data in memory to save space because many functions have the exact same unwind signature. This could in theory cause infinite loops. Very unlikely we'd ever see this but it's probably worth adding the support for it and corresponding test for sanity. For this I'd probably check in a manually hex-edited DLL that I use during dev.
Flags: needinfo?(gsvelto)
Reporter

Comment 14

2 years ago
(In reply to Carl Corcoran [:ccorcoran] from comment #13)
> First, an update about implementation. Instead of generating .sym files, I
> now parse unwind info directly into the final CFIFrameInfo data that
> Breakpoint uses. Processing minidumps down from 16sec to about 1sec, solving
> the performance issue, and removing any dependency on file system.

Awesome!

> So there's a bit less to test now. Here's what I propose:
> 
> - Correctly measure stack size for all supported unwind codes and variants
>   This amounts to about 8 individual tests at the moment. Maybe I should
> consolidate them?

Only if they take a very long time, otherwise having fine-grained testing makes identifying and fixing regressions easier.

> - Correct handling of unknown or missing unwind codes
>   This is currently a single test. Unknown opcode results in skipping CFI
> generation which causes stack scan.

OK.

> - Correct handling of corrupt DLL. I don't know how detailed we want to get
> with this; we check failure in many places and handle it all the same way:
> by skipping CFI generation for that DLL. I'm inclined to just add a command
> line flag to minidump-analyzer telling it instead of loading xul.dll, load
> not_a_dll.txt instead. Is that sufficient?

Yes, we just want to make sure that the error path works, this is a situation which is not going to be common so it's fine to check that we have basic error handling that works.

> - Correct handling of infinite chains in unwind info. There is a feature of
> unwind metadata where it can point to other data in memory to save space
> because many functions have the exact same unwind signature. This could in
> theory cause infinite loops. Very unlikely we'd ever see this but it's
> probably worth adding the support for it and corresponding test for sanity.
> For this I'd probably check in a manually hex-edited DLL that I use during
> dev.

Sounds good, this is extremely thorough, thanks for all you're doing in this bug!
Flags: needinfo?(gsvelto)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Reporter

Comment 19

2 years ago
Comment on attachment 8861146 [details]
Bug 1333126: use win64 PE unwind metadata to improve client-side stack walking;

Stealing the reviews from Ted since I'm back in the game.
Attachment #8861146 - Flags: review?(ted) → review?(gsvelto)
Reporter

Updated

2 years ago
Attachment #8882686 - Flags: review?(ted) → review?(gsvelto)
Reporter

Comment 20

2 years ago
mozreview-review
Comment on attachment 8861146 [details]
Bug 1333126: use win64 PE unwind metadata to improve client-side stack walking;

https://reviewboard.mozilla.org/r/133096/#review167694

I went through all the patch and it seems fine to me with the exception of a bunch of coding convention nits; with those fixed this is ready to land IMO.

::: toolkit/crashreporter/minidump-analyzer/MozStackFrameSymbolizer.h:39
(Diff revision 3)
> +  MozStackFrameSymbolizer();
> +
> +  virtual SymbolizerResult FillSourceLineInfo(const CodeModules* modules,
> +		const SystemInfo* system_info,
> +		StackFrame* stack_frame);
> +  

nit: Remove trailing spaces

::: toolkit/crashreporter/minidump-analyzer/MozStackFrameSymbolizer.h:41
(Diff revision 3)
> +  virtual SymbolizerResult FillSourceLineInfo(const CodeModules* modules,
> +		const SystemInfo* system_info,
> +		StackFrame* stack_frame);
> +  
> +  virtual class google_breakpad::CFIFrameInfo* FindCFIFrameInfo(
> +  	const StackFrame* frame);

nit: remove the tab and re-indent

::: toolkit/crashreporter/minidump-analyzer/MozStackFrameSymbolizer.cpp:30
(Diff revision 3)
> +  StackFrameSymbolizer(nullptr, nullptr)
> +{
> +}
> +
> +MozStackFrameSymbolizer::SymbolizerResult
> +MozStackFrameSymbolizer::FillSourceLineInfo(const CodeModules* modules,

nit: Indent the arguments so that they're aligned (if possible without exceeding the 80 lines limit)

::: toolkit/crashreporter/minidump-analyzer/MozStackFrameSymbolizer.cpp:36
(Diff revision 3)
> +                                              const SystemInfo* system_info,
> +                                              StackFrame* stack_frame)
> +{
> +  SymbolizerResult ret = StackFrameSymbolizer::FillSourceLineInfo(
> +    modules, system_info, stack_frame);
> +  

nit: Trim trailing space

::: toolkit/crashreporter/minidump-analyzer/MozStackFrameSymbolizer.cpp:66
(Diff revision 3)
> +  bool moduleHasBeenReplaced = false;
> +  if (gMinidumpAnalyzerOptions.forceUseModule.size() > 0) {
> +    modulePath = gMinidumpAnalyzerOptions.forceUseModule;
> +    moduleHasBeenReplaced = true;
> +  } else {
> +    if (!frame->module)

nit: Add braces

::: toolkit/crashreporter/minidump-analyzer/MozStackFrameSymbolizer.cpp:86
(Diff revision 3)
> +  UnwindCFI cfi;
> +  DWORD offsetAddr;
> +
> +  if (moduleHasBeenReplaced) {
> +    // if we are replacing a module, addresses will never line up.
> +    // so just act like the 1st entry is correct.  

nit: Trime trailing space, also I suspect you wanted the previous line to end with a comma instead of a dot.

::: toolkit/crashreporter/minidump-analyzer/MozStackFrameSymbolizer.cpp:92
(Diff revision 3)
> +    offsetAddr = unwindParser->GetAnyOffsetAddr();
> +  } else {
> +    offsetAddr = frame->instruction - frame->module->base_address();
> +  }
> +
> +  if (!unwindParser->GetCFI(offsetAddr, cfi))

nit: Add braces

::: toolkit/crashreporter/minidump-analyzer/MozStackFrameSymbolizer.cpp:92
(Diff revision 3)
> +    offsetAddr = unwindParser->GetAnyOffsetAddr();
> +  } else {
> +    offsetAddr = frame->instruction - frame->module->base_address();
> +  }
> +
> +  if (!unwindParser->GetCFI(offsetAddr, cfi))

nit: Add braces

::: toolkit/crashreporter/minidump-analyzer/Win64ModuleUnwindMetadata.h:30
(Diff revision 3)
> +  uint32_t stackSize;
> +  uint32_t ripOffset;
> +};
> +
> +// Does lazy-parsing of unwind info.
> +class ModuleUnwindParser {

nit: Re-indent all members of this class so that they're aligned, also the method parameters should start with the **a** prefix, e.g.: path -> aPath

::: toolkit/crashreporter/minidump-analyzer/Win64ModuleUnwindMetadata.h:31
(Diff revision 3)
> +  uint32_t ripOffset;
> +};
> +
> +// Does lazy-parsing of unwind info.
> +class ModuleUnwindParser {
> +	PLOADED_IMAGE mImg;

nit: Adjust the indentation of all of this class members so that it's two spaces

::: toolkit/crashreporter/minidump-analyzer/Win64ModuleUnwindMetadata.cpp:58
(Diff revision 3)
> +  UnwindCode unwind_code[1];
> +};
> +
> +ModuleUnwindParser::~ModuleUnwindParser()
> +{
> +  if (mImg)

nit: Add braces

::: toolkit/crashreporter/minidump-analyzer/Win64ModuleUnwindMetadata.cpp:62
(Diff revision 3)
> +{
> +  if (mImg)
> +    ImageUnload(mImg);
> +}
> +
> +void* ModuleUnwindParser::RvaToVa(ULONG rva)

nit: The return type should be on its own line with the method name on the following one, likewise for all other functions and methods in this file.

::: toolkit/crashreporter/minidump-analyzer/Win64ModuleUnwindMetadata.cpp:76
(Diff revision 3)
> +  // Convert wchar to native charset because ImageLoad only takes
> +  // a PSTR as input.
> +  std::string code_file = UTF8toMBCS(path);
> +
> +  mImg = ImageLoad((PSTR)code_file.c_str(), NULL);
> +  if (!mImg || !mImg->FileHeader)

nit: Add braces

::: toolkit/crashreporter/minidump-analyzer/Win64ModuleUnwindMetadata.cpp:80
(Diff revision 3)
> +  mImg = ImageLoad((PSTR)code_file.c_str(), NULL);
> +  if (!mImg || !mImg->FileHeader)
> +    return;
> +
> +  PIMAGE_OPTIONAL_HEADER64 optional_header = &mImg->FileHeader->OptionalHeader;
> +  if (optional_header->Magic != IMAGE_NT_OPTIONAL_HDR64_MAGIC)

nit: Likewise

::: toolkit/crashreporter/minidump-analyzer/Win64ModuleUnwindMetadata.cpp:90
(Diff revision 3)
> +
> +  DWORD exception_size = optional_header->
> +    DataDirectory[IMAGE_DIRECTORY_ENTRY_EXCEPTION].Size;
> +
> +  auto funcs = (PIMAGE_RUNTIME_FUNCTION_ENTRY)RvaToVa(exception_rva);
> +  if (!funcs)

nit: Likewise

::: toolkit/crashreporter/minidump-analyzer/Win64ModuleUnwindMetadata.cpp:117
(Diff revision 3)
> +      return false;
> +    }
> +    visited.insert(unwind_rva);
> +
> +    auto chained_func = (PIMAGE_RUNTIME_FUNCTION_ENTRY)RvaToVa(unwind_rva);
> +    if (!chained_func)

nit: Add braces

::: toolkit/crashreporter/minidump-analyzer/Win64ModuleUnwindMetadata.cpp:125
(Diff revision 3)
> +  }
> +
> +  visited.insert(unwind_rva);
> +
> +  auto unwind_info = (UnwindInfo*)RvaToVa(unwind_rva);
> +  if (!unwind_info)

nit: Likewise

::: toolkit/crashreporter/minidump-analyzer/Win64ModuleUnwindMetadata.cpp:141
(Diff revision 3)
> +          break;
> +        }
> +        case UWOP_ALLOC_LARGE: {
> +          if (unwind_code->operation_info == 0) {
> +            c++;
> +            if (c < unwind_info->count_of_codes)

nit: Likewise

::: toolkit/crashreporter/minidump-analyzer/Win64ModuleUnwindMetadata.cpp:145
(Diff revision 3)
> +            c++;
> +            if (c < unwind_info->count_of_codes)
> +              stack_size += (unwind_code + 1)->frame_offset * 8;
> +          } else {
> +            c += 2;
> +            if (c < unwind_info->count_of_codes)

nit: Likewise

::: toolkit/crashreporter/minidump-analyzer/Win64ModuleUnwindMetadata.cpp:193
(Diff revision 3)
> +    if (unwind_info->flags & UNW_FLAG_CHAININFO) {
> +      auto chained_func = (PIMAGE_RUNTIME_FUNCTION_ENTRY)(
> +            (unwind_info->unwind_code +
> +            ((unwind_info->count_of_codes + 1) & ~1)));
> +
> +      if (visited.end() != visited.find(chained_func->UnwindInfoAddress))

nit: Likewise

::: toolkit/crashreporter/minidump-analyzer/Win64ModuleUnwindMetadata.cpp:214
(Diff revision 3)
> +}
> +
> +// For unit testing we sometimes need any address that's valid in this module.
> +// Just return the first address we know of.
> +DWORD ModuleUnwindParser::GetAnyOffsetAddr() const {
> +  if (mUnwindMap.size() < 1)

nit: Add braces

::: toolkit/crashreporter/minidump-analyzer/Win64ModuleUnwindMetadata.cpp:223
(Diff revision 3)
> +
> +bool ModuleUnwindParser::GetCFI(DWORD address, UnwindCFI& ret)
> +{
> +  // Figure out the begin address of the requested address.
> +  auto itUW = mUnwindMap.lower_bound(address + 1);
> +  if (itUW == mUnwindMap.begin())

nit: Add braces

::: toolkit/crashreporter/minidump-analyzer/Win64ModuleUnwindMetadata.cpp:225
(Diff revision 3)
> +{
> +  // Figure out the begin address of the requested address.
> +  auto itUW = mUnwindMap.lower_bound(address + 1);
> +  if (itUW == mUnwindMap.begin())
> +   return false; // address before this module.
> +  -- itUW;

nit: No space between the **--** operator and the variable name.

::: toolkit/crashreporter/minidump-analyzer/Win64ModuleUnwindMetadata.cpp:229
(Diff revision 3)
> +   return false; // address before this module.
> +  -- itUW;
> +
> +  // Ensure that the function entry is big enough to contain this address.
> +  IMAGE_RUNTIME_FUNCTION_ENTRY& func = *itUW->second;
> +  if (address > func.EndAddress)

nit: Add braces

::: toolkit/crashreporter/minidump-analyzer/Win64ModuleUnwindMetadata.cpp:240
(Diff revision 3)
> +    ret = itCFI->second;
> +    return true;
> +  }
> +
> +  // No, generate it.
> +  if (!GenerateCFIForFunction(func, ret))

nit: Likewise

::: toolkit/crashreporter/minidump-analyzer/minidump-analyzer.cpp:40
(Diff revision 3)
> +
> +#if XP_WIN && HAVE_64BIT_BUILD
> +#include "MozStackFrameSymbolizer.h"
> +#endif
> +
> +

nit: Only one empty line is enough

::: toolkit/crashreporter/minidump-analyzer/minidump-analyzer.cpp:387
(Diff revision 3)
>      // The dump file does not exist
>      return 1;
>    }
>  
> +  // Process command-line arguments
> +  for(int i = 2; i < argc; ++ i) {

nit: Add a space between **for** and the opening parenthesis, no space between the **++** operator and the variable name.

::: toolkit/crashreporter/minidump-analyzer/minidump-analyzer.cpp:389
(Diff revision 3)
>    }
>  
> +  // Process command-line arguments
> +  for(int i = 2; i < argc; ++ i) {
> +    if (strcmp(argv[i], "--force-use-module") == 0) {
> +      if ((++ i) < argc) {

If no module name is given it would be best to exit with an error instead of silently ignoring the mistake.
Attachment #8861146 - Flags: review?(gsvelto) → review+
Reporter

Comment 21

2 years ago
mozreview-review
Comment on attachment 8882686 [details]
Bug 1333126: adding tests;

https://reviewboard.mozilla.org/r/153760/#review168086

It's great that you added such extensive tests, this is awesome!

::: toolkit/crashreporter/test/nsTestCrasher.cpp:166
(Diff revision 2)
> +  case CRASH_X64CFI_SAVE_XMM128:
> +  case CRASH_X64CFI_SAVE_XMM128_FAR:
> +  case CRASH_X64CFI_EPILOG: {
> +    ReserveStack();
> +    auto m = GetWin64CFITestMap();
> +    if (m.find(how) == m.end())

nit: Add braces

::: toolkit/crashreporter/test/nsTestCrasher.cpp:217
(Diff revision 2)
> +#if XP_WIN && HAVE_64BIT_BUILD
> +  // fnid uses the same constants as Crash().
> +  // Returns the RVA of the requested function.
> +  // Returns 0 on failure.
> +  auto m = GetWin64CFITestMap();
> +  if (m.find(fnid) == m.end())

nit: Add braces

::: toolkit/crashreporter/test/unit/head_crashreporter.js:113
(Diff revision 2)
> +    return;
> +  }
> +
> +  // find minidump-analyzer executable.
> +  let ds = Cc["@mozilla.org/file/directory_service;1"]
> +  .getService(Ci.nsIProperties);

nit: Indent this call so that the dot ends up below the opening square bracket

::: toolkit/crashreporter/test/unit/head_crashreporter.js:122
(Diff revision 2)
> +  ok(bin && bin.exists());
> +  bin.append("minidump-analyzer.exe");
> +  ok(bin.exists());
> +
> +  let process = Cc["@mozilla.org/process/util;1"]
> +  .createInstance(Ci.nsIProcess);

nit: Likewise

::: toolkit/crashreporter/test/unit/head_crashreporter.js:128
(Diff revision 2)
> +  process.init(bin);
> +  let args = [dumpFile.path];
> +  if (additionalArgs) {
> +    args = args.concat(additionalArgs);
> +  }
> +  process.run(true, args, args.length);

nit: Add a **/* blocking */** comment in front of the first argument so that the intent is clear

::: toolkit/crashreporter/test/unit/head_win64cfi.js:18
(Diff revision 2)
> +  // subtraction.
> +  // e.g.    "0x1111111111112222"
> +  //       - "0x1111111111111111"
> +  // becomes 2222 - 1111
> +  for (; i < base.length; ++i) {
> +    if (base[i] != ip[i])

nit: Add braces

::: toolkit/crashreporter/test/unit/head_win64cfi.js:21
(Diff revision 2)
> +  // becomes 2222 - 1111
> +  for (; i < base.length; ++i) {
> +    if (base[i] != ip[i])
> +      break;
> +  }
> +  if (i == base.length)

nit: Add braces

::: toolkit/crashreporter/test/unit/head_win64cfi.js:42
(Diff revision 2)
> +        closestDistance = thisDistance;
> +        closestSym = sym;
> +      }
> +    }
> +  }
> +  if (closestSym === null)

nit: Add braces

::: toolkit/crashreporter/test/unit/head_win64cfi.js:88
(Diff revision 2)
> +  let ret = "frames[" + frameIndex + "] ip=" + ip +
> +    " " + symbol +
> +    ", module:" + filename +
> +    ", trust:" + frame.trust +
> +    ", moduleOffset:" + moduleOffset.toString(16)
> +    ;

nit: The semicolon goes at the end of the previous line

::: toolkit/crashreporter/test/unit/test_crash_win64cfi_infinite_code_chain.js:3
(Diff revision 2)
> +
> +function run_test() {
> +	// Test that minidump-analyzer gracefully handles chained

nit: This should use two spaces indentation

::: toolkit/crashreporter/test/unit/test_crash_win64cfi_infinite_code_chain.js:12
(Diff revision 2)
> +	ok(exe);
> +
> +	// Perform a crash. We won't get unwind info, but make sure the stack scan
> +	// still works.
> +  do_x64CFITest("CRASH_X64CFI_ALLOC_SMALL",
> +  	[

nit: Use spaces instead of tabs

::: toolkit/crashreporter/test/unit/test_crash_win64cfi_infinite_code_chain.js:15
(Diff revision 2)
> +	// still works.
> +  do_x64CFITest("CRASH_X64CFI_ALLOC_SMALL",
> +  	[
> +      { symbol: "CRASH_X64CFI_ALLOC_SMALL", trust: "context" },
> +      { symbol: "CRASH_X64CFI_NO_MANS_LAND", trust: null }
> +  	],

nit: Likewise

::: toolkit/crashreporter/test/unit/test_crash_win64cfi_infinite_entry_chain.js:3
(Diff revision 2)
> +
> +function run_test() {
> +	// Test that minidump-analyzer gracefully handles chained

Nit: Same as for the previous test, re-indent this and remove the tabs

::: toolkit/crashreporter/test/unit/test_crash_win64cfi_invalid_exception_rva.js:3
(Diff revision 2)
> +
> +function run_test() {
> +	// Test that minidump-analyzer gracefully handles an invalid pointer to the

Nit: and this test too

::: toolkit/crashreporter/test/unit/test_crash_win64cfi_not_a_pe.js:2
(Diff revision 2)
> +
> +function run_test() {

Nit: ... and also this one

::: toolkit/crashreporter/test/unit/test_crash_win64cfi_unknown_op.js:2
(Diff revision 2)
> +
> +function run_test() {

nit: ... and this one too

::: toolkit/crashreporter/test/win64UnwindInfoTests.asm:243
(Diff revision 2)
> +  mov qword ptr [rsp], rax
> +
> +  ; rsp = [pfn][r15][rbx][rsi][rbp][r10][ra]
> +
> +  .endprolog
> +  

nit: Strip the trailing spaces

::: toolkit/crashreporter/test/win64UnwindInfoTests.asm:308
(Diff revision 2)
> +
> +  mov qword ptr [rsp], rax
> +  ; rsp = [pfn][..8..][xmm15][xmm6][ra]
> +
> +  .endprolog
> +  

nit: Likewise

::: toolkit/crashreporter/test/win64UnwindInfoTests.asm:339
(Diff revision 2)
> +
> +  mov qword ptr [rsp], rax
> +  ; rsp = [pfn][..640kb..][..8..][xmm15][xmm6][ra]
> +
> +  .endprolog
> +  

nit: Likewise
Attachment #8882686 - Flags: review?(gsvelto) → review+
Reporter

Comment 22

2 years ago
There's something I've missed: in the minidump-analyzer you're using strcmp() without having included <cstring> first which is leading to build failures. Somehow it didn't show up locally nor on my first try run but then it popped up in later ones, weird. Static analysis builds might also require some love:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a2a107cabe1f2c5ab4f7cd5c327277df15b326d7&selectedJob=119884803
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Assignee

Updated

2 years ago
Keywords: checkin-needed
Autoland can't push this until all pending issues in MozReview are marked as resolved.
Keywords: checkin-needed
Assignee

Updated

2 years ago
Keywords: checkin-needed
Assignee

Comment 27

2 years ago
All should be marked resolved now.

Comment 28

2 years ago
We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again.

hg error in cmd: hg rebase -s df72c27e2b00 -d 8b76ef688a02: rebasing 411818:df72c27e2b00 "Bug 1333126: use win64 PE unwind metadata to improve client-side stack walking; r=gsvelto"
rebasing 411819:2ec52349a664 "Bug 1333126: adding tests; r=gsvelto" (tip)
merging toolkit/crashreporter/test/CrashTestUtils.jsm
merging toolkit/crashreporter/test/unit/head_crashreporter.js
warning: conflicts while merging toolkit/crashreporter/test/unit/head_crashreporter.js! (edit, then use 'hg resolve --mark')
unresolved conflicts (see hg resolve, then hg rebase --continue)
Flags: needinfo?(ccorcoran)
Keywords: checkin-needed
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Assignee

Updated

2 years ago
Flags: needinfo?(ccorcoran)
Assignee

Updated

2 years ago
Keywords: checkin-needed

Comment 31

2 years ago
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/7f10bba58580
use win64 PE unwind metadata to improve client-side stack walking; r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/4d92e459f9ab
adding tests; r=gsvelto
Keywords: checkin-needed
Backed out for eslint failures in toolkit/crashreporter/test/unit/head_win64cfi.js (strings must use doublequotes):

https://hg.mozilla.org/integration/autoland/rev/887526d5bb3bcb4bf5997429212fea1dd9d8f164
https://hg.mozilla.org/integration/autoland/rev/707970e804a15d32df7469bac05a809f19ee8ccc

Push with failure: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=4d92e459f9ab3bb31fd7e4d230762bb56d6479fd&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=runnable
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=121397835&repo=autoland

TEST-UNEXPECTED-ERROR | /home/worker/checkouts/gecko/toolkit/crashreporter/test/unit/head_win64cfi.js:73:37 | Strings must use doublequote. (quotes)
TEST-UNEXPECTED-ERROR | /home/worker/checkouts/gecko/toolkit/crashreporter/test/unit/head_win64cfi.js:128:41 | Strings must use doublequote. (quotes)
Flags: needinfo?(ccorcoran)
Comment hidden (mozreview-request)
Assignee

Updated

2 years ago
Flags: needinfo?(ccorcoran)
Assignee

Updated

2 years ago
Keywords: checkin-needed

Comment 34

2 years ago
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/c3a77d62c05d
use win64 PE unwind metadata to improve client-side stack walking; r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/64be5fefcba3
adding tests; r=gsvelto
Keywords: checkin-needed
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Reporter

Comment 40

2 years ago
mozreview-review
Comment on attachment 8898750 [details]
Bug 1333126: plugin crash tests now wait properly for crash handling;

https://reviewboard.mozilla.org/r/170150/#review175306

Ship it!
Attachment #8898750 - Flags: review?(gsvelto) → review+
Assignee

Updated

2 years ago
Flags: needinfo?(ccorcoran)
Keywords: checkin-needed

Comment 41

2 years ago
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/7bde9d6e783a
use win64 PE unwind metadata to improve client-side stack walking; r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/5872274ff102
adding tests; r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/4b675a7f0743
plugin crash tests now wait properly for crash handling; r=gsvelto
Keywords: checkin-needed
Backed out for permafailing browser-chrome's browser/base/content/test/plugins/browser_pluginCrashReportNonDeterminism.js on OS X 10.10 debug:

https://hg.mozilla.org/integration/autoland/rev/daa30ecfd1d4c50e864fcacdde28d67eb260de24
https://hg.mozilla.org/integration/autoland/rev/aba2a28c7fbe9f23caa4c7b21cf09c400a08edda
https://hg.mozilla.org/integration/autoland/rev/d04d4fdcd808a4a1f469893c5fca11c304698c5c

Push with failure: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=4b675a7f07430a85b02e03fd9f32fe5661fff11b&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=124400126&repo=autoland

09:59:48     INFO - TEST-PASS | browser/base/content/test/plugins/browser_pluginCrashReportNonDeterminism.js | Found minidump - 
09:59:48     INFO - TEST-PASS | browser/base/content/test/plugins/browser_pluginCrashReportNonDeterminism.js | Found extra file - 
09:59:48     INFO - TEST-PASS | browser/base/content/test/plugins/browser_pluginCrashReportNonDeterminism.js | Should have been showing crash report UI - "please" == "please" - 
09:59:48     INFO - Buffered messages finished
09:59:48     INFO - TEST-UNEXPECTED-FAIL | browser/base/content/test/plugins/browser_pluginCrashReportNonDeterminism.js | Test timed out - 

Also hits windows7-32-stylo debug https://treeherder.mozilla.org/logviewer.html#?job_id=124404720&repo=autoland
Flags: needinfo?(ccorcoran)
Reporter

Updated

2 years ago
Depends on: 1393800
Assignee

Comment 43

2 years ago
Posted patch 1333126.patchSplinter Review
Gabriele you may need to land this one without me; attaching a patch after latest rebase/merge. Had to change how command line arguments were passed to minidump-analyzer.exe in my tests.
Flags: needinfo?(ccorcoran)
Reporter

Comment 44

2 years ago
(In reply to Carl Corcoran [:ccorcoran] (unavailable for NI) from comment #43)
> Gabriele you may need to land this one without me; attaching a patch after
> latest rebase/merge. Had to change how command line arguments were passed to
> minidump-analyzer.exe in my tests.

Yeah, I'm sorry we couldn't land this earlier, unfortunately I haven't been able to land bug 
1393800 yet. As soon as I do I'll land your patch too.
Reporter

Comment 45

2 years ago
I've used attachment 8904643 [details] [diff] [review] to rebase the patch set and manually fixed some stuff in the minidump analyzer to make this apply cleanly to mozilla-central since it had bitrotted a bit. The resulting patches are in this try run, I hope to be able to land this soon:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=36d9c5d82164a316e4fa6ea282d78c2f401c283f

Updated

2 years ago
QA Contact: amasresha
Reporter

Comment 46

2 years ago
All the issues that prevented us from landing this patch have been fixed, I've made one last try run to be sure everything is alright and it's looking good:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9a0726221219539c86f0b50cd784db85fe0c0ec3

I'll squash the rebased patches and land them.
Depends on: 1410165, 1402248

Comment 47

2 years ago
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/bfad7e6d6190
Use win64 PE unwind metadata to improve client-side stack walking; r=gsvelto
Sorry, this had to be backed out for timing out in browser-chrome's browser/base/content/test/plugins/browser_pluginCrashCommentAndURL.js on Windows 10 x64 opt and pgo:

https://hg.mozilla.org/integration/mozilla-inbound/rev/12326a82a8555e0df3cc1637afa6a9be6416ae6b

Push range with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-resultStatus=usercancel&filter-resultStatus=runnable&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=success&tochange=2311fe1447d86bffef4ae0d9e4433325ff9d9848&fromchange=15663d3f8728823e6e48eca1d785b84d1de1edcd&filter-searchStr=windows10%20browser-chrome&selectedJob=139932020
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=139932020&repo=mozilla-inbound

17:31:59     INFO -  170 INFO TEST-START | browser/base/content/test/plugins/browser_pluginCrashCommentAndURL.js
17:31:59     INFO -  GECKO(16668) | ###!!! [Parent][MessageChannel::Call] Error: Channel error: cannot send/recv
17:31:59     INFO -  GECKO(16668) | ###!!! [Parent][MessageChannel] Error: (msgtype=0x51001E,name=PPluginInstance::Msg_AsyncSetWindow) Channel error: cannot send/recv
17:32:00     INFO -  GECKO(16668) | ###!!! [Parent][MessageChannel::Call] Error: Channel error: cannot send/recv
17:32:00     INFO -  GECKO(16668) | ###!!! [Parent][MessageChannel] Error: (msgtype=0x51001E,name=PPluginInstance::Msg_AsyncSetWindow) Channel error: cannot send/recv
17:32:00     INFO -  GECKO(16668) | ###!!! [Parent][MessageChannel::Call] Error: Channel error: cannot send/recv
17:32:00     INFO -  GECKO(16668) | ###!!! [Parent][MessageChannel] Error: (msgtype=0x51001E,name=PPluginInstance::Msg_AsyncSetWindow) Channel error: cannot send/recv
17:32:00     INFO -  GECKO(16668) | ###!!! [Parent][MessageChannel] Error: (msgtype=0x51001E,name=PPluginInstance::Msg_AsyncSetWindow) Channel error: cannot send/recv
17:32:29     INFO -  GECKO(16668) | [NPAPI 2408, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
17:32:29     INFO -  GECKO(16668) | [NPAPI 2408, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
17:49:09     INFO - Automation Error: mozprocess timed out after 1000 seconds running ['Z:\\task_1509035741\\build\\venv\\Scripts\\python', '-u', 'Z:\\task_1509035741\\build\\tests\\mochitest\\runtests.py', '--total-chunks', '7', '--this-chunk', '1', '--appname=Z:\\task_1509035741\\build\\application\\firefox\\firefox.exe', '--utility-path=tests/bin', '--extra-profile-file=tests/bin/plugins', '--symbols-path=https://queue.taskcluster.net/v1/task/bIPIYPriQv-KfzVFKX3FEQ/artifacts/public/build/target.crashreporter-symbols.zip', '--certificate-path=tests/certs', '--quiet', '--log-raw=Z:\\task_1509035741\\build\\blobber_upload_dir\\browser-chrome-chunked_raw.log', '--log-errorsummary=Z:\\task_1509035741\\build\\blobber_upload_dir\\browser-chrome-chunked_errorsummary.log', '--screenshot-on-fail', '--cleanup-crashes', '--marionette-startup-timeout=180', '--flavor=browser', '--chunk-by-runtime']
[taskcluster 2017-10-26T18:29:22.123Z] Aborting task - max run time exceeded!

Please fix the issue and submit an updated patch.
Reporter

Comment 49

2 years ago
I've rebased this again, fixed a build issue that had cropped up an I'm now investigating the weird issues that cropped up during the previous landing. The try run is here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d86a311444e04f43f9823481dd595590a83040be
Reporter

Comment 50

2 years ago
I forgot to update the patch in the previous try run, this one has the fix: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a08eb359157864ec4b8d2b3878300836b191f853
Reporter

Comment 51

2 years ago
The tests that had previously failed are now green, I'm not sure what the issue was since they hadn't failed before, only after pushing the patch. Landing this again.

Comment 52

2 years ago
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fe038577ea44
Use win64 PE unwind metadata to improve client-side stack walking; r=gsvelto

Comment 53

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/fe038577ea44
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Depends on: 1412035
Backed out Bug 1333126 for failing Browser Chrome Tests on Windows 10 browser_pluginCrashCommentAndURL.js

https://hg.mozilla.org/mozilla-central/rev/2535bad09d720e71a982f3f70dd6925f66ab8ec7

For failures, please check Bug 1412035
Status: RESOLVED → REOPENED
Flags: needinfo?(gsvelto)
Resolution: FIXED → ---
Target Milestone: mozilla58 → ---
Priority: -- → P1
Reporter

Comment 55

2 years ago
After a lot of investigation I've identified two issues that have been causing the failure. The major one is within the test harness (surprise, surprise) and another one is in the test itself. I thought I had fixed all the remaining harness issues related to crashes in bug 1393800 and bug 1410165 but I was wrong. This snippet in browser-test.js tries to remove leftover crash dump files after a browser mochitest has finished:

http://searchfox.org/mozilla-central/rev/c99d035f00dd894feff38e4ad28a73fb679c63a6/testing/mochitest/browser-test.js#732

However, it fails to wait for crashes to be recorded, so in this case it's racing against the minidump analyzer and failing when trying to remove the file here:

http://searchfox.org/mozilla-central/rev/c99d035f00dd894feff38e4ad28a73fb679c63a6/testing/mochitest/browser-test.js#741

Somehow the error caused by the file being locked is not visible in the log, so it might have been caught at some higher level, and since the error isn't handled directly in that piece of code it causes the harness to hang right after the test has finished. To make matters worse it's looking for crash dumps under the Crash Reporter/pending dir within UAppData which isn't always true when running tests.

To add insult to injury the test itself is doing something wrong, namely it's setting the MOZ_CRASHREPORTER_NO_REPORT environment variable which prevents the crashreporter client from being launched:

http://searchfox.org/mozilla-central/rev/c99d035f00dd894feff38e4ad28a73fb679c63a6/browser/base/content/test/plugins/browser_pluginCrashCommentAndURL.js#30

This doesn't really matter since this test is not dealing with browser crashes and thus the crashreporter client wouldn't be started anyway, but it also changes the place where crash dumps are stored. Instead of ending up in the temporary folder used by the test it ends up in the live directory where pending crashes go, such as C:\Users\<username>\AppData\Roaming\Mozilla\Firefox\Crash Reports\pending

This means that the test can actually leave crash dumps behind on the machine after the test has finished. I'll post a fix for both problems tomorrow.
Flags: needinfo?(gsvelto)
Reporter

Updated

2 years ago
Depends on: 1416028
Depends on: 1416078
Blocks: 1416078
No longer depends on: 1416078
Reporter

Comment 56

2 years ago
This is the latest try run with the patch for bug 1416028. I went through all the failures one by one and they all look intermittent issues unrelated to this. It's kind of crazy how this patch became a pipe cleaner for all sort of test harness problems, but at least all the issues that popped out seem fixed now; let's hope it sticks this time.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=555e19d650aa89949dba1ba2e7bd4c0f551b6ecf

Comment 57

2 years ago
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a6b0f77e29a2
Use win64 PE unwind metadata to improve client-side stack walking; r=gsvelto

Comment 58

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/a6b0f77e29a2
Status: REOPENED → RESOLVED
Last Resolved: 2 years ago2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
We tested this on Windows-10&7 x64 with 64 bits of Firefox(with and without this patch). With this fix, we verified the addition of CFI-based stack-walking support on Windows x64.
The steps used for testing and results are here: https://public.etherpad-mozilla.org/p/1333126
Status: RESOLVED → VERIFIED
Reporter

Comment 60

2 years ago
Comment on attachment 8861146 [details]
Bug 1333126: use win64 PE unwind metadata to improve client-side stack walking;

Approval Request Comment
[Feature/Bug causing the regression]: New feature
[User impact if declined]: Not directly user-visible but stack traces in crash pings on Windows 64 will be of mediocre quality which will lower our ability to quickly identify crashes
[Is this code covered by automated tests?]: Yes
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: Bug 1416028, which is a test-only fix, is required otherwise the new tests in this bug can cause intermittent issues
[Is the change risky?]: Risk is minimal
[Why is the change risky/not risky?]: This patch doesn't affect gecko in any way, all the changes are restricted to the stand-alone minidump-analyzer tool
[String changes made/needed]: None

FYI the patches were rebased and landed as on in https://hg.mozilla.org/mozilla-central/rev/a6b0f77e29a2 so that's what would need to be uplifted
Attachment #8861146 - Flags: approval-mozilla-beta?
Hi :gsvelto,
We usually don't allow new features to be uplift to beta. Do you have any business/marketing reason why we need this in Beta58? If not, can we let it ride the train on 59?
Flags: needinfo?(gsvelto)
Reporter

Comment 62

2 years ago
This is not really a new feature since it improves upon an existing one. Starting with Firefox 57 we have support for putting stack traces in crash pings, something that should allow us to identify frequent crashes faster than on Socorro. Under 64-bit Windows this support relies on stack scanning which is a heuristic technique that might or might not produce good results when generating the stack-trace. With this patch we use CFI-based stack crawling which essentially guarantees that we'll always get a good stack-trace from a crash. Since we're moving a large portion of our Windows population to the 64-bit build we deemed it important to gather more information about crashes starting with Firefox 58. Additionally if the dependency bug 1416028 is also uplifted it will reduce the amount of intermittent issues on try for the beta branch. Also I'd like to stress that neither this patch nor bug 1416028 are touching gecko code. All the changes happen in the test harness and in an external tool that is invoked only when a crash occurs.
Flags: needinfo?(gsvelto)
68% of Windows users on the Release channel are now running 64-bit Firefox, so it would be good to improve 64-bit crash reporting in 58.
Comment on attachment 8861146 [details]
Bug 1333126: use win64 PE unwind metadata to improve client-side stack walking;

Thanks for the detailed information. Let's take it to improve 64-bit crash reporting in 58. Beta58+.
Attachment #8861146 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment 66

a year ago
Pushed by dmajor@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/64db6b9c2139
Add #include to fix non-unified build in Win64ModuleUnwindMetadata.cpp. DONTBUILD
You need to log in before you can comment on or make changes to this bug.