Closed Bug 1280470 Opened 3 years ago Closed 3 years ago

Integrate the breakpad minidump processor code

Categories

(Toolkit :: Crash Reporting, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox52 --- fixed

People

(Reporter: gsvelto, Assigned: gsvelto)

References

Details

(Whiteboard: [fce-active-legacy])

Attachments

(1 file, 7 obsolete files)

+++ This bug was initially created as a clone of Bug #1280469 +++

In order to be able to unwind the stacks in the crashreporter client we need to build the bits under the src/processor folder in breakpad. These include the stack unwinder, symbol resolver and other components used to process minidumps.
Blocks: 1280477
Whiteboard: [fce-active]
Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
This does horrible things to the build system but is the first fully working version I have (though it does nothing more than printing out the traces).
Same patch as before, minus the hacks.
Attachment #8765454 - Attachment is obsolete: true
Revised patch with the necessary bits for building on Windows added.
Attachment #8765469 - Attachment is obsolete: true
This patch makes it possible to build breakpad's processor bits on Windows. It mostly addresses issue where some definition in Microsoft's header files overrides a macro or variable/field name in breakpad causing the build to fail.

Ted, I'm not sure if this is the appropriate way to handle this. Ideally this is the kind of stuff that should be fixed upstream but it seems that the Breakpad developers are uninterested in the Windows build. I managed to build some of this stuff in Breakpad's vanilla build by following this article:

http://zxstudio.org/blog/2014/10/28/integrating-google-breakpad/

It wasn't straightforward though, even on the vanilla build I had to manually update the generated Visual Studio project files and change quite a few compiler flags. All in all I'm not sure if there's any interest in keeping the code easily buildable on Windows or not so I wonder if I should try to upstream changes like these or only keep them in our repo.
Flags: needinfo?(ted)
We can take this patch upstream. Breakpad's CI story is pretty sad in general, we only just recently got Linux Travis CI builds going. Most of the processor code doesn't build on Windows, but some of it does (for Windows unit tests, at least). These look like pretty straightforward fixes, as long as they don't break the build on other platforms they should be fine.

If you can submit a patch against upstream to the Breakpad codereview site I can review and land it for you:
https://chromium.googlesource.com/breakpad/breakpad/+/master/#To-request-change-review

If that's too much hassle I can shepherd it upstream for you, but it's obviously easier for me if you submit the patch. :)

Those directions don't really cover building on Windows, but if you follow up to step 3 you should wind up with a `src/client/windows/breakpad_client.sln` that you can use to build all the Windows bits.
Flags: needinfo?(ted)
Thanks, I'll try to send them upstream myself once I've got everything building together since I also have to submit another patch for fixing bug 1280577.

I'll try building using the src/client/windows/breakpad_client.sln project. BTW I'm on VS2015, do I need to check that I'm not breaking other versions of VS too?
No, I don't think upstream cares about anything older than 2015.
Comment on attachment 8765893 [details] [diff] [review]
[PATCH] Integrate breakpad stack-walking machinery in the crash reporter

Review of attachment 8765893 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/crashreporter/google-breakpad/src/processor/moz.build
@@ +61,4 @@
>  # We allow warnings for third-party code that can be updated from upstream.
>  ALLOW_COMPILER_WARNINGS = True
>  
>  FINAL_LIBRARY = 'xul'

I think you want to remove this. This dates back to us using this code in the profiler. You're linking this code into the crashreporter client, not libxul.
After having hacked around with breakpad I've noticed that all the Windows-specific build issues I've addressed in attachment have been fixed upstream 8765898. Ted, do we have plans on when to pull from the latest breakpad upstream? We'll have to do it after I manage to land the fix for bug 1280577 there so we might wait for that and then do it. Shall I open a dedicated bug blocking this one?
Flags: needinfo?(ted)
I've already got bug 1264367 on file, I was trying to wait to get https://codereview.chromium.org/1903833002/ landed in linux-syscall-support (and updated in Breakpad) before I synced from upstream to avoid having to re-apply snorp's patch (https://hg.mozilla.org/mozilla-central/rev/0e510b9a2d46) . I haven't been able to get anyone to land that, so maybe we should just sync up and move on.
Flags: needinfo?(ted)
I'll do a Breakpad sync, and also look at reviewing and landing your patch upstream.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #11)
> I'll do a Breakpad sync, and also look at reviewing and landing your patch
> upstream.

Thanks!
Attachment #8765898 - Attachment is obsolete: true
Attachment #8765893 - Attachment description: [PATCH 1/2] Integrate breakpad stack-walking machinery in the crash reporter → [PATCH] Integrate breakpad stack-walking machinery in the crash reporter
Updated patch, not building the library into libxul.so anymore. We might need this at a later stage but for now it's not useful.
Attachment #8765893 - Attachment is obsolete: true
So there's been a bunch of discussion outside of this bug (IRC, e-mail, etc...) regarding what's the best approach to take here and the plan changed slightly. When I opened this bug the general consensus was to integrate the breakpad code into the crash reporter client and process minidumps there. We've now reached the conclusion that analyzing the minidumps is better done in a separate executable for a bunch of reasons:

- It doesn't add complexity to the crash reporter client, thus reducing the risk of causing that to crash (leaving the user with a really poor post-crash experience)

- We won't need to have two different code paths for analyzing full browser crashes (in the crash reporter client) and content crashes (in gecko). With a single external tool we can kill two birds with one stone instead.

Because of the above I'm changing the bug title to reflect the current reality and I'll post an updated patch that only builds the google-breakpad processor sources but does not link them into the crash reporter client.
Summary: Integrate the breakpad minidump processor code into the crashreporter client → Integrate the breakpad minidump processor code
Adding a dependency since we need a new breakpad version for the Windows code to build correctly.
Depends on: 1264367
Updated patch that only builds the code but doesn't link it anywhere.
Attachment #8767985 - Attachment is obsolete: true
Attachment #8774747 - Flags: review?(ted)
Blocks: 1289784
I've managed to solve the problem with logging. The only change to this patch is that it's now defining BPLOG_MINIMUM_SEVERITY to SEVERITY_ERROR. This prevents all non-error message from being printed to stderr. A quick grep through breakpad's code doesn't show any piece of code using SEVERITY_ERROR when logging so for the time being this should turn off logging entirely.
Attachment #8774747 - Attachment is obsolete: true
Attachment #8774747 - Flags: review?(ted)
Attachment #8777316 - Flags: review?(ted)
Comment on attachment 8777316 [details] [diff] [review]
[PATCH v4] Build the breakpad stack-walking machinery

Review of attachment 8777316 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/crashreporter/google-breakpad/src/processor/moz.build
@@ +53,4 @@
>      'tokenize.cc',
>  ]
>  
> +DEFINES['BPLOG_MINIMUM_SEVERITY'] = 'SEVERITY_ERROR'

Clever!

@@ +54,5 @@
>  ]
>  
> +DEFINES['BPLOG_MINIMUM_SEVERITY'] = 'SEVERITY_ERROR'
> +
> +Library('breakpad_processor_s')

You can just leave off the _s, that's a crummy historic naming convention.
Attachment #8777316 - Flags: review?(ted) → review+
Thanks for the review! I'll remove the trailing _s then. I'm not landing this yet until we can pull upstream's breakpad (or I just cherry pick the necessary bits) because with the current breakpad it would break the Windows builds.
Updated patch with review comments addressed, carrying over the r+ flag.
Attachment #8777316 - Attachment is obsolete: true
Attachment #8787174 - Flags: review+
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c85784c39e28
Build the breakpad stack-walking machinery r=ted
https://hg.mozilla.org/mozilla-central/rev/c85784c39e28
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Backed out in https://hg.mozilla.org/mozilla-central/rev/edfff7a9e4c43f6d516dfbf69a64e205e5cdb699 since I was backing out bug 1264367
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla51 → ---
oh, this patch caused linux64 pgo issues for talos tests:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=linux%20pgo%20talos&tochange=c68f029d6c55&fromchange=8b0b6ff32644322254bd8bb6d5e7ae8a31ffb700

The related patch seemed to cause a bit more talos issues, but this specific patch made talos for linux64 pretty much unusable!
Huh. I would not expect this specific patch to have any impact on Talos, since it's just building some extra code that shouldn't get linked into libxul.
well, it isn't affecting performance, more like crashing the browser- I suspect this patch won the 'unfortunate lucky bit of code lottery' and linux64 pgo is hitting some edge case.  I filed bug 1304005 to track this more- it is a serious issue which didn't exist in the past, now it is showing up on every revision (shortly after this was backed out more code landed and caused us to go bonkers again).
(In reply to Joel Maher ( :jmaher) from comment #26)
> well, it isn't affecting performance, more like crashing the browser- I
> suspect this patch won the 'unfortunate lucky bit of code lottery' and
> linux64 pgo is hitting some edge case.  I filed bug 1304005 to track this
> more- it is a serious issue which didn't exist in the past, now it is
> showing up on every revision (shortly after this was backed out more code
> landed and caused us to go bonkers again).

I'll make sure I do some Talos runs before re-landing this stuff in case it's affecting it someway, somehow.
Right, but the changes in this patch shouldn't have added any new code to the browser! It just makes us build some extra code which (in another patch) will get linked into a separate binary. It's possible the bug was tripped by the changes in bug 1264367, which did change a bunch of code that gets linked into the browser.
Now that bug 1264367 has landed and stuck we can re-land this.
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/98e543601987
Build the breakpad stack-walking machinery r=ted
https://hg.mozilla.org/mozilla-central/rev/98e543601987
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Whiteboard: [fce-active] → [fce-active-legacy]
You need to log in before you can comment on or make changes to this bug.