Closed Bug 1309172 Opened 3 years ago Closed Last year

Update Breakpad

Categories

(Toolkit :: Crash Reporting, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox63 --- fixed

People

(Reporter: gsvelto, Assigned: gsvelto)

References

(Depends on 1 open bug)

Details

Attachments

(3 files, 5 obsolete files)

+++ This bug was initially created as a clone of Bug #1264367 +++

We've run into a bunch of issues when sync'ing breakpad with upstream in bug 1264367 mostly because we had fallen back quite a bit. This is a reminder to update breakpad soon-ish, possibly after having fixed the issues with merging module mappings we encountered, least we fall back again.
Blocks: 1316159
I've landed a fix upstream that makes minidump UUIDs truly random on Linux:

https://chromium.googlesource.com/breakpad/breakpad/+/a9fca58305afcb16a7ae539d87d672c702aba3f1

This might be a good time to pull from upstream so that we get that change.
Depends on: 1421988
Since I've picked up bug 1440282 it's about time we do this.
Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
These are revisions 17ad0c18b179c135fc5a3d2bba199c3fa4276035 and
8915f7be39448d9257b6da3ad0233944d1d9a92a which had not been included during
the last pull because they caused issues when generating minidumps on Fennec.
Attachment #8983791 - Attachment description: [PATCH] Apply brekpad revisions which were previously dropped → [PATCH 1/4] Apply brekpad revisions which were previously dropped
This includes only the vanilla files we haven't forked
This was done by manually applying the relevant patches all the way up to version 7b3afa9258e58a57ffbeb395d445811f92616ae9. I had to manually fix only a handful of conflicts.
This updates breakpad to match the current upstream. I've used the update-breakpad.sh script to generate attachment 8983793 [details] [diff] [review] and I created attachment 8983793 [details] [diff] [review] by manually applying the changes that were done to the client sources. Ted, do you think this looks sane?

It builds correctly and I could capture some minidumps on Linux that show no major differences with the old ones; all the relevant fields seem to be populated correctly. Still, I've got plenty of testing to do though, especially Fennec which regressed the last time we did this.
Flags: needinfo?(ted)
Here's a try run for all architectures: https://treeherder.mozilla.org/#/jobs?repo=try&revision=889d6879fd3d1b688bea59ca6e0c9a5916d6ca72

I hope it crashes so that I can have a look at the minidumps. I've included all the commits we had to previously back out in the hope that the issues with merging mappings have been sorted out in the meantime.
The minidump-analyzer needed a Windows-specific fix to handle unloaded modules, a new feature that was introduced in the meantime.
Attachment #8983794 - Attachment is obsolete: true
I've tested the patch-set on automation and there seems to be an issue symbolicating stacks on Linux, the other architectures look OK. This is a try run with a fake crash injected in the xpcshell-test with the patches applied:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b380d5e610559b3412f67c997bdfbb12bdef81a8

And this is without the patches:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b0c7e89906e0b0a925c7befc96d9ab56b17f8cf9
I've just finished analyzing the minidumps for both runs and the issue with Linux appears to be the same we encountered the last time I tried updating breakpad: the libxul.so mappings aren't been merged correctly and two entries for libxul.so are added to the modules list instead of one. This throws of symbolication since one of the two entries has no code or debug identifiers. I'll probable have to revert the problematic commits in breakpad like I did the old time. I'll also file a bug to fix this for good.
This includes only the vanilla files we haven't forked
The following changes were not included as they break merging the segments
corresponding to libxul.so in the module list:

8915f7be39448d9257b6da3ad0233944d1d9a92a
17ad0c18b179c135fc5a3d2bba199c3fa4276035
94b6309aecaddfcf11672f6cfad9575d68ad3b40

With these changes applied two entries for libxul.so are generated, the second
one is bogus and prevents symbolication from working correctly.
Attachment #8983791 - Attachment is obsolete: true
Attachment #8983792 - Attachment is obsolete: true
Attachment #8983793 - Attachment is obsolete: true
Attachment #8984947 - Attachment is obsolete: true
I've uploaded the patches to Phabricator, since it's the first time I upload a stack of them I hope I did it the right way. The patch-set is split into three parts: the first includes all the changes to the vanilla breakpad sources, the second the changes to our forked code minus the problematic changes on the breakpad side and the third the fixups required to accommodate for the changes.

Here's an updated try run with the bogus crash to ensure that symbolication is working correctly:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a79e27eab6e5902feda15dc3169293ea7b1d2000

Ted, do we need to anything on Socorro's side to accommodate for this? I remember you had to update something the previous time we did this but I don't remember what exactly.
Did you include the changes from bug 1464537?
(In reply to Mike Hommey [:glandium] from comment #16)
> Did you include the changes from bug 1464537?

No, our update script is OK with cherry-picked commits but will remove stuff we haven't pushed upstream. I'll make sure this is included (and possibly other changes we might have missed).
I've updated the patch including the changes from bug 1464537. I'll also try to upstream this so that it doesn't get lost the next time around.
After reading a bit more on bug 1464537 I was wondering if I did the right thing. You mentioned on that bug that [1] is sufficient for fixing this issue but I'm not sure if it is. Looking at the code it seems that if we fail to demangle the name the resulting name is still cleared so part of your change would still be needed. Or did I get it wrong?

[1] https://chromium.googlesource.com/breakpad/breakpad/+/7398ce15b79daf038cd07fbae4e37e183e99788d
Flags: needinfo?(mh+mozilla)
What I said in that bug is that there is one part that was already fixed upstream, and that I filed the other.
Flags: needinfo?(mh+mozilla)
FYI I r+ed the other patch in Gerrit, but I can't seem to submit it, perhaps because the patch is in a weird state? glandium said he can't log into the Google account he used to submit the change while he's away from home, so we might need to wait for him to do that.
Flags: needinfo?(ted)
Comment on attachment 8985135 [details]
Bug 1309172 - Updated breakpad to version 7b3afa9258e58a57ffbeb395d445811f92616ae9

Ted Mielczarek [:ted.mielczarek] has approved the revision.

https://phabricator.services.mozilla.com/D1640
Attachment #8985135 - Flags: review+
Alright, I'll wait for Mike's change to be upstreamed and then I'll update the patch here so that the revision in GIT-INFO includes it.
Comment on attachment 8985136 [details]
Bug 1309172 - Updated the forked Breakpad client sources by manually applying working revisions up to version 7b3afa9258e58a57ffbeb395d445811f92616ae9

Ted Mielczarek [:ted.mielczarek] has approved the revision.

https://phabricator.services.mozilla.com/D1641
Attachment #8985136 - Flags: review+
(In reply to Gabriele Svelto [:gsvelto] from comment #23)
> Alright, I'll wait for Mike's change to be upstreamed and then I'll update
> the patch here so that the revision in GIT-INFO includes it.

glandium fixed the patch state and I submitted his patch:
https://chromium.googlesource.com/breakpad/breakpad/+/69c2c51dd89965d234eec16e3a9353634831916b
Cool, I'm updating the patches right now.
BTW, I'll have to squash them before landing because they cannot be built separately and if something goes wrong I want to make the backout as painless as possible.
Patch set updated with revision 69c2c51dd89965d234eec16e3a9353634831916b, we're almost good to land.
Comment on attachment 8985137 [details]
Bug 1309172 - Build system and tools adjustments

Ted Mielczarek [:ted.mielczarek] has approved the revision.

https://phabricator.services.mozilla.com/D1642
Attachment #8985137 - Flags: review+
(In reply to Gabriele Svelto [:gsvelto] from comment #15)
> Ted, do we need to anything on Socorro's side to accommodate for this? I
> remember you had to update something the previous time we did this but I
> don't remember what exactly.

We only need to coordinate a Socorro update if we update the client code in Firefox such that it changes the generated minidumps in a backwards-incompatible way, or if we update the symbol dumping code such that generated .sym files will change in a backwards-incompatible way.

The rev of Breakpad that Socorro's stackwalker is built against is pinned here, FYI:
https://github.com/mozilla-services/socorro/blob/3a5b7131ca961d1d37c0a540ed61c9309134f0f7/scripts/build-breakpad.sh#L18

It was last updated in bug 1421988 for exactly those reasons (the symbol format got changed) so we should be in good shape as long as nothing in the client has changed the dump format in an odd way, and it doesn't sound like it has.
I tested submitting dumps to Socorro and they seem to be processed just fine; crashes on try are also properly symbolicated. I'll squash the patches and land them; if I missed something in my testing backing it out should be easy enough.
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/07a62b4e7923
Updated breakpad to version 69c2c51dd89965d234eec16e3a9353634831916b; r=ted.mielczarek
Backout by aciure@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e96c47860e6e
Backed out 1 changesets for breakpad related failures. CLOSED TREE
It seems that fix_stack_using_bpsyms.py is encountering an unexpected character in the symbol file for a PUBLIC entry. Specifically I found these lines:

webrtc-gtest.pdb/150FC4DC2A3C4F4D9E084E1B4A5FBF631/webrtc-gtest.sym:PUBLIC m 8f7d93 0 _intrinsic_setjmp
xul.pdb/6402337BFF9149C1819558441909A84E1/xul.sym:PUBLIC m 5c49783 0 _intrinsic_setjmp
xul.pdb/DEF358122E104788AFDF3C3D4AC442391/xul.sym:PUBLIC m 63f93c3 0 _intrinsic_setjmp
mozglue.pdb/87479AA1ED144313BA1FB3194F6CD9931/mozglue.sym:PUBLIC m 47082 0 _intrinsic_setjmp

I wonder where that 'm' is coming from and what does it mean. I'll check it out and fix the issue in our scripts.
Flags: needinfo?(gsvelto)
This is caused by a change in breakpad that was introduced some time ago, here's the relevant issue and patch:

https://bugs.chromium.org/p/google-breakpad/issues/detail?id=751

https://chromium.googlesource.com/breakpad/breakpad/+/b1226959a25b6a5311801d6f204b088c706e7c25

I'm trying to figure out if we need changes on our side besides being able to parse the new format.
Oh shoot, right, that was the reason for bug 1421988. Ugh, I just realized that we'll have to fix symbols.mozilla.org as well because it provides the symbolication API that snappy used to.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #37)
> Oh shoot, right, that was the reason for bug 1421988. Ugh, I just realized
> that we'll have to fix symbols.mozilla.org as well because it provides the
> symbolication API that snappy used to.

Looking at the generated .sym files the only difference seem to be the _intrinsic_setjmp entries which have this 'multiple' thing added to their line, the rest of the file is identical. Do you know what should be done about it? If I make fix_stack_using_bpsyms.py ignore that character everything still seems to work correctly but I wonder if it would actually need special handling.
Flags: needinfo?(ted)
I think simply parsing and ignoring the m will be sufficient. It shouldn't change our existing behavior. I don't know exactly what the code does now in the face of multiple functions, but presumably the output is arbitrary based on the ordering in the file.
Flags: needinfo?(ted)
I've updated the last patch with the fixes to fix_stack_using_bpsyms.py. Win64 mochitests are now passing correctly.
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5284d9a25ffb
Updated breakpad to version 69c2c51dd89965d234eec16e3a9353634831916b; r=ted.mielczarek
https://hg.mozilla.org/mozilla-central/rev/5284d9a25ffb
Status: ASSIGNED → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Depends on: 1474871
Depends on: 1526897
You need to log in before you can comment on or make changes to this bug.