Bug 1513284 (aarch64-crash-reporting)

[meta] Enable the full functionality of the crashreporter in aarch64 windows builds

RESOLVED FIXED

Status

enhancement
RESOLVED FIXED
6 months ago
4 months ago

People

(Reporter: froydnj, Assigned: gsvelto)

Tracking

(Depends on 1 bug, Blocks 1 bug, {meta})

Firefox Tracking Flags

(Not tracked)

Details

Reporter

Description

6 months ago
I thought I filed this bug a long time ago, but apparently not!  Ooops.

For reasons I didn't entirely understand, automation builds were falling over trying to build the crashreporter on aarch64 windows.  I *thought* local builds were building the crashreporter, but it's entirely possible that different environments may have been inadvertently turning off the crashreporter for me locally.

I fixed an obvious bit in bug 1506138, but trying to address the remaining problems turned up a web of dependencies that I wasn't quite prepared to address.  So I turned them off in order to get builds into place:

https://searchfox.org/mozilla-central/source/browser/config/mozconfigs/win64-aarch64/common-opt#29
https://searchfox.org/mozilla-central/source/browser/config/mozconfigs/win64-aarch64/debug#5

We need to remove those lines before shipping a dogfoodable nightly, so that we can get reports about all the things that are going wrong.

Gabriele, are you able to look into re-enabling the crashreporter and fixing up the compilation issues?  Some of this is straight-up compilation issues, some things may be related to bug 1497583, etc. etc.
Flags: needinfo?(gsvelto)
Comment 1

6 months ago
Huh, interesting. I thought the crashreporter was disabled by default, due to the absence of an enable here https://searchfox.org/mozilla-central/rev/adcc169dcf58c2e45ba65c4ed5661d666fc3ac74/old-configure.in#2592

I guess that was only for local builds, maybe automation was going through here? https://searchfox.org/mozilla-central/rev/adcc169dcf58c2e45ba65c4ed5661d666fc3ac74/build/automation.py.in#198
Reporter

Comment 2

6 months ago
(In reply to David Major [:dmajor] from comment #1)
> Huh, interesting. I thought the crashreporter was disabled by default, due
> to the absence of an enable here
> https://searchfox.org/mozilla-central/rev/
> adcc169dcf58c2e45ba65c4ed5661d666fc3ac74/old-configure.in#2592

I am 85% sure that since --disable-crashreporter is defined with MOZ_ARG_DISABLE_BOOL, we wind up defaulting to MOZ_CRASHREPORTER=F, which is effectively the same as MOZ_CRASHREPORTER=1.

> I guess that was only for local builds, maybe automation was going through
> here?
> https://searchfox.org/mozilla-central/rev/
> adcc169dcf58c2e45ba65c4ed5661d666fc3ac74/build/automation.py.in#198

My impression was this particular bit was used for tests, not actual builds.  But I could be totally wrong about that!
Comment 3

6 months ago
(In reply to Nathan Froyd [:froydnj] from comment #2)
> I am 85% sure that since --disable-crashreporter is defined with
> MOZ_ARG_DISABLE_BOOL, we wind up defaulting to MOZ_CRASHREPORTER=F, which is
> effectively the same as MOZ_CRASHREPORTER=1.

It's been a while since I've looked, but I'm pretty sure I had 'MOZ_CRASHREPORTER': '' in config.status.
Reporter

Comment 4

6 months ago
Oh, oh, I remember what it was.  We forcibly turn the crashreporter on everywhere:

https://searchfox.org/mozilla-central/source/build/mozconfig.common#15

which gets included by the aarch64 configs.  So yes, it's not enabled by default on a local build, but it *is* enabled in automation--or will be, once this bug is completed.
Assignee

Comment 5

6 months ago
(In reply to Nathan Froyd [:froydnj] from comment #0)
> Gabriele, are you able to look into re-enabling the crashreporter and fixing
> up the compilation issues?  Some of this is straight-up compilation issues,
> some things may be related to bug 1497583, etc. etc.

Yes, I'll start working on it this week.
Assignee

Comment 6

6 months ago
I made my first ARM64 build yesterday and the client-side part seems to build correctly. The host tools, however, don't, which is weird because they should be plain x86; there's probably some issue in the build system that needs fixing.
Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
Flags: needinfo?(gsvelto)
Assignee

Comment 7

6 months ago
Progress! I was missing some library dependencies for the host tools and once I added them back they're now building correctly. In half an hour I should know if I have a working firefox with crash reporting enabled *and* tools to process the dumps. Fingers crossed.
Comment 8

6 months ago
Looks like I beat you to it in bug 1514209.
Assignee

Comment 9

6 months ago
(In reply to Mike Hommey [:glandium] from comment #8)
> Looks like I beat you to it in bug 1514209.

Yup. I'll use this bug to actually make it work then.
Assignee

Updated

6 months ago
Summary: turn the crashreporter back on for aarch64 windows builds → Enable the full functionality of the crashreporter in aarch64 windows builds
Assignee

Comment 10

6 months ago
Quick update: dump_syms builds correctly once the proper host lib is added during compilation and it processes xul.dll correctly. I've compared the output to that of a regular x64 build and they seem close enough with most differences being down to code generation and platform-specific functionality. There is one notable difference between the two builds and it's that xul.dll hasn't been stripped in the AArch64 build (but the associated .pdb file is generated correctly). I don't know why it's happening but it might be related to the fact that I've compiled it with cl versus clang-cl for the x64 build.
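One rough way to make the "close enough" comparison above repeatable is to count record kinds in the two dump_syms outputs and look only at the differences. The sketch below is a minimal illustration, assuming the usual Breakpad text symbol layout (MODULE, INFO, FILE, FUNC, PUBLIC and STACK records, with every other non-empty line treated as a source-line record). The two sample strings are made up for the example and are not real output from either build.

```python
from collections import Counter

# Record keywords of the Breakpad text symbol format that this sketch
# recognises; any other non-empty line is counted as a source-line record.
RECORD_KINDS = ("MODULE", "INFO", "FILE", "FUNC", "PUBLIC", "STACK")


def count_records(sym_text):
    """Count how many records of each kind appear in a symbol file."""
    counts = Counter()
    for line in sym_text.splitlines():
        if not line.strip():
            continue
        keyword = line.split(" ", 1)[0]
        counts[keyword if keyword in RECORD_KINDS else "LINE"] += 1
    return counts


def diff_counts(first, second):
    """Return {kind: second - first} for the kinds whose counts differ."""
    kinds = sorted(set(first) | set(second))
    return {k: second[k] - first[k] for k in kinds if first[k] != second[k]}


# Hypothetical, hand-written samples standing in for the two builds.
X64_SAMPLE = """\
MODULE windows x86_64 0123456789ABCDEF0123456789ABCDEF1 xul.pdb
FILE 0 a.cpp
FUNC 1000 20 0 foo
1000 10 12 0
1010 10 13 0
PUBLIC 2000 0 bar
STACK CFI INIT 1000 20 .cfa: $rsp 8 +
"""

ARM64_SAMPLE = """\
MODULE windows arm64 0123456789ABCDEF0123456789ABCDEF1 xul.pdb
FILE 0 a.cpp
FUNC 1000 24 0 foo
1000 c 12 0
100c c 13 0
1018 c 14 0
PUBLIC 2000 0 bar
"""

if __name__ == "__main__":
    print(diff_counts(count_records(X64_SAMPLE), count_records(ARM64_SAMPLE)))
```

Counts alone will not explain a difference, but they show quickly whether a whole class of records (for example the STACK records) is missing on one side.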
Assignee

Comment 11

6 months ago
First the good news, what works:

- Force-enabling the crash reporter works, we can probably turn it on in all builds
- The exception handler is intercepting both content and main process crashes, nice!
- Submission of crashes from about:crashes (mostly) works
- The crash reporter client is launched correctly upon a browser crash
- Client-side stack walking works and produces reasonable stacks so we should have full crash telemetry

Then the bad, what does not work:

- Submitting crash reports from the crash reporter client always fails, I'll have to debug that
- The crashed tab page doesn't have the option to submit the crash report as if the crash reporter was disabled
- Socorro doesn't seem to like the AArch64 minidumps, here's an example with multiple problems: https://crash-stats.mozilla.com/report/index/8fc5538f-6134-44bc-b72e-2e2ff0181220
 - There are no stacks
 - The build architecture and build architecture info contain weird values
- Some aspects of hardware detection are not working correctly, we're detecting two graphics adapters on my laptop which seems odd
Assignee

Comment 12

6 months ago
Will, can you check what's wrong with stack generation for the crash report I mentioned in comment 11?
Flags: needinfo?(willkg)
Comment 13

6 months ago
Running our minidump-stackwalker on that crash yields this output:

"""
app@265aa086bccc:/app$ ./run_mdsw.sh crashdata/v2/raw_crash/8fc/20181220/8fc5538f-6134-44bc-b72e-2e2ff0181220 crashdata/v1/dump/8fc5538f-6134-44bc-b72e-2e2ff0181220 
2018-12-20 15:33:34: minidump.cc:5008: INFO: Minidump opened minidump crashdata/v1/dump/8fc5538f-6134-44bc-b72e-2e2ff0181220
2018-12-20 15:33:34: minidump.cc:5128: INFO: Minidump not byte-swapping minidump
2018-12-20 15:33:34: minidump.cc:5771: INFO: GetStream: type 1197932546 not present
2018-12-20 15:33:34: minidump.cc:2203: INFO: MinidumpModule could not determine version for C:\Windows\System32\InputHost.dll
2018-12-20 15:33:34: minidump.cc:2203: INFO: MinidumpModule could not determine version for C:\Windows\System32\InputHost.dll
2018-12-20 15:33:34: minidump_processor.cc:152: INFO: Found 3093 memory regions.
2018-12-20 15:33:34: minidump_processor.cc:162: INFO: Minidump crashdata/v1/dump/8fc5538f-6134-44bc-b72e-2e2ff0181220 has CPU info, OS info, Breakpad info, exception, module list, thread list, dump thread, requesting thread, and process create time
2018-12-20 15:33:34: minidump_processor.cc:201: INFO: Looking at thread crashdata/v1/dump/8fc5538f-6134-44bc-b72e-2e2ff0181220:0/76 id 0x18a8
2018-12-20 15:33:34: minidump.cc:1118: INFO: MinidumpContext unknown context type 0x400000
2018-12-20 15:33:34: minidump.cc:1553: ERROR: MinidumpThread cannot read context
2018-12-20 15:33:34: minidump.cc:1118: INFO: MinidumpContext unknown context type 0x400000
2018-12-20 15:33:34: minidump.cc:3173: INFO: MinidumpException cannot read context
2018-12-20 15:33:34: minidump.cc:1118: INFO: MinidumpContext unknown context type 0x400000
2018-12-20 15:33:34: minidump.cc:1553: ERROR: MinidumpThread cannot read context
2018-12-20 15:33:34: stackwalker.cc:205: ERROR: Can't choose a stackwalker implementation without context
2018-12-20 15:33:34: minidump_processor.cc:287: ERROR: No stackwalker for crashdata/v1/dump/8fc5538f-6134-44bc-b72e-2e2ff0181220:0/76 id 0x18a8
2018-12-20 15:33:34: minidump_processor.cc:201: INFO: Looking at thread crashdata/v1/dump/8fc5538f-6134-44bc-b72e-2e2ff0181220:1/76 id 0x1fdc
2018-12-20 15:33:34: minidump.cc:1118: INFO: MinidumpContext unknown context type 0x400000
2018-12-20 15:33:34: minidump.cc:1553: ERROR: MinidumpThread cannot read context
etc
"""

Socorro is using breakpad rev a61afe7a3e865f1da7ff7185184fe23977c2adca. Maybe we need to update it?
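The log above repeats the same few messages for each of the 76 threads, so a small tally makes the pattern easier to see. The following is a minimal sketch that assumes only the line layout visible in the output above (timestamp, source file and line, level, message). The `summarize` helper and the sample string are illustrative and not part of Socorro or Breakpad.

```python
import re
from collections import Counter

# Matches lines such as:
# 2018-12-20 15:33:34: minidump.cc:1553: ERROR: MinidumpThread cannot read context
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): "
    r"(?P<src>[\w.]+:\d+): (?P<level>[A-Z]+): (?P<msg>.*)$"
)


def summarize(log_text):
    """Return (count per level, count per distinct ERROR message)."""
    levels = Counter()
    errors = Counter()
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # shell prompt, "etc", blank lines, and so on
        levels[match["level"]] += 1
        if match["level"] == "ERROR":
            errors[(match["src"], match["msg"])] += 1
    return levels, errors


# A few lines copied from the output quoted above.
SAMPLE = """\
2018-12-20 15:33:34: minidump.cc:1118: INFO: MinidumpContext unknown context type 0x400000
2018-12-20 15:33:34: minidump.cc:1553: ERROR: MinidumpThread cannot read context
2018-12-20 15:33:34: minidump.cc:1118: INFO: MinidumpContext unknown context type 0x400000
2018-12-20 15:33:34: minidump.cc:3173: INFO: MinidumpException cannot read context
2018-12-20 15:33:34: minidump.cc:1118: INFO: MinidumpContext unknown context type 0x400000
2018-12-20 15:33:34: minidump.cc:1553: ERROR: MinidumpThread cannot read context
2018-12-20 15:33:34: stackwalker.cc:205: ERROR: Can't choose a stackwalker implementation without context
"""

if __name__ == "__main__":
    levels, errors = summarize(SAMPLE)
    for (src, msg), n in errors.most_common():
        print(n, src, msg)
```

Run over the full log, a tally like this should make it obvious that every stackwalker failure follows an "unknown context type 0x400000" message.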
Flags: needinfo?(willkg)
Comment 14

6 months ago
Can Socorro use the one from tooltool? Run `./mach artifact toolchain --tooltool-manifest=testing/config/tooltool-manifests/linux64/releng.manifest` from a m-c checkout.
Flags: needinfo?(willkg)
Assignee

Comment 15

6 months ago
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #13)
> Socorro is using breakpad rev a61afe7a3e865f1da7ff7185184fe23977c2adca.
> Maybe we need to update it?

Yeah, it's pretty old and it's missing the changes to understand AArch64 minidumps using Microsoft format. Our checkout in mozilla-central is at version 1459e5df74dd03b7d3d473e6d271413d7aa98a88 plus some of our patches. That should be enough and as Mike mentioned the one in tooltool should also be good enough.
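To illustrate why an older processor reports "unknown context type 0x400000": the CPU architecture of a thread context is encoded in the high bits of its flags word, and 0x00400000 is the value Windows uses for ARM64 contexts. The sketch below is a simplified model, not Breakpad's actual code. The mask value and the `OLDER_TABLE` lookup are assumptions made for the example; the four CPU constants are the Windows `CONTEXT_*` architecture values.

```python
# Architecture values carried in the flags word of a Windows CONTEXT record.
CONTEXT_CPU_TYPES = {
    0x00010000: "x86",
    0x00100000: "amd64",
    0x00200000: "arm",
    0x00400000: "arm64",
}

# Assumed mask that strips the low "which register groups are present" bits.
CPU_MASK = 0xFFFFFF00

# Hypothetical stand-in for a processor that predates ARM64 support.
OLDER_TABLE = {k: v for k, v in CONTEXT_CPU_TYPES.items() if v != "arm64"}


def cpu_type(context_flags, known=CONTEXT_CPU_TYPES):
    """Map a context flags word to an architecture name, if it is known."""
    cpu_bits = context_flags & CPU_MASK
    return known.get(cpu_bits, "unknown context type 0x%x" % cpu_bits)


if __name__ == "__main__":
    flags = 0x00400007  # an ARM64 context with a few register groups set
    print(cpu_type(flags))               # table that knows about ARM64
    print(cpu_type(flags, OLDER_TABLE))  # table that does not
```

With the older table the lookup falls through to the same message seen in the log, which is consistent with the processor simply needing a newer Breakpad.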
Comment 16

6 months ago
Socorro has a taskcluster script to build it that Ted wrote. We don't have anything that tells us when the breakpad client we use is out of date, so that's why we're in the situation we're in.

I wrote up bug #1515805 to update it.
Flags: needinfo?(willkg)
Assignee

Comment 17

6 months ago
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #16)
> Socorro has a taskcluster script to build it that Ted wrote. We don't have
> anything that tells us when the breakpad client we use is out of date, so
> that's why we're in the situation we're in.
> 
> I wrote up bug #1515805 to update it.

Thanks for this Will. I'm well aware of the problems with keeping our various breakpads in sync, hopefully this year we'll consolidate everything in just one place.
Comment 18

6 months ago
(In reply to Mike Hommey [:glandium] from comment #14)
> Can soccoro use the one from tooltool? Run `./mach artifact toolchain
> --tooltool-manifest=testing/config/tooltool-manifests/linux64/releng.
> manifest` from a m-c checkout.

No, Socorro has a custom minidump_stackwalk:
https://github.com/mozilla-services/socorro/blob/master/minidump-stackwalk/stackwalker.cc
Assignee

Comment 19

6 months ago
I re-tested the crashreporter client today and it's submitting reports correctly. The reason why it didn't work the last time I tried was that it was trying to send them to the wrong URL but that's because I had hacked it for testing. So that's working too. We just need support for submitting content crashes and we're good to go.
Comment 20

6 months ago
(In reply to Ted Mielczarek [:ted] [:ted.mielczarek] from comment #18)
> (In reply to Mike Hommey [:glandium] from comment #14)
> > Can soccoro use the one from tooltool? Run `./mach artifact toolchain
> > --tooltool-manifest=testing/config/tooltool-manifests/linux64/releng.
> > manifest` from a m-c checkout.
> 
> No, Socorro has a custom minidump_stackwalk:
> https://github.com/mozilla-services/socorro/blob/master/minidump-stackwalk/
> stackwalker.cc

How does it differ from the one we have in tooltool?
Assignee

Comment 21

6 months ago
I've been testing content crash handling for a few days and I still haven't figured out what's wrong. Most of the infrastructure works correctly, including the crash generation server & client components. Both start and connect correctly, they even shut down cleanly when a process is closed... But somehow actual minidump generation isn't working. It seems that the crash generation client never requests a minidump from the server so maybe the actual exception handler code isn't doing what it's supposed to do in the content processes.
Assignee

Comment 22

6 months ago
I'm turning this into a meta-bug because it depends on multiple different things and it makes sense to fix them up incrementally.
Depends on: 1484835
Keywords: meta
Summary: Enable the full functionality of the crashreporter in aarch64 windows builds → [meta] Enable the full functionality of the crashreporter in aarch64 windows builds
Assignee

Updated

6 months ago
Depends on: 1517729
Assignee

Updated

6 months ago
Depends on: 1517730
Assignee

Updated

6 months ago
Depends on: 1517739
Assignee

Updated

5 months ago
Depends on: 1518947
Assignee

Updated

5 months ago
Depends on: 1519802
Assignee

Updated

5 months ago
Depends on: 1519845
Assignee

Updated

5 months ago
No longer depends on: 1519802
Assignee

Updated

5 months ago
Depends on: 1523588
Assignee

Updated

5 months ago
Depends on: 1524142
Assignee

Updated

5 months ago
Alias: aarch64-crash-reporting
Assignee

Comment 23

5 months ago

Removing bug 1517739 because it only affects crash pings and is basically just polish. Closing as we've got everything else working already.

Status: ASSIGNED → RESOLVED
Closed: 5 months ago
No longer depends on: 1517739
Resolution: --- → FIXED

Updated

4 months ago
Depends on: 1525671
Depends on: 1525678
Depends on: 1525670
Depends on: 1525667
Depends on: 1525743
Reporter

Updated

4 months ago
Depends on: 1529355