Bug 1513284 (aarch64-crash-reporting)
[meta] Enable the full functionality of the crashreporter in aarch64 windows builds
Status: RESOLVED FIXED (Opened 6 years ago, Closed 6 years ago)
Categories: Toolkit :: Crash Reporting, enhancement
People: Reporter: froydnj, Assigned: gsvelto
References: Blocks 1 open bug
Keywords: meta
I thought I filed this bug a long time ago, but apparently not! Ooops.
For reasons I didn't entirely understand, automation builds were falling over trying to build the crashreporter on aarch64 windows. I *thought* local builds were building the crashreporter, but it's entirely possible that different environments may have been inadvertently turning off the crashreporter for me locally.
I fixed an obvious bit in bug 1506138, but trying to address the remaining problems turned up a web of dependencies that I wasn't quite prepared to address. So I turned them off in order to get builds into place:
https://searchfox.org/mozilla-central/source/browser/config/mozconfigs/win64-aarch64/common-opt#29
https://searchfox.org/mozilla-central/source/browser/config/mozconfigs/win64-aarch64/debug#5
We need to remove those lines before shipping a dogfoodable nightly, so that we can get reports about all the things that are going wrong.
Gabriele, are you able to look into re-enabling the crashreporter and fixing up the compilation issues? Some of this is straight-up compilation issues, some things may be related to bug 1497583, etc. etc.
Flags: needinfo?(gsvelto)
Huh, interesting. I thought the crashreporter was disabled by default, due to the absence of an enable here https://searchfox.org/mozilla-central/rev/adcc169dcf58c2e45ba65c4ed5661d666fc3ac74/old-configure.in#2592
I guess that was only for local builds, maybe automation was going through here? https://searchfox.org/mozilla-central/rev/adcc169dcf58c2e45ba65c4ed5661d666fc3ac74/build/automation.py.in#198
Comment 2 • 6 years ago (Reporter)
(In reply to David Major [:dmajor] from comment #1)
> Huh, interesting. I thought the crashreporter was disabled by default, due
> to the absence of an enable here
> https://searchfox.org/mozilla-central/rev/adcc169dcf58c2e45ba65c4ed5661d666fc3ac74/old-configure.in#2592
I am 85% sure that since --disable-crashreporter is defined with MOZ_ARG_DISABLE_BOOL, we wind up defaulting to MOZ_CRASHREPORTER=F, which is effectively the same as MOZ_CRASHREPORTER=1.
> I guess that was only for local builds, maybe automation was going through
> here?
> https://searchfox.org/mozilla-central/rev/adcc169dcf58c2e45ba65c4ed5661d666fc3ac74/build/automation.py.in#198
My impression was this particular bit was used for tests, not actual builds. But I could be totally wrong about that!
(In reply to Nathan Froyd [:froydnj] from comment #2)
> I am 85% sure that since --disable-crashreporter is defined with
> MOZ_ARG_DISABLE_BOOL, we wind up defaulting to MOZ_CRASHREPORTER=F, which is
> effectively the same as MOZ_CRASHREPORTER=1.
It's been a while since I've looked, but I'm pretty sure I had 'MOZ_CRASHREPORTER': '' in config.status.
Comment 4 • 6 years ago (Reporter)
Oh, oh, I remember what it was. We forcibly turn the crashreporter on everywhere:
https://searchfox.org/mozilla-central/source/build/mozconfig.common#15
which gets included by the aarch64 configs. So yes, it's not enabled by default on a local build, but it *is* enabled in automation--or will be, once this bug is completed.
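The precedence described in this thread can be modeled roughly as follows. This is only an illustrative sketch: `resolve_crashreporter` is a hypothetical helper, the "last option wins" rule is an assumption about how repeated configure options are treated, and the option lists are simplified stand-ins for what build/mozconfig.common and the aarch64 mozconfigs contribute.

```python
def resolve_crashreporter(options, default=False):
    """Return the effective crashreporter setting for a list of configure options.

    Assumption: later options override earlier ones (last one wins).
    """
    enabled = default
    for opt in options:
        if opt == "--enable-crashreporter":
            enabled = True
        elif opt == "--disable-crashreporter":
            enabled = False
    return enabled


# Local build with no relevant options: falls back to the default (off).
local_build = []

# Automation: the shared mozconfig turns the crashreporter on.
automation = ["--enable-crashreporter"]

# aarch64 automation at the time of this bug: the shared config enables it,
# then the platform-specific mozconfig disables it again.
aarch64_automation = automation + ["--disable-crashreporter"]

print(resolve_crashreporter(local_build))         # False
print(resolve_crashreporter(automation))          # True
print(resolve_crashreporter(aarch64_automation))  # False
```

Under these assumptions, removing the platform-specific disable line is enough to bring the aarch64 automation builds back in line with every other platform.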
Comment 5 • 6 years ago (Assignee)
(In reply to Nathan Froyd [:froydnj] from comment #0)
> Gabriele, are you able to look into re-enabling the crashreporter and fixing
> up the compilation issues? Some of this is straight-up compilation issues,
> some things may be related to bug 1497583, etc. etc.
Yes, I'll start working on it this week.
Comment 6 • 6 years ago (Assignee)
I made my first ARM64 build yesterday and the client-side part seems to build correctly. The host tools, however, don't, which is weird because they should be plain x86; there's probably some issue in the build system that needs fixing.
Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
Flags: needinfo?(gsvelto)
Comment 7 • 6 years ago (Assignee)
Progress! I was missing some library dependencies for the host tools and once I added them back they're now building correctly. In half an hour I should know if I have a working Firefox with crash reporting enabled *and* tools to process the dumps. Fingers crossed.
Comment 8 • 6 years ago
Looks like I beat you to it in bug 1514209.
Comment 9 • 6 years ago (Assignee)
(In reply to Mike Hommey [:glandium] from comment #8)
> Looks like I beat you to it in bug 1514209.
Yup. I'll use this bug to actually make it work then.
Updated • 6 years ago (Assignee)
Summary: turn the crashreporter back on for aarch64 windows builds → Enable the full functionality of the crashreporter in aarch64 windows builds
Comment 10 • 6 years ago (Assignee)
Quick update: dump_syms builds correctly once the proper host lib is added during compilation and it processes xul.dll correctly. I've compared the output to that of a regular x64 build and they seem close enough with most differences being down to code generation and platform-specific functionality. There is one notable difference between the two builds and it's that xul.dll hasn't been stripped in the AArch64 build (but the associated .pdb file is generated correctly). I don't know why it's happening but it might be related to the fact that I've compiled it with cl versus clang-cl for the x64 build.
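A comparison like the one described above could be approximated by counting record types in the two symbol files. The following is a minimal sketch, assuming the Breakpad text symbol format where records start with a keyword such as `MODULE`, `FILE`, `FUNC`, `PUBLIC` or `STACK` and everything else is a source line record. The helper names and the two tiny sample files are hypothetical and are not real dump_syms output for xul.dll.

```python
from collections import Counter

# Keywords that introduce a record in the Breakpad text symbol format.
RECORD_TYPES = ("MODULE", "INFO", "FILE", "FUNC", "PUBLIC", "STACK")


def count_records(sym_text):
    """Count records per type; lines without a known keyword count as LINE."""
    counts = Counter()
    for line in sym_text.splitlines():
        if not line.strip():
            continue
        first = line.split(" ", 1)[0]
        counts[first if first in RECORD_TYPES else "LINE"] += 1
    return counts


def diff_counts(a, b):
    """Return {record_type: (count_in_a, count_in_b)} for types that differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a[k], b[k]) for k in keys if a[k] != b[k]}


# Hypothetical, heavily shortened sample inputs.
X64_SYM = """MODULE windows x86_64 0000 xul.pdb
FILE 0 a.cpp
FUNC 1000 10 0 foo
1000 10 12 0
PUBLIC 2000 0 bar
"""

ARM64_SYM = """MODULE windows arm64 0000 xul.pdb
FILE 0 a.cpp
FUNC 1000 14 0 foo
1000 8 12 0
1008 c 13 0
PUBLIC 2000 0 bar
"""

print(diff_counts(count_records(X64_SYM), count_records(ARM64_SYM)))
```

Differences that show up only in line records would be consistent with code generation differences, while a missing `FUNC` or `STACK` section would point at something more structural.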
Comment 11 • 6 years ago (Assignee)
First the good news, what works:
- Force-enabling the crash reporter works, we can probably turn it on in all builds
- The exception handler is intercepting both content and main process crashes, nice!
- Submission of crashes from about:crashes (mostly) works
- The crash reporter client is launched correctly upon a browser crash
- Client-side stack walking works and produces reasonable stacks so we should have full crash telemetry
Then the bad, what does not work:
- Submitting crash reports from the crash reporter client always fails, I'll have to debug that
- The crashed tab page doesn't have the option to submit the crash report, as if the crash reporter were disabled
- Socorro doesn't seem to like the AArch64 minidumps, here's an example with multiple problems: https://crash-stats.mozilla.com/report/index/8fc5538f-6134-44bc-b72e-2e2ff0181220
  - There are no stacks
  - The build architecture and build architecture info contain weird values
- Some aspects of hardware detection are not working correctly, we're detecting two graphics adapters on my laptop which seems odd
Comment 12 • 6 years ago (Assignee)
Will, can you check what's wrong with stack generation for the crash report I mentioned in comment 11?
Flags: needinfo?(willkg)
Comment 13 • 6 years ago
Running our minidump-stackwalker on that crash yields this output:
"""
app@265aa086bccc:/app$ ./run_mdsw.sh crashdata/v2/raw_crash/8fc/20181220/8fc5538f-6134-44bc-b72e-2e2ff0181220 crashdata/v1/dump/8fc5538f-6134-44bc-b72e-2e2ff0181220
2018-12-20 15:33:34: minidump.cc:5008: INFO: Minidump opened minidump crashdata/v1/dump/8fc5538f-6134-44bc-b72e-2e2ff0181220
2018-12-20 15:33:34: minidump.cc:5128: INFO: Minidump not byte-swapping minidump
2018-12-20 15:33:34: minidump.cc:5771: INFO: GetStream: type 1197932546 not present
2018-12-20 15:33:34: minidump.cc:2203: INFO: MinidumpModule could not determine version for C:\Windows\System32\InputHost.dll
2018-12-20 15:33:34: minidump.cc:2203: INFO: MinidumpModule could not determine version for C:\Windows\System32\InputHost.dll
2018-12-20 15:33:34: minidump_processor.cc:152: INFO: Found 3093 memory regions.
2018-12-20 15:33:34: minidump_processor.cc:162: INFO: Minidump crashdata/v1/dump/8fc5538f-6134-44bc-b72e-2e2ff0181220 has CPU info, OS info, Breakpad info, exception, module list, thread list, dump thread, requesting thread, and process create time
2018-12-20 15:33:34: minidump_processor.cc:201: INFO: Looking at thread crashdata/v1/dump/8fc5538f-6134-44bc-b72e-2e2ff0181220:0/76 id 0x18a8
2018-12-20 15:33:34: minidump.cc:1118: INFO: MinidumpContext unknown context type 0x400000
2018-12-20 15:33:34: minidump.cc:1553: ERROR: MinidumpThread cannot read context
2018-12-20 15:33:34: minidump.cc:1118: INFO: MinidumpContext unknown context type 0x400000
2018-12-20 15:33:34: minidump.cc:3173: INFO: MinidumpException cannot read context
2018-12-20 15:33:34: minidump.cc:1118: INFO: MinidumpContext unknown context type 0x400000
2018-12-20 15:33:34: minidump.cc:1553: ERROR: MinidumpThread cannot read context
2018-12-20 15:33:34: stackwalker.cc:205: ERROR: Can't choose a stackwalker implementation without context
2018-12-20 15:33:34: minidump_processor.cc:287: ERROR: No stackwalker for crashdata/v1/dump/8fc5538f-6134-44bc-b72e-2e2ff0181220:0/76 id 0x18a8
2018-12-20 15:33:34: minidump_processor.cc:201: INFO: Looking at thread crashdata/v1/dump/8fc5538f-6134-44bc-b72e-2e2ff0181220:1/76 id 0x1fdc
2018-12-20 15:33:34: minidump.cc:1118: INFO: MinidumpContext unknown context type 0x400000
2018-12-20 15:33:34: minidump.cc:1553: ERROR: MinidumpThread cannot read context
etc
"""
Socorro is using breakpad rev a61afe7a3e865f1da7ff7185184fe23977c2adca. Maybe we need to update it?
Flags: needinfo?(willkg)
Comment 14 • 6 years ago
Can Socorro use the one from tooltool? Run `./mach artifact toolchain --tooltool-manifest=testing/config/tooltool-manifests/linux64/releng.manifest` from a m-c checkout.
Updated • 6 years ago
Flags: needinfo?(willkg)
Comment 15 • 6 years ago (Assignee)
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #13)
> Socorro is using breakpad rev a61afe7a3e865f1da7ff7185184fe23977c2adca.
> Maybe we need to update it?
Yeah, it's pretty old and it's missing the changes needed to understand AArch64 minidumps that use the Microsoft format. Our checkout in mozilla-central is at version 1459e5df74dd03b7d3d473e6d271413d7aa98a88 plus some of our patches. That should be enough and, as Mike mentioned, the one in tooltool should also be good enough.
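To illustrate why an older reader reports "unknown context type 0x400000", here is a small sketch of how the CPU type can be picked out of a thread context's flags word. The x86, AMD64 and ARM64 values are the ones Microsoft defines for `CONTEXT_i386`, `CONTEXT_AMD64` and `CONTEXT_ARM64` in winnt.h. The mask, the two lookup tables and `classify_context` are assumptions made for this example and are not Breakpad's actual code.

```python
# CPU-type bits in the context flags word (values from Microsoft's winnt.h).
CONTEXT_X86 = 0x00010000
CONTEXT_AMD64 = 0x00100000
CONTEXT_ARM64 = 0x00400000

# Assumption for this sketch: the low byte selects register groups and
# everything above it identifies the CPU.
CPU_MASK = 0xFFFFFF00

# Hypothetical readers: one that predates Microsoft-format ARM64 contexts
# and one that knows about them.
OLD_READER = {CONTEXT_X86: "x86", CONTEXT_AMD64: "amd64"}
NEW_READER = {**OLD_READER, CONTEXT_ARM64: "arm64"}


def classify_context(flags, known):
    """Map a context flags word to a CPU name using the given table."""
    cpu = flags & CPU_MASK
    return known.get(cpu, "unknown context type 0x%x" % cpu)


print(classify_context(0x400000, OLD_READER))  # unknown context type 0x400000
print(classify_context(0x400000, NEW_READER))  # arm64
```

Without a recognized CPU type the reader cannot parse the register state, which matches the "cannot read context" and "Can't choose a stackwalker implementation without context" errors in the log from comment 13.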
Comment 16 • 6 years ago
Socorro has a taskcluster script to build it that Ted wrote. We don't have anything that tells us when the breakpad client we use is out of date, so that's why we're in the situation we're in.
I wrote up bug #1515805 to update it.
Flags: needinfo?(willkg)
Comment 17 • 6 years ago (Assignee)
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #16)
> Socorro has a taskcluster script to build it that Ted wrote. We don't have
> anything that tells us when the breakpad client we use is out of date, so
> that's why we're in the situation we're in.
>
> I wrote up bug #1515805 to update it.
Thanks for this, Will. I'm well aware of the problems with keeping our various breakpads in sync; hopefully this year we'll consolidate everything in just one place.
Comment 18 • 6 years ago
(In reply to Mike Hommey [:glandium] from comment #14)
> Can Socorro use the one from tooltool? Run `./mach artifact toolchain
> --tooltool-manifest=testing/config/tooltool-manifests/linux64/releng.manifest`
> from a m-c checkout.
No, Socorro has a custom minidump_stackwalk:
https://github.com/mozilla-services/socorro/blob/master/minidump-stackwalk/stackwalker.cc
Comment 19 • 6 years ago (Assignee)
I re-tested the crashreporter client today and it's submitting reports correctly. The reason why it didn't work the last time I tried was that it was trying to send them to the wrong URL but that's because I had hacked it for testing. So that's working too. We just need support for submitting content crashes and we're good to go.
Comment 20 • 6 years ago
(In reply to Ted Mielczarek [:ted] [:ted.mielczarek] from comment #18)
> (In reply to Mike Hommey [:glandium] from comment #14)
> > Can Socorro use the one from tooltool? Run `./mach artifact toolchain
> > --tooltool-manifest=testing/config/tooltool-manifests/linux64/releng.manifest`
> > from a m-c checkout.
>
> No, Socorro has a custom minidump_stackwalk:
> https://github.com/mozilla-services/socorro/blob/master/minidump-stackwalk/stackwalker.cc
How does it differ from the one we have in tooltool?
Comment 21 • 6 years ago (Assignee)
I've been testing content crash handling for a few days and I still haven't figured out what's wrong. Most of the infrastructure works correctly, including the crash generation server & client components. Both start and connect correctly, they even shut down cleanly when a process is closed... But somehow actual minidump generation isn't working. It seems that the crash generation client never requests a minidump from the server so maybe the actual exception handler code isn't doing what it's supposed to do in the content processes.
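The symptom can be pictured with a toy model of out-of-process crash generation. This is purely an illustration of the behavior described in this comment, not the real implementation: the class names mirror the components mentioned here, but every method and the `handler_installed` flag are hypothetical.

```python
class CrashGenerationServer:
    """Toy stand-in for the server that writes minidumps on behalf of clients."""

    def __init__(self):
        self.clients = set()
        self.dumps = []

    def register(self, pid):
        self.clients.add(pid)

    def unregister(self, pid):
        self.clients.discard(pid)

    def request_dump(self, pid):
        if pid not in self.clients:
            raise ValueError("unregistered client %d" % pid)
        self.dumps.append(pid)


class CrashGenerationClient:
    """Toy stand-in for the client that lives in a content process."""

    def __init__(self, pid, server, handler_installed=True):
        self.pid = pid
        self.server = server
        self.handler_installed = handler_installed
        server.register(pid)  # connecting works in both scenarios

    def crash(self):
        # The dump request is only sent if the exception handler actually runs.
        if self.handler_installed:
            self.server.request_dump(self.pid)
        self.server.unregister(self.pid)  # shutdown is clean either way


server = CrashGenerationServer()
CrashGenerationClient(1, server).crash()
CrashGenerationClient(2, server, handler_installed=False).crash()
print(server.dumps)    # [1]
print(server.clients)  # set()
```

In this model the second client connects and disconnects cleanly yet never produces a dump, which is the pattern observed in the content processes.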
Comment 22 • 6 years ago (Assignee)
I'm turning this into a meta-bug because it depends on multiple different things and it makes sense to fix them up incrementally.
Updated • 6 years ago (Assignee)
Alias: aarch64-crash-reporting
Comment 23 • 6 years ago (Assignee)
Removing bug 1517739 because it only affects crash pings and is basically just polish. Closing as we've got everything else working already.