Rewrite the Linux-specific minidump writer code in Rust
Categories
(Toolkit :: Crash Reporting, task)
Tracking
()
Tracking | Status | |
---|---|---|
firefox86 | --- | fixed |
People
(Reporter: gsvelto, Assigned: msirringhaus)
References
(Depends on 2 open bugs, Blocks 1 open bug)
Details
Attachments
(1 file)
Once we have a new Linux exception-handler in place we should rewrite the code that writes out minidumps. Contrary to what Breakpad does this code will have to function exclusively out of process so while we want to match Breakpad's functionality we won't have to support its in-process capabilities.
Reporter | ||
Updated•5 years ago
|
Assignee | ||
Comment 1•4 years ago
|
||
As suggested in bug 1620989, I'd like to look into this.
I have a couple of questions:
- Would this rewrite encompass everything in
toolkit/crashreporter/breakpad-client/linux/minidump_writer/*
, or onlyminidump_writer.{cc,h}
? - If everything: Would this also entail
microdump_writer/*
? - Should the unittest-files stay in C++/gtest? gtest is well suited for crash-tests, but that would probably increase the FFI surface that needs to be generated/exposed to C++.
Reporter | ||
Comment 2•4 years ago
|
||
(In reply to msirringhaus from comment #1)
As suggested in bug 1620989, I'd like to look into this.
I have a couple of questions:
- Would this rewrite encompass everything in
toolkit/crashreporter/breakpad-client/linux/minidump_writer/*
, or onlyminidump_writer.{cc,h}
?
Yes, pretty much everything that's in there.
- If everything: Would this also entail
microdump_writer/*
?
No, we don't use microdumps.
- Should the unittest-files stay in C++/gtest? gtest is well suited for crash-tests, but that would probably increase the FFI surface that needs to be generated/exposed to C++.
The idea is to write a stand-alone crate for writing out minidumps, that would entail also designing the external interface and if possible adding tests within the crate itself. Our goal with this rewrite is to have self-contained components for every chunk of functionality in the crash reporting flow with sensible interfaces. From this POV the tests don't need to be in gecko. dump_syms which is one of the tools we already rewrote has all the required tests within its own crate and we run them there.
Naturally some compromises would need to be made on the interface side because we haven't rewritten the rest of the infrastructure around it. We would need to slot it in the existing code, essentially disabling breakpad's writer and replacing it with this crate.
One last thing: breakpad supports both in-process and out-of-process minidump generation. We don't care about in-process generation in the context of this bug because that's something we want to get rid of.
Reporter | ||
Comment 3•4 years ago
|
||
Naturally if you're interested in this I'll gladly help you all the way through the rewrite.
Assignee | ||
Comment 4•4 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #3)
Naturally if you're interested in this I'll gladly help you all the way through the rewrite.
That would be great, thanks! I'm also on Matrix, if that's more convenient for you.
The idea is to write a stand-alone crate for writing out minidumps, that would entail also designing the external interface and if possible adding tests within the crate itself. Our goal with this rewrite is to have self-contained components for every chunk of functionality in the crash reporting flow with sensible interfaces.
Should this crate then live in-tree or be external and then added to third_party/rust/
?
Currently, I see a few issues where it would be easier in-tree, but this could be moved outside as well, I think.
The issue with "not using libc" would still be valid for this rewrite, or do you plan on statically linking it in?
Rust doesn't expose syscalls directly, as far as I can tell. So for this, I'm trying to use toolkit/crashreporter/google-breakpad/src/third_party/lss/linux_syscall_support.h
at the moment (which is why in-tree is less work atm).
Since heap allocations are also out, I would try going with #![no_std]
for this crate.
Assignee | ||
Comment 5•4 years ago
|
||
Just as a quick update: Using linux_syscall_support.h
for syscalls is a bit annoying, as its a header-only lib, and (as far as I can tell) needs a shim in the crate for each function we want to use from it. At least I couldn't get it to work any other way.
Unfortunately, a pure Rust version doesn't seem to exist for this. There is https://github.com/japaric/syscall.rs but that needs the Nightly toolchain.
Reporter | ||
Comment 6•4 years ago
|
||
(In reply to msirringhaus from comment #4)
That would be great, thanks! I'm also on Matrix, if that's more convenient for you.
Join the #breakpad channel, you'll find me and the other people who know about this subsystem there
Should this crate then live in-tree or be external and then added to
third_party/rust/
?
Currently, I see a few issues where it would be easier in-tree, but this could be moved outside as well, I think.
Ideally it should live outside but I'm OK with an in-tree build if it's easier at the beginning.
The issue with "not using libc" would still be valid for this rewrite, or do you plan on statically linking it in?
Rust doesn't expose syscalls directly, as far as I can tell. So for this, I'm trying to use
toolkit/crashreporter/google-breakpad/src/third_party/lss/linux_syscall_support.h
at the moment (which is why in-tree is less work atm).Since heap allocations are also out, I would try going with
#![no_std]
for this crate.
I don't know if we'll have those restrictions, here's why: breakpad supports generating a minidump from within the crashed process. This means that the code needs to run within the exception handler without triggering a new exception. syscalls & allocations are banned because of that. However in our case we only care about out-of-process generation so we might not have that particular constraint. I don't remember exactly how OOP generation worked on Linux so I'll have to double-check just to be sure.
Assignee | ||
Comment 7•4 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #6)
Join the #breakpad channel, you'll find me and the other people who know about this subsystem there
Joined. I'll continue here until the handshake completed ;-)
Ideally it should live outside but I'm OK with an in-tree build if it's easier at the beginning.
That entirely depends on the usage of libc. If libc-calls are okay after all, outside of the tree is just as easy.
I don't know if we'll have those restrictions, here's why: breakpad supports generating a minidump from within the crashed process. This means that the code needs to run within the exception handler without triggering a new exception. syscalls & allocations are banned because of that. However in our case we only care about out-of-process generation so we might not have that particular constraint. I don't remember exactly how OOP generation worked on Linux so I'll have to double-check just to be sure.
That would make things a lot easier. linux_ptrace_dumper.cc
(which I think is the one we need for OOP) states in the header:
Since this code may run in a compromised address space, the same
rules apply as detailed at the top of minidump_writer.h: no libc calls and
use the alternative allocator.
But I'm not really sure why that would be, though.
Assignee | ||
Comment 8•4 years ago
|
||
From Matrix chat:
Question regarding the writing process of the minidump. The C++ code defines the structs, fills them and then writes them to file using (Un)TypedMDRVA
, byte by byte. I've replicated the structs in Rust using repr(C)
. But writing them as bytes to the file is currently undefined behavior thanks to possible padding bytes.
So, the main question would be: Does the binary format have to be exactly the same as in the C++ version? And if so, any suggestions on how to do it in Rust without UB? Using bincode has been suggested, but that introduces its own binary format and won't (probably?) be compatible with what the C++ version does.
Assignee | ||
Comment 9•4 years ago
|
||
Additional quick info: I think the binary format has to be changed anyways, as for example MDRawHeader
saves an epoch-timestamp in a 32bit variable, which is not Y2038 safe.
Reporter | ||
Comment 10•4 years ago
|
||
The minidump file is not really documented, its structures come from the minidumpapiset.h header file in the Windows SDK:
https://docs.microsoft.com/en-us/windows/win32/api/minidumpapiset/
If you look at the sources it's using #pragma pack(4)
on all its structures so there shouldn't be padding, and the Breakpad header files should use the same otherwise the resulting files wouldn't match.
Ted Mielczarek who wrote most of this stuff back in the day has written a bunch of tools for dealing with minidumps in rust, you can find them here: https://github.com/luser/rust-minidump
Off the top of my mind he used the serde
crate to easily read minidump structures from file. I think that might be a good option especially because we already vendor it in mozilla-central.
Assignee | ||
Comment 11•4 years ago
|
||
Updated•4 years ago
|
Reporter | ||
Comment 12•4 years ago
|
||
Try is looking good, we'll try landing this today and keep an eye on the crashes in the coming days.
Comment 13•4 years ago
|
||
Comment 14•4 years ago
|
||
Backed out for build bustages.
Push with failures: https://treeherder.mozilla.org/jobs?repo=autoland&revision=9fabf5ced4d37b2ceb226f0ea5d3cb830d1601f2&selectedTaskRun=eMzY5SjfQ0-nOnVyZVsQvQ.0
Log: https://treeherder.mozilla.org/logviewer?job_id=326568670&repo=autoland&lineNumber=45913
Backout: https://hg.mozilla.org/integration/autoland/rev/b37710eac8adcf65842f3c19e1288082f1510c51
Comment 15•4 years ago
|
||
Comment 16•4 years ago
|
||
This broke builds on aarch64-linux, and I suspect that's also true for armhf-linux.
Reporter | ||
Comment 17•4 years ago
|
||
The patch hasn't gone to central yet, can we land a follow up right away?
Comment 18•4 years ago
|
||
bugherder |
Reporter | ||
Comment 19•4 years ago
|
||
Alright, filing a follow up, the fix is almost ready.
Assignee | ||
Updated•4 years ago
|
Description
•