Closed Bug 1620993 Opened 2 years ago Closed 9 months ago

Rewrite the Linux-specific minidump writer code in Rust

Categories

(Toolkit :: Crash Reporting, task)

All
Linux
task
Not set
normal

Tracking

()

RESOLVED FIXED
86 Branch
Tracking Status
firefox86 --- fixed

People

(Reporter: gsvelto, Assigned: msirringhaus)

References

(Depends on 2 open bugs, Blocks 2 open bugs)

Details

Attachments

(1 file)

Once we have a new Linux exception-handler in place we should rewrite the code that writes out minidumps. Contrary to what Breakpad does this code will have to function exclusively out of process so while we want to match Breakpad's functionality we won't have to support its in-process capabilities.

Depends on: 1620998

As suggested in bug 1620989, I'd like to look into this.
I have a couple of questions:

  1. Would this rewrite encompass everything in toolkit/crashreporter/breakpad-client/linux/minidump_writer/*, or only minidump_writer.{cc,h}?
  2. If everything: Would this also entail microdump_writer/*?
  3. Should the unittest-files stay in C++/gtest? gtest is well suited for crash-tests, but that would probably increase the FFI surface that needs to be generated/exposed to C++.
Flags: needinfo?(gsvelto)

(In reply to msirringhaus from comment #1)

As suggested in bug 1620989, I'd like to look into this.
I have a couple of questions:

  1. Would this rewrite encompass everything in toolkit/crashreporter/breakpad-client/linux/minidump_writer/*, or only minidump_writer.{cc,h}?

Yes, pretty much everything that's in there.

  1. If everything: Would this also entail microdump_writer/*?

No, we don't use microdumps.

  1. Should the unittest-files stay in C++/gtest? gtest is well suited for crash-tests, but that would probably increase the FFI surface that needs to be generated/exposed to C++.

The idea is to write a stand-alone crate for writing out minidumps, that would entail also designing the external interface and if possible adding tests within the crate itself. Our goal with this rewrite is to have self-contained components for every chunk of functionality in the crash reporting flow with sensible interfaces. From this POV the tests don't need to be in gecko. dump_syms which is one of the tools we already rewrote has all the required tests within its own crate and we run them there.

Naturally some compromises would need to be made on the interface side because we haven't rewritten the rest of the infrastructure around it. We would need to slot it in the existing code, essentially disabling breakpad's writer and replacing it with this crate.

One last thing: breakpad supports both in-process and out-of-process minidump generation. We don't care about in-process generation in the context of this bug because that's something we want to get rid of.

Flags: needinfo?(gsvelto)

Naturally if you're interested in this I'll gladly help you all the way through the rewrite.

(In reply to Gabriele Svelto [:gsvelto] from comment #3)

Naturally if you're interested in this I'll gladly help you all the way through the rewrite.

That would be great, thanks! I'm also on Matrix, if that's more convenient for you.

The idea is to write a stand-alone crate for writing out minidumps, that would entail also designing the external interface and if possible adding tests within the crate itself. Our goal with this rewrite is to have self-contained components for every chunk of functionality in the crash reporting flow with sensible interfaces.

Should this crate then live in-tree or be external and then added to third_party/rust/ ?
Currently, I see a few issues where it would be easier in-tree, but this could be moved outside as well, I think.

The issue with "not using libc" would still be valid for this rewrite, or do you plan on statically linking it in?

Rust doesn't expose syscalls directly, as far as I can tell. So for this, I'm trying to use toolkit/crashreporter/google-breakpad/src/third_party/lss/linux_syscall_support.h at the moment (which is why in-tree is less work atm).

Since heap allocations are also out, I would try going with #![no_std] for this crate.

Just as a quick update: Using linux_syscall_support.h for syscalls is a bit annoying, as its a header-only lib, and (as far as I can tell) needs a shim in the crate for each function we want to use from it. At least I couldn't get it to work any other way.

Unfortunately, a pure Rust version doesn't seem to exist for this. There is https://github.com/japaric/syscall.rs but that needs the Nightly toolchain.

(In reply to msirringhaus from comment #4)

That would be great, thanks! I'm also on Matrix, if that's more convenient for you.

Join the #breakpad channel, you'll find me and the other people who know about this subsystem there

Should this crate then live in-tree or be external and then added to third_party/rust/ ?
Currently, I see a few issues where it would be easier in-tree, but this could be moved outside as well, I think.

Ideally it should live outside but I'm OK with an in-tree build if it's easier at the beginning.

The issue with "not using libc" would still be valid for this rewrite, or do you plan on statically linking it in?

Rust doesn't expose syscalls directly, as far as I can tell. So for this, I'm trying to use toolkit/crashreporter/google-breakpad/src/third_party/lss/linux_syscall_support.h at the moment (which is why in-tree is less work atm).

Since heap allocations are also out, I would try going with #![no_std] for this crate.

I don't know if we'll have those restrictions, here's why: breakpad supports generating a minidump from within the crashed process. This means that the code needs to run within the exception handler without triggering a new exception. syscalls & allocations are banned because of that. However in our case we only care about out-of-process generation so we might not have that particular constraint. I don't remember exactly how OOP generation worked on Linux so I'll have to double-check just to be sure.

(In reply to Gabriele Svelto [:gsvelto] from comment #6)

Join the #breakpad channel, you'll find me and the other people who know about this subsystem there

Joined. I'll continue here until the handshake completed ;-)

Ideally it should live outside but I'm OK with an in-tree build if it's easier at the beginning.

That entirely depends on the usage of libc. If libc-calls are okay after all, outside of the tree is just as easy.

I don't know if we'll have those restrictions, here's why: breakpad supports generating a minidump from within the crashed process. This means that the code needs to run within the exception handler without triggering a new exception. syscalls & allocations are banned because of that. However in our case we only care about out-of-process generation so we might not have that particular constraint. I don't remember exactly how OOP generation worked on Linux so I'll have to double-check just to be sure.

That would make things a lot easier. linux_ptrace_dumper.cc (which I think is the one we need for OOP) states in the header:

Since this code may run in a compromised address space, the same
rules apply as detailed at the top of minidump_writer.h: no libc calls and
use the alternative allocator.

But I'm not really sure why that would be, though.

From Matrix chat:

Question regarding the writing process of the minidump. The C++ code defines the structs, fills them and then writes them to file using (Un)TypedMDRVA, byte by byte. I've replicated the structs in Rust using repr(C). But writing them as bytes to the file is currently undefined behavior thanks to possible padding bytes.
So, the main question would be: Does the binary format have to be exactly the same as in the C++ version? And if so, any suggestions on how to do it in Rust without UB? Using bincode has been suggested, but that introduces its own binary format and won't (probably?) be compatible with what the C++ version does.

Flags: needinfo?(gsvelto)

Additional quick info: I think the binary format has to be changed anyways, as for example MDRawHeader saves an epoch-timestamp in a 32bit variable, which is not Y2038 safe.

The minidump file is not really documented, its structures come from the minidumpapiset.h header file in the Windows SDK:

https://docs.microsoft.com/en-us/windows/win32/api/minidumpapiset/

If you look at the sources it's using #pragma pack(4) on all its structures so there shouldn't be padding, and the Breakpad header files should use the same otherwise the resulting files wouldn't match.

Ted Mielczarek who wrote most of this stuff back in the day has written a bunch of tools for dealing with minidumps in rust, you can find them here: https://github.com/luser/rust-minidump

Off the top of my mind he used the serde crate to easily read minidump structures from file. I think that might be a good option especially because we already vendor it in mozilla-central.

Flags: needinfo?(gsvelto)
Assignee: nobody → msirringhaus
Status: NEW → ASSIGNED

Try is looking good, we'll try landing this today and keep an eye on the crashes in the coming days.

Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9fabf5ced4d3
Rewrite the Linux-specific minidump writer code in Rust r=gsvelto
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3c58e80d4d57
Rewrite the Linux-specific minidump writer code in Rust r=gsvelto

This broke builds on aarch64-linux, and I suspect that's also true for armhf-linux.

The patch hasn't gone to central yet, can we land a follow up right away?

Status: ASSIGNED → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → 86 Branch

Alright, filing a follow up, the fix is almost ready.

Regressions: 1686918
Flags: needinfo?(msirringhaus)
Regressions: 1688849
Blocks: 1688882
Blocks: 1689358
You need to log in before you can comment on or make changes to this bug.