Closed Bug 1776197 Opened 2 years ago Closed 6 months ago

Remove the IPC channel used to send crash annotations

Categories

(Toolkit :: Crash Reporting, task)

Unspecified
All
task

Tracking

()

RESOLVED FIXED
116 Branch
Tracking Status
firefox116 --- fixed

People

(Reporter: gsvelto, Assigned: gsvelto)

References

(Blocks 3 open bugs)

Details

Attachments

(5 files)

The idea here is to get rid of the pipe we use to send crash annotations from child processes to the main process. To do so the main process - and in the future the crash monitor - need to be able to pull the crash annotations directly from the crashed process' memory. This has several advantages:

  • One less IPC channel, duh
  • We can't end up in a deadlock between this channel and the one used by Breakpad to ask for minidump generation (something that has plagued us for a long time)
  • We can cut away a large piece of code that does complex stuff in the child process exception handler
  • We can make crash annotations a lot cheaper by moving from a push to a pull model

I've studied what would be the best way to do this and I found an interesting approach in Crashpad that does not involve exposing a symbol (and thus having to parse the symbol table in the target process). What they do is create a section (on Linux, Windows and macOS) with a well known name and stuff the global structure that holds crash annotations there. The process that needs to extract the allocations pulls it out of the target process' sections and thus knows where to look for them without having to touch the symbol table.

I haven't thought of a better (cross-platform) way to do it that doesn't involve any work from the child process so I think I'll go for it.

BTW if someone has alternatives on how to make the parent process locate the annotations in the child process I'm all ears.

(Addendum: I forgot how I got here, and I now see that you've implied you're well into an implementation strategy already. Sorry for the distraction!)

FWIW, I'm pretty sure it's safe to assume that all segments (and, by extension, sections) will get the same offset to their virtual addresses when loaded. In particular, if that weren't the case, all PC-relative accesses from code to data/bss would need to be relocated, and modern toolchains try pretty hard to avoid text relocations. However, we still have cross-architecture child processes for media plugins on arm64 on some platforms, so that complicates things a little.

Also, to clarify a little for the historical record: Crashpad's approach iterates the program headers (a.k.a. segments) to find the one with type PT_NOTE, which contains the various .note.* sections concatenated, iterates the notes (they're in a standard format with length fields etc.) to find the one (or ones?) with Crashpad's identifier, and that has a relative pointer to a struct that it knows how to read. IMHO it's not awful, but it's a bit complicated and we may not need that level of generality.

Thanks for the tips, one of the problems I'm facing is that I really want this system to be ready for when we'll take minidumps from an external process instead of from the main process. Because of this I'm trying to design this in a way that minimizes what the reader needs to know about the layout of the process holding the annotations. I'm stuffing the object that holds the annotations within a process in a dedicated section with an easy-to-find name. This way the reader process only needs to scour the sections looking for the right one and read from there. This is a compromise: other stuff could be put in that section, messing up my assumptions, and the annotation table could be included more than once but in practice neither are likely to happen.

BTW I hope to be able to upload my set of very-ugly-but-working-on-linux patches later tonight.

Scratch what I said above. For various practical reasons putting the crash annotations in a separate section is not a good idea because of the way we could fail to find it. I've got to do the notes thing like crashpad (sigh) hoping that I won't encounter the linker bugs they've run into back in the day.

This crate contains the required bits to store annotations in a process and
let the mozannotation_server crate find them and pull them out. Functionally
there's not much happening here as all the actual functionality is in
mozannotation_server.

This crate contains the logic used to find the annotations within another
process and pull them out.

Depends on D173697

Assignee: nobody → gsvelto
Status: NEW → ASSIGNED

Alright, here comes the WIP for this. It's still rough around the edges but it's functionally complete (on Linux and Android) and passes all the tests we have. I still need to factorize the platform-specific functionality in the new crates and add support for Windows and macOS but I'd say that I'm 90% there now.

Update: I'm stuck dealing with rust's metadata hash. I need to figure a way to retrieve a symbol name (which boils down to finding the metadata hash when it gets compiled) and I haven't found a good way yet.

Updating the update: I'm unstuck, yay! Time for the Windows and macOS implementations.

Blocks: 1831092
Blocks: 1831094
Attachment #9325192 - Attachment description: WIP: Bug 1776197 - mozannotation_client implementation → Bug 1776197 - mozannotation_client implementation r=afranchuk
Attachment #9325193 - Attachment description: WIP: Bug 1776197 - mozannotation_server crate implementation → Bug 1776197 - mozannotation_server crate implementation r=afranchuk
Attachment #9325194 - Attachment description: WIP: Bug 1776197 - Adjust crash generation to pull annotations out of child processes using the mozannotation crates → Bug 1776197 - Adjust crash generation to pull annotations out of child processes using the mozannotation crates r=afranchuk
Attachment #9325195 - Attachment description: WIP: Bug 1776197 - Remove old IPC channel → Bug 1776197 - Remove the old IPC channel used for retrieving annotations and all related machinery r=afranchuk
Depends on: 1834958
Duplicate of this bug: 1813414
No longer duplicate of this bug: 1813414
See Also: → 1835307

All the reviews are in. Rebased my patch set and a final try run is here: https://treeherder.mozilla.org/jobs?repo=try&revision=c06d3790a5a07402720ed97535c560d1a1d39ec6

If all goes well I'll land this today.

Try is green(ish), patches are rebased, manual tests look good. Alright chums, let's do this!

Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a45d357f36c6
mozannotation_client implementation r=afranchuk
https://hg.mozilla.org/integration/autoland/rev/5ba9e96d7820
mozannotation_server crate implementation r=afranchuk
https://hg.mozilla.org/integration/autoland/rev/eec04732422a
Adjust crash generation to pull annotations out of child processes using the mozannotation crates r=afranchuk
https://hg.mozilla.org/integration/autoland/rev/b665cbdccb30
Remove the old IPC channel used for retrieving annotations and all related machinery r=geckoview-reviewers,afranchuk,owlish
https://hg.mozilla.org/integration/autoland/rev/e88255995477
Tweak GeckoView tests to check for child process annotations r=owlish
Regressions: 1837353
Regressions: 1837688
Regressions: 1838801
Regressions: 1854179
Regressions: 1858390
You need to log in before you can comment on or make changes to this bug.