Open Bug 1656003 Opened 5 years ago Updated 2 years ago

[meta] Collect crash reports for non-Firefox vehicles: the updater, maintenance service, Windows Default Browser Agent (WDBA)

Categories: Toolkit :: Application Update, enhancement, P3

People: Reporter: nalexander, Unassigned

References: Depends on 1 open bug

Details: Keywords: meta, Whiteboard: [fidedi-ope]

At this time we collect no crash reports (or crash-related metrics) for the vehicles that are not Firefox: the updater, maintenance service, Windows Default Browser Agent, and the Background Update Agent (future).

There are a bunch of hurdles to doing this: publishing symbols, managing compiler optimizations and settings, dumping stacks/integrating breakpad... the list goes on. So while I don't have any expectation that we'll move on this any time soon, I would like a place to collect ideas.

froydnj: so... how hard would it be to integrate breakpad into a non-Firefox vehicle?

In Android-land it was hard to have a different "product" submitting crash reports. Would you counsel pursuing our existing mechanisms as a new "product", or would you pursue something like Sentry or an alternate system for this use case?

Flags: needinfo?(nfroyd)

(In reply to Nick Alexander :nalexander [he/him] from comment #1)

> froydnj: so... how hard would it be to integrate breakpad into a non-Firefox vehicle?
>
> In Android-land it was hard to have a different "product" submitting crash reports. Would you counsel pursuing our existing mechanisms as a new "product", or would you pursue something like Sentry or an alternate system for this use case?

302 gsvelto. My guess is that the answer is going to be "use an alternate system", but it seems disappointing that breakpad/socorro is not the right answer here. :(

Flags: needinfo?(nfroyd) → needinfo?(gsvelto)

Tickets that might be helped by crash reporting in non-Firefox vehicles include:

  1. failure in the maintenance service that we hypothesize is a crash

  2. potential crashes in the WDBA

bholley suggested a possible technical approach that wouldn't require standing up new Socorro things or publishing new crash symbols: link the WDBA (and BUA, etc) into libxul (or firefox.exe) and then modify the startup path to install crash handling (as we do now) before branching (based on a command line argument) to the specific functionality requested.

I can see two potential issues:

  1. our crash handler appears tied to XPCOM

  2. for Rust vehicles, like WDBA, we might have issues with two Rust runtimes that want different memory allocators. I don't know if the Firefox Rust runtime "just works" without some configuration that sets up mozjemalloc.

froydnj: sorry to lean on you again, but do you see other problems with bholley's suggested approach?

Flags: needinfo?(nfroyd)

A few comments:

  • Symbols are not a problem: we already push symbols for all binaries we build together with Firefox.
  • Can't comment on Socorro, but Will (NI?) can.
  • As for breakpad, integrating it into a separate executable is trivial as long as the only thing you care about is writing a minidump. We already build an "injectable" exception handler (libbreakpadinjector) which can be LD_PRELOAD-ed on macOS & Linux and DLL-injected on Windows. Once injected into a process, it will write out a minidump and exit in case of a crash. Unfortunately, this is not enough to generate a Firefox-style crash report; you need two more bits: writing out an .extra file with at least minimal information describing the crashed product (name, version, etc.), and invoking the crash reporter client so that the user can submit the crash. All in all, it should be a moderate amount of work.
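The first of the two missing bits, the .extra file, can be sketched roughly as follows. This is a minimal sketch, assuming the .extra file is a flat JSON object of string annotations written next to the minidump with the same base name (`<id>.dmp` and `<id>.extra`). The helper names (`JsonEscape`, `SerializeExtra`, `ExtraPathFor`, `WriteExtraFile`) are hypothetical, and any annotation keys a caller passes in (e.g. a product name and version) should be checked against the crash reporter's own annotation definitions rather than taken from here.

```cpp
#include <fstream>
#include <map>
#include <string>

// Hypothetical helper: escape a string for use inside a JSON string literal.
std::string JsonEscape(const std::string& s) {
  static const char kHex[] = "0123456789abcdef";
  std::string out;
  for (unsigned char c : s) {
    switch (c) {
      case '"': out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n"; break;
      case '\r': out += "\\r"; break;
      case '\t': out += "\\t"; break;
      default:
        if (c < 0x20) {
          // Remaining control characters use the \u00XX form.
          out += "\\u00";
          out += kHex[c >> 4];
          out += kHex[c & 0xF];
        } else {
          out += static_cast<char>(c);
        }
    }
  }
  return out;
}

// Hypothetical: serialize annotations as a flat JSON object of string values.
// std::map iterates in key order, so the output is deterministic.
std::string SerializeExtra(const std::map<std::string, std::string>& annotations) {
  std::string json = "{";
  bool first = true;
  for (const auto& kv : annotations) {
    if (!first) json += ",";
    first = false;
    json += "\"" + JsonEscape(kv.first) + "\":\"" + JsonEscape(kv.second) + "\"";
  }
  return json + "}";
}

// Assumed naming convention: "<id>.dmp" gets a sibling "<id>.extra".
std::string ExtraPathFor(const std::string& minidumpPath) {
  const std::string ext = ".dmp";
  if (minidumpPath.size() >= ext.size() &&
      minidumpPath.compare(minidumpPath.size() - ext.size(), ext.size(), ext) == 0) {
    return minidumpPath.substr(0, minidumpPath.size() - ext.size()) + ".extra";
  }
  return minidumpPath + ".extra";
}

// Writes the .extra file next to the minidump; returns false on I/O failure.
bool WriteExtraFile(const std::string& minidumpPath,
                    const std::map<std::string, std::string>& annotations) {
  std::ofstream out(ExtraPathFor(minidumpPath), std::ios::binary);
  out << SerializeExtra(annotations);
  return static_cast<bool>(out);
}
```

The second bit, launching the crash reporter client with the minidump path, depends on the client's actual command-line contract and is left out of this sketch.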

Flags: needinfo?(gsvelto) → needinfo?(willkg)

Prior to October 2019, we hadn't added a new product to Socorro in years. Since then we've added a bunch, and adding new products that are like Firefox isn't hard. We've got instructions here:

https://socorro.readthedocs.io/en/latest/products.html

If your requirements are very different from Firefox, then it might be harder depending on the specifics.

Flags: needinfo?(willkg)

Thanks Will, sounds like there should be no obstacles in reusing the existing infrastructure then. I'm happy to help with the effort and/or walk people through it.

(In reply to Nick Alexander :nalexander [he/him] from comment #4)

> bholley suggested a possible technical approach that wouldn't require standing up new Socorro things or publishing new crash symbols: link the WDBA (and BUA, etc) into libxul (or firefox.exe) and then modify the startup path to install crash handling (as we do now) before branching (based on a command line argument) to the specific functionality requested.
>
> I can see two potential issues:
>
>   1. our crash handler appears tied to XPCOM
>
>   2. for Rust vehicles, like WDBA, we might have issues with two Rust runtimes that want different memory allocators. I don't know if the Firefox Rust runtime "just works" without some configuration that sets up mozjemalloc.
>
> froydnj: sorry to lean on you again, but do you see other problems with bholley's suggested approach?

What are WDBA and "BUA, etc."? This approach wants to link libxul into the other crash vehicles?

Flags: needinfo?(nfroyd)

(In reply to Nathan Froyd [:froydnj] from comment #8)

> (In reply to Nick Alexander :nalexander [he/him] from comment #4)
>
> > bholley suggested a possible technical approach that wouldn't require standing up new Socorro things or publishing new crash symbols: link the WDBA (and BUA, etc) into libxul (or firefox.exe) and then modify the startup path to install crash handling (as we do now) before branching (based on a command line argument) to the specific functionality requested.
> >
> > I can see two potential issues:
> >
> >   1. our crash handler appears tied to XPCOM
> >
> >   2. for Rust vehicles, like WDBA, we might have issues with two Rust runtimes that want different memory allocators. I don't know if the Firefox Rust runtime "just works" without some configuration that sets up mozjemalloc.
> >
> > froydnj: sorry to lean on you again, but do you see other problems with bholley's suggested approach?
>
> What are WDBA and "BUA, etc."? This approach wants to link libxul into the other crash vehicles?

WDBA == Windows Default Browser Agent -- a scheduled task on Windows that monitors the user's default browser choice. See also the in-tree overview.

BUA == Background Update Agent -- a scheduled task that will download and apply updates even when Firefox is not actively running.

Both of these are standalone executables that do not link libxul at all. bholley suggested linking them into libxul/firefox.exe but diverting the startup path away from the XPCOM flow just after crash handling is configured. In pseudo code:

libxul.so
  dom/...
  ...
  wdba/wdba.o
     int wdba_main(int argc, char** argv)
  bua/bua.o
     int bua_main(int argc, char** argv)

And then the real main does something like:

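#include <cstring>

// Declared here for the sketch; the definitions would live in libxul.
void ConfigureCrashHandling();
int wdba_main(int argc, char** argv);
int bua_main(int argc, char** argv);

int main(int argc, char** argv) {
  // Install crash handling before branching, so every mode is covered.
  ConfigureCrashHandling();
  // Compare string contents with strcmp; == would only compare pointers.
  if (argc > 1 && std::strcmp(argv[1], "--wdba") == 0) {
    return wdba_main(argc, argv);
  }
  if (argc > 1 && std::strcmp(argv[1], "--bua") == 0) {
    return bua_main(argc, argv);
  }
  // ... otherwise continue with the normal Firefox/XPCOM startup.
}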
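To check the control flow of bholley's startup-path idea in isolation, here is a self-contained sketch with stub entry points. Everything in it is hypothetical: `wdba_main`, `bua_main`, and `ConfigureCrashHandling` are stand-ins that return fixed codes or set a flag, and `SelectEntryPoint`/`Dispatch` are illustrative names, not existing Firefox code. As a design choice it swaps the chain of `if`s for a small lookup table, so adding another mode is a one-line change.

```cpp
#include <map>
#include <string>

// Hypothetical stand-ins for entry points that would live in libxul.
int wdba_main(int argc, char** argv) { (void)argc; (void)argv; return 10; }
int bua_main(int argc, char** argv) { (void)argc; (void)argv; return 20; }

// Hypothetical stub: records that crash handling was installed before dispatch.
static bool gCrashHandlingConfigured = false;
void ConfigureCrashHandling() { gCrashHandlingConfigured = true; }

using EntryPoint = int (*)(int, char**);

// Returns the entry point selected by argv[1], or nullptr for normal startup.
EntryPoint SelectEntryPoint(int argc, char** argv) {
  static const std::map<std::string, EntryPoint> kModes = {
      {"--wdba", wdba_main},
      {"--bua", bua_main},
  };
  if (argc < 2) return nullptr;
  auto it = kModes.find(argv[1]);
  return it == kModes.end() ? nullptr : it->second;
}

// Crash handling first, then branch; anything unrecognized falls through
// to the regular startup path passed in as firefoxMain.
int Dispatch(int argc, char** argv, EntryPoint firefoxMain) {
  ConfigureCrashHandling();
  if (EntryPoint entry = SelectEntryPoint(argc, argv)) {
    return entry(argc, argv);
  }
  return firefoxMain(argc, argv);
}
```

In a real build, `main` would call `Dispatch` with the existing XPCOM startup function as `firefoxMain`.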

The advantage is that we don't have to set up crash handling and symbols for these new use cases. But it sounds like there is more hope for doing this "properly" than I thought -- thanks for the status update :willkg and :gsvelto.

Priority: -- → P3
Whiteboard: [fidedi-ope]
Whiteboard: [fidedi-ope] → [fidedi-ope]

We've invested in background task mode for background updates and will leverage it for other things in the future. The updater, the maintenance service (MMS), and the WDBA would still benefit, so I'll leave this open.

Summary: [meta] Collect crash reports for non-Firefox vehicles: the updater, maintenance service, Windows Default Browser Agent, Background Update Agent (future) → [meta] Collect crash reports for non-Firefox vehicles: the updater, maintenance service, Windows Default Browser Agent (WDBA)