Closed Bug 1759175 Opened 2 years ago Closed 29 days ago

Rewrite the crash reporter client in Rust

Categories

(Toolkit :: Crash Reporting, task)

Unspecified
All
task

Tracking

()

RESOLVED FIXED
126 Branch
Tracking Status
firefox126 --- fixed

People

(Reporter: gsvelto, Assigned: afranchuk)

References

(Blocks 7 open bugs)

Details

Attachments

(10 files, 3 obsolete files)

48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review

The crash reporter client is the tool we use to report full browser crashes. It is currently made of a mix of platform-specific C++ and Objective-C. It uses whatever network library is available on the system to submit HTTP request. The current implementation has several problems:

  • There are no tests
  • It cannot be localized using Fluent
  • The shared code between the different platforms is minimal, changing something usually means making three distinct changes
  • It relies heavily on environment variables to communicate with Firefox, this can cause all sort of issues at startup (see bug 1752703)
  • The UI of the macOS one relies on a binary description generated by a tool that was available only in an XCode version running on PowerPC hardware, and which has long been deprecated
  • The Windows version doesn't handle all paths correctly because of UTF8/16 conversions

In short, it is complicated to maintain and impossible to change on macOS. The best course of action is to rewrite it in Rust. Rust would still require a platform-specific UI but we could probably consolidate all the networking and file-handling code. We wouldn't need Windows-specific path handling and we could finally enable Fluent-based localization.

Gabriele how is this related to bug 1588742? Isn't it the same?

Flags: needinfo?(gsvelto)

Bug 1588742 is about the code we link in Gecko to intercept crashes and write minidumps. This is specifically the application we use to interact with the user and submit crash reports.

Flags: needinfo?(gsvelto)
Blocks: 1784069
Blocks: 1821091

GUI

There's a number of important decisions to be made here, including how we should approach the GUI. Using a simple search ("GUI") of the currently-vendored rust crates, I see that there is not yet any GUI-centric crate there. So we likely need to vendor/audit some existing GUI crate. However those can sometimes be large!

We also need to decide some fundamental aspects like whether we want to use platform-provided GUI elements versus the same (toolkit-rendered) elements across all platforms (e.g. wxWidgets vs GTK). It would probably feel best to users if we matched what Firefox does (which I think uses some platform-specific things?), however for such a small GUI app it may not matter too much. Some toolkits which draw everything themselves do offer stylized components which look system-native, too. I'm not sure whether it would be possible or desirable to use XUL, though I figure the old client didn't use it for a reason. Whichever way we go, we should prioritize making any system-specific code minimal, since that is a pain point in the current client.

The choice (including whether to use an external crate at all) should be made carefully since this sets some precedent for other utilities that need a GUI in the future. It's unlikely there will be additional GUI utilities since this is the only such utility we've needed historically, but it's not impossible.

minidump-analyzer

It occurred to me that, if the client is written in rust, we could potentially have it absorb minidump-analyzer (once the rust version is landed) rather than that being a separate program. I see that toolkit/components/crashes/CrashService.jsm also calls the binary too, though, so maybe it's best to leave as-is.

(In reply to Alex Franchuk from comment #3)

GUI

There's a number of important decisions to be made here, including how we should approach the GUI. Using a simple search ("GUI") of the currently-vendored rust crates, I see that there is not yet any GUI-centric crate there. So we likely need to vendor/audit some existing GUI crate. However those can sometimes be large!

Yes, I'd like to keep the dependencies minimal. Because of the role of this tool it needs to use the least amount of complexity, it's the only way to guarantee it's robust. For the GUI we should probably do an almost straight port of the existing platform-dependent code, possibly layering a bit of abstractions on top of it so that we don't duplicate the underlying logic like we do now. That's also the reason why we don't use XUL's network stack to send the crash report but rather the native stack: minimal dependencies, minimal working implementation.

One important aspect of the GUI is that we want it to support localization via Fluent; that need to be taken into account when designing the abstractions we'll layer upon the native widgets. The existing version uses an INI file for localization and that's suboptimal to say the least.

minidump-analyzer

It occurred to me that, if the client is written in rust, we could potentially have it absorb minidump-analyzer (once the rust version is landed) rather than that being a separate program. I see that toolkit/components/crashes/CrashService.jsm also calls the binary too, though, so maybe it's best to leave as-is.

Yes, that's another area where we can make some improvements. There are two reasons why the minidump-analyzer is a stand-alone tool: the first one is that I was unsure how robust Breakpad code was, so I didn't want to link a large chunk of complex code into Gecko or the crash reporter client, potentially lowering the stability of both. The second reason was that I didn't want to duplicate that code. rust-minidump is a lot more robust so we could imagine putting it into a shared library and using that instead of a separate tool.

I looked at Fluent when I saw it mentioned in the first comment (it looks pretty cool!), so I will keep that in mind. But great, what you've described is the direction I thought we'd be going since we want this to be as simple (and error-free) as possible. The only potentially unsafe part, of course, being the calling of native C APIs. But the surface area of that will hopefully be limited since the GUI is quite basic.

  • For OSX, I see we have core-foundation and core-graphics already vendored, but we'd probably want cocoa (from servo like the others).
  • For Windows, everything we need is probably in winapi.
  • For Linux, Fx requires GTK 3.14+. An older version of the gtk crate (0.9.2) is compatible with this minimum version. However this crate brings along a lot of dependencies (all the related gnome library bindings).

If bringing in binding crates will be problematic, I can generate the necessary bindings in our code.

Rolling our own bindings is fine, especially if it's faster than auditing large binding crates.

I expect it will be faster than an audit, so I'll go that route. Maybe we need some way of streamlining audits of pure binding crates (which amount to no more than e.g. the output of bindgen), since they are just full of stubs. In this case, aside from the auditing aspect a more compelling reason to add our own bindings is to avoid vendoring many more crates. Space is cheap, but space times hundreds or thousands of developers should be considered less so.

On the other hand, I just looked a bit more into them:

  • cocoa+cocoa-foundation are fairly small (<500kB), and wouldn't need any other dependencies
  • gtk and its dependencies are also somewhat small (each <1.5MB) but there are a lot of them, which would probably be a burden.
  • winapi is comparatively huge (~120MB vendored), but that's already vendored so not really a concern. Out of curiosity I took a look at windows-sys; that is ~30MB + ~10MB per platform.

I'll just start programming and see where I end up.

Assignee: nobody → afranchuk

This is currently a partial WIP implementation, demonstrating the UI abstraction and GTK bindings.

Vendoring which was only necessary because it is a dependency of bindgen. Bindgen was already
vendored, so I'm not sure why which wasn't there. It's small enough that I can do an audit pretty
easily (it's ignored right now).

Attachment #9326040 - Attachment description: WIP: Bug 1759175 - Rewrite the crash reporter client in Rust r=#firefox-build-system-reviewers,gsvelto → WIP: Bug 1759175 pt1 - Rewrite the crash reporter client in Rust r=#firefox-build-system-reviewers,gsvelto

This defines the UI and data binding abstraction layer that is used across all platforms.

Depends on D174916

Attachment #9326040 - Attachment is obsolete: true
Attachment #9327376 - Attachment is obsolete: true

Depends on D174918

Attachment #9327420 - Attachment description: WIP: Bug 1759175 pt1 - Repository integration r=#firefox-build-system-reviewers,gsvelto → Bug 1759175 pt1 - Repository integration r=#firefox-build-system-reviewers,gsvelto
Attachment #9327421 - Attachment description: WIP: Bug 1759175 pt2 - UI definition r=gsvelto,cmartin → Bug 1759175 pt2 - UI definition r=gsvelto,cmartin
Attachment #9327422 - Attachment description: WIP: Bug 1759175 pt3 - linux (GTK) ui implementation r=gsvelto → Bug 1759175 pt3 - linux (GTK) ui implementation r=gsvelto
Attachment #9333193 - Attachment description: WIP: Bug 1759175 pt4 - windows ui implementation r=cmartin → Bug 1759175 pt4 - windows ui implementation r=cmartin
Attachment #9327422 - Attachment description: Bug 1759175 pt3 - linux (GTK) ui implementation r=gsvelto → Bug 1759175 pt4 - linux (GTK) ui implementation r=gsvelto
Attachment #9333193 - Attachment description: Bug 1759175 pt4 - windows ui implementation r=cmartin → Bug 1759175 pt5 - windows ui implementation r=cmartin
Blocks: 1851334
Blocks: 1866863
Attachment #9348450 - Attachment description: Bug 1759175 pt3 - crash reporter business logic r=gsvelto → Bug 1759175 pt3 - Crashreporter business logic r=gsvelto
Attachment #9327422 - Attachment description: Bug 1759175 pt4 - linux (GTK) ui implementation r=gsvelto → Bug 1759175 pt5 - Linux (GTK) ui implementation r=gsvelto
Attachment #9333193 - Attachment description: Bug 1759175 pt5 - windows ui implementation r=cmartin → Bug 1759175 pt6 - Windows ui implementation r=cmartin
See Also: → 1725134
Attachment #9367165 - Attachment description: Bug 1759175 pt7 - Macos ui implementation r=gsvelto → Bug 1759175 pt7 - Macos ui implementation r=gsvelto,mstange,#mac-reviewers
Blocks: 1873210

This removes the old localization files, updates installer package manifests, and replaces code that
accessed the files with fluent accesses for the new localization info.

This also replaces some code in nsExceptionHandler to get/set the crash reporter submit reports
setting.

Depends on D199638

Attachment #9377032 - Attachment description: Bug 1759175 pt9 - Remove old crash reporter localization r=gsvelto → Bug 1759175 pt9 - Remove old crash reporter localization r=
Attachment #9377032 - Attachment description: Bug 1759175 pt9 - Remove old crash reporter localization r= → Bug 1759175 pt9 - Remove old crash reporter localization r=gsvelto

There's no need for a separate crashreporter.app anymore. It was needed to bundle binary blobs and
resources together for the old GUI application. They are no longer needed in the new application.

Depends on D199914

Blocks: 1881768
Attachment #9380286 - Attachment is obsolete: true
Attachment #9380463 - Attachment description: Bug 1759175 pt11 - Add additional documentation for the crashreporter client r=gsvelto → Bug 1759175 pt10 - Add additional documentation for the crashreporter client r=gsvelto
Attachment #9367164 - Attachment description: Bug 1759175 pt4 - Mocking and testing the crashreporter r=gsvelto,cmartin → Bug 1759175 pt4 - Mocking and testing the crashreporter r=gsvelto
Attachment #9367164 - Attachment description: Bug 1759175 pt4 - Mocking and testing the crashreporter r=gsvelto → Bug 1759175 pt4 - Mocking and testing the crashreporter r=gsvelto,cmartin
Pushed by afranchuk@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/68d2d1abab21
pt1 - Repository integration r=glandium,supply-chain-reviewers
https://hg.mozilla.org/integration/autoland/rev/4aba1755a610
pt2 - UI definition r=gsvelto,cmartin,flod
https://hg.mozilla.org/integration/autoland/rev/45a2a256b9d4
pt3 - Crashreporter business logic r=gsvelto,fluent-reviewers,flod,eemeli,cmartin
https://hg.mozilla.org/integration/autoland/rev/b34e196120aa
pt4 - Mocking and testing the crashreporter r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/e6d83c21eadc
pt5 - Linux (GTK) ui implementation r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/acde6c99092c
pt6 - Windows ui implementation r=cmartin
https://hg.mozilla.org/integration/autoland/rev/40dfe0d499df
pt7 - Macos ui implementation r=gsvelto,mac-reviewers,haik
https://hg.mozilla.org/integration/autoland/rev/00b28a515e8d
pt8 - Remove the old crash reporter code r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/fa1545689826
pt9 - Remove old crash reporter localization r=gsvelto,geckoview-reviewers,m_kato
https://hg.mozilla.org/integration/autoland/rev/3d8279b62b0e
pt10 - Add additional documentation for the crashreporter client r=gsvelto

We depend on bug 1882441 for native windows builds to work (embedding the manifest file).

Depends on: 1882441
Pushed by afranchuk@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7cb85f06517a
pt1 - Repository integration r=glandium,supply-chain-reviewers
https://hg.mozilla.org/integration/autoland/rev/b50fd0148c7f
pt2 - UI definition r=gsvelto,cmartin,flod
https://hg.mozilla.org/integration/autoland/rev/5d33c3516218
pt3 - Crashreporter business logic r=gsvelto,fluent-reviewers,flod,eemeli,cmartin
https://hg.mozilla.org/integration/autoland/rev/6b0d3b266dea
pt4 - Mocking and testing the crashreporter r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/f39525560fd6
pt5 - Linux (GTK) ui implementation r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/84b443488fb5
pt6 - Windows ui implementation r=cmartin
https://hg.mozilla.org/integration/autoland/rev/6888a063a0c3
pt7 - Macos ui implementation r=gsvelto,mac-reviewers,haik
https://hg.mozilla.org/integration/autoland/rev/021559ebc1e7
pt8 - Remove the old crash reporter code r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/51ee5a685214
pt9 - Remove old crash reporter localization r=gsvelto,geckoview-reviewers,m_kato
https://hg.mozilla.org/integration/autoland/rev/dce52d34bd91
pt10 - Add additional documentation for the crashreporter client r=gsvelto
Pushed by afranchuk@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f2aa40301a39
pt1 - Repository integration r=glandium,supply-chain-reviewers
https://hg.mozilla.org/integration/autoland/rev/754fe9369002
pt2 - UI definition r=gsvelto,cmartin,flod
https://hg.mozilla.org/integration/autoland/rev/35350f3c3b68
pt3 - Crashreporter business logic r=gsvelto,fluent-reviewers,flod,eemeli,cmartin
https://hg.mozilla.org/integration/autoland/rev/027be772e29a
pt4 - Mocking and testing the crashreporter r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/28ee74b69b66
pt5 - Linux (GTK) ui implementation r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/5894c03ccb0e
pt6 - Windows ui implementation r=cmartin
https://hg.mozilla.org/integration/autoland/rev/6736f34d5642
pt7 - Macos ui implementation r=gsvelto,mac-reviewers,haik
https://hg.mozilla.org/integration/autoland/rev/e8b6ce00b503
pt8 - Remove the old crash reporter code r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/3ce2c92bdee2
pt9 - Remove old crash reporter localization r=gsvelto,geckoview-reviewers,m_kato
https://hg.mozilla.org/integration/autoland/rev/3f7b865cce56
pt10 - Add additional documentation for the crashreporter client r=gsvelto

Backed out for causing build bustage

Backout link

Push with failures

Failure log // Failure log 2

Pushed by afranchuk@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2e605dcdcf83
pt1 - Repository integration r=glandium,supply-chain-reviewers
https://hg.mozilla.org/integration/autoland/rev/ac608a17d8c7
pt2 - UI definition r=gsvelto,cmartin,flod
https://hg.mozilla.org/integration/autoland/rev/4a525dace3cf
pt3 - Crashreporter business logic r=gsvelto,fluent-reviewers,flod,eemeli,cmartin
https://hg.mozilla.org/integration/autoland/rev/caf59b01fb96
pt4 - Mocking and testing the crashreporter r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/e994c818703c
pt5 - Linux (GTK) ui implementation r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/4e16215b14af
pt6 - Windows ui implementation r=cmartin
https://hg.mozilla.org/integration/autoland/rev/94b619f7ae5a
pt7 - Macos ui implementation r=gsvelto,mac-reviewers,haik
https://hg.mozilla.org/integration/autoland/rev/8ab327d18125
pt8 - Remove the old crash reporter code r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/b0cbac1ae1c0
pt9 - Remove old crash reporter localization r=gsvelto,geckoview-reviewers,m_kato
https://hg.mozilla.org/integration/autoland/rev/3caff3bcb5f3
pt10 - Add additional documentation for the crashreporter client r=gsvelto

Backed out for causing Bp build bustage

Backout link

Push with failures

Failure log

Depends on: 1884770
Pushed by afranchuk@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3e2f42b1eb5e
pt1 - Repository integration r=glandium,supply-chain-reviewers
https://hg.mozilla.org/integration/autoland/rev/37d6c69ff714
pt2 - UI definition r=gsvelto,cmartin,flod
https://hg.mozilla.org/integration/autoland/rev/42d629ff0469
pt3 - Crashreporter business logic r=gsvelto,fluent-reviewers,flod,eemeli,cmartin
https://hg.mozilla.org/integration/autoland/rev/acf296ab5a51
pt4 - Mocking and testing the crashreporter r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/5129aad8dc2b
pt5 - Linux (GTK) ui implementation r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/3d20ff32b07e
pt6 - Windows ui implementation r=cmartin
https://hg.mozilla.org/integration/autoland/rev/4c2515ee7140
pt7 - Macos ui implementation r=gsvelto,mac-reviewers,haik
https://hg.mozilla.org/integration/autoland/rev/801b0eb7f87f
pt8 - Remove the old crash reporter code r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/9563f82ebd95
pt9 - Remove old crash reporter localization r=gsvelto,geckoview-reviewers,m_kato
https://hg.mozilla.org/integration/autoland/rev/538aa6838630
pt10 - Add additional documentation for the crashreporter client r=gsvelto
Regressions: 1886674
Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/1e64f166abe2
Port "pt9" of Remove old crash reporter to Thunderbird. rs=bustage-fix
Flags: needinfo?(afranchuk)
Regressions: 1887386
Regressions: 1887479
Regressions: 1887502
Regressions: 1887503
Regressions: 1887504
Regressions: 1887505
Regressions: 1887606
Regressions: 1888880
Regressions: 1888664
Regressions: 1888663
See Also: → 1888201
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: