Closed Bug 1359326 Opened 3 years ago Closed 3 years ago

Simplify running the minidump analyzer using an nsIProcess instance

Categories

(Toolkit :: Crash Reporting, enhancement)

enhancement
Not set

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: gsvelto, Assigned: gsvelto)

References

Details

Attachments

(2 files, 1 obsolete file)

+++ This bug was initially created as a clone of Bug #1293656 +++

When we implemented bug 1293656 we introduced some complex C++ machinery to run the minidump-analyzer tool (which is exposed from the CrashReporter). This includes running the tool asynchronously to avoid blocking the main thread and handling a shutdown barrier. This complexity proved to be problematic since we had to patch it up in the follow ups bug 1324249 and bug 1328768.

While I didn't realize it at the time we've got a facility that would have enabled us to implement all of the above in JS: nsIProcess. With it we can asynchronously run the minidump-analyzer tool directly from JavaScript. This would greatly simplify the current code by removing all the C++ part, the shutdown barrier we handle in C++ code (we already have one in the JS part of the crash service) and not having to expose the RunMinidumpAnalyzer method from the CrashReporter.
Noting the connection to the crash in bug 1362203.
See Also: → 1362203
Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
Fully working WIP but still lacking tests. With these changes I should be able to add a unit-test for CrashService.addCrash() that should cover the entire flow of adding a content crash; including minidump analysis, hashing of the minidump file and extra file parsing.
Attachment #8866806 - Attachment is obsolete: true
Comment on attachment 8868138 [details]
Bug 1359326 - Run the minidump analyzer directly from the CrashService code;

https://reviewboard.mozilla.org/r/139736/#review143090

::: toolkit/components/crashes/CrashService.js:28
(Diff revision 1)
> - *          minidump. If the hash could not be computed then null is returned
> + *          finished.
> - *          instead.
>   */
> -function computeMinidumpHash(id) {
> -  let cr = Cc["@mozilla.org/toolkit/crash-reporter;1"]
> -             .getService(Components.interfaces.nsICrashReporter);
> +function runMinidumpAnalyzer(minidumpPath) {
> +  return new Promise((resolve, reject) => {
> +    const binSuffix = AppConstants.platform === "win" ? ".exe" : "";

I guess this is ok, but it might be better to have BIN_SUFFIX in appconstants so we're not cribbing this code around so much.

::: toolkit/components/crashes/CrashService.js:39
(Diff revision 1)
> +    let args = [ minidumpPath ];
> +    let process = Cc["@mozilla.org/process/util;1"]
> +                    .createInstance(Ci.nsIProcess);
> +    process.init(exe);
> +    process.runAsync(args, args.length, {
> +      observe(subject, topic, data) {

This doesn't need to be an object: nsIObserver is a [function] interface so you can/should pass a function directly:

process.runAsync, args, args.length, (subject, topic, data) => { ... });

::: toolkit/components/crashes/nsICrashService.idl:22
(Diff revision 1)
>     * @param id
>     *        Crash ID. Likely a UUID.
>     *
>     * @return {Promise} A promise that resolves after the crash has been stored
>     */
> -  void addCrash(in long processType, in long crashType, in AString id);
> +  nsISupports addCrash(in long processType, in long crashType, in AString id);

For the record because I didn't know this: XPConnect has magic that auto-converts promises passed as nsISupports.
Attachment #8868138 - Flags: review?(benjamin) → review+
Thanks for the review Benjamin,

(In reply to Benjamin Smedberg [:bsmedberg] from comment #4)
> I guess this is ok, but it might be better to have BIN_SUFFIX in
> appconstants so we're not cribbing this code around so much.

I can file a bug for that; I'll first check how widespread this is but I think I've seen this pattern in a couple of other places too.

> This doesn't need to be an object: nsIObserver is a [function] interface so
> you can/should pass a function directly:
> 
> process.runAsync, args, args.length, (subject, topic, data) => { ... });

OK, I'll change it.

I've got a try run here and I think I broke a test in non-e10s mode which I hadn't tested locally so I'll have to fix it first:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=45ddc4b53d85ec60da8528ecb5c63defbb30ddd6

I've also noticed a small issue: on Windows the minidump-analyzer briefly opens a console window, I have to ensure it doesn't before landing.
I've tried tackling the issue on Windows in a similar way to what I did with the pingsender in bug 1360819 by turning the minidump-analyzer into a GUI application but I've hit another issue: since Windows assumes that GUI applications process input messages (even if they don't have a Window) it shows a throbber on the mouse pointer while the program is running which is annoying.

An alternative to fix this would be to add a parameter to nsIProcess to start the process in hidden mode and use it to pass the SW_HIDE flag to ShellExecuteEx(). This would ensure that the minidump-analyzer runs in the background without disturbing the user, what do you think? The only quirk with this approach is that flag would only affect Windows, as other platforms don't have this problem.
Flags: needinfo?(benjamin)
Quick update: I'm still going through the failing tests; changing code in this particular path tends to make tests that grabbed the minidump files on their own either run into races or not clean up the crashes. There's four of them apparently, three showing up on Windows and one on Linux, and I still have to fix two.
Depends on: 1366711
Another update: still going through the tests that fail and there's plenty of them mostly because of races in the way they handle crashes. I'm integrating almost all the changes I was writing for bug 1362041 to make crash-handling race-free across all tests so it's taking a while.
I know I've dealt with windows console programs before, but I totally don't remember the details. Maybe ask mhowell?
Flags: needinfo?(benjamin)
I've addressed the review comments in the original patch which is almost identical and I've added fixes for various tests in a separate one for ease of review, I'll squash them later before landing.

In a nutshell I've tried to make tests rely as much as possible on the utility methods we have to deal with crashes (e.g. SimpleTest.exceptChildProcessCrashes()/BrowserTestUtil.crashBrowser()) so that minidumps are handled properly and there are no races between the CrashService trying to analyze the minidump and the test harness trying to clean up the dumps. For most tests this meant removing whatever hand-crafter (or cargo-culted) harness had been put into the test and replacing it with the appropriate helper.

Unfortunately some tests are truly special. Those either put the minidumps where the test harness won't look for them (e.g. <UAppData>/Crash Reports/pending), or create the crash in such a way that it's not easy - or possible - to tell the harness to expect minidumps and be done with it. In those cases I've changed the tests to call CrashManager.ensureCrashIsPresent() directly to prevent races. It's a bit kludgy but it will go away as soon as I'm done with bug 1323979 which will add a single event to wait for crashes that will make it easy to play nice with the CrashService/CrashManager.
Comment on attachment 8871654 [details]
Bug 1359326 - Fix all affected tests;

https://reviewboard.mozilla.org/r/143172/#review147036

::: dom/plugins/test/mochitest/test_crash_notify_no_report.xul:45
(Diff revision 1)
>        getService(Components.interfaces.nsIProperties);
>      let pendingD = directoryService.get("UAppData",
>                                          Components.interfaces.nsIFile);
>      pendingD.append("Crash Reports");
>      pendingD.append("pending");
>      let dumpFile = pendingD.clone();    

remove trailing whitespace while you're here?
Attachment #8871654 - Flags: review?(benjamin) → review+
Thanks Benjamin, I've addressed your last comment, squashed the patches, rebased on top of central and added the bits to hide the minidump-analyzer window. The try run is here:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=8660fc09f93eec41388f26edcdb9a4b52aa593fd

If all goes well I'll land this today.
There was still a small issue in one test on MacOS X; here's another try run, hopefully the last one:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a0832cd0f7284972fbd94bbb3fa7be42fbe7dd3a
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/001d49708a35
Run the minidump analyzer directly from the CrashService code; r=bsmedberg
https://hg.mozilla.org/mozilla-central/rev/001d49708a35
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Depends on: 1371245
You need to log in before you can comment on or make changes to this bug.