Closed Bug 888373 Opened 11 years ago Closed 11 years ago

Simple API for detecting startup/shutdown crashes

Categories

(Toolkit :: General, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla29

People

(Reporter: Yoric, Assigned: brennan.brisad)

References

Details

(Whiteboard: [mentor=Yoric][lang=js][mentored bug but not a simple bug][qa-])

Attachments

(2 files, 15 obsolete files)

7.33 KB, patch
Details | Diff | Splinter Review
11.66 KB, patch
Details | Diff | Splinter Review
At the moment, we have several – very inefficient and very imprecise – manners of detecting startup crashes and, to the best of my knowledge, no manner of detecting shutdown crashes.

We should develop a simple component to help us determine whether the last run of the application stopped with a crash, and if yes at which stage.
Hi, I would like to work with this if it's possible.

I'm quite new to Mozilla and have only done a few fixes so far, so I'd need some help to get started.  But if that's alright I'd be happy to be able to work with this one.
Flags: needinfo?(dteller)
Our objective, here, is to determine whether the previous session of Firefox has completed successfully, or, if it has crashed, at which stage. Conveniently, Firefox notifies observers of the stages as it reaches them (see https://developer.mozilla.org/en-US/docs/Observer_Notifications for the list of such observer notifications).

We will need to create a new component (let's call it CrashMonitor.jsm). This component will observe the relevant notifications and write down to disk a small file with the list of notifications that have been sent at least once during the session.

During startup, the component will also reload the file from the previous session, so that we can use it to determine which notifications have been sent at least once.

We are interested in the following notifications:
- profile-after-change (see also https://developer.mozilla.org/en-US/docs/XPCOM/Receiving_startup_notifications );
- final-ui-startup;
- sessionstore-windows-restored;
- quit-application-granted;
- quit-application;
- profile-change-net-teardown;
- profile-change-teardown;
- profile-before-change.

You should use OS.File for reading from/writing to disk.

If you have any question, don't hesitate to come and chat on IRC, channel #introduction.
Flags: needinfo?(dteller)
is it the extension of the Session Restore Bug 
https://bugzilla.mozilla.org/show_bug.cgi?id=883609
??
Isn't this what SHUTDOWN_OK telemetry reports? What more do you need?
Taras: We are attempting to unify the several reports used by Telemetry (which relies on main thread preference flushing, iirc), Session restore (which relies on rewriting the whole session restore during startup) and one or two others I forgot into something more lightweight and that requires only the off main thread reading/writing of a few hundred bytes.
I've created a new component that receives the notifications and which can read and write some information to a small file.  I'm attaching a small first implementation for feedback, and I have some accompanying thoughts and questions:

Now it is the actual .jsm module that listens for the notifications, except for the 'profile-after-change' (writing this I just realized I forgot to write that actual notification to file).  Maybe all of them should be handled by the CrashMonitor in nsCrashMonitor.js and maybe I'm cheating by just putting the observe and QueryInterface methods into the .jsm module?

I'm not sure about the naming convention of the objects and their properties.  I just simply tacked on an underscore on one of them to avoid name collision.  I've seen underscores put on some properties of objects in other code, is it a convention for "private" properties?

I don't understand the ownsWeak parameter of addObserver, and it just happens to be false for now.

Is it safe to remove the file right after creating the OS.File.read promise?
Attachment #776375 - Flags: feedback?(dteller)
(In reply to Michael Brennan from comment #6)
> Created attachment 776375 [details] [diff] [review]
> First implementation for feedback
> 
> I've created a new component that receives the notifications and which can
> read and write some information to a small file.  I'm attaching a small
> first implementation for feedback, and I have some accompanying thoughts and
> questions:
> 
> Now it is the actual .jsm module that listens for the notifications, except
> for the 'profile-after-change' (writing this I just realized I forgot to
> write that actual notification to file).  Maybe all of them should be
> handled by the CrashMonitor in nsCrashMonitor.js and maybe I'm cheating by
> just putting the observe and QueryInterface methods into the .jsm module?

No, that's ok.
You don't need the QueryInterface in the .jsm module, by the way. If you look at the definition of nsIObserver, you will see that it is labelled |function|, which means that you can pass a simple function, rather than having to implement a XPCOM component.

> 
> I'm not sure about the naming convention of the objects and their
> properties.  I just simply tacked on an underscore on one of them to avoid
> name collision.  I've seen underscores put on some properties of objects in
> other code, is it a convention for "private" properties?

It is indeed a convention. However, we don't want CrashMonitor to be private. If you just want to avoid collisions, replace
 Components.utils.import("resource://gre/modules/CrashMonitor.jsm");
with
 // Import in a scope
 let Scope = {}
 Components.utils.import("resource://gre/modules/CrashMonitor.jsm", Scope);
 // Give it a local name
 let MonitorAPI = Scopie.CrashMonitor;

> I don't understand the ownsWeak parameter of addObserver, and it just
> happens to be false for now.

False is ok in JS.

> Is it safe to remove the file right after creating the OS.File.read promise?

If you remove it with OS.File.remove and if you don't need it again, yes, as OS.File serializes all its requests.
Comment on attachment 776375 [details] [diff] [review]
First implementation for feedback

Review of attachment 776375 [details] [diff] [review]:
-----------------------------------------------------------------

Some preliminary feedback.
I have to run, I'll try and continue later.

::: toolkit/components/crashmonitor/CrashMonitor.jsm
@@ +25,5 @@
> +];
> +
> +this._CrashMonitor = {
> +
> +  previousNotifications: [],

Every field you put in _CrashMonitor is public. That's probably not what you want, so you should move all the private data out of the object.

@@ +33,5 @@
> +  decoder: new TextDecoder(),
> +
> +  encoder: new TextEncoder(),
> +
> +  path: OS.Path.join(OS.Constants.Path.profileDir, "notifications.txt"),

Rather ".json" than ".txt".

::: toolkit/components/crashmonitor/crashmonitor.manifest
@@ +1,3 @@
> +component {d9d75e86-8f17-4c57-993e-f738f0d86d42} nsCrashMonitor.js
> +contract @mozilla.org/base/crashmonitor;1 {d9d75e86-8f17-4c57-993e-f738f0d86d42}
> +category profile-after-change CrashMonitor @mozilla.org/base/crashmonitor;1

I would call the component "@mozilla.org/toolkit/crashmonitor;1".

::: toolkit/components/crashmonitor/nsCrashMonitor.js
@@ +14,5 @@
> +  observe: function (aSubject, aTopic, aData) {
> +    switch (aTopic) {
> +    case "profile-after-change":
> +      _CrashMonitor.init();
> +    }

Nit: You probably want to unregister this observer once initialization is complete.
Attachment #776375 - Flags: feedback?(dteller) → feedback+
Comment on attachment 776375 [details] [diff] [review]
First implementation for feedback

Review of attachment 776375 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/crashmonitor/CrashMonitor.jsm
@@ +27,5 @@
> +this._CrashMonitor = {
> +
> +  previousNotifications: [],
> +
> +  receivedNotifications: [],

I'm not sure I understand the difference you make between receivedNotifications and previousNotifications. In my mind, previousNotifications is the list of what happened the last time you launched Firefox, so it won't change once it's loaded.

@@ +48,5 @@
> +  },
> +
> +  observe: function(aSubject, aTopic, aData) {
> +    // Add topic to file if it's the first time received
> +    if (this.previousNotifications.indexOf(aTopic) == -1) {

That's not very nice.
Why not making an previousNotifications an object?

@@ +66,5 @@
> +  },
> +
> +  loadPreviousNotifications: function() {
> +    OS.File.read(this.path).then(
> +      this.readContents.bind(this)

You need to handle errors. Also, Task.jsm would make code easier to read.

@@ +79,5 @@
> +        file.write(array).then(function onSuccess(written) {
> +          file.close();
> +        });
> +      }).bind(this)
> +    );

Just call OS.File.writeAtomic.
- We should track "profile-do-change" as well if we want to eliminate startup crash detection via prefs:

http://mxr.mozilla.org/mozilla-central/source/toolkit/components/startup/nsAppStartup.cpp#733

- Can we give the output file a more descriptive name? "notifications.txt" is extremely vague and generic, maybe something like "sessionCheckpoint.txt"?

- Can we write patches to refactor startup crash detection and session store to use this new mechanism, before landing this patch?
Flags: needinfo?(dteller)
Thanks for the feedback!

I have tried to resolve all the problems you noted, and also the ones from Vladan.  I wasn't able to unregister the profile-after-change event though, and when i asked on IRC I got the answer it wasn't possibe when using the manifest file.

For the error handling: when reading the file at startup I check if the read was successful, and if it wasn't I just stop and write a debug message.  The file won't be there the first time this is run.  For the rest of the errors I just dump them on the console, I don't know if that good or not.
Attachment #776375 - Attachment is obsolete: true
Attachment #778814 - Flags: feedback?(dteller)
(In reply to Vladan Djeric (:vladan) from comment #10)
> - We should track "profile-do-change" as well if we want to eliminate
> startup crash detection via prefs:
> 
> http://mxr.mozilla.org/mozilla-central/source/toolkit/components/startup/
> nsAppStartup.cpp#733

The first notification that can be received easily is profile-after-change.

> - Can we give the output file a more descriptive name? "notifications.txt"
> is extremely vague and generic, maybe something like "sessionCheckpoint.txt"?

What about "sessionCheckpoint.json"?

> - Can we write patches to refactor startup crash detection and session store
> to use this new mechanism, before landing this patch?

Good idea. Tim, do you wish to handle the session store patch?
Flags: needinfo?(dteller) → needinfo?(ttaubert)
Comment on attachment 778814 [details] [diff] [review]
Updated implementation after feedback

Review of attachment 778814 [details] [diff] [review]:
-----------------------------------------------------------------

Good progress.
As suggested by Vladan, we'll want to see how that can replace some other crash monitors currently in the code before we proceed.

::: toolkit/components/crashmonitor/CrashMonitor.jsm
@@ +7,5 @@
> +
> +const Cu = Components.utils;
> +const Cc = Components.classes;
> +const Ci = Components.interfaces;
> +const Cr = Components.results;

You don't seem to need Cc, Ci or Cr, so just get rid of them.

@@ +25,5 @@
> +  "profile-before-change",
> +  "profile-do-change",
> +];
> +
> +let CrashMonitorInternal = {

You'll need a little documentation for |receivedNotifications|.
Also, I'd |checkpoints| is probably a better name.

@@ +58,5 @@
> +        yield this._writeFile();
> +      }.bind(this)).then(
> +        null,
> +        function (err) { dump(err + "\n"); }
> +      );

If you're using Task.jsm, use it more :)

Task.spawn(function() {
  try {
    yield this._writeFile();
  } catch (err) {
    Cu.reportError(err); // We don't want to use |dump| to communicate with users.
  }
});

@@ +63,5 @@
> +    }
> +  },
> +
> +  loadPreviousNotifications: function CM_loadPreviousNotifications() {
> +    let path = CrashMonitorInternal.path;

Nit: You don't need to copy |path|.

@@ +65,5 @@
> +
> +  loadPreviousNotifications: function CM_loadPreviousNotifications() {
> +    let path = CrashMonitorInternal.path;
> +    Task.spawn(function () {
> +      var data;

Please use |let| instead of |var|.

@@ +68,5 @@
> +    Task.spawn(function () {
> +      var data;
> +      try {
> +        data = yield OS.File.read(path);
> +      } catch (ex) {

Again, use Cu.reportError.

@@ +73,5 @@
> +        debug("Unable to read file " + path + ": " + ex + "\n");
> +        return;
> +      }
> +
> +      let contents = CrashMonitorInternal.decoder.decode(data);

Since you are using the decoder only once, just make it a local variable.

@@ +74,5 @@
> +        return;
> +      }
> +
> +      let contents = CrashMonitorInternal.decoder.decode(data);
> +      CrashMonitorInternal.previousNotifications = JSON.parse(contents);

You should also make it non-modifiable (Object.freeze).
Also, I want to be able to see this object from the outside, so you'll need to put it in CrashMonitor itself. Let's call it |previousCheckpoints|.

@@ +78,5 @@
> +      CrashMonitorInternal.previousNotifications = JSON.parse(contents);
> +    }).then(
> +      null,
> +      function(err) { dump(err + "\n"); }
> +    );

Rather than a |then|, just use a try/catch.

@@ +81,5 @@
> +      function(err) { dump(err + "\n"); }
> +    );
> +  },
> +
> +  _writeFile: function CM_writeFile() {

You should probably inline this method into |observe|.

::: toolkit/components/crashmonitor/nsCrashMonitor.js
@@ +18,5 @@
> +  observe: function (aSubject, aTopic, aData) {
> +    switch (aTopic) {
> +    case "profile-after-change":
> +      MonitorAPI.init();
> +    }

You should also unregister yourself from observing "profile-after-change", as this component won't be needed again during the session.
Attachment #778814 - Flags: feedback?(dteller) → feedback+
I meant "(In reply to David Rajchenbach Teller [:Yoric] from comment #13)
> Comment on attachment 778814 [details] [diff] [review]
> Updated implementation after feedback
> 
> Review of attachment 778814 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Good progress.
> As suggested by Vladan, we'll want to see how that can replace some other
> crash monitors currently in the code before we proceed.

I meant "before we land."
(In reply to David Rajchenbach Teller [:Yoric] from comment #12)
> > - Can we write patches to refactor startup crash detection and session store
> > to use this new mechanism, before landing this patch?
> 
> Good idea. Tim, do you wish to handle the session store patch?

Yes, we can do that once the API patch is usable. Steven (smacleod) will be working on it.
Flags: needinfo?(ttaubert)
Attached patch crashmon-v3.patch (obsolete) — Splinter Review
Attachment #778814 - Attachment is obsolete: true
Attachment #787072 - Flags: feedback?(dteller)
I've updated the patch with changes suggested from last feedback.

Is there anything missing that should be added? What about tests?
Comment on attachment 787072 [details] [diff] [review]
crashmon-v3.patch

Review of attachment 787072 [details] [diff] [review]:
-----------------------------------------------------------------

Good progress.
We'll need to start thinking of a way to test stuff. Are you familiar with xpcshell tests already?

::: toolkit/components/crashmonitor/CrashMonitor.jsm
@@ +30,5 @@
> +   * Object where a property with a value of |true| means that the
> +   * notification with the same name as the property name has been
> +   * received at least once by the CrashMonitor during this
> +   * session. Notifications that have not yet been received are not
> +   * present as properties.

Nit: Could you mention the fact that NOTIFICATIONS lists all possible notifications?

@@ +54,5 @@
> +   * Available after |init|. Object where a property with a value of
> +   * |true| means that the notification with the same name as the
> +   * property name was received at least once last session.
> +   */
> +  previousCheckpoints: null,

Since |loadPreviousCheckpoints| is asynchronous, this should be a promise of an object instead of simply an object.

@@ +56,5 @@
> +   * property name was received at least once last session.
> +   */
> +  previousCheckpoints: null,
> +
> +  init: function CM_init() {

Please document the fact that |init| is meant to be called by your XPCOM component only.
Also, you should either throw an error if |init| is called twice or ensure that |init| is idempotent.

@@ +64,5 @@
> +    CrashMonitorInternal.checkpoints["profile-after-change"] = true;
> +
> +    NOTIFICATIONS.forEach(function (aTopic) {
> +      Services.obs.addObserver(this, aTopic, false);
> +    }, this);

Since |loadPreviousCheckpoints()| is asynchronous, you should return the promise returned by that method (see below).

@@ +88,5 @@
> +        } catch (err) {
> +          Cu.reportError(err);
> +        }
> +      });
> +    }

Upon "profile-before-change", could you unregister all observers?

@@ +94,5 @@
> +
> +  /**
> +   * Load checkpoints from previous session.
> +   */
> +  loadPreviousCheckpoints: function CM_loadPreviousCheckpoints() {

Looks like this needs to go to CrashMonitorInternal.

@@ +111,5 @@
> +        CrashMonitor.previousCheckpoints = JSON.parse(contents);
> +        Object.freeze(CrashMonitor.previousCheckpoints);
> +      } catch (err) {
> +        Cu.reportError(err);
> +      }

- A single |try/catch| should be sufficient for this task.
- Also, it's good that you report the error, but I am coming to the conclusion that you should also rethrow it so that any client using the promise you return (see below) is informed that there has been an error.
- Also, if the error is a "file doesn't exist", just catch it and ignore it.

try {
 // ...
} catch (ex if ex instanceof OS.File.Error && ex.becauseNoSuchFile) {
  return; // Ignore
} catch (ex) {
  Cu.reportError(ex);
  return ex;
}

@@ +112,5 @@
> +        Object.freeze(CrashMonitor.previousCheckpoints);
> +      } catch (err) {
> +        Cu.reportError(err);
> +      }
> +    });

Please |return| the promise you obtain with |Task.spawn|.
Attachment #787072 - Flags: feedback?(dteller) → feedback+
Attached patch crashmon-v4.patch (obsolete) — Splinter Review
New patch with updates, not including tests.

I haven't written any xpcshell tests before, but from the documentation and some already existing tests in other components I should be able to write up some tests for it.
Attachment #787072 - Attachment is obsolete: true
Attachment #790027 - Flags: feedback?(dteller)
Assignee: nobody → brennan.brisad
Comment on attachment 790027 [details] [diff] [review]
crashmon-v4.patch

Review of attachment 790027 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me.
You have my r+, with the trivial changes below (no need to ask me again for a review on this patch unless you want one).
Now, I'm interested in tests.

Oh, and don't forget to give a meaningful commit message to your patch, e.g.
Bug 888373: Simple API for detecting startup/shutdown crashes;r=yoric

::: toolkit/components/crashmonitor/CrashMonitor.jsm
@@ +43,5 @@
> +   *
> +   * Each time a new notification is received, this file is written to
> +   * disc to reflect the information in |checkpoints|.
> +   */
> +  path: OS.Path.join(OS.Constants.Path.profileDir, "sessionCheckpoint.json"),

Nit: sessionCheckpoints.json (plural)

@@ +66,5 @@
> +        deferred.resolve(Object.freeze(notifications));
> +      } catch (ex if ex instanceof OS.File.Error && ex.becauseNoSuchFile) {
> +        // Resolve previousCheckpoints promise to an empty object if
> +        // no previous checkpoint file could be found
> +        deferred.resolve(Object.freeze({}));

Ah, I believe there was a misunderstanding over IRC and I realize I was unclear. I believe that you should rather resolve to |null|. This way, clients will be able to know that there simply is no data on the previous start (generally because of an upgrade), rather than believing that the previous start ended in an immediate crash.

@@ +67,5 @@
> +      } catch (ex if ex instanceof OS.File.Error && ex.becauseNoSuchFile) {
> +        // Resolve previousCheckpoints promise to an empty object if
> +        // no previous checkpoint file could be found
> +        deferred.resolve(Object.freeze({}));
> +        return;

That |return| is not needed.

@@ +132,5 @@
> +      CrashMonitorInternal.checkpoints[aTopic] = true;
> +      Task.spawn(function() {
> +        try {
> +          let data = JSON.stringify(CrashMonitorInternal.checkpoints);
> +          let array = CrashMonitorInternal.encoder.encode(data);

Actually, you can skip that line. Recent versions of writeAtomic will encode the data automatically if it is a string.
Attachment #790027 - Flags: feedback?(dteller) → review+
If you want to write tests that crash, and you can do them in xpcshell, you can pattern them after the crashreporter tests:
http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/test/unit/test_crashreporter_crash.js

The head file contains most of the hard work:
http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/test/unit/head_crashreporter.js
Attached patch crashmon-v5.patch (obsolete) — Splinter Review
Attachment #790027 - Attachment is obsolete: true
Attached patch crashmon-tests-v1.patch (obsolete) — Splinter Review
Here's my first attempt of some testing of the API.

There are no tests for checking that received notifications are handled correctly as I am not sure how to handle that at the moment.  Is it possible to cheat and get access to |checkpoints| from the tests even though it is not public?  And for the writing of the file, I thought I'd check that also that is done correctly, but since the promise returned by the spawned task doing the atomic write isn't saved I don't know when it has finished writing.  How can I solve this?

As you can see these tests are just simple checks that initialization and the file operations work.  I guess it would be good to perform a complete and also a crashed session to see how it behaves?  I can take a close look on how |do_get_profile| works so that I can figure out if I can run two sessions in a row.
Attachment #793106 - Flags: feedback?(dteller)
Comment on attachment 793106 [details] [diff] [review]
crashmon-tests-v1.patch

Review of attachment 793106 [details] [diff] [review]:
-----------------------------------------------------------------

These tests look good to me.
I don't think that we need to go to the lengths suggested in comment 21 for testing that, if the process crashes, the file is not modified.

The next step would be to synchronize with smacleod to determine whether he can write a prototype fix for bug 887780 based on your prototype.

::: toolkit/components/crashmonitor/test/unit/head.js
@@ +18,5 @@
> +  sessionCheckpointsPath = OS.Path.join(OS.Constants.Path.profileDir,
> +                                        "sessionCheckpoints.json");
> +  Components.utils.import("resource://gre/modules/CrashMonitor.jsm");
> +  run_next_test();
> +}

I have never seen run_test() in head.js. Looks like it will work, though.
Attachment #793106 - Flags: feedback?(dteller) → feedback+
Flags: needinfo?(smacleod)
Bug 907994 was just filed to annotate crashes with startup/shutdown metadata. I suggest we focus efforts on recording this data at the source instead of deducing it from something else. Bug 875562 is also open to write a "crash log" to make it easier for downstream consumers to get at crash events.
I disagree, because Breakpad is not perfect, and does not catch every crash. I think these mechanisms should work independent of our exception handling to give us better data on crash counts.
In the case of the current bug, our main use case is detecting whether we have crashed at all, so as to display the "This is embarrassing" dialog. Obviously, we have the feature already, but it relies on expensive and unneeded I/O.
(In reply to David Rajchenbach Teller [:Yoric] from comment #24)
> Comment on attachment 793106 [details] [diff] [review]
> crashmon-tests-v1.patch
> 
> Review of attachment 793106 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> These tests look good to me.
> I don't think that we need to go to the lengths suggested in comment 21 for
> testing that, if the process crashes, the file is not modified.
> 
> The next step would be to synchronize with smacleod to determine whether he
> can write a prototype fix for bug 887780 based on your prototype.

Okay, so I'll be on standby then?  I see that you set the needinfo?-flag.  Or is something ready to be checked in already, like the implementation part which got r+ from you?

> 
> ::: toolkit/components/crashmonitor/test/unit/head.js
> @@ +18,5 @@
> > +  sessionCheckpointsPath = OS.Path.join(OS.Constants.Path.profileDir,
> > +                                        "sessionCheckpoints.json");
> > +  Components.utils.import("resource://gre/modules/CrashMonitor.jsm");
> > +  run_next_test();
> > +}
> 
> I have never seen run_test() in head.js. Looks like it will work, though.

I found tests for another component written in that way, so I took it from there.
For the moment, let's keep this bug on standby. This is a case of two teams needing almost the same feature and working each on an implementation. We'll need to determine how to proceed.
(In reply to David Rajchenbach Teller [:Yoric] from comment #12)
> (In reply to Vladan Djeric (:vladan) from comment #10)
> > - We should track "profile-do-change" as well if we want to eliminate
> > startup crash detection via prefs:
> > 
> > http://mxr.mozilla.org/mozilla-central/source/toolkit/components/startup/
> > nsAppStartup.cpp#733
> 
> The first notification that can be received easily is profile-after-change.

I think vladan meant to link to the comment at https://mxr.mozilla.org/mozilla-central/source/toolkit/xre/nsXREDirProvider.cpp?rev=aeab42c59423#780 which I wrote. That startup crash detection code uses no file I/O during startup and is able to track crashes in profile-do-change.
If we're going to replace that then we shouldn't start tracking any later IMO. It seems like we can probably modify this existing code to address new use cases.
(In reply to David Rajchenbach Teller [:Yoric] from comment #29)
> For the moment, let's keep this bug on standby. This is a case of two teams
> needing almost the same feature and working each on an implementation. We'll
> need to determine how to proceed.

I don't think there's any reason to stop progress here; these two efforts seem somewhat orthogonal.
> I don't think there's any reason to stop progress here; these two efforts seem somewhat orthogonal.

Yes, I was reaching the same conclusion.
This API should definitely work for Bug 887780, I'll have a patch up very soon.
Flags: needinfo?(smacleod)
After discussing with smacleod, we identified two changes that need to happen before we can make it work for bug 887780:
1. we should also monitor profile-before-change2;
2. at least profile-before-change and profile-before-change2 should be AsyncShutdown blockers.
Let's make it possible to block profile-before-change2 in AsyncShutdown.
Attachment #810513 - Flags: review?(nfroyd)
(In reply to David Rajchenbach Teller [:Yoric] from comment #34)
> After discussing with smacleod, we identified two changes that need to
> happen before we can make it work for bug 887780:
> 1. we should also monitor profile-before-change2;
> 2. at least profile-before-change and profile-before-change2 should be
> AsyncShutdown blockers.

Will profile-before-change2 fire after profile-before-change at shutdown?
Flags: needinfo?(dteller)
(In reply to Michael Brennan from comment #36)
> Will profile-before-change2 fire after profile-before-change at shutdown?

Indeed.
Flags: needinfo?(dteller)
Attached patch CrashMonitor v6 (obsolete) — Splinter Review
Added profile-before-change2 notification and AsyncShutdown blockers
Attachment #793082 - Attachment is obsolete: true
Attachment #811054 - Flags: review?(dteller)
Comment on attachment 811054 [details] [diff] [review]
CrashMonitor v6

Review of attachment 811054 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/crashmonitor/CrashMonitor.jsm
@@ +39,5 @@
> +  checkpoints: {},
> +
> +  /* Promises for AsyncShutdown blockers */
> +  profileBeforeChangePromise: Promise.defer(),
> +  profileBeforeChangePromise2: Promise.defer(),

Nit: Technically, it's not a Promise but a Deferred.

@@ +121,5 @@
> +
> +    // Add shutdown blockers for profile-before-change and
> +    // profile-before-change2
> +    AsyncShutdown.profileBeforeChange.addBlocker(
> +      "CrashMonitor: Writing received notifications to file",

We use the blocker names for error reporting, so you should provide distinct strings.

@@ +156,5 @@
> +        }
> +      });
> +
> +      if (aTopic == "profile-before-change")
> +        CrashMonitorInternal.profileBeforeChangePromise.resolve(task);

No, you should only resolve these deferred after the writeAtomic is complete (or has failed).
Attachment #811054 - Flags: review?(dteller) → feedback+
Attached patch CrashMonitor v7 (obsolete) — Splinter Review
Applied feedback
Attachment #811054 - Attachment is obsolete: true
Attachment #811074 - Flags: review?(dteller)
Comment on attachment 811074 [details] [diff] [review]
CrashMonitor v7

Review of attachment 811074 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me.
Steven, can you testdrive this updated patch for bug 887780?

::: toolkit/components/crashmonitor/CrashMonitor.jsm
@@ +39,5 @@
> +  checkpoints: {},
> +
> +  /* Deferreds for AsyncShutdown blockers */
> +  profileBeforeChangePromise: Promise.defer(),
> +  profileBeforeChangePromise2: Promise.defer(),

Nit: Can you rename this |profileBeforeChangeDeferred|?

@@ +95,5 @@
> +   * property name was received at least once last session. If no
> +   * previous checkpoint file exists this will resolve to an empty
> +   * object.
> +   */
> +  previousCheckpoints: null,

Just to be on the safe side, it would be a good idea to 1/ move previousCheckpoints to CrashMonitorInternal; 2/ make this a getter; 3/ Object.freeze(this.CrashMonitor).
Attachment #811074 - Flags: review?(dteller) → review+
Comment on attachment 810513 [details] [diff] [review]
Adding support for profile-before-change2 in AsyncShutdown

Review of attachment 810513 [details] [diff] [review]:
-----------------------------------------------------------------

Yuck.  I am not a fan of exposing this to a wider audience.  Why do we need this here?  Reading through bug 887780, it sounds like AsyncShutdown doesn't really change what the crash monitor has been doing all along: the crash monitor has never been able to guarantee that when it receives "profile-before-change" that that means "profile-before-change" is all done...
Good point. We should probably just add a notification "final-sessionstore-written" or something like this.
Comment on attachment 810513 [details] [diff] [review]
Adding support for profile-before-change2 in AsyncShutdown

Review of attachment 810513 [details] [diff] [review]:
-----------------------------------------------------------------

Canceling review, then.
Attachment #810513 - Flags: review?(nfroyd)
No longer blocks: 853779
What's the status here? Bug 921581 has been fixed, can we move this forward?
Michael, do you have everything you need to proceed?
Flags: needinfo?(brennan.brisad)
Michael, we'll need an additional event: "sessionstore-final-state-write-complete"
Sure, but I'm currently on vacation.  I'll do it as soon as I get back, which is in one week.
Flags: needinfo?(brennan.brisad)
Attached patch CrashMonitor v8 (obsolete) — Splinter Review
Applied previous feedback and added new event 'sessionstore-final-state-write-complete'.

I didn't know if profile-before-change2 and its associated blocker should stay or not, so I let it be and simply added the new event to the list. Let me know if there's anything I missed.
Attachment #811074 - Attachment is obsolete: true
Attachment #832928 - Flags: review?(dteller)
Is there a plan for migration to the Crash Monitor?

For example, in Bug 887780 I'm switching Session Store to use the Crash Monitor to decide if we will restore automatically at startup. If an upgrade happens which includes both the Crash Monitor, and the switch to Session Store, for the first run the Crash Monitor will just return |{}| with no checkpoints.

If Session Store is naively checking |checkpoints['sessionstore-final-state-write-complte']|, it will think we have crashed when it might just be a first run.

A workaround might just be to check |checkpoints['sessionstore-final-state-write-complete'] || !checkpoints['final-ui-startup']|, so if the 'final-ui-startup' checkpoint wasn't hit last session, assume it's the first run. This seems pretty hacky to me though, and I haven't verified if this will always result in the correct behavior.

Do you have any thoughts on this?
Flags: needinfo?(dteller)
(In reply to Steven MacLeod [:smacleod] from comment #50)
> For example, in Bug 887780 I'm switching Session Store to use the Crash
> Monitor to decide if we will restore automatically at startup. If an upgrade
> happens which includes both the Crash Monitor, and the switch to Session
> Store, for the first run the Crash Monitor will just return |{}| with no
> checkpoints.
> 
> If Session Store is naively checking
> |checkpoints['sessionstore-final-state-write-complte']|, it will think we
> have crashed when it might just be a first run.
> 

I am not sure how an upgrade works, but to distinguish between a first run and a crash, Crash Monitor will give a |null| if it is unable to read the checkpoints file.  If |{}| is returned it should be interpreted as an instant crash.  I see now that I forgot to update the comments to reflect that, so I'll fix it.
Comment on attachment 832928 [details] [diff] [review]
CrashMonitor v8

Review of attachment 832928 [details] [diff] [review]:
-----------------------------------------------------------------

We'll need to refine a little, but I believe that we're getting close.

::: toolkit/components/crashmonitor/CrashMonitor.jsm
@@ +23,5 @@
> +  "profile-before-change",
> +  "profile-before-change2",
> +  "sessionstore-final-state-write-complete"
> +];
> +

Could you add a global documentation explaining what the crash monitor is for and what it does?

@@ +63,5 @@
> +   */
> +  path: OS.Path.join(OS.Constants.Path.profileDir, "sessionCheckpoints.json"),
> +
> +  /**
> +   * Load checkpoints from previous session.

Please mention that the loading is asynchronous.

@@ +67,5 @@
> +   * Load checkpoints from previous session.
> +   *
> +   * Create the |CrashMonitorInternal.previousCheckpoints| promise and
> +   * spawn a task which will read the previous checkpoints from disk
> +   * and resolve the promise.

That sentence is not very useful.

@@ +69,5 @@
> +   * Create the |CrashMonitorInternal.previousCheckpoints| promise and
> +   * spawn a task which will read the previous checkpoints from disk
> +   * and resolve the promise.
> +   *
> +   * @return {Promise} The promise returned by the spawned task

"A promise that resolves/reject once loading is complete"

@@ +71,5 @@
> +   * and resolve the promise.
> +   *
> +   * @return {Promise} The promise returned by the spawned task
> +   */
> +  loadPreviousCheckpoints: function CM_loadPreviousCheckpoints() {

You can now get rid of the name |CM_loadPreviousCheckpoints|. Same thing for the other methods.

@@ +89,5 @@
> +        deferred.reject(ex);
> +        Cu.reportError(ex);
> +        throw ex;
> +      }
> +    });

On second thought, you should probably resolve to |null| in case of error, regardless of the error.
So, something along the lines of:

try {
  let decoder = ...
} catch (ex if ex ...) {
  return null;
} catch (ex) {
  Cu.reportError("Error while loading crash monitor data: " + ex);
  return null;
}

without any reference to |deferred|.

@@ +102,5 @@
> +  /**
> +   * Notifications received during previous session.
> +   *
> +   * Promise which is only valid after |init|. See
> +   * |CrashMonitorInternal| for details.

Please don't send people to the implementation for the documentation.
Rather, tell them what this returns and what the data structure contains.

@@ +104,5 @@
> +   *
> +   * Promise which is only valid after |init|. See
> +   * |CrashMonitorInternal| for details.
> +   */
> +  get previousCheckpoints() { return CrashMonitorInternal.previousCheckpoints },

This should throw an error if |init| hasn't been called yet.
Also, newline after braces, please

@@ +158,5 @@
> +        try {
> +          let data = JSON.stringify(CrashMonitorInternal.checkpoints);
> +          yield OS.File.writeAtomic(
> +            CrashMonitorInternal.path,
> +            data, {tmpPath: CrashMonitorInternal.path + ".tmp"});

Could you add a comment mentioning that we're doing this off the main thread and asynchronously for performance reason, so we don't have a 100% guarantee that the file is written by the time the notification completes, except for profile-before-change and profile-before-change2 because we add a blocker?

@@ +161,5 @@
> +            CrashMonitorInternal.path,
> +            data, {tmpPath: CrashMonitorInternal.path + ".tmp"});
> +        } catch (ex) {
> +          Cu.reportError(ex);
> +        }

With a few recent changes to Promise, this try/catch is not necessary anymore.

@@ +168,5 @@
> +      if (aTopic == "profile-before-change") {
> +        CrashMonitorInternal.profileBeforeChangeDeferred.resolve(task);
> +      } else if (aTopic == "profile-before-change2") {
> +        CrashMonitorInternal.profileBeforeChange2Deferred.resolve(task);
> +      }

On the other hand, this check should go in a |finally| inside the task, otherwise we're going to resolve before the operation is complete.

::: toolkit/components/crashmonitor/nsCrashMonitor.js
@@ +4,5 @@
> +let Scope = {}
> +Components.utils.import("resource://gre/modules/CrashMonitor.jsm", Scope);
> +let MonitorAPI = Scope.CrashMonitor;
> +
> +const CLASS_ID = Components.ID("{d9d75e86-8f17-4c57-993e-f738f0d86d42}");

Could you inline this in the definition of field |classID| below?

@@ +10,5 @@
> +function CrashMonitor() {};
> +
> +CrashMonitor.prototype = {
> +
> +  classID: CLASS_ID,

Could you add the contractID, too?
Attachment #832928 - Flags: review?(dteller) → feedback+
(In reply to Steven MacLeod [:smacleod] from comment #50)
> Is there a plan for migration to the Crash Monitor?
> 
> For example, in Bug 887780 I'm switching Session Store to use the Crash
> Monitor to decide if we will restore automatically at startup. If an upgrade
> happens which includes both the Crash Monitor, and the switch to Session
> Store, for the first run the Crash Monitor will just return |{}| with no
> checkpoints.

On first use, it will return |null| instead of |{}|. I believe that this fulfills your need, doesn't it?

When/if we upgrade the crash monitor to add new events, we will need to do something slightly smarter, though.
Flags: needinfo?(dteller)
Attached patch CrashMonitor v9 (obsolete) — Splinter Review
Attachment #832928 - Attachment is obsolete: true
Attachment #8335396 - Flags: review?(dteller)
Comment on attachment 8335396 [details] [diff] [review]
CrashMonitor v9

Review of attachment 8335396 [details] [diff] [review]:
-----------------------------------------------------------------

The work in Bug 887780 is pretty much finished up, so there now exists an implemented use of the Crash Monitor. It seems to work great for Session Store's purposes.

::: toolkit/components/crashmonitor/CrashMonitor.jsm
@@ +48,5 @@
> +  "quit-application",
> +  "profile-change-net-teardown",
> +  "profile-change-teardown",
> +  "profile-before-change",
> +  "profile-before-change2",

I was under the impression we would be removing "profile-before-change2", as it's not something we really want to have a wide audience relying on (and the session restore case is now covered). Yoric, was there a reason you'd like to keep this in?

@@ +58,5 @@
> +  /**
> +   * Notifications received during the current session.
> +   *
> +   * Object where a property with a value of |true| means that the
> +   * notification with the same name as the property name has been

I think "... means that the notification of the same name has been ..." is more succinct.

@@ +61,5 @@
> +   * Object where a property with a value of |true| means that the
> +   * notification with the same name as the property name has been
> +   * received at least once by the CrashMonitor during this
> +   * session. Notifications that have not yet been received are not
> +   * present as properties. |NOTIFICATIONS| lists all possible

"...|NOTIFICATIONS| lists the notifications tracked..." would be more accurate (It's "possible" to add more and track them).

::: toolkit/components/crashmonitor/moz.build
@@ +7,5 @@
> +EXTRA_JS_MODULES = [
> +    'CrashMonitor.jsm',
> +]
> +
> +MODULE = 'crashmonitor'

This breaks building for me on latest m-c:

"
 0:13.45 The error was triggered on line 11 of this file:
 0:13.45 
 0:13.45     MODULE = 'crashmonitor'
 0:13.45 
 0:13.45 The underlying problem is an attempt to write a reserved UPPERCASE variable that does not exist.
 0:13.45 
 0:13.45 The variable write causing the error is:
 0:13.45 
 0:13.45     MODULE
 0:13.45 
 0:13.45 Please change the file to not use this variable.
 0:13.45 
 0:13.45 For reference, the set of valid variables is:
 0:13.46 
 0:13.46 A11Y_MANIFESTS, ANDROID_GENERATED_RESFILES, ANDROID_RESFILES, BROWSER_CHROME_MANIFESTS, CONFIGURE_DEFINE_FILES, CONFIGURE_SUBST_FILES, CPP_UNIT_TESTS, DEFINES, DIRS, DIST_SUBDIR, EXPORTS, EXPORT_LIBRARY, EXTERNAL_MAKE_DIRS, EXTRA_COMPONENTS, EXTRA_JS_MODULES, EXTRA_PP_COMPONENTS, EXTRA_PP_JS_MODULES, FAIL_ON_WARNINGS, FINAL_LIBRARY, FINAL_TARGET, FORCE_SHARED_LIB, FORCE_STATIC_LIB, GENERATED_EVENTS_WEBIDL_FILES, GENERATED_INCLUDES, GENERATED_SOURCES, GENERATED_UNIFIED_SOURCES, GENERATED_WEBIDL_FILES, GTEST_SOURCES, HOST_LIBRARY_NAME, HOST_PROGRAM, HOST_SIMPLE_PROGRAMS, HOST_SOURCES, IPDL_SOURCES, IS_COMPONENT, JAVA_JAR_TARGETS, JS_MODULES_PATH, LIBRARY_NAME, LIBS, LIBXUL_LIBRARY, LOCAL_INCLUDES, METRO_CHROME_MANIFESTS, MOCHITEST_CHROME_MANIFESTS, MOCHITEST_MANIFESTS, MOCHITEST_WEBAPPRT_CHROME_MANIFESTS, MSVC_ENABLE_PGO, NO_DIST_INSTALL, NO_VISIBILITY_FLAGS, OS_LIBS, PARALLEL_DIRS, PARALLEL_EXTERNAL_MAKE_DIRS, PREPROCESSED_TEST_WEBIDL_FILES, PREPROCESSED_WEBIDL_FILES, PROGRAM, SDK_LIBRARY, SIMPLE_PROGRAMS, SOURCES, TEST_DIRS, TEST_TOOL_DIRS, TEST_WEBIDL_FILES, TIERS, TOOL_DIRS, UNI
"
Attachment #8335396 - Flags: feedback+
Comment on attachment 793106 [details] [diff] [review]
crashmon-tests-v1.patch

Review of attachment 793106 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/crashmonitor/test/moz.build
@@ +3,5 @@
> +# This Source Code Form is subject to the terms of the Mozilla Public
> +# License, v. 2.0. If a copy of the MPL was not distributed with this
> +# file, You can obtain one at http://mozilla.org/MPL/2.0/.
> +
> +XPCSHELL_TESTS_MANIFESTS += ['unit/xpcshell.ini']

This moz.build file shouldn't exist. Please put test manifest declarations in the parent directory and omit the nearly-empty moz.build file.
Comment on attachment 8335396 [details] [diff] [review]
CrashMonitor v9

Review of attachment 8335396 [details] [diff] [review]:
-----------------------------------------------------------------

r=me, once you have applied Steve's feedback.
We'll also want a review from gps regarding the moz.build stuff.

::: toolkit/components/crashmonitor/CrashMonitor.jsm
@@ +48,5 @@
> +  "quit-application",
> +  "profile-change-net-teardown",
> +  "profile-change-teardown",
> +  "profile-before-change",
> +  "profile-before-change2",

Ah, Steve is right, let's remove profile-before-change2. We'll restore it if needed when needed.
Attachment #8335396 - Flags: review?(dteller) → review+
Attached patch CrashMonitor v10 (obsolete) — Splinter Review
Applied Steve's feedback

Is profile-before-change still the last notification we observe?
Attachment #810513 - Attachment is obsolete: true
Attachment #8335396 - Attachment is obsolete: true
Attached patch CrashMonitor tests v2 (obsolete) — Splinter Review
Removed unnecessary moz.build
Attachment #793106 - Attachment is obsolete: true
Attachment #8336251 - Flags: review?(gps)
Comment on attachment 8336251 [details] [diff] [review]
CrashMonitor tests v2

Review of attachment 8336251 [details] [diff] [review]:
-----------------------------------------------------------------

Just the build bits.
Attachment #8336251 - Flags: review?(gps) → review+
(In reply to Michael Brennan from comment #58)
> Created attachment 8336246 [details] [diff] [review]
> CrashMonitor v10
> 
> Applied Steve's feedback
> 
> Is profile-before-change still the last notification we observe?

Good question. It's probably a good idea to not assume this and remove observers only when *all* notifications have been received.
Attached patch CrashMonitor v11 (obsolete) — Splinter Review
Unregister observers only when all notifications have been received.

If this gets a r+, I'll add the checkin-needed flag, no?
Attachment #8336246 - Attachment is obsolete: true
Attachment #8340453 - Flags: review?(dteller)
Comment on attachment 8340453 [details] [diff] [review]
CrashMonitor v11

Review of attachment 8340453 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me.
Let's mark this checkin-needed!
Attachment #8340453 - Flags: review?(dteller) → review+
Keywords: checkin-needed
Backed out for xpcshell failures. Please run this through Try before requesting checkin again.
https://hg.mozilla.org/integration/mozilla-inbound/rev/5dc0d8f19d26

https://tbpl.mozilla.org/php/getParsedLog.php?id=31324689&tree=Mozilla-Inbound
Flags: in-testsuite+
Sorry about that. My bad.

I'll fix it and run on Try.
Hey Michael, any updates on this? Can we help you with figuring out test failures or anything?
Status: NEW → ASSIGNED
Hi Tim, yeah I had missed to include the crash monitor in package-manifest.in.  I added it to the file and that seemed to do the trick.  I just had some problems using and understanding Try together with multiple patches but I think it's sorted out now.

I hope I'll be able to attach the latest patch later today.
Good to hear, thanks!
Attached patch CrashMonitor v12 (obsolete) — Splinter Review
Two lines added to package-manifest.in.  I hope they are correct and in the right section, but they did fix the failing tests.

https://tbpl.mozilla.org/?tree=Try&rev=a3227f9d53d2
Attachment #8340453 - Attachment is obsolete: true
Attachment #8345452 - Flags: review?(dteller)
Attachment #8336251 - Attachment is obsolete: true
Comment on attachment 8345452 [details] [diff] [review]
CrashMonitor v12

I believe that these changes are part of the build reviews, so redirecting to gps.
Attachment #8345452 - Flags: review?(dteller) → review?(gps)
Attachment #8345452 - Flags: review?(ted)
Comment on attachment 8345452 [details] [diff] [review]
CrashMonitor v12

Review of attachment 8345452 [details] [diff] [review]:
-----------------------------------------------------------------

Build bits r+ modulo the XPIDL_MODULE presumed cargo cult.

::: toolkit/components/crashmonitor/moz.build
@@ +7,5 @@
> +EXTRA_JS_MODULES = [
> +    'CrashMonitor.jsm',
> +]
> +
> +XPIDL_MODULE = 'crashmonitor'

You don't need to define XPIDL_MODULE since you don't define any XPIDL_SOURCES.

::: toolkit/components/crashmonitor/nsCrashMonitor.js
@@ +1,1 @@
> +Components.utils.import("resource://gre/modules/XPCOMUtils.jsm");

Need moar MPL.
Attachment #8345452 - Flags: review?(gps) → review+
Attachment #8345452 - Flags: review?(ted)
Attached patch CrashMonitor v13Splinter Review
Added MPL header and removed XPIDL_MODULE
Attachment #8345452 - Attachment is obsolete: true
Keywords: checkin-needed
https://hg.mozilla.org/integration/fx-team/rev/dfdfcd1c37ff
Keywords: checkin-needed
Whiteboard: [mentor=Yoric][lang=js][mentored bug but not a simple bug] → [mentor=Yoric][lang=js][mentored bug but not a simple bug][fixed-in-fx-team]
https://hg.mozilla.org/mozilla-central/rev/dfdfcd1c37ff
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [mentor=Yoric][lang=js][mentored bug but not a simple bug][fixed-in-fx-team] → [mentor=Yoric][lang=js][mentored bug but not a simple bug]
Target Milestone: --- → mozilla29
Didn't push CrashMonitor tests because I assumed the patch for checkin was folded :(

https://hg.mozilla.org/integration/fx-team/rev/0ecaf7b5bc92
(In reply to Tim Taubert [:ttaubert] from comment #77)
> Didn't push CrashMonitor tests because I assumed the patch for checkin was
> folded :(
> 
> https://hg.mozilla.org/integration/fx-team/rev/0ecaf7b5bc92

I was about to comment about that earlier.  But then I found the patch in m-c and thought it had made it in somehow.  Turns out it was the earlier backed out patch I found.

For the future, should I combine patches into one before checkin?
(In reply to Michael Brennan from comment #78)
> For the future, should I combine patches into one before checkin?

Yes, it would be great to have a "patch for checkin" that has everything folded together. But this was really my fault, your patch didn't say anything about being for checkin and I should have checked that.
https://hg.mozilla.org/mozilla-central/rev/0ecaf7b5bc92 - automated tests

Is there anything in this new feature that needs QA at this moment?
Flags: needinfo?(dteller)
Flags: in-testsuite+
Can't think of anything.
Flags: needinfo?(dteller)
Whiteboard: [mentor=Yoric][lang=js][mentored bug but not a simple bug] → [mentor=Yoric][lang=js][mentored bug but not a simple bug][qa-]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: