Use AsyncShutdown instead of spinning the event loop in healthreport.jsm

RESOLVED FIXED in Firefox 32

Status

Firefox Health Report
Client: Desktop
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: Yoric, Assigned: Yoric)

Tracking

(Blocks: 1 bug)

unspecified
Firefox 32
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [Async:team][Async Shutdown])

Attachments

(1 attachment, 8 obsolete attachments)

39.71 KB, patch
Yoric
: review+
Details | Diff | Splinter Review
We need to consolidate the shutdown procedure of healthreport.jsm into AsyncShutdown. This should simplify the code, improve shutdown debugging information, and minimize the risk of introducing deadlocks with misbehaving add-ons.
Created attachment 807177 [details] [diff] [review]
Shutting down FHR without spinning the event loop

Here is a first prototype, which passes tests.
Gps, can you tell me what you think of this refactoring? Is that compatible with the design of FHR?
Assignee: nobody → dteller
Attachment #807177 - Flags: feedback?(gps)
Comment on attachment 807177 [details] [diff] [review]
Shutting down FHR without spinning the event loop

In the absence of news from gps, adding f? rnewman.
Attachment #807177 - Flags: feedback?(rnewman)

Comment 3

5 years ago
Comment on attachment 807177 [details] [diff] [review]
Shutting down FHR without spinning the event loop

Review of attachment 807177 [details] [diff] [review]:
-----------------------------------------------------------------

I think this looks good. I'll have to look into the implementation of AsyncShutdown to be sure we're not giving up any robustness guarantees as part of this change.

::: services/healthreport/healthreporter.jsm
@@ +580,5 @@
>  
> +      this._log.warn("Shutdown complete.");
> +      this._shutdownComplete = true;
> +      this._deferredShutdown.resolve();
> +      TelemetryStopwatch.finish(TELEMETRY_SHUTDOWN, this);

I think we may want to surround this entire function with a try..catch so there is no opportunity for deferredShutdown to not resolve.

How does AsyncShutdown handle an unresolved deferredShutdown?

@@ +586,5 @@
>    },
>  
> +  _onStorageClose: function() {
> +    // Do nothing.
> +    // This method is overloaded by the test suite.

This comment is misleading because it is incomplete.

The purpose of this function is to provide a "hook" point so the tests can verify certain actions occurred and so errors can be injected at certain points. Although, we may not always be using the latter.

These functions contribute to a robust test suite testing many failure modes. We either need to inline the functionality the tests need or we preserve these APIs. I vote for the latter because hook points are generic and don't require code modifications to facilitate new kinds of tests.

@@ +591,5 @@
>    },
>  
> +  _onProviderManagerShutdown: function() {
> +    // Do nothing.
> +    // This method is overloaded by the test suite.

Ditto.
Attachment #807177 - Flags: feedback?(gps) → feedback+
(In reply to Gregory Szorc [:gps] from comment #3)
> Comment on attachment 807177 [details] [diff] [review]
> Shutting down FHR without spinning the event loop
> 
> Review of attachment 807177 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I think this looks good. I'll have to look into the implementation of
> AsyncShutdown to be sure we're not giving up any robustness guarantees as
> part of this change.
> 
> ::: services/healthreport/healthreporter.jsm
> @@ +580,5 @@
> >  
> > +      this._log.warn("Shutdown complete.");
> > +      this._shutdownComplete = true;
> > +      this._deferredShutdown.resolve();
> > +      TelemetryStopwatch.finish(TELEMETRY_SHUTDOWN, this);
> 
> I think we may want to surround this entire function with a try..catch so
> there is no opportunity for deferredShutdown to not resolve.
> 
> How does AsyncShutdown handle an unresolved deferredShutdown?

Once bug 917764 has landed, it will crash after one minute of inactivity, with a report detailing which blockers have failed to resolve/reject.

> @@ +586,5 @@
> >    },
> >  
> > +  _onStorageClose: function() {
> > +    // Do nothing.
> > +    // This method is overloaded by the test suite.
> 
> This comment is misleading because it is incomplete.

Sure. Still better than what I had to work with, though :)

But I'll add something more precise – unless you want to take over that bug.
Comment on attachment 807177 [details] [diff] [review]
Shutting down FHR without spinning the event loop

gps showed up :D
Attachment #807177 - Flags: feedback?(rnewman)
Created attachment 818925 [details] [diff] [review]
Shutting down FHR without spinning the event loop, v2

Applied feedback.
Attachment #807177 - Attachment is obsolete: true
Attachment #818925 - Flags: review?(gps)

Comment 8

5 years ago
Comment on attachment 818925 [details] [diff] [review]
Shutting down FHR without spinning the event loop, v2

Review of attachment 818925 [details] [diff] [review]:
-----------------------------------------------------------------

r+ conditional on removing references to the removed telemetry probe.

::: services/healthreport/healthreporter.jsm
@@ -621,5 @@
> -      this._shutdownCompleteCallback = Async.makeSpinningCallback();
> -      this._shutdownCompleteCallback.wait();
> -      this._shutdownCompleteCallback = null;
> -    } finally {
> -      TelemetryStopwatch.finish(TELEMETRY_SHUTDOWN_DELAY, this);

Since we're removing TELEMETRY_SHUTDOWN_DELAY, I reckon we should remove the const from this file and the probe from TelemetryHistograms.json.
Attachment #818925 - Flags: review?(gps) → review+
Created attachment 820223 [details] [diff] [review]
Shutting down FHR without spinning the event loop, v3

Applied feedback.
Try: https://tbpl.mozilla.org/?tree=Try&rev=fa3202ba25c3
Attachment #818925 - Attachment is obsolete: true
Keywords: checkin-needed
Another try run had crashes, believe it's this bug (out of the 5 therein):
https://tbpl.mozilla.org/?tree=Try&rev=2c426f843d08
https://tbpl.mozilla.org/php/getParsedLog.php?id=29570855&full=1&branch=try#error0
Keywords: checkin-needed
Try: https://tbpl.mozilla.org/?tree=Try&rev=9087e350ce2e

For the moment, I cannot reproduce the crashes, so they may be unrelated to this bug.
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/f4449a06e97f
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 28
Depends on: 931642
Backed out for causing bug 931642.
https://hg.mozilla.org/integration/mozilla-inbound/rev/ff49c9feb466
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: Firefox 28 → ---
Whiteboard: [Async][Async Shutdown] → [Async:team][Async Shutdown]
Are we going to reland this?
Flags: needinfo?(dteller)
I'm working on it semi-actively, but so far I haven't found the root of the problem: some sqlite database that looks closed is not always closed by the time we hit shutdown.
Flags: needinfo?(dteller)
Created attachment 8394434 [details] [diff] [review]
Shutting down FHR without spinning the event loop, v4

A new version that apparently doesn't cause bug 931642.

https://tbpl.mozilla.org/?tree=Try&rev=738f8442bd3b
https://tbpl.mozilla.org/?tree=Try&rev=0024f6da9249
Attachment #820223 - Attachment is obsolete: true
Attachment #8394434 - Flags: review?(gps)
Comment on attachment 8394434 [details] [diff] [review]
Shutting down FHR without spinning the event loop, v4

Since gps is busy, rnewman might be available.
Attachment #8394434 - Flags: review?(rnewman)
Comment on attachment 8394434 [details] [diff] [review]
Shutting down FHR without spinning the event loop, v4

Review of attachment 8394434 [details] [diff] [review]:
-----------------------------------------------------------------

I don't have the bandwidth to give this adequate review right now, and gps is better qualified to spot any errors in this regardless.

If you're confident that this is an incremental change over the version you landed (interdiff won't tell me) to fix the errors without introducing new problems, and you've done enough manual testing to be confident, you may consider my f+ to be rs=me.

::: services/healthreport/healthreporter.jsm
@@ +121,5 @@
>  
>        try {
>          this._s = yield CommonUtils.readJSON(this._filename);
> +      } catch (ex if ex instanceof OS.File.Error
> +                  && ex.becauseNoSuchFile) {

Nit: && at end of line.

@@ +328,5 @@
> +      } catch (ex) {
> +        this._log.error("Error deleting last payload: " +
> +                        CommonUtils.exceptionStr(ex));
> +      }
> +      // As soon as we have could storage, we need to register cleanup or

s/have could/could have

@@ +331,5 @@
> +      }
> +      // As soon as we have could storage, we need to register cleanup or
> +      // else bad things happen on shutdown.
> +      Services.obs.addObserver(this, "quit-application", false);
> +      // The database needs to be shut down by the end of shutdown

Newlines before comments throughout, please. (Yeah, I know this is copypasta, but let's fix while we're breaking blame.)
Attachment #8394434 - Flags: review?(rnewman) → feedback+
Well, this is blocking me and neither gps nor you has the bandwidth. How do you suggest I proceed?
(In reply to David Rajchenbach Teller [:Yoric] (please use "needinfo?" - I'll be away on April 9th-16th) from comment #20)
> Well, this is blocking me and neither gps nor you has the bandwidth. How do
> you suggest I proceed?

I gave some suggestions on how to proceed in the second paragraph of Comment 19, but I'm now back from my PTO. I will try to find time to give this full attention later today or tomorrow.
Thanks, I appreciate.
Comment on attachment 8394434 [details] [diff] [review]
Shutting down FHR without spinning the event loop, v4

Review of attachment 8394434 [details] [diff] [review]:
-----------------------------------------------------------------

Back to you with some questions, David.

::: services/healthreport/healthreporter.jsm
@@ +311,5 @@
>      }
>  
>      this._initializeStarted = true;
>  
> +    return Task.spawn(function*() {

function* is incredibly poorly documented:

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/function*

so I honestly have no idea if this code is correct. For example, line 354 is simply:

    return;

which seems incorrect if init() is supposed to return a promise.

Could you point me to better documentation?

@@ +322,5 @@
> +          yield this._deleteOldLastPayload();
> +          this._state._s.removedOutdatedLastpayload = true;
> +          // Normally we should save this to a file but it directly conflicts with
> +          //  the "application re-upgrade" decision in HealthReporterState::init()
> +          //  which specifically does not save the state to a file.

Remove double spaces.

@@ +355,5 @@
> +        }
> +
> +        yield this._initializeProviderManager();
> +        yield this._onProviderManagerInitialized();
> +        this._initializedDeferred.resolve();

So we no longer call and return onInit(), which means that _initializedDeferred is not the promise that's returned from this method, right? That seems wrong.

@@ +570,4 @@
>    },
>  
> +  onInit: function() {
> +    return this._initializedDeferred.promise;

This is no longer called.
Attachment #8394434 - Flags: review?(gps)
(In reply to Richard Newman [:rnewman] from comment #23)
> >      this._initializeStarted = true;
> >  
> > +    return Task.spawn(function*() {
> 
> function* is incredibly poorly documented:
> 
> https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/
> function*
> 
> so I honestly have no idea if this code is correct. For example, line 354 is
> simply:
> 
>     return;
> 
> which seems incorrect if init() is supposed to return a promise.
> 
> Could you point me to better documentation?

A function* always returns an iterator, regardless of its return statements. In turn, Task.spawn returns a Promise, regardless of what the function*() returns.

Doing |return somePromise;| in a |Task.spawn(function*() { ... }| is essentially a shorthand for

 let value = yield somePromise;
 return value;

> @@ +355,5 @@
> > +        }
> > +
> > +        yield this._initializeProviderManager();
> > +        yield this._onProviderManagerInitialized();
> > +        this._initializedDeferred.resolve();
> 
> So we no longer call and return onInit(), which means that
> _initializedDeferred is not the promise that's returned from this method,
> right? That seems wrong.

The two promises are pretty much equivalent, since both of them are resolved immediately after this._initializedDeferred, but right, let's lift the ambiguity by calling onInit() again.
Created attachment 8408247 [details] [diff] [review]
Shutting down FHR without spinning the event loop, v5

Applied feedback.
Attachment #8394434 - Attachment is obsolete: true
Attachment #8408247 - Flags: review?(rnewman)
Comment on attachment 8408247 [details] [diff] [review]
Shutting down FHR without spinning the event loop, v5

Review of attachment 8408247 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good to me. Fingers crossed.
Attachment #8408247 - Flags: review?(rnewman) → review+
Yoric: please make sure that the xpcshell tests pass -- that Try push didn't run 'em.
Created attachment 8414638 [details] [diff] [review]
Shutting down FHR without spinning the event loop, v6

Same code, plus unbitrotting.
Try: https://tbpl.mozilla.org/?tree=Try&rev=7d7771ab0c6f
Attachment #8408247 - Attachment is obsolete: true
Attachment #8414638 - Flags: review+
Keywords: checkin-needed
https://hg.mozilla.org/integration/fx-team/rev/1601c1a2cbaf
Keywords: checkin-needed
Whiteboard: [Async:team][Async Shutdown] → [Async:team][Async Shutdown][fixed-in-fx-team]
This was also apparently the cause for Valgrind failures like the below:
https://tbpl.mozilla.org/php/getParsedLog.php?id=38754765&tree=Fx-Team
Weird.
I guess it could be a case of bug 985655.
Indeed, looking more carefully, it can be a case of bug 985655 aka "shutdown is very fragile, any change can break stuff." Let's see if we can make this specific part of shutdown more robust.
Depends on: 985655
Created attachment 8420986 [details] [diff] [review]
Shutting down FHR without spinning the event loop, v7

New patch, now a dependency on the brand new Sqlite.shutdown instead of the coarser grained AsyncShutdown.profileBeforeChange.
Attachment #8414638 - Attachment is obsolete: true
Created attachment 8420992 [details] [diff] [review]
Shutting down FHR without spinning the event loop, v8

Full patch, this time.
Try: https://tbpl.mozilla.org/?tree=Try&rev=979b032dd4d5
Attachment #8420986 - Attachment is obsolete: true
Attachment #8420992 - Flags: review?(rnewman)
Comment on attachment 8420992 [details] [diff] [review]
Shutting down FHR without spinning the event loop, v8

Review of attachment 8420992 [details] [diff] [review]:
-----------------------------------------------------------------

Something of a rubberstamp, given that I'm not familiar with the Barrier mechanism, but it looks fine to me. I imagine problems will be immediately apparent.

::: services/metrics/storage.jsm
@@ +686,5 @@
>  };
>  
> +// Expose an asynchronous barrier `shutdown` that clients may use to
> +// perform last minute cleanup and shutdown requests before this module
> +// is shutdown.

s/shutdown./shut down.
Attachment #8420992 - Flags: review?(rnewman) → review+
Created attachment 8426885 [details] [diff] [review]
Shutting down FHR without spinning the event loop, v9

Applied feedback.
Attachment #8420992 - Attachment is obsolete: true
Attachment #8426885 - Flags: review+
Keywords: checkin-needed
https://hg.mozilla.org/integration/fx-team/rev/b09dd59cc14a
Keywords: checkin-needed
Whiteboard: [Async:team][Async Shutdown] → [Async:team][Async Shutdown][fixed-in-fx-team]
https://hg.mozilla.org/mozilla-central/rev/b09dd59cc14a
Status: REOPENED → RESOLVED
Last Resolved: 5 years ago4 years ago
Resolution: --- → FIXED
Whiteboard: [Async:team][Async Shutdown][fixed-in-fx-team] → [Async:team][Async Shutdown]
Target Milestone: --- → Firefox 32
Depends on: 1017706
Depends on: 1035290
You need to log in before you can comment on or make changes to this bug.