Closed Bug 1122047 Opened 9 years ago Closed 9 years ago

Sketch out Telemetry environment module

Categories

(Firefox Health Report Graveyard :: Client: Desktop, defect)

defect
Not set
normal

Tracking

(firefox39 fixed)

RESOLVED FIXED
Firefox 39
Tracking Status
firefox39 --- fixed

People

(Reporter: gfritzsche, Assigned: gfritzsche)

References

Details

(Whiteboard: [ready])

Attachments

(2 files, 8 obsolete files)

This is about getting the telemetry environment implementation started:
* we want to add TelemetryEnvironment.jsm
* sketch it out
* add environment change listener interface
* add data collection interface
Assignee: nobody → gfritzsche
Attached patch Sketch v0 (WIP) (obsolete) — Splinter Review
First rough take.
Attached patch Environment module basics, v1 (obsolete) — Splinter Review
This should be a good start. Fleshed more out, integrated, added basic test.

review? pending the Telemetry design discussion.
Attachment #8551897 - Attachment is obsolete: true
Comment on attachment 8554323 [details] [diff] [review]
Environment module basics, v1

This is ready to review.
Note that this is the basic structure for the environment implementation, it will be filled out by bug 1122048, bug 1122050, bug 1122052.
Attachment #8554323 - Flags: review?(vdjeric)
Attached patch Environment module basics, v2 (obsolete) — Splinter Review
Attachment #8554323 - Attachment is obsolete: true
Attachment #8554323 - Flags: review?(vdjeric)
Attachment #8557946 - Flags: review?(vdjeric)
* Fixed not being able to notify listeners twice.
* Added test coverage for that.
Status: NEW → ASSIGNED
Comment on attachment 8557946 [details] [diff] [review]
Environment module basics, v2

Review of attachment 8557946 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetryEnvironment.jsm
@@ +27,5 @@
> +  /**
> +   * Initialize TelemetryEnvironment.
> +   * @return Promise<> that is resolved when initialization is finished.
> +   */
> +  init: Task.async(function* () {

why isn't init an ordinary function?

@@ +37,5 @@
> +  /**
> +   * Shutdown TelemetryEnvironment.
> +   * @return Promise<> that is resolved when shutdown is finished.
> +   */
> +  shutdown: Task.async(function* () {

ditto for shutdown

@@ +50,5 @@
> +   * Register a listener for environment changes.
> +   * @param name The name of the listener - good for debugging purposes.
> +   * @param listener A JS callback function.
> +   */
> +  registerChangeListener: function (name, listener) {

- i don't quite understand the model. who are the listeners going to be? and what are they going to do when they get a change notification?
- for the use cases envisioned, do we need a "topic" so that different listeners listen to different topics?

@@ +70,5 @@
> +  /**
> +   * Get the environment data in object form.
> +   * @return Promise<Object> Resolved with the data on success, otherwise rejected.
> +   */
> +  getEnvironmentData: Task.async(function* () {

same question about why this is Task.async

@@ +91,5 @@
> +  },
> +
> +  _notify: Task.async(function* () {
> +    do {
> +      this._log.debug("_notify - iteration");

roughly, how big are the Telemetry logs currently?
maybe we should only submit the logs if something actually went wrong?

@@ +106,5 @@
> +  }),
> +
> +  _checkForShutdown: function () {
> +    if (this._shutdown) {
> +      throw new Error("Already shut down");

do we really want to throw exceptions at shutdown? seems like a recipe for data loss. why not just do nothing + log an error

::: toolkit/components/telemetry/TelemetryPing.jsm
@@ +379,5 @@
>  
>      }.bind(this), testing ? TELEMETRY_TEST_DELAY : TELEMETRY_DELAY);
>  
> +    AsyncShutdown.sendTelemetry.addBlocker("Telemetry: shutting down",
> +                                           () => this.shutdown());

Disambiguate the two shutdown blockers in Telemetry (telemetrysession vs telemetryenvironment)

::: toolkit/components/telemetry/tests/unit/test_TelemetryEnvironment.js
@@ +33,5 @@
> +  yield TelemetryEnvironment.init();
> +
> +  // Register some listeners
> +  let results = new Array(4).fill(false);
> +  for (let i=0; i<results.length; ++i) {

trivial: change the spacing
Attachment #8557946 - Flags: review?(vdjeric)
(In reply to Vladan Djeric (:vladan) from comment #6)
> Comment on attachment 8557946 [details] [diff] [review]
> Environment module basics, v2
> 
> Review of attachment 8557946 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: toolkit/components/telemetry/TelemetryEnvironment.jsm
> @@ +27,5 @@
> > +  /**
> > +   * Initialize TelemetryEnvironment.
> > +   * @return Promise<> that is resolved when initialization is finished.
> > +   */
> > +  init: Task.async(function* () {
> 
> why isn't init an ordinary function?
> 
[...]
> ditto for shutdown

Because we will have async operations and need to sync with them.
Shutdown is clearer, we need to at least wait for notify() operations to finish, but even parts of the data collection will be async.
I'm not sure yet if we will need to have async operations in startup(), but i think it's easier to say it is async now then later.

> @@ +50,5 @@
> > +   * Register a listener for environment changes.
> > +   * @param name The name of the listener - good for debugging purposes.
> > +   * @param listener A JS callback function.
> > +   */
> > +  registerChangeListener: function (name, listener) {
> 
> - i don't quite understand the model. who are the listeners going to be? and
> what are they going to do when they get a change notification?
> - for the use cases envisioned, do we need a "topic" so that different
> listeners listen to different topics?

The first listener will be TelemetrySession for the subsession split on environment change.
For the use-cases we have now we don't need a "topic". We only pass something like this around internally for logging/understanding.

> @@ +70,5 @@
> > +  /**
> > +   * Get the environment data in object form.
> > +   * @return Promise<Object> Resolved with the data on success, otherwise rejected.
> > +   */
> > +  getEnvironmentData: Task.async(function* () {
> 
> same question about why this is Task.async

We definitely will have async operations in when collection environment data. E.g.:
* getting addons from addon manager
* ProfileTimesAccessor

> @@ +91,5 @@
> > +  },
> > +
> > +  _notify: Task.async(function* () {
> > +    do {
> > +      this._log.debug("_notify - iteration");
> 
> roughly, how big are the Telemetry logs currently?
> maybe we should only submit the logs if something actually went wrong?

We don't store them to disk. By default only log-level warning and error go to the browser console.
Via prefs you can enable more log levels and dump to stdout (useful for info in test logs).

> @@ +106,5 @@
> > +  }),
> > +
> > +  _checkForShutdown: function () {
> > +    if (this._shutdown) {
> > +      throw new Error("Already shut down");
> 
> do we really want to throw exceptions at shutdown? seems like a recipe for
> data loss. why not just do nothing + log an error

The top-level functions that can get this thrown are Tasks. Tasks that catch errors mean that they reject their promise.
It is mostly supposed to be a clean & succinct way to check & reject.
Note that bug 1122050 and bug 1122052 fill this with more in usage, might be good for more context.
Flags: needinfo?(vdjeric)
(In reply to Georg Fritzsche [:gfritzsche] from comment #7)
> Because we will have async operations and need to sync with them.
> Shutdown is clearer, we need to at least wait for notify() operations to
> finish, but even parts of the data collection will be async.
> I'm not sure yet if we will need to have async operations in startup(), but
> i think it's easier to say it is async now then later.

I guess my bigger concern (and perhaps this is a general gripe about Promises/Task.js) is that this patch does something very straightforward (notifies registered listeners on environment change), but it's very hard to follow its control flow and reason about its behavior.

If the initial set of change listeners isn't going to do anything heavy, wouldn't it be better to just call them synchronously and radically simplify this code.

> The first listener will be TelemetrySession for the subsession split on
> environment change.
> For the use-cases we have now we don't need a "topic". We only pass
> something like this around internally for logging/understanding.

So how will the listener in TelemetrySession know which environment variable changed?

> We definitely will have async operations in when collection environment
> data. E.g.:
> * getting addons from addon manager
> * ProfileTimesAccessor

Won't this state already be cached after Firefox initialization? Would there really be an issue with fetching it synchronously?

> We don't store them to disk. By default only log-level warning and error go
> to the browser console.
> Via prefs you can enable more log levels and dump to stdout (useful for info
> in test logs).

I got confused here. I think we should capture Telemetry errors/warnings and report them in pings (as future work).

> The top-level functions that can get this thrown are Tasks. Tasks that catch
> errors mean that they reject their promise.
> It is mostly supposed to be a clean & succinct way to check & reject.

I think getting exceptions from benign looking functions such as unregisterChangeListener is a recipe for data loss, regardless of whether the caller is in a Task or not. Failing to unregister a listener should not prevent the rest of the calling task from executing.
Flags: needinfo?(vdjeric)
Comment on attachment 8557946 [details] [diff] [review]
Environment module basics, v2

Review of attachment 8557946 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetryEnvironment.jsm
@@ +81,5 @@
> +    this._log.trace("_onEnvironmentChange for " + what);
> +
> +    this._doNotify = true;
> +    if (this._notifyTask) {
> +      return this._notifyTask;

so if there are three consecutive environment changes, will all the listeners be notified three times?
(In reply to Vladan Djeric (:vladan) from comment #9)
> (In reply to Georg Fritzsche [:gfritzsche] from comment #7)
> > Because we will have async operations and need to sync with them.
> > Shutdown is clearer, we need to at least wait for notify() operations to
> > finish, but even parts of the data collection will be async.
> > I'm not sure yet if we will need to have async operations in startup(), but
> > i think it's easier to say it is async now then later.
> 
> I guess my bigger concern (and perhaps this is a general gripe about
> Promises/Task.js) is that this patch does something very straightforward
> (notifies registered listeners on environment change), but it's very hard to
> follow its control flow and reason about its behavior.
> 
> If the initial set of change listeners isn't going to do anything heavy,
> wouldn't it be better to just call them synchronously and radically simplify
> this code.

Ok, fine with me as well.

> > The first listener will be TelemetrySession for the subsession split on
> > environment change.
> > For the use-cases we have now we don't need a "topic". We only pass
> > something like this around internally for logging/understanding.
> 
> So how will the listener in TelemetrySession know which environment variable
> changed?

It doesn't right now. I haven't seen a use-case for specifically knowing what triggered it, so i skipped adding it.

> > We definitely will have async operations in when collection environment
> > data. E.g.:
> > * getting addons from addon manager
> > * ProfileTimesAccessor
> 
> Won't this state already be cached after Firefox initialization? Would there
> really be an issue with fetching it synchronously?

We could presumably cache them (say in TelemetryEnvironment.init()), but at least retrieving addons is async anyway.

> > The top-level functions that can get this thrown are Tasks. Tasks that catch
> > errors mean that they reject their promise.
> > It is mostly supposed to be a clean & succinct way to check & reject.
> 
> I think getting exceptions from benign looking functions such as
> unregisterChangeListener is a recipe for data loss, regardless of whether
> the caller is in a Task or not. Failing to unregister a listener should not
> prevent the rest of the calling task from executing.

Ok.

(In reply to Vladan Djeric (:vladan) from comment #10)
> Comment on attachment 8557946 [details] [diff] [review]
> Environment module basics, v2
> 
> Review of attachment 8557946 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: toolkit/components/telemetry/TelemetryEnvironment.jsm
> @@ +81,5 @@
> > +    this._log.trace("_onEnvironmentChange for " + what);
> > +
> > +    this._doNotify = true;
> > +    if (this._notifyTask) {
> > +      return this._notifyTask;
> 
> so if there are three consecutive environment changes, will all the
> listeners be notified three times?

Yes.
Attached patch Environment module basics, v3 (obsolete) — Splinter Review
This should address the feedback so far.
Attachment #8557946 - Attachment is obsolete: true
Attachment #8561456 - Flags: review?(vdjeric)
Comment on attachment 8561456 [details] [diff] [review]
Environment module basics, v3

Review of attachment 8561456 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetryEnvironment.jsm
@@ +29,5 @@
> +  /**
> +   * Initialize TelemetryEnvironment.
> +   * @return Promise<> that is resolved when initialization is finished.
> +   */
> +  init: function () {

should we check for _shutdown here? some Firefox sessions are brief and begin shutting down before the Telemetry initialization timer has had a chance to fire

@@ +105,5 @@
> +    return {};
> +  }),
> +
> +  _onEnvironmentChange: function (what) {
> +    this._log.trace("_onEnvironmentChange for " + what);

do we want to notify listeners of environment changes after shutdown begins? the notified listeners won't even be able to getEnvironmentData after they get the notification
Attachment #8561456 - Flags: review?(vdjeric) → review+
(In reply to Vladan Djeric (:vladan) from comment #13)
> ::: toolkit/components/telemetry/TelemetryEnvironment.jsm
> @@ +29,5 @@
> > +  /**
> > +   * Initialize TelemetryEnvironment.
> > +   * @return Promise<> that is resolved when initialization is finished.
> > +   */
> > +  init: function () {
> 
> should we check for _shutdown here? some Firefox sessions are brief and
> begin shutting down before the Telemetry initialization timer has had a
> chance to fire

I added a check to silently return on double inits.

> 
> @@ +105,5 @@
> > +    return {};
> > +  }),
> > +
> > +  _onEnvironmentChange: function (what) {
> > +    this._log.trace("_onEnvironmentChange for " + what);
> 
> do we want to notify listeners of environment changes after shutdown begins?
> the notified listeners won't even be able to getEnvironmentData after they
> get the notification

We won't notify them. The idea is to setup the listeners etc. that trigger _onEnvironmentChange() in TelemetryEnvironment.init() and unregister them in .shutdown().
We just need to make sure that this module is shutdown late enough so that we can still send or persist e.g. shutdown pings.
Attached patch Environment module basics, v4 (obsolete) — Splinter Review
Attachment #8561456 - Attachment is obsolete: true
Attachment #8562133 - Flags: review+
Whiteboard: [ready]
(In reply to Georg Fritzsche [:gfritzsche] from comment #14)
> (In reply to Vladan Djeric (:vladan) from comment #13)
> > should we check for _shutdown here? some Firefox sessions are brief and
> > begin shutting down before the Telemetry initialization timer has had a
> > chance to fire
> 
> I added a check to silently return on double inits.

I think we misunderstood each other here:

- I was concerned about what happens when shutdown is called before init is *ever* called. This could happen during a very short Firefox session: shutdown is called in response to an event (e.g. profile-before-change), but init() hasn't been called yet because Telemetry does initialization X seconds after startup (this what Telemetry does currently, there's a 60 second initialization timer)
- Your patch handles init being called twice. I think it's best to 
do this by having a separate flag that gets set when TelemetryEnvironment is initialized.
---- Your patch would allow init() to be called after shutdown, making it possible to do init() twice during a single session (which is undesirable)

> > @@ +105,5 @@
> > > +    return {};
> > > +  }),
> > > +
> > > +  _onEnvironmentChange: function (what) {
> > > +    this._log.trace("_onEnvironmentChange for " + what);
> > 
> > do we want to notify listeners of environment changes after shutdown begins?
> > the notified listeners won't even be able to getEnvironmentData after they
> > get the notification
> 
> We won't notify them. The idea is to setup the listeners etc. that trigger
> _onEnvironmentChange() in TelemetryEnvironment.init() and unregister them in
> .shutdown().
> We just need to make sure that this module is shutdown late enough so that
> we can still send or persist e.g. shutdown pings.

Right but what about this sequence of events:

1. shutdown() is called, it sets _shutdown = true and yields on this._collectTask (shutdown task doesn't finish executing)
2. an environment change happens, and _onEnvironment change is called, notifying listeners
3. listeners try to getEnvironmentData() but the check for _shutdown causes it to reject the Promise
4. _collectTask finally finishes, the shutdown task resumes and finally clears _changeListeners

We don't want step #2 to happen
Flags: needinfo?(gfritzsche)
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #16)
> (In reply to Georg Fritzsche [:gfritzsche] from comment #14)
> > (In reply to Vladan Djeric (:vladan) from comment #13)
> > > should we check for _shutdown here? some Firefox sessions are brief and
> > > begin shutting down before the Telemetry initialization timer has had a
> > > chance to fire
> > 
> > I added a check to silently return on double inits.
> 
> I think we misunderstood each other here:
> 
> - I was concerned about what happens when shutdown is called before init is
> *ever* called. This could happen during a very short Firefox session:
> shutdown is called in response to an event (e.g. profile-before-change), but
> init() hasn't been called yet because Telemetry does initialization X
> seconds after startup (this what Telemetry does currently, there's a 60
> second initialization timer)
> - Your patch handles init being called twice. I think it's best to 
> do this by having a separate flag that gets set when TelemetryEnvironment is
> initialized.
> ---- Your patch would allow init() to be called after shutdown, making it
> possible to do init() twice during a single session (which is undesirable)

Ah, i see. We do init() & shutdown() this module multiple times in tests.
Given that TelemetryPing owns the environment, i think we should fix this there?

> 
> > > @@ +105,5 @@
> > > > +    return {};
> > > > +  }),
> > > > +
> > > > +  _onEnvironmentChange: function (what) {
> > > > +    this._log.trace("_onEnvironmentChange for " + what);
> > > 
> > > do we want to notify listeners of environment changes after shutdown begins?
> > > the notified listeners won't even be able to getEnvironmentData after they
> > > get the notification
> > 
> > We won't notify them. The idea is to setup the listeners etc. that trigger
> > _onEnvironmentChange() in TelemetryEnvironment.init() and unregister them in
> > .shutdown().
> > We just need to make sure that this module is shutdown late enough so that
> > we can still send or persist e.g. shutdown pings.
> 
> Right but what about this sequence of events:
> 
> 1. shutdown() is called, it sets _shutdown = true and yields on
> this._collectTask (shutdown task doesn't finish executing)
> 2. an environment change happens, and _onEnvironment change is called,
> notifying listeners
> 3. listeners try to getEnvironmentData() but the check for _shutdown causes
> it to reject the Promise
> 4. _collectTask finally finishes, the shutdown task resumes and finally
> clears _changeListeners
> 
> We don't want step #2 to happen

I see. I moved clearing out the listeners to before waiting on the collect task.
Also made _onEnvironmentChange bail-out early if _shutdown for robustness.
Flags: needinfo?(gfritzsche)
This addresses comment 16 and comment 17 part 2 - not firing environment changes after initing shutdown.
Attachment #8562133 - Attachment is obsolete: true
Attachment #8563408 - Flags: review+
Attached patch Fix TelemetryPing shutdown (obsolete) — Splinter Review
This fixes comment 16 and comment 17 part 1.

We didn't synchronize on the delayed init task before in TelemetryPing, leading to the concerns about calling TelemetryEnvironment.shutdown before ever calling TelemetryEnvironment.init mentioned in the above comments.
Attachment #8563410 - Flags: review?(dteller)
Comment on attachment 8563410 [details] [diff] [review]
Fix TelemetryPing shutdown

Review of attachment 8563410 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetryPing.jsm
@@ +345,5 @@
>  
>      // Delay full telemetry initialization to give the browser time to
>      // run various late initializers. Otherwise our gathered memory
>      // footprint and other numbers would be too optimistic.
>      let deferred = Promise.defer();

These days, it's `PromiseUtils.defer()`.

However, you can just do

`let promise = this._delayedInitTask = new DeferredTask(...)` and remove the `catch` and the call to `resolve` .

@@ +346,5 @@
>      // Delay full telemetry initialization to give the browser time to
>      // run various late initializers. Otherwise our gathered memory
>      // footprint and other numbers would be too optimistic.
>      let deferred = Promise.defer();
> +    this._delayedInitTask = new DeferredTask(function* () {

What happens if `this._delayedInitTask` is non-null? Don't you end up with the initialization being called twice?

@@ +393,4 @@
>      return deferred.promise;
>    },
>  
>    shutdown: function() {

I'm not entirely comfortable with this method. Can we have as documentation a summary of all possible states in which we can reach it?

@@ +396,5 @@
>    shutdown: function() {
> +    let cleanup = () => TelemetryEnvironment.shutdown();
> +
> +    if (!this._delayedInitTask) {
> +      // We already ran the delayed initialization.

Or setupTelemetry was never called. Is this the expected behavior?
Attachment #8563410 - Flags: review?(dteller) → feedback+
Attached patch Fix TelemetryPing shutdown (obsolete) — Splinter Review
Attachment #8563410 - Attachment is obsolete: true
Attachment #8565382 - Flags: review?(dteller)
Attached patch Fix TelemetryPing shutdown (obsolete) — Splinter Review
Minor fix.
Attachment #8565382 - Attachment is obsolete: true
Attachment #8565382 - Flags: review?(dteller)
Attachment #8565391 - Flags: review?(dteller)
Comment on attachment 8565391 [details] [diff] [review]
Fix TelemetryPing shutdown

Review of attachment 8565391 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetryPing.jsm
@@ +321,5 @@
>  
>    /**
>     * Initializes telemetry within a timer. If there is no PREF_SERVER set, don't turn on telemetry.
>     */
>    setupTelemetry: function setupTelemetry(testing) {

Nit: A little documentation on the state would be nice here, too.
Attachment #8565391 - Flags: review?(dteller) → review+
Thanks, addressed feedback.
Attachment #8565391 - Attachment is obsolete: true
Attachment #8565501 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/876c7db8f32a
https://hg.mozilla.org/mozilla-central/rev/67484fc9e622
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 39
Product: Firefox Health Report → Firefox Health Report Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: