Closed Bug 1178923 Opened 7 years ago Closed 7 years ago

[Metrics] Determine which telemetry metrics should be sent regardless of HUD setting status

Categories

(Firefox OS Graveyard :: Developer Tools, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(firefox43 fixed)

RESOLVED FIXED
FxOS-S6 (04Sep)
Tracking Status
firefox43 --- fixed

People

(Reporter: rnicoletti, Assigned: rnicoletti)

References

Details

Attachments

(1 file, 6 obsolete files)

Currently, the hud backend (hud.js) sends the following telemetry information without regard to their status in the developer HUD settings panel:

* security errors
* reflow duration (although reflow count is only sent when the 'reflow' setting is enabled)
* custom metrics (console API-based)

There was no conscious decision resulting in the above metrics being sent without regard to the setting status; it is essentially because the hud.js code is cleaner to send the above without regard to checking their developer HUD status.

We need to make a conscious decision regarding which metrics will be sent without regard to their developer HUD status.
Blocks: 1167710
time-to-load (cold and warm start) is also sent regardless of developer HUD setting status
No longer blocks: 1167710
While thinking about how to implement this it occurred to me that our initial idea of sending telemetry data for HUD metrics that are enabled may not be what we want to do. I'm thinking it would be better to have a core set of 'engineering' telemetry metrics that we will always record when HUD Telemetry is enabled and that we won't send any other HUD metrics regardless of whether they are enabled. For example, I'm thinking we won't ever send the HUD uss memory data because I don't think it's very useful compared to recording uss memory at the various performance marks, which will be a core metric according to me. The core metrics I propose are:

* event loop lag
* reflows
* reflow duration
* errors
* warnings
* security warnings
* app-startup at time of performance marks
* app-memory at time of performance marks

I'm finding performance is reasonable with all of these enabled in the HUD. My other reason for wanting to record these regardless of their HUD status is it seems to me we want to collect as much data as possible and ideally we would collect the data without putting a burden on the developers to explicitly enable settings.

What do you think?
Flags: needinfo?(thills)
Hi Russ,

I agree this makes sense to just always collect a core set of engineering metrics.  I think it's a configuration mess to do it any other way.

In terms of USS, I think that having it at the startup performance mark should be sufficient at least for a first round.  I believe it will be very difficult and potentially useless data to collect just massive amounts of USS data.  We can revisit later if this turns out to be wrong, but I think USS at the startup is a good place to start.

-tamara
Flags: needinfo?(thills)
After attempting a POC for the implementation suggested in comment 2, it's apparent that recording and sending telemetry data for existing hud metrics that are not enabled is in conflict with the hud backend design and architecture; it's not feasible. Therefore, the only telemetry metrics that will always be recorded and sent are:

* app-startup at time of performance marks
* app-memory at time of performance marks

In addition, as mentioned in the original description of this bug, the following metrics are currently always being recorded and sent regardless of whether they are enabled:

* security category
* reflow duration

Part of addressing this bug will be to only send these telemetry metrics when they are enabled in the developer HUD settings.

To summarize, the following telemetry metrics will be sent only when they are enabled in the developer HUD settings:

* event loop lag
* reflows
* reflow duration
* errors
* warnings
* security category

These telemetry metrics will always be sent (they cannot be disabled in the developer HUD settings):

* app-startup at time based on performance marks
* app-memory at time of performance marks
Update to comment 4:

Telemetry metrics sent when they are enabled in the developer HUD settings:

* event loop lag
* reflows
* reflow duration
* errors
* warnings
* security category

Telemetry metrics always sent (they cannot be disabled in the developer HUD settings):

* app-startup at time based on performance marks
* app-memory at time of performance marks
* custom metrics (via the AdvancedTelemetryHelper)
Attached patch respect hud settings (obsolete) — Splinter Review
Hi Jan,

This patch corrects the problem of security category and reflow duration telemetry metrics not respecting HUD settings.
Attachment #8650012 - Flags: review?(janx)
Comment on attachment 8650012 [details] [diff] [review]
respect hud settings

Review of attachment 8650012 [details] [diff] [review]:
-----------------------------------------------------------------

The patch looks correct, but I have a few nits.

Also, I'm not sure there is any value in respecting HUD settings for these two metrics (security category and reflow duration). Tracking their value has virtually no performance impact, because the console listener is active anyway, and the HUD settings are merely there to hide the widgets from the UI, in order not to overload it. I don't see any problems with sending these values to telemetry even if they're not shown in the HUD.

::: b2g/chrome/content/devtools/hud.js
@@ +412,5 @@
>            // Telemetry sends the security error category not the
>            // count of security errors.
> +          if (this._watching[metric.name]) {
> +            target._sendTelemetryEvent({
> +              name: 'security_category',

Nit: I'm not sure this telemetry event name needs to be changed. Also, I believe Tamara's recent patch assumes there will never be "_" characters in metric names. Please leave it as is, or remove the "_".

@@ +470,5 @@
>  
> +        // Telemetry also records reflow duration when recording
> +        // reflows.
> +        if (this._watching[metric.name]) {
> +          target._sendTelemetryEvent({name: 'reflow_duration',

Nit: Same comment as above, please remove the "_".
Attachment #8650012 - Flags: review?(janx) → feedback+
(In reply to Jan Keromnes [:janx] from comment #7)

> 
> ::: b2g/chrome/content/devtools/hud.js
> @@ +412,5 @@
> >            // Telemetry sends the security error category not the
> >            // count of security errors.
> > +          if (this._watching[metric.name]) {
> > +            target._sendTelemetryEvent({
> > +              name: 'security_category',
> 
> Nit: I'm not sure this telemetry event name needs to be changed. Also, I
> believe Tamara's recent patch assumes there will never be "_" characters in
> metric names. Please leave it as is, or remove the "_".
> 
> @@ +470,5 @@
> >  
> > +        // Telemetry also records reflow duration when recording
> > +        // reflows.
> > +        if (this._watching[metric.name]) {
> > +          target._sendTelemetryEvent({name: 'reflow_duration',
> 
> Nit: Same comment as above, please remove the "_".

Jan and Russ,
I believe we need the '-' changed to an '_'.  For the custom histograms, that is handled in the Gaia patch.  In the hud keyed histograms which are predefined in histograms.json, it's tied to the histogram name which is DEVTOOLS_HUD_ + metric.name.toUpperCase().  The style of the histograms.json does not include any '-' so to match up with that we have all underscores and uppercase.

Thanks,

tamara
Hi Jan,

I agree it's a good idea to always send warnings, errors, security, reflow information regardless of their HUD setting status. This patch accomplishes that.
Attachment #8650012 - Attachment is obsolete: true
Attachment #8651116 - Flags: feedback?(janx)
Jan, I am working on a simpler patch. The patch I submitted for feedback has logic to only send the HUD event when the consoleWatcher metric is enabled and to always send the telemetry event. I feel this is not the correct approach. I believe the correct approach is to send the HUD event and the telemetry event if either the metric is enabled or if telemetry is enabled. The developer can use the 'hide hud widgets' if they don't want to see the widgets.

I like this simpler apporoach. However, there is a slightly tricky aspect of it where the widget should only be cleared when the metric is not enabled and telemetry is also disabled. As I see it, this requires the consoleWatcher to be aware of when telemetry is disabled. I don't want to add a settings listener to the consoleWatcher; I'm thinking about the best approach.
Hi Russ, so sorry for the review lag, I got distracted by something else today. I'll get to it first thing tomorrow! (European time)
Hi Jan,

This patch is simpler than the previous one. It considers console watcher metrics to be `watched` if they are enabled in the HUD settings or if telemetry is enabled. This applies to the logic for sending the metrics and for clearing them.
Attachment #8651116 - Attachment is obsolete: true
Attachment #8651116 - Flags: feedback?(janx)
Attachment #8651973 - Flags: feedback?(janx)
Comment on attachment 8651973 [details] [diff] [review]
send console watcher metrics by default

Review of attachment 8651973 [details] [diff] [review]:
-----------------------------------------------------------------

Hi Russ, a few nits here, but I'm concerned by this approach: won't it cause the graphical widgets to stay visible even if the user "unwatches" it from the settings (and telemetry is enabled)?

Instead, I'd prefer an approach where we don't interfere with the un/watched mechanism, by calling `target._logHistogram()` directly (before checking for `watched`) and setting `metric.skipTelemetry = true` to avoid dual reporting. Would that work for you?

::: b2g/chrome/content/devtools/hud.js
@@ +74,5 @@
>    },
>  
> +  addOnTelemetryStatusChangedCallback(callback) {
> +    this._onTelemetryStatusChangedCallbacks.push(callback);
> +  },

Nit: Technically, `SettingsListener.observe()` already acts as a collection of callbacks. It's probably not necessary to create a new one.

Instead, the setting name 'debug.performance_data.advanced_telemetry' could be a constant at the top of this file, and all interested watchers could register their own observer function as needed.

@@ +444,5 @@
>            target.clear({name: metric});
>          }
>        });
>      }
> +  

Nit: Trailing space.

@@ +453,5 @@
>    },
>  
> +  onTelemetryStatusChanged(enabled) {
> +    // If telemetry is disabled, remove any widgets that are disabled.
> +    if (!enabled) {

Nit: Bail-out style is preferred: `if (enabled) { return; }`.

@@ +456,5 @@
> +    // If telemetry is disabled, remove any widgets that are disabled.
> +    if (!enabled) {
> +      let watching = this._watching;
> +      for (let metric in watching) {
> +        if (!watching[metric]) {

Nit: Bail-out style is preferred: `if (watching[metric]) { continue; }`.

@@ +459,5 @@
> +      for (let metric in watching) {
> +        if (!watching[metric]) {
> +          for (let target of this._targets.values()) {
> +            target.clear({name: metric});
> +          } 

Nit: Trailing space.
Attachment #8651973 - Flags: feedback?(janx) → feedback-
I've taken a slightly different approach than you suggested because before calling `_logHistogram` we need to bump the value of the metric. I've consolidated the calls to `bump` into a single function.
Attachment #8651973 - Attachment is obsolete: true
Attachment #8652499 - Flags: review?(janx)
Comment on attachment 8652499 [details] [diff] [review]
send console watcher metrics by default (updated)

Review of attachment 8652499 [details] [diff] [review]:
-----------------------------------------------------------------

A few suggestions to tweak your approach, but before addressing them, please make sure that Telemetry really cares about cumulative counts; otherwise please use the more direct `_logHistogram` approach.

Incrementing cumulative metric counts behind Target's back makes the consoleWatcher do strange things, and I'm wondering if such a count really makes sense for Telemetry? E.g. is it not sufficient to know:

> app:X had an error
> app:X had a warning
> app:Y had an error

Does Telemetry really need to know:

> app:X had its 1st error since launch
> app:X had its 4th warning since launch
> app:Y had its 27th error since launch

These numbers can grow quite high over time, e.g. if the user repeats an action triggering an error, while not being very informative: An app could have 1 critical error that prevents it from being used normally, or it could have 300 errors because the user tapped a problematic button many times.

---

If you do need cumulative counts, please proceed with the nits below. If you do not, please just call `_logHistogram()` for the type of console event and skip telemetry.

::: b2g/chrome/content/devtools/hud.js
@@ +262,5 @@
> +    this.bumpVal(metric);
> +    this.update(metric, message);
> +  },
> +
> +  bumpVal(metric) {

I don't really like the idea of adding a "dry bump" method to the Target API, because `update()` is supposed to be the main entry point with all the safety checks, and `bump()` is supposed to be just an alias.

I thought about making `bumpVal()` more obviously internal (e.g. by prefixing it with "_"), but actually the consoleWatcher needs to bypass the Target API to just increment a value, and such a method would be a detour. Please increment the value in the consoleWatcher directly, and access `target.metrics` to update it, leaving the Target API unchanged.

@@ +569,5 @@
> +   * the telemetry metric.
> +   * If the metric is not being watched, the function
> +   * increments the value of the metric and records the telemetry
> +   * metric.
> +   */  

Nit: Trailing spaces.

@@ +575,5 @@
> +    // If this metric is being watched, bump the value
> +    // and send the HUD event.
> +    if (this._watching[metric.name]) {
> +      target.bump(metric, output);
> +    } else if (developerHUD._telemetry) {

Nit: Please use bail-out style:

if (this._watching[metric.name]) {
  target.bump(metric, output);
  return;
}
if (!developerHUD._telemetry) {
  return;
}
// rest of code

@@ +581,5 @@
> +      // bump the value and record the telemetry metric.
> +      // Although `_logHistogram` respects whether telemetry is enabled,
> +      // the additional check is here so that the metric will not be
> +      // updated -- see `bumpVal` -- when it is not being recorded.
> +      target.bumpVal(metric);

Nit: As mentioned above, please increment the `metric.value` here by accessing the `target.metrics` Map.
Attachment #8652499 - Flags: review?(janx) → feedback-
Jan, thanks for the feedback. You've encouraged me to create the most concise patch of all. I thought about creating a function for the new logic but I figured it was ok inline.
Attachment #8653205 - Flags: review?(janx)
Comment on attachment 8653205 [details] [diff] [review]
send console watcher metrics by default (updated)

Review of attachment 8653205 [details] [diff] [review]:
-----------------------------------------------------------------

Hi Russ, that's a lot more concise indeed.

Technically, you could get rid off one more `if`, by moving the `if(developerHUD._telemetry){}` block inside the `if(!this._watching[metric.name]){}` (persisting the incremented value unconditionally and letting watched metrics report their telemetry normally), but actually I don't think it's a good idea to use different code paths to report telemetry depending on the metric being watched or not.

With the two micro-nits, I'm all out of things to complain about. Thanks for addressing all my concerns!

::: b2g/chrome/content/devtools/hud.js
@@ +550,5 @@
>          return;
>      }
>  
> +    if (developerHUD._telemetry) {
> +      // Always record telemetry for these metrics

Nit: Please add full stop or ":".

@@ +556,5 @@
> +        let value = target.metrics.get(metric.name);
> +        metric.value = (value || 0) + 1;
> +        target._logHistogram(metric);
> +
> +        // Telemetry has already been recorded 

Nit: Trailing space, full stop.
Attachment #8653205 - Flags: review?(janx) → review+
Attachment #8652499 - Attachment is obsolete: true
Comment on attachment 8653662 [details] [diff] [review]
send console watcher metrics by default

Latest patch addresses last review comments and is rebased against b2g-inbound. Will attach try link when gaia-try tree is open.
Attachment #8653662 - Attachment is obsolete: true
Attachment #8653205 - Attachment is obsolete: true
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/1bab2b400ab6
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → FxOS-S6 (04Sep)
You need to log in before you can comment on or make changes to this bug.