Closed Bug 1241362 Opened 5 years ago Closed 5 years ago

Regression: # of BHR jank events doesn't match # of BHR jank stacks


(Toolkit :: Telemetry, defect)

46 Branch
Not set



Tracking Status
firefox45 --- fixed
firefox46 --- fixed
firefox47 --- fixed


(Reporter: vladan, Assigned: avih)



User Story

1. Cause some jank events > 127ms
2. In about:telemetry -> "Thread Hangs", for the Gecko thread, compare # of jank events in the BHR event histogram vs sum of events reported in each BHR stack

These numbers should match


(1 file, 2 obsolete files)

No description provided.
In IRC, :jchen mentioned that when accessing histograms from JS, there's an additional bucket prepended to the time histograms.

That means that the ranges we were using in the histograms are incorrect. For example, bucket 7 in JS actually represents 64-127ms, rather than 128-255. That means that the way we're interpreting it puts it in the next bucket, which increases the reported hang duration range.

The fix would be to add 1 to the index used in the range. I am currently implementing and testing this in statuser, but about:telemetry also needs to be updated.
Duplicate of this bug: 1241296
The BHR histograms are stored differently from other Telemetry histograms, so the issue was just a matter of interpretation. 

Anthony fixed the extension in:

Fix for about:telemetry coming soon
Assignee: nobody → avihpit
The issue here is that TimeHistogram (which is only used for hangs stats) have a different javascript representation than the rest of the histograms, as follows:

1. It includes a dummy (empty) bucket with label "0".

2. For the other buckets, it uses the upper value of the bucket as label (while the other histograms use the lower value as label).

This patch fixes the JS representation of TimeHistogram to match that of the rest of the histograms, and as a result makes the display at about:telemetry behave as expected.

The issue with this patch is that any code which uses the JS representation of TimeHistogram would be affected by this, e.g. the statuser addon, possibly the ping payloads, etc.

So while it's probably the correct approach here, it's too risky without deep investigation of where are these histograms used, followed by modifying all this code, and make sure it remains backward compatible with the "old" (current) JS representation (especially for server-side code, e.g. maybe the telemetry dashboards).

So this patch is only here for reference, and the approach I've actually taken (at the next comment/attachment) is much more conservative: keep the JS representation of Timehistogram as is, and just fix the display of those histograms at about:telemetry - which is all and only the histograms of the "Thread hangs" section.
Fix the display of TimeHistograms at about:telemetry (all and only the histograms of the "Thread hangs" section).

The JS representation of those histograms remains unaffected, and as such there's no need to modify any other code which relies on this awkward representation.
Attachment #8711395 - Attachment is obsolete: true
Attachment #8711398 - Flags: review?(vladan.bugzilla)
Attachment #8711398 - Flags: review?(vladan.bugzilla) → review?(chutten)
Comment on attachment 8711398 [details] [diff] [review]

Review of attachment 8711398 [details] [diff] [review]:

One nit

::: toolkit/content/aboutTelemetry.js
@@ +1126,5 @@
>     * @param aParent Parent element
>     * @param aName Histogram name
>     * @param aHgram Histogram information
>     * @param aOptions Object with render options
>     *                 * exponential: bars follow logarithmic scale

Need to update the doc here to describe the new param
Attachment #8711398 - Flags: review?(chutten) → review+
Just updating the patch to the actual one which landed (addresses the review comment), carrying r+.

Will use this patch for uplift request to 45 once it lands on m-c.
Attachment #8711398 - Attachment is obsolete: true
Attachment #8711849 - Flags: review+
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
Target Milestone: mozilla46 → mozilla47
Comment on attachment 8711849 [details] [diff] [review]

Approval Request Comment
[Feature/regressing bug #]:

[User impact if declined]:
Potential incorrect evaluation (interpretation) of hang histograms at about:telemetry - important for the beta 45 e10s experiment.

[Describe test coverage new/current, TreeHerder]:

[Risks and why]:
Very low risk - only fixes the display of hang histograms at about:telemetry.

[String/UUID change made/needed]:
Attachment #8711849 - Flags: approval-mozilla-beta?
Attachment #8711849 - Flags: approval-mozilla-aurora?
Comment on attachment 8711849 [details] [diff] [review]

Taking it to improve the experiment results.
Attachment #8711849 - Flags: approval-mozilla-beta?
Attachment #8711849 - Flags: approval-mozilla-beta+
Attachment #8711849 - Flags: approval-mozilla-aurora?
Attachment #8711849 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.