Closed Bug 1213780 Opened 4 years ago Closed 4 years ago

Telemetry is reporting repeated hang annotations in Chrome hangs

Categories

(Toolkit :: Telemetry, defect, P1)

defect
Points:
2

Tracking

()

RESOLVED FIXED
mozilla44
Tracking Status
firefox42 --- wontfix
firefox43 --- fixed
firefox44 --- fixed

People

(Reporter: Dexter, Assigned: Dexter)

References

(Blocks 1 open bug)

Details

(Whiteboard: [unifiedTelemetry][measurement:client])

Attachments

(1 file, 3 obsolete files)

As discovered in bug 1187864 comment 4, Telemetry is reporting repeated annotations for both thread and chrome hangs [1].

In a particular sample ping, I found repeated annotations which summed up to a 2.2Mb blob.

We should probably address this issue by collapsing identical annotations entries.

[1] - https://pastebin.mozilla.org/8847691
Blocks: 1186986
Whiteboard: [unifiedTelemetry][measurement:client]
Flags: needinfo?(thuelbert)
Priority: -- → P2
Aaron, do you know where this comes from?
Any ideas on whats going wrong there or pointers to suspected code?
Flags: needinfo?(aklotz)
This issue is a function of how we're storing and serializing the annotations. The code is not making any effort to collapse identical annotation sets.

I think it would be beneficial to do this collapsing at the time that the annotation is inserted, rather than at JS reflection time.

In toolkit/components/telemetry/Telemetry.cpp at HangReports::AddHang, we should be checking if similar annotations have already been made and just append the index of the hang info to an array.

mAnnotationInfo could be a hash table keyed on the combined annotation strings, and the AnnotationInfo structure would have a vector for mHangIndex instead of being a scalar quantity.

I can do this when I have time but if anybody else is interested then I am more than happy to provide guidance or review.
Flags: needinfo?(aklotz)
(FWIW the data is actually correct in those annotations from Dexter's pastebin, it's just inefficient storage)
Thanks Aaron! We are currently running down the data points that have the biggest impact on big Telemetry ping sizes, so Alessio will probably take this one soon.
Points: --- → 2
Priority: P2 → P1
Summary: Telemetry is reporting repeated hang annotations → Telemetry is reporting repeated hang annotations in Chrome hangs
Flags: needinfo?(thuelbert)
Blocks: 1215540
Assignee: nobody → alessio.placitelli
Status: NEW → ASSIGNED
Attached patch bug1213780.patch (obsolete) — Splinter Review
This patch implements the suggestions from comment 2, fixing the repeated annotations for chrome hangs in Telemetry.

It changes the format of the resulting ping a little bit, from:

{
 "chromeHangs": {
   ...
   "stacks" : [[stack1], [stack2], [stack3]],
   "annotations": [[0, { "key": "value", "key2": "value"}], [2, { "key": "value", "key2": "value"}]]
}

To (Please note that there's no annotation for the second stack, with index 1):

{
 "chromeHangs": {
   ...
   "stacks" : [[stack1], [stack2], [stack3]],
   "annotations": [[0, 2, { "key": "value", "key2": "value"}]]
}

This patch was tested on Windows, on a Release build of Firefox, enabling profiling (--enable-profiling). Debug annotations were added in [1], and a test page with flash content was opened. A JS hanging the browser was executed from the DevTools. The results were examined from the "saved-session" ping written when shutting down the browser.

[1] - https://dxr.mozilla.org/mozilla-central/rev/d1a89632277fbaaf470c90a35573776048988f2d/dom/plugins/ipc/PluginModuleParent.cpp#1111
Attachment #8675795 - Flags: review?(aklotz)
Comment on attachment 8675795 [details] [diff] [review]
bug1213780.patch

Review of attachment 8675795 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, but I'm not 100% happy with the way that this is being serialized. I think that the "annotation object as the final element in the array" is not intuitive at all for human readability. I'd much prefer this:

{
 "chromeHangs": {
   ...
   "stacks" : [[stack1], [stack2], [stack3]],
   "annotations": [[[0, 2], { "key": "value", "key2": "value"}],[[1], { "key3": "value", "key4": "value"}]]
}

(Basically replacing the integral index in the old format with an array of integers in the new format).
Attachment #8675795 - Flags: review?(aklotz) → review-
Attached patch bug1213780.patch (obsolete) — Splinter Review
Thanks Aaron, this makes much sense. I've changed that part of the code to produce the desired output.
Attachment #8675795 - Attachment is obsolete: true
Attachment #8676695 - Flags: review?(aklotz)
Comment on attachment 8676695 [details] [diff] [review]
bug1213780.patch

Review of attachment 8676695 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/Telemetry.cpp
@@ +316,5 @@
> +  if (!aAnnotations) {
> +    return;
> +  }
> +
> +  nsString annotationsKey;

s/nsString/nsAutoString/
Attachment #8676695 - Flags: review?(aklotz) → review+
Attached patch bug1213780.patch (obsolete) — Splinter Review
Changed nsString to nsAutoString.
Attachment #8676695 - Attachment is obsolete: true
Attachment #8677254 - Flags: review+
Attached patch bug1213780.patchSplinter Review
Fixes ASAN tests fallout.
Attachment #8677254 - Attachment is obsolete: true
Attachment #8677511 - Flags: review+
sorry had to back this out for possibly causing test failures like https://treeherder.mozilla.org/logviewer.html#?job_id=5289782&repo=fx-team
Flags: needinfo?(alessio.placitelli)
This doesn't seem to cause the issue as the try push reports 0 hazards for this patch (even though it's marked as busted due to bug 1211402). Bug 1215540 is the root cause.
Flags: needinfo?(alessio.placitelli)
https://hg.mozilla.org/integration/fx-team/rev/076830f89d148b20422c6aa4d239fd787ec9c4b6
Bug 1213780 - Fix Telemetry reporting repeated hang annotations for Chrome hangs. r=aklotz
https://hg.mozilla.org/mozilla-central/rev/076830f89d14
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
Comment on attachment 8677511 [details] [diff] [review]
bug1213780.patch

Approval Request Comment
[Feature/regressing bug #]: Remove duplicated annotations in chrome hangs, consistently reducing the size of the pings sent to Telemetry servers.
[User impact if declined]: This isn't a user-facing feature. However, without this patches bug 1222894 could potentially cause sending a lot of oversized pings, wasting both storage and processing time server side, thus increasing costs.
[Describe test coverage new/current, TreeHerder]: This has been on m-c for two weeks without any reported issue.
[Risks and why]: Low risk. This only touches Telemetry & Chrome Hangs. This is not enabled for all users and restricted to Windows platforms.
[String/UUID change made/needed]: None.

Please note that this is part of an uplift stack required for bug 1222894. This is the stack (the patches correctly apply in order to mozilla-beta): 1213780, 1211411, 1215540, 1219751.
Attachment #8677511 - Flags: approval-mozilla-beta?
Comment on attachment 8677511 [details] [diff] [review]
bug1213780.patch

Keeping ping size minimal sounds good. Please uplift this and the related work; note comment 21 for the order to apply the patches.
Attachment #8677511 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.