Closed Bug 1211411 Opened 7 years ago Closed 7 years ago

Limit the number of thread hang stats reported to Telemetry


(Toolkit :: Telemetry, defect, P1)




Tracking Status
firefox42 --- wontfix
firefox43 --- fixed
firefox44 --- fixed


(Reporter: Dexter, Assigned: Dexter)


(Blocks 1 open bug)


(Whiteboard: [unifiedTelemetry][measurement:client])


(1 file, 2 obsolete files)

In bug 1187864 comment 3 we found out that some pings report too many thread hang stats.

Since the repeated "(chrome script)" stack entries are not really useful we should collapse those into one "(chrome script)" or "(chrome script <N>)".
Blocks: 1186986
Points: --- → 2
Priority: -- → P2
Whiteboard: [unifiedTelemetry][measurement:client]
We should also limit the number of reported hangs.
So, per bug 1187864 there are different things we can do here:
(1) limit the number of hang stat entries
(2) limit the stack depth to a sane number
(3) collapse repeated successive "(chrome script)" to one "(chrome script)" or "(chrome script <N>)"

For (3) i think we should just stay with collapsing into the simple "(chrome script)" (unless there are compelling reasons to make it more complicated).
I'd be fine with moving this part to a follow-up if it is no big win.
Points: 2 → 3
Assignee: nobody → alessio.placitelli
Priority: P2 → P1
Analysing the data from bug 1191846, we found out that fixing the stuff from point (2) and (3) in comment 2, along with the future fix from bug 1213780, is enough to bring down the size of the majority of the big pings to less than 1Mb.

This means that we're not limiting the number of thread hang stat entries, for the moment.

In order to find a reasonable maximum stack depth for (2), we've used a python notebook [1] to check the 95th percentile of BHR stack depths received by the Telemetry servers for the Nightly population. This value is 11, so that's the maximum depth our reported stacks will have.

Thanks Vladan and Georg for their feedback and suggestions.

[1] -
Attached patch bug1211411.patch (obsolete) — Splinter Review
This patch changes the BackgroundHangMonitor to clean up the reported hang stacks by collapsing repeated "(chrome script)"/"(content script)" entries.

It also limits the stack depth by removing the topmost frames if greater than the maximum allowed depth.
Attachment #8673517 - Flags: review?(vladan.bugzilla)
Comment on attachment 8673517 [details] [diff] [review]

Review of attachment 8673517 [details] [diff] [review]:

::: xpcom/threads/BackgroundHangMonitor.cpp
@@ +32,5 @@
>  #define BHR_BETA_MOD 100;
> +// Maximum depth of the call stack in the reported thread hangs. This value represents
> +// the 95th pecentile of the thread hangs stack depths reported by Telemetry.
> +const int kMaxThreadHangStackDepth = 11;

static const int

@@ +417,5 @@
> +  // Limit the depth of the reported stack if greater than our limit. Only keep its
> +  // last entries.
> +  if (mHangStack.length() > kMaxThreadHangStackDepth) {
> +    const int elementsToRemove = mHangStack.length() - kMaxThreadHangStackDepth;
> +    mHangStack.erase(mHangStack.begin(), mHangStack.begin() + elementsToRemove);

you can replace this with:

mHangStack.resize(std::min(mHangStack.length, kMaxThreadHangStackDepth));
Attachment #8673517 - Flags: review?(vladan.bugzilla) → review+
Attached patch bug1211411.patch (obsolete) — Splinter Review
Thanks for the review. This patch addresses your first comment and, as discussed over IRC, I've kept the second change and added a little comment there.

Moreover, this patch adds the "(reduced stack)" entry to mark that the stack was reduced.
Attachment #8673517 - Attachment is obsolete: true
Attachment #8674405 - Flags: review?(vladan.bugzilla)
Attachment #8674405 - Flags: review?(vladan.bugzilla) → review+
Attached patch bug1211411.patchSplinter Review
This patch fixes the test fallout by changing the int constant to a size_t.

Attachment #8674405 - Attachment is obsolete: true
Attachment #8674878 - Flags: review+
Bug 1211411 - Limit the number of thread hang stats reported to Telemetry. r=vladan
Points: 3 → 1
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
Alessio: how do you feel about uplifting this patch and the patch in bug 1219751 to Beta 43? It's already on Aurora 44 afaict.

If you're ok with uplifting these, I'd like to get them in by the end of the week, in time for next week's A/B experiment
Blocks: 1222894
Flags: needinfo?(alessio.placitelli)
Resolution: FIXED → ---
Comment on attachment 8674878 [details] [diff] [review]

Approval Request Comment
[Feature/regressing bug #]: Limit the depth of thread hang stacks, consistently reducing the size of the pings sent to Telemetry servers.
[User impact if declined]: This isn't a user-facing feature. However, without this patches bug 1222894 could potentially cause sending a lot of oversized pings, wasting both storage and processing time server side, thus increasing costs.
[Describe test coverage new/current, TreeHerder]: This has been on m-c for 3 weeks without any reported issue.
[Risks and why]: Low risk. This only touches Telemetry & Thread Hang stacks, no front facing features or other systems.
[String/UUID change made/needed]: None.

Please note that this is part of an uplift stack required for bug 1222894. This is the stack (the patches correctly apply in order to mozilla-beta): 1213780, 1211411, 1215540, 1219751.
Flags: needinfo?(alessio.placitelli)
Attachment #8674878 - Flags: approval-mozilla-beta?
Comment on attachment 8674878 [details] [diff] [review]

OK on m-c and m-a for a while now.  
Please uplift this as part of the telemetry work for beta 4.
Attachment #8674878 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.