Closed Bug 1187864 Opened 9 years ago Closed 9 years ago

Investigate reason for unusually large Telemetry ping sizes

Categories

(Toolkit :: Telemetry, defect, P1)

Points:
1

RESOLVED FIXED
Tracking Status
firefox42 --- affected

People

(Reporter: gfritzsche, Assigned: Dexter)

References

(Blocks 1 open bug)

Details

(Whiteboard: [measurement:client])

We see some rather large Telemetry pings. We should investigate the reasons behind them and list the problematic sections.

Ideally we can collect examples server side and dig into them.
I performed a quick preliminary analysis on some of the samples from bug 1191846:

- Sample 1

Total Size: 6464007 bytes
chromeHangs + threadHangs 6090661 bytes (94%)
chromeHangs 11099 bytes

- Sample 2

Total Size: 11983045 bytes
chromeHangs + threadHangs 11855059 bytes (98%)
chromeHangs 39479 bytes

- Sample 3

Total Size: 2557101 bytes

There's a weird addon listed in |environment.addons| (size 1187585 bytes - 46%), which has a huge block of data in the "description" field.

- Sample 4
Same as samples 1 and 2, with a bigger chromeHangs section.
Points: --- → 1
Priority: -- → P2
Whiteboard: [measurement:client]
I think we need to break this down a bit more with scripting, extracting a list of affected fields from the ping samples.
Maybe we could let something run over it that prints the least-depth path that causes more than N% of the size (where N is >50, say 90, 80, 70, ... whatever gives us good results).
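
A minimal sketch of such a scan, assuming the ping is available as a parsed JSON object and that "size" means the length of the compact JSON serialization. The names `json_size` and `heavy_paths` are hypothetical and are not taken from any existing Telemetry tooling:

```python
import json


def json_size(value):
    """Size in bytes of the compact UTF-8 JSON serialization of value."""
    return len(json.dumps(value, separators=(",", ":")).encode("utf-8"))


def heavy_paths(ping, threshold=0.9):
    """Return (path, size, share) for every node holding >= threshold of the ping.

    Entries are ordered from the shallowest path to the deepest one.
    Assumption: only dict children are descended into; lists are treated
    as leaves to keep the sketch short.
    """
    total = json_size(ping)
    results = []

    def walk(node, path):
        if not isinstance(node, dict):
            return
        for key, child in node.items():
            size = json_size(child)
            share = size / total
            if share >= threshold:
                child_path = path + "/" + key
                results.append((child_path, size, share))
                walk(child, child_path)

    walk(ping, "")
    return results


if __name__ == "__main__":
    # Synthetic ping, not real data.
    sample = {
        "payload": {"threadHangStats": {"Gecko": "x" * 10000}, "info": {"a": 1}},
        "environment": {"b": 2},
    }
    for path, size, share in heavy_paths(sample, threshold=0.9):
        print("%s - %d(%d%%)" % (path, size, int(share * 100)))
```

With a threshold above 50% at most one child per level can qualify, so the result is a single chain; the first entry is the least-depth path and the last one pinpoints the section to look at.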

For chromeHangs and threadHangStats I think we already want to know more details:
* threadHangStats:
   * where does the bloat in individual entries come from - stacks, histograms, ...?
   * number of entries per section etc.
* chromeHangs:
   * where is the bloat?
   * numbers of entries for stack, memory map and annotations
   * i see repeated plugin entries in "annotations" (https://pastebin.mozilla.org/8847691), is that common?
Assignee: nobody → alessio.placitelli
Priority: P2 → P1
I've been crunching the data with a Python script [1], and the results are quite interesting. chromeHangs is not the culprit here (even though there are repeated plugin annotations, as Georg noticed); threadHangStats and activeAddons are (at least for the samples that I have).

A typical ping with a huge threadHangStats section will look like this:

Analysing samples\XXXXX-PING-FILE-XXXX.gz
Ping type main - size 6572640
Gecko hangs: 2735
Hang stats: min 220 max 1497694 avg 2242
/payload - 6568730(99%)
/payload/threadHangStats - 6141387(93%)
/payload/threadHangStats/Gecko/hangs - 6138502(93%)

As you can see, the biggest "hang" in the Gecko thread takes 1497694 bytes (22% of the ping!).
Other thread stats are not listed since they don't really have a big impact on the ping size.

A typical big ping with a huge active addons section will have a rogue addon with a massive description. It's always the same addon throughout the samples that I have.

Analysing samples\XXXX-ANOTHER-SAMPLE.gz
Ping type main - size 7251693
Gecko hangs: 45
Hang stats: min 205 max 653 avg 366
/environment - 7130574(98%)
/environment/addons - 7128080(98%)
/environment/addons/activeAddons - 7124778(98%)
/environment/addons/activeAddons/{XXX UUID HERE XXX} - 7124054(98%)
/environment/addons/activeAddons/{XXX UUID HERE XXX}/description - 7123779(98%)

[1] - https://gist.github.com/Dexterp37/6012d3e715095e2e369a
Going through it one-by-one:

* Addons
I think we can just set an arbitrary length limit for the addon name, description & other text fields - maybe simply 100 chars?

* threadHangStats
We really need to reduce that.
Vladan, Aaron, do either of you have good suggestions on sane limits here for (1) number of hang entries and (2) stack limits?
I guess repeated "(chrome script)" stack entries are not really useful either, collapsing those into one "(chrome script)" or "(chrome script <N>)" could help?

* chromeHangs
I'm worried they could blow up similarly, maybe we can come up with similar numbers here.
Aaron, any idea on those repeated annotations in there (https://pastebin.mozilla.org/8847691)?
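
The two client-side reductions proposed above (capping addon text fields and collapsing repeated "(chrome script)" frames) could be sketched as follows. This is only an illustration in Python; the helper names are hypothetical, the 100-character limit is just the value floated above rather than a decided one, and the real change would live in the Telemetry client code:

```python
MAX_TEXT_LENGTH = 100  # assumption: the "maybe simply 100 chars" suggestion


def limit_text(value, max_length=MAX_TEXT_LENGTH):
    """Truncate an addon text field (name, description, ...) to max_length chars."""
    return value[:max_length]


def collapse_repeats(stack, marker="(chrome script)"):
    """Collapse consecutive runs of `marker` frames in a hang stack.

    A single occurrence is kept as-is; a run of N > 1 becomes
    "(chrome script N)". All other frames are left untouched.
    """
    collapsed = []
    run = 0
    # The trailing None is a sentinel that flushes a run at the end of the stack.
    for frame in list(stack) + [None]:
        if frame == marker:
            run += 1
            continue
        if run == 1:
            collapsed.append(marker)
        elif run > 1:
            collapsed.append("%s %d)" % (marker[:-1], run))
        run = 0
        if frame is not None:
            collapsed.append(frame)
    return collapsed
```

Keeping the count in the collapsed frame preserves how deep the script recursion was while still shrinking the stack to one entry per run.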
Flags: needinfo?(vladan.bugzilla)
Flags: needinfo?(aklotz)
Status: NEW → ASSIGNED
There's already a meta-bug for this btw, feel free to mark it a dupe: https://bugzilla.mozilla.org/show_bug.cgi?id=896744

There's also a half-finished patch by an external contributor for limiting the # of chromehangs in https://bugzilla.mozilla.org/show_bug.cgi?id=896740
Flags: needinfo?(vladan.bugzilla)
(In reply to Georg Fritzsche [:gfritzsche] from comment #4)
> * Addons
> I think we can just set an arbitrary length limit for the addon name,
> description & other text fields - maybe simply 100 chars?

Sure

> * threadHangStats
> We really need to reduce that.
> Vladan, Aaron, do either of you have good suggestions on sane limits here
> for (1) number of hang entries and (2) stack limits?
> I guess repeated "(chrome script)" stack entries are not really useful
> either, collapsing those into one "(chrome script)" or "(chrome script <N>)"
> could help?

yes, we should do that! it would also make the BHR signatures more useful

> * chromeHangs
> I'm worried they could blow up similarly, maybe we can come up with similar
> numbers here.

I'm fine with reporting a maximum # of chromehangs per session. 
If we only report the most recent chromehangs we'll introduce a bias against startup chromehangs, so normally I would suggest using dice-rolls to decide which chrome hang stacks to drop, but we haven't been scrutinizing chrome-hangs closely in a while + chrome-hangs will be replaced with BHR native stacks, so let's not invest in them and just drop the oldest stacks in a session.
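
For illustration, the two capping strategies mentioned here can be sketched in Python. The function names and the limit are hypothetical; "dice-rolls" is interpreted as reservoir sampling, which is an assumption about what was meant:

```python
import random
from collections import deque


def keep_most_recent(hangs, limit):
    """Drop the oldest entries, keeping only the last `limit` hangs of a session."""
    return list(deque(hangs, maxlen=limit))


def reservoir_sample(hangs, limit, rng=None):
    """Keep `limit` hangs chosen uniformly at random (reservoir sampling).

    Unlike keep_most_recent, this does not bias against startup hangs.
    """
    rng = rng or random.Random()
    kept = []
    for index, hang in enumerate(hangs):
        if index < limit:
            kept.append(hang)
        else:
            # Replace a kept entry with probability limit / (index + 1).
            slot = rng.randint(0, index)
            if slot < limit:
                kept[slot] = hang
    return kept
```

Dropping the oldest entries is the simpler of the two and matches the decision above; the sampling variant is shown only to make the trade-off concrete.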
+1
Flags: needinfo?(aklotz)
Blocks: 1211404
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED