TelemetrySession.getHistograms is expensive

NEW
Unassigned

Status

()

Toolkit
Telemetry
P3
normal
a year ago
a year ago

People

(Reporter: gfritzsche, Unassigned)

Tracking

Trunk
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [qf-])

Per Florian on IRC:
> this is shutdown on my super fast latest generation macboook pro:
> https://perf-html.io/public/6c3c44c91b0a18bfc746809de306763a41b98586/calltree/?implementation=js&range=10.3230_11.7577&search=getHistogra&thread=0
> getHistograms is called twice during shutdown, and takes 26ms the first time, and 16ms the second time
This is called rarely in every-day operation, but hits:
- when the computer wakes up from sleep with Firefox open
- on shutdown
- on some other rare events that make us generate a ping

Per comment 0 it adds something in the order of 40+ms on those events.
Priority: -- → P3
Whiteboard: [qf]
(In reply to Georg Fritzsche [:gfritzsche] from comment #2)

> Per comment 0 it adds something in the order of 40+ms on those events.

On a very fast macbook. Expect this to take significantly longer on the quantum reference hardware.
Here is a profile of waking the quantum reference device: https://perfht.ml/2qgkR4h

Telemetry stuff blocks the main thread for almost 900ms.

Bug 1363453 dominates, but even with that excluded, there's 100ms of telemetry jank left.

getHistograms is less expensive than I thought (is it cheaper in a profile that's almost empty?), but still blocks the main thread for ~40ms.
Hi Georg,

We were trying to figure out how to triage this today and this one is tricky.  From one hand, per comment 2, I'd like to say this is [qf-], but part of me also wants to say per comment 4, no function should be allowed to take 900ms!  One thing I'm not sure about is to what extent this affects users on the release channel in the default configuration of Firefox (i.e., those who haven't explicitly opted into telemetry.)  Would you mind advising please?
Flags: needinfo?(gfritzsche)

Updated

a year ago
Depends on: 1369734

Comment 6

a year ago
Telemetry has a scheduler[1] that ticks every 5min (variable by pref) and on idle and on active and on sleep wake. It checks if the user's due to send a daily ping (due to local midnight), and at most every 5min (not variable by pref) saves an updated aborted-session ping just in case everything goes boom.

On first wake in the morning, we're likely to do both of these things.

Having a sharp subsession edge for the daily ping is desirable, so we'd like to keep its logic intact.

I have filed bug 1369734 to see if there's anything more intelligent we could do for aborted-session (since that's the one in the profile anyway). Likely we can kick it by one tick (5min) so that it shouldn't interfere with wake, but I make no promises until the bug's resolved :)

[1]: http://searchfox.org/mozilla-central/source/toolkit/components/telemetry/TelemetrySession.jsm#264
Flags: needinfo?(gfritzsche)
(In reply to Chris H-C :chutten from comment #6)

> I have filed bug 1369734 to see if there's anything more intelligent we
> could do for aborted-session (since that's the one in the profile anyway).
> Likely we can kick it by one tick (5min) so that it shouldn't interfere with
> wake, but I make no promises until the bug's resolved :)

If you don't want to wait an additional 5 minutes, just starting it off requestIdleCallback (or Services.tm.mainThread.idleDispatch) would move it out of the user's way (unless it takes a very long time).
Here is an interesting startup profile: https://perfht.ml/2qWolsT

22s after the start of the process, there's telemetry code running for too long while the browser is already janky. And this happens again 29s after the start of the process. The stack contains something from resource://testpilot-addon/ so it may be an add-on triggering the collection of telemetry data, but still this takes too long for something done synchronously on the main thread.

The first time getHistograms takes 208ms, the second time it takes 107ms.
(In reply to Florian Quèze [:florian] [:flo] from comment #8)
> Here is an interesting startup profile: https://perfht.ml/2qWolsT
> 
> 22s after the start of the process, there's telemetry code running for too
> long while the browser is already janky. And this happens again 29s after
> the start of the process. The stack contains something from
> resource://testpilot-addon/ so it may be an add-on triggering the collection
> of telemetry data, but still this takes too long for something done
> synchronously on the main thread.
> 
> The first time getHistograms takes 208ms, the second time it takes 107ms.

Thanks Florian! That's interesting, looks like https://github.com/mozilla/testpilot/issues/2422
FWIW I suggest disabling the test pilot add-on.  The bug that Alessio pointed to makes the combination of testpilot + telemetry poisonous for performance testing unfortunately.
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #10)
> FWIW I suggest disabling the test pilot add-on.  The bug that Alessio
> pointed to makes the combination of testpilot + telemetry poisonous for
> performance testing unfortunately.

This profile was sent to me by another Mozilla employee (who complained about Firefox being slow to start on his old machine, so I told him to capture a profile and send it my way), it's not on a machine I control, so I won't be disabling add-ons on it.
(In reply to Chris H-C :chutten from comment #6)
> Telemetry has a scheduler[1] that ticks every 5min (variable by pref) and on
> idle and on active and on sleep wake. It checks if the user's due to send a
> daily ping (due to local midnight), and at most every 5min (not variable by
> pref) saves an updated aborted-session ping just in case everything goes
> boom.
> 
> On first wake in the morning, we're likely to do both of these things.
> 
> Having a sharp subsession edge for the daily ping is desirable, so we'd like
> to keep its logic intact.
> 
> I have filed bug 1369734 to see if there's anything more intelligent we
> could do for aborted-session (since that's the one in the profile anyway).
> Likely we can kick it by one tick (5min) so that it shouldn't interfere with
> wake, but I make no promises until the bug's resolved :)
> 
> [1]:
> http://searchfox.org/mozilla-central/source/toolkit/components/telemetry/
> TelemetrySession.jsm#264

I think based on this comment I'm gonna qf- this one and we'll triage bug 1369734.

Updated

a year ago
Whiteboard: [qf] → [qf-]
You need to log in before you can comment on or make changes to this bug.