Closed Bug 1516260 Opened 5 years ago Closed 4 years ago

[meta] Make Telemetry have close-to-zero startup performance impact

Categories

(Toolkit :: Telemetry, enhancement, P3)

enhancement

Tracking

()

RESOLVED INACTIVE
Tracking Status
firefox66 --- affected

People

(Reporter: Gijs, Unassigned)

Details

(Keywords: meta)

I ran a trypush, which showed 5% improvements (10-25ms! - https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=5c3a8123d105&newProject=try&newRevision=5be1497c0bce&framework=1&showOnlyComparable=1&showOnlyImportant=1 ) on session restore tests on the highly performant linux hardware, if we just comment out a bunch of telemetry stuff on startup ( https://hg.mozilla.org/try/rev/5be1497c0bceec835f50520fe6755d71fede94ec ). From profiles we've seen on the (slower, spinny disk) reference hardware, I expect even more significant improvements there (because a bunch of telemetry work is disk-intensive, either directly or via the windows APIs that talk to the registry that talk to disk). I'm also aware that what I commented out is just a small fraction of what telemetry does on startup, so there's definitely a possibility of still more wins in this area.

Obviously we can't just remove the telemetry startup-path/environment code completely. But I think it'd be very worthwhile to consider how much of it really really really needs to run so early on startup (ie how much of it could we add "later", once we're idle post-startup), whether we could externalize some of the environment information gathering to the separate ping sender executable, and/or whether we could cache some of it in a way that reduces the startup impact (after all, the cpu/disk information of the machine is very unlikely to change, and it'd be OK to pick up changes a while after startup).

Dexter/chutten, who could help with an effort in this area from the telemetry side?
Flags: needinfo?(chutten)
Flags: needinfo?(alessio.placitelli)
(Or Georg, or someone else, I'm not fussed. :-) )
Flags: needinfo?(gfritzsche)
(In reply to :Gijs (he/him) from comment #0)
> I ran a trypush, which showed 5% improvements (10-25ms! -
> https://treeherder.mozilla.org/perf.html#/
> compare?originalProject=try&originalRevision=5c3a8123d105&newProject=try&newR
> evision=5be1497c0bce&framework=1&showOnlyComparable=1&showOnlyImportant=1 )
> on session restore tests on the highly performant linux hardware, if we just
> comment out a bunch of telemetry stuff on startup (
> https://hg.mozilla.org/try/rev/5be1497c0bceec835f50520fe6755d71fede94ec ).
> From profiles we've seen on the (slower, spinny disk) reference hardware, I
> expect even more significant improvements there (because a bunch of
> telemetry work is disk-intensive, either directly or via the windows APIs
> that talk to the registry that talk to disk). I'm also aware that what I
> commented out is just a small fraction of what telemetry does on startup, so
> there's definitely a possibility of still more wins in this area.

The super-early annotation you commented out is due to supporting environment
annotations for crashes, as far as I remember. I guess that, if needed, we could
try to annotate with less info/later on. But that has implications for crashes.

Chris should know more about this.

> Obviously we can't just remove the telemetry startup-path/environment code
> completely. But I think it'd be very worthwhile to consider how much of it
> really really really needs to run so early on startup (ie how much of it
> could we add "later", once we're idle post-startup), whether we could
> externalize some of the environment information gathering to the separate
> ping sender executable, and/or whether we could cache some of it in a way
> that reduces the startup impact (after all, the cpu/disk information of the
> machine is very unlikely to change, and it'd be OK to pick up changes a
> while after startup).

There was some preliminary investigation in this area already, see bug 1372845.
The idea was to have a staged environment collection, with tiers that depends on the
performance impact.

> Dexter/chutten, who could help with an effort in this area from the
> telemetry side?

Georg is probably the best person to tell :)
Flags: needinfo?(alessio.placitelli)
The environment annotation stuff is indeed communicating to the crash reporter what the telemetry environment of any future crashes might be. Ensuring we have this early in startup is of some value but I know now how much... I ask :njn to nominate someone to comment on the value of having the correct TelemetryEnvironment early (profile-after-change) vs late in startup. 

We have mechanisms (`hasCrashEnvironment` denotes whether the environment on the crash ping is from the crashing session or the submitting session) to handle not having an absolutely-correct-and-current environment for that purpose, so we might be okay without that.

But if your startup gains also required returning an empty EnvironmentCache there might be another consumer someplace which would complicate things.
Flags: needinfo?(chutten) → needinfo?(n.nethercote)

Unfortunately I don't know anything about TelemetryEnvironment... looking at SearchFox, it appears to be very front-end-y. I don't have anything useful to add, sorry.

Flags: needinfo?(n.nethercote)

Bummer. Asking around I hear that marco might be or know someone who cares about whether crash reports have a useful TelemetryEnvironment at profile-after-change vs later in startup.

Flags: needinfo?(mcastelluccio)

Even though I think it might be useful to figure out if there are prefs which are set, as far as I recall I've never used it, so I'm not opposed to removing it.

Also CCing some other people who might or might not have used it.

Flags: needinfo?(mcastelluccio)

(In reply to Marco Castelluccio [:marco] from comment #6)

Even though I think it might be useful to figure out if there are prefs which are set, as far as I recall I've never used it, so I'm not opposed to removing it.

Also CCing some other people who might or might not have used it.

Of course by "removing it" I meant "not having it at "profile-after-change".

Flags: needinfo?(gfritzsche)
Priority: -- → P3

The meta keyword is there, the bug doesn't depend on other bugs and there is no activity for 12 months.
:chutten, maybe it's time to close this bug?

Flags: needinfo?(chutten)

Quite a bit of the "staged environment" work has happened and there was no clarity on the mailing lists when I asked if we could push annotateCrashReport later or if that would unacceptably reduce our ability to effectively respond to crashes.

If we want to re-pursue annotateCrashReport we can file a non-meta bug specifically for that.

Status: NEW → RESOLVED
Closed: 4 years ago
Flags: needinfo?(chutten)
Resolution: --- → INACTIVE
You need to log in before you can comment on or make changes to this bug.