Gather more detailed performance telemetry on process launch

RESOLVED FIXED in Firefox 65

Status

()

enhancement
RESOLVED FIXED
11 months ago
6 months ago

People

(Reporter: jld, Assigned: jld)

Tracking

(Blocks 2 bugs)

unspecified
mozilla65
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox65 fixed)

Details

Attachments

(3 attachments)

CONTENT_PROCESS_LAUNCH_TIME_MS measures a fairly large range, roughly corresponding to how long the main thread is blocked doing launch-related work.  This is useful data, but it includes both the process launch proper (GeckoChildProcessHost::PerformAsyncLaunchInternal) as well as ContentParent::Init{Internal,} later sending messages and setting up additional actors, and PerformAsyncLaunchInternal in turn breaks down into the “core” launching and other things we have more control over (broker policy construction on Windows, Mach IPC setup on Mac, etc.).

It would help to get data on the time taken by the LaunchApp step by itself; this will give us an idea of how far we can get with the kind of improvements collected under bug 1474681.

We might also want to get an intermediate measurement for all of PerformAsyncLaunchInternal.
Unless I'm misreading, we accumulate into CONTENT_PROCESS_LAUNCH_TIME_MS at the top of ContentParent::InitInternal… and then again in ContentParent::LaunchSubprocess after InitInternal() and before Init().  So the data we're getting is mildly garbage, in addition to being inadequate for the decisions we want to make.
Summary: Gather performance data on the LaunchApp part of process launch → Gather performance telemetry on the LaunchApp part of process launch
Bug 1446161 adds another dimension to this: some content process launches will be async, so we'll probably want to measure both total end-to-end latency as well as main thread time consumed.  Those are the same number for sync launch, but for async they're different: main thread time should be a lot lower in most cases (this is kind of the point of all this), and total time might be a little higher if the main thread is busy when the launch promise resolves.

So that suggests three histograms: leave CONTENT_PROCESS_LAUNCH_TIME_MS for sync launches only, and record async launch total and main-thread time separately.  Also of interest, if we don't have it already: fraction of content process requests satisfied by a preallocated process.

There's also the question of whether to measure the “real” process launch directly.  For async launch it's more or less the difference of total time and main thread time, give or take some delay if threads are busy.  I suppose I could add a probe and let it expire if it doesn't have informational value.  (What I really want to measure here, I think, is the CPU time consumed by launching; even if it's pushed off onto a thread pool it's not free, especially on lower-end systems with fewer cores.  But I don't know if we have code to get that info from the OS cross-platform, and wall time is likely to be a good enough proxy.)
Assignee: nobody → jld
Summary: Gather performance telemetry on the LaunchApp part of process launch → Gather more detailed performance telemetry on process launch
This patch adds some telemetry histograms:

* CONTENT_PROCESS_LAUNCH_IS_SYNC - boolean, true if the content process
was launched synchronously (blocking the main thread)

* CONTENT_PROCESS_LAUNCH_TOTAL_MS - the total time elapsed from the
start of async content process launch until the launch promise is
resolved and the ContentParent can be sent IPDL messages

* CONTENT_PROCESS_LAUNCH_MAINTHREAD_MS - the time consumed on the parent
process main thread during async content process launch; typically this
is due to ContentParent::Init.

* CHILD_PROCESS_LAUNCH_MS - for any kind of Gecko child process
(including plugins, GPU, etc.), the time taken in the common process
launch code (which is run off-main-thread)

The probes restricted to async content process launch don't have "async"
in the name because that will eventually become the only kind of content
process launch.

Depends on D8943
Posted file Data Review Request
Attachment #9017753 - Flags: review?(chutten)

Comment 5

7 months ago
Comment on attachment 9017753 [details]
Data Review Request

DATA COLLECTION REVIEW RESPONSE

    Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes. Standard Telemetry mechanisms apply.

    Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. Standard Telemetry mechanisms apply.

    If the request is for permanent data collection, is there someone who will monitor the data over time?

N/A, expires in 70

    Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1. The requester is correct that the launch of a content process is tied to interaction, but at present the number and length of launches are an implementation detail.

    Is the data collection request for default-on or default-off?

Default on for all channels, except for CONTENT_PROCESS_LAUNCH_IS_SYNC which is pre-release only. :jld, was that intentional? If you want it for all channels as well, feel free to change it under this review before landing. 

    Does the instrumentation include the addition of any new identifiers (whether anonymous or otherwise; e.g., username, random IDs, etc. See the appendix for more details)?

No.

    Is the data collection covered by the existing Firefox privacy notice? 

Yes.

    Does there need to be a check-in in the future to determine whether to renew the data?

Yes. :jld will be responsible for renewing or removing these probes before they expire.

---
Result: datareview+
Attachment #9017753 - Flags: review?(chutten) → review+
Flags: needinfo?(jld)
(In reply to Chris H-C :chutten from comment #5)
> Default on for all channels, except for CONTENT_PROCESS_LAUNCH_IS_SYNC which
> is pre-release only. :jld, was that intentional? If you want it for all
> channels as well, feel free to change it under this review before landing. 

No, it wasn't intentional; thanks.

I've rearranged the patches so that bug 1446161 just removes the old CONTENT_PROCESS_LAUNCH_TIME_MS, and CONTENT_PROCESS_SYNC_LAUNCH_MS is added in this bug with all the rest.  I'll upload a revised version of the request.
Flags: needinfo?(jld)
Jim, mconley suggested you'd be a good person to have on the `alert_emails` list for these metrics, as a backup[1].  Any objections?

[1] https://phabricator.services.mozilla.com/D8944#inline-37650
Flags: needinfo?(jmathies)
Attachment #9017750 - Attachment description: Bug 1474991 - Add more performance telemetry for child process launching. → Bug 1474991 - Add new and improved performance telemetry for child process launching.

Comment 9

6 months ago
Comment on attachment 9026847 [details]
Data Review Request [v1.1]

r+ carried over from initial request.
Attachment #9026847 - Flags: review?(chutten) → review+

Comment 10

6 months ago
Pushed by jedavis@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7a480161fa0f
Add new and improved performance telemetry for child process launching. r=mccr8,mconley,janerik
(In reply to Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧ from comment #8)
> Jim, mconley suggested you'd be a good person to have on the `alert_emails`
> list for these metrics, as a backup[1].  Any objections?
> 
> [1] https://phabricator.services.mozilla.com/D8944#inline-37650

nope.
Flags: needinfo?(jmathies)

Comment 12

6 months ago
Pushed by jedavis@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/41195aed7eef
Add new and improved performance telemetry for child process launching. r=mccr8,mconley,janerik

Comment 13

6 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/41195aed7eef
Status: NEW → RESOLVED
Last Resolved: 6 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
You need to log in before you can comment on or make changes to this bug.