Closed Bug 1668867 Opened 2 years ago Closed 3 months ago

Have Services.profiler.StartProfiler() return a promise that resolves when the child processes are initialized

Categories

(Core :: Gecko Profiler, enhancement, P2)

All
Unspecified
enhancement

Tracking

()

RESOLVED FIXED
99 Branch
Tracking Status
firefox83 --- wontfix
firefox99 --- fixed

People

(Reporter: acreskey, Assigned: gerald)

References

(Blocks 2 open bugs, Regressed 1 open bug)

Details

Attachments

(7 files)

48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review

We frequently capture gecko profiles through the browsertime test framework, starting them through the Services.profiler.StartProfiler api.

Specifically, like so.

However it appears that the content process profiling is not yet initialized because we see cases where the first few seconds of sampling the navigation are missing.
e.g.
https://share.firefox.dev/3nkhWWb

For comparison, if we delay the navigation for a couple of seconds after we start the profiler, we see all the samples on the content process.
e.g.
https://share.firefox.dev/2HUD457

Note that in both cases the content process itself had been created prior to starting the profiler.

So the ask for JS API to allow us to wait until the profiler is initialized and collecting samples on the content process. (Or API to query if the profiler is initialized).

Do you have an example mach command to run the browsertime recording you shared?

Flags: needinfo?(acreskey)

I think the cause for the delay is the following: When the profiler is started for the first time, each process needs to initialize the stackwalker; on arm64 that's LuL which takes a long time to initialize (bug 1635810). However, we don't parallelize the initialization between the processes. Instead, we first initialize in the parent process, and then we tell the content processes to start profiling, which will then initialize.
The parent process initialization happens synchronously during the StartProfiler call, I think (but I'm not 100% sure). I think we could improve the situation a little bit by telling the other processes to start profiling before we initialize the profiler in the parent process - that way, the LuL initialization will happen in parallel.

(In reply to Andrew Creskey [:acreskey] [he/him] from comment #0)

So the ask for JS API to allow us to wait until the profiler is initialized and collecting samples on the content process. (Or API to query if the profiler is initialized).

We could also make StartProfiler return a promise which resolves when all processes are done initializing.

Greg, I use browsertime directly from a clone of their repo.
https://github.com/sitespeedio/browsertime

However you can run it from mach with these steps:
./mach browsertime --setup

and then this should navigate to the given url, capturing a profile:
./mach browsertime https://www.mozilla.org --gecko-profile

(In reply to Markus Stange [:mstange] from comment #3)

(In reply to Andrew Creskey [:acreskey] [he/him] from comment #0)

So the ask for JS API to allow us to wait until the profiler is initialized and collecting samples on the content process. (Or API to query if the profiler is initialized).

We could also make StartProfiler return a promise which resolves when all processes are done initializing.

Oh, yes, as a user I would prefer that.

Flags: needinfo?(acreskey)

So ultimately, the ordering of operations is:

  • profiler_start
    • locked_profiler_start
      • ActivePS
      • SamplerThread
      • LUL
    • NotifyProfilerStarted
      • ProfilerParent::ProfilerStarted

I agree that StartProfiler returning a promise would be a good outcome here. I think it would probably be worth initializing LuL in a more parallel fashion, but the original problem would still exist.

Summary: Provide api to query/wait until content process profiling is initialized → Have Services.profiler.StartProfiler() return a promise that resolves when the child processes are initialized
Priority: -- → P2

I've had a recent need for this (not knowing yet about this bug), and I can see how our Profiler tests could benefit, and others, so I'll have a serious look into it soon.

If possible as well, I'm thinking of adding promises to the other functions that work across processes (stop, pause, resume).

Assignee: nobody → gsquelart
Blocks: 1754688

This profiler-controlling functions are used in very few places, so it's good to have them in a separate header to reduce dependencies.

On top of making GeckoProfiler.h lighter, this is actually needed for this bug, because a later patch adds MozPromise to profiler_start and others, which would have created a header loop (GeckoProfiler -> MozPromise -> Monitor -> CondVar -> GeckoProfiler) that makes the build fail.

The returned promise is resolved when the child has completed its operation.

Depends on D139331

In async profiler tests, it's best to await profiler-controlling functions, to ensure that they took full effect in all processes.
It's especially important with StopProfiler, so that the profiler doesn't run anywhere when the following test starts.

Depends on D139335

This can help some tests get the data they expect from child processes.

Depends on D139336

Pushed by gsquelart@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/92fba281100a
Move parts of GeckoProfiler.h into ProfilerControl.h - r=canaltinova
https://hg.mozilla.org/integration/autoland/rev/3ba0ce258e64
PProfiler control functions return a completion promise - r=canaltinova
https://hg.mozilla.org/integration/autoland/rev/7580f29c6e52
ProfilerParent::Profiler{Start,Stop,etc.}ed await the corresponding PProfiler function and then resolve their GenericPromise - r=canaltinova
https://hg.mozilla.org/integration/autoland/rev/2008c5a7deb5
profiler_{start,stop,etc.} return the ProfilerParent::Profiler...ed GenericPromise - r=canaltinova
https://hg.mozilla.org/integration/autoland/rev/b1ee00c3338d
nsProfiler::{Start,Stop,etc.}Profiler await profiler_... and then resolve their JS promise - r=canaltinova
https://hg.mozilla.org/integration/autoland/rev/ffd92147d48b
Await Services.profiler.{Start,Stop,Pause,Resume}Profiler in most async profiler tests - r=canaltinova
https://hg.mozilla.org/integration/autoland/rev/3a326492bafb
nsProfiler::WaitOnePeriodicSampling also waits for children to run their sampling loop - r=canaltinova
Regressions: 1757528
Regressions: 1757610
Blocks: 1759777
You need to log in before you can comment on or make changes to this bug.