Open Bug 1982857 Opened 2 months ago Updated 13 hours ago

Talos reports 50% slower content process startup on MacOS 14.70 compared to MacOS 10.15

Categories

(Core :: DOM: Content Processes, defect)

defect

Tracking

()

People

(Reporter: whimboo, Unassigned, NeedInfo)

References

(Depends on 1 open bug, )

Details

(Keywords: perf)

(In reply to Henrik Skupin [:whimboo][⌚️UTC+2] from bug 967525 comment #107)

Haik, one thing I noticed is that on our Mac minis in CI we still have a higher content process creation time of 200ms compared to the 60ms locally on my M4.

With bug 967525 fixed we can now check on a CI machine like the existing loaner for bug 967525 and create a Gecko profile.

Also note that when we switched from macOS 10.15 to 14.7 we as well saw a 50% increase of the content process creation time.

I'll get access to the machine within the next couple of days. Ryan will take care of it.

Flags: needinfo?(rcurran)

We have macOS workers with 15.30 (M4 chipset) for CI and as well run various tests. It might be good to actually trigger some tests with the custom binary from bug 1967525. That way we can better compare the values with my MBP M4 locally.

Depends on: 1967525
No longer depends on: 967525

I pushed a try build for macOS 15.30 and 14.7 to get a comparision for mochitest and browser-chrome tests:
https://treeherder.mozilla.org/jobs?repo=try&revision=20682552afa439db5d533394a9df2be2042b39a0

Joel, are there plans to also run Talos jobs on the 15.30 machines? That would be good to get some graphs on perfherder.

Flags: needinfo?(jmaher)

yes, when we get more capacity. I think browsertime will be first unless we do get an extra 50 machines online for testing, then we can do browsertime and talos.

Flags: needinfo?(jmaher)

Just some numbers as recorded by the try build...

[task 2025-08-14T08:46:52.805+00:00] 08:46:52     INFO - GECKO(1073) | 1755161212803 XXX pid: 1079, type: 2, nsAppShell init time: 8 milliseconds
[task 2025-08-14T08:46:53.033+00:00] 08:46:53     INFO - GECKO(1073) | 1755161213032 XXX pid: 1081, type: 2, nsAppShell init time: 8 milliseconds
[task 2025-08-14T08:46:54.598+00:00] 08:46:54     INFO - GECKO(1073) | 1755161214593 XXX pid: 1085, type: 2, nsAppShell init time: 12 milliseconds
[task 2025-08-14T08:46:54.654+00:00] 08:46:54     INFO - GECKO(1073) | 1755161214651 XXX pid: 1086, type: 2, nsAppShell init time: 23 milliseconds
[task 2025-08-14T08:50:53.376+00:00] 08:50:53     INFO - GECKO(1298) | 1755161453374 XXX pid: 1309, type: 2, nsAppShell init time: 46 milliseconds
[task 2025-08-14T08:50:53.621+00:00] 08:50:53     INFO - GECKO(1298) | 1755161453618 XXX pid: 1312, type: 2, nsAppShell init time: 44 milliseconds
[task 2025-08-14T08:50:55.329+00:00] 08:50:55     INFO - GECKO(1298) | 1755161455326 XXX pid: 1317, type: 2, nsAppShell init time: 53 milliseconds
[task 2025-08-14T08:50:55.374+00:00] 08:50:55     INFO - GECKO(1298) | 1755161455367 XXX pid: 1318, type: 2, nsAppShell init time: 56 milliseconds
[task 2025-08-14T08:50:55.429+00:00] 08:50:55     INFO - GECKO(1298) | 1755161455427 XXX pid: 1319, type: 2, nsAppShell init time: 58 milliseconds
[task 2025-08-14T08:50:55.442+00:00] 08:50:55     INFO - GECKO(1298) | 1755161455439 XXX pid: 1320, type: 2, nsAppShell init time: 53 milliseconds
[task 2025-08-14T08:50:56.723+00:00] 08:50:56     INFO - GECKO(1298) | 1755161456719 XXX pid: 1323, type: 2, nsAppShell init time: 47 milliseconds
[task 2025-08-14T08:51:00.910+00:00] 08:51:00     INFO - GECKO(1298) | 1755161460908 XXX pid: 1337, type: 2, nsAppShell init time: 40 milliseconds
[task 2025-08-14T08:51:01.861+00:00] 08:51:01     INFO - GECKO(1298) | 1755161461858 XXX pid: 1338, type: 2, nsAppShell init time: 48 milliseconds

Now I wonder why Talos tests show such a high value for the content process creation. This doesn't match with what I can see above for other tests. Maybe there is a bug with the calculation?

I pushed a Wd job for the same shippable build and surprisingly it showed timings of around 50ms only:
https://treeherder.mozilla.org/logviewer?job_id=522388310&repo=try&lineNumber=977

[task 2025-08-14T14:57:59.438+00:00] 14:57:59     INFO - PID 1320 | 1755183479435 XXX pid: 1336, type: 2, nsAppShell init time: 43 milliseconds
[task 2025-08-14T14:57:59.638+00:00] 14:57:59     INFO - PID 1320 | 1755183479636 XXX pid: 1339, type: 2, nsAppShell init time: 39 milliseconds
[task 2025-08-14T14:58:01.647+00:00] 14:58:01     INFO - PID 1320 | 1755183481645 XXX pid: 1357, type: 2, nsAppShell init time: 38 milliseconds
[task 2025-08-14T14:58:01.650+00:00] 14:58:01     INFO - PID 1320 | 1755183481647 XXX pid: 1358, type: 2, nsAppShell init time: 37 milliseconds
[task 2025-08-14T14:58:01.849+00:00] 14:58:01     INFO - PID 1320 | 1755183481832 XXX pid: 1359, type: 2, nsAppShell init time: 38 milliseconds
[task 2025-08-14T14:58:03.776+00:00] 14:58:03     INFO - PID 1320 | 1755183483774 XXX pid: 1363, type: 2, nsAppShell init time: 41 milliseconds
[task 2025-08-14T14:58:04.293+00:00] 14:58:04     INFO - PID 1320 | 1755183484291 XXX pid: 1366, type: 2, nsAppShell init time: 38 milliseconds
[task 2025-08-14T14:58:04.333+00:00] 14:58:04     INFO - PID 1320 | 1755183484331 XXX pid: 1367, type: 2, nsAppShell init time: 38 milliseconds
[task 2025-08-14T14:58:04.863+00:00] 14:58:04     INFO - PID 1320 | 1755183484861 XXX pid: 1369, type: 2, nsAppShell init time: 39 milliseconds
[task 2025-08-14T14:58:05.049+00:00] 14:58:05     INFO - PID 1320 | 1755183485018 XXX pid: 1370, type: 2, nsAppShell init time: 49 milliseconds
[task 2025-08-14T14:58:05.051+00:00] 14:58:05     INFO - PID 1320 | 1755183485048 XXX pid: 1371, type: 2, nsAppShell init time: 42 milliseconds
[task 2025-08-14T14:58:05.346+00:00] 14:58:05     INFO - PID 1320 | 1755183485345 XXX pid: 1373, type: 2, nsAppShell init time: 34 milliseconds

Could it be that Talos jobs set some specific preference other test jobs don't do or vise versa? We should check that.

Summary: Content process creation time on Mac minis in CI take ~200ms compared to 60ms locally on a MacBook M4 → Content process creation time on Mac minis as reported by Talos in CI take ~200ms compared to 50ms for other test jobs

When running checks locally I saw the following in the logs:

15:33:31     INFO -  PID 80148 | 1755272011815 XXX pid: 80172, type: 2, nsAppShell init time: 6 milliseconds
15:33:31     INFO -  PID 80148 | Cycle 1(13): loaded http://127.0.0.1:63727/tests/cpstartup/cpstartup.html (next: http://127.0.0.1:63727/tests/cpstartup/cpstartup.html)
[..]
15:33:38     INFO -  PID 80148 | [#0] content-process-startup  Cycles:20  Average:66.95  Median:68.00  stddev:4.15 (6.1%)  stddev-sans-first:3.13
15:33:38     INFO -  PID 80148 | Values: 55.0  70.0  68.0  73.0  69.0  69.0  68.0  69.0  57.0  68.0  68.0  67.0  66.0  67.0  68.0  68.0  70.0  67.0  64.0  68.0

It shows that the real content process startup time is about 6 milliseconds, but for Talos we measure the 10th amount of it. This made me wonder what Talos actually measures. I found the documentation of this kind of test at https://firefox-source-docs.mozilla.org/testing/perfdocs/talos.html#cpstartup. And reading what it actually measures explains why it takes longer:

Time from opening a new tab (which creates a new content process) to having that new content process be ready to load URLs.

So that value is different and includes as well all the extra time as needed to open a tab. That now means that we cannot compare the values from Talos with the real timing for content process creation.

As such lets use this bug (similar to bug 1938328 for the parent process startup regression) to handle the 50% slow down when we upgraded our macOS workers from 10.15 to 14.70.

I'll try to get a gecko profile for the Talos job, which may help us to see what's taking longer. Sadly we don't have a 10.15 machine anymore to compare with.

See Also: → 1938328
Summary: Content process creation time on Mac minis as reported by Talos in CI take ~200ms compared to 50ms for other test jobs → Talos reports 50% slower content process startup on MacOS 14.70 compared to MacOS 10.15

I triggered a Talos profiler run as a CI job.

Two profiles were created due to 2 cycles were run:

https://share.firefox.dev/3UAWPAF
https://share.firefox.dev/3HEZWo9

Haik, or Markus could one of you may take a look please? I do not see anything outstanding. Maybe some timings for the tab opening or other stuff has been changed?

Flags: needinfo?(mstange.moz)
Flags: needinfo?(haftandilian)

Clearing the needinfo. Lmk if you need anything else :whimboo

Flags: needinfo?(rcurran)

(In reply to Henrik Skupin [:whimboo][⌚️UTC+2] from comment #8)

I triggered a Talos profiler run as a CI job.

Two profiles were created due to 2 cycles were run:

https://share.firefox.dev/3UAWPAF
https://share.firefox.dev/3HEZWo9

Haik, or Markus could one of you may take a look please? I do not see anything outstanding. Maybe some timings for the tab opening or other stuff has been changed?

The things that stick out to me are the NSApplication init and sandbox_init_with_parameters calls. Bug 1983178 may help with NSApplication init runtime. For the sandbox_init_with_parameters call, I haven't investigated how we might speed that up. We might be able to by making the policy shorter by removing old rules we no longer need and comments. When I've tested locally on more modern machines, the runtime was negligible.

Flags: needinfo?(haftandilian)
Depends on: 1983178
Keywords: perf

(In reply to Ryan Curran from comment #9)

Clearing the needinfo. Lmk if you need anything else :whimboo

Hi Ryan, please put back the macOS worker assigned to me into the pool. I don't need it at the moment. Thanks.

Flags: needinfo?(rcurran)

Done. Thank you

Flags: needinfo?(rcurran)
You need to log in before you can comment on or make changes to this bug.