Closed Bug 1816539 Opened 3 years ago Closed 1 year ago

Consider increasing the number of sockets available for speculative connect (currently 6)

Categories

(Core :: Networking, task, P2)

task

Tracking

()

RESOLVED FIXED
126 Branch
Performance Impact high
Tracking Status
firefox126 --- fixed

People

(Reporter: acreskey, Assigned: acreskey)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [necko-triaged])

Attachments

(4 files)

There is a global pool of sockets available for speculative connections. Its size is set by a preference, currently 6.
(This value can be overridden in some cases, but 6 is the general case.)

Since every channel by default attempts a speculative connection, this pool can be quickly filled.
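
To illustrate how quickly a cap of 6 is reached, here is a minimal sketch of a capped pool. It is a toy model, not Necko code: the class `SpeculativePool`, its methods, and the burst size of 30 channels are all hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SpeculativePool:
    """Toy model of a capped pool of speculative connections (not Necko code)."""

    limit: int          # hypothetical stand-in for the pref value (6 today)
    in_flight: int = 0  # speculative connections currently being set up

    def try_speculate(self) -> bool:
        """Claim a slot if one is free; otherwise the channel goes without."""
        if self.in_flight >= self.limit:
            return False
        self.in_flight += 1
        return True

    def release(self) -> None:
        """Free a slot once a speculative connection completes or is claimed."""
        if self.in_flight > 0:
            self.in_flight -= 1


def count_speculated(channels: int, limit: int) -> int:
    """Count how many channels in a simultaneous burst get a speculative slot."""
    pool = SpeculativePool(limit=limit)
    return sum(pool.try_speculate() for _ in range(channels))


# Hypothetical burst of 30 channels opened before any slot is released.
print(count_speculated(channels=30, limit=6))   # 6
print(count_speculated(channels=30, limit=20))  # 20
```

In this model every channel beyond the limit simply proceeds without a speculative connection, which is the case this bug wants to make rarer.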

This bug is to investigate the possible performance impact of increasing the size of this pool.

Potentially after collecting telemetry: https://bugzilla.mozilla.org/show_bug.cgi?id=909865

Blocks: 1816678

This could be a good candidate for a Nimbus experiment, once Bug 1813618 is landed.

The size of our overall socket pool is 900 on desktop (providing the OS can allocate the file descriptors).
https://searchfox.org/mozilla-central/rev/3ede9deb876ad5d6389cb51b371d4a4c8d788deb/modules/libpref/init/all.js#1200

Setting performance impact to high because this affects every pageload.

Performance Impact: --- → high

The severity field for this bug is set to S3. However, the Performance Impact field flags this bug as having a high impact on the performance.
:valentin, could you consider increasing the severity of this performance-impacting bug? Alternatively, if you think the performance impact is lower than previously assessed, could you request a re-triage from the performance team by setting the Performance Impact flag to ??

For more information, please visit auto_nag documentation.

Flags: needinfo?(valentin.gosu)
Blocks: 1822348
No longer blocks: 1816678
Whiteboard: [necko-triaged] → [necko-triaged][necko-priority-next]

When doing the experiment we should also be looking at:
network_session_at_900fd to see if we hit the system socket limit.

Flags: needinfo?(valentin.gosu)

This is scheduled as a Nimbus experiment for 116, release.

We ran this experiment on 12% of the release population, with three cohorts (control: 6 sockets, treatment-a: 20 sockets, treatment-b: 40 sockets)
https://experimenter.services.mozilla.com/nimbus/speculative-connect-sockets-increased-population/summary

We are seeing contradictory results from different sources.
The Nimbus outcome shows no statistically significant improvement in pageload time:
https://protosaur.dev/partybal/speculative_connect_sockets_increased_population.html#perf_page_load_time_ms

This differs from a custom query against the pageload event, which shows an average improvement of 8ms in percentiles beyond the 50th.
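
For reference, a comparison of this kind can be sketched as below. This is not the actual query: the function name `upper_percentile_delta`, the choice of percentiles 51 through 99, and the input lists are assumptions, and any data passed in here would be synthetic rather than telemetry.

```python
from statistics import mean, quantiles


def upper_percentile_delta(control_ms, treatment_ms, start=51):
    """Average (control - treatment) over percentiles start..99.

    A positive result means the treatment cohort loaded faster.
    """
    # quantiles(..., n=100) returns 99 cut points; index p - 1 is percentile p.
    control_q = quantiles(control_ms, n=100, method="inclusive")
    treatment_q = quantiles(treatment_ms, n=100, method="inclusive")
    return mean(control_q[p - 1] - treatment_q[p - 1] for p in range(start, 100))


# Synthetic example only: a treatment that is uniformly 8 ms faster.
control = list(range(1000, 2000))
treatment = [t - 8 for t in control]
print(round(upper_percentile_delta(control, treatment), 3))
```

Averaging over the upper percentiles only is one way to read "beyond the 50th"; other definitions would give somewhat different numbers.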

Given that the user-facing performance metric Largest Contentful Paint (Bug 1722322) has just landed, we're going to evaluate this change from that perspective.

The plan is to re-run the experiment in Beta Fx 121 and continue in Release Fx 121.

If we see a positive signal of improvement, we can consider rolling out the pref flip.

Whiteboard: [necko-triaged][necko-priority-next] → [necko-triaged]

Using Browsertime automated tests we can measure a reproducible improvement in sub-resource connection times on some sites when using an increased speculative connection socket pool.
In addition, we see a regression in sub-resource connecting times if we reduce the size of the speculative connection socket pool to zero.

Here we look at two timings:
AsyncOpenToConnectEnd (time from content process AsyncOpen to completion of the full TLS connection)
AsyncOpenToFirstSent (time from content process AsyncOpen to sending of the first byte)

On https://www.zoom.us/ AsyncOpenToConnectEnd is improved ~35% to 46% and AsyncOpenToFirstSent is improved ~15% to 32%.
On https://old.reddit.com AsyncOpenToConnectEnd is improved ~58% to 68% and AsyncOpenToFirstSent is improved ~27% to 45%.
[attached]

However, when looking at metrics aggregated from the 20 sites in this test, the improvements blend in with the noise. [attached]
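
The dilution effect can be shown with a small worked example. The numbers below are hypothetical and are not the Browsertime results: assume 20 sites with a 100 ms baseline each, of which only two improve (by 40%). The helper names `percent_improvement` and `aggregate_improvement` are likewise made up for this sketch.

```python
def percent_improvement(baseline_ms: float, treatment_ms: float) -> float:
    """Relative improvement of treatment over baseline, in percent."""
    if baseline_ms <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline_ms - treatment_ms) / baseline_ms


def aggregate_improvement(baselines_ms, treatments_ms) -> float:
    """Improvement computed on the timings summed across all sites."""
    return percent_improvement(sum(baselines_ms), sum(treatments_ms))


# Hypothetical data: 2 of 20 sites improve from 100 ms to 60 ms, 18 are unchanged.
baselines = [100.0] * 20
treatments = [60.0] * 2 + [100.0] * 18

print(percent_improvement(baselines[0], treatments[0]))  # 40.0
print(aggregate_improvement(baselines, treatments))      # 4.0
```

A 40% win on two sites shrinks to 4% in the aggregate, which can easily fall within run-to-run noise.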

Attached image spec_connect_zoom.png

Zoom.us results

Overall results

The lack of a clear signal when viewing all sites in aggregate is a likely reason why we haven't succeeded in measuring an improvement to overall pageload metrics via experimentation on release and beta populations, since we cannot filter those results by site.

We know that we frequently hit this limit of 6 and so, based on the observed performance improvements in sub-resource connection times, we are increasing it to 20.
I did not see evidence of further improvements with 30 sockets, but we may increase it at a later point.

Note that this is very conservative as on desktop we see no sign of overall socket pool exhaustion. See Bug 1819556

Android changes will follow as the overall socket pool size is currently very limited.

We know that we frequently hit this limit and so, based on observed performance improvements in sub-resource connection times, we are increasing it to 20.
Note that this is very conservative as on desktop we see no sign of overall socket pool exhaustion.
See https://bugzilla.mozilla.org/show_bug.cgi?id=1819556

Android changes will follow as we currently have a much smaller overall socket pool on that platform.

Assignee: nobody → acreskey
Status: NEW → ASSIGNED
Pushed by acreskey@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a260c2683052 Consider increasing the number of sockets available for speculative connect (currently 6) r=necko-reviewers,valentin
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 126 Branch
See Also: → 1889771
See Also: → 1903116