Consider increasing the number of sockets available for speculative connect (currently 6)
Categories
(Core :: Networking, task, P2)
Tracking
Release | Tracking | Status
---|---|---
firefox126 | --- | fixed
People
(Reporter: acreskey, Assigned: acreskey)
References
(Blocks 2 open bugs)
Details
(Whiteboard: [necko-triaged])
Attachments
(4 files)
There is a global pool of sockets available for making speculative connections; its size is set by a preference to 6 here.
(This can be overridden in some cases, but it is the general case.)
Since every channel by default attempts a speculative connection, this pool can be quickly filled.
This bug is to investigate the possible performance impact of increasing the size of this pool.
Potentially after collecting telemetry: https://bugzilla.mozilla.org/show_bug.cgi?id=909865
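To illustrate why a small global cap fills quickly when every channel attempts a speculative connection, here is a toy model of such a pool. It is a hypothetical sketch, not Necko's implementation. It assumes that an attempt is simply skipped (not queued) when the pool is full, and it ignores per-origin deduplication and connection reuse. `SpeculativeConnectLimiter` and `admitted` are illustrative names.

```python
import threading


class SpeculativeConnectLimiter:
    """Toy model of a global cap on in-flight speculative connections.

    Hypothetical illustration only; this is not how Necko implements it.
    """

    def __init__(self, limit: int = 6) -> None:
        self._limit = limit  # 6 is the default discussed in this bug
        self._in_use = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Claim a slot; return False (skip the speculative connect) if full."""
        with self._lock:
            if self._in_use >= self._limit:
                return False
            self._in_use += 1
            return True

    def release(self) -> None:
        """Free a slot once a speculative attempt finishes (assumed trigger)."""
        with self._lock:
            if self._in_use > 0:
                self._in_use -= 1


def admitted(limit: int, channels: int) -> int:
    """How many of `channels` simultaneous speculative attempts get a slot."""
    limiter = SpeculativeConnectLimiter(limit)
    return sum(limiter.try_acquire() for _ in range(channels))


if __name__ == "__main__":
    # Hypothetical burst: a page opening 30 sub-resource channels at once.
    for limit in (6, 20, 40):
        print(f"limit={limit}: {admitted(limit, 30)} of 30 get a speculative socket")
```

With 30 simultaneous attempts, caps of 6, 20 and 40 admit 6, 20 and 30 attempts respectively. In this model, the remaining attempts gain no head start on their connections.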
Assignee
Comment 1•2 years ago
This could be a good candidate for a Nimbus experiment, once Bug 1813618 is landed.
Assignee
Comment 2•2 years ago
The size of our overall socket pool is 900 on desktop (provided the OS can allocate the file descriptors).
https://searchfox.org/mozilla-central/rev/3ede9deb876ad5d6389cb51b371d4a4c8d788deb/modules/libpref/init/all.js#1200
Assignee
Comment 3•2 years ago
Setting performance impact to high because this affects every pageload.
Comment 4•2 years ago
The severity field for this bug is set to S3. However, the Performance Impact field flags this bug as having a high impact on the performance.
:valentin, could you consider increasing the severity of this performance-impacting bug? Alternatively, if you think the performance impact is lower than previously assessed, could you request a re-triage from the performance team by setting the Performance Impact flag to `?`?
For more information, please visit auto_nag documentation.
Assignee
Updated•2 years ago
Updated•2 years ago
Comment 5•2 years ago
When running the experiment, we should also look at network_session_at_900fd to see whether we hit the system socket limit.
Assignee
Comment 6•2 years ago
This is scheduled as a Nimbus experiment for Fx 116 on the release channel.
Assignee
Comment 7•2 years ago
We ran this experiment on 12% of the release population with three cohorts (control: 6 sockets, treatment-a: 20 sockets, treatment-b: 40 sockets).
https://experimenter.services.mozilla.com/nimbus/speculative-connect-sockets-increased-population/summary
We are seeing contradictory results from different sources:
The Nimbus outcome shows no statistically significant improvement in pageload time:
https://protosaur.dev/partybal/speculative_connect_sockets_increased_population.html#perf_page_load_time_ms
This differs from a custom query against the pageload event, which shows an average improvement of 8ms in percentiles beyond the 50th.
Assignee
Comment 8•2 years ago
Given that the user-facing performance metric Largest Contentful Paint (Bug 1722322) has just landed, we're going to evaluate this change from that perspective:
The plan is to re-run the experiment in Beta Fx 121 and continue in Release Fx 121.
If we see a positive signal of improvement, we can consider rolling out the pref flip.
Assignee
Comment 9•1 year ago
Using Browsertime automated tests we can measure a reproducible improvement in sub-resource connection times on some sites when using an increased speculative connection socket pool.
In addition, we see a regression in sub-resource connection times if we reduce the size of the speculative connection socket pool to zero.
Here we look at two timings:
- AsyncOpenToConnectEnd (time from the content process AsyncOpen to completion of the full TLS connection)
- AsyncOpenToFirstSent (time from the content process AsyncOpen to sending of the first byte)
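Both timings share the same start point (AsyncOpen) and differ only in the end event, so they can be sketched as simple differences of per-channel timestamps. The `ChannelTimestamps` record and its field names are hypothetical and do not correspond to Firefox's actual instrumentation. A speculative connection that is already further along when the channel needs it would shorten both intervals, which is the effect being measured here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChannelTimestamps:
    """Hypothetical per-channel timestamps, in milliseconds on a shared clock."""
    async_open: float             # content process calls AsyncOpen
    connect_end: Optional[float]  # full TLS connection completed
    first_sent: Optional[float]   # first byte of the request sent


def async_open_to_connect_end(t: ChannelTimestamps) -> Optional[float]:
    """AsyncOpenToConnectEnd, or None if the connection never completed."""
    if t.connect_end is None:
        return None
    return t.connect_end - t.async_open


def async_open_to_first_sent(t: ChannelTimestamps) -> Optional[float]:
    """AsyncOpenToFirstSent, or None if nothing was sent."""
    if t.first_sent is None:
        return None
    return t.first_sent - t.async_open


if __name__ == "__main__":
    t = ChannelTimestamps(async_open=100.0, connect_end=180.0, first_sent=185.0)
    print(async_open_to_connect_end(t), async_open_to_first_sent(t))  # 80.0 85.0
```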
On https://www.zoom.us/, AsyncOpenToConnectEnd is improved by ~35% to 46% and AsyncOpenToFirstSent by ~15% to 32%.
On https://old.reddit.com, AsyncOpenToConnectEnd is improved by ~58% to 68% and AsyncOpenToFirstSent by ~27% to 45%.
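The percentages above are relative improvements. A minimal sketch of the arithmetic follows, assuming "improved by X%" means the relative reduction of a timing versus the baseline (smaller pool) run. The comment does not say how the ranges were derived; one possibility is applying the formula at several percentiles. The numbers in the usage line are made up.

```python
def percent_improvement(baseline_ms: float, treatment_ms: float) -> float:
    """Relative reduction of a timing versus the baseline, in percent.

    Positive means the treatment (larger speculative pool) was faster.
    """
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    return (baseline_ms - treatment_ms) * 100.0 / baseline_ms


if __name__ == "__main__":
    # Hypothetical numbers, not measurements from this bug.
    print(percent_improvement(200.0, 120.0))  # 40.0
```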
[attached]
However, when looking at metrics aggregated over the 20 sites in this test, the improvements blend in with the noise. [attached]
Assignee
Comment 10•1 year ago
Zoom.us results
Assignee
Comment 11•1 year ago
https://old.reddit.com results
Assignee
Comment 12•1 year ago
Overall results
Assignee
Comment 13•1 year ago
The lack of a clear signal when viewing all sites in aggregate is a likely reason why we haven't been able to measure an improvement in overall pageload metrics via experiments on the release and beta populations, since we cannot filter those results by site.
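The dilution effect can be illustrated with made-up numbers. This sketch assumes every site is weighted equally and that only a few sites benefit. Real pageload telemetry weights sites by traffic and mixes in many pages that are not bound by connection setup, so this is only a simplified model. A large per-site gain then shrinks to a small aggregate shift that is easier to lose in run-to-run noise.

```python
from statistics import mean


def pooled_improvement(per_site_percent: list[float]) -> float:
    """Mean improvement (%) when every site is weighted equally."""
    return mean(per_site_percent)


if __name__ == "__main__":
    # Hypothetical: 2 of 20 sites improve by 50%, the other 18 see no change.
    sites = [50.0, 50.0] + [0.0] * 18
    print(pooled_improvement(sites))  # 5.0
```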
Assignee
Comment 14•1 year ago
We know that we frequently hit this limit of 6 and so, based on the observed performance improvements in sub-resource connection times, we are increasing it to 20.
I did not see evidence of further improvements with 30 sockets, but we may increase it at a later point.
Note that this is very conservative, as on desktop we see no sign of overall socket pool exhaustion. See Bug 1819556.
Android changes will follow as the overall socket pool size is currently very limited.
Assignee
Comment 15•1 year ago
We know that we frequently hit this limit and so, based on observed performance improvements in sub-resource connection times, we are increasing it to 20.
Note that this is very conservative, as on desktop we see no sign of overall socket pool exhaustion.
See https://bugzilla.mozilla.org/show_bug.cgi?id=1819556
Android changes will follow as we currently have a much smaller overall socket pool on that platform.
Updated•1 year ago
Comment 16•1 year ago
Comment 17•1 year ago
bugherder