Bug 1816678 (Open). Opened 2 years ago, updated 5 months ago.

[meta] Determine what improvements we can make to our use and measurement of speculative connect

Categories: Core :: Performance: General (enhancement, P2)

Performance Impact: low

People: Reporter: acreskey; Assigned: acreskey

References: depends on 3 open bugs

Details: Keywords: meta

Attachments: 1 file

We discovered in Bug 1813618 that most* speculative connections in Necko are effectively being aborted (desktop only).
This includes speculative connections in the traditional sense: those from the URL bar, the DOM (e.g. preconnect, mouse down), and the predictor.

*Note that every httpChannel also attempts a speculative connection as part of nsHttpChannel::Connect(); these are not being aborted, although the pool is small and can easily fill up.

We also discovered in Bug 1816545 that the number of sockets available for speculative connections in our performance test infrastructure is set to 0.
This impacts the realism of our tests by effectively disabling an important performance feature.
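To make the two problems above concrete (a small pool that fills up, and a pool size of 0 in the test infrastructure), here is a minimal Python model of a capped speculative-connection pool. It is not Necko's code: the class `SpeculativePool` and the helper `run_burst` are hypothetical, and the model assumes an attempt is simply skipped, not queued, once the cap is reached.

```python
class SpeculativePool:
    """Hypothetical model of a capped pool of speculative sockets (not Necko code)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.open = 0      # speculative sockets currently held
        self.dropped = 0   # attempts skipped because the pool was full

    def try_connect(self) -> bool:
        # Assumption: once the cap is reached, a speculative attempt is
        # silently skipped rather than queued.
        if self.open >= self.limit:
            self.dropped += 1
            return False
        self.open += 1
        return True

    def release(self) -> None:
        # The speculative socket was claimed by a real request or timed out.
        if self.open > 0:
            self.open -= 1


def run_burst(limit: int, attempts: int) -> tuple[int, int]:
    """Fire `attempts` speculative connects with none released; return (opened, dropped)."""
    pool = SpeculativePool(limit)
    opened = sum(pool.try_connect() for _ in range(attempts))
    return opened, pool.dropped


print(run_burst(6, 10))  # (6, 4): the pool fills and later attempts are lost
print(run_burst(0, 10))  # (0, 10): a cap of 0 disables speculative connects entirely
```

With a cap of 0, every attempt lands in the `dropped` counter, which is why a test configuration with no speculative sockets measures a browser without this feature.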

So this bug is to track improvements to both how we make speculative connections and how we measure them in our performance tests.

Performance Impact: --- → high

The severity field for this bug is set to S3. However, the Performance Impact field flags this bug as having a high impact on the performance.
:acreskey, could you consider increasing the severity of this performance-impacting bug? Alternatively, if you think the performance impact is lower than previously assessed, could you request a re-triage from the performance team by setting the Performance Impact flag to "?"?

For more information, please visit auto_nag documentation.

Flags: needinfo?(acreskey)
Severity: S3 → S2
Flags: needinfo?(acreskey) (cleared)

Even if we re-enable the speculative connection pool in performance tests (Bug 1816545), many connection attempts (those that go through nsIOService) will still be aborted, because nsIOService skips speculative connections when the browser is connected via a proxy, which is how our performance tests run.
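The point above is that two independent gates apply, so opening only one of them is not enough. The following is a hypothetical Python model, not the actual nsIOService logic: `ConnectContext` and `should_speculate` are illustrative names, and the model assumes only what is stated above, namely that a speculative connect through nsIOService is skipped whenever a proxy is in use, regardless of free pool slots.

```python
from dataclasses import dataclass


@dataclass
class ConnectContext:
    """Hypothetical inputs to the decision (illustrative names, not Necko's)."""
    via_proxy: bool   # the connection would be routed through a proxy
    free_slots: int   # remaining speculative-connection slots in the pool


def should_speculate(ctx: ConnectContext) -> bool:
    # Gate 1 (as described above): speculative connects that go through
    # nsIOService are skipped when a proxy is in use.
    if ctx.via_proxy:
        return False
    # Gate 2: the speculative pool must have a free slot (see Bug 1816545).
    return ctx.free_slots > 0


# Re-enabling the pool alone does not help a proxied test harness:
print(should_speculate(ConnectContext(via_proxy=True, free_slots=6)))   # False
print(should_speculate(ConnectContext(via_proxy=False, free_slots=6)))  # True
print(should_speculate(ConnectContext(via_proxy=False, free_slots=0)))  # False
```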

See Also: → 1813071, 1543990
Keywords: meta
Summary: Determine what improvements we can make to our use and measurement of speculative connect → [meta] Determine what improvements we can make to our use and measurement of speculative connect
Depends on: 1818798
Depends on: 1822348
No longer depends on: 909865
Depends on: 1822352
No longer depends on: 1816539
No longer depends on: 1814389
No longer depends on: 1818798
No longer depends on: 1816545

The fix for Bug 1813618 has landed.

This query compares some of the metrics in the pageload_event before and after the fix.

Although this covers only the Nightly population, there appear to be some significant gains. Some examples:

Cross-origin links
   ~6% improvement in response time
   38ms improvement in FCP at the median, 59ms at the 95th percentile
Normal page loads
   30ms improvement in response time at the 50th percentile
   115ms improvement in response time at the 75th percentile

However, we discovered yesterday that there is a secondary problem with speculative connections; see Bug 1543990.
A patch is already up for that one.

So, given all of the other performance work that is landing these days, I think that if we want to fully quantify the impact of the speculative connection fixes, we will likely need a regression experiment.

We may just be seeing the improvement from speculative DNS, but I have not fully verified that.

Some speculative connect scenarios, including rel=preconnect, were also broken a few years ago by site partitioning.
Good discussion here: https://phabricator.services.mozilla.com/D177047
The fix looks to be a bit more complex.

A quick summary of where we are.

There are three code changes that are expected to improve performance.

  1. In her fix for Bug 1813618, Dana removed the logic that prevented a large class of speculative connections from running.
    This includes speculative connections made from the front end (e.g. Awesome bar, bookmark selection, tab hover), some of those from the DOM (clicking on links), and those from the Necko predictor.

This code change has been in nightly for a little over a month.
Using the pageload event we can compare various pageload metrics before and after the fix.

We are seeing overwhelming signs of improvement in pageload metrics.

Metric                         Load type   ~Improvement at median   ~Improvement at 75th percentile
fcp (first contentful paint)   Link        49ms (4.5%)              72ms (4.5%)
fcp (first contentful paint)   Normal      45ms (4.0%)              111ms (6%)
response time                  Link        31ms (6.2%)              46ms (5.2%)
response time                  Normal      29ms (5%)                88ms (8%)
pageload time                  Link        54ms (3.4%)              102ms (3.5%)
pageload time                  Normal      25ms (1.3%)              71ms (3%)

We don't have telemetry for response times of Awesome bar navigations specifically, but profiling shows that the impact can be huge (over 100ms in the attachment below).

Ideally we would measure this off/on in Beta and Release, but due to the nature of the fix (specifically code removal), that would involve re-introducing a dead codepath.

This landed in Fx 114.

  2. The next set of improvements will come from fixing rel="preconnect", Bug 1543990.
    It is widely used: according to Chrome usage stats, it appears in the markup of almost half of all sites.

This fix is expected to land in Fx 115.
Pending experiment design and discussion, we plan to measure the impact of this fix via the Nimbus experiment described in item 3 below:

  3. Currently we have a global maximum of 6 sockets available for speculative connections. In Bug 1816539 we will experiment to see what performance gains, if any, can be obtained by increasing this number.
    This constant, 6, is widely believed to be too small (our overall global limit is 900). That we hit this limit can easily be verified by local testing, and it is also visible in telemetry.
    Depending on the outcome of the experiment, we anticipate landing a patch that increases the number of sockets available.
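As a rough illustration of what the Bug 1816539 experiment varies, the sketch below counts how many speculative attempts in a burst would be lost under different caps. It is a back-of-the-envelope model, not Necko code: `dropped_in_burst` and the burst size of 20 are hypothetical, it treats the global limit of 900 quoted above as a socket count, and it assumes no socket is released during the burst (real traffic frees sockets over time, so this overstates drops for any given cap).

```python
# Assumption: the global limit of 900 quoted above is treated as a socket count.
GLOBAL_SOCKET_LIMIT = 900


def dropped_in_burst(burst: int, speculative_cap: int, in_use: int = 0) -> int:
    """Speculative attempts dropped in a burst where no socket is released.

    `in_use` counts sockets already held by regular connections. The usable
    capacity is bounded by both the speculative cap and the sockets left
    under the global limit.
    """
    available = min(speculative_cap, max(0, GLOBAL_SOCKET_LIMIT - in_use))
    return max(0, burst - available)


# A hypothetical burst of 20 speculative connects under different caps.
for cap in (6, 12, 24):
    print(cap, dropped_in_burst(20, cap))
# 6 14
# 12 8
# 24 0
```

Whether a larger cap translates into faster page loads depends on how many of those otherwise-dropped sockets are later used by a real request, which this model does not capture; measuring that is the purpose of the experiment.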
Attached image before_after.png

Local profile showing an example of the impact of re-enabling speculative connections on Awesome bar search results.

Depends on: 1847805

This is a project, not a bug; setting the severity to NA.

Severity: S2 → N/A
Depends on: 1894206

Reducing perf impact as we believe we've addressed the highest impact use cases.

Performance Impact: high → low
Depends on: 1965871
