Open Bug 1816678 Opened 2 years ago Updated 5 months ago

[meta] Determine what improvements we can make to our use and measurement of speculative connect

Categories

(Core :: Performance: General, enhancement, P2)

Product:

Component:

Type:

enhancement

Priority:

P2

Severity:

N/A

Tracking

()

Status:

NEW

Performance Impact

low

People

(Reporter: acreskey, Assigned: acreskey)

References

(Depends on 3 open bugs)

Details

(Keywords: meta)

Attachments

(1 file)

before_after.png 2 years ago Andrew Creskey [:acreskey] 168.64 KB, image/png		Details

Andrew Creskey [:acreskey]

Assignee

Description

•

2 years ago

We discovered in bug 1813618 that most* speculative connections in necko are being effectively aborted (desktop only).
This includes speculative connections in the traditional sense: those from the URL bar, the DOM (e.g. preconnect, mouse down), and the predictor.

*Note that every httpChannel attempts a speculative connection as part of nsHttpChannel::Connect() and these are not being aborted (although the pool is small and can easily fill up.)

We also discovered in Bug 1816545 that the number of sockets available for speculative connections in our performance test infrastructure is set to 0.
This impacts the realism of our tests by effectively disabling an important performance feature.

So this bug is to track improvements to both how we make speculative connections and how we measure it in our performance tests.

Andrew Creskey [:acreskey]

Assignee

Updated

•

2 years ago

Performance Impact: --- → high

Andrew Creskey [:acreskey]

Assignee

Updated

•

2 years ago

Depends on: 1813618, 1816545, 1775358, 1816539, 909865, 1814389, 1543990

BugBot [:suhaib / :marco/ :calixte]

Comment 1

•

2 years ago

The severity field for this bug is set to S3. However, the Performance Impact field flags this bug as having a high impact on the performance.
:acreskey, could you consider increasing the severity of this performance-impacting bug? Alternatively, if you think the performance impact is lower than previously assessed, could you request a re-triage from the performance team by setting the Performance Impact flag to ??

For more information, please visit auto_nag documentation.

Flags: needinfo?(acreskey)

Andrew Creskey [:acreskey]

Assignee

Updated

•

2 years ago

Severity: S3 → S2

Flags: needinfo?(acreskey)

Andrew Creskey [:acreskey]

Assignee

Comment 2

•

2 years ago

Even if we re-enable the speculative connection pool in performance tests (Bug 1816545), many attempts to connect (those that go through nsIOService) will still be aborted, because nsIOService stops speculative connections when connected via a proxy (which is how our performance tests run).

Andrew Creskey [:acreskey]

Assignee

Updated

•

2 years ago

See Also: → 1813071, 1543990

Andrew Creskey [:acreskey]

Assignee

Updated

•

2 years ago

Keywords: meta

Summary: Determine what improvements we can make to our use and measurement of speculative connect → [meta] Determine what improvements we can make to our use and measurement of speculative connect

Andrew Creskey [:acreskey]

Assignee

Updated

•

2 years ago

Depends on: 1818798

Andrew Creskey [:acreskey]

Assignee

Updated

•

2 years ago

Depends on: 1822348

Andrew Creskey [:acreskey]

Assignee

Updated

•

2 years ago

No longer depends on: 909865

Andrew Creskey [:acreskey]

Assignee

Updated

•

2 years ago

Depends on: 1822352

Andrew Creskey [:acreskey]

Assignee

Updated

•

2 years ago

No longer depends on: 1816539

Andrew Creskey [:acreskey]

Assignee

Updated

•

2 years ago

No longer depends on: 1814389

Andrew Creskey [:acreskey]

Assignee

Updated

•

2 years ago

No longer depends on: 1818798

Andrew Creskey [:acreskey]

Assignee

Updated

•

2 years ago

No longer depends on: 1816545

Andrew Creskey [:acreskey]

Assignee

Comment 3

•

2 years ago

The fix for Bug 1813618 has landed.

This query compares some of the metrics in the pageload_event before and after the fix.

Although this covers only the Nightly population, there appear to be some significant gains. Some examples:

Cross-origin links
   ~6% improvement in response time 
   38ms improvement in fcp at median, 59ms @ 95th percentile
Normal page loads
   30ms improvement in response time @ 50th percentile
   115ms improvement in response time @ 75th percentile

However, we discovered yesterday that there was a secondary problem with speculative connections; see Bug 1543990.
A patch is already up for that one.

So, given all of the other performance work that is landing these days, I think that if we want to fully quantify the impact of the speculative connection fixes, we will likely need a regression experiment.

Andrew Creskey [:acreskey]

Assignee

Comment 4

•

2 years ago

We may just be seeing the improvement from speculative DNS, but I have not fully verified that.

Andrew Creskey [:acreskey]

Assignee

Comment 5

•

2 years ago

Some speculative connect scenarios, including rel=preconnect, were also broken a few years ago by site partitioning.
Good discussion here: https://phabricator.services.mozilla.com/D177047
The fix looks to be a bit more complex.
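
For readers unfamiliar with the hint itself, here is a minimal sketch of how rel=preconnect targets can be pulled out of page markup. It uses only Python's standard `html.parser`; the class name `PreconnectCollector`, the helper `preconnect_origins`, and the example hosts are hypothetical, and this is not how Gecko processes the hint.

```python
from html.parser import HTMLParser


class PreconnectCollector(HTMLParser):
    """Collect href values of <link> elements whose rel includes "preconnect"."""

    def __init__(self):
        super().__init__()
        self.origins = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel is a space-separated token list, e.g. rel="dns-prefetch preconnect"
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "preconnect" in rel_tokens and href:
            self.origins.append(href)


def preconnect_origins(markup):
    """Return the preconnect targets found in markup, in document order."""
    collector = PreconnectCollector()
    collector.feed(markup)
    return collector.origins


# Hypothetical markup, for illustration only.
sample = """
<head>
  <link rel="preconnect" href="https://cdn.example.com" crossorigin>
  <link rel="stylesheet" href="/site.css">
  <link rel="dns-prefetch preconnect" href="https://fonts.example.com">
</head>
"""
print(preconnect_origins(sample))
```

Each collected target is an origin the page asks the browser to open a connection to ahead of the first request.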

Andrew Creskey [:acreskey]

Assignee

Comment 6

•

2 years ago

A quick summary of where we are.

There are three code changes that are expected to improve performance.

In Dana's fix to Bug 1813618 she removed the logic preventing a large class of speculative connections from running.
This includes speculative connections made from the front end (e.g. Awesome bar, bookmark selection, tab hover), some of those from the DOM (clicking on links), and those from the Necko predictor.

This code change has been in nightly for a little over a month.
Using the pageload event we can compare various pageload metrics before and after the fix.

And we are seeing overwhelming signs of improvement in pageload metrics.

Metric	Load type	~Improvement at median	~Improvement at 75th percentile
fcp (first contentful paint)	Link	49ms (4.5%)	72ms (4.5%)
fcp (first contentful paint)	Normal	45ms (4.0%)	111ms (6%)
response time	Link	31ms (6.2%)	46ms (5.2%)
response time	Normal	29ms (5%)	88ms (8%)
pageload time	Link	54ms (3.4%)	102ms (3.5%)
pageload time	Normal	25ms (1.3%)	71ms (3%)
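
As a rough illustration of how figures like those in the table are derived, the sketch below compares one percentile of a metric before and after a change and reports the absolute and relative improvement. The helper names (`percentile`, `improvement`) and the sample values are hypothetical; this is not the telemetry query used for the numbers above.

```python
from statistics import quantiles


def percentile(samples, pct):
    """Return the pct-th percentile (1..99) of samples."""
    # quantiles(n=100) returns the 99 cut points p1..p99.
    return quantiles(samples, n=100, method="inclusive")[pct - 1]


def improvement(before, after, pct):
    """Return (milliseconds saved, percent saved) at the given percentile."""
    b = percentile(before, pct)
    a = percentile(after, pct)
    return b - a, 100.0 * (b - a) / b


# Made-up response times in milliseconds, for illustration only.
before_ms = [1000, 1100, 1200, 1300, 1400]
after_ms = [950, 1050, 1150, 1250, 1350]

for pct in (50, 75):
    saved_ms, saved_pct = improvement(before_ms, after_ms, pct)
    print(f"p{pct}: {saved_ms:.0f}ms ({saved_pct:.1f}%)")
```

Reporting both the absolute and the relative change, as the table does, matters because the same saving in milliseconds is a smaller fraction of the baseline at higher percentiles.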

We don't have telemetry for response times from Awesome bar navigations specifically, but we can see from profiling that the impact can be huge (over 100ms in the following attachment).

Ideally we would measure this off/on in Beta and Release, but due to the nature of the fix (specifically code removal), that would involve re-introducing a dead codepath.

This landed in Fx 114.

The next set of improvements will come from fixing rel="preconnect", Bug 1543990.
This is widely used: it appears in the markup of almost half of sites, according to Chrome usage stats.

This fix is expected to land in Fx 115.
Although dependent on experiment design and discussions, we plan to measure the impact of this fix via the Nimbus experiment described under the third change, below.

Currently we have a global maximum of 6 sockets available for speculative connections. In Bug 1816539 we will experiment to see what performance gains, if any, can be obtained by increasing this number.
This constant, 6, is widely believed to be too small (our global limit is 900). That we hit this limit can easily be verified by local testing and is also seen in telemetry.
Depending on the outcome of the experiment, we anticipate landing a patch that increases the number of sockets available.
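
To make the effect of the cap concrete, here is a toy model of a capped speculative pool. It is a sketch under stated assumptions, not Necko's implementation: the class `SpeculativePool`, the helper `count_rejected`, the one-slot-per-origin simplification, and the alternative limit of 20 are all hypothetical. Only the limit of 6 comes from this bug.

```python
class SpeculativePool:
    """Toy model of a globally capped pool of speculative sockets."""

    def __init__(self, limit):
        self.limit = limit
        self.in_flight = set()  # origins with a warmed, unclaimed socket
        self.rejected = 0       # speculative attempts dropped at the cap

    def speculative_connect(self, origin):
        """Try to warm a socket for origin; return False if the pool is full."""
        if origin in self.in_flight:
            return True  # simplification: one speculative socket per origin
        if len(self.in_flight) >= self.limit:
            self.rejected += 1
            return False
        self.in_flight.add(origin)
        return True

    def claim(self, origin):
        """A real request takes over the warmed socket, freeing a slot."""
        if origin in self.in_flight:
            self.in_flight.remove(origin)
            return True
        return False


def count_rejected(origins, limit):
    """Count how many speculative attempts are dropped for a burst of origins."""
    pool = SpeculativePool(limit)
    for origin in origins:
        pool.speculative_connect(origin)
    return pool.rejected


# Hypothetical burst: a page that hints at ten distinct origins at once.
burst = [f"https://host{i}.example" for i in range(10)]
print(count_rejected(burst, limit=6))   # current cap, from this bug
print(count_rejected(burst, limit=20))  # hypothetical larger cap
```

In this model, any burst of hints wider than the cap loses the excess attempts until real requests claim the warmed sockets, which is the behavior the experiment is meant to quantify.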

Andrew Creskey [:acreskey]

Assignee

Comment 7

•

2 years ago

Attached image before_after.png — Details

Local profile showing an example of the impact of re-enabling speculative connections on Awesome bar search results.

Andrew Creskey [:acreskey]

Assignee

Comment 8

•

2 years ago

While other changes are included in Beta 114, we do see a very strong signal that we've improved the network-dependent time_to_response_start from telemetry.

https://telemetry.mozilla.org/new-pipeline/evo.html#!aggregates=Median&cumulative=0&end_date=2023-05-24&include_spill=0&keys=!h2!h3!__none__&max_channel_version=beta%252F114&measure=TIME_TO_RESPONSE_START_MS&min_channel_version=beta%252F110&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2023-05-08&trim=1&use_submission_date=0

Andrew Creskey [:acreskey]

Assignee

Updated

•

2 years ago

Depends on: 1847805

Andrew Creskey [:acreskey]

Assignee

Comment 9

•

2 years ago

This is a project, not a bug; setting the severity to NA.

Severity: S2 → N/A

Andrew Creskey [:acreskey]

Assignee

Updated

•

1 year ago

Depends on: 1894206

Andrew Creskey [:acreskey]

Assignee

Comment 10

•

1 year ago

Reducing perf impact as we believe we've addressed the highest impact use cases.

Performance Impact: high → low

Andrew Creskey [:acreskey]

Assignee

Updated

•

5 months ago

Depends on: 1965871
