Fenix - create a speculative connection to the app link host
Categories
(Core :: Networking, enhancement, P2)
Tracking
()
People
(Reporter: acreskey, Assigned: acreskey)
References
(Depends on 1 open bug, Blocks 4 open bugs)
Details
(Whiteboard: [necko-triaged])
Attachments
(3 files)
In profiling the Fenix applink startup test we see that we only start loading the applink URL after ContentParent::CreateBrowser() is run and we've loaded chrome://geckoview/content/geckoview.xhtml
But this can be over a second after other network requests have started loading (location services, telemetry, etc).
https://share.firefox.dev/420hq4z
One remedy is to initiate a speculative connection to the applink host before we initialize the browser.
In the above profile (~100ms of network latency), the savings from TCP + TLS would be over 500ms.
(The code change likely won't be in Necko, so we will have to move this bug)
| Assignee | ||
Updated•5 months ago
|
Comment 1•5 months ago
|
||
This may be an S2, 500ms for applink is Huge
| Assignee | ||
Comment 2•5 months ago
|
||
Experiment: create speculative connection from Fenix layer
| Assignee | ||
Comment 3•5 months ago
|
||
Based on the idea (and code :) in Bug 1880491 (Warm up DNS on android Intent invocation) I prototyped this and will evaluate it asap.
| Assignee | ||
Comment 4•5 months ago
|
||
From profiling the WIP patch, comment #2, so far the new speculative connection also appears to be executed very late and is not yet providing benefit.
https://share.firefox.dev/4hVZOwS
Updated•5 months ago
|
| Assignee | ||
Updated•4 months ago
|
Comment 5•3 months ago
|
||
After changing the attached patch to also pass engine = engine where we create the TabIntentProcessor, the patch indeed works! Here's a profile showing the speculative connection: https://share.firefox.dev/3SurNJr
| Assignee | ||
Comment 6•3 months ago
|
||
(In reply to Markus Stange [:mstange] from comment #5)
After changing the attached patch to also pass
engine = enginewhere we create the TabIntentProcessor, the patch indeed works! Here's a profile showing the speculative connection: https://share.firefox.dev/3SurNJr
Thanks for profiling this!
I'll measure the impact on the shopify test locally.
From a perf compare I didn't see any improvements in CI, but I will look more closely now that I know it's working.
| Assignee | ||
Comment 7•3 months ago
|
||
From a local profile of the shopify startup test, I can see the speculative connection being initiated about 1 second before the actual load:
https://share.firefox.dev/4k5uME3
However in the network request for the https://theme-crave-demo.myshopify.com/ load I'm not seeing the preconnect.
So network partitioning may lead to this connection not being used in this context. Will debug.
| Assignee | ||
Comment 8•3 months ago
|
||
The symbols are missing, but this is a profile that captures the problem w/ Moz logs on the socket thread
https://share.firefox.dev/4kl1lhz
We can see that while the early speculative connection is attempted at 1.221 seconds, it is ultimately not used when we go to load the page at 2.242 seconds
Some potential issues:
2.243 LogMessages — (nsHostResolver) No usable record in cache for host [theme-crave-demo.myshopify.com] type 0.
This looked promising:
2.265 LogMessages — (nsHttp) GetH2orH3ActiveConn() request for ent 7015629790 .S........[tlsflags0x00000000]theme-crave-demo.myshopify.com:443^partitionKey=%28https%2Ctheme-crave-demo.myshopify.com%29 found an active but inexperienced connection 6f4ff09700 in native connection entry
But ultimately it looks like we want to connect over HTTP/3 and so make a new connection
2.371 LogMessages — (nsHttp) ConnectionEntry::ConnectionEntry this=6f4c545940 key=.S........[tlsflags0x00000000]theme-crave-demo.myshopify.com:443 <ROUTE-via theme-crave-demo.myshopify.com:443> {NPN-TOKEN h3}^partitionKey=%28https%2Ctheme-crave-demo.myshopify.com%29
| Assignee | ||
Comment 9•3 months ago
|
||
I took the in-progress patch from this bug as well as from bug 1965871 and bug 1903116 and built it all.
This looks like it's working as desired:
https://share.firefox.dev/45qB3Gc
See https://theme-crave-demo.myshopify.com/
1.371 seconds: initiate speculative connection to theme-crave-demo.myshopify.com
And after the lengthy content process creation when we finally go to load the URL, the connection is already active:
2.28 seconds:
LogMessages — (nsHttp) GetH2orH3ActiveConn() request for ent 7001f8e240 .S........[tlsflags0x00000000]theme-crave-demo.myshopify.com:443^partitionKey=%28https%2Ctheme-crave-demo.myshopify.com%29 found an active but inexperienced connection 7001f9bb00 in native connection entry
In this particular profile, DNS + TCP + TLS took 200ms, so that's a great savings.
Currently evaluating this locally on wifi as the shopify test is not stable in bitbar (although early results from lambdaTest look promising).
But I think would be ideal to measure the impact is the telemetry described in bug 1966167
| Assignee | ||
Comment 10•3 months ago
|
||
It looks like the speculative connection is still may not being used if we ultimately connect over HTTP/3 (comment 9 was an HTTP/2 connection).
See profile
https://share.firefox.dev/3ZzrSzy
I could be misreading this but, I see:
Creating the speculative connection 1.162S
LogMessages — (nsHttp) Creating DnsAndConnectSocket [this=70356fca80 trans=7151284300 ent=theme-crave-demo.myshopify.com key=.S........[tlsflags0x00000000]theme-crave-demo.myshopify.com:443^partitionKey=%28https%2Ctheme-crave-demo.myshopify.com%29]
2.047 - finding it at document load time
LogMessages — (nsHttp) GetH2orH3ActiveConn() request for ent 700b08bb00 .S........[tlsflags0x00000000]theme-crave-demo.myshopify.com:443^partitionKey=%28https%2Ctheme-crave-demo.myshopify.com%29 found an active but inexperienced connection 6f6aefed00 in native connection entry
But then shortly after we are actually looking for a connection with {NPN-TOKEN h3}
LogMessages — (nsHttp) GetH2orH3ActiveConn() request for ent 6f6ad6a560 .S........[tlsflags0x00000000]theme-crave-demo.myshopify.com:443 <ROUTE-via theme-crave-demo.myshopify.com:443> {NPN-TOKEN h3}^partitionKey=%28https%2Ctheme-crave-demo.myshopify.com%29 did not find an active connection
And in the profile network track we don't see the speculative connection for the document load.
I'll followup in bug 1965871, but we may not be supporting the scenario when HTTP/3 is negotiated during the TLS handshake.
Comment 11•3 months ago
|
||
From the profile, I can confirm that the speculative connection is still not used.
The reason is that we skip looking up alt-svc cache because the data storage is not ready.
2025-05-27 17:58:18.309261718 UTC - [Parent Process 22952: GeckoMain]: D/nsHttp AltSvcCache::LookupMapping 700b0d0680 https:theme-crave-demo.myshopify.com:443:.:^partitionKey=%28https%2Ctheme-crave-demo.myshopify.com%29:3
2025-05-27 17:58:18.309355224 UTC - [Parent Process 22952: GeckoMain]: D/nsHttp AltSvcCache::LookupMapping 700b0d0680 skip when storage is not ready
So, the hash key for the speculative connection is:
2025-05-27 17:58:18.309533447 UTC - [Parent Process 22952: GeckoMain]: D/nsHttp MaybeSpeculativeConnectWithHTTPSRR for ci=.S........[tlsflags0x00000000]theme-crave-demo.myshopify.com:443^partitionKey=%28https%2Ctheme-crave-demo.myshopify.com%29
Then, the request used to load the document uses HTTP/3:
2025-05-27 17:58:19.186610107 UTC - [Parent Process 22952: GeckoMain]: D/nsHttp MaybeSpeculativeConnectWithHTTPSRR for ci=.S........[tlsflags0x00000000]theme-crave-demo.myshopify.com:443 <ROUTE-via theme-crave-demo.myshopify.com:443> {NPN-TOKEN h3}^partitionKey=%28https%2Ctheme-crave-demo.myshopify.com%29
2025-05-27 17:58:19.221854980 UTC - [Parent Process 22952: GeckoMain]: D/nsHttp nsHttpChannel 6f6ae0c800 Alt Service Mapping Found https://theme-crave-demo.myshopify.com:443 [https:theme-crave-demo.myshopify.com:443:.:^partitionKey=%28https%2Ctheme-crave-demo.myshopify.com%29:3]
2025-05-27 17:58:19.221916015 UTC - [Parent Process 22952: GeckoMain]: D/nsHttp nsHttpChannel 6f6ae0c800 Using connection info from altsvc mapping
I filed bug 1966017 and hope to fix this issue completely.
| Assignee | ||
Comment 12•3 months ago
|
||
Thanks for debugging this Kershaw. Can I add bug 1966017 as a blocker to this bug, or is there another solution for HTTP/2 -> HTTP/3 speculative connections?
| Assignee | ||
Comment 13•3 months ago
•
|
||
By disabling HTTP/3 via pref and increasing the number of sockets available we can poke at the impact of the applink speculative connection.
Running the cvne-shopify.sh test on a high latency network, 200ms rtt (to highlight the improvement), I am measuring an improvement of 400ms (median startup from ~2900ms to ~2500ms).
This seems reasonable -- two round trips saved.
Notes:
• Using 100 iterations of the test instead of 5
--- Statistics for spec_connect_h2_200ms_rtt.txt ---
Count: 100
Mean: 2600.26
Median: 2521.62
Min: 1977.47
Max: 4999.00
Std Dev: 333.70
25th Percentile: 2456.52
75th Percentile: 2651.04
--- Statistics for baseline_h2_200ms_rtt.txt ---
Count: 100
Mean: 2926.86
Median: 2932.95
Min: 1426.16
Max: 4907.54
Std Dev: 370.03
25th Percentile: 2803.23
75th Percentile: 3091.71
Comment 14•3 months ago
|
||
(In reply to Andrew Creskey [:acreskey] from comment #12)
Thanks for debugging this Kershaw. Can I add bug 1966017 as a blocker to this bug, or is there another solution for HTTP/2 -> HTTP/3 speculative connections?
Sure. :)
| Assignee | ||
Comment 15•3 months ago
|
||
Thanks, and for reference here is a profile on the happy path, http/2, early speculative connection which is used by the url load that follows about one second later.
https://share.firefox.dev/4kn5TnX
Early speculative connection:
1.216s LogMessages — (nsHttp) nsHttpConnectionMgr::OnMsgSpeculativeConnect [ci=.S........[tlsflags0x00000000]theme-crave-demo.myshopify.com:443^partitionKey=%28https%2Ctheme-crave-demo.myshopify.com%29, mFetchHTTPSRR=1]
... content/geckoview.xhtml is loaded ...
And later when the applink load arrives,
nsIOService::NewChannelFromURIWithProxyFlagsInternal() -> nsHttpChannel::Init(), main thread of parent
2.209s LogMessages — (nsHttp) host=theme-crave-demo.myshopify.com port=-1
2.211 Load 8: https://theme-crave-demo.myshopify.com/
On the socket thread, we find the active connection
2.211s LogMessages — (nsHttp) GetH2orH3ActiveConn() request for ent 70239ac920 .S........[tlsflags0x00000000]theme-crave-demo.myshopify.com:443^partitionKey=%28https%2Ctheme-crave-demo.myshopify.com%29 found an active but inexperienced connection 6f58751e00 in native connection entry
And then we appear to spend 75ms with URL classifier (very slow profile because of profilermarkers:5,profilerstacks) and, as far as I can tell, waiting for the Cache check
Main thread
2.286s LogMessages — (nsHttp) nsHttpChannel::OnCacheEntryAvailable [this=70243d9800 entry=6f53b20860 new=0 status=0] for https://theme-crave-demo.myshopify.com/
And we ultimately make the request right after:
Main thread
2.287s LogMessages — (nsHttp) http request [
2.287s LogMessages — (nsHttp) GET / HTTP/1.1
2.288s LogMessages — (nsHttp) Host: theme-crave-demo.myshopify.com
| Assignee | ||
Comment 16•2 months ago
|
||
I can measure improvements resulting from a speculative connection on applink on networks down to about 50ms of additional rtt.
But at that point the change blends into the noise even though I thought I would still see the improvement in the mean and medians.
But I think I know what's happening.
See this marker only profile (to reduce overhead), A54
https://share.firefox.dev/4mUC3bU
Early speculative connection
1.595s SpeculativeConnect - https://theme-crave-demo.myshopify.com
A bit over a second later the actual load is started (kicks off a second speculative connection)
2.823s LogMessages — (nsHttp) nsHttpConnectionMgr::SpeculativeConnect [ci=.S........[tlsflags0x00000000]theme-crave-demo.myshopify.com:443^partitionKey=%28https%2Ctheme-crave-demo.myshopify.com%29]
Load 9: https://theme-crave-demo.myshopify.com/
After about 43ms the channel classifier completes:
2.866s Runnable - AsyncUrlChannelClassifier::CheckChannel - return - priority: Normal (4) task: 57d80365928b78da0
2.866s (nsHttp) nsHttpChannel::ResolveProxy [this=7028465600]
And the cache lookup is initiated:
2.868s LogMessages — (nsHttp) nsHttpChannel::OpenCacheEntry [this=7028465600]
2.868s LogMessages — (cache2) CacheStorageService::AddStorageEntry [entryKey=:https://theme-crave-demo.myshopify.com/, contextKey=O^partitionKey=%28https%2Ctheme-crave-demo.myshopify.com%29,]
2.868s LogMessages — (cache2) new storage entries table for context 'O^partitionKey=%28https%2Ctheme-crave-demo.myshopify.com%29,'
It completes about 18ms later
2.886s LogMessages — (nsHttp) nsHttpChannel::OnCacheEntryAvailable [this=7028465600 entry=7028097740 new=1 status=0] for https://theme-crave-demo.myshopify.com/
And then we make the actual request
2.887s
LogMessages — (nsHttp) http request [
LogMessages — (nsHttp) GET / HTTP/1.1
LogMessages — (nsHttp) Host: theme-crave-demo.myshopify.com
So we receive the url load message, initiate another speculative connection (was there previously) and then hit a 67ms bottleneck waiting for classification + cache check.
Without the Fenix layer speculative connection, we're already benefiting from the onLoad speculative connection during the channel classification and cache check.
(Except when the network is slower, which is why the impact is visible with induced rtt).
So this patch will be helpful in the field especially on slower networks, but the bottlenecks in URL classification and cache retrieval look to be good areas to optimize.
| Assignee | ||
Comment 17•2 months ago
|
||
I had made a mistake in my previous profile as it included stack sampling.
This one shouldn't have any stack sampling: https://share.firefox.dev/3ZtIxVk
Early Speculative connection:
0.454s LogMessages — (nsHttp) uri=https://theme-crave-demo.myshopify.com/
Actual load once gecko session is ready
1.314s Load 9: https://theme-crave-demo.myshopify.com/
Adding a cache entry with no context key, which I don't understand
1.365s LogMessages — (cache2) CacheStorageService::AddStorageEntry [entryKey=:https://theme-crave-demo.myshopify.com/, contextKey=]
Classification complete, 69ms
1.383s Runnable - AsyncUrlChannelClassifier::CheckChannel - return - priority: Normal (4) task: 858c8d80330d16db0
Adding a cache entry with context key
1.391s LogMessages — (cache2) CacheStorageService::AddStorageEntry [entryKey=:https://theme-crave-demo.myshopify.com/, contextKey=O^partitionKey=%28https%2Ctheme-crave-demo.myshopify.com%29,]
Cache entry available, make the actual request:
1.414s LogMessages — (nsHttp) nsHttpChannel::OnCacheEntryAvailable [this=6fb230fe00 entry=6f54b102e0 new=1 status=0] for https://theme-crave-demo.myshopify.com/
1.416s LogMessages — (nsHttp) GET / HTTP/1.1
In this case about 102ms to get the request out, even though the speculative connection was established.
Jeff pointed out yesterday that the Url classifier is initializing at this point, visible here on the new profile (43ms), https://share.firefox.dev/44dApeg
So I'm now evaluating this with Markus' patch to initialize earlier and Dimi's pref to defer CRC32 checks
| Assignee | ||
Comment 18•2 months ago
|
||
(In reply to Andrew Creskey [:acreskey] from comment #17)
So I'm now evaluating this with Markus' patch to initialize earlier and Dimi's pref to defer CRC32 checks
https://share.firefox.dev/4kzGmI5
Can see the URL classifier initialize before the load
Classification took 16ms in this run
And the request was out the door in 33ms
Will see how this behaves in longer running test.
| Assignee | ||
Comment 19•2 months ago
|
||
I ran this test overnight, 200 cold navigations to the shopify site, i.e. cvne-shopify.sh with device clearing before each iteration.
Fastest network available to me (home wifi), no latency added.
Combining the URL classifier patches with the Fenix speculative connection is looking very promising.
~114 ms improvement in the median and ~160ms in the mean.
And the noise looks to be reduced as well, with standard deviation cut to between 1/2 and 1/3. (Visible on the box plot).
I'll run these again to confirm reproducibility and to measure each URL Classifier patch separately.
Randell also a wip-patch to do URL Classification and CacheLookup in parallel, and I think that would shave another 16+ms off these numbers.
Variants:
baseline_h2 - latest Fenix
classifier_patches_h2 - Markus' patch to initialize classifier earlier and Dimi's pref to defer CRC32 checks, bug 1956920
spec_connect_h2 - Fenix speculative connect on receipt of applink intent
spec_connect_classifier_patches_h2 - Fenix speculative connect and both classifier changes
Results:
By median startup time, fastest to slowest
---------------------------------------------
1. spec_connect_classifier_patches_h2.txt 2416.87ms
2. spec_connect_h2.txt 2443.15ms
3. classifier_patches_h2.txt 2462.45ms
4. baseline_h2.txt 2530.77ms
Statistics:
--------------------
File: baseline_h2.txt
Count: 200
Mean: 2582.21ms
Median: 2530.77ms
Min: 2195.46ms
Max: 5220.56ms
Std Dev: 328.18ms
25th Percentile: 2419.68ms
75th Percentile: 2638.19ms
File: classifier_patches_h2.txt
Count: 200
Mean: 2459.04ms
Median: 2462.45ms
Min: 2113.61ms
Max: 2790.13ms
Std Dev: 111.63ms
25th Percentile: 2386.07ms
75th Percentile: 2529.25ms
File: spec_connect_classifier_patches_h2.txt
Count: 200
Mean: 2420.46ms
Median: 2416.87ms
Min: 1260.52ms
Max: 2745.16ms
Std Dev: 138.00ms
25th Percentile: 2358.37ms
75th Percentile: 2495.55ms
File: spec_connect_h2.txt
Count: 200
Mean: 2440.25ms
Median: 2443.15ms
Min: 2002.24ms
Max: 3295.13ms
Std Dev: 143.74ms
25th Percentile: 2362.85ms
75th Percentile: 2510.15ms
Notes: HTTP/3 disabled while we work on speculative connections for HTTP/3. (hence the _h2)
| Assignee | ||
Comment 20•2 months ago
•
|
||
Boxplot from the spec connect/url classifier test. (medians)
| Assignee | ||
Comment 21•2 months ago
|
||
Marking as blocked on landing telemetry for this scenario, bug 1966167, so we can measure the impact for Fenix users.
Updated•27 days ago
|
Description
•