Closed Bug 1900481 Opened 1 year ago Closed 1 year ago

DoH doubles failure time for unresolvable hosts, uBlock Origin can further delay fetch api

Categories

(Core :: Networking: DNS, defect, P2)

defect

Tracking

()

RESOLVED WONTFIX

People

(Reporter: u752148, Assigned: acreskey)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

(Whiteboard: [necko-triaged][necko-priority-queue])

Steps to reproduce:

In a new Firefox profile:

  • Install uBlock Origin
  • Set DoH to Max Protection
  • Make sure that it is set to Cloudflare (https://mozilla.cloudflare-dns.com/dns-query)

Open https://www.dnscheck.tools/

On Edge:

  • Install uBlock Origin
  • Go to its settings and click on Privacy, search, and services
  • Enable Use secure DNS to specify how to lookup the network address for websites
  • Set it to https://mozilla.cloudflare-dns.com/dns-query

Open https://www.dnscheck.tools/

Actual results:

Check the status bar at the bottom of the page. Edge is always faster. Sometimes there's a difference as high as 200ms between Firefox and Edge.

Blocks: necko-perf, doh
Severity: -- → S3
Priority: -- → P2
Whiteboard: [necko-triaged][necko-priority-new]

Given the steps, I take it that uBlockOrigin is necessary for this scenario to reproduce?
If that's the case, I think this could be related to bug 1881986

Flags: needinfo?(kakekikoku)
See Also: → 1881986

Edge: 111ms
Firefox without uBO: 186ms
Firefox with uBO: >300ms

Flags: needinfo?(kakekikoku)

Thank you for checking. Seems rather clear there's a performance impact from that.

Depends on: 1881986
Summary: DoH resolution is faster on Edge → DoH resolution is 100ms slower with uBlockOrigin installed

I've done some testing and I think uBlock Origin option to disable pre-fetching (Enabled by default) is affecting the results, if I disable this option to enable pre-fetching, I get very similar results to those obtained with uBlock Origin disabled, but for me Edge is still faster than Firefox even with uBlock Origin Disable pre-fetching option enabled.

Edge: 115ms
Firefox without uBO: 179ms
Firefox with uBO (Pre-fetching Enabled): 188ms
Firefox with uBO (Pre-fetching Disabled): 274ms

Possibly related, but it only affects v127/128 as of now: https://bugzilla.mozilla.org/show_bug.cgi?id=1900730.

See Also: → 1900730

I will check this to see for any possible implications from Bug 1900730.

Whiteboard: [necko-triaged][necko-priority-new] → [necko-triaged][necko-priority-next]
Flags: needinfo?(smayya)

(In reply to Sunil Mayya from comment #6)

I will check this to see for any possible implications from Bug 1900730.

That looks like the right intuition!!!

Last week I collected and was looking a few GeckoProfile profiles but couldn't see any direct impact to DoH requests that could be directly due to webRequest listeners (besides the fact that the website seems to be deriving the result from triggering a number of http requests to generated domains names that looks like nxdomain-XXXXXXXXXXX.xyz, I'd guess to force a DoH resolution for each of them, and those requests would be technically ones that ublock would be able to intercept) but the difference in the result was definitely there and consistently reproducible. The prefs controller through the browser.privacy.network.networkPredictionEnabled privacy setting can actually affect more than what ublock can actually get access to through webRequest blocking listeners and so would be a reasonable explaination.

I was curious and so I gave it a tried again after explicitly disabling ublock setting that makes it disable prefetching through the browser.privacy.network.networkPredictionEnabled privacy setting and after that I was getting results that are much nearer to the "Firefox without uBO" one then when the privacy setting is disabling the prefetching.

I didn't dig further than that, given that you are already going to investigate further, but looking into the implications of the browser.privacy.network.networkPredictionEnabled privacy setting seems definitely the right direction from my perspective too.

Summary: DoH resolution is 100ms slower with uBlockOrigin installed → DoH resolution is 100ms slower with uBlock Origin installed

I did run experiments locally and here is my observation:

Configuration Firefox DNS Resolution Time (ms) Chrome DNS Resolution Time (ms)
Ublock Disabled 220, 204, 195, 200, 198 140, 132, 134, 156, 150
With Ublock and Prefetch Enabled 243, 264, 210, 205, 202, 208 148, 142, 136, 151
With Ublock and Prefetch Disabled 251, 317, 243, 250 141, 167, 134, 152, 167, 138, 137

Here are my key observations based on the comments and my experiments

  1. Firefox performance with Ublock Disabled is (almost) similar to Ublock with Prefetch Enabled.
  2. Even without Ublock origin we see a huge difference between Firefox and Chrome
  3. With Ublock with Prefetch Disabled, we see significant increase in resolution time compared to Prefetch Enabled.

Interestingly, Chrome has similar resolution times across different configurations.
The problem we need to focus now are:
P1. Without UBlock why does our DoH resolution about ~50-60 ms higher than chrome?
P2. Why does Ublock with Prefetch disabled only affects Firefox and not chrome.

valentin, andrew: Is 1 already a known issue?
Any leads on how we want to progress this one?

Flags: needinfo?(valentin.gosu)
Flags: needinfo?(smayya)
Flags: needinfo?(acreskey)
Whiteboard: [necko-triaged][necko-priority-next] → [necko-triaged][necko-priority-review]

(In reply to Sunil Mayya from comment #8)

I did run experiments locally and here is my observation:

Configuration Firefox DNS Resolution Time (ms) Chrome DNS Resolution Time (ms)
Ublock Disabled 220, 204, 195, 200, 198 140, 132, 134, 156, 150
With Ublock and Prefetch Enabled 243, 264, 210, 205, 202, 208 148, 142, 136, 151
With Ublock and Prefetch Disabled 251, 317, 243, 250 141, 167, 134, 152, 167, 138, 137

Here are my key observations based on the comments and my experiments

  1. Firefox performance with Ublock Disabled is (almost) similar to Ublock with Prefetch Enabled.
  2. Even without Ublock origin we see a huge difference between Firefox and Chrome
  3. With Ublock with Prefetch Disabled, we see significant increase in resolution time compared to Prefetch Enabled.

Interestingly, Chrome has similar resolution times across different configurations.
The problem we need to focus now are:
P1. Without UBlock why does our DoH resolution about ~50-60 ms higher than chrome?

Do both Crome and Firefox use the same DoH provider?
It might be interesting to also compare DoH on/off in both browsers.

P2. Why does Ublock with Prefetch disabled only affects Firefox and not chrome.

I believe Chrome might not implement the webextension API to disable prefetch, so uBlock doesn't affect DNS resolution times in Chrome.

Flags: needinfo?(valentin.gosu)

Yes, curious as to the behaviour with and without DoH.

But it's also not clear to me what exactly https://www.dnscheck.tools/ is reporting for dns times.
I see it timing a series of fetch's ?

Flags: needinfo?(acreskey)

Relevant code is here: https://github.com/brianshea2/addr.tools/blob/main/website/dnscheck.tools/main.js#L359

The dns resolution time is really the average time it takes for your browser to issue a request (`fetch``) to a non-existent domain and fail that request (rejected promise), hopefully by receiving NXDOMAIN via DNS resolution.

(In reply to azu6cjnyq from comment #11)

Relevant code is here: https://github.com/brianshea2/addr.tools/blob/main/website/dnscheck.tools/main.js#L359

The dns resolution time is really the average time it takes for your browser to issue a request (`fetch``) to a non-existent domain and fail that request (rejected promise), hopefully by receiving NXDOMAIN via DNS resolution.

Thanks.
It's an interesting approach. I would say that's measuring more than DNS resolution timing.
But nonetheless we don't want to see the time-to-fail be so much longer with uBlock.

Whiteboard: [necko-triaged][necko-priority-review] → [necko-triaged][necko-priority-queue]

This is an interesting test.

A couple of notes:
1. This test is outputting how long it takes the fetch() request to fail when given an invalid domain. As I mentioned in Comment 12, while this includes a dns lookup, it's measuring a lot more than dns timing and it looks like the delays are coming from elsewhere in the runtime.

For instance, I've instrumented our TRR code to output how long the DoH requests take.
Some timings (ms) from a test run:

 52.659417  
 83.922125  
 28.964500  
 32.108583  
 26.466917  
...

while the fetch() requests from the same run failed after the following durations:

885ms
501ms
226ms
177ms
474ms
297ms
463ms
...

But I am seeing a huge jump in the time-to-fetch-failure when enabling DoH, and also when enabling UBlock origin.

I think some increase in failure time is expected with Ublock since the fetch requests can be intercepted. But the impact shouldn't be so noticeable.

These are both potential problems so continuing to investigate.

2. I've simplified the test case and made a copy on a static site of my own:
https://acreskeymoz.github.io/dns/test_fetch.html

Assignee: nobody → acreskey

Removing UBlock from the equation for the moment (i.e. removing the extension):

With my default TRR provider, CIRA, I'm seeing fetch-failure times average about 184ms.
If I turn off DoH, fetch-failure times average about 56ms.

But note that the CIRA DoH response times are generally reasonable:
e.g.

TRR:OnStopRequest NS_ERROR_UNKNOWN_HOST: nxdomain-mi3vwv0a61n.com 117.865500  
TRR:OnStopRequest NS_ERROR_UNKNOWN_HOST: nxdomain-mi3vwv0a61n.com 117.893875  
TRR:OnStopRequest NS_ERROR_UNKNOWN_HOST: nxdomain-mi3vwv0a61n.com 32.028792  
TRR:OnStopRequest NS_ERROR_UNKNOWN_HOST: nxdomain-mi3vwv0a61n.com 43.809875  
TRR:OnStopRequest NS_ERROR_UNKNOWN_HOST: nxdomain-mi3vwv0a61n.com 30.083417  
TRR:OnStopRequest NS_ERROR_UNKNOWN_HOST: nxdomain-6bbz2errc5v.net 47.676500  
TRR:OnStopRequest NS_ERROR_UNKNOWN_HOST: nxdomain-6bbz2errc5v.net 84.297250  
TRR:OnStopRequest NS_ERROR_UNKNOWN_HOST: nxdomain-6bbz2errc5v.net 26.511958  
TRR:OnStopRequest NS_ERROR_UNKNOWN_HOST: nxdomain-6bbz2errc5v.net 29.407250  
TRR:OnStopRequest NS_ERROR_UNKNOWN_HOST: nxdomain-6bbz2errc5v.net 33.890542  
TRR:OnStopRequest NS_ERROR_UNKNOWN_HOST: nxdomain-tdyk9o9jwgc.org 29.332875  
TRR:OnStopRequest NS_ERROR_UNKNOWN_HOST: nxdomain-tdyk9o9jwgc.org 54.358583  
TRR:OnStopRequest NS_ERROR_UNKNOWN_HOST: nxdomain-tdyk9o9jwgc.org 49.865667  
TRR:OnStopRequest NS_ERROR_UNKNOWN_HOST: nxdomain-tdyk9o9jwgc.org 31.048667  
TRR:OnStopRequest NS_ERROR_UNKNOWN_HOST: nxdomain-tdyk9o9jwgc.org 27.774750  
TRR:OnStopRequest NS_ERROR_UNKNOWN_HOST: nxdomain-ew4za56v9zd.br 80.976792  
TRR:OnStopRequest NS_ERROR_UNKNOWN_HOST: nxdomain-ew4za56v9zd.br 104.643208  
TRR:OnStopRequest NS_ERROR_UNKNOWN_HOST: nxdomain-ew4za56v9zd.br 47.515583  
TRR:OnStopRequest NS_ERROR_UNKNOWN_HOST: nxdomain-ew4za56v9zd.br 82.911042  
TRR:OnStopRequest NS_ERROR_UNKNOWN_HOST: nxdomain-ew4za56v9zd.br 49.125542  
TRR:OnStopRequest NS_ERROR_UNKNOWN_HOST: nxdomain-p139eisdvxg.ca 27.946500  
TRR:OnStopRequest NS_ERROR_UNKNOWN_HOST: nxdomain-p139eisdvxg.ca 52.580542  
TRR:OnStopRequest NS_ERROR_UNKNOWN_HOST: nxdomain-p139eisdvxg.ca 30.436792  

Not clear to me yet why there are so many duplicate requests. On native DNS I'm generally seeing one duplicate (presumably one for a speculative connection, the other request for the main request).

Consider printing the mType and mOriginSuffix too.
If you're running on mac, you would also not be doing any native HTTPS lookups, so that might make the test case be a bit faster.

I've updated my test site so that it can also make a singular fetch to a non-existent domain, Singe fetch button.
https://acreskeymoz.github.io/dns/test_fetch.html

For one fetch() with an invalid domain, through DoH I'm seeing 5 TRR requests:

## TRR::SendHTTPRequest resolve nxdomain-w6bjq4tlg.ca type 1  originSuffix ^partitionKey=%28https%2Cacreskeymoz.github.io%29 
## TRR::SendHTTPRequest resolve nxdomain-w6bjq4tlg.ca type 65  originSuffix ^partitionKey=%28https%2Cacreskeymoz.github.io%29 
## TRR::SendHTTPRequest resolve nxdomain-w6bjq4tlg.ca type 28  originSuffix ^partitionKey=%28https%2Cacreskeymoz.github.io%29 
## TRR::SendHTTPRequest resolve nxdomain-w6bjq4tlg.ca type 1  originSuffix ^partitionKey=%28https%2Cacreskeymoz.github.io%29 
## TRR::SendHTTPRequest resolve nxdomain-w6bjq4tlg.ca type 28  originSuffix ^partitionKey=%28https%2Cacreskeymoz.github.io%29 

TRRTYPE_A (1), TRRTYPE_AAAA (28), TRRTYPE_HTTPSSVC (nsIDNSService::RESOLVE_TYPE_HTTPSSVC, 65)
https://searchfox.org/mozilla-central/rev/dca2603d55b5b39d3b8ab8e93c08b42563f5aad8/netwerk/dns/DNSPacket.h#30,33,36

But I can see that some of the duplicates are coming as a result of the failure (sending followup queries):

mozilla::net::TRR::TRR(AHostResolver*, nsHostRecord*, mozilla::net::TrrType)/Users/acreskey/dev/mc3/netwerk/dns/TRR.cpp
mozilla::net::TRR::TRR(AHostResolver*, nsHostRecord*, mozilla::net::TrrType)/Users/acreskey/dev/mc3/netwerk/dns/TRR.cpp
mozilla::net::TRRQuery::PrepareQuery(mozilla::net::TrrType, nsTArray<RefPtr<mozilla::net::TRR> >&)/Users/acreskey/dev/mc3/netwerk/dns/TRRQuery.cpp
mozilla::net::TRRQuery::CompleteLookup(nsHostRecord*, nsresult, mozilla::net::AddrInfo*, bool, nsTSubstring<char> const&, nsITRRSkipReason::value, mozilla::net::TRR*)/Users/acreskey/dev/mc3/netwerk/dns/TRRQuery.cpp
mozilla::net::TRR::FailData(nsresult)/Users/acreskey/dev/mc3/netwerk/dns/TRR.cpp
mozilla::net::TRR::OnStopRequest(nsIReques

Ah, and this might be why the fetch() requests to invalid domains take longer to fail with DoH -- on the TRR query failure we issue a secondary request instead of failing right away as we would with native DNS:
https://searchfox.org/mozilla-central/rev/18f09bdf36a62ea7079c018301f1d257f71f655b/netwerk/dns/TRRQuery.cpp#263-282

After looking at timestamps with MOZ_LOG=timestamp,nsHostResolver:5 and DoH enabled, I believe the longer DNS resolution times for unknown hosts when using DoH (compared to native DNS) are due to additional A/AAAA record requests made by DoH, as seen here.

This explains the performance differences compared to other browsers and native DNS during unresolvable host lookups.

In my tests, I haven't found evidence that uBlock Origin interferes with DoH query performance, regardless of prefetch settings.
However, since uBlock Origin is a web extension that intercepts each fetch(), it can slow them down, especially when many requests are made in quick succession as in these test cases.

In this profile, UBlock Origin is janked for a few intervals of up to about 150ms, totalling around 850ms.
https://share.firefox.dev/3YimJuu
This will certainly make the fetch() requests take longer to fail.

As an aside, on my test site I've also added options to run a series of tests which hit actual fetch endpoints (so the DNS lookup and the fetch both succeed).
These can be tested with and without uBlock (but note that some of the endpoints have longer, noisier server response times)
https://acreskeymoz.github.io/dns/test_fetch.html

In summary, this is an interesting issue, and I'm glad it was reported.

Based on my investigation, I suggest renaming the bug to: "DoH doubles failure time for unresolvable hosts, uBlock Origin can further delay fetch api".

I'm open to feedback and any contradictory findings.

From my investigation, the reason that unresolvable hosts take longer to fail is because our DoH implementation attempts an secondary A or AAAA request on failed requests.
This is per design, see https://phabricator.services.mozilla.com/D130047

For the second issue, it's known that UBlock, as a web extension, can delay fetch requests, particularly when making dozens of requests at a time.

I'm going to close as WONTFIX, but please feel free to add additional findings/observations.

Status: UNCONFIRMED → RESOLVED
Closed: 1 year ago
Resolution: --- → WONTFIX
Summary: DoH resolution is 100ms slower with uBlock Origin installed → DoH doubles failure time for unresolvable hosts, uBlock Origin can further delay fetch api
You need to log in before you can comment on or make changes to this bug.