Open Bug 1880491 Opened 9 months ago Updated 10 days ago

Warm up DNS on android Intent invocation

Categories

(Core :: Networking, enhancement, P1)

Unspecified
Android
enhancement

Tracking

()

People

(Reporter: jesup, Assigned: kaya, NeedInfo)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-triaged][necko-priority-queue][fxdroid][group1])

Attachments

(1 file)

Really mostly a Fenix thing - when we're started by intent, and gecko isn't running, in parallel with starting gecko and sending it the URL, start a DNS lookup of the target domain. This will get the OS to send out the DNS request earlier instead of waiting for Gecko to be up and processing the initial URL.
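
To illustrate the idea only: the sketch below fires a best-effort lookup in the background while the caller carries on with startup. It is not the Fenix patch (which, per later comments, uses a Kotlin coroutine); `warm_up_dns` and its injectable `resolve` parameter are hypothetical names chosen for this example, and Python is used purely for illustration.

```python
import socket
import threading
from typing import Callable, Optional
from urllib.parse import urlsplit


def _system_resolve(host: str) -> object:
    # Goes through the OS resolver, which is what primes the OS DNS cache.
    return socket.getaddrinfo(host, None)


def warm_up_dns(
    url: str,
    resolve: Callable[[str], object] = _system_resolve,  # hypothetical hook, for testing
) -> Optional[threading.Thread]:
    """Start a background lookup for the URL's host.

    Returns the started thread, or None when the URL has no host.
    """
    host = urlsplit(url).hostname
    if not host:
        return None

    def lookup() -> None:
        try:
            resolve(host)  # result is discarded; only the side effect matters
        except OSError:
            pass  # a failed warm-up must never affect the real page load

    thread = threading.Thread(target=lookup, name="dns-warmup", daemon=True)
    thread.start()
    return thread


if __name__ == "__main__":
    # The caller would start the engine here, in parallel with the lookup.
    warm_up_dns("https://www.example.com/some/page")
```

The lookup result is deliberately thrown away: the point is only to get the OS to send the query early so that the engine's own lookup later hits a warm cache.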

For DoH, we could move DoH to a separate process which may either survive or could be started far faster than all of gecko (and be started in parallel to gecko). This will be considerably more work, since DoH would need to become standalone, and we'd need to modify Gecko to use an external DoH service.

I was thinking maybe we could also postpone other startup activities that may delay the initial page load, such as update checking and a bunch of things that usually happen after browser-delayed-startup-finished. I see there are some GeckoView-specific ones that we do there.

I think the DNS cache preload might be an easy and impactful first step though.

Blocks: necko-perf
Severity: -- → S3
OS: Unspecified → Android
Priority: -- → P2
Whiteboard: [necko-triaged]
Assignee: nobody → rjesup
Whiteboard: [necko-triaged] → [necko-triaged][necko-priority-queue]

Perf folks are about to land our first test which should cover this scenario
https://bugzilla.mozilla.org/show_bug.cgi?id=1898221

See Also: → 1898221
Assignee: rjesup → kkaya
Priority: P2 → P1
Whiteboard: [necko-triaged][necko-priority-queue] → [necko-triaged][necko-priority-queue][fxdroid][group1]

Was discussing this with Kaya and Randell and perf folks.

In terms of verifying the impact of the early dns request, it would be easier if:
• we flush the OS dns cache (ideally via ADB since that's how the test is run. But perhaps there is a better way)
• we introduce artificial delays into the network's DNS resolution, which would be via UDP in this case (not sure if RogersInABox also delays UDP traffic)

One way to do this:
• Create a Wi-Fi hotspot on your MacBook, and use Network Link Conditioner to delay all traffic
• Connect the Android device to the MacBook's hotspot
• Use Wireshark on the MacBook's Wi-Fi interface to verify that the "warm up" DNS request is made (make it to a dummy host, just for testing)

We likely won't be able to get all of these aspects into CI, but if we can verify them locally, that should be sufficient.

(In reply to Andrew Creskey [:acreskey] from comment #4)

• we introduce artificial delays into the network's DNS resolution, which would be via UDP in this case (not sure if RogersInABox also delays UDP traffic)

For manual testing I have a DNS server implementation that we can use to delay the DNS response:
https://github.com/valenting/dev-dns-server
We can add a setTimeout to this line.
We can also use the console.log messages to see when requests are received and responses are sent.

In automation I'm not sure we can change the DNS servers, or if that would actually break the test harness.

Let me see if I can evaluate the impact of the WIP patch.

This is looking promising in that the warmupDNS coroutine from the WIP patch ends up making the DNS request about 200 milliseconds before the app link pageload request.

I'm hotspotting from my MacBook and observing the traffic via Wireshark.

I've modified your patch, Kaya, so that the warmUpDns call is always to a random host name (so it's not cached), e.g. invalid_host_name700tb
I'm also changing the applink test URL on every run so that it's also not cached. Using www.chase.com and www.etsy.com here.

You can see the warmup request starting about 200ms before the actual page load request (the second column is the time in seconds).

1012	31.365202	192.168.2.5	192.168.2.1	DNS	82	Standard query 0x5762 A invalid_host_name700tb
1013	31.371331	192.168.2.1	192.168.2.5	DNS	82	Standard query response 0x5762 A invalid_host_name700tb
1014	31.555525	192.168.2.5	192.168.2.1	DNS	73	Standard query 0x76f5 HTTPS www.chase.com
1015	31.558283	192.168.2.5	192.168.2.1	DNS	73	Standard query 0x740e A www.chase.com

799	29.511546	192.168.2.5	192.168.2.1	DNS	82	Standard query 0xa475 A invalid_host_nameagjya
800	29.516334	192.168.2.1	192.168.2.5	DNS	82	Standard query response 0xa475 A invalid_host_nameagjya
801	29.726112	192.168.2.5	192.168.2.1	DNS	72	Standard query 0x35e0 HTTPS www.etsy.com
802	29.734264	192.168.2.5	192.168.2.1	DNS	72	Standard query 0x869b A www.etsy.com
803	29.770818	192.168.2.1	192.168.2.5	DNS	189	Standard query response 0x869b A www.etsy.com CNAME zone1.www.etsy.com CNAME etsy.map.fastly.net A 151.101.1.224 A 151.101.65.224 A 151.101.129.224 A 151.101.193.224
805	29.788772	192.168.2.1	192.168.2.5	DNS	183	Standard query response 0x35e0 HTTPS www.etsy.com CNAME zone1.www.etsy.com CNAME etsy.map.fastly.net SOA ns1.fastly.net
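
As a rough cross-check of the "about 200ms" figure, the lead time can be computed directly from the query rows above. This is a minimal sketch that assumes the Wireshark column order shown in the capture (number, time, source, destination, protocol, length, info); `parse_row` and `warmup_lead_ms` are hypothetical helper names, and only the query rows plus one response row are copied in.

```python
def parse_row(row: str) -> dict:
    """Split one exported Wireshark row into the fields needed here."""
    _no, time, _src, _dst, _proto, _length, info = row.split(None, 6)
    return {"time": float(time), "info": info}


def warmup_lead_ms(rows, warmup_prefix="invalid_host_name"):
    """Milliseconds between the warm-up query and the first later query
    for a different host (taken to be the page load lookup)."""
    parsed = [parse_row(r) for r in rows]
    # Responses read "Standard query response 0x...", so this keeps queries only.
    queries = [p for p in parsed if p["info"].startswith("Standard query 0x")]

    def host(q):
        return q["info"].split()[-1]

    warmup = next(q for q in queries if host(q).startswith(warmup_prefix))
    pageload = next(
        q for q in queries
        if q["time"] > warmup["time"] and not host(q).startswith(warmup_prefix)
    )
    return (pageload["time"] - warmup["time"]) * 1000.0


CHASE = [
    "1012 31.365202 192.168.2.5 192.168.2.1 DNS 82 Standard query 0x5762 A invalid_host_name700tb",
    "1013 31.371331 192.168.2.1 192.168.2.5 DNS 82 Standard query response 0x5762 A invalid_host_name700tb",
    "1014 31.555525 192.168.2.5 192.168.2.1 DNS 73 Standard query 0x76f5 HTTPS www.chase.com",
    "1015 31.558283 192.168.2.5 192.168.2.1 DNS 73 Standard query 0x740e A www.chase.com",
]
ETSY = [
    "799 29.511546 192.168.2.5 192.168.2.1 DNS 82 Standard query 0xa475 A invalid_host_nameagjya",
    "801 29.726112 192.168.2.5 192.168.2.1 DNS 72 Standard query 0x35e0 HTTPS www.etsy.com",
    "802 29.734264 192.168.2.5 192.168.2.1 DNS 72 Standard query 0x869b A www.etsy.com",
]

print(f"{warmup_lead_ms(CHASE):.1f} ms")  # 190.3 ms
print(f"{warmup_lead_ms(ETSY):.1f} ms")   # 214.6 ms
```

Both runs land close to the 200ms estimate: roughly 190ms for the www.chase.com run and 215ms for the www.etsy.com run.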

Next I'll see how it impacts resolution time when warming up DNS for the actual host.

This patch looks to be working as intended -- we make the early DNS A record lookup via the warmup coroutine and it gets used.
Note that we still make the HTTPS RR lookup later on (more on this in a bit).

With the warmup, applink to www.nfl.com:
Note the early A record lookup and the HTTPS lookup that follows about 270ms later at 52.189114:

9391	51.919297	192.168.2.5	192.168.2.1	DNS	71	Standard query 0x01ca A www.nfl.com
9501	51.949922	192.168.2.1	192.168.2.5	DNS	174	Standard query response 0x01ca A www.nfl.com CNAME global.nfl.map.fastly.net A 151.101.129.153 A 151.101.193.153 A 151.101.1.153 A 151.101.65.153
9854	52.189114	192.168.2.5	192.168.2.1	DNS	71	Standard query 0xefd3 HTTPS www.nfl.com
9861	52.219438	192.168.2.1	192.168.2.5	DNS	168	Standard query response 0xefd3 HTTPS www.nfl.com CNAME global.nfl.map.fastly.net SOA ns1.fastly.net

Without the warmup, applink to www.canada.ca:
Note how necko makes the two lookups, HTTPS and A, at 26.588626 and 26.605788:

8173	26.588626	192.168.2.5	192.168.2.1	DNS	73	Standard query 0xd783 HTTPS www.canada.ca
8193	26.605788	192.168.2.5	192.168.2.1	DNS	73	Standard query 0xb3dc A www.canada.ca
8297	26.651477	192.168.2.1	192.168.2.5	DNS	164	Standard query response 0xb3dc A www.canada.ca CNAME www.canada.ca.edgekey.net CNAME e4073.dscb.akamaiedge.net A 184.26.192.192
8298	26.652391	192.168.2.1	192.168.2.5	DNS	212	Standard query response 0xd783 HTTPS www.canada.ca CNAME www.canada.ca.edgekey.net CNAME e4073.dscb.akamaiedge.net SOA n0dscb.akamaiedge.net
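
For reference, per-record resolution times can be read off the two captures above by pairing each query with its response through the DNS transaction ID. The sketch below assumes the same column layout as the capture; `resolution_times_ms` is a hypothetical helper, and the answer sections of the response rows are trimmed after the host name for brevity.

```python
def resolution_times_ms(rows):
    """Map (record type, host) to the query-to-response time in milliseconds,
    pairing rows by DNS transaction ID."""
    sent = {}    # transaction id -> (query time, record type, host)
    result = {}
    for row in rows:
        _no, time, _src, _dst, _proto, _length, info = row.split(None, 6)
        tokens = info.split()  # "Standard query [response] <txid> <type> <host> ..."
        is_response = tokens[2] == "response"
        txid, rtype, host = tokens[3:6] if is_response else tokens[2:5]
        if is_response:
            if txid in sent:
                started, rtype, host = sent[txid]
                result[(rtype, host)] = (float(time) - started) * 1000.0
        else:
            sent[txid] = (float(time), rtype, host)
    return result


# Answer sections trimmed; numbers, times and transaction IDs are from the captures.
NFL = [
    "9391 51.919297 192.168.2.5 192.168.2.1 DNS 71 Standard query 0x01ca A www.nfl.com",
    "9501 51.949922 192.168.2.1 192.168.2.5 DNS 174 Standard query response 0x01ca A www.nfl.com",
    "9854 52.189114 192.168.2.5 192.168.2.1 DNS 71 Standard query 0xefd3 HTTPS www.nfl.com",
    "9861 52.219438 192.168.2.1 192.168.2.5 DNS 168 Standard query response 0xefd3 HTTPS www.nfl.com",
]
CANADA = [
    "8173 26.588626 192.168.2.5 192.168.2.1 DNS 73 Standard query 0xd783 HTTPS www.canada.ca",
    "8193 26.605788 192.168.2.5 192.168.2.1 DNS 73 Standard query 0xb3dc A www.canada.ca",
    "8297 26.651477 192.168.2.1 192.168.2.5 DNS 164 Standard query response 0xb3dc A www.canada.ca",
    "8298 26.652391 192.168.2.1 192.168.2.5 DNS 212 Standard query response 0xd783 HTTPS www.canada.ca",
]

for name, rows in (("nfl", NFL), ("canada", CANADA)):
    for (rtype, host), ms in resolution_times_ms(rows).items():
        print(f"{name}: {rtype} {host} {ms:.1f} ms")
```

On these captures the A lookup takes about 31ms for www.nfl.com and about 46ms for www.canada.ca. In the warmed-up run, the A answer for www.nfl.com is already back (51.949922) well before the later HTTPS query goes out (52.189114).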

Sometimes the improvement can be seen via the Performance Timing API in the DOM, i.e. performance.timing, but it's not consistent.

I don't think we'll be able to measure anything in the applink startup test because ~45ms is likely within the noise. (And the test needs to run with a single iteration of a unique host every time).

But from the wireshark logs this looks to be a good improvement.
However, before we land this, there are a couple of items to resolve:

1 - We are planning on rolling out DoH on Android, bug 1801530, sooner rather than later. We don't want to leak the applink host via cleartext dns when DoH is enabled, so this warmupDNS code shouldn't run in that case
2 - In bug 1852752 we enabled HTTPS resource records (we race them against A records in native DNS). This patch will mean that we end up using the HTTPS RR less frequently for applink scenarios (probably not at all). Not sure if that's critical.
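
The first item amounts to a guard in front of the warm-up call. A minimal sketch of that condition follows, combining it with the original "started by intent and Gecko not running" trigger from the bug description; `LaunchState` and `should_warm_up_dns` are hypothetical names, not identifiers from the patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaunchState:
    """Hypothetical snapshot of what is known when the intent arrives."""
    started_by_intent: bool
    gecko_running: bool
    doh_enabled: bool


def should_warm_up_dns(state: LaunchState) -> bool:
    """Decide whether to fire the early native DNS lookup."""
    if state.doh_enabled:
        # A native lookup would send the applink host in cleartext.
        return False
    # Only useful when launched by an intent while the engine is still starting.
    return state.started_by_intent and not state.gecko_running


if __name__ == "__main__":
    cold_start = LaunchState(started_by_intent=True, gecko_running=False, doh_enabled=False)
    print(should_warm_up_dns(cold_start))
```

Checking the DoH setting first keeps the privacy rule independent of whatever other launch conditions are added later.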

See Also: → 1852752, 1801530

(In reply to Andrew Creskey [:acreskey] from comment #8)

However, before we land this, there are a couple of items to resolve:

1 - We are planning on rolling out DoH on Android, bug 1801530, sooner rather than later. We don't want to leak the applink host via cleartext dns when DoH is enabled, so this warmupDNS code shouldn't run in that case

I wouldn't block on that. Let's file a bug blocking bug 1801530 to make sure enabling DoH disables the warmupDNS code.

2 - In bug 1852752 we enabled HTTPS resource records (we race them against A records in native DNS). This patch will mean that we end up using the HTTPS RR less frequently for applink scenarios (probably not at all). Not sure if that's critical.

I think that's probably OK, especially considering we don't do DoH yet. We could also try to warm up the HTTPS record, but I'm not sure if HTTPS records get cached in the OS resolver on Android.

Thanks for looking at this, Valentin.

Kaya, I created bug 1929005 to block bug 1801530, ensuring that we don't run this code when DoH is available on Fenix.

I believe we can proceed with this patch.

Flags: needinfo?(kkaya)