Slow DNS times (500ms+) seen on Fenix applink cold startup
Categories: Core :: Networking: DNS, defect, P3
People: Reporter: acreskey, Unassigned
References: Depends on 1 open bug, Blocks 1 open bug
Details: Whiteboard: [necko-triaged]
We are sometimes seeing exceptionally slow DNS resolution times in Fenix applink cold startup scenarios.
In this case the app is launched directly by the Android OS with a target URL.
DNS resolution of ~1900ms
https://share.firefox.dev/3cAhy2g
DNS resolution of 527ms
https://share.firefox.dev/2PKO5cN
These profiles were captured by different developers, both on Moto G5.
I believe this is a different root cause from Bug 1664492 where the user's ISP was at fault.
I managed to get similarly poor times for WARM/HOT page loads while navigating to these pages from the in-app UI (i.e. not via Intents):
- 378ms https://share.firefox.dev/3wbRYs1 (on w3c.org)
- 250ms https://share.firefox.dev/3cBss84 (on mcomella.xyz)
I also tried on more heavily trafficked sites (roku.com, amazon.com, facebook.com) but got more reasonable results (< 65ms, usually much less).
Comment 2 (Reporter)•4 years ago
:mcomella provided a profile with dns threads captured.
This one is only 150ms but unfortunately the time is spent in android_getaddrinfofornetcontext
https://share.firefox.dev/3czxUYU
Comment 3•4 years ago
So yeah, we rely on the OS for the actual DNS resolution, and if we are spending it in getaddrinfo it looks like there's either something beyond our control or we're going to have to be clever. E.g. is there something that Android is doing that makes the first lookups take longer? Are there just too many lookups going on? I need to look at the profile later.
Meanwhile, curious what Valentin thinks, though he's away until next week.
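To compare the first lookup against later ones outside the browser, a minimal timing sketch around the blocking system resolver could look like the following. It uses Python's `socket.getaddrinfo`, which wraps the platform resolver; the helper name `time_resolution` and its `resolver` parameter are illustrative assumptions, not part of Gecko or Fenix.

```python
import socket
import time


def time_resolution(host, port=443, resolver=socket.getaddrinfo):
    """Time one blocking lookup of `host`; return (elapsed_ms, addresses).

    `resolver` defaults to the system resolver and can be swapped for a
    stub, which keeps the helper usable without network access.
    """
    start = time.perf_counter()
    results = resolver(host, port, type=socket.SOCK_STREAM)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual IP address for both IPv4 and IPv6.
    addresses = sorted({entry[4][0] for entry in results})
    return elapsed_ms, addresses
```

Calling `time_resolution("w3c.org")` twice in a row and comparing the two `elapsed_ms` values gives a rough idea of how much a cold lookup costs relative to one that a resolver cache may already be able to answer.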
Comment 4 (Reporter)•4 years ago
I'm curious as well whether the cases where we see really long delays are also spent in Android's getaddrinfo.
:mcomella, as we were discussing, ni'ed you to try with DoH enabled.
Is there a way to confirm DoH is enabled from the profile (or even the device)? I set network.trr.mode=3 (I didn't see a doh-rollout.enabled config option on Android) and I got a 500ms resolution again. However, two calls each spend 240ms in Android's getaddrinfo, so I wonder if DoH was working. Here's the profile: https://share.firefox.dev/3ucNVtZ
Comment 6•4 years ago
On Android we don't turn on DoH by default. It can only be enabled via about:config, since we don't have any UI for it in Fenix.
This specific delay is caused exclusively by the call to getaddrinfo in the libc implementation.
Using DoH might improve it in the edge cases where this is taking too long - the profiles show long waits in fread - so it's waiting for the DNS daemon to return something.
I don't know if we can do much about this issue. We can probably improve some of the corner cases with DoH, but our DoH implementation also uses regular DNS to bootstrap & all, so it might not be an all around fix. That said, I don't think we've ever looked at DoH performance on Fenix so I'd be interested to see if that fixes anything.
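For reference when reading profiles captured with different pref values, here is a small sketch of how `network.trr.mode` values are commonly documented for Firefox's TRR (DoH) resolver. The helper `may_use_native_resolver` and its `bootstrap_via_native` flag are hypothetical names introduced for illustration. It encodes the point above that a DoH setup may still need regular DNS to bootstrap, and it is one possible reading of why getaddrinfo can appear in a mode 3 profile, not a diagnosis.

```python
# Meaning of network.trr.mode values, as I understand the Firefox TRR docs.
TRR_MODES = {
    0: "off (default): native resolver only",
    1: "reserved",
    2: "TRR first: try DoH, fall back to the native resolver",
    3: "TRR only: no fallback to the native resolver",
    4: "reserved",
    5: "off by choice",
}


def may_use_native_resolver(mode, bootstrap_via_native=True):
    """Hypothetical helper: could getaddrinfo still show up in a profile?

    `bootstrap_via_native` is an assumption standing in for "the DoH
    server's own hostname has to be resolved with regular DNS first".
    """
    if mode not in TRR_MODES:
        raise ValueError(f"unknown network.trr.mode value: {mode}")
    if mode in (1, 4):
        # Design choice of this sketch: refuse to guess for reserved values.
        raise ValueError(f"network.trr.mode {mode} is reserved")
    if mode == 3:
        # TRR-only: native lookups are only expected for bootstrapping.
        return bootstrap_via_native
    # Modes 0 and 5 always use the native resolver; mode 2 may fall back.
    return True
```

Under these assumptions, `may_use_native_resolver(3)` stays true while bootstrapping goes through regular DNS, so seeing getaddrinfo in a profile does not by itself show that DoH was inactive.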
Comment 7•4 years ago
(In reply to Michael Comella (:mcomella) [needinfo or I won't see it] from comment #5)
> Is there a way to confirm DoH is enabled from the profile (or even the device)? I set network.trr.mode=3 (I didn't see a doh-rollout.enabled config option on Android) and I got a 500ms resolution again. However, two calls each spend 240ms in Android's getaddrinfo, so I wonder if DoH was working. Here's the profile: https://share.firefox.dev/3ucNVtZ
You can check in about:networking if we used DoH for that resolution.
Also, the pref value should be reflected in about:support.
Comment 8•4 years ago
Put this in P3, since the delay is not caused by our code.
Comment 9 (Reporter)•1 year ago
ni' myself to see if this still happens, and also if it's fixed by the pref added in bug 1122907.
Comment 10 (Reporter)•7 months ago
I haven't reproduced this myself, but if anyone is seeing this behaviour I would like to hear about it.
Comment 11 (Reporter)•15 days ago
We now have accurate telemetry on applink initial DNS timings.
From this set, the timings don't look problematic:
p25: 0ms
p50: 0ms
p75: 3ms
p95: 45ms
p99: 150ms
If anyone can reproduce a problem and capture a profile, please feel free to re-open the bug.
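For anyone who does capture their own timings and wants to compare them against the percentiles above, a minimal sketch of a percentile summary follows. It uses the nearest-rank definition, which is an assumption made here; the telemetry pipeline may compute its percentiles differently. The function names are illustrative.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(samples, points=(25, 50, 75, 95, 99)):
    """Return a dict such as {"p25": ..., "p50": ...} for the samples."""
    return {f"p{p}": percentile(samples, p) for p in points}
```

Passing a list of locally measured DNS durations in milliseconds to `summarize` yields the same five summary points, which makes a slow local tail easy to spot next to the telemetry values.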