Closed Bug 691710 Opened 13 years ago Closed 13 years ago

Improve DNS Caching on Android to make up for lack of caching in Android's libc

Categories

(Core :: Networking, enhancement)

All
Android
enhancement
Not set
normal

RESOLVED INVALID

People

(Reporter: gal, Assigned: sworkman)

Details

(Keywords: mobile)

Android's Bionic libc doesn't cache DNS lookups. Caching is done in the Dalvik VM (poorly, and ignoring TTLs). We should figure out what to do here. Adding a simple cache to our network stack is probably a good first step.
OS: Mac OS X → Android
Hardware: x86 → All
Summary: Android libc doesn't cache DNS → Android libc doesn't cache DNS lookups
This was discovered by medwards. Assigning to bsmith; he will find an owner for it. This is probably a very significant performance problem for page loads on Android right now. High priority to fix.
Assignee: nobody → bsmith
Gecko does have its own cache (again poor, with no sense of the real TTL due to DNS API issues, but it's there).
Sounds like we are good here then. Thanks Patrick.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → INVALID
Do we have statistics on DNS lookup patterns on Android?  Tracing lookups (in the course of verifying candidate patches for https://bugzilla.mozilla.org/show_bug.cgi?id=687367) didn't look to me like there was any (effective) caching happening.

The getaddrinfo() API is generally not well suited to having a cache implemented on top of it; that's usually beneath the level of the resolver library, either in a separate caching resolver process or off-host (in the "stub-resolver" scenario, which appears to be all that Gingerbread and earlier Android revisions support).
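As a small illustration of why getaddrinfo() is awkward to cache on top of: its results carry addresses but no TTL, so any cache layered above it has to invent a lifetime. A minimal sketch using Python's socket.getaddrinfo() wrapper (the address and port are arbitrary illustration values, and AI_NUMERICHOST is used only so that no network lookup is needed):

```python
import socket

# Resolve a numeric address so the call never touches the network.
results = socket.getaddrinfo(
    "127.0.0.1", 80,
    family=socket.AF_INET,
    type=socket.SOCK_STREAM,
    flags=socket.AI_NUMERICHOST,
)

# Each entry is (family, socktype, proto, canonname, sockaddr).
# Nothing in the tuple says how long the answer may be cached.
for family, socktype, proto, canonname, sockaddr in results:
    print(family, socktype, proto, repr(canonname), sockaddr)
```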

If we are replacing Bionic's getaddrinfo() anyway due to thread safety problems, we should perhaps at least consider a stub resolver implementation that routes concurrent async hits to the same name through a shared context cache; see http://www.unbound.net/pipermail/unbound-users/2010-November/001523.html .  The dnssec-tools libval also appears to be a reasonable option in this space, whether or not we care about DNSSEC.
(In reply to Michael K. Edwards from comment #4)

> 
> The getaddrinfo() API is generally not well suited to having a cache
> implemented on top of it;

We do it because host cache sizes vary, there is no way to query them for their size, and DNS prefetching can tend to overwhelm small OS caches. However, we only store entries for fairly short TTLs because of the getaddrinfo() API.
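A rough sketch of that trade-off, not Gecko's actual implementation: because getaddrinfo() returns no TTL, a cache above it can only apply a fixed, deliberately short lifetime. The class name, the 60-second default, and the injectable clock are all assumptions made for illustration.

```python
import socket
import time


class ShortTtlHostCache:
    """Hypothetical cache over a getaddrinfo-style resolver.

    The real record TTL is not available, so every entry gets the same
    fixed lifetime (ttl_seconds is an assumed value, not Gecko's).
    """

    def __init__(self, resolve, ttl_seconds=60.0, clock=time.monotonic):
        self._resolve = resolve
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # hostname -> (expiry time, addresses)

    def lookup(self, hostname):
        now = self._clock()
        entry = self._entries.get(hostname)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no resolver call
        addresses = self._resolve(hostname)
        self._entries[hostname] = (now + self._ttl, addresses)
        return addresses


def system_resolve(hostname):
    """Resolve through the platform getaddrinfo(); returns address strings."""
    infos = socket.getaddrinfo(hostname, None, type=socket.SOCK_STREAM)
    return [info[4][0] for info in infos]
```

Something like `ShortTtlHostCache(system_resolve).lookup("example.org")` would then hit the system resolver at most once per lifetime window for each name.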

Telemetry about DNS hit rates is an open bug. It's certainly possible we could do much better there.

> that routes concurrent async hits to the same name through a shared context
> cache; see

The Gecko cache already coalesces multiple async requests for the same hostname into the same getaddrinfo() request (i.e., there is only ever one request outstanding for a given name).
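The coalescing idea can be sketched roughly as follows. This is an illustration under assumed names (CoalescingResolver and its members are hypothetical), not the Gecko code: the first caller for a hostname performs the lookup, and concurrent callers for the same name wait on the same result.

```python
import threading
from concurrent.futures import Future


class CoalescingResolver:
    """Hypothetical wrapper: one outstanding lookup per hostname."""

    def __init__(self, resolve):
        self._resolve = resolve
        self._lock = threading.Lock()
        self._in_flight = {}  # hostname -> Future shared by all waiters

    def lookup(self, hostname):
        with self._lock:
            future = self._in_flight.get(hostname)
            owner = future is None
            if owner:
                # First caller for this name: it will run the lookup.
                future = Future()
                self._in_flight[hostname] = future
        if owner:
            try:
                future.set_result(self._resolve(hostname))
            except Exception as exc:
                future.set_exception(exc)
            finally:
                with self._lock:
                    del self._in_flight[hostname]
        # Owner and waiters alike read the shared result (or exception).
        return future.result()
```

Note that this only merges requests that overlap in time; once the lookup completes, the next request starts a new one unless a cache sits in front of it.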

I believe there are bugs open on DNSSEC and on getting access to real TTLs. Doing that portably for all the important platforms has been an issue in the past, but it's desirable for all of them and is no doubt solvable. I would love to see patches there, as well as for a portable asynchronous API to get rid of the thread pool.
Steve and Josh, this is the bug I mentioned today. Determining the impact of this issue should be a high priority.
Assignee: bsmith → nobody
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
Summary: Android libc doesn't cache DNS lookups → Improve DNS Caching on Android to make up for lack of caching in Android's libc
Summer intern David Keeler already ported libunbound from OpenSSL to NSS and got it building in mozilla-central as part of our DNSSEC experiment. If we were to integrate a DNS resolver into Fennec, David would be a good person to talk to, to see if libunbound would be a good fit.
Severity: normal → major
Keywords: mobile
Priority: -- → P1
(In reply to Brian Smith (:bsmith) from comment #6)
> Steve and Josh, this is the bug I mentioned today. Determining the impact of
> this issue should be a high priority.

Can you please document the issue in the bug? Is it simply a high miss rate, or something else? Thanks.
(In reply to Patrick McManus from comment #5)
> I believe there are bugs open on DNSSec and getting access to real TTLs.
> Doing that portably for all the impt platforms has been an issue in the past
> but its desirable for all of them but is no doubt solvable. I would love to
> see patches there as well as for a portable asynchronous API to get rid of
> the thread pool.

Bug 545866 forms the base for this.
My understanding of what m.k.edwards just told me is that every request seemed to be causing a DNS query, which obviously is very surprising. I haven't verified it. m.k.edwards, please correct me if I am wrong.
That was my impression, based on the volume of traces I got out of the __android_log() call introduced during debugging of Bug 687367.  I did not verify in detail, but I certainly saw far more DNS lookups than I expected to when, say, navigating from page to page on kickstarter.com.
(In reply to Michael K. Edwards from comment #11)
> That was my impression, based on the volume of traces I got out of the
> __android_log() call introduced during debugging of Bug 687367.  I did not
> verify in detail, but I certainly saw far more DNS lookups than I expected
> to when, say, navigating from page to page on kickstarter.com.

Is there any chance they were prefetch lookups being done ahead of time based on parsing the HTML?

(Maybe this shouldn't be P1 major until it is more concretely defined.)
(In reply to Brian Smith (:bsmith) from comment #7)
> if libunbound would be a good fit.

The part of libunbound I added (just ldns) doesn't do caching, but I believe libunbound as a whole does. I don't think it would be too difficult to add libunbound and use it as a caching resolver, but it would significantly increase the final binary size (by ~1 MB, I think?)
Assignee: nobody → sjhworkman
Took a quick look at this one.  Caching on Android seems to be working fine; nonetheless, I did notice that on kickstarter.com (mentioned in comment #11) the cache is bypassed (via flags) for a lot of the DNS requests.  The cache is bypassed in the code using nsIRequest::LOAD_BYPASS_CACHE, which is eventually mapped to nsHostResolver::RES_BYPASS_CACHE.  So, I think this is probably what you were observing, Michael.
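To make the flag path concrete, here is a toy model of the behaviour described above. The constant names mirror the ones mentioned, but the numeric values, the helper function, and the FlaggedResolver class are hypothetical and only for illustration; they are not taken from the Gecko source.

```python
# Hypothetical flag values for illustration only; these are not
# Gecko's actual constants.
LOAD_BYPASS_CACHE = 1 << 3  # request-level load flag
RES_BYPASS_CACHE = 1 << 0   # resolver-level flag


def resolver_flags_from_load_flags(load_flags):
    """Translate request load flags into resolver flags."""
    flags = 0
    if load_flags & LOAD_BYPASS_CACHE:
        flags |= RES_BYPASS_CACHE
    return flags


class FlaggedResolver:
    """Hypothetical caching resolver that honours a bypass flag."""

    def __init__(self, resolve):
        self._resolve = resolve
        self._cache = {}  # hostname -> addresses

    def lookup(self, hostname, flags=0):
        bypass = bool(flags & RES_BYPASS_CACHE)
        if not bypass and hostname in self._cache:
            return self._cache[hostname]
        # Cache miss, or caller asked to bypass: do a real lookup.
        addresses = self._resolve(hostname)
        self._cache[hostname] = addresses
        return addresses
```

With a model like this, every request carrying the bypass flag produces a real lookup even though the cache is populated, which would look like "no caching" in a trace of resolver calls.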

Not to suggest that libunbound mightn't be a great way to go forward, but I think the priority of this one can be dropped to enhancement for now.
Severity: major → enhancement
Priority: P1 → --
It sounds like this is invalid as reported. A separate bug for asynchronous DNS enhancement or whatever would be welcome if one doesn't exist.
Status: REOPENED → RESOLVED
Closed: 13 years ago
Resolution: --- → INVALID