Closed Bug 820391 Opened 7 years ago Closed 4 years ago

use DnsQuery (with existing DNS thread pool) for Windows users

Categories

(Core :: Networking: DNS, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla34

People

(Reporter: jaas, Assigned: johnsullivan.pem)

References

(Depends on 1 open bug)

Details

(Whiteboard: [dns])

Attachments

(1 file, 16 obsolete files)

19.55 KB, patch
sworkman
: review+
Details | Diff | Splinter Review
Right now we don't get TTL information for DNS queries on any platform. IIRC, we assign a default TTL of three minutes for all queries on all platforms.

Once the patch on 807678 is in place, which removes our systemic dependency on NSPR for working with DNS information, we can probably easily swap host lookup APIs.

We should try to use the Win32 DnsQuery function first. It'll work on Win2K+ and it gives TTL. This would give real TTL data for 95% of our users. Eventually we're planning to move to our own DNS query library, but in the mean time this might be an easy win.

DnsQuery isn't async, which isn't ideal, but what we use now isn't async either -- we use a thread pool. We'll just use DnsQuery from the thread pool.
Are we going to always respect TTL? How can we resist DNS rebinding attack?
(In reply to Masatoshi Kimura [:emk] from comment #1)
> Are we going to always respect TTL? How can we resist DNS rebinding attack?

There are various paths that force DNS cache updates (ctrl f5) - this (theoretical) patch doesn't change those. It just replaces an arbitrary constant ttl that we use now with the ones configured in the domain name system.
Assignee: joshmoz → nobody
Blocks: 151929
Whiteboard: [dns]
Assignee: nobody → josullivan
Attached patch Creates seperate GetAddrInfo. (obsolete) — Splinter Review
Comment on attachment 8449105 [details] [diff] [review]
Creates seperate GetAddrInfo.

Review of attachment 8449105 [details] [diff] [review]:
-----------------------------------------------------------------

This patch is merely a refactor of our existing code. It tries to split some of our logic off such that we can insert platform dependent code in a clean way. Doing this before getting into the platform dependent stuff came about after speaking with :sworkman.
Attachment #8449105 - Flags: feedback?(sworkman)
Comment on attachment 8449105 [details] [diff] [review]
Creates seperate GetAddrInfo.

Review of attachment 8449105 [details] [diff] [review]:
-----------------------------------------------------------------

Looks like a good base patch! Thanks for churning this out so quickly. Once you fix up the issues, upload it for a full review.

::: netwerk/dns/GetAddrInfo.h
@@ +1,1 @@
> +/* vim:set ts=4 sw=4 sts=4 et cin: */

Copy the modeline from the style guide, just like in GetAddrInfo.cpp.

@@ +5,5 @@
> +
> +#ifndef netwerk_dns_GetAddrInfo_h
> +#define netwerk_dns_GetAddrInfo_h
> +
> +#include "mozilla/net/DNS.h"

You might be able to forward declare 'class AddrInfo;' (inside the mozilla::net namespace block) rather than including DNS.h.

::: netwerk/dns/moz.build
@@ +20,5 @@
>      'DNS.h',
>      'DNSListenerProxy.h',
>      'DNSRequestChild.h',
>      'DNSRequestParent.h',
> +    'GetAddrInfo.h',

This would export GetAddrInfo to include/mozilla/net for inclusion in other modules. Since nsHostResolver.cpp is in the same dir, it should be able to see GetAddrInfo.h already.

::: netwerk/dns/nsHostResolver.cpp
@@ +21,1 @@
>  #include "nsHostResolver.h"

nit: include after nsHostResolver.h.

@@ +1155,5 @@
>          TimeDuration elapsed = TimeStamp::Now() - startTime;
>          uint32_t millis = static_cast<uint32_t>(elapsed.ToMilliseconds());
>  
>          // convert error code to nsresult
>          nsresult status;

Let's have GetAddrInfo return status in an nsresult; pass the AddrInfo as a **.

Then you can check status like this:

nsresult rv = GetAddrInfo(rec->host, rec->af, rec->flags, &ai);
if (NS_SUCCEEDED(rv)) {
...

'rv' is commonly used for nsresult return values in our codebase.
Attachment #8449105 - Flags: feedback?(sworkman) → feedback+
Attached patch Creates seperate GetAddrInfo. (obsolete) — Splinter Review
Comment on attachment 8449637 [details] [diff] [review]
Creates seperate GetAddrInfo (patch 2)

Review of attachment 8449637 [details] [diff] [review]:
-----------------------------------------------------------------

I made the changes per :sworkman's feedback.
Attachment #8449637 - Flags: review?(sworkman)
Comment on attachment 8449637 [details] [diff] [review]
Creates seperate GetAddrInfo (patch 2)

Review of attachment 8449637 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good; some minor changes, so let's have one more go to make sure.

::: netwerk/dns/GetAddrInfo.cpp
@@ +10,5 @@
>  
>  #include "mozilla/net/DNS.h"
>  #include "prnetdb.h"
>  #include "nsHostResolver.h"
> +#include "nsError.h"

nsHostResolver first in the list of inclusions.

@@ +16,5 @@
>  namespace mozilla {
>  namespace net {
>  
> +nsresult GetAddrInfo(const char* aHost, uint16_t aAf, uint16_t aFlags,
> +                     AddrInfo** aAi)

Same as in GetAddrInfo.h: return value on one line, function name and args on the next line.

@@ +21,2 @@
>  {
>    // We accept the same aFlags that nsHostResolver::ResolveHost accepts, but we

Let's add some args checks in here.
if (NS_WARN_IF(!aHost)) {
  return NS_ERROR_NULL_POINTER;
}

s/aAi/aAddrInfo/   longer, but clearer.

Check aAddrInfo for nullness, and null out *aAddrInfo to initialize it.

::: netwerk/dns/GetAddrInfo.h
@@ +12,5 @@
>  
> +class AddrInfo;
> +
> +nsresult GetAddrInfo(const char* aHost, uint16_t aAf, uint16_t aFlags,
> +					 AddrInfo** aAi);

nsresult
GetAddrInfo(const char* ...);

You can just put all the args on one line for this one. I know it might break the 80 chars max rule, but it's probably cleaner looking in this case.

::: netwerk/dns/nsHostResolver.cpp
@@ +1155,5 @@
>  
>          TimeDuration elapsed = TimeStamp::Now() - startTime;
>          uint32_t millis = static_cast<uint32_t>(elapsed.ToMilliseconds());
>  
> +        if (NS_SUCCEEDED(rv)) {

One more thing here: let's add a MOZ_ASSERT(ai); in the first line of the if branch.

MOZ_ASSERT only runs on debug builds, so we can use it here to ensure that ai is not null if GetAddrInfo succeeds.
Attachment #8449637 - Flags: review?(sworkman) → review-
Attached patch Creates seperate GetAddrInfo. (obsolete) — Splinter Review
Comment on attachment 8449930 [details] [diff] [review]
Creates seperate GetAddrInfo.

Review of attachment 8449930 [details] [diff] [review]:
-----------------------------------------------------------------

These are only changes per your reviews. Still working on windows stuff (more waiting for the build to finish). I crushed all the changes into one diff. There were tabs in the other diff that made the function prototype look ugly (the one you pointed out and said to just make over 80 characters), so I tried formatting them more nicely again rather than going over the limit. Let me know if it still looks super unpleasant though.
Attachment #8449930 - Flags: review?(sworkman)
Comment on attachment 8450699 [details] [diff] [review]
Added windows-only DnsQuery code.

Review of attachment 8450699 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/dns/GetAddrInfo.cpp
@@ +36,5 @@
> +DNS_STATUS DNSQUERY(const char * lpstrName, WORD wType, DWORD Options,
> +                    PVOID pExtra, PDNS_RECORD *ppQueryResultsSet,
> +                    PVOID *pReserved)
> +{
> +  HINSTANCE library = LoadLibraryA("Dnsapi.dll");

I think you can make this a static var; something like this: http://dxr.mozilla.org/mozilla-central/source/netwerk/system/win32/nsNotifyAddrListener.cpp#35

static HMODULE sDnsLibrary;
...
if (!sDnsLibrary) etc.

You can put this in an Init function, GetAddrInfoInit_Win, called from GetAddrInfoInit.

I think you should release the library in nsHostResolver's destructor, so you should add a GetAddrInfoShutdown function too.

@@ +49,5 @@
>  {
> +  bool disableIPv4 = aAf == PR_AF_INET6;
> +
> +  DNS_RECORD* dns_data;
> +  DNSQUERY(aHost, DNS_TYPE_A, DNS_QUERY_STANDARD, nullptr, &dns_data, nullptr);

DNS_TYPE_A should match aAf. So, INET -> A, INET6 -> AAAA and UNSPEC -> A | AAAA.

@@ +57,4 @@
>    }
>  
> +  // Everything we need is now in dns_data. Now we need to pull it out and put
> +  // it into an AddrInfo object.

Shorter comment: "Convert dns_data to an AddrInfo object.

@@ +65,5 @@
> +      // I'm ignoring them for now because I'm not sure what to do in these
> +      // cases yet (if I get a CNAME should I automatically re-query?). Will
> +      // figure it out by looking deeper into getaddrinfo's internals later.
> +      LOG(("Ignoring non A record (type %X) for %s.\n",
> +           curRecord->wType, aHost));

Ignoring for now is fine.

@@ +71,5 @@
> +      LOG(("Got A record for %s.\n", aHost));
> +
> +      NetAddr addr;
> +      addr.inet.family = AF_INET;
> +      addr.inet.port = 80; // TODO(josullivan): Is this ok?

Leave the port as 0. It is filled in later: when nsSocketTransport gets the callback (OnLookupComplete) to say that DNS resolution is done, it specifies the port it wants to use for the connection. So, we don't need it yet.

@@ +74,5 @@
> +      addr.inet.family = AF_INET;
> +      addr.inet.port = 80; // TODO(josullivan): Is this ok?
> +      addr.inet.ip = curRecord->Data.A.IpAddress;
> +
> +      LOG(("Recording IP address %X for %s.\n", addr.inet.ip, aHost));

You can use NetAddrToString here. Use #ifdef PR_LOGGING to keep the required char array in debug builds only.
Attachment #8450699 - Flags: feedback+
Attached patch Uses DnsQuery on windows. (obsolete) — Splinter Review
Comment on attachment 8455562 [details] [diff] [review]
Uses DnsQuery on windows.

Review of attachment 8455562 [details] [diff] [review]:
-----------------------------------------------------------------

This is a large diff. Sorry :(! I crushed all the changes into this because there were too many since the last revision. Feel free to split up the review however you'd like (maybe first attack GetAddrInfo.cpp and GetAddrInfo.h, before moving onto the host resolver stuff?).
Attachment #8455562 - Flags: review?(sworkman)
Attachment #8457605 - Flags: review?(sworkman)
Attachment #8457606 - Flags: review?(sworkman)
The patch to gecko above is based on the large diff in attachment 8455562 [details] [diff] [review]. Steve and I haven't figured out the best way to apply the patch to libc, but it can be applied manually at the moment for testing.
Comment on attachment 8455562 [details] [diff] [review]
Uses DnsQuery on windows.

Review of attachment 8455562 [details] [diff] [review]:
-----------------------------------------------------------------

Almost finished this, but I think you have enough comments to work on for now. The concept is good, mostly cleanups.

Also, can you set previous patches to obsolete? - easier to work through the list of attachments that way :)

::: netwerk/dns/DNS.cpp
@@ +304,5 @@
> +  size_t hostlen = strlen(host);
> +  mHostName = static_cast<char*>(moz_xmalloc(hostlen + 1));
> +  memcpy(mHostName, host, hostlen + 1);
> +
> +  mCanonicalName = NULL;

Let's add initializers for this, mHost etc. to the constructor (initialize to nullptr, not NULL).
Then, remove this "= NULL" line and just call SetCanonicalName.

::: netwerk/dns/GetAddrInfo.cpp
@@ +16,5 @@
> +#include "prlog.h"
> +#if defined(PR_LOGGING)
> +static PRLogModuleInfo *gGetAddrInfoLog = PR_NewLogModule("GetAddrInfo");
> +#define LOG(args) PR_LOG(gGetAddrInfoLog, PR_LOG_DEBUG, args)
> +#define LOG_WARNING(args) PR_LOG(gGetAddrInfoLog, PR_LOG_WARNING, args)

Stylistic preference :)

#define LOG(msg, ...) \
  PR_LOG(gGetAddrInfoLog, PR_LOG_DEBUG, ("[DNS]: " msg, ##__VA_ARGS__))

Adds a nice little prefix in front of every log statement.

@@ +36,5 @@
> +
> +// The number of times this module (GetAddrInfo) has been initialized
> +static int gInitCount = 0;
> +
> +#ifdef XP_WIN

Add a whole line comment...

//////////////////....
// WINDOWS IMPLEMENTATION
//////...

@@ +39,5 @@
> +
> +#ifdef XP_WIN
> +static HMODULE gDnsapiLibrary = 0;
> +static decltype(&DnsQuery_A) gDnsQueryFunc = nullptr;
> +static decltype(&DnsFree) gDnsFree = nullptr;

I'm wondering about thread safety for these vars. Init-time is fine, because there should be one thread only, but during shutdown, it's possible that one or more of the DNS threads could be running. We've noticed a case before where PR_GetAddrInfoByName is still executing, waiting on a response, and it blocks the thread being deleted, causing a mem leak. It's fine, we ignore it, because it gets killed anyway. But if we're going to start init'ing and shutting down during runtime (which we are with network change notifications) it may be useful to lock these. What do you think?

@@ +42,5 @@
> +static decltype(&DnsQuery_A) gDnsQueryFunc = nullptr;
> +static decltype(&DnsFree) gDnsFree = nullptr;
> +
> +nsresult
> +_GetAddrInfoInit_Windows() {

'inline' function

'{' on new line for functions.

@@ +43,5 @@
> +static decltype(&DnsFree) gDnsFree = nullptr;
> +
> +nsresult
> +_GetAddrInfoInit_Windows() {
> +  gDnsapiLibrary = LoadLibraryA("Dnsapi.dll");

if (!gDnsapiLibrary) { ... }, right?

@@ +48,5 @@
> +  if (NS_WARN_IF(!gDnsapiLibrary)) {
> +    return NS_ERROR_FAILURE;
> +  }
> +
> +  gDnsQueryFunc = (decltype(gDnsQueryFunc)) GetProcAddress(gDnsapiLibrary,

if (!gDnsQueryFunc) { ... }, right?

@@ +74,5 @@
> +  gDnsapiLibrary = 0;
> +  gDnsQueryFunc = nullptr;
> +  gDnsFree = nullptr;
> +
> +  return NS_WARN_IF(!rv) ? NS_ERROR_FAILURE : NS_OK;

return NS_WARN_IF(NS_FAILED(rv)) ...

@@ +78,5 @@
> +  return NS_WARN_IF(!rv) ? NS_ERROR_FAILURE : NS_OK;
> +}
> +
> +nsresult
> +_GetAddrInfo_Windows(const char* aHost, uint16_t aAddressFamily,

I think you could 'inline' this function since it's only going to be called by GetAddrInfo, right?

@@ +84,5 @@
> +{
> +  if (NS_WARN_IF(!gDnsQueryFunc) || NS_WARN_IF(!gDnsFree)) {
> +    LOG_WARNING(("GetAddrInfoInit must be called before this function.\n"));
> +    return NS_ERROR_NULL_POINTER;
> +  }

Let's make these assertions here and keep the warnings to GetAddrInfo.

MOZ_ASSERT(gDnsQueryFunc); etc.

Also assert that the vars are non-null.

MOZ_ASSERT is for debug builds only, so it'll do nothing in release builds.

@@ +96,5 @@
> +    // Note that the PR_AF_UNSPEC family is handled at a higher level. The
> +    // resolver will split off the unspec request onto two seperate threads to
> +    // get the A and AAAA records seperately, then join them together.
> +    LOG_WARNING(("Unsupported address family %X.", aAddressFamily));
> +    return NS_ERROR_INVALID_ARG;

This comment is good, but a little too specific for this file. We may change how UNSPEC is dealt with in nsHostResolver and the comment would be out of date. Just ending with the first sentence in this comment block is enough.

Also, I think you can MOZ_ASSERT here as well as warn. This seems like a error we should catch in debug builds.

@@ +100,5 @@
> +    return NS_ERROR_INVALID_ARG;
> +  }
> +
> +  PDNS_RECORDA dns_data = nullptr;
> +  DNS_STATUS status = gDnsQueryFunc(aHost, dns_type, DNS_QUERY_STANDARD,

Style nit: we're mixing up naming techniques here; I know it's not going to be perfect because Windows types use all caps and '_', but let's keep our var names to 'camelCase'.

@@ +115,5 @@
> +  }
> +
> +  // Everything we need is now in dns_data. We just need to pull it out and put
> +  // it into an AddrInfo object.
> +  AddrInfo* ai = new AddrInfo(aHost, nullptr);

Minor: Let's use nsAutoPtr<AddrInfo> here to make sure it's deleted or passed on when the function is done. See end of function...

@@ +118,5 @@
> +  // it into an AddrInfo object.
> +  AddrInfo* ai = new AddrInfo(aHost, nullptr);
> +  PDNS_RECORDA curRecord = dns_data;
> +  while (curRecord) {
> +    if (curRecord->wType == DNS_TYPE_A) {

Looks like you could switch on wType.

@@ +119,5 @@
> +  AddrInfo* ai = new AddrInfo(aHost, nullptr);
> +  PDNS_RECORDA curRecord = dns_data;
> +  while (curRecord) {
> +    if (curRecord->wType == DNS_TYPE_A) {
> +      LOG(("Got A record for %s.\n", aHost));

Really minor: let's put that A in single quotes, just in case someone reads the logs and thinks that they got 'a record' ;)

@@ +127,5 @@
> +      addr.inet.ip = curRecord->Data.A.IpAddress;
> +
> +      // We don't have anything meaningful to use as the port so we just use 0.
> +      // This is consistent with the PR platform.
> +      addr.inet.port = 0;

// Initialize port to 0 to be filled in later at socket connection time.

@@ +137,5 @@
> +      NetAddr addr;
> +      addr.inet6.family = AF_INET6;
> +      memcpy(&addr.inet6.ip, &curRecord->Data.AAAA.Ip6Address,
> +             sizeof(addr.inet6.ip.u8));
> +      addr.inet.port = 0;

inet6.port

@@ +154,5 @@
> +  }
> +
> +  gDnsFree(dns_data, DNS_FREE_TYPE::DnsFreeRecordList);
> +
> +  LOG(("Got address information for %s.\n", aHost));

Can you add a count for the different types of records received here, please? You can wrap the vars in #ifdef PR_LOGGING to keep them out of non-logging builds.

@@ +162,5 @@
> +    return NS_ERROR_UNKNOWN_HOST;
> +  } else {
> +    *aAddrInfo = ai;
> +    return NS_OK;
> +  }

if (!ai->mAddresses.isEmpty()) {
  *aAddrInfo = ai.forget();
  return NS_OK;
}
return NS_ERROR_UNKNOWN_HOST;

'ai' as an nsAutoPtr will automatically be deleted.

@@ +166,5 @@
> +  }
> +}
> +#else
> +nsresult
> +_GetAddrInfo_Portable(const char* aHost, uint16_t aAddressFamily,

Add a whole line comment...

//////////////////....
// PORTABLE/NSPR IMPLEMENTATION
//////...

@@ +168,5 @@
> +#else
> +nsresult
> +_GetAddrInfo_Portable(const char* aHost, uint16_t aAddressFamily,
> +                      uint16_t aFlags, AddrInfo** aAddrInfo)
> +{

MOZ_ASSERT params please.

@@ +185,5 @@
> +    aAddressFamily = PR_AF_UNSPEC;
> +  }
> +
> +  PRAddrInfo* prai = PR_GetAddrInfoByName(aHost, aAddressFamily, prFlags);
> +#if defined(RES_RETRY_ON_FAILURE)

RES_RETRY_ON_FAILURE is defined in nsHostResolver.cpp if HAVE_RES_NINIT is defined. Looks like you can move class nsResState into this file in order to call Reset and do the retry.

@@ +200,5 @@
> +  if (aFlags & nsHostResolver::RES_CANON_NAME) {
> +    canonName = PR_GetCanonNameFromAddrInfo(prai);
> +  }
> +
> +  AddrInfo* ai = new AddrInfo(aHost, prai, disableIPv4, canonName);

Change this to an nsAutoPtr<AddrInfo> too.

@@ +216,5 @@
> +}
> +#endif
> +
> +nsresult
> +GetAddrInfoInit() {

Add a whole line comment...

//////////////////....
// COMMON/PLATFORM INDEPENDENT CODE
//////...

@@ +220,5 @@
> +GetAddrInfoInit() {
> +  ++gInitCount;
> +  if (gInitCount > 1) {
> +    LOG(("GetAddrInfo already initialized.\n"));
> +    return NS_OK;

So, I think we only want one init and one shutdown. Maybe s/int gInitCount/bool gInitialized/?

if (NS_WARN_IF(gInitialized)) {
  MOZ_ASSERT(!gInitialized);
  return NS_ERROR_ALREADY_INITIALIZED;
}

Reverse for Shutdown.

@@ +257,5 @@
> +
> +nsresult
> +GetAddrInfo(const char* aHost, uint16_t aAddressFamily, uint16_t aFlags,
> +            AddrInfo** aAddrInfo)
> +{

Add a check (and debug assertion) to make sure init was called.
if (NS_WARN_IF((!gInitCount)) {
  MOZ_ASSERT(gInitCount);
  return NS_ERROR_NOT_INITIALIZED;
}

@@ +268,5 @@
> +#ifdef XP_WIN
> +  const auto kGetAddrInfoPlatformSpecific = _GetAddrInfo_Windows;
> +#else
> +  const auto kGetAddrInfoPlatformSpecific = _GetAddrInfo_Portable;
> +#endif

If we inline these functions, just call them directly. You can return with the rv outside the #ifdef/#endif to ensure we always return.

::: netwerk/dns/GetAddrInfo.h
@@ +12,5 @@
> +
> +class AddrInfo;
> +
> +nsresult
> +GetAddrInfo(const char* aHost, uint16_t aAf, uint16_t aFlags,

Comments for these functions are needed.

::: netwerk/dns/nsHostResolver.cpp
@@ +199,5 @@
>      return NS_OK;
>  }
>  
> +nsresult
> +nsHostRecord::UnspecClone(nsHostRecord** aNewRecord, uint16_t aUnspecAF)

I think you can just rename this to Clone, no?
Or define it as a copy constructor instead?

Also (see later) if we hard code the first request to be 'A' and the second to be 'AAAA', then we could still set that here in mUnspecAF for both objects.

@@ +455,5 @@
>  
>      mLongIdleTimeout  = PR_SecondsToInterval(LongIdleTimeoutSeconds);
>      mShortIdleTimeout = PR_SecondsToInterval(ShortIdleTimeoutSeconds);
> +
> +    GetAddrInfoInit();

Can this go in ::Init?

@@ +869,5 @@
>  
>  nsresult
> +nsHostResolver::IssueLookup(nsHostRecord* rec)
> +{
> +    if (SPLIT_UNSPEC && rec->af == PR_AF_UNSPEC) {

Can SPLIT_UNSPEC be checked at compile time instead?

@@ +895,5 @@
> +    }
> +}
> +
> +nsresult
> +nsHostResolver::_IssueLookup(nsHostRecord* rec)

IssueLookupInternal?

@@ +1030,5 @@
>      return false;
>  }
>  
>  void
> +nsHostResolver::OnLookupComplete(nsHostRecord* rec, nsresult status, AddrInfo* result)

Doesn't look like anything changed here.

@@ +1039,5 @@
>      PR_INIT_CLIST(&cbs);
>      {
>          MutexAutoLock lock(mLock);
>  
> +        // If this was an unspec query, then there we'll be called twice and we

delete 'there'.

@@ +1041,5 @@
>          MutexAutoLock lock(mLock);
>  
> +        // If this was an unspec query, then there we'll be called twice and we
> +        // need to make sure to merge the second result with the first.
> +        if (SPLIT_UNSPEC && rec->af == PR_AF_UNSPEC) {

Can this block be its own function - MergeIntoUnspecRecord()?

@@ +1044,5 @@
> +        // need to make sure to merge the second result with the first.
> +        if (SPLIT_UNSPEC && rec->af == PR_AF_UNSPEC) {
> +            MOZ_ASSERT(rec->mUnspecAF != nsHostRecord::UNSPECAF_NULL);
> +
> +            MOZ_ASSERT(PR_AF_INET == AF_INET && PR_AF_INET6 == AF_INET6);

Don't think you need this assert.

@@ +1050,5 @@
> +            // Find the original record by looking it up in the hash table
> +            nsHostRecord* originalRecord;
> +            if (rec->mIsClone) {
> +                LOG(("Handling cloned AF_UNSPEC record for host %s.\n",
> +                     rec->host));

("OnLookupComplete: %s for UNSPEC request %s host %s.",
 AF_INET == rec->unspecAF ? "INET" : "INET6",
 rec->mIsClone ? "(clone)" : "",
 rec->host)

@@ +1054,5 @@
> +                     rec->host));
> +                nsHostDBEnt* originalHashEntry = static_cast<nsHostDBEnt*>(
> +                    PL_DHashTableOperate(&mDB, static_cast<nsHostKey*>(rec),
> +                                         PL_DHASH_LOOKUP));
> +                if (originalHashEntry) {

So, there's something about LOOKUP that might return an entry, but it might have nothing in it, so we need to check more than non-null here. Check out PL_DHASH_ENTRY_IS_BUSY(unspecHe) elsewhere in the file.

@@ +1060,5 @@
> +                    MOZ_ASSERT(originalRecord);
> +                } else {
> +                    // This could happen if the entry has been evicted already.
> +                    // In such a case, we'll just continue on processing the
> +                    // clone.

You could add it into the cache, no?

@@ +1081,5 @@
> +                         "for %s.\n", rec->host));
> +
> +                    // Remove all of the addresses of the resolved type
> +                    NetAddrElement* iter =
> +                        originalRecord->addr_info->mAddresses.getFirst();

This part could be added to NetAddrElement as a function RemoveEntriesForAF(), even if just to shrink this function a bit :)

@@ +1095,5 @@
> +                        iter = next;
> +                    }
> +
> +                    // Add in the new results
> +                    if (result) {

And maybe this could be NetAddrElement::TakeAddresses()

@@ +1121,5 @@
> +                         "result for %s.\n", rec->host));
> +                    AddrInfo* old_addr_info = originalRecord->addr_info;
> +                    originalRecord->addr_info = result;
> +                    originalRecord->addr_info_gencnt++;
> +                    delete old_addr_info;

Don't think you need this delete old_addr_info part if it's null :)

::: netwerk/dns/nsHostResolver.h
@@ +34,5 @@
> +#ifdef XP_WIN
> +// If this is enabled, we will make two queries to our resolver for unspec
> +// queries. One for any AAAA records, and the other for A records. We will then
> +// merge the results ourselves.
> +#define SPLIT_UNSPEC 1

Explain why we need to do this in the comment.

s/SPLIT_UNSPEC/AF_UNSPEC_UNSUPPORTED/

@@ +114,5 @@
>  
> +    // If SPLIT_UNSPEC is enabled and this->af is PR_AF_UNSPEC, this will
> +    // contain the address family that should actually be passed to
> +    // GetAddrInfo.
> +    uint16_t mUnspecAF;

mInnerAF?

@@ +119,5 @@
> +    static const uint16_t UNSPECAF_NULL = -1;
> +
> +    // If true, this record is a clone of another (used when SPLIT_UNSPEC is
> +    // enabled, always false if it is not enabled).
> +    bool mIsClone;

I wonder if we can just deduce this from mUnspecAF? So, let's say that the 'A' record is always the original, and 'AAAA' is always the clone. Possible?

@@ +125,5 @@
> +    // If SPLIT_UNSPEC is enabled, this will be set to the number of unresolved
> +    // host records out for the given host record key. It is used when
> +    // determining if both the cloned and original host records have been
> +    // resolved.
> +    int mNumPending;

Can these all be wrapped in #ifdef SPLIT_UNSPEC?
Attachment #8455562 - Flags: review?(sworkman) → review-
(In reply to josullivan from comment #18)
> The patch to gecko above is based on the large diff in attachment 8455562 [details] [diff] [review]
> [details] [diff] [review]. Steve and I haven't figured out the best way to
> apply the patch to libc, but it can be applied manually at the moment for
> testing.

Let's open a new bug for these. Windows patches in this bug and FxOS in the new one :)
Attachment #8449105 - Attachment is obsolete: true
Attachment #8449636 - Attachment is obsolete: true
Attachment #8449637 - Attachment is obsolete: true
Attachment #8449930 - Attachment is obsolete: true
Attachment #8449930 - Flags: review?(sworkman)
Attachment #8450699 - Attachment is obsolete: true
Attachment #8457605 - Attachment is obsolete: true
Attachment #8457605 - Flags: review?(sworkman)
Attachment #8457606 - Attachment is obsolete: true
Attachment #8457606 - Flags: review?(sworkman)
Woot woot! Thank you for the thorough review! I did some more work on it. Some responses and questions below.

Also, this patch is a diff from the previous patch. Despite this, it's still quite large. Sorry :(.

	Let's add initializers for this, mHost etc. to the constructor

It looks like the goal of the Init function is to factor out the duplicate code in the overloaded constructors. Why do you want to move the initializations into the constructors?

	I'm wondering about thread safety for these vars. [...] What do you think?

I think I came up with a good solution. I'm now using a shared_ptr object to keep track of what's using the library in a thread-safe way. We only need the Init and Shutdown calls to lock so they don't stomp on eachother (and strictly speaking this probably isn't necessary since nsHostResolver is a singleton but it's pretty uninvasive and could prevent issues in the future).

Regardless of how badly the module is misused now it should always do the right thing (in theory). I wasn't aiming for that level of safety but it is nice :D.

	Looks like you could switch on wType.

I personally prefer if statements when the logic in each branch is complex. Especially with variables being declared. But are switches much preferred?

	RES_RETRY_ON_FAILURE is defined in nsHostResolver.cpp if HAVE_RES_NINIT is defined. Looks like you can move class nsResState into this file in order to call Reset and do the retry.

Doh, I guess my eyes kept glancing over this without thinking about what it was after I brought it in initially. Anywho, the nsResState instance needs to be thread-local which is bothersome to pull off within this function so I pulled the retry logic out into the thread function.

	So, I think we only want one init and one shutdown. Maybe s/int gInitCount/bool gInitialized/?

	if (NS_WARN_IF(gInitialized)) {
	  MOZ_ASSERT(!gInitialized);
	  return NS_ERROR_ALREADY_INITIALIZED;
	}

	Reverse for Shutdown.

Since nsHostResolver is a singleton (which I didn't realize) we don't need any kind of tracking at this level. I removed the variable altogether.

	Comments for these functions are needed.

I'm going to wait until we've decided on the thread safety stuff before writing docs since I have a feeling we'll need at least one more round of changes.

	I think you can just rename this to Clone, no?
	Or define it as a copy constructor instead?

It has some logic very specific to our handling of PR_AF_UNSPEC, so I think it might be misleading if we just call it Clone or hide it in the copy constructor. What do you think?

	Can this go in ::Init?

Yes, and it's a much better place for it because we can do the correct thing if the initization fails :D. I also stuck shutdown in shutdown since that seemed sane.

	Don't think you need this assert.

The code does rely on the fact that PR_AF_INET(6) and AF_INET(6) have the same values because we never actually convert between the two. But I added a static_assert lower down in the Window's code instead. We could instead standardize to one or the other throughout, but I'm not sure which one would be better, nor how much code we'd need to touch.

	You could add it into the cache, no?

I reworked things so that the clone stores a pointer to the original which makes everything more straightforward. Now we never touch the cache.

	I wonder if we can just deduce this from mUnspecAF? So, let's say that the 'A' record is always the original, and 'AAAA' is always the clone. Possible?

With the pointer change above this boolean went away.
Attachment #8455562 - Attachment is obsolete: true
Attachment #8458994 - Flags: review?(sworkman)
Comment on attachment 8458994 [details] [diff] [review]
Uses DnsQuery on windows - Patch 2

Review of attachment 8458994 [details] [diff] [review]:
-----------------------------------------------------------------

Getting there. Check over the comments and I'll review this again tomorrow.

::: netwerk/dns/DNS.cpp
@@ +328,5 @@
>  }
>  
> +void
> +AddrInfo::MergeAndConsume(AddrInfo* other, uint16_t addressFamily) {
> +  // Remove all of the addresses of the resolved type

MOZ_ASSERT(aOther);

::: netwerk/dns/DNS.h
@@ +147,5 @@
>    LinkedList<NetAddrElement> mAddresses;
>  
> +  // Removes every address of addressFamily, then takes every address in other
> +  // as well as copying its canonicalName if it has one. other may be null.
> +  void MergeAndConsume(AddrInfo* other, uint16_t addressFamily);

aOther, aAddressFamily - 'a' for arg.

::: netwerk/dns/GetAddrInfo.cpp
@@ +63,5 @@
> +static Mutex gInitLock("GetAddrInfo.cpp::gInitLock");
> +
> +// Don't dereference this shared pointer. Make a copy instead and use that.
> +// This ensures it isn't freed while you're using it.
> +static std::shared_ptr<DnsapiInfo> gDnsapiInfo = nullptr;

Ah, no stdlib use if possible :-/ I'll talk to you about this in person.

But it seems like you could use an StaticRefPtr/nsRefPtr here instead. Just declare NS_INLINE_DECL_THREADSAFE_REFCOUNTING, use a StaticRefPtr for the global here, and then always use nsRefPtrs in functions. This should ensure it's not freed while you're using it.

Note: null out the global in the shutdown function so that there isn't a dangling pointer after the functions are done.

@@ +72,5 @@
> +  MutexAutoLock lock(gInitLock);
> +
> +  if (gDnsapiInfo) {
> +    MOZ_ASSERT(false, "Initializing GetAddrInfo more than once.");
> +    return NS_ERROR_NULL_POINTER;

NS_ERROR_ALREADY_INITIALIZED no?

@@ +114,5 @@
>  {
> +  MOZ_ASSERT(aHost);
> +  MOZ_ASSERT(aAddrInfo);
> +
> +  std::shared_ptr<DnsapiInfo> dnsapi = gDnsapiInfo;

nsRefPtr<DnsapiInfo> dnsapi = gDnsapiInfo;

@@ +127,4 @@
>    } else if (aAddressFamily == PR_AF_INET6) {
>      dns_type = DNS_TYPE_AAAA;
>    } else {
> +    // Note that the PR_AF_UNSPEC family is handled at a higher level

I don't think this comment is needed.

@@ +154,5 @@
> +#endif
> +
> +  // Different parts of the code above us uses PR_* sometimes, whereas we're
> +  // using AF_* because NetAddr's documentation/comments say to.
> +  static_assert(PR_AF_INET == AF_INET && PR_AF_INET6 == AF_INET6

I know we talked and this is useful, but it seems like it could be at the top of this file, the first statement in the #ifdef branch for windows - no?

@@ +165,5 @@
>    while (curRecord) {
> +#ifdef PR_LOGGING
> +    switch(curRecord->wType) {
> +      case DNS_TYPE_A:
> +        ++numARecords;

Hmm, I think I prefer

#ifdef PR_LOGGING
      ++num...
#endif

inside the if (curRecord->wType == DNS_TYPE_A) branch.

@@ +193,5 @@
>        NetAddr addr;
>        addr.inet6.family = AF_INET6;
>        memcpy(&addr.inet6.ip, &curRecord->Data.AAAA.Ip6Address,
>               sizeof(addr.inet6.ip.u8));
> +      addr.inet6.port = 0;

Add the port = 0 comment.

@@ +274,5 @@
>  }
> +#else
> +// Slight future proofing in case anyone forgets to conditionally include their
> +// code.
> +#error Unknown or unspecified GETADDRINFO_PLATFORM_ID.

We discussed this offline; my concern is that it should be very clear that PR_GetAddr... is the fallback function. Maybe all it needs is a redo of the comment.

::: netwerk/dns/GetAddrInfo.h
@@ +9,5 @@
>  
> +#include "nsError.h"
> +
> +#ifdef XP_WIN
> +#define GETADDRINFO_PLATFORM_ID 1 // Windows

DNS_API == WINDOWS_DNS_QUERY

We discussed this offline.

::: netwerk/dns/nsHostResolver.cpp
@@ +212,5 @@
>      }
>  
> +    cloned->mInnerAF = aUnspecAF;
> +    cloned->mCloneOf = this;
> +    NS_ADDREF(this);

NS_ADDREF_THIS();

@@ +472,4 @@
>  nsresult
>  nsHostResolver::Init()
>  {
> +    if (NS_FAILED(GetAddrInfoInit())) {

if (NS_WARN_IF(NS_FAILED(...

NS_WARN_IF is used in if conditions.

@@ +566,5 @@
>      while (mThreadCount && PR_IntervalNow() < stopTime)
>          PR_Sleep(delay);
>  #endif
> +
> +    NS_WARN_IF(NS_FAILED(GetAddrInfoShutdown()));

NS_WARN_IF_FALSE(NS_SUCCEEDED(...

NS_WARN_IF_FALSE is used outside of ifs.

@@ +904,4 @@
>      }
> +#endif
> +
> +    return IssueLookupInternal(rec);

Nicely done.

@@ +1056,1 @@
>          // If this was an unspec query, then there we'll be called twice and we

s/"then there we'll be called"/"then this function will be called"/

@@ +1144,1 @@
>              LOG(("Full resolution complete for %s.\n", rec->host));

This log might be better in the UNSUPPORTED section, no?

@@ +1215,3 @@
>      }
> +
> +    NS_RELEASE(rec);

This refcounting looks complicated. Just run your manual tests with XPCOM_MEM_LEAK_LOG or _MEM_BLOAT_LOG enabled: https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/Debugging_memory_leaks#BloatView

@@ +1310,5 @@
>          // "real" address family when we call GetAddrInfo.
>          uint16_t passthroughAF;
> +#if AF_UNSPEC_UNSUPPORTED
> +        if (rec->af == PR_AF_UNSPEC) {
> +            passthroughAF = rec->mInnerAF;

Can we just call it 'af'?

::: netwerk/dns/nsHostResolver.h
@@ +36,5 @@
>  // queries. One for any AAAA records, and the other for A records. We will then
> +// merge the results ourselves. This is done for us by getaddrinfo, but because
> +// getaddrinfo doesn't give us the TTL we use other APIs when we can that may
> +// not do it for us (Window's DnsQuery function for example).
> +#define AF_UNSPEC_UNSUPPORTED 1

DO_MERGE_FOR_AF_UNSPEC?

@@ +112,5 @@
>      bool    usingAnyThread; /* true if off queue and contributing to mActiveAnyThreadCount */
>      bool    mDoomed; /* explicitly expired */
>  
> +#if AF_UNSPEC_UNSUPPORTED
> +    // If  this->af is PR_AF_UNSPEC, this'll contain the address family that

Remove space: "If this"
s/this'll/this will/

@@ +119,5 @@
>      static const uint16_t UNSPECAF_NULL = -1;
>  
> +    // mCloneOf will point at the original host record if this record is a
> +    // clone. Null otherwise.
> +    nsHostRecord* mCloneOf;

nsRefPtr<nsHostRecord> mCloneOf;

Let's just keep a ref to it.

@@ +127,3 @@
>      int mNumPending;
>  
> +    nsresult UnspecClone(nsHostRecord** aNewRecord, uint16_t aUnspecAF);

CloneForAFUnspec
Attachment #8458994 - Attachment is obsolete: true
Attachment #8458994 - Flags: review?(sworkman)
Attachment #8460634 - Flags: review?(sworkman)
Doh, sorry for the email spam. Last diff contained a typo.
Attachment #8460634 - Attachment is obsolete: true
Attachment #8460634 - Flags: review?(sworkman)
Attachment #8460639 - Flags: review?(sworkman)
Comment on attachment 8460639 [details] [diff] [review]
Uses DnsQuery on windows - Patch 3.1.patch

Review of attachment 8460639 [details] [diff] [review]:
-----------------------------------------------------------------

You didn't ask for my review, but I was reading your code and thought I'd leave notes.

Thanks for doing this, nice work!

::: netwerk/dns/DNS.cpp
@@ +285,4 @@
>  {
> +  if (mCanonicalName) {
> +    moz_free(mCanonicalName);
> +  }

I know your code isn't technically wrong right now because you are sure to write a valid value to mCanonicalName eventually, after this free, but this is begging for a mistake to be made in the future. Best practice is to assign nullptr right after the free, and skip the nullptr assignment in the else block.

::: netwerk/dns/GetAddrInfo.cpp
@@ +88,5 @@
> +      NS_WARN_IF(!FreeLibrary(library));
> +      return NS_ERROR_FAILURE;
> +    }
> +
> +    Info* info = new Info;

There is no failure path or locking here, why not just assign to sInfo to start with? Why create the "info" variable?

@@ +150,5 @@
> +OffTheBooksMutex DnsapiInfo::sLock("GetAddrInfo.cpp::DnsapiInfo::sLock");
> +int DnsapiInfo::sRefCount = 0;
> +
> +// Only to used by _GetAddrInfoInit_Windows and _GetAddrInfoShutdown_Windows.
> +static DnsapiInfo* _gDnsapiInfo = nullptr;

You didn't explicitly assign nullptr to previous static pointers in this file, why start now? I don't care which way you do it, but consistency is nice.

@@ +162,5 @@
> +  } else {
> +    _gDnsapiInfo = new DnsapiInfo(true);
> +    return NS_OK;
> +  }
> +}

The compiler is smart enough to deal with this, but it raises a mental flag when I look at it. Suggest getting rid of the unnecessary else block so you always return something.

@@ +181,5 @@
> +  MOZ_ASSERT(aAddrInfo);
> +
> +  DnsapiInfo dnsapi;
> +  if (!dnsapi.mInfo) {
> +    MOZ_ASSERT(false, "GetAddrInfo not yet initialized.");

Use MOZ_ASSERT_UNREACHABLE instead? "MOZ_ASSERT(false..." is ugly.

@@ +277,5 @@
> +  } else {
> +    *aAddrInfo = ai.forget();
> +    return NS_OK;
> +  }
> +}

Maybe it's just me, but again - I find this pattern a bit alarming, even if it is technically OK. I don't see it elsewhere in the Mozilla code either, at least in the code I've read. Just get rid of the else block and finish the method with the assignment and return NS_OK.
Comment on attachment 8460639 [details] [diff] [review]
Uses DnsQuery on windows - Patch 3.1.patch

Review of attachment 8460639 [details] [diff] [review]:
-----------------------------------------------------------------

So, we're getting there, but DnsapiInfo can be refined further, I think. I'll need to discuss this with you more next week when I'm back on online. I'll ping you. In the meantime, attend to these comments and try to refine and simplify the init process. We're on the right tracks :)

::: netwerk/dns/GetAddrInfo.cpp
@@ +46,5 @@
> +// WINDOWS IMPLEMENTATION //
> +////////////////////////////
> +
> +// Different parts of the code above us use PR_* sometimes, whereas we're using
> +// AF_* because NetAddr's documentation/comments say to.

// Ensure consistency of PR_* and AF_* constants to allow for legacy usage of PR_* constants with this API.

@@ +52,5 @@
> +              && PR_AF_UNSPEC == AF_UNSPEC, "PR_AF_* must match AF_*");
> +
> +// Class to handle safely loading and freeing the Dnsapi.dll library in
> +// Windows. The static function InitLibrary() should be called once before
> +// any instances are created (this is achieved by doing it in the main thread

Convert this to a /* */ style comment:
/**
 * DnsapiInfo
 * Class to ..

@@ +60,5 @@
> +// When used properly, this class is (in theory) totally thread-safe. We may
> +// leak if a dns thread is stuck in getaddrinfo when it's time to shutdown, but
> +// we'll never segfault.
> +class DnsapiInfo {
> +  struct Info {

Assuming these are meant to be private, add "private:".

@@ +67,5 @@
> +    decltype(&DnsFree) mDnsFreeFunc;
> +  };
> +
> +  static Info* sInfo;
> +  static OffTheBooksMutex sLock;

// OffTheBooksMutex because we may leak at shutdown.

@@ +68,5 @@
> +  };
> +
> +  static Info* sInfo;
> +  static OffTheBooksMutex sLock;
> +  static int sRefCount;

Functions first, vars last.
Put all vars together at the end of the class definition.

@@ +70,5 @@
> +  static Info* sInfo;
> +  static OffTheBooksMutex sLock;
> +  static int sRefCount;
> +
> +  static nsresult _InitLibrary() {

MOZ_ALWAYS_INLINE for these functions if possible.

Let's MOZ_ASSERT(NS_IsMainThread(), "Do not initialize DnsQuery off main thread!");

@@ +72,5 @@
> +  static int sRefCount;
> +
> +  static nsresult _InitLibrary() {
> +    if (sInfo) {
> +      MOZ_ASSERT(false, "Initializing GetAddrInfo more than once.");

Josh mentioned that MOZ_ASSERT(false,...) is ugly. I agree. I like MOZ_ASSERT(!sInfo, ...) because then it's clear on the console what condition is resolving to false.

@@ +107,5 @@
> +    if (info->mLibrary) {
> +      bool failed = NS_WARN_IF(!FreeLibrary(info->mLibrary));
> +      return failed ? NS_ERROR_FAILURE : NS_OK;
> +    } else {
> +      MOZ_ASSERT(false, "Shutting down GetAddrInfo more than once.");

Remove the else. (Josh mentioned this too).

Adjust the MOZ_ASSERT and if here:

MOZ_ASSERT(info);
MOZ_ASSERT(info->mLibrary);
if (info && info->mLibrary) {
  return NS_WARN_IF(!FreeLibrary(info->mLibrary)) ? NS_ERROR_FAILURE : NS_OK;
}
return NS_ERROR_NOT_INITIALIZED;

@@ +131,5 @@
> +      mInfo = nullptr;
> +    } else {
> +      MOZ_ASSERT(sInfo);
> +      ++sRefCount;
> +      mInfo = sInfo;

I think I need to discuss this with you in the office again (or next week on vidyo maybe).
It looks like you could avoid sRefCount and use StaticRefPtr and RefPtr; maybe not on DnsapiInfo, but on Info? I know we talked about this already, but I

@@ +162,5 @@
> +  } else {
> +    _gDnsapiInfo = new DnsapiInfo(true);
> +    return NS_OK;
> +  }
> +}

I agree with Josh. Remove the else.

::: netwerk/dns/GetAddrInfo.h
@@ +9,5 @@
> +
> +#include "nsError.h"
> +
> +#define DNS_API_WINDOWS_DNS_QUERY 1
> +#define DNS_API_PORTABLE 0

0 first please.
Parens, no: (0) and (1)?

@@ +14,5 @@
> +
> +#ifdef XP_WIN
> +#define DNS_API 1 // Windows
> +#else
> +#define DNS_API 0 // Portable (fallback)

Use the nice #defines you added above :)

// Windows: DnsQuery()
// Portable: PR_GetAddrInfoByName()

@@ +23,5 @@
> +
> +class AddrInfo;
> +
> +// Cross platform interface to resolve domain names. See PR_GetAddrInfo for
> +// usage.

Nope: Copy the usage comment here and amend as appropriate. Don't reference PR_GetAddrInfo in the comment.

@@ +28,5 @@
> +nsresult
> +GetAddrInfo(const char* aHost, uint16_t aAf, uint16_t aFlags,
> +            AddrInfo** aAddrInfo);
> +
> +// Call once before GetAddrInfo is used.

... to perform any necessary initialization.

::: netwerk/dns/nsHostResolver.cpp
@@ +29,5 @@
> +#include <arpa/inet.h>
> +#include <arpa/nameser.h>
> +#include <resolv.h>
> +#define RES_RETRY_ON_FAILURE
> +#endif

Any particular reason that these includes have been moved further down the file?

@@ +202,5 @@
>  }
>  
> +#if DO_MERGE_FOR_AF_UNSPEC
> +nsresult
> +nsHostRecord::CloneForAFUnspec(nsHostRecord** aNewRecord, uint16_t aUnspecAF)

s/aNewRecord/aClonedRecord/
s/aUnspecAF/aInnerAF/

@@ +214,5 @@
> +    cloned->mInnerAF = aUnspecAF;
> +    cloned->mCloneOf = this;
> +    NS_ADDREF_THIS();
> +
> +    *aNewRecord = cloned;

I know I said NS_ADDREF_THIS earlier, but it's fewer lines to have NS_ADDREF(*aNewRecord = cloned); also clearer that we're addrefing the return value.

@@ +880,5 @@
>  nsresult
> +nsHostResolver::IssueLookup(nsHostRecord* rec)
> +{
> +#if DO_MERGE_FOR_AF_UNSPEC
> +    if (rec->af == PR_AF_UNSPEC) {

Add a short comment here:
// Issue two lookups to fulfill AF_UNSPEC requests: one each for AF_INET and AF_INET6.

@@ +1060,5 @@
> +        // and we need to make sure to merge the second result with the first.
> +        if (rec->af == PR_AF_UNSPEC) {
> +            MOZ_ASSERT(rec->mInnerAF != nsHostRecord::UNSPECAF_NULL);
> +
> +            // Find the original record by looking it up in the hash table

This comment looks wrong.

@@ +1069,5 @@
> +                     rec->mCloneOf ? " (clone)" : "",
> +                     rec->host));
> +
> +                originalRecord = rec->mCloneOf;
> +            } else {

I'd like to try to shrink the lines of this if/else down:

nsHostRecord *originalRecord = rec->mCloneOf ? rec->mCloneOf : rec:
LOG(("OnLookupComplete: %s for UNSPEC request (%s) ...",
     ...
     rec->mCloneOf ? "clone" : "original",
     ...));

I think you can reduce both if and else branches to these two statements.

@@ +1095,5 @@
> +                    originalRecord->addr_info_gencnt++;
> +                }
> +            }
> +
> +            if (rec != originalRecord) {

// Release the cloned record if its lookup is complete.

@@ +1115,5 @@
> +                NS_RELEASE(rec);
> +
> +                // We're totally done with the clone now and can concern
> +                // ourselves solely with the original record.
> +                rec = originalRecord;

I know originalRecord only exists in this scope, but let's null it out.

Also, I think that the code in the if branch could be in its own function. OnInnerLookupCompleteForAFUnspec? Currently, this is a very long function, and it looks like you'd just need to pass in the nsHostRecord pointer. Make it inline too.

@@ +1128,5 @@
> +            LOG(("Got result for %s.\n", rec->host));
> +
> +            // update record fields.  We might have a rec->addr_info already if
> +            // a previous lookup result expired and we're reresolving it..
> +            AddrInfo  *old_addr_info;

Two spaces!! Remove one please :)

@@ +1141,2 @@
>  
> +        // grab list of callbacks to notify

Remove this commet, and use a couple of comments in the #ifdef and #else.

@@ +1141,4 @@
>  
> +        // grab list of callbacks to notify
> +#if DO_MERGE_FOR_AF_UNSPEC
> +        if (rec->mNumPending <= 0) {

// If we're merging for AF_UNSPEC, grab the callback list only if all inner lookups are complete.

@@ +1143,5 @@
> +#if DO_MERGE_FOR_AF_UNSPEC
> +        if (rec->mNumPending <= 0) {
> +            MOZ_ASSERT(rec->mNumPending == 0);
> +#else
> +        {

// Otherwise, go ahead the grab the list of callbacks to notify.

@@ +1156,2 @@
>              else {
> +                rec->expiration += TimeDuration::FromSeconds(60); /* one minute for negative cache */

Let's move the "one minute" comment to above this line:
// Set expiration to 1 min for negative cache entries.

@@ +1192,5 @@
>          }
>      }
>  
> +#if DO_MERGE_FOR_AF_UNSPEC
> +    if (rec->mNumPending <= 0) {

Switch this around so that the callbacks section doesn't need to be indented. It will help a little with revision history.

Remember to release rec!

Add a comment:
// Inner lookups for AF_UNSPEC still pending; release current record and return.

@@ +1202,5 @@
>  
> +        if (rec->af == PR_AF_UNSPEC && rec->addr_info) {
> +            MutexAutoLock lock(rec->addr_info_lock);
> +            status = rec->addr_info->mAddresses.isEmpty() ? status : NS_OK;
> +        }

Should this not be in a DO_MERGE section?

@@ +1217,2 @@
>          }
> +

Empty line! Remove!

@@ +1310,5 @@
>          TimeStamp startTime = TimeStamp::Now();
>          MOZ_EVENT_TRACER_EXEC(rec, "net::dns::resolve");
>  
> +        // In the case of an unspec request, we need to make sure to use the
> +        // "real" address family when we call GetAddrInfo.

Move this comment inside the MERGE block.

@@ +1313,5 @@
> +        // In the case of an unspec request, we need to make sure to use the
> +        // "real" address family when we call GetAddrInfo.
> +        uint16_t af;
> +#if DO_MERGE_FOR_AF_UNSPEC
> +        if (rec->af == PR_AF_UNSPEC) {

af = (rec->af == PR_AF_UNSPEC) ? rec->mInnerAF : rec->af;

::: netwerk/dns/nsHostResolver.h
@@ +134,5 @@
>      // as unusable. the list is kept as a set of strings to make it independent
>      // of gencnt.
>      nsTArray<nsCString> mBlacklistedItems;
>  
> +

Extra line - please remove.

@@ +275,4 @@
>  
>      nsresult Init();
>      nsresult IssueLookup(nsHostRecord *);
> +    nsresult IssueLookupInternal(nsHostRecord *);

Please add a couple of short comments for these two functions to clarify why there is an internal lookup function.
Comment on attachment 8460639 [details] [diff] [review]
Uses DnsQuery on windows - Patch 3.1.patch

Review of attachment 8460639 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the review Josh, I certainly appreciate more eyes on my code. I left a few answers/questions inline for both of you. I've made the changes I can but it looks I'm blocked on Steve returning so I'm going to let the changes sit locally for the time being so I don't have to do two rounds of testing.

::: netwerk/dns/GetAddrInfo.cpp
@@ +72,5 @@
> +  static int sRefCount;
> +
> +  static nsresult _InitLibrary() {
> +    if (sInfo) {
> +      MOZ_ASSERT(false, "Initializing GetAddrInfo more than once.");

On the console, `!sInfo` is unlikely to mean anything to anyone who hasn't just been in the code, and if they have just seen the code they're going to go to the line the assertion mentions anyways and see the condition in the if. So I don't see a great benefit, and I do see a downside in that the condition is duplicated in the code instead of there being one source of truth. I think `assert(false...` better conveys that this code is supposed to be unreachable.

@@ +88,5 @@
> +      NS_WARN_IF(!FreeLibrary(library));
> +      return NS_ERROR_FAILURE;
> +    }
> +
> +    Info* info = new Info;

The previous (and imperfect) way I was doing thread safety mandated that some pointer pointed at the fully prepared structure or null, now it should be unnecessary as long as the init completely finishes execution prior to any DNS/worker threads starting up (which is currently guaranteed).

@@ +131,5 @@
> +      mInfo = nullptr;
> +    } else {
> +      MOZ_ASSERT(sInfo);
> +      ++sRefCount;
> +      mInfo = sInfo;

I have coded up the three alternatives for your viewing pleasure. I prefer what is currently here, but I don't have a strong opinion since it's a stylistic thing, so I'll change it to whichever you prefer (or maybe you know of an even better way?).

 (1) Current: https://gist.github.com/brownhead/9cdef5f9bf8dd7c53cf2
 (2) StaticRefPtr: https://gist.github.com/brownhead/147556002889935150f0
 (3) static nsRefPtr: https://gist.github.com/brownhead/8a832d095382c425b2aa

I'm very slightly worried about the thread safety of things using 2 or 3 if we manage to come out of the main() function without having called shutdown yet (which is pretty much impossible afaik, thus the very slightly). Reasoning about the static destructors is time consuming.

I think this is too detail-oriented to talk about verbally, so I'd prefer to resolve this with email/bugzilla if that's alright.

@@ +150,5 @@
> +OffTheBooksMutex DnsapiInfo::sLock("GetAddrInfo.cpp::DnsapiInfo::sLock");
> +int DnsapiInfo::sRefCount = 0;
> +
> +// Only to used by _GetAddrInfoInit_Windows and _GetAddrInfoShutdown_Windows.
> +static DnsapiInfo* _gDnsapiInfo = nullptr;

I can't find the pointers you mentioned. I prefer to null them always (despite it being unnecessary).

@@ +162,5 @@
> +  } else {
> +    _gDnsapiInfo = new DnsapiInfo(true);
> +    return NS_OK;
> +  }
> +}

Sounds good. This is holy war territory if you want to read some amusing arguing (google: `if return else return`).

@@ +181,5 @@
> +  MOZ_ASSERT(aAddrInfo);
> +
> +  DnsapiInfo dnsapi;
> +  if (!dnsapi.mInfo) {
> +    MOZ_ASSERT(false, "GetAddrInfo not yet initialized.");

MOZ_ASSERT_UNREACHABLE seemed to be primarily for optimization and did not seem safe. If execution did manage to get into this if statement in production, and we had a MOZ_ASSERT_UNREACHABLE here, we would enter undefined-behavior land.

::: netwerk/dns/GetAddrInfo.h
@@ +9,5 @@
> +
> +#include "nsError.h"
> +
> +#define DNS_API_WINDOWS_DNS_QUERY 1
> +#define DNS_API_PORTABLE 0

Why do you want parens here?

::: netwerk/dns/nsHostResolver.cpp
@@ +29,5 @@
> +#include <arpa/inet.h>
> +#include <arpa/nameser.h>
> +#include <resolv.h>
> +#define RES_RETRY_ON_FAILURE
> +#endif

Hmmm, I don't think so. I'll move them back.

@@ +214,5 @@
> +    cloned->mInnerAF = aUnspecAF;
> +    cloned->mCloneOf = this;
> +    NS_ADDREF_THIS();
> +
> +    *aNewRecord = cloned;

`cloned` is not the this pointer in this case.
(In reply to josullivan from comment #27)
> @@ +72,5 @@
> > +  static int sRefCount;
> > +
> > +  static nsresult _InitLibrary() {
> > +    if (sInfo) {
> > +      MOZ_ASSERT(false, "Initializing GetAddrInfo more than once.");
> 
> On the console, `!sInfo` is unlikely to mean anything to anyone who hasn't
> just been in the code, and if they have just seen the code they're going to
> go to the line the assertion mentions anyways and see the condition in the
> if. So I don't see a great benefit, and I do see a downside in that the
> condition is duplicated in the code instead of there being one source of
> truth. I think `assert(false...` better conveys that this code is supposed
> to be unreachable.

MOZ_ASSERT_UNREACHABLE - equates to NOZ_NIGHTLY_ASSERT(false,...), which asserts on debug builds and nightlies. This is different from MOZ_ASSUME_UNREACHABLE which looks like it's in the process of being deprecated due to the possibility of undefined behavior as you mentioned elsewhere.

> @@ +131,5 @@
> > +      mInfo = nullptr;
> > +    } else {
> > +      MOZ_ASSERT(sInfo);
> > +      ++sRefCount;
> > +      mInfo = sInfo;
> 
> I have coded up the three alternatives for your viewing pleasure. I prefer
> what is currently here, but I don't have a strong opinion since it's a
> stylistic thing, so I'll change it to whichever you prefer (or maybe you
> know of an even better way?).
> 
>  (1) Current: https://gist.github.com/brownhead/9cdef5f9bf8dd7c53cf2
>  (2) StaticRefPtr: https://gist.github.com/brownhead/147556002889935150f0
>  (3) static nsRefPtr: https://gist.github.com/brownhead/8a832d095382c425b2aa
> 
> I'm very slightly worried about the thread safety of things using 2 or 3 if
> we manage to come out of the main() function without having called shutdown
> yet (which is pretty much impossible afaik, thus the very slightly).
> Reasoning about the static destructors is time consuming.
> 
> I think this is too detail-oriented to talk about verbally, so I'd prefer to
> resolve this with email/bugzilla if that's alright.

Ah, I understand better again ;) There's a comment you added in the second example that helped:

// We can't make this an nsRefPtr because it wouldn't expose the refcount to
// us (therefore trigerring the library's shutdown), and objects pointed to
// by StaticRefPtr objects must have trivial destructors.

So, I would like to see multiple comments like this that explain why there is another refcounting mechanism that's different to the more commonly used one. I guarantee I am going to forget in the future, and someone else who is new to the code will also benefit from such explicative comments.

> @@ +162,5 @@
> > +  } else {
> > +    _gDnsapiInfo = new DnsapiInfo(true);
> > +    return NS_OK;
> > +  }
> > +}
> 
> Sounds good. This is holy war territory if you want to read some amusing
> arguing (google: `if return else return`).
> 
> @@ +181,5 @@
> > +  MOZ_ASSERT(aAddrInfo);
> > +
> > +  DnsapiInfo dnsapi;
> > +  if (!dnsapi.mInfo) {
> > +    MOZ_ASSERT(false, "GetAddrInfo not yet initialized.");
> 
> MOZ_ASSERT_UNREACHABLE seemed to be primarily for optimization and did not
> seem safe. If execution did manage to get into this if statement in
> production, and we had a MOZ_ASSERT_UNREACHABLE here, we would enter
> undefined-behavior land.

From the code (http://dxr.mozilla.org/mozilla-central/source/mfbt/Assertions.h#501) it looks like you were reading about MOZ_ASSUME_UNREACHABLE. The ASSERT version looks like what we want here.

> ::: netwerk/dns/GetAddrInfo.h
> @@ +9,5 @@
> > +
> > +#include "nsError.h"
> > +
> > +#define DNS_API_WINDOWS_DNS_QUERY 1
> > +#define DNS_API_PORTABLE 0
> 
> Why do you want parens here?

I've been taught that it was a good practice - http://stackoverflow.com/questions/9081479/is-there-a-good-reason-for-always-enclosing-a-define-in-parentheses-in-c
Not super relevant to here, but no harm in programming defensively. Not super important.

> @@ +214,5 @@
> > +    cloned->mInnerAF = aUnspecAF;
> > +    cloned->mCloneOf = this;
> > +    NS_ADDREF_THIS();
> > +
> > +    *aNewRecord = cloned;
> 
> `cloned` is not the this pointer in this case.

It is not. You are right! ;)

So, I'm going to r+ this, but no landing until testing is done :)
Attachment #8460639 - Flags: review?(sworkman) → review+
Sorry for the delay here. Quick update: Squashed a couple bugs and have been slowed down by the try server problems this week. I just got a clean run through a couple of try platforms finally though, the other ones failed because of try server problems. I'm going to put it up through windows and linux again and hopefully I'll get one solid result that I can point at for checkin.
I made changes per the reviews. I also made changes to fix bugs that cropped up after testing on try. This is a diff relative to the last patch.

The bugs that cropped up (and therefore the changes in this diff unrelated to the reviews) were:

* `nsHostRecord.nsHostaddr_info_gencnt` must be incremented whenever `nsHostRecord.addr_info` changes. Otherwise, an iterator going through it (it's a linked list) can segfault. I increment it properly now when its changed, and also improved the comment for the data member.
* The static mutex that made the `DnsapiInfo` class threadsafe caused a segfault in certain cases when it was destructed because it tries to access another global that may have already been freed. The fix for this was to dynamically allocate it and let it leak all the way to the end of the process's life (see comments in `DnsapiInfo::DnsapiInfo()`).
* Due to a bad automerge, `GetAddrInfoInit` was being called twice. I thought it was a mochitest specific thing so I made doing so safe. After discovering the source of the problem I decided to keep the changes because it's a bit cleaner and safer. So now `GetAddrInfoInit` can be called twice and in any thread (though it logs a warning), rather than being racey if not used properly.
Attachment #8472017 - Flags: review?(sworkman)
The patch got through the try server just fine: https://tbpl.mozilla.org/?tree=Try&rev=02ceb01b03da (the red came from unrelated, known, intermittent failures and succeeded after retry).
Comment on attachment 8472017 [details] [diff] [review]
Uses DnsQuery on windows - Changes on top of Patch 3.1.patch

Review of attachment 8472017 [details] [diff] [review]:
-----------------------------------------------------------------

Mostly comments on comments. And we talked offline about trying to use StaticRefPtr and nsRefPtr with specialized versions of AddRef and Release.

I think we're on the final stretch to get DnsQuery in mozilla-central :)

::: netwerk/dns/GetAddrInfo.cpp
@@ +59,5 @@
> + * use code in the Dnsapi.dll library, create an instance of this class and
> + * use mInfo. When shutting down, call ShutdownLibrary() once as well.
> + *
> + * This class is (in theory) totally thread safe, and we should never segfault.
> + * We do, however, always leak a mutex

... always leak an OffTheBooksMutex.

@@ +67,5 @@
> + * pointer that we can use. Our NS_INLINE_DECL_THREADSAFE_REFCOUNTING macro
> + * can't be used either because it does not give us a way to detect if an
> + * object has been freed (this is actually the reason nsRefPtr isn't
> + * thread-safe). If we can ever use std::shared_ptr instead we should jump on
> + * it.

Note that we're doing refcounting manually here rather than using nsRefPtr and refcounting macros. For DnsapiInfo we require a particular kind of thread-safe refcounting that is reminiscent of std::shared_ptr: <please describe>.

@@ +73,1 @@
>  class DnsapiInfo {

Open brace on new line. Missed this earlier sorry.

@@ +76,5 @@
> +    if (!sLock) {
> +      // We intentionally leak this mutex. This is because we can run into a
> +      // situation where the worker threads are still running until the process
> +      // is actually fully shut down, and at any time one of those worker
> +      // threads can access sLock.

We intentionally leak this mutex because of a possible race condition with the worker threads during process shutdown. Worker threads may be blocked at process shutdown while waiting on the OS host resolver API to return. Since they may become unblocked at any time before the process is fully shut down, we ensure that sLock is available by allowing it to leak.

@@ +88,4 @@
>      if (sInfo) {
> +      // It's OK if this happens (no explosions will occur), but it indicates
> +      // a programmer error.
> +      LOG_WARNING("GetAddrInfo is being initialized twice!");

Let's MOZ_ASSERT_UNREACHABLE here and catch the places where we're initializing twice. It may help avoid some bugs.

@@ +162,4 @@
>  
> +  // Acquire sLock before calling this function
> +  static nsresult _Release() {
> +    LOG("Releasing GetAddrInfo library.");

sLock->AssertCurrentThreadOwns();

@@ +216,5 @@
>  }
>  
>  static MOZ_ALWAYS_INLINE nsresult
>  _GetAddrInfo_Windows(const char* aHost, uint16_t aAddressFamily,
> +    uint16_t aFlags, AddrInfo** aAddrInfo)

Indentation.

::: netwerk/dns/GetAddrInfo.h
@@ +17,2 @@
>  #else
> +#define DNS_API DNS_API_PORTABLE // Portable: PR_GetAddrInfoByName()

Move these end of line comments up to the #defines for DNS_API_PORTABLE and DNS_API_WINDOWS_DNS_QUERY.
Also, try to align the start of the comment with spaces.

@@ +27,5 @@
> + * Look up a host by name. Mostly equivalent to getaddrinfo(host, NULL, ...) of
> + * RFC 3493.
> + *
> + * @param hostname[in] Character string defining the host name of interest
> + * @param aAf[in] May be PR_AF_UNSPEC or PR_AF_INET.

I don't think so. GetAddrInfo can take UNSPEC, INET and INET6, no? This looks like a copy error from PR_GetAddrInfoByName.

::: netwerk/dns/nsHostResolver.cpp
@@ +1114,5 @@
>              }
>  
>              MOZ_ASSERT(rec->mNumPending >= 0);
>              rec->mNumPending--;
> +

Extra line? Prob remove it.

@@ +1140,4 @@
>          if (rec->mNumPending <= 0) {
>              MOZ_ASSERT(rec->mNumPending == 0);
>  #else
> +        // Otherwise, go ahead an grab the list of callbacks to notify.

go ahead and...

@@ +1314,4 @@
>          uint16_t af;
>  #if DO_MERGE_FOR_AF_UNSPEC
> +        // In the case of an unspec request, we need to make sure to use the
> +        // "real" address family when we call GetAddrInfo.

In the case of an AF_UNSPEC ...
use the inner address family that can be used by the OS resolver when we....

::: netwerk/dns/nsHostResolver.h
@@ +82,5 @@
>       */
>      Mutex        addr_info_lock;
> +    /* generation count of |addr_info|. Make sure to change whenever addr_info
> +     * changes so any iterators going through the old linked list can be
> +     * invalidated. */

Generation ... Must be incremented when addr_info is changed so ...

@@ +284,5 @@
>  
> +    // This will issue two lookups (using the internal version) if
> +    // DO_MERGE_FOR_AF_UNSPEC is enabled and the passed in record is an
> +    // AF_UNSPEC record.
> +    nsresult IssueLookup(nsHostRecord *);

Issues lookup(s) for the given nsHostRecord in the hash table.
This will ....

@@ +287,5 @@
> +    // AF_UNSPEC record.
> +    nsresult IssueLookup(nsHostRecord *);
> +
> +    // This actually issues a single lookup
> +    nsresult IssueLookupInternal(nsHostRecord *);

Issues a single lookup for the given nsHostRecord based on address family.
This is a patch relative to "Uses DnsQuery on windows - Changes on top of Patch 3.1.patch", and is a response to the latest review by sworkman.
Attachment #8473241 - Flags: review?(sworkman)
Comment on attachment 8473241 [details] [diff] [review]
Uses DnsQuery on windows - Latest changes 8-14-2014.patch

Review of attachment 8473241 [details] [diff] [review]:
-----------------------------------------------------------------

Good stuff. Just 1 nit. r=me.

Please upload a full patch marked r+ when you have a green try run. Then add the checkin-needed keyword to the bug to try it out on m-c.

::: netwerk/dns/GetAddrInfo.cpp
@@ +60,4 @@
>  
> +struct DnsapiInfo
> +{
> +  NS_INLINE_DECL_THREADSAFE_REFCOUNTING(DnsapiInfo);

nit: Add an explicit public specifier here.
Attachment #8473241 - Flags: review?(sworkman) → review+
Summary: use DnsQuery (with existing DNS thread pool) to get real TTL for Windows users → use DnsQuery (with existing DNS thread pool) for Windows users
A separate bug will be opened for the TTL portion of this.
Try server run: https://tbpl.mozilla.org/?tree=Try&rev=ebf7da2a35bd
There was a failure from the open, intermittent bug 995108, but after a couple retriggers it went through fine.
Attachment #8460639 - Attachment is obsolete: true
Attachment #8472017 - Attachment is obsolete: true
Attachment #8473241 - Attachment is obsolete: true
Attachment #8472017 - Flags: review?(sworkman)
Attachment #8474755 - Flags: review+
Attachment #8474755 - Flags: checkin?
Comment on attachment 8474755 [details] [diff] [review]
Uses DnsQuery on windows - Patch 4 checkin.patch

Please use the checkin-needed bug keyword. Plays more nicely with our bug marking tools :)
Attachment #8474755 - Flags: checkin?
Keywords: checkin-needed
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #37)
> Comment on attachment 8474755 [details] [diff] [review]
> Uses DnsQuery on windows - Patch 4 checkin.patch
> 
> Please use the checkin-needed bug keyword. Plays more nicely with our bug
> marking tools :)

Sorry about that! Thank you for letting me know, I misunderstood the directions on the wiki. Hopefully this is right now.
(In reply to josullivan from comment #38)
> Sorry about that! Thank you for letting me know, I misunderstood the
> directions on the wiki. Hopefully this is right now.

In the future, please use your full name in the hg commit information as well :)
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #40)
> (In reply to josullivan from comment #38)
> > Sorry about that! Thank you for letting me know, I misunderstood the
> > directions on the wiki. Hopefully this is right now.
> 
> In the future, please use your full name in the hg commit information as
> well :)

Whoops! Fixed my configuration. Thank you again!
https://hg.mozilla.org/mozilla-central/rev/4741ef815af9
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
(In reply to josullivan from comment #35)
> A separate bug will be opened for the TTL portion of this.

which bug was this done in?
I'm backing this out:  https://hg.mozilla.org/integration/mozilla-inbound/rev/49502f43bd03

because of 1] https://bugzilla.mozilla.org/show_bug.cgi?id=1056528

and 2] https://bugzilla.mozilla.org/show_bug.cgi?id=1056599

#1 sounds like a bug.. #2 sounds perhaps like it is making an existing interaction a bit worse as a side effect of the new resoltion.. we should investigate and decide more deliberately what to do - perhaps the beginning of the new nightly in a week and half would give more room to firefight.

in any event, without the ttl changes this is just risky churn without upside. Let's land them together next time.
Depends on: 1056528
(In reply to AFK Aug 22- Sep 01 Patrick McManus [:mcmanus] from comment #44)
> I'm backing this out: 
> https://hg.mozilla.org/integration/mozilla-inbound/rev/49502f43bd03


FYI, this landed with the wrong bug number. I've backed out and re-pushed with it corrected. Please double-check that in the future before pushing.

https://hg.mozilla.org/integration/mozilla-inbound/rev/cb4ea6b2f33c
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 1056599
https://hg.mozilla.org/mozilla-central/rev/cb4ea6b2f33c
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch Changes-Since-Rollback.patch (obsolete) — Splinter Review
This is a preliminary patch of my changes since the rollback. It's not in a super clean state but you can see roughly where things stand. A few things worth noting:

* I had to export several functions to allow the test program to work. Not sure if that's OK or if there's another way. The test program isn't critical to check into our codebase so we can leave it out of the patch if necessary.
* I have not yet determined what to do about the Bionic derived functions. They're annotated as such for the moment.
* The netbios stuff is very new and not working yet, so you can mostly ignore it.
* Sending this now so I don't just hand you a huge chunk of code with the full patch. Keep in mind that this is preliminary though, so don't worry about style/nits or polishing type things, I'll polish it properly before doing a full review.
Attachment #8486223 - Flags: feedback?(sworkman)
Attached patch Additional DnsQuery.patch (obsolete) — Splinter Review
This is a fresh set of changes that use the new strategy (making a second request). Just like before, the TTL is not yet used. I ran it through try and there's failures because lookups are being issued during shutdown (at least I think so): https://tbpl.mozilla.org/?tree=Try&rev=898e87925388. Trying to fix that now but a second set of eyes should still be useful now.
Attachment #8474755 - Attachment is obsolete: true
Attachment #8486223 - Attachment is obsolete: true
Attachment #8486223 - Flags: feedback?(sworkman)
Attachment #8488794 - Flags: feedback?(sworkman)
Looked at the mochitest-1 failure: how about dispatching the 2nd IssueLookup to the main thread? That takes all the hash table lookup out of the context of OnLookupComplete and should allow it to move on a little bit quicker to the next lookup. A bit more async.
Oh, ignore the hash table part of that comment. I was confusing it with ResolveHost.

I think the root cause of the crash was caused by mLock not being locked when you call ConditionallyCreateThread; it's called just after that block in your patch.
I came to the same conclusion. Already doing some testing with the mutex locked and to see how it goes.
Comment on attachment 8488794 [details] [diff] [review]
Additional DnsQuery.patch

Review of attachment 8488794 [details] [diff] [review]:
-----------------------------------------------------------------

Cool. f+.

::: netwerk/dns/GetAddrInfo.cpp
@@ +171,5 @@
> +      aHost,
> +      DNS_TYPE_ANY,
> +      (DNS_QUERY_STANDARD | DNS_QUERY_NO_NETBT | DNS_QUERY_NO_HOSTS_FILE
> +        | DNS_QUERY_NO_MULTICAST | DNS_QUERY_ACCEPT_TRUNCATED_RESPONSE
> +        | DNS_QUERY_DONT_RESET_TTL_VALUES),

Cool - I think this should work well. If we do have a NetBT record in there, then the TTL could be wrong, but I think speeding up the request by bypassing all the multicast/broadcast requests and potential timeouts is worth it. I hope we'll get a usable value for the TTL for the vast majority of cases. We can revisit it if need be.

::: netwerk/dns/nsHostResolver.cpp
@@ +878,3 @@
>      
>      if (IsHighPriority(rec->flags))
>          PR_APPEND_LINK(rec, &mHighQ);

Put the #if TTL_AVAILABLE block in here?

More importantly ... So, this is the right idea: high prio requests should not get the ttl, or rather they should get it async'ly. But, we should also set mGetTtl = false if a callback is added to the host record, and that callback wants a high priority request.

Consider: IssueLookup could have been called earlier with a MED/LOW request, so no change to mGetTtl. Then ResolveHost comes along with a high request; it doesn't call IssueLookup again, it just ups the priority and adds itself to the callbacks. Then, the request is dequeued in ThreadFunc and tries to get a TTL, but it shouldn't because the request is now high priority.

I wonder if the simplest way to catch all of these is to unset mGetTtl in ThreadFunc if it's high priority?
Attachment #8488794 - Flags: feedback?(sworkman) → feedback+
Great catch, I didn't realize the priorities can change. I agree that it would be best to look at the priority in the ThreadFunc because we'd bypass all the raceyness. Trying that right now.
Attached patch Bug 820391.patchSplinter Review
Attachment #8488794 - Attachment is obsolete: true
Attachment #8492304 - Flags: review?(sworkman)
Comment on attachment 8492304 [details] [diff] [review]
Bug 820391.patch

Review of attachment 8492304 [details] [diff] [review]:
-----------------------------------------------------------------

Cool r=me. Couple of comment nits so it shouldn't change the code at all.
Let's land it with the TTL experiment patch from bug 1067679.

::: netwerk/dns/GetAddrInfo.h
@@ +28,5 @@
> + * RFC 3493.
> + *
> + * @param aHost[in] Character string defining the host name of interest
> + * @param aAddressFamily[in] May be AF_INET, AF_INET6, or AF_UNSPEC. Note that
> + *     AF_UNSPEC is not supported if DNS_API is DNS_API_WINDOWS_DNS_QUERY.

nit: comment out of date: 'DNS_API' no longer used.

@@ +40,5 @@
> + *     retrieved if DNS provides the answers..
> + */
> +nsresult
> +GetAddrInfo(const char* aHost, uint16_t aAddressFamily, uint16_t aFlags,
> +	        AddrInfo** aAddrInfo, bool aGetTtl);

nit: indentation (I had to give you one of these on your last day, right?!? ;p )
Attachment #8492304 - Flags: review?(sworkman) → review+
(In reply to Steve Workman [:sworkman] from comment #55)

> Let's land it with the TTL experiment patch from bug 1067679.
> 
Just to clarify, we'll try landing this patch with the TTL experiment in bug 1067679 as one patch associated with that bug only for now. Assuming the patch sticks and we get results on having TTL, we can update this bug as necessary.
Blocks: 1093983
we get ttls on windows now
Status: REOPENED → RESOLVED
Closed: 5 years ago4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.