Closed Bug 390304 Opened 17 years ago Closed 12 years ago

DNS lookups not bypassed when using auto proxy config URL

Categories

(Core :: Networking, defect)

x86
Windows XP
defect
Not set
major

RESOLVED DUPLICATE of bug 507578

People

(Reporter: mozilla, Assigned: sworkman)

Details

Attachments

(3 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6

Beginning with 2.0.0.5 and continuing with 2.0.0.6, Firefox always attempts DNS lookups on hosts for which the proxy config script has returned "PROXY ip:port;". This should not occur, because systems that need a proxy server often have no access to external (Internet) DNS.

The result is that browsing to intranet sites (where the proxy config script returns "DIRECT" and DNS lookup succeeds, if applicable) works but any sites directed to a proxy server when Firefox cannot resolve the requested URL's hostname (NOT the proxy URL's hostname) fail with "Server Not Found".

Reproducible: Always

Steps to Reproduce:
1. Create a proxy config script (e.g. "proxy.pac") with function FindProxyForURL that returns "PROXY ip:port;" (ex. "PROXY 192.168.1.1:3128;") for a given URL whose hostname can be resolved by the proxy server itself, but NOT on the Firefox client.
2. Direct Firefox to this URL under Tools -> Options -> Advanced -> Network -> Settings -> Automatic proxy configuration URL (using hostname in URL, ex. "http://proxy.mysite.com/proxy/") -- Firefox client PC CAN resolve this hostname, of course
3. Browse to a URL whose hostname cannot be resolved on the local Firefox client, but can be resolved by the proxy server itself
Actual Results:  
Page load fails with "Server Not Found".

Expected Results:  
Request is sent to proxy server without attempting DNS lookup on target URL's hostname.

Firefox loads the auto proxy config script fine, as evidenced by web server logs and the fact that it can load sites returned as "DIRECT" by the script, given they are resolvable and reachable from the Firefox client PC.
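For reference, a PAC script of the kind described in the steps above might look like the following minimal sketch. The domain "intranet.example" is a placeholder and the script is not the attached file; the proxy address is the example one from step 1.

```javascript
// Minimal PAC sketch. It calls no DNS-resolving helpers (dnsResolve,
// isInNet, isResolvable), so the script itself should trigger no lookups.
// "intranet.example" is a placeholder domain used only for illustration.
function FindProxyForURL(url, host) {
    var suffix = ".intranet.example";
    // Plain string comparison only: intranet hosts are fetched directly.
    var isIntranet = host === "intranet.example" ||
        host.slice(-suffix.length) === suffix;
    // Everything else is handed to the proxy, which resolves the name itself.
    return isIntranet ? "DIRECT" : "PROXY 192.168.1.1:3128";
}
```

With a script like this, the expected behavior is that a request for an external host goes straight to 192.168.1.1:3128 without the client ever resolving the target hostname.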
Is this the same issue as mentioned in bug 213290 and bug 203793?

Can you please provide a log-file as mentioned in http://www.mozilla.org/projects/netlib/http/http-debugging.html ?

Can you also post your proxy.pac file? It might contain a DNS lookup of its own (using isResolvable, isInNet, or dnsResolve). Hide your private information if necessary.
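A quick way to check a PAC file for those three helpers is a plain text scan, sketched below. The function name findDnsHelperCalls is made up for this example, and the match is deliberately crude: it does not parse JavaScript, so a helper name inside a comment or string is reported as well.

```javascript
// PAC helpers named in this comment that can trigger a DNS lookup.
var DNS_HELPERS = ["isResolvable", "isInNet", "dnsResolve"];

// Returns the names of DNS-resolving helpers that appear to be called in
// the PAC source text. Text match only; no JavaScript parsing is done.
function findDnsHelperCalls(pacSource) {
    return DNS_HELPERS.filter(function (name) {
        // Word boundary, helper name, optional whitespace, opening paren.
        return new RegExp("\\b" + name + "\\s*\\(").test(pacSource);
    });
}
```

An empty result suggests (but does not prove) that any DNS lookups seen on the client are not coming from the PAC script itself.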
Attached file HTTP debug log
Attached file Auto proxy config file
Sanitized hosts/domains and removed repetitive lines (&&, || for additional hosts), but did not remove any unique function calls.
Attachment #274626 - Attachment mime type: application/octet-stream → text/plain
Sorry, I forgot to mention that I disabled IPv6 (network.dns.disableIPv6) and the problem still occurs. (FWIW, the same proxy.pac file worked fine in Firefox before 2.0.0.5 and I didn't have external DNS available then either.)
I could be wrong, but I think there's an error on the following line of your PAC file :

  return "PROXY 192.168.3.1:3128;";

The extra semicolon inside the string shouldn't be there. But it's a bit strange that this causes such a result. But I have seen other weird results with PAC files before.

I don't see anything wrong in ProcessPACString / ExtractProxyInfo (in nsProtocolProxyService.cpp) either, so I don't think the parsing is the problem.
I removed the semicolon, reloaded the PAC file via Preferences, and the problem still occurs. Also completely restarted Firefox and still have the issue.

If there is anything else you need, let me know. Thanks for looking into this.
I've noticed this also.  This is happening even if the proxy.pac consists of only the following few lines:

 function FindProxyForURL(url, host)
 {
     MYPROXY  = "PROXY 10.10.10.102:9191";
     return MYPROXY;
 }

However, I've noticed it only happens on the first or 'start/home' page. Subsequent pages do not perform DNS lookups. You can confirm this by running ipconfig /flushdns and then launching Mozilla, set to point to the above proxy; ipconfig /displaydns will then show the home page's DNS entry. For subsequent pages the DNS lookup occurs at the proxy, where it is supposed to be.
I have "auto-detect proxy settings" enabled, which is essentially the same thing, and can confirm that Firefox is making DNS requests even though HTTP is directed through the proxy (configured from wpad.dat, proxy.pac).

If I use fixed proxy settings (ip:port) in Firefox and disable wpad.dat/proxy.pac completely, Firefox stops making DNS requests.

Tested on version 3.0.10, Windows Vista.
FF 3.5 on Linux also has this problem. When using the option "Use system proxy settings" FF spews out many DNS queries; when setting a manual proxy the DNS is not queried any more...

Platform should be changed to "All".
Timo: if you're talking about the DNS prefetch requests (not the regular ones) in an environment where a PAC file or WPAD is used, then it's bug 507578. Prefetch requests are made independently of the PAC file (they're not HTTP requests at all), but are indeed not sent when using a manual proxy server. The same thing should be done for a PAC file or WPAD too. Or maybe it should be an option, since it's still useful in some environments.
Is there a simple way to tell if the DNS queries I see are prefetch queries or otherwise? The DNS traffic occurred very frequently (every 30 seconds or so) with a stationary FF (did not click on anything). From your description I guess they must be prefetch...
But to confirm this bug for Linux, what should I look for?
You can disable DNS prefetching by setting network.dns.disablePrefetch to true in about:config (you'll probably have to create the preference first).

You can see them in a Wireshark network trace: they're the DNS requests that are made after the page is loaded. But they don't occur continuously as you describe; each is done only once. If you see continuous DNS requests, then they might be coming from something else.
The page that generates the traffic is a page of a commercial product that has links to the website of the manufacturer, but my PC has no direct Internet connection. This page refreshes regularly (every 60 seconds), so that is the reason I see it so often.
However, I think that prefetching is still a bit shaky since it appears that for each element that is referenced outside the page a DNS query is done, even if it uses the same host. If the page reloads, all elements are queried again. Looks to me that prefetched URLs are not cached...
There is a DNS cache in Firefox (separate from any cache in your operating system, such as nscd), but its entries expire after 3 minutes by default in Firefox 3.5 (1 minute before that).

To change this, go to about:config and create the preference network.dnsCacheExpiration (default: 180 seconds). The total number of cached entries is set by network.dnsCacheEntries (default: 400).
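To illustrate how an expiry time and an entry limit interact, here is a small time-to-live cache sketch. It is not Firefox's implementation: the DnsCache name, the explicit nowMs argument, and the "evict the oldest inserted entry" policy are all assumptions made for the example.

```javascript
// Illustrative TTL cache with an entry cap; NOT Firefox's DNS cache code.
function DnsCache(ttlSeconds, maxEntries) {
    this.ttlMs = ttlSeconds * 1000;
    this.maxEntries = maxEntries;
    this.entries = new Map(); // host -> { address, expiresAt }
}

// Returns the cached address, or null if the host is absent or expired.
DnsCache.prototype.get = function (host, nowMs) {
    var entry = this.entries.get(host);
    if (!entry) return null;
    if (nowMs >= entry.expiresAt) {
        this.entries.delete(host); // expired: the caller must resolve again
        return null;
    }
    return entry.address;
};

// Stores an address; when full, evicts the oldest inserted entry
// (assumed policy, chosen only to keep the sketch short).
DnsCache.prototype.put = function (host, address, nowMs) {
    if (!this.entries.has(host) && this.entries.size >= this.maxEntries) {
        var oldest = this.entries.keys().next().value;
        this.entries.delete(oldest);
    }
    this.entries.set(host, { address: address, expiresAt: nowMs + this.ttlMs });
};
```

In this model, a page that reloads every 60 seconds would miss the cache on every reload with a 60-second expiry, but would hit it on the next two reloads with a 180-second expiry.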
I guess I have to look into that further; it might be a different bug altogether (but linked to this one, since it disappears when I manually configure a proxy). My (standard) Firefox queries the DNS _each_ minute and tries to resolve the same hostname several times in one burst...
The query frequency corresponds to the refresh interval of the page and the number of times a single host is queried looks to match the number of links to it on the page (not sure though).
Would this happen if the cache should somehow be full? And if so, how do I see how many entries there currently are in the cache?
Component: General → Networking
Product: Firefox → Core
QA Contact: general → networking
I've confirmed that Jo Hermans is dead-on.  The issue is DNS Prefetching and it still exists in 6.0.

This can have a significant impact in large corporate environments as prefetching DNS on internet URLs will raise utilization and errors on Internal-Only DNS servers.

I would recommend that at a minimum, a simple check be added:

if ((proxy is configured) && (dns name for resolve needs proxy))
{ 
  disable dnsPrefetch for that dns name
} else {
  respect user settings for dnsPrefetch
}
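The check proposed above could be sketched in JavaScript as follows. The function name and the way the proxy decision is passed in (as the string a PAC script would return for the host, or null when no proxy is configured) are assumptions for illustration; this is not Firefox code.

```javascript
// proxyResult: the string a PAC script returned for the host's URL, e.g.
// "DIRECT" or "PROXY 10.0.0.1:3128; DIRECT", or null if no proxy is set up.
// userPrefetchEnabled: the user's own DNS prefetch setting.
function shouldPrefetchDns(proxyResult, userPrefetchEnabled) {
    var proxyConfigured = proxyResult !== null;
    // Only the first entry of the PAC result is considered here.
    var needsProxy = proxyConfigured && !/^\s*DIRECT\b/i.test(proxyResult);
    if (needsProxy) {
        return false; // the proxy will resolve the name; skip the prefetch
    }
    return userPrefetchEnabled; // otherwise respect the user's setting
}
```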
That depends on the network. In many cases (especially if you have a PAC file that uses DNS to decide which path your http-requests will take), it would be an advantage to use prefetching. If it's a completely closed environment, or if the PAC file does not use DNS, then you don't need it.
Jo,

So do you have a (even hand-wave-y) algorithm to suggest, along the lines of comment 16?
Maybe a general toggle button for DNS prefetches. Unfortunately, most people wouldn't know what would be the best solution for their network.

Maybe we can allow DNS prefetches if the PAC file uses a DNS-resolving function?
In my experience of managing many complex PAC files, it's in most people's best interest to not use any functions in the PAC that perform DNS resolutions unless absolutely necessary.  Those truly concerned about performance will probably find more benefit in not using those functions than they would gain with DNS prefetch with a proxy in play.

One resolving function call could trigger hundreds of DNS lookups for each page load depending on the size of your suffix list, and it only multiplies if you have more than one call.

Interestingly, the argument against it is probably the same argument that could be used for DNS pre-fetching with PAC files as any delays in resolution can prevent the browser from displaying a page and pre-fetch might at least negate the use of the function for those that want to use it.

I would adjust my proposal simply to make it an option, with the default erring on the side of not spamming DNS servers with unnecessary requests and let those that want to see if it will improve their performance for resolutions within a PAC enable it.

Something like:

network.dns.prefetchWithProxy bool false (default)

I'm focusing on PAC files, but the same should apply for a direct connect to a proxy.
... in reading Timo Ruiter's comments from 2009, I suspect he may be using a resolve function in his PAC, which is why it goes away if he goes direct to the proxy and why it matches the page reload interval.  

That I don't see as a bug - that's self inflicted.  He'll see one for every link on the page, possibly more depending on his search suffix.
Assignee: nobody → sjhworkman
You can dismiss my comments regarding DNS behaviour for this bug. The errant behaviour I reported is filed under bug 511839.
Attached patch Proposed Diff (obsolete) — Splinter Review
Diff implements config variable as suggested.  Default state is to bypass DNS lookup for proxied hosts.
sr? requested because the IDL for nsIProtocolProxyService has a new API added: IsURIProxied().

Please take a look and send me your comments.
Attachment #566098 - Flags: superreview?(bzbarsky)
Attachment #566098 - Flags: review?(mcmanus)
Isn't the fix in bug 507578 comment 3 simpler (except that it's not configurable) ?
Steve, can you state the problem that is solved with your patch?

Comment 22 says that the original report, one of interop, is not valid.

It seems you are trying to solve the problem of generating some prefetches that are not used because a proxy is in play. I haven't seen anything to suggest those are actually a problem (I have seen a number of people notice and report them - but that's not the same thing), and the patch takes a pretty heavy weight approach to figuring that out on each prefetch attempt with the isallowed() logic.

do you think this is important to do?

minimally, the isallowed() check is a property of the document and can be cached in there somehow, which would save a lot of the computation.
Status: UNCONFIRMED → NEW
Ever confirmed: true
"I haven't seen anything to suggest those are actually a problem..."

What specifically do you want to see to show it's a problem?  It is definitely a problem in large corporate networks (whether they know it yet or not).  We are seeing a very significant number of failed look-ups across the enterprise, mostly pointing back to Firefox users.  If there is a high probability that a resolution will fail, in my opinion, Firefox shouldn't do the prefetch.  However, I can understand an option to re-enable it for folks that may benefit from that configuration (or where Firefox may make the wrong assumption).

The main difference between the two solutions I see is that one is on a URL by URL basis, allowing for the prefetch benefit for URLs that can use it and not for others; the other is all or nothing if a proxy is configured.  Either could use a parameter to re-enable the current behavior for folks that may want it.

My only question due to my lack of understanding of the full code base is if Steve's solution could ever cause multiple iterations of processing the PAC file.  For instance, if it has to process it once to determine if it should prefetch, and again to determine how to retrieve it, that would be bad since some larger PAC files can take a few ms to process for each URL.
(In reply to Michael Eckhoff from comment #26)
> "I haven't seen anything to suggest those are actually a problem..."
> 
> What specifically do you want to see to show it's a problem? 

a report of an overloaded dns server maybe. Or a bandwidth measurement against a network choke point showing a significant share. Failed lookups themselves aren't necessarily interesting unless they are causing a problem - that's the nature of any speculative optimization. 

I had a meeting with the BIND folks 1.5 years after prefetching went in (and Chrome started doing the same thing) and they had received 0 reports of any kind of load problems or upgrades tied to this. DNS is extremely cheap in terms of both bandwidth and processing - that's why it's a good candidate for speculative prefetch. I am not saying this is *not* a problem, I'm asking what evidence there is that it *is* a problem. Existence isn't a problem as far as I am concerned.

> My only question due to my lack of understanding of the full code base is if
> Steve's solution could ever cause multiple iterations of processing the PAC
> file.  For instance, if it has to process it once to determine if it should
> prefetch, and again to determine how to retrieve it, that would be bad since
> some larger PAC files can take a few ms to process for each URL.

It could cause hundreds of evals of the PAC file for each page - one for each hostname referenced in the page (not just loaded or clicked on) being prefetched. I think the filter is fine, maybe not terribly important, but fine as long as it doesn't create a big processing burden when using pac files and has 0 impact when not using them.
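One way to limit that cost, in the spirit of the caching suggested in comment 25, is to remember the PAC decision per host so repeated links to the same host trigger only one evaluation. The sketch below is an illustration only: memoizeByHost is a made-up name, and it assumes the PAC decision depends on the host alone, which is not true of every PAC file (some branch on the URL path or the time of day).

```javascript
// Wraps a PAC-style function so each distinct host is evaluated at most once.
// Assumption: the result depends only on the host, not on the full URL.
function memoizeByHost(findProxyForURL) {
    var cache = new Map(); // host -> PAC result string
    return function (url, host) {
        if (!cache.has(host)) {
            cache.set(host, findProxyForURL(url, host));
        }
        return cache.get(host);
    };
}
```

A page with hundreds of links to a handful of hosts would then cost a handful of PAC evaluations rather than hundreds. Such a cache would need to be dropped whenever the PAC file is reloaded.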
(I was on PTO yesterday, Oct 11th, so just getting to answer this now).

For my patch, I was focussing on solving the problem of DNS prefetching and PAC files, which seems to form the majority of the discussion on this bug.  Comment #22 says it's not an issue, but 507578 seems to describe the same thing … my approach was to provide a single fix that could solve both issues, posting the fix to the older bug.  

After producing the fix (and learning the DNS code as I go! :) ), I agree that it's over-complicated and expensive for the issue at hand.

So, I want to propose a configurable version of Jo's suggested fix in 507578.  Using the same pref variable Michael suggested network.dns.prefetchWithProxy, DNS prefetching can be disabled for all hosts if a PAC/WPAD file is being used.  I believe that this should help satisfy the problem as seen by Michael.

According to Michael, (comment #26: "We are seeing a very significant number of failed look-ups across the enterprise mostly pointing back to Firefox users.") and the description of bug 507578, DNS prefetching with PAC/WPAD should be disabled by default.  How does this sound to folks?

With respect to 507578, that bug is newer, so that should be marked as a duplicate of this one.  Unless someone has a reason for it not to be...
Comment on attachment 566098 [details] [diff] [review]
Proposed Diff

I don't think we want to call into PAC for every single link on the page, for sure.
Attachment #566098 - Flags: superreview?(bzbarsky) → superreview-
Something like this should be suitable.  Note, I've currently restricted it to only skip if WPAD or PAC are set.  However, I'm thinking that skipping for all proxy cases may be an option ... thoughts?
Attachment #566098 - Attachment is obsolete: true
Attachment #566717 - Flags: review?(mcmanus)
Attachment #566098 - Flags: review?(mcmanus)
I would agree with skipping in any proxy case (Direct to a proxy, SOCKS, PAC, whatever the various options are).  In addition to just being consistent, to not do that would cause confusion with 'network.dns.disablePrefetchWithProxy'.

I appreciate your work on this!
(In reply to Steve Workman [:sworkman] from comment #30)
> ... thoughts?

comment 25 is unaddressed. 

I'd argue wontfix, or minimally the disableprefetchwithproxy approach defaulting to false. Though I don't really see the difference between that and the existing disableprefetch pref if it defaults to false.
I've heard my own admins complaining about large numbers of DNS requests in general (although they didn't refer specifically to DNS prefetches, which they probably don't know about), and I've also heard it from customers. I'm in the router business, and I can assure you that most ISPs are trying to push DNS servers close to the users (in my case: into the edge routers), since DNS traffic is increasing every year, and of course they also want to lower latency.

The actual impact on DNS-servers might be low, but I think that the point is that these DNS prefetches are not necessary when you use a proxyserver (in most cases, some PAC files might use DNS). Any reduction of unnecessary DNS-prefetches is worthwhile.
Based on some offline discussions and looking at Bug #507578, it looks like there is not a perfect solution here.

-- A complex solution is very expensive (or at least requires a lot more work to make it less expensive).  There are currently higher priority DNS issues that must be taken care of, so a complex solution will take time.

-- The simple solution to disable prefetching with PAC/WPAD by default leaves out the cases where prefetching would make a difference to the end users.  As I understand it, PAC/WPAD files are there to control what you want to be proxied.  Jo mentions in 507578 that there are different types of PAC/WPAD, with varying amounts of hosts being proxied/not proxied.

If we were to enable prefetching by default for PAC/WPAD, but have the option there for it to be disabled, would that work as a temporary solution for you?  I'm not sure how much control you have over your users' desktops, but it's worth asking if this is enough for now.  If time and resources allow, we can revisit this with a more granular solution, deciding whether to prefetch based on a single host or a group of them.
I'm fine with the solution detailed in #507578, however, it really needs to be the default behavior with the option for the handful of proxy users that may benefit from having it enabled being able to turn it on.

I support hundreds of thousands of users behind proxies and am very familiar with the setups of other large enterprises that use proxies and it is quite rare to rely on internal DNS servers to do resolution for web services BEFORE sending it to the proxy - that's why so many folks are reporting this as an issue.  No other browser I've analyzed does this.

When you really get down to it, the prefetch value retrieved, even if it is successful, is never used anyway since the target of the TCP socket is going to be the proxy IP and not the resolved internet IP that it prefetched.  The request to the proxy itself is a FQDN or you'd break most virtual hosting.  
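To make that concrete, the sketch below builds the head of a plain-HTTP request as a client would send it to a forward proxy: the request line carries the full URL, hostname included, so the client never needs the target's IP address. The function name is made up for this example, and real requests carry more headers. For https, the client instead sends a CONNECT request, which also names the host rather than an IP.

```javascript
// Builds the start of an HTTP/1.1 request as sent to a forward proxy.
// The request target is the absolute URL; the TCP connection itself goes
// to the proxy's address, not to the address of the host named here.
function buildProxiedRequest(targetUrl) {
    var u = new URL(targetUrl);
    return "GET " + u.href + " HTTP/1.1\r\n" +
           "Host: " + u.host + "\r\n" +
           "\r\n";
}
```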

Therefore, it's a complete waste of resources to even be asking for it.  The few proxy users that could benefit from this would be people that use transparent proxies, and they're not going to have a proxy configured anyway (WPAD, PAC, Manual, or otherwise) since it's transparent.

If you'd like to get deeper into this at a packet level with traces and log data, let me know out of band and we can work something out.
Comment on attachment 566717 [details] [diff] [review]
Proposed Diff v2.0

before reviewing the code in detail I would want to see a more rigorous study of the impact of the prefetching in this scenario both on nets and nameservers as well as on the use cases in the browser that would lose the feature.
Attachment #566717 - Flags: review?(mcmanus)
bug 507578 moves this forward.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE