Closed Bug 235853 Opened 20 years ago Closed 12 years ago

[PAC] Defer proxy resolution for HTTP and HTTPS PAC to avoid blocking main thread during DNS resolution

Categories

(Core :: Networking, defect, P1)

RESOLVED DUPLICATE of bug 769764
mozilla13

People

(Reporter: stewart, Assigned: sworkman)

References

(Blocks 1 open bug, )

Details

(Keywords: hang)

Attachments

(3 files, 20 obsolete files)

17.57 KB, application/zip
6.72 KB, text/plain
22.34 KB, patch
User-Agent:       
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7b) Gecko/20040225

This bug has been split from #224447 as the symptoms are different. I will copy
relevant comments from #224447 into this bug report.

Reproducible: Always
Steps to Reproduce:
1. Go to Preferences, Advanced, Proxies
2. Select "Automatic Proxy Configuration URL"
3. Enter a URL to a valid PAC file
4. Click OK
5. Browse a website (particularly one with many offsite ads/images)


Actual Results:  
DNS resolution causes the UI to hang

Expected Results:  
DNS resolution should not cause the UI to hang (it doesn't when using either
"Direct connection to the Internet" or "Manual Proxy Configuration")
Copied from bug #224447

------- Additional Comment #6 From mike 2004-01-20 17:17 PST -------

Also experiencing this.  Firebird 0.8+, Mac OS X.
"Resolving host" takes about 5 seconds for any page with a new DNS.  For example,
if I go to www.google.com, it will hang for a few seconds at "Resolving host"
before loading, but then anything I do within Google will be snappy.  But, say I
click a link from Google to another site, it will hang for another few seconds
until I can get into that new site's DNS.


------- Additional Comment #7 From emmet@cogs.sussex.ac.uk 2004-02-17 05:38 PST -------

This ought to be renamed to "Complete hang when DNS resolving/lookup"

I see this too, but not always (local site DNS caching?) -- and experienced
this also when using Mozilla.

Note: IE does not hang -- that is, its menus etc still function, it can redraw
etc while the lookup occurs.


(Firefox 0.8, Win2K, via a proxy squid/2.5.STABLE3)

------- Additional Comment #9 From Stewart Jeacocke 2004-02-25 06:30 PST -------

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7b) Gecko/20040224

As of 23/02/04 I'm still seeing this bug on the SeaMonkey CVS tree. The UI
(every window and tab) is blocked while name resolution is performed. This is a
real PITA, particularly for sites which have many ads etc. For example, it can
take over 30s for http://www.ebay.co.uk to load, during which time the UI is
completely hung.

This bug seems to be due to two things
i) The UI is blocked during host resolution
AND
ii) Host resolution in mozilla is sometimes very slow, even when the dns names 
have already been cached at a local DNS server. For example I used the "host" 
command to resolve all the names required for www.ebay.co.uk and none of them 
took longer than 0.05s to resolve


------- Additional Comment #10 From Stewart Jeacocke 2004-02-25 06:42 PST -------

This bug is not specific to Firefox. Would it be a good idea to change "Product"
to something other than Firefox?

I've stuck some cout statements at the beginning and end of
nsDNSService::Resolve() [nsDNSService2.cpp] and can confirm that we are inside
this method when the UI hangs.

I'm willing to help track down the cause of this bug (as it's making me have to
use Opera ;-) ) but I have no experience with the internals of Mozilla so would
need guidance as to where to start looking.


------- Additional Comment #11 From Stewart Jeacocke 2004-02-25 12:56 PST -------

It seems that this bug is triggered by something in my preferences. I moved my
.mozilla directory out of the way and let Mozilla create a new one. This caused
the bug to disappear (at least on current CVS and Mozilla 1.6).

When using my old .mozilla dir (the config that elicits this bug) the method
nsDNSService::Resolve() is called a lot. However, when I let Mozilla create
.mozilla from scratch, nsDNSService::Resolve() is never called and is replaced by
calls to nsDNSService::AsyncResolve().


------- Additional Comment #12 From Stewart Jeacocke 2004-02-25 14:30 PST -------

The offending preference is enabling Proxy Auto Config (PAC).

When I enable PAC (set to use a valid remote PAC file) the
nsDNSService::Resolve() method is used. When I either manually set the proxy
config or select "direct connection" the nsDNSService::AsyncResolve() method is
called.

SUMMARY
Using Proxy Auto Config stops DNS resolution from happening asynchronously. This
causes the UI to hang while name resolution occurs. 


------- Additional Comment #13 From Jason Barnabe 2004-02-25 14:47 PST -------

Moving to Browser, good work Stewart.


------- Additional Comment #14 From Jason Barnabe 2004-02-25 14:49 PST -------

*** Bug 213751 has been marked as a duplicate of this bug. ***


------- Additional Comment #15 From Stewart Jeacocke 2004-02-27 03:08 PST -------

Not sure if this helps, but this is an example backtrace that I get if I set a
breakpoint on the nsDNSService::Resolve() method (with proxy auto config
enabled).

#0  nsDNSService::Resolve(nsACString const&, int, nsIDNSRecord**) 
(this=0x8110a50, hostname=@0x889d658, bypassCache=0, resul$
#1  0x4012a025 in XPTC_InvokeByIndex () from ./libxpcom.so
#2  0x4089e3b7 in XPCWrappedNative::CallMethod(XPCCallContext&, 
XPCWrappedNative::CallMode) (ccx=@0xbfffe650, mode=CALL_METH$
#3  0x408a8913 in XPC_WN_CallMethod(JSContext*, JSObject*, unsigned, long*, 
long*) (cx=0x818cce0, obj=0x87fd680, argc=2, arg$
#4  0x401b2770 in js_Invoke (cx=0x818cce0, argc=2, flags=0) at jsinterp.c:941
#5  0x401c0660 in js_Interpret (cx=0x818cce0, result=0xbfffedcc) at 
jsinterp.c:2962
#6  0x401b27ea in js_Invoke (cx=0x818cce0, argc=3, flags=2) at jsinterp.c:958
#7  0x40896e3b in nsXPCWrappedJSClass::CallMethod(nsXPCWrappedJS*, unsigned 
short, nsXPTMethodInfo const*, nsXPTCMiniVariant$
#8  0x4088fd2f in nsXPCWrappedJS::CallMethod(unsigned short, nsXPTMethodInfo 
const*, nsXPTCMiniVariant*) (this=0x88ef1a0, me$
#9  0x4012a396 in PrepareAndDispatch (methodIndex=4, self=0x88ef1a0, 
args=0xbffff214) at xptcstubs_gcc_x86_unix.cpp:100
#10 0x40ad6c23 in nsStreamListenerTee::OnStopRequest(nsIRequest*, 
nsISupports*, unsigned) (this=0x81e8ba0, request=0x8854778$
#11 0x40b780a6 in nsHttpChannel::OnStopRequest(nsIRequest*, nsISupports*, 
unsigned) (this=0x8854778, request=0x88c1898, ctxt$
#12 0x40aae8ca in nsInputStreamPump::OnStateStop() (this=0x88c1898) at 
nsInputStreamPump.cpp:498
#13 0x40aae2e8 in nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream*) 
(this=0x88c1898, stream=0x889d4cc) at nsInputS$
#14 0x400d87c7 in nsInputStreamReadyEvent::EventHandler(PLEvent*) 
(plevent=0x8891454) at nsStreamUtils.cpp:118
#15 0x400fc41b in PL_HandleEvent (self=0x8891454) at plevent.c:671
#16 0x400fc2d0 in PL_ProcessPendingEvents (self=0x80bd5a8) at plevent.c:606
#17 0x400ff4ec in nsEventQueueImpl::ProcessPendingEvents() (this=0x80bd570) at 
nsEventQueue.cpp:391
#18 0x416ec720 in event_processor_callback (data=0x80bd570, source=9, 
condition=GDK_INPUT_READ) at nsAppShell.cpp:186
#19 0x416ec0cd in our_gdk_io_invoke (source=0x81ec768, condition=G_IO_IN, 
data=0x81ee1f0) at nsAppShell.cpp:71
#20 0x4042fa56 in g_io_add_watch () from /usr/lib/libglib-1.2.so.0
#21 0x4043103d in g_get_current_time () from /usr/lib/libglib-1.2.so.0
#22 0x404314f4 in g_get_current_time () from /usr/lib/libglib-1.2.so.0
#23 0x40431724 in g_main_run () from /usr/lib/libglib-1.2.so.0
#24 0x40358c3f in gtk_main () from /usr/lib/libgtk-1.2.so.0
#25 0x416ecb4e in nsAppShell::Run() (this=0x811dca0) at nsAppShell.cpp:317
#26 0x4169c4cc in nsAppShellService::Run() (this=0x811d408) at 
nsAppShellService.cpp:483
#27 0x080570e9 in main1 (argc=1, argv=0xbffff744, nativeApp=0x809e540) at 
nsAppRunner.cpp:1291
#28 0x08057d16 in main (argc=1, argv=0xbffff744) at nsAppRunner.cpp:1678

Depends on: 79893
Keywords: hang
Confirming on Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6)
Gecko/20040206 Firefox/0.8
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → All
Hardware: PC → All
This is a dup of a bug that was WONTFIX'd about a year and a half ago.  The
problem is that DNS resolution run in a PAC file is done in the UI thread,
whereas all normal DNS resolution is done in the network thread itself.  I'll
look up the original bug; I seem to remember it being an issue of ugly
architectural issues.
It was bug 97605 (the actual resolution was invalid).  The "workaround" is not
to use PAC scripts that use dnsResolve().
The PAC script in question doesn't use dnsResolve(). But it does use
isResolvable() which I guess may be the culprit. Here is the PAC function

function FindProxyForURL(url, host)
{
  //So the error message "no such host" will appear through the
  //normal Netscape box - less support queries :)
  if (!isResolvable(host))
    return "DIRECT";
 
  if (isPlainHostName(host) ||
      dnsDomainIs(host, ".lan") ||
      dnsDomainIs(host, ".jeacocke.org.uk"))
    return "DIRECT";
  else
    return "PROXY hadron:3128; DIRECT";
}
fixed blocks, added depends.

isResolvable() calls dnsResolve()

http://lxr.mozilla.org/mozilla/source/netwerk/base/src/nsProxyAutoConfig.js#267

267 "function isResolvable(host) {\n" +
268 "    var ip = dnsResolve(host);\n" +
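
To make the dependency concrete, here is a minimal sketch of which PAC helpers end up in dnsResolve(). The helper bodies are simplified stand-ins written for this illustration, not the actual nsProxyAutoConfig.js source, and the fakeDns table and dnsCalls counter are hypothetical.

```javascript
// Hypothetical stand-in for the browser's blocking dnsResolve().
// It counts calls so we can see which helpers trigger a lookup.
var dnsCalls = 0;
var fakeDns = { "www.example.com": "93.184.216.34", "intranet.lan": "10.0.0.5" };
function dnsResolve(host) {
  dnsCalls++;
  return Object.prototype.hasOwnProperty.call(fakeDns, host) ? fakeDns[host] : null;
}

// Convert a dotted-quad string to an unsigned 32-bit number, or null.
function ipToInt(ip) {
  var m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(ip);
  if (!m) return null;
  return ((+m[1] << 24) | (+m[2] << 16) | (+m[3] << 8) | +m[4]) >>> 0;
}

// Always performs a lookup.
function isResolvable(host) {
  return dnsResolve(host) !== null;
}

// Performs a lookup only when "host" is a name rather than an IPv4 literal.
function isInNet(host, pattern, mask) {
  var addr = ipToInt(host);
  if (addr === null) {
    var ip = dnsResolve(host);
    if (ip === null) return false;
    addr = ipToInt(ip);
  }
  var m = ipToInt(mask);
  return ((addr & m) >>> 0) === ((ipToInt(pattern) & m) >>> 0);
}

// Pure string tests: no lookup at all.
function isPlainHostName(host) { return host.indexOf(".") === -1; }
function dnsDomainIs(host, domain) {
  return host.length >= domain.length &&
         host.substring(host.length - domain.length) === domain;
}

// Usage: only the first two calls below touch DNS.
isResolvable("www.example.com");
isInNet("intranet.lan", "10.0.0.0", "255.0.0.0");
isInNet("192.168.15.7", "192.168.15.0", "255.255.255.0");
dnsDomainIs("www.jeacocke.org.uk", ".jeacocke.org.uk");
console.log("DNS lookups performed: " + dnsCalls);
```

In this sketch, a PAC script that sticks to isPlainHostName(), dnsDomainIs() and isInNet() on IP literals never reaches dnsResolve(), which is the property the workarounds in this bug rely on.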

Chase, is this a won't fix?
Blocks: 79893
Depends on: 97605
No longer depends on: 79893
Summary: Using Proxy Auto Config stops DNS resolution from happening asynchronously - This causes the UI to hang while name resolution occurs → PAC: isResolvable() causes the UI to hang during resolution
I'll leave that for someone from Networking to decide, but I think it's a likely
outcome.  I'd really like to see it fixed, but last time it was discussed, it
seemed like an arch problem in the worst way.
The reason I'm asking is that I'm about to roll up some network admin docs for
PAC, so if you mark this wontfix, I'll use that as the final state of the bug.
Are there any workarounds for this? (For example, changing a setting so that the
resolution will give up sooner)
This bug is killing me!!! Simply type a URL that won't resolve and all tabs are
locked permanently. I need to kill the process and start all over again. Easily
repeatable on XP running Firefox 1.0.

There are some screwy things going on with DNS here. Can we please get this
fixed before I'm forced to abandon this browser?
Target Milestone: --- → mozilla1.8beta
Jason: I was reading DNS&BIND the other day for work, and I think some very
modern BIND resolvers allow system-level timeout settings.

The other question is: why are you using this in the PAC file. When I supported
PAC for Netscape, I found only one customer that did this for a good reason, and
that was predicated on a completely bad DNS subdomain strategy that they decided
to re-do, based on this bug.
Confirming that this problem occurs in the FF 1.0 release under both XP and Gentoo.

The same effect is not noticed in IE, although obviously with it not using tabs
that could just be me not noticing it due to the expectation of a **** UI.

In my case I am using the setting that automatically detects my proxy, rather
than a PAC file. The result is the same in that the UI appears to die totally :(


Status: NEW → ASSIGNED
Depends on: 282442
Priority: -- → P1
benc in comment 11:

> The other question is: why are you using this in the PAC file.

VPN.  Specifically, using isInNet(host,subnet,mask) to proxy only certain
address blocks, the inverse of "No Proxy for" that can't easily be accomplished
any other way.

To see the full effect of this bug, by the way, try browsing around
http://www.myspace.com/ for a while with a few tabs open.  Ouch.

Firefox 1.0.4 on Linux, ditto for recent trunk builds.
Target Milestone: mozilla1.8beta1 → mozilla1.9alpha
Can those of you who experienced this bug or are still experiencing this bug, please comment about the behavior with FF 1.5?  I did some work on FF 1.5 that should have improved things for the case where the DNS query is made for the hostname passed to FindProxyForURL.  Thanks!
Haven't noticed any change. Still hangs for at least a minute.
Same here as comment 15.
Jason and Peter, can either of you share your PAC file with me?  If you could post it to this bug report that would be ideal.  If there is anything sensitive in there, feel free to replace it with random characters or whatever... I just want to get a sense for the kind of function calls made by your PAC scripts.  Thanks!
Pretty straightforward as far as I can tell:

function FindProxyForURL(url, host)
{
        if(
                isInNet(host,"192.168.15.0","255.255.255.0") ||
                isInNet(host,"10.120.0.0","255.255.0.0") ||
                isInNet(host,"10.180.0.0","255.255.0.0")
        )
        {
                return "PROXY proxy:8080";
        }
        else
        {
                return "DIRECT";
        }
}

I have another one that basically reverses the DIRECT and PROXY return values.
I no longer have the PAC file I was having this trouble with; I worked around it by just proxying based on the domain name rather than trying to be clever about only proxying unresolvable hosts.
Here is my pac file.  Right now I have some debug code in there to show that there is a several-second delay when resolving a new DNS name, but the delay is there regardless of whether I have dnsResolve there or commented out.  So it almost seems like it is doing a DNS lookup in the UI code.  Also, adding my debug code does not slow it down any.  So if I had to guess, I would say that there is a DNS lookup sometime after the function FindProxyForURL is called.

PS I am using FF 1.5.0.1 on Win XP.

function FindProxyForURL(url,host)
{


//#######
//#Debug#
//#######

// Take timestamps in milliseconds since the epoch so the difference
// stays correct across second and minute boundaries.
var start = new Date().getTime();

var remoteIP = dnsResolve(host);

var elapsedMs = new Date().getTime() - start;

alert(host + " Resolved to " + remoteIP + " in " + elapsedMs + " ms");


//###############
//#Local Network#
//###############

if (isInNet(host, "10.0.0.0", "255.0.0.0")){
            return "DIRECT";
}

//#########
//#Library#
//#########

if (	host == "auth.somesite.com" ||
	host == "search.someothersite.com" ||
	host=="yetanothersite.com" ){
	return "DIRECT";
}


//#########
//#Default#
//#########

return "PROXY proxy.ourdomain.com:8000";

}

Have you changed the "network.dnsCacheExpiration" preference?
Ah, nevermind.  I know what the problem here is.  We need to teach the IOService + HTTP protocol handler how to use nsIProtocolProxyService::AsyncResolve.  If we do that, then the code in nsPACMan.cpp that pre-resolves the URL's hostname will help eliminate this bug for cases where the given hostname is resolved by the PAC script.
We already have code in the HTTP channel that uses AsyncResolve during startup before the PAC script is loaded, and we might be able to extend that to work here.  Not sure yet.
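
A rough sketch of the idea, not the actual nsPACMan.cpp or nsIProtocolProxyService code: resolve the URL's hostname without blocking first, cache the answer, and only then run the PAC script, so that its dnsResolve() call becomes a cache hit. The names resolveProxyAsync, fakeResolver and dnsCache are hypothetical, and the resolver's (host, callback) shape is an assumption made for this illustration. On a cache miss the sketch simply returns null, whereas a real implementation would presumably still fall back to a blocking lookup.

```javascript
// Cache filled by the asynchronous pre-resolution step.
var dnsCache = {};

// What the PAC script sees as dnsResolve(): a cache lookup that never blocks.
function dnsResolve(host) {
  return Object.prototype.hasOwnProperty.call(dnsCache, host) ? dnsCache[host] : null;
}
function isResolvable(host) {
  return dnsResolve(host) !== null;
}

// Reduced version of the PAC script from the original report.
function FindProxyForURL(url, host) {
  if (!isResolvable(host))
    return "DIRECT";
  return "PROXY hadron:3128; DIRECT";
}

// resolver(host, cb) is a hypothetical non-blocking lookup that later
// calls cb with an IP string, or null if the name does not resolve.
function resolveProxyAsync(url, host, resolver, callback) {
  resolver(host, function (ip) {
    if (ip !== null) dnsCache[host] = ip;
    // The PAC script runs only after the lookup has finished, so the
    // calling thread was free to handle other events in the meantime.
    callback(FindProxyForURL(url, host));
  });
}

// Usage with a fake resolver that answers after 50 ms.
function fakeResolver(host, cb) {
  var table = { "www.example.com": "93.184.216.34" };
  setTimeout(function () {
    cb(Object.prototype.hasOwnProperty.call(table, host) ? table[host] : null);
  }, 50);
}
resolveProxyAsync("http://www.example.com/", "www.example.com", fakeResolver,
  function (proxy) { console.log("proxy for www.example.com: " + proxy); });
```

This only helps when the name the PAC script resolves is the hostname passed to FindProxyForURL, which matches the limitation stated above.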
Summary: PAC: isResolvable() causes the UI to hang during resolution → [PAC] Teach IO service and HTTP to use PPS::AsyncResolve to avoid UI hangs when PAC script resolves the URL's hostname
Priority: P1 → P2
any chance of this being fixed still in versions 1.x?

we want to deploy firefox for the whole company, but we use a .pac to bypass the proxy when the host resolves to a local IP. that is hard to work around by domain names because we have a division that develops sites, so they always have new domains... having to keep adding these new domains for firefox to work well will be hell...

So this bug seems like a blocking bug for us...

my .pac is basically close to comment #18
Assignee: darin → nobody
Status: ASSIGNED → NEW
QA Contact: benc → networking
Target Milestone: mozilla1.9alpha → ---
Is not fixed even in 2.0... :(
It's clearly not fixed -- the bug is not "RESOLVED FIXED".

Someone might need to step up and do what Darin suggests...
Flags: blocking1.9?
Keywords: helpwanted
The UI also freezes when FF is first connected to a network that uses a PAC proxy, and you then change to a network without a PAC proxy. You then have to wait a very long time before FF finds that there is no proxy.
The cause of this is probably the same.
(In reply to comment #26)
> It's clearly not fixed -- the bug is not "RESOLVED FIXED".
> 
> Someone might need to step up and do what Darin suggests...
> 

I can confirm that this hasn't been fixed in 2.0.0.1 and has been bugging me for the last 2 years...

I'd love to work on this bug but I've never done any development for mozilla or the open source, but yes I know how to program. So I'm willing to make a stab but can't make any promises on success or follow through.
I think I have been a victim of this bug for several months, using Firefox
on linux (currently Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.8.1) Gecko/20061010 Firefox/2.0 running on Fedora Core 6).

When I work on this machine at home, using cable service provider, everything works fine most of the time.
But occasionally I use the 'automatic proxy configuration' option to go via the proxy server at my university in order to gain access to research stuff on the web. For some web sites everything works, but a little more slowly.

Sometimes however, the delay is excessive, and then everything concerned with firefox freezes. I cannot change tabs, abort the lookup, or open a new window. Then either I have to wait until the web page is retrieved, or the lookup fails. That can take so long that I sometimes end up having to kill firefox, which can be a real pain, as it sometimes has several windows and tabs open recording things I was doing.

If, as someone else said, Opera does not have this problem, then presumably it should be fixable in Firefox. I presume it means that all lookups have to be
done in such a way that other interactions are never blocked.

Sorry I can't help with debugging etc. I try to help with promoting FF,
but will have to be cautious about doing that while this bug persists
as it is disastrous.

Aaron
http://www.cs.bham.ac.uk/~axs
<p>It sounds like Darin (<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=235853#c22">comment #22</a>) has a thorough understanding of the problem, the solution and the code. While some of us might take a try at fixing it, I have the impression that Darin (and probably others) know how to implement the solution quickly and properly. </p>
<p>I voted for it as a bug, you should <a href="https://bugzilla.mozilla.org/votes.cgi?action=show_user&bug_id=235853#vote_235853">vote</a>  if you want it fixed.</p>
<p>Bugs like this definitely keep Firefox from being accepted much less being adopted in the corporate world...</p>
Steve, Darin is not really working on this stuff now.  And please, no HTML in comments.  It makes them _very_ hard to read with all those '<' and '>' all over. ;)
My apologies to everyone about the HTML in the posting.  I saw href links in previous postings and attempted to provide the same for convenience.
I couldn't find the link for the "preview" page :-)
Not a blocker, but we'll take a patch for 1.9 if it appears.
Flags: blocking1.9? → blocking1.9-
I hate this bug with all the strength of my soul. Heck, I'm even considering quitting my job, buying a C++ book, learning it, finding where the bug is, and patching it. That, or switching to opera.

Please, VOTE for the bug to be fixed.
As a workaround, what I did was download the PAC script I use, commented out lines containing isInNet and sites I never visit, saved it locally, and configured Firefox to read from my local file.
I use FF 2.0.0.1, and my UI is hanging a lot, with a similar (but not the same) cause and symptoms.  Please let me know whether you think I should open a new bug.

The cause is that the company I work for now requires all Internet access to be through a proxy, and so I have entered the proxy's details in the "Manual proxy configuration" section of the Connection Settings (as opposed to "Automatic Proxy Configuration URL" which this bug has been about so far).

The symptoms are that most Web page loads cause the UI to hang, even if I have just loaded a page from the same website (as opposed to only pages "with a new DNS", which is what Comment #1 says).

It may be that this is exactly the same underlying issue as reported above but that the proxy I have to use is poorer quality than most.  But I don't think the UI should hang even if the proxy is causing trouble.

Typically I will click on a link or type in an address, and then have to wait from 5 seconds to 5 minutes before I get my UI back.  This hang time may include very brief lapses - just enough for most of the page to load, for example.

Sometimes a single website will start to work properly for a while, with pages loading quickly and no hanging at all, but it could at any moment regress and start hanging again.


The only entries in my "No Proxy for" section are 'localhost', '127.0.0.1', and one other specific local address, and yet most of my company's websites (both local and public), such as www.atosorigin.com do not have this hanging problem.  So presumably the proxy is configured to treat these sites differently (or maybe they're just faster?), and this is being reflected in FF's behaviour.


For the record, I found bug #306922, which seems similar, and has a comment that it may be a duplicate of this one.

I also looked at bug #224447 (from which this one split) and its chain of duplicates bug #188332 and bug #240759, but I don't know how much use those are.
Peter: i think your problems may be related to some extension, as it seems to be a DNS problem...

you probably don't have access to an internet-capable DNS server, only your internal DNS, and maybe some extension is trying to resolve the hostnames all the time...
try to use firefox in safe mode
also, try to ping a server and see how much time it takes to resolve (or if it resolves)

another idea might be the 127.0.0.1... it might force firefox to resolve the hostname to check if it is 127.0.0.1... try to put only hostnames, no IPs
It is a standard practice in our company to use Automatic Proxy Configuration especially because of the VPN and many company locations.
And it's a pity for such a good browser to hang all browser instances and tabs if by mistake a wrong URL was typed in the address bar.
I have no such problem with Opera for example.
Thanks Daniel for your suggestions.  The behaviour has now changed; hopefully this will let us work out exactly what is causing the problem.  I will investigate further and report back here.
Actually it seems the change of behaviour I reported in Comment #39 was a co-incidence.  I have removed all entries from the "No Proxy for" section of my connection settings, and I now normally use Safe Mode (even though the only add-ons I have are the DOM Inspector, Talkback, the British English Dictionary and the default Firefox Theme).  Also I am now at 2.0.0.2.

None of these factors seems to make any difference to the problem, which is actually intermittent.  Occasionally there are periods where everything works fine (and hence my Comment #39), but 95% of the time I get the hanging.

So I presume the problem is related to the proxy.  But I don't think the UI should hang - even if the proxy is causing trouble.


If anyone isn't clear, I have screenshots to show what actually happens.  I can also supply some full ping results (as suggested in Comment #38), but in summary I think they show that DNS is working ("Pinging recluse.mozilla.org [63.245.208.164]") and that my company is blocking ping ("Destination net unreachable." or "Request timed out.")


Do you think this is the same underlying issue as reported above?  (and described in Comment #3 as "ugly architectural issues")  Or shall I open a new bug?  I would be glad to investigate this further in conjunction with someone who knows what they're doing; I could run debuggers, system performance monitors, or anything that would help.  
Attached image: Improper connection settings UI (obsolete)
There is one other thing related to this bug. Sometimes when this proxy-hang happens firefox crashes, and the Network Connection Settings UI becomes unusable - see attached screenshot. Each time this happens I simply re-install firefox and it comes back to normal. It'd be nice if when network connection info is compromised UI didn't hose itself.
I come from bug: https://bugzilla.mozilla.org/show_bug.cgi?id=309582 where the same problem is discussed. We adjure that this bug will be solved in the NEAR future. This problem is annoying me for a VERY long time now!

Bug is still active on:
Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4
I really found a workaround for this problem:

Use your wpad.pac file like this:

function FindProxyForURL(url, host)
{
   if (isResolvable(host))
   {
       <... your stuff ...>
   } else
      return "DIRECT";
}

This works for me... 

Greets
Sascha
Thank you, Sascha! Your workaround works for me. 

function FindProxyForURL(url, host)
{
        if(!isResolvable(host))
                return "DIRECT";

        <... your stuff ...>
}

The only remaining problem is the fact that "<... your stuff ...>" is normally pulled from a web server, and it might change. Apparently there is no include statement for these proxy.pac files, so I would need to find someone who could change the official company proxy.pac. And then tell them to look at this bug and please change that pac file accordingly.

So, could something like this be put into the Firefox source? Try to resolve the hostname just before calling that FindProxyForURL() function?

Or wouldn't that help?

Regards...
  Michael
(In reply to comment #44)

That default wouldn't work because, for instance, I would like to use a proxy only for unresolvable hosts (because they are internal hosts that I need to access through an ssh-port-forwarded proxy).
Particularly difficult case for this bug - http://symbiansmartphone.blogspot.com/ 
It usually takes some 30 secs to load on my mega-powerful double Xeon 2GHz with 2 gigs of ram. And most of these 30 secs the Firefox is not responsive.

Originally I blamed http://symbiansmartphone.blogspot.com/ but now I see that it is slow PAC file processing that is guilty.
downstream dupe with some more comments:
https://bugs.launchpad.net/ubuntu/+source/firefox/+bug/113201
I wonder if it will ever be fixed. Seems like devs completely ignore this issue.
Just for the record, this bug is affecting Firefox 3.0 Beta 1 as well. I tried it today and the UI completely blocks when it is accessing the proxy. Again, this will likely affect any enterprise users behind proxies with proxy scripts. As far as I know that would be just about any fortune 500 company.

Neither internet explorer 6 nor safari 3.0.4beta have this issue on my laptop. Nor does the problem surface when browsing outside the corporate intranet. 

Worst case for this bug is pages with content from various sites that each need to be resolved by the script. I've seen the browser block UI for 10-20 seconds on such sites which is of course totally unacceptable.  

I'm afraid this bug has joined the ranks of ignored bugs. If a moz developer is reading; please do something with this bug because this makes Firefox suck big time in Nokia.
I agree this really needs some work; the problem can be worked around by reworking your proxy.pac to do everything using regexp on the names rather than resolves; but this is a big job and can end up with a very complex pac depending on your network.

It also means that when dropping Firefox into an organisation which is using something else it makes Firefox look really bad, and since it isn't obvious it is a .pac issue it just means Firefox looks slow and clunky.

Dave
when i changed jobs, i arrived at a company that used a .pac that blocked firefox and because of this, almost no one used firefox...

i fixed the .pac and right now, most web developers use firefox (a great thing for compatibility of our sites) and several people in all departments started using firefox.

this is great, but only the admins could change the .pac and none of them knew about this problem, or even cared about it, so users couldn't do anything to fix this, other than download the pac and fix it... of course, almost no one knows how to do it.

this bug is a big problem because only companies and ISPs use .pac files, and users that use these pacs aren't technical enough to find and work around the problem.
in the end, these people will not use firefox because it is slow.

waiting for ISPs and companies to fix their .pac will take ages, assuming that it will be fixed at all!!
+ wanted1.9 would be nice, but this needs to be fixed, the sooner, the better
(In reply to comment #52)

> this bug is a big problem because only companies and ISPs use .pac files

Not so: I work in a university (www.cs.bham.ac.uk) and we have the same problem.

This bug seems to me to have much higher priority than some other things that have been changing in firefox.

Unfortunately I don't have the technical knowledge to work on it.

Aaron
is bug 306922 a dupe as suggested in its comment 11?
I can confirm this bug, using Firefox 2.0.0.11 for windows or Iceweasel for Debian in combination with a Debian Edu/Skolelinux setup (WPAD on Tjener, auto detection of proxy set on). Slightly disturbing for power users that are used to immediate reaction of firefox. I agree that this bug is of high priority.
the only fix i can see is moving pac resolving off the main thread. JavaScript is not in any particularly good position to store state so that a caller can resume execution when it has the answer it needs. OTOH, if the JavaScript runs on a different thread, this problem magically goes away.

ralfg: if it's of high priority to you, then either write a patch, or find someone who's willing to accept your payment to write a patch.

You're supposed to read "Bugzilla Etiquette" <https://bugzilla.mozilla.org/page.cgi?id=etiquette.html>, linked from <https://bugzilla.mozilla.org> before commenting in bugs.
Confirming that this bug still exists on Firefox 3 Beta 2 for Windows Vista.
it's a fairly annoying bug that prevent many corporate users from using mozilla unfortunately.
As Dr. David and Daniel mentioned, there is a workaround by fixing the pac file.  wonder if anyone is kind enough to write an example for this workaround?... before the bug is fixed hopefully...  thanks
The workaround does /not/ work. It only prevents *nonexistent* domains from slowing down Firefox. Any site which has slow DNS (or if DNS servers for your network are slow) will still lock the UI until it resolves. 
function FindProxyForURL(url, host)
{
        if (dnsDomainIs(host, ".mydomain.com"))
            return "DIRECT";
        else
        if (isPlainHostName(host))
            return "DIRECT";
        else
        if (host == "localhost")
            return "DIRECT";
        else
        if (shExpMatch(host, "127.*"))
            return "DIRECT";
        else
        if (shExpMatch(host, "10.0.0.*") )
            return "DIRECT";
        else
            return "PROXY 10.0.0.1:3128";
}

This way no DNS queries are done during proxy resolution.
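
For setups that still need subnet matching, one possible variant (a sketch only, under the assumption that isInNet() performs a lookup solely when it is handed a name rather than an IPv4 literal) is to guard isInNet() with a literal check. isIPv4Literal is a hypothetical helper, and the four stand-in functions at the top exist only so the sketch runs outside a browser; a real PAC file would omit them because the browser supplies its own.

```javascript
// Stand-ins so the sketch runs outside a browser. A real PAC file would
// omit these, because the browser provides them.
function dnsResolve(host) {
  // Fail loudly if the script ever attempts a lookup.
  throw new Error("DNS lookup attempted for " + host);
}
function isPlainHostName(host) { return host.indexOf(".") === -1; }
function dnsDomainIs(host, domain) {
  return host.length >= domain.length &&
         host.substring(host.length - domain.length) === domain;
}
function ipToInt(ip) {
  var p = ip.split(".");
  return ((+p[0] << 24) | (+p[1] << 16) | (+p[2] << 8) | +p[3]) >>> 0;
}
function isInNet(host, pattern, mask) {
  var ip = isIPv4Literal(host) ? host : dnsResolve(host);
  return ((ipToInt(ip) & ipToInt(mask)) >>> 0) ===
         ((ipToInt(pattern) & ipToInt(mask)) >>> 0);
}

// Hypothetical helper: true when the host is already a dotted-quad address.
function isIPv4Literal(host) {
  return /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(host);
}

function FindProxyForURL(url, host) {
  // Name-based rules: pure string tests, no DNS.
  if (isPlainHostName(host) ||
      host === "localhost" ||
      dnsDomainIs(host, ".mydomain.com"))
    return "DIRECT";

  // Subnet rules run only for IP literals, so isInNet() never has to
  // resolve a name.
  if (isIPv4Literal(host) &&
      (isInNet(host, "127.0.0.0", "255.0.0.0") ||
       isInNet(host, "10.0.0.0", "255.255.255.0")))
    return "DIRECT";

  return "PROXY 10.0.0.1:3128";
}
```

The trade-off is that internal hosts reached by a public-looking name are no longer detected by address, so they need an explicit name rule.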
correct, the posted workaround works fine; Jason may have tried a bad workaround. done this way, the dns code that locks up firefox doesn't run. any site with slow or no dns will be slow, but JUST for that tab, not for all of firefox as with this bug.

i use slightly more elaborate code, which might be useful for those with less proxy.pac knowledge:

if ( isPlainHostName(host)                  ||
      shExpMatch(host, "*localhost*")        ||
      shExpMatch(host, "*producao*")         ||
      shExpMatch(host, "*internal-tests*")   ||
      shExpMatch(host, "*intranet-pe*")      ||
      dnsDomainIs(host, "domain1.pt")        ||
      dnsDomainIs(host, "domain2.pt")
   ) { return "DIRECT"; }
else if ( shExpMatch(url, "*10.10.*")       ||
          shExpMatch(url , "*:7778*")       ||
          shExpMatch(url, "*127.0.0*")      ||
          shExpMatch(url, "*192.168.*")
      ){ 
         if ( isInNet(host, "10.10.0.0", "255.255.0.0")     ||
              isInNet(host, "127.0.0.0",  "255.0.0.0")      ||
              isInNet(host, "192.168.0.0", "255.255.0.0")
            ) { return "DIRECT"; }
         else { return "PROXY proxy:3128"; }
      }
else { return "PROXY 10.10.0.241:8080"; }

note that my code DOES do DNS lookups (via isInNet), but only if the URL contains something that looks like an IP... i do this because i have an internal server with an IP in the URL (not in the host part), so i assume that might also happen on the net... over the years i have only found one public server with a URL containing an internal IP.
nevertheless, i have an internal DNS that resolves those private IPs, so i get no slowdown... but you can remove the isInNet code and it works fine too.

higuita
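
If the only hosts that need the isInNet checks are literal IPs, one option is to compare the address as text so the resolver is never involved. This is a sketch under that assumption and has not been tested against this bug; parseIPv4Literal and literalInNet are hypothetical helpers, not part of the PAC standard library.

```javascript
// Parse a dotted-quad IPv4 literal into an unsigned 32-bit number, or return
// null if the host is a name (names are never resolved here).
function parseIPv4Literal(host) {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return null;
  let value = 0;
  for (let i = 1; i <= 4; i++) {
    const octet = Number(m[i]);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// Like isInNet(), but only for IP literals; hostnames simply return false.
function literalInNet(host, net, mask) {
  const h = parseIPv4Literal(host);
  const n = parseIPv4Literal(net);
  const k = parseIPv4Literal(mask);
  if (h === null || n === null || k === null) return false;
  // ">>> 0" keeps the bitwise results unsigned before comparing.
  return ((h & k) >>> 0) === ((n & k) >>> 0);
}

console.log(literalInNet("10.10.3.4", "10.10.0.0", "255.255.0.0"));
console.log(literalInNet("www.example.com", "10.10.0.0", "255.255.0.0"));
```

In the snippet above, a call such as isInNet(host, "10.10.0.0", "255.255.0.0") would become literalInNet(host, "10.10.0.0", "255.255.0.0"). The trade-off is that a hostname pointing at a private address is no longer detected.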
this bug turned up in a search for "wpad"

seamonkey trunk tries to resolve |wpad.localdomain| on visiting any url - checked with a sniffer on localhost.

is this normal?

if it resolves because of a malicious name server, will it override the proxy prefs? in that case a bug should be filed.
(In reply to comment #62)
> this bug turned up in a search for "wpad"

But it's not about the WPAD protocol. Open another issue if you care about it
(look up WPAD on Wikipedia first to see what it's about)
sorry for the bugspam. the new stuff is Bug 421490
biesi pointed me in this direction.  I'm spinning a try server build shortly, and writing tests, just throwing it up here for now.
Assignee: nobody → shaver
Status: NEW → ASSIGNED
Comment on attachment 311399 [details] [diff] [review]
resolve non-blocking, and if we block defer until later

This passes a bunch of manual testing (including network removal, proxy being down, etc.) and the PAC parts of the existing necko tests.  I'd like to get this into b5, and I'll work on unit tests for the specific newChannel cases later this week!
Attachment #311399 - Flags: review?(cbiesinger)
Hrm, so... this patch means that for FTP (and Gopher ;) ) we wouldn't use a proxy when PAC is selected. That doesn't seem too good, should we only pass the nonblocking flag when scheme is HTTP/HTTPS?
Comment on attachment 311399 [details] [diff] [review]
resolve non-blocking, and if we block defer until later

I don't think this works as we want -- it seems to skip PAC entirely for some protocols.
Attachment #311399 - Attachment is obsolete: true
Attachment #311399 - Flags: review?(cbiesinger)
This is better, though ftp and gopher can still lock up the UI!

Discovered that I was running the wrong tree when I was working on a unit test for the deferral case -- test isn't working yet, but running the right code sure does work better!

I'll spin up a tryserver with these so that interested folks can report as to their results with slow DNS.  (I'm not quite sure how I'm going to do the slow-DNS testing, other than just going to the office tomorrow!)
Attachment #311445 - Flags: review?(cbiesinger)
Comment on attachment 311445 [details] [diff] [review]
With fix to newProxyInfo, and restriction to http/https

this does not actually fix non-HTTP
Attachment #311445 - Flags: review?(cbiesinger) → review-
Attached patch Better fix for protocol restriction (obsolete) — — Splinter Review
This still needs tests, and more test_ing_.  People who can build with this patch and test in their DNS-sensitive proxy configs should consider themselves Cordially Invited to do so and report their findings.
Attachment #267425 - Attachment is obsolete: true
Attachment #311445 - Attachment is obsolete: true
Attachment #311500 - Flags: review?(cbiesinger)
Comment on attachment 311500 [details] [diff] [review]
Better fix for protocol restriction

r=biesi with the compile error fixed ;)
Attachment #311500 - Flags: review?(cbiesinger) → review+
> People who can build with this patch and test in their DNS-sensitive proxy 
> configs should consider themselves Cordially Invited to do so and report their 
> findings.

I don't really know how to apply patches and compile Firefox and frankly don't have time for it. However, if somebody can tell me where to get a patched binary, I can test it with my office network, where the issue can be reproduced easily all the time.
Comment on attachment 311500 [details] [diff] [review]
Better fix for protocol restriction

I'll test this as best I can, and cross-my-heart-hope-to-be-flamed-by-Brad promise to write Tunit fodder once we get clear of the B5 crunch, but I think the bestest of testests is going to be PAC in the wild -- both to determine to what extent it repairs the problem, and to find cases where it might interfere (though I think those unlikely, based on my growing understanding of this code).
Attachment #311500 - Flags: approval1.9b5?
(In reply to comment #73)
> I don't really know how to apply patches and compile Firefox and frankly don't
> have time for it. However, if somebody can tell me where to get a patched
> binary, I can test it with my office network, where the issue can be reproduced
> easily all the time.

The try server has been activated, and there will be builds available in 30-180 mins (Mac first, Windows last).

https://build.mozilla.org/tryserver-builds/?C=M;O=D

You will want the builds that identify themselves as "nonblocking_pac2", from March 25th.  And I will want to hear about your results!
Addressing biesi's comments, looking for some approval love.
Attachment #311500 - Attachment is obsolete: true
Attachment #311573 - Flags: approval1.9b5?
Attachment #311500 - Flags: approval1.9b5?
Quick testing results:

1. Tried the usual difficult page http://symbiansmartphone.blogspot.com/ in my normal Firefox 2.0.0.12 - not 30 secs of freezing as it used to be several months ago, but the browser still froze completely several times during page load, for about 5-10 secs each. So the bug can still be reproduced

2. Closed normal Firefox, opened the Windows build from the zip file, went to the same page - everything was blazing fast, no freezing at all, not even for 0.5 second

3. Could not believe that it was that fast, closed your build, went back to normal Firefox, opened the same page. It was fast! There were minor freezes (like 0.3-0.5 secs), so your build was faster with absolutely no freezing, but that was still way faster than before. Not sure how to explain this (maybe some caching issue?).

I am eager to do more testing, but I am traveling for a couple of days and not sure, when exactly I'll be back in the office. So, feel free to put more "testing tasks" to this thread, but don't expect quick reply (from me at least).

Good luck!
Artem.
DNS caching is definitely in play, since the blocking that happens is due to us waiting on a DNS response.  Purging your DNS cache before step 2 would make the results more useful.  Sorry  I didn't say so earlier, if you have a chance to test again with that it would be helpful.  (Or you can go to a "fresh" host first, since the loading speed of the page isn't going to be affected here, just whether we have the browser locked up while waiting for the DNS response in proxy autoconfig.  Comparing vs. Firefox 3 b4 rather than 2.0.0.12 will also probably isolate things better.)
Comment on attachment 311573 [details] [diff] [review]
posterity: from rediffing after removing the NS_ASSERTION, not just rebuilding and running :(

Would rather risk this in beta than in final, and it's a pretty nasty bug for those affected...
Attachment #311573 - Flags: approval1.9b5? → approval1.9b5+
Checking in base/src/nsIOService.cpp;
/cvsroot/mozilla/netwerk/base/src/nsIOService.cpp,v  <--  nsIOService.cpp
new revision: 1.206; previous revision: 1.205
done
Checking in base/src/nsProtocolProxyService.cpp;
/cvsroot/mozilla/netwerk/base/src/nsProtocolProxyService.cpp,v  <--  nsProtocolProxyService.cpp
new revision: 1.76; previous revision: 1.75
done

May all your PACing be async. Big thanks to biesi for his guidance and review!
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Summary: [PAC] Teach IO service and HTTP to use PPS::AsyncResolve to avoid UI hangs when PAC script resolves the URL's hostname → [PAC] Defer proxy resolution for HTTP and HTTPS PAC to avoid blocking main thread during DNS resolution
I backed this out due to mochitest failures on 3 platforms. Running mochitest locally, I noticed that it got stuck running
http://lxr.mozilla.org/seamonkey/source/docshell/test/test_bug413310.html

with a "confirm resending POSTDATA" alert. This would be treated like a hang by mochitest and is consistent with the errors on tinderbox (general mochitest FAIL on Linux/Mac, red on Windows).
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
> Purging your DNS cache before step 2 would make the
> results more useful.

Could you tell me how to purge the cache?
Tested the Trunk Build (3/25/08) with a PAC. The UI froze indefinitely with an invalid URL. I used the invalid URL "http://www.del.icio.us". Valid URLs worked fine. 
shaver@mozilla.com-nonblocking_pac2-firefox-try-win32.zip
WIN XP PRO, SP2
(In reply to comment #82)
> Running mochitest locally, I noticed that it got stuck running
> http://lxr.mozilla.org/seamonkey/source/docshell/test/test_bug413310.html
> 
> with a "confirm resending POSTDATA" alert.

I've confirmed that backing out the patch fixes that problem.
The only reason I can think of for this to affect bug 413310 is if the POST response is not being cached for some reason.  Does someone want to try applying this patch and seeing whether you can submit a POST form (in a subframe, so there's no bfcache), then go back, then go forward?  If you get the repost dialog, can you repeat all that with HTTP logging enabled?
hm, how do http proxies deal with the copyrighted stanford pdf trick of changing dns stuff?
I don't quite follow what that has to do with this bug?  Do you think that the DNS is changing between resolution points?
(In reply to comment #86)
> The only reason I can think of for this to affect bug 413310 is if the POST
> response is not being cached for some reason.  Does someone want to try
> applying this patch and seeing whether you can submit a POST form (in a
> subframe, so there's no bfcache), then go back then go forward?

I tested with:
data:text/html,<frameset><frame src="data:text/html,<form method='post' action='http://gavinsharp.com/tmp/echo.php'><input type='hidden' value='foo' name='foo'><input type='submit'>">

With the patch applied, clicked "Submit", clicked back, clicked forward. No POST dialog.
But on the mochitest you do see the dialog?
(In reply to comment #90)
> But on the mochitest you do see the dialog?

Yes.
I minimized the mochitest to the smallest file that would reproduce the dialog when the patch is applied (to avoid extra loads of JS/CSS), and then generated HTTP logs with/without the patch. All the files are at:

http://people.mozilla.org/~gavin/bug/235853/

The two "diff" files represent the only significant difference between the two logs that I could find - in the "bug" case, two nsHTTPChannels are created, while in the "no bug" case, only one is.

I also got a stack to the code that prompts about the form resubmit:
http://people.mozilla.org/~gavin/bug/235853/repost.alert.stack
Attached file HTTP logs/testcase/stack —
This contains the current contents of http://people.mozilla.org/~gavin/bug/235853/, for posterity.
(In reply to comment #88)
> Do you think that the
> DNS is changing between resolution points?
> 

yes, a malicious dns server deliberately gives different replies for the ip of
a hostname, with a short dns TTL. so if a script loads content from mal.dns
several times, you may end up with content for hostname mal.dns coming from two
ip-s: 1.1.1.1 and 10.1.1.1, and since both seem to come from the same hostname,
they can interact.
Gavin, can you load the POST result, then breakpoint in CanSavePresentation and navigate away from it?  Does it get bfcached?  If not, which exact reason causes CanSavePresentation to return false?  Is doing loads directly from the onload handler relevant?
(In reply to comment #95)
> Gavin, can you load the POST result, then breakpoint in CanSavePresentation and
> navigate away from it?  Does it get bfcached?  If not, which exact reason
> causes CanSavePresentation to return false?

With or without the patch applied (i.e. whether or not the dialog appears), the CanSavePresentation call with nsLocation::SetHref on the stack (i.e. the one that navigates away from the POST in the testcase) returns false due to the subframe check:

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/docshell/base/nsDocShell.cpp&rev=1.894&mark=5292#5261

> Is doing loads directly from the onload handler relevant?

No - I can reproduce the problem by driving the test manually (e.g. submitting the iframe's form with a javascript: URI, and then just going back to the form page, and forward to the POST result manually with the back/forward buttons).
Oh, right.  The whole point was to not have bfcache happening so the loads have to come from cache...

I guess it would be worth stepping through the HTTP channel AsyncOpen/OpenCacheEntry stuff when returning to the POST page and see what's up (nothing in cache, cache entry invalid, or what).
Attached patch regression fix? (obsolete) — — Splinter Review
OK, I looked into this a bit further. When the bug occurs, nsHttpChannel::OpenCacheEntry() fails when going back to the POSTed page. It fails because it has the wrong mPostID, so it ends up generating the wrong cacheKey to pass to nsCacheSession::OpenCacheEntry.

I added some debug logging and noticed that docshell was setting the new cachekey on the channel, but that it wasn't setting it on the right channel when the bug occurred ( http://people.mozilla.org/~gavin/bug/235853/cachekey_comparison.txt is a log of the relevant OpenCacheEntry/SetCacheKey calls). That made me think of comment 92, where I noticed multiple channels being created. That led to finding SetupReplacementChannel(), where I noticed that mPostID wasn't being transferred to the new channel. I made it do that, and then I could no longer reproduce the bug.

I'm still not sure why shaver's patch causes the extra channel to be present, and I have no idea whether this change is correct in the general case, but it fixes the regression for me.
You should probably only be sending the mPostId along if the POST data is being sent along too, right?  And possibly only if the URI hasn't changed... not sure about that last.  biesi might know.

> I'm still not sure why shaver's patch causes the extra channel to be present,

Coming from DoReplaceWithProxy, right?
Ah right. I should've thought of this.

The patch looks basically correct to me, except that the cache key should only be forwarded when coming from ReplaceWithProxy, I would say. maybe add an argument forwardCacheKey to SetupReplacementChannel.
Attached patch regression fix (obsolete) — — Splinter Review
Attachment #311954 - Attachment is obsolete: true
Attachment #312162 - Flags: superreview?(cbiesinger)
Attachment #312162 - Flags: review?(cbiesinger)
Attachment #312162 - Flags: superreview?(cbiesinger)
Attachment #312162 - Flags: superreview+
Attachment #312162 - Flags: review?(cbiesinger)
Attachment #312162 - Flags: review+
Attached patch combined patch (obsolete) — — Splinter Review
Attachment #311573 - Attachment is obsolete: true
Attachment #312162 - Attachment is obsolete: true
Comment on attachment 312383 [details] [diff] [review]
combined patch

Looking for approval here.  There is risk in bizarre PAC configurations, but the upside is pretty substantial in basically all PAC configurations.
Attachment #312383 - Flags: approval1.9?
Comment on attachment 312383 [details] [diff] [review]
combined patch

a1.9=beltzner
Attachment #312383 - Flags: approval1.9? → approval1.9+
Checked this in again, and it broke some mochitests again - this time the offline cache ones. I could reproduce that bustage too, when I ran the offline tests directly. After spending some time trying to figure out what was going wrong in offline cache code, I noticed nsOfflineCacheUpdateItem::OpenChannel sets nsICachingChannel attributes on the channel - cacheClientID and cacheForOfflineUse - that weren't being forwarded to the new channel in nsHttpChannel::SetupReplacementChannel. So essentially the same problem that was exposed for cacheKey by the POST submission test.

Fixing SetupReplacementChannel to also forward those attributes to the new channel fixed that failure, but that makes me wonder whether there are other properties of the channel that need forwarding that might not have test coverage. cacheToken or offlineCacheToken, perhaps?
Oh, I guess the setters for those aren't implemented, so probably not?
Attached patch combined patch + additional fix (obsolete) — — Splinter Review
I'm running this through mochis again.
Attachment #312383 - Attachment is obsolete: true
I'm seeing another failure now:
source/extensions/cookie/test/test_loadflags.html is failing, because its "http-on-modify-request" observer (in:
http://lxr.mozilla.org/seamonkey/source/extensions/cookie/test/file_testloadflags.js )

is expecting to be called twice, but is now being called 4 times. This is presumably another consequence of two channels being used per load rather than one. It seems to me that the test should probably be adjusted to now expect 4 calls, but I'm not sure.
yeah, i suppose adjusting it to 4 is fine - it's (i hope!) an implementation detail as to how many requests are generated for a load. though biesi should probably confirm that this behavior is okay, and not bad in other ways.

a more robust way to write the test might be to pull the request URI from the channel on each firing of "http-on-modify-request", and compare it against an expected list. but r=me for making it 4, as long as you add a comment in the .js explaining why we expect twice the calls.
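
A rough sketch of that more robust approach, not the actual test code: makeRequestRecorder is a hypothetical helper, and plain objects with a URI.spec string stand in for the channel that a real "http-on-modify-request" observer would receive. Consecutive repeats of the same URI are collapsed, so an internal channel replacement does not change the result. The catch is that a genuine back-to-back reload of the same URI would be collapsed too.

```javascript
// Collect the URI seen on each "http-on-modify-request" notification
// instead of counting notifications.
function makeRequestRecorder() {
  const seen = [];
  return {
    // Same shape as an observer's observe(); `subject` is any object with a
    // URI.spec string (in a real test it would be the channel).
    observe: function (subject, topic) {
      if (topic !== "http-on-modify-request") return;
      const spec = subject.URI.spec;
      // Collapse consecutive repeats of the same URI.
      if (seen[seen.length - 1] !== spec) seen.push(spec);
    },
    // True when the recorded URIs equal the expected list, in order.
    matches: function (expected) {
      return seen.length === expected.length &&
             seen.every(function (s, i) { return s === expected[i]; });
    }
  };
}

// Simulated run: each load fires twice (original channel + replacement).
const rec = makeRequestRecorder();
["http://example.org/a", "http://example.org/a",
 "http://example.org/b", "http://example.org/b"].forEach(function (spec) {
  rec.observe({ URI: { spec: spec } }, "http-on-modify-request");
});
console.log(rec.matches(["http://example.org/a", "http://example.org/b"])); // true
```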
yes, the number of channels used is an implementation detail. that's why nsIChannelEventSink has an _INTERNAL reason.
Comment on attachment 314416 [details] [diff] [review]
combined patch + additional fix

biesi: can you stamp the combined change?  We'll get the mochitest count before checkin as well, per dwitte.
Attachment #314416 - Flags: superreview?(cbiesinger)
Attachment #314416 - Flags: review?(cbiesinger)
Attachment #314416 - Flags: superreview?(cbiesinger)
Attachment #314416 - Flags: superreview+
Attachment #314416 - Flags: review?(cbiesinger)
Attachment #314416 - Flags: review+
Attached patch rollup patch (obsolete) — — Splinter Review
Includes the change to the cookie unit test. Passes a full run of Mochitest on a fresh build.
Attachment #314416 - Attachment is obsolete: true
Attachment #316154 - Flags: approval1.9?
Comment on attachment 316154 [details] [diff] [review]
rollup patch

a1.9=beltzner
Attachment #316154 - Flags: approval1.9? → approval1.9+
Been waiting for the tree to go green so that I can check this in.
Keywords: checkin-needed
After all the work I'd done to make absolutely sure that this passed a full run of mochitests, multiple times, I checked it in. Then I discovered that it caused one of the mochi*chrome* tests to fail:

http://lxr.mozilla.org/seamonkey/source/content/base/test/chrome/test_bug421622.xul

I don't know why, offhand - SetupReplacementChannel certainly does transfer the referer.

I'm very upset that I had to back this out *again* because of my own negligence. At least we know we have good test coverage of PAC-enabled network code paths, I guess.
I get continual hangs even if I'm not doing anything. I assumed that this was caused by Camino 1.6, but I backed out to Camino 1.5.5 and I'm still getting it... so I went in with Activity Monitor and found this was always due to a bunch of nested js_Invoke called by nsPACMan::getProxyForURI, called, at the top level, from -[Bookmark refreshIcon]

I'm not sure that this is due to the problem being discussed here.

A couple of questions are pulled irresistibly to mind here.

1. What is -[Bookmark refreshIcon] being called so frequently for?

2. Shouldn't ANY kind of name resolution related work (including proxies and PAC files) happen in its own thread (or at least be interruptible)? Or even in a whole separate proxy helper process - it's not like name lookup is fast enough that even a few *milli*seconds to pass a name to the helper and get the address back would be noticeable.

3. Speaking of helper applications, what about putting all the maintenance work like updating bookmark icons into one?
Just for information, the Activity Monitor samples from a typical "proxy hang" in Camino.
The hangs under GetProxyForURI are due to exactly this bug.  The helper app and bookmark icon bits should be in other Camino bugs, and probably not mentioned here again.

DNS resolution does happen on a background thread, but proxy selection is currently synchronous, so the script waits for the background thread to have completed, making the asynchronicity of the actual resolution somewhat pointless.

What we need to fix this is for someone to help figure out what is causing the mochichrome failure, and then we can look at how to get a patch like this into a 3.0.1.  The motivating problem is well-understood, and easy to reproduce.
Flags: wanted1.9.0.x?
(In reply to comment #119)
> The helper app and bookmark icon bits should be in other Camino bugs

Specifically, bug 351678.
Whiteboard: [missed 1.9 checkin]
Comment on attachment 316154 [details] [diff] [review]
rollup patch

Removing approval since this missed the 1.9 cutoff.
Attachment #316154 - Flags: approval1.9+
Flags: blocking1.9-
I am using Firefox in an enterprise environment. This bug has made me switch to Opera. Can anyone tell whether this bug is going to be fixed in the next update for Firefox 3.0?
(In reply to comment #122)
> I am using Firefox in an enterprise environment. This bug has been made me
> switch to opera. Can anyone tell whether this bug is going to be fixed in the
> next update for Firefox 3.0.
> 

see comment 119 : there's still a testcase that is failing, and that needs to be fixed first
(In reply to comment #116)
> After all the work I'd done to make absolutely sure that this passed a full run
> of mochitests, multiple times, I checked it in. Then I discovered that it
> caused one of the mochi*chrome* tests to fail:
> 
> http://lxr.mozilla.org/seamonkey/source/content/base/test/chrome/test_bug421622.xul
> 
> I don't know why, offhand - SetupReplacementChannel certainly does transfer the
> referer.

The problem appears to be that test_bug421622.xul uses setRequestHeader to set the referer and so nsHttpChannel::mReferrer is not set and thus not copied over by SetupReplacementChannel.

I've attached a patch that adds a hack to make the mochichrome test pass, but also adds a new mochitest that uses a custom header that is not propagated by SetupReplacementChannel.

I'd guess that both test_bug421622.xul and the new test would currently fail if a proxy is set up so that DoReplaceWithProxy...SetupReplacementChannel etc. is called like it is after this patch is applied. However, I haven't confirmed this.
Attached patch rollup patch + hack to set referer header (obsolete) — — Splinter Review
Copies the referer header over to make the mochichrome test pass. Includes a new test that still fails.
Hmm.  So this patch as it stands doesn't carry over "custom" headers set by XMLHttpRequest?  That seems like it could be a problem, no?
(In reply to comment #126)
> Hmm.  So this patch as it stands doesn't carry over "custom" headers set by
> XMLHttpRequest?  That seems like it could be a problem, no?
> 

Correct. The patch is not meant to be applied. However, SetupReplacementChannel already has the problem. The patch from shaver/gavin just makes the problem worse because we are calling SetupReplacementChannel more often.
Right.  I understand where the patches stand.

It seems that SetupReplacementChannel is used for two things: 3xx redirects and proxies.  I'm not sure whether copying headers en masse makes sense for these two use cases.  The HTTP RFC seems to be silent on the behavior that 3xx should lead to in this regard...
Attached patch rollup patch + copy all headers (obsolete) — — Splinter Review
This patch copies all of the headers in the case of a proxy connection and no headers otherwise. It renames the previous 'transferCacheInfo' flag to 'forProxy' (I don't like this name, but nothing better came to mind).

It passes the test_bug421622.xul test as well as the newly added test.
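
For readers who don't know this code, here is a toy model of the behaviour described above. It is not the patch: the real logic is C++ in nsHttpChannel::SetupReplacementChannel, plain objects stand in for channels, and the field names (method, requestHeaders, cacheKey) are illustrative only.

```javascript
// Toy model: copy state from the old channel to its replacement.
function setupReplacementChannel(oldChannel, newChannel, preserveMethod, forProxy) {
  if (preserveMethod) {
    newChannel.method = oldChannel.method;
  }
  if (forProxy) {
    // Same request, different route: carry over every request header ...
    Object.keys(oldChannel.requestHeaders).forEach(function (name) {
      newChannel.requestHeaders[name] = oldChannel.requestHeaders[name];
    });
    // ... and the cache key, so a later cache lookup finds the same entry.
    newChannel.cacheKey = oldChannel.cacheKey;
  }
  return newChannel;
}

const original = {
  method: "POST",
  requestHeaders: { "X-Custom-Header": "1" },
  cacheKey: 42
};

// Replacement for a proxy: headers and cache key follow the request.
const viaProxy = setupReplacementChannel(
  original, { method: "GET", requestHeaders: {}, cacheKey: null }, true, true);

// Replacement for an ordinary redirect: nothing extra is copied.
const redirected = setupReplacementChannel(
  original, { method: "GET", requestHeaders: {}, cacheKey: null }, false, false);

console.log(viaProxy.requestHeaders, redirected.requestHeaders);
```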
Attachment #316154 - Attachment is obsolete: true
Attachment #328677 - Attachment is obsolete: true
Comment on attachment 328746 [details] [diff] [review]
rollup patch + copy all headers

This will cause kind of interesting behaviour because now for some
OnChannelRedirect calls we won't have transferred the headers while
for others (REDIRECT_INTERNAL) we will have. I guess that's not that
much of a problem since the observer will probably just re-set the
header if it cares at all.

On the patch itself:
      for (i = 0; i < count; ++i) {

The body of this loop is incorrectly indented. It seems you're using tabs,
please don't.

	nsCAutoString header, value;
	header.Assign(oldHeader);
	value.Assign(oldValue);
	httpChannel->SetRequestHeader(header, value, PR_FALSE);

This should just be:
  httpChannel->SetRequestHeader(nsDependentCString(header),
                                nsDependentCString(value),
                                PR_FALSE);

in the header:
    nsresult SetupReplacementChannel(nsIURI *, nsIChannel *, PRBool preserveMethod, PRBool forProxy);

Limit that line to 80 characters
Attachment #328746 - Flags: review?(cbiesinger) → review+
Blocks: 356281
Breaks lines longer than 80 columns in netwerk/protocol/http/src/nsHttpChannel.h
Attached patch rollup patch + copy all headers (obsolete) — — Splinter Review
The previous patch with changes suggested by cbiesinger.
Attachment #328746 - Attachment is obsolete: true
Attachment #332390 - Flags: superreview?(cbiesinger)
Flags: wanted1.9.1?
Comment on attachment 332390 [details] [diff] [review]
rollup patch + copy all headers

Moving sr to bz, because biesi was the r, I believe.  Maybe we'll get lucky for 3.1b1!
Attachment #332390 - Flags: superreview?(cbiesinger) → superreview?(bzbarsky)
Comment on attachment 332390 [details] [diff] [review]
rollup patch + copy all headers

I'm a maroon -- r+sr was from biesi on previous patches, restoring request.
Attachment #332390 - Flags: superreview?(bzbarsky) → superreview?(cbiesinger)
Comment on attachment 332390 [details] [diff] [review]
rollup patch + copy all headers

+      PRUint32 i, count = mRequestHead.Headers().Count();
+      for (i = 0; i < count; ++i) {

why not:
  for (PRUint32 i = 0; ...)

+        httpChannel->SetRequestHeader(nsDepedentCString(header), nsDependentCString(value), PR_FALSE);

please limit your lines to 80 chars

+        httpChannel->SetRequestHeader(nsDepedentCString(header), nsDependentCString(value), PR_FALSE);

you misspelled nsDependentCString in the first argument?
Attachment #332390 - Flags: superreview?(cbiesinger) → superreview+
Also, a necko unit test instead of just the one in content/ would be really great.
Attached patch v2 (obsolete) — — Splinter Review
This addresses the review comments and merges with HEAD. 

Part of the merge involved a new call to SetupReplacementChannel() in ProcessFallback(). I pass in false as 'forProxy', but this is just a guess as to the desired behavior. Does it seem correct?
Attachment #332390 - Attachment is obsolete: true
Flags: blocking1.9.1?
(In reply to comment #138)
> Part of the merge involved a new call to SetupReplacementChannel() in
> ProcessFallback(). I pass in false as 'forProxy', but this is just a guess as
> to the desired behavior. Does it seem correct?

yeah that seems right.

are you going to add a test in netwerk/test/unit? you'd just have to set the PAC pref using the prefservice, can be a data: URI that just always returns DIRECT I think. and then check that stuff gets correctly transferred.
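
A minimal sketch of the data: URI part of that suggestion. The variable names are made up, and the step that actually stores the URI in the prefs is left out because it needs the pref service. The pref names in the comment (network.proxy.autoconfig_url, with network.proxy.type set to 2 for PAC) are the ones I'd expect a test to set.

```javascript
// A PAC script that sends everything direct, so no proxy server is needed.
const pacSource = 'function FindProxyForURL(url, host) { return "DIRECT"; }';

// Wrap it in a data: URI. encodeURIComponent keeps quotes, braces and
// spaces from being misread as URI syntax.
const pacURI = "data:text/plain," + encodeURIComponent(pacSource);

// In the test this string would be stored in network.proxy.autoconfig_url,
// with network.proxy.type set to 2 (PAC); that part is not shown here.
console.log(pacURI);
```

With such a PAC every request still goes through the PAC code path, so the test can check that headers and the cache key survive the channel replacement without depending on a real proxy.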
I'm not going to have time to write a test for this soon. I still can when I get the chance, but someone else could probably write a test much faster than I could anyways.
Blocks: 450153
Not going to block on this, but I think we want it even after beta2... what's the current status of the patches here?
Flags: wanted1.9.1?
Flags: wanted1.9.1+
Flags: blocking1.9.1?
Flags: blocking1.9.1-
Attached patch Necko unit test (obsolete) — — Splinter Review
Here's a necko unit test that tests that headers and cache keys are preserved when we use a proxy. It fails with trunk and passes with this patch.
Attachment #332389 - Attachment is obsolete: true
Defers proxy resolution for HTTP and HTTPS PAC to avoid blocking main thread during DNS resolution and adds code to SetupReplacementChannel to ensure that the proxy channel is similar to the original channel.

It also adds a necko unit test and a mochitest to check that the proxy channel preserves the needed properties of the original channel.
Attachment #341161 - Attachment is obsolete: true
Attachment #345365 - Attachment is obsolete: true
This still isn't passing mochitest :( It's causing tests/content/base/test/test_CrossSiteXHR.html to fail now.
The most immediate problem is that nsCrossSiteListenerProxy::OnChannelRedirect is broken: it never allows the redirect, because it calls CheckRequestApproved which always fails on channels that are not 2xx HTTP responses (which no channel being redirected ever is).

Sicking says fixing that won't be enough because preflight requests (for POST/PUT) aren't allowed to be redirected right now; we'd need to fix that to allow the PAC redirect for those.
Preflight requests? Are those the ones where the browser preloads links? It would be worth disabling them for proxied connections, perhaps?
Preflight requests are the access-control GET requests made before a cross-site XHR POST is allowed to happen.
I was talking to Boris and he suggested that Jonas disable the failing tests so that we can land this in time to make the beta. The reason the tests are failing comes from this patch causing existing bugs in the tree to be exercised and the benefit from the additional user testing probably outweighs the disadvantage of having the disabled tests.
Flags: blocking1.9.1- → blocking1.9.1?
I'm fine with disabling the tests for now I guess. Though it will mean that XSXHR won't work for people with proxies at all for now :(
Priority: P2 → P1
What tests need to be disabled?
Per comment 146, content/base/test/test_CrossSiteXHR.html 

Just commenting it out in the Makefile should work...

Jonas, did you ever get that patch I sent you into a bug?
test_XHRDocURI.html was also hanging.
test_CrossSiteXHR_cache.html also breaks.
Attachment #346370 - Flags: approval1.9.1b2?
Comment on attachment 346370 [details] [diff] [review]
[2/2] v4: Defer proxy resolution for HTTP and HTTPS PAC to avoid blocking main thread during DNS resolution

If we're thinking about doing this in the beta....
Comment on attachment 346370 [details] [diff] [review]
[2/2] v4: Defer proxy resolution for HTTP and HTTPS PAC to avoid blocking main thread during DNS resolution

I'm fine to take this opportunistically if people tell me that it's well tested and won't cause headaches, but based on the previous comments, I won't take this without a patch that also disables the tests that need to be disabled, and without a follow-up bug filed to fix those tests.
Attachment #346370 - Flags: approval1.9.1b2? → approval1.9.1b2-
> Though it will mean that XSXHR won't work for people with proxies at all for now

It already doesn't in a number of cases (proxy failover, say).

Jeff, can you put together a patch that does the test disabling?  I'll work on getting that bug filed.
So it looks like we also fail a bunch of video tests:
 /tests/content/media/video/test/test_constants.html
 /tests/content/media/video/test/test_bug461281.html  - Bug 461281
 /tests/content/media/video/test/test_duration1.html
 /tests/content/media/video/test/test_ended1.html

basically it looks like video is also broken with proxies with this patch.
I filed bug 464954 on fixing XHR.

Is the problem that video is using the same access-control stuff as XHR (and hence is broken in the same ways)?
Actually, video doesn't use access-control yet.  Chris, any idea what's going on there?  Does video work across 3xx redirects?
I think Chris Pearce worked on some things to do with redirects on bug 451958. Hopefully he can comment on whether redirects require that patch or not.
(In reply to comment #161)
> Does video work across 3xx redirects?

It WFM in recent win32 nightlies. See: http://pearce.org.nz/movie
That's a 301 redirect.

Also the tests for bug 451958 test redirects with allow-origin and non-allow-origin videos, and that WFM. That patch uses nsCrossSiteListenerProxy... That patch hasn't landed yet, it's blocked by bug 462878, which I'm working on now. 


(In reply to comment #159)
> So it looks like we also fail a bunch of video tests:
>  /tests/content/media/video/test/test_constants.html
> basically it looks like video is also broken with proxies with this patch.

test_constants.html doesn't load any video, it only tests the existence and value of constants. I don't understand the area you're working in, but I can't imagine how anything to do with networking could affect this testcase.

Maybe try turning tracing off? We've been getting random test failures with video, maybe that's the cause?
test_constants.html only fails when being run after test_bug461281.html

test_bug461281.html fails consistently with:
not ok - Error thrown during test: [object ProgressEvent] got 0, expected 1

If you give the PAC patch a try you should be able to reproduce the test failures without any difficulty, and it's not unlikely that they point to bugs in the video code.
Flags: wanted1.9.0.x?
Flags: blocking1.9.1? → blocking1.9.1-
This bug turns a mistyped key into a minute of annoyance; it will be good when it is fixed :)
I reran the media/video tests with a recent tree and they all seem to pass. So it looks like we're good there.

I also tried the patch from bug 464954, but it seems to cause content/base/test/test_CrossSiteXHR.html to hang, even without the PAC patch.
I'll land bug 464954 today when I get back from the ski slopes, that should help with the cross-site XHR tests.
It looks like content/html/document/test/test_bug445004.html fails with this patch.
That's quite odd.  Which part of that test?
I'm not sure. I wasn't able to reproduce the failure when running the test individually. I'm rerunning the whole suite to see if I can reliably reproduce the failure that way.
So, I wasn't able to reproduce the test_bug445004.html failure but toolkit/components/places/tests/test_bug_411966.html seems to fail reliably with:

"not ok - Got wrong referrer for http://localhost:8888/tests/toolkit/components/places/tests/bug_411966/PermRedirectPage.htm"

and toolkit/components/url-classifier/tests/mochitest/test_classifier.html and toolkit/components/url-classifier/tests/mochitest/test_classifier_worker.html both fail with:

"not ok - Error thrown during test: document.getElementById("testFrame") is null got 0, expected 1"
test_classifier.html and test_classifier_worker.html
still fail without the patch so I don't know what's up there.

test_bug_411966.html passes without the patch and the redirect stuff seems like it could be related to the patch.
Could be, yes... That said, you do copy the referrer in SetupReplacementChannel, right?

Can you reproduce the failure when running just the one test?  If so, are you willing to debug?
Here's some more information about how the test fails:

not ok - Got wrong referrer for http://localhost:8888/tests/toolkit/components/places/tests/bug_411966/PermRedirectPage.htm, expected http://localhost:8888/tests/toolkit/components/places/tests/bug_411966/ClickedPage.htm 

The error happens in redirect.js:checkDBOnTimeout():181
We copy the Referrer if mReferrer is set and we copy all the headers so it should be getting set.

I can reproduce the failure when running just the one test and I'm willing to debug with guidance.
Hmm. It looks like places stores this redirect with addDocumentRedirect.  If you skip doing that one call for cases when the redirect is internal, does that help?
It does help. Not calling addDocumentRedirect() when the redirect is internal fixes the test case.

So now, to fix it properly:

1. AddDocumentRedirect() doesn't check for internal redirects. It seems reasonable that it should ignore internal redirects.

2. nsDocShell::OnRedirectStateChange() also calls AddDocumentRedirect() and gavin was wondering whether nsDocShell::OnRedirectStateChange() should be called on internal redirects.
Depends on: 475069
For #2, I would say yes.  For example, the URL classifier probably needs to see the new channel, as far as I can tell.
(In reply to comment #179)
> For #2, I would say yes.  For example, the URL classifier probably needs to see
> the new channel, as far as I can tell.

Right, but can that not be done in nsDocLoader::OnChannelRedirect? I guess I don't really understand why we need both nsDocShell::OnRedirectStateChange and nsDocLoader::OnChannelRedirect, if the difference wasn't meant to be that only the former was called for non-INTERNAL redirects. Not really relevant to this bug anymore, I guess...
After some more testing I'm getting occasional failures on test_bug445004.html, however I haven't been able to reliably reproduce the problem.
Depends on: 479328
I can reliably reproduce the following failure with tryserver unit tests:
*** 41223 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/tests/mochitest/geolocation/test_manyWindows.html | Error thrown during test: windows[i] is null - got 0, expected 1

There also seem to be some leak test issues:
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1236092965.1236101757.29727.gz&fulltext=1
Doug, can you give some guidance as to what could be going wrong here?
This seems to be the same as bug 208287, which should be marked as a dup of this bug since there is more info here than there.
Blocks: 208287
Jason can you take ownership of this bug?
> Jason can you take ownership of this bug?

Sure.   I've got a lot on my plate, but probably not as much as Shaver (I didn't even know he fixed bugs...)

I haven't had time to look over the 186 comments, but the first thing that comes to mind is that multi-process (electrolysis) may make this bug moot, assuming that DNS lookup occurs in the subprocess and not in the chrome process.  Does anyone have a sense if that is correct?   If so, unless we're very close to an architecturally clean fix here, it may make sense to wait and get this "for free" and the "right way" later.
Assignee: shaver → jduell.mcbugs
Sorry, upon 2 seconds more reflection, it's unlikely that electrolysis is going to automatically fix this for free (though it may change how we architect the fix, if we can keep the main thread uninvolved or less involved).
I'm trying to do some research and dupe checking on bug 492558 because I saw three separate reports of the problem as being new to Firefox 3.0.10.  The technical aspects of it are a little over my head, so I wanted to ask if it might be related in any way to this bug since one is receiving current attention.
Blocks: 309582
With the new version of Firefox (Mozilla/5.0 (Windows;
U; Windows NT 5.1; fr; rv:1.9.1.1) Gecko/20090715 Firefox/3.5.1), it seems
there are no more freezes.

Anyone can confirm this?
No, I still experience freezes just as before.

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.1) Gecko/20090715 Firefox/3.5.1
I also continue having problems:

Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.9.1.1) Gecko/20090715 Firefox/3.5.1 (.NET CLR 3.5.30729)
I DON'T use PAC. I use manual proxy configuration.
I have the same problem of blocking UI during opening a new tab/URL.
I don't understand for what reason DNS resolving has to be done in case using PROXY.

However, I understand that this DNS resolving is done through the proxy server!
I have ping google.com and it works fine for any server in the world.
I need just one simple property in the about:config or checkbox in the proxy settings that says:

Do the DNS resolving with direct connection.

By using this simple fix i will have Firefox that resolves names using the local caching DNS server and working as fast as possible.
Please do that before I download the source and make complete mess of the Firefox to make it works that way.

This proposal will not help for the half of the people that have only proxy and local DNS server.
However, many companies, including big ones like the one I work for, use a proxy for internet access and direct communication for VPN connections to all their offices worldwide. And this is not going to change any time soon in this millennium.
I have suffered from this bug since Firebird 0.4 was released. I use Firefox 3.5.4 on WinXP and it is not fixed YET. WHY?
DNS resolving *has* to be done in the proxy server, otherwise it wouldn't know where to forward the request to.

A direct connection does not use a proxy server, so the browser has to do all the DNS lookups itself.

In the case of a PAC file, your browser might be doing DNS lookups too, but not always (depending on the content of the PAC file).

You're probably confusing the DNS prefetching-lookups (sent by the browser) with the ones needed to open the connection (sent by the browser and/or the proxyserver). In case of a PAC file, they're not always necessary, so we might disable prefetching in that case. See bug 507578.
Blocks: 468079
I do not know the PAC file scenario of the problem.
But seems my bug reported was redirected to this one. I suppose that the source code forced the maintainers of the bugs to do that.

I also wish to mention that the line in my previous comment was missing one word that changes the meaning. 
It was :

"This proposal will not help for the half of the people that have only proxy and
local DNS server."

It should be:

This proposal will not help for the half of the people that have only proxy and
NO local DNS server.
Going back to an older post, this does seem to fix the issue:

function FindProxyForURL(url, host)
{
  //So the error message "no such host" will appear through the
  //normal Netscape box - less support queries :)
  if (!isResolvable(host))
    return "DIRECT";
...
I have experienced the same hanging even not using any proxy (at home).
It seems to me that if the web server is under high load or the internet connection is slow, opening a new tab hangs until there is a response. So I had no way to use other tabs until the last opened one had loaded to at least 10% or so.
I will quite probably create a new bug about that if it is not already opened.
Perhaps I am in the wrong place, but I will give it a try.  For at least the next two months, I will be using this computer, which is now seven years old.  I am currently running Win 7 and not experiencing any problems that I am aware of at this time.  My ISP, Wild Blue, suggested that I use Firefox to improve the speed of my internet access.  I have been using Internet Explorer since the beginning of my computer experience.  I was not satisfied and a little unhappy with IE 8, so I readily switched to Firefox.  The problem I have experienced, and continue to experience, occurs when I open a new tab or click on a link. It looks as if it is going to do the requested action.  After several seconds, the progress bar has no indication that it has begun to load the page, or it may show progress then stop.  After a long period of time, a minute or more, I can "Stop" the procedure and then select the same thing again and it usually will quickly respond with the request.  In my settings, I am using an Auto-configuration proxy URL supplied by my ISP.  Would it be better to "Auto detect" the settings?  This is quite frustrating.
"Auto detect" is nothing magical, it's just a way for the browser to find out where the Automatic proxy configuration URL is located. In your case (if you're using wildblue.net as your DNS suffix), according to <http://help.wildblue.net/kb/article/2751>, I guess it will automatically look for the URL <http://wpad.wildblue.net/wpad.dat> instead of the <http://wpad.wildblue.com/wpad.dat> installed by the Wildblue Optimizer. I can't see from the outside if this is the same or not. If your DNS suffix is wildblue.com, then the 2 settings are exactly the same.

Anyway, there might be a problem with the proxy server(s) itself, or with the DNS request that your browser is using when using the wpad-file. Or with the pipelining that the Optimizer has installed (in which case it might also depend on the exact website that you're visiting).

I'm not sure if you really need that wpad-file, it might be possible that you can surf faster without it. Or maybe you can try switching off the pipe-lining, especially the non-proxy pipelining, see <http://kb.mozillazine.org/Network.http.pipelining>.
Kosta,
I have the same problem: I use manual proxy configuration and I get the UI freezes. So I had to ditch FF and use ie SIX !!!
Is it just me, or has this bug gone from bad to horrible in version 4.0?  I used to avoid DNS related functions when writing PAC files, but firefox 4 seems to have intermittent freezing with even basic PAC files now.
It is indeed horrible!

Today I've tried accessing blender.org whose name servers do not respond.
When using PAC whole browser locks up for ~15s, unfreezes then locks up again.

Switching the proxy to manual or direct resolves the issue, though the browser seems to freeze for almost a second every 10-15 seconds while another tab tries to load a page for ~1min.

This is on Win7 with FF4.0

I guess this can be reproduced by setting up a dns zone with ns records pointing to dead servers (or finding such domain on internet).
My wpad.dat is like this:

function
FindProxyForURL(url, host)
{
        if (isPlainHostName(host))
                return "DIRECT";
        if (dnsDomainIs(host, "my.local.domain"))
                return "DIRECT";
        if (isInNet(host, "127.0.0.0", "255.255.255.0"))
                return "DIRECT";
        if (isResolvable(host))
                return "PROXY 192.168.0.1:8080; DIRECT";
        else
                return "DIRECT";
}
It was version 2.0 of Firefox back in 2008, when I confirmed this bug for the first time. Three and a half years later, I have to confirm this bug for Firefox 4.0, too.

Especially for schools and bigger companies that cannot set proxy settings on each single client - but are important multipliers for the usage of free software - this is sad.

In times of multiple CPU cores and perfect threading, there should be no need for freezing GUIs, I think.

Now, I have to check if this is still a show stopper for the version 5.0 of Firefox. 

With kind regards.
RalfG (Skolelinux)
BTW, my wpad.dat looks like this (maybe it can be optimized?):
----
function FindProxyForURL(url, host)  {

//###############
//#Local Network#
//###############

  if (isInNet(host, "10.0.2.0", "255.255.254.0"))
  {
   return "DIRECT";
  }


//#########
//#Default#
//#########

return "PROXY 10.0.2.2:3128;"

}
Yes, add this in front :

if (isPlainHostName(host) || shExpMatch(host, "*.yourdomain.com"))
{
   return "DIRECT";
}

If you're reaching internal webservers by hostname or full DNS name, you will then contact them directly, without going to the proxy server first. The isInNet call only helps if you contact those webservers using an IP address, not a DNS name.

But your problems have nothing to do with this bug, as your PAC-file does not use DNS at all (use dnsResolve call for that).
I agree with Jo's solution for getting around the problem (I also added one set of rules with JavaScript regexps on the host variable to allow for direct use of IP addresses in the URL).

But I must disagree with his last comment, as it is my understanding that isInNet performs a dnsResolve call to get the IP address in order to check it against the network mask.

Sad to hear that this bug is still live in 4.0...

Damn, now even Wikipedia says :
"The function dnsResolve (and similar other functions) performs a DNS lookup that can block your browser for a long time if the DNS server does not respond."
Indeed, isInNet couldn't possibly work except by a DNS resolution. But the advice is good; best to avoid that call whenever possible.
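
A minimal sketch of that advice, assuming the goal is to keep the DNS-dependent helpers (isInNet, isResolvable, dnsResolve) out of the PAC file entirely. The proxy address, network and domain are the ones from the wpad.dat examples earlier in this bug; isIPv4Literal, ipToInt and ipInNet are hypothetical helpers written for illustration, not PAC built-ins. They only do string and integer arithmetic, so they cannot trigger a lookup.

```javascript
// Hypothetical helper: true only for dotted-quad IPv4 literals such as "10.0.2.7".
function isIPv4Literal(host) {
  var m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  for (var i = 1; i <= 4; i++) {
    if (parseInt(m[i], 10) > 255) return false;
  }
  return true;
}

// Hypothetical helper: dotted quad -> unsigned 32-bit integer.
function ipToInt(ip) {
  var p = ip.split(".");
  return ((parseInt(p[0], 10) << 24) | (parseInt(p[1], 10) << 16) |
          (parseInt(p[2], 10) << 8) | parseInt(p[3], 10)) >>> 0;
}

// Hypothetical DNS-free stand-in for isInNet(); only meaningful for IP literals.
function ipInNet(ip, net, mask) {
  var m = ipToInt(mask);
  return ((ipToInt(ip) & m) >>> 0) === ((ipToInt(net) & m) >>> 0);
}

function FindProxyForURL(url, host) {
  // Same idea as isPlainHostName(host): no dots means a local name.
  if (host.indexOf(".") === -1) return "DIRECT";

  // Plain suffix comparison for the local domain (placeholder name).
  var suffix = ".yourdomain.com";
  if (host.length >= suffix.length &&
      host.substring(host.length - suffix.length) === suffix) {
    return "DIRECT";
  }

  // Only IP literals are tested against the local network, so no
  // hostname is ever resolved by this script.
  if (isIPv4Literal(host) && ipInNet(host, "10.0.2.0", "255.255.254.0")) {
    return "DIRECT";
  }

  return "PROXY 10.0.2.2:3128";
}
```

The trade-off is that a hostname which resolves into the local network but does not match the domain suffix will go through the proxy; whether that is acceptable depends on the site.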

Other workarounds: local proxy server: I wrote http://www.slamb.org/projects/foxigniter/ for this purpose back in 2008, reasoning that it was a lot easier to write than to get a patch into Firefox. (Given that the bug is around three years later, it appears I was right.) But I never finished it, so it has several problems including a crash when the local side of the proxy cancels a request and a lack of flow control without some libevent patches that probably won't apply cleanly to a modern version.

The reason I never finished it? Switched to Google Chrome instead. Maybe it's a bit rude of me to point this out on the Firefox bug tracker, but Chrome doesn't have this problem, so I'm much happier.
Firefox 4 seems worse.  When I restore a session of only 3 windows/20 tabs, it can take several minutes before the UI starts responding under Windows XP. This morning I killed the Firefox task after about 2 minutes and re-started it.  Firefox did not hang the second time I started it.  But periodically during the day it will hang for several minutes.
Bill, your issue has absolutely nothing to do with this bug. If you can't find another one already filed please create a new one.
@Henrik, not sure why you say that.  My organization has a huge proxy.pac.  At home with no proxy, Firefox works fine with 20 windows and 50-100 tabs.
Hello & thanks for your comments.

Today I tested Firefox 5.0 (for Windows), and the bug is still there.

This is the way I test it:

(1) Put proxy settings to "auto detect"
(2) Start a web search for "test"
(3) Click on 3..4 hits of the search engine, using the mid-mouse button.

Usually, I won't get to the 3rd click, because Firefox hangs then.
Even if you open the Settings dialog before loading is complete, this
dialog will hang for several seconds.

If you use explicit proxy settings (10.0.2.2:3128), there is no delay at all.
If I understand you correctly, automatic proxy detection can't work without a DNS call.
Reading this, I suggest:
- prevent DNS calls from blocking the GUI
- prevent DNS calls if the proxy string spread via DHCP is an IP (as in my case)
- prevent proxy detection with every URL call; rather define some update interval
  or just lookup proxy settings (wpad.dat) on failure.
We had users complain about UI hang when they typed in a nonexistent domain.  We had 30 isInNet statements in the PAC file they were using.  Firing up Wireshark showed that 2 types of lookup were attempted: a normal DNS lookup and a netbios lookup.  Regular DNS lookups returned pretty instantly with NXDOMAIN.

I saw that the netbios lookup packets conformed to the NameSrvQueryCount and NameSrvQueryTimeout defaults mentioned at http://support.microsoft.com/kb/314053.  1.5 seconds * 3 attempts = 4.5 seconds per isInNet clause.
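
The same arithmetic extended to the whole PAC file, as a rough upper bound. This assumes every one of the 30 clauses is evaluated and each waits out the full netbios timeout with nothing cached in between, which the packet capture described here does not confirm; the variable names are illustrative only.

```javascript
var timeoutSeconds = 1.5;   // NameSrvQueryTimeout default quoted above
var attempts = 3;           // NameSrvQueryCount default quoted above
var isInNetClauses = 30;    // number of isInNet statements in the PAC file

var perClause = timeoutSeconds * attempts;   // stall per failed lookup
var worstCase = perClause * isInNetClauses;  // stall if every clause repeats it

console.log(perClause + " s per clause, up to " + worstCase + " s in total");
```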

Completely turning netbios off fixed the issue, but that's clearly a no-go (not a Windows engineer, I'd have happily turned it off!).

Someone in our team came up with the idea of cutting down on the amount of resolving needed.

We went from the following:

=== from ===
if (isInNet(host, "172.31.0.0", "255.255.0.0") ||
    isInNet(host, "10.0.1.0", "255.255.255.0"))
    {
    return "PROXY proxy1.domain.co.uk:1234"
    }
    if (isInNet(host, "192.168.0.0", "255.255.0.0") ||
        isInNet(host, "127.0.0.0", "255.0.0.0") ||
        isInNet(host, "172.16.0.0", "255.255.0.0") ||
        isInNet(host, "172.19.4.0", "255.255.252.0") ||
        ....
        ....
        isInNet(host, "172.20.0.0", "255.255.0.0") ||
        isInNet(host, "172.21.0.0", "255.255.0.0") ||
        isInNet(host, "10.0.0.0", "255.0.0.0"))
    {
    return "DIRECT"
 
=== to ===  

var resolved_ip = dnsResolve(host);


if (isInNet(resolved_ip, "172.31.0.0", "255.255.0.0") ||
    isInNet(resolved_ip, "10.0.1.0", "255.255.255.0"))
    {
    return "PROXY proxy1.domain.co.uk:1234"
    }
    if (isInNet(resolved_ip, "192.168.0.0", "255.255.0.0") ||
        isInNet(resolved_ip, "127.0.0.0", "255.0.0.0") ||
        isInNet(resolved_ip, "172.16.0.0", "255.255.0.0") ||
        isInNet(resolved_ip, "172.19.4.0", "255.255.252.0") ||
        ....
        ....
        isInNet(resolved_ip, "172.20.0.0", "255.255.0.0") ||
        isInNet(resolved_ip, "172.21.0.0", "255.255.0.0") ||
        isInNet(resolved_ip, "10.0.0.0", "255.0.0.0"))
    {
    return "DIRECT"

======

It's not a fix, but it's 1001 times better than what we had before, hope this helps someone with a similar situation.
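
A sketch of the same "resolve once, compare many times" pattern, assuming IPv4 only. chooseProxy, matchesAny, ipToInt and ipInNet are hypothetical names used for illustration; the resolver is passed in as a parameter so the logic can be exercised outside a browser, and in a real PAC file it would be the built-in dnsResolve. The network lists are a subset of the ones quoted above, and the final "DIRECT" fallback is an assumption because the rest of the original file is elided.

```javascript
// Hypothetical helper: dotted quad -> unsigned 32-bit integer.
function ipToInt(ip) {
  var p = ip.split(".");
  return ((parseInt(p[0], 10) << 24) | (parseInt(p[1], 10) << 16) |
          (parseInt(p[2], 10) << 8) | parseInt(p[3], 10)) >>> 0;
}

// Hypothetical mask comparison on an already-resolved address (no DNS).
function ipInNet(ip, net, mask) {
  var m = ipToInt(mask);
  return ((ipToInt(ip) & m) >>> 0) === ((ipToInt(net) & m) >>> 0);
}

// Subset of the networks quoted above.
var PROXIED_NETS = [["172.31.0.0", "255.255.0.0"], ["10.0.1.0", "255.255.255.0"]];
var DIRECT_NETS = [["192.168.0.0", "255.255.0.0"], ["127.0.0.0", "255.0.0.0"],
                   ["172.16.0.0", "255.255.0.0"], ["10.0.0.0", "255.0.0.0"]];

function matchesAny(ip, nets) {
  for (var i = 0; i < nets.length; i++) {
    if (ipInNet(ip, nets[i][0], nets[i][1])) return true;
  }
  return false;
}

// Returns a PAC result string, or null when no rule matched.
function chooseProxy(host, resolve) {
  var ip = resolve(host);            // the only lookup for this request
  if (!ip) return null;              // unresolvable: let the caller decide
  if (matchesAny(ip, PROXIED_NETS)) return "PROXY proxy1.domain.co.uk:1234";
  if (matchesAny(ip, DIRECT_NETS)) return "DIRECT";
  return null;
}

function FindProxyForURL(url, host) {
  // Assumption: fall back to DIRECT where the original file is elided.
  return chooseProxy(host, dnsResolve) || "DIRECT";
}
```

The single dnsResolve call can still block while it waits, so this reduces the stall to one lookup per request rather than removing it.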
This bug has been unresolved since 2004.
People, stop chatting here and do it ALL OVER AGAIN FROM SCRATCH.
Apparently the source is a complete mess.
My proposal is to have one thread to handle DNS responses, and if a proxy is set then a second thread is started to resolve DNS through the proxy.
The thread sends the DNS request to port 53 (UDP/TCP), as is done in millions of pieces of software.
Then the thread waits for the response, and the corresponding TAB in Firefox waits for that response. ALL OTHER TABS KEEP WORKING JUST FINE, so we can use them as if nothing happened.
When the thread(s) receive a response from the DNS server they send an EVENT to the corresponding TAB with the returned information, i.e. the IP of the web server.

That is all.
There is no other matter to discuss.
There is a way to do this in the current Firefox. After all, a bug from 2004 is a 7 (SEVEN) year old BUG.
Come on, get it in hand.
Firefox 7.0 (no addons enabled) - still happening. Am actually forced to use IE because FF unusable behind this proxy. Everything in FF hangs - the About box, the proxy login, the menus etc. It's not frozen - if I wait 20 minutes, a web page will appear. Disable the proxy settings and FF is fine - except no internet of course. Very frustrating & embarrassing.
This is ridiculous. There really shouldn't be any reason why this is still happening. I'm on a clean install of Windows 7 x64 with Firefox 6.0.1 after uninstalling my previous install, deleting every last trace of it, and cleaning my registry. This STILL Happens.

It's really strange. 

I'll be browsing just fine, and then suddenly out of the blue, all of my pages that are loading will get stuck in this loading loop (as if Firefox cannot access the internet.) This is with absolutely no extensions, and just the basic plugins (Flash, Shockwave.)

The only way to get Firefox to work again is to reopen it. While Firefox is hiccuping, I can use Chrome/IE/Opera/Safari just fine.
In reply to comment 220, comment 221 and comment 222: did you try to disable IPv6 via the "network.dns.disableIPv6" user pref?
(In reply to RNicoletto from comment #223)
> In reply to comment 220, comment 221 and comment 222: did you try to disable
> IPv6 via "network.dns.disableIPv6" user pref?

Yep.
Problem exists with large PAC files


Test 1 - Load a dynamic server page on localhost - A PAC file with about 350 hostname resolves. 

Firefox 7.0.1 with the PAC file enabled (Connection Settings --> Automatic proxy configuration URL): 3.042 s, 2.865 s, 2.892 s, 3.042 s, 3.003 s

Firefox 7.0.1 with no proxy enabled (Connection Settings --> No Proxy): 0.646 s, 0.686 s, 0.588 s, 0.699 s, 0.573 s

Internet Explorer 8 with the PAC file proxy configuration enabled: almost instant meaning < 1 s (not measured)


Firefox GUI with the tested PAC file enabled is completely frozen while loading the page.


Test 2 - Start Firefox with 20 tabs

Firefox 7.0.1 with the PAC file enabled (Connection Settings --> Automatic proxy configuration URL): 2 minutes 30 seconds
Firefox 7.0.1 with no proxy enabled (Connection Settings --> No Proxy): 30 seconds
Assignee: jduell.mcbugs → sjhworkman
If this is not going to be tracked against a release, then what are the chances of this ever getting fixed?
A resolution for this is crucial especially for enterprise usage where such a proxy configuration is much more common than at home. I urge you to look into this issue for the upcoming enterprise version of firefox (http://blog.mozilla.com/blog/2012/01/10/delivering-a-mozilla-firefox-extended-support-release/).

It would be a huge improvement for us in Firefox, in fact the biggest in years. Thanks!
(In reply to Benjamin Scherrer from comment #227)
> I urge you to look
> into this issue for the upcoming enterprise version of firefox
> (http://blog.mozilla.com/blog/2012/01/10/delivering-a-mozilla-firefox-
> extended-support-release/).
It's no longer upcoming, it has already been released.
Thanks, I missed the news. So any hopes for progress here? I really hoped it would make this release.
All, sorry for the lack of responses on this one.  There have been some other higher priority tasks for various reasons, but this seems to be next on my list of bugs.  I have other tasks as well, though, so I can't devote all my time to it.  I hope to have a patch up sometime in the next month.
Well "hopefully next month" is pretty awesome compared to the (almost exactly) 8 years that this bug has been around...
I rebased this patch and I'm running it through try to see what the state of the world is.
Attachment #346370 - Attachment is obsolete: true
Attachment #346138 - Attachment description: [1/2] break lines in netwerk/protocol/http/src/nsHttpChannel.h → [1/2] break lines in netwerk/protocol/http/src/nsHttpChannel.h
Attachment #346138 - Attachment is obsolete: true
Attachment #594181 - Attachment is obsolete: true
Try run for 5fa326c23268 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=5fa326c23268
Results (out of 99 total builds):
    exception: 4
    success: 39
    warnings: 16
    failure: 40
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/josh@joshmatthews.net-5fa326c23268
Try run for 55a64e81467b is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=55a64e81467b
Results (out of 129 total builds):
    success: 114
    warnings: 15
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/josh@joshmatthews.net-55a64e81467b
 Timed out after 06 hours without completing.
That try run was completely green, modulo a couple unrelated known oranges. Jason/Biesi (or Steve, for that matter), does this patch require another round of review? The rebasing I performed was mostly mechanical.
Comment on attachment 594233 [details] [diff] [review]
Defer proxy resolution for HTTP and HTTPS PAC to avoid blocking main thread during DNS resolution.

Steve is going to look the patch over.
Attachment #594233 - Flags: feedback?(sworkman)
Jdm, thanks for cleaning up the patch and trying it out.  Looks good to me - no issues.
Target Milestone: --- → mozilla13
Gentlemen, thank you. It looks like this bug really should be fixed now? If it gets into the next FF release, it would deserve its own release party. Is it not the oldest bug ever fixed?
It took Mozilla 8 (verbal: E-I-G-H-T!) years to fix it.

Fixing TNEF attachments in Thunderbird is another sad example for time-consuming bugs and hence is still not fixed.
Maybe fixing proxy resolution will get fixed soon, too?! :)
Many thanks for looking at this issue. Much appreciated
Attachment #594233 - Flags: feedback?(sworkman)
https://hg.mozilla.org/mozilla-central/rev/20240add1d35
Status: REOPENED → RESOLVED
Closed: 16 years ago → 12 years ago
Flags: wanted1.9.1+
Resolution: --- → FIXED
Whiteboard: [missed 1.9 checkin]
Blocks: 750445
Is this fixed in Firefox 12? It still hangs if I try to use PAC.
Per the Target Milestone field above, it is fixed for Firefox 13. You can verify that it is fixed with a current beta release.
However, as we discovered in bug 750445, this may not actually be fixed in FF 13 either. More investigation is probably required.
Yes, this is still broken. I filed bug 750445 instead of reopening this one, because the patch for this one stuck (and shouldn't be backed out: we just need to fix it).  Let me know if that's the wrong approach bugzilla-administration-wise.
That's absolutely the best practice, don't let anyone tell you otherwise :)
Making everyone who's been following this bug for the past 8 years jump through hoops. Oh well.
Unfortunately it doesn't really make sense for us to optimize our Bugzilla practices for the casual bug watcher audience (Bugzilla really wasn't designed for that). Trying to track several different changes in a single bug just doesn't work very well.

I suspect that many people on the CC list have since lost interest in getting frequent updates, but those who want to track every step along the way can still CC themselves to the new bug.
And apologies for this bug not getting fixed--it was an accident in the rebasing of a patch.   For all we knew this was fixed until we discovered otherwise last week.  I'll be looking into it very soon.
No worries, we all moved to Google Chrome anyway. 
But I think I'll stay at the CC list for a while: every time I get an email notification about this defect not being fixed, it reassures me that I made a good decision.
(In reply to Avner from comment #253)
> No worries, we all moved to Google Chrome anyway. 

I have to confirm this - at least for my colleagues.

Mozilla is way too good (that video about "we believe in freedom" still makes me cry - just beautiful!) to act like this.
Come on guys, step up and get stuff like x64 builds in the release channel done; it's about time!

Remember: it's not about the "under the hood", it's about what changes usability and recognizable changes to the user - everything else is just for geeks, not end-users.

And this is definitely true: Firefox is the better browser, because it respects user's privacy like no other. So get that "nerd touch" destroyed by moving faster.
I don't care what goes on under the hood, and I don't care how the proxy server is being resolved technically.
As an end-user, when my browser hangs I just kill it and move to a faster browser.
That's what I tried to say - thanks for summing it up! :)
Indeed. In fact several years ago it was exactly this bug that forced me to move to Chrome. I don't work in the huge company with complex proxy server configs anymore, yet.. I am on Chrome already. Keeping Firefox only for a couple of Chrome-incompatible sites and following this bug out of nostalgia.
Guys, I know this has been a source of frustration for you, but please show some courtesy to the 103 people CCed to this bug by not spamming them with your reasons for switching to Chrome.
This is my last comment, because I want to respect Ryan's kind wish to show some courtesy:

I'm not on Chrome, I'm on Firefox (even Nightly x64 right now) - since Version 1.0, at work, too, even on Android and until it really breaks the basic rules, I guess.

So, finally: sorry to those 103 people for spamming. I did not intend to do so - I just wanted to kind of "resurrect" the spirit that once brought so many users from IE to the great and lovely fox.
I stick with Firefox, and I really appreciate how it has kept listening to its users and stuck with a clean, native user interface... but there's a couple of old horrible bugs like this that really need to get fixed, and I keep following them in the (apparently not quite vain) hope that it will eventually happen. The biggest one is the incomplete downloads bug, where the remote end closes the connection before the download is finished and there's no indication on this end that you only got 20MB of an expected 40MB. Is that another "forgotten" one? If the download manager would just leave the expected size visible after the download ends we'd at least have a chance of catching it when it happens.
Depends on: 767005
Blocks: 769764
backed out in bug 767005
Since the fix has been backed out, please consider reopening this BZ. Thanks!
(In reply to Laszlo Ersek from comment #262)
> Since the fix has been backed out, please consider reopening this BZ. Thanks!

it would just be a dup of the work being done in bug 769764, you can follow it over there.
Resolution: FIXED → DUPLICATE