Closed Bug 1494364 Opened 1 year ago Closed 1 year ago

Frequent "Unable to connect" when using automatic proxy configuration URL

Categories

(Core :: Networking, defect, P2)

62 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla65
Tracking Status
firefox-esr60 65+ fixed
firefox65 --- fixed

People

(Reporter: ypodolyan, Assigned: junior)

References

(Regressed 1 open bug)

Details

(Whiteboard: [necko-triaged])

Attachments

(2 files, 1 obsolete file)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0
Build ID: 20180906134058

Steps to reproduce:

At work, we have a very large automatic proxy configuration script. It used to work fine in Firefox until several months ago. Now, I may be using Firefox for some time for accessing internal and external web pages and it would suddenly stop loading external (to our network) pages with message "Unable to connect. Firefox can’t establish a connection to the server at <some external URL>".

Note 1: All internal web sites still work fine.
Note 2: After I go to Preferences -> Network Proxy -> Settings and change from using "Automatic proxy configuration URL" to "Use system proxy settings" (which uses the exact same automatic proxy connection URL) or vice versa, it starts working again (for some time). Also, restarting Firefox resolves the issue for a short period too.
Note 3: Same issue with Firefox on Ubuntu Linux and Windows.



Actual results:

Firefox was unable to connect to external (to our network) web pages.


Expected results:

External pages should be loaded.
Component: Untriaged → Networking
Product: Firefox → Core
This has become so disruptive that I had to switch to Chrome. Otherwise I have to switch the proxy setting every 10 minutes.
Hello Eugene,
Thanks for reporting this.
To resolve this issue, we need your help gathering some information.

There are several things to do.

(a) Please confirm that I understand correctly:
In your current environment, the proxy setting "Automatic proxy configuration URL" doesn't let you reach the internet,
while "Use system proxy settings" does.

Is that right?

Are all external websites blocked, or just some of them?

(b) Do you remember the last version of Firefox that worked well?
Try it again to check whether the PAC file is as good as you think.

Moreover, it's important to find out whether something has changed in your local network configuration
or whether something has changed in Firefox.

To check the latter, you can use https://mozilla.github.io/mozregression/quickstart.html

(c) We also need log information to analyze what happened.
Could you append proxy:5 to the MOZ_LOG environment variable and attach the resulting log?
See instructions here: https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging
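
For reference, a minimal Python sketch of preparing the environment for such a logging run. The module list is the one the reporter quotes later in this bug; `MOZ_LOG_FILE` is the companion variable described on the HTTP logging page linked above. The helper name and the log path are assumptions made for illustration only.

```python
import os

# Log modules and levels; "proxy:5" is the one requested in (c).
MODULES = ["timestamp", "sync", "nsHttp:5", "cache2:5",
           "nsSocketTransport:5", "nsHostResolver:5", "proxy:5"]


def firefox_log_env(log_path, modules=MODULES, base_env=None):
    """Return a copy of the environment with MOZ_LOG and MOZ_LOG_FILE set."""
    env = dict(os.environ if base_env is None else base_env)
    env["MOZ_LOG"] = ",".join(modules)
    env["MOZ_LOG_FILE"] = log_path  # hypothetical path chosen by the caller
    return env
```

Passing the result as `env=` to `subprocess.run(["firefox"], env=...)` should start the browser with logging enabled, assuming a `firefox` binary is on the PATH.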
Flags: needinfo?(ypodolyan)
(a) No, you misunderstood. It works with either setting for some time and then suddenly stops working showing "Unable to connect. Firefox can’t establish a connection to the server at <some external URL>". When I switch proxy setting to the other one, then it starts working again but after some time stops. So, I switch back to first setting and it starts working for some time. So, I just alternate between "Automatic proxy configuration URL" and "Use system proxy settings" because it works after switch.

(b) Unfortunately, I do not remember the last version it worked.

(c) I will work on getting the log and update shortly.
Interestingly, prior to updating Firefox last Friday, I had this issue very frequently (several times an hour). After updating to firefox-locale-en:amd64 (62.0+build2-0ubuntu0.14.04.5, 62.0.3+build1-0ubuntu0.14.04.2) FROM (I believe) firefox-locale-en:amd64 (62.0+build2-0ubuntu0.14.04.3, 62.0+build2-0ubuntu0.14.04.5), the issue is no longer reproducible. At least, I haven't encountered it today a single time. So, it may be somehow fixed in that latest build. I will monitor for a couple more days to make sure.
Actually, I believe the update was from 62.0+build2-0ubuntu0.14.04.5 -> 62.0.3+build1-0ubuntu0.14.04.2, while the previous update was from 62.0+build2-0ubuntu0.14.04.3 -> 62.0+build2-0ubuntu0.14.04.5.
Thus, 62.0.3 may have the fix.
Feel free to reopen if it happens again. Thanks!
Also canceling the needinfo.
Status: UNCONFIRMED → RESOLVED
Closed: 1 year ago
Flags: needinfo?(ypodolyan)
Resolution: --- → WORKSFORME
The issue is back again. I was able to capture the log. When I started seeing the issue, I enabled the log. Tried the external site (kohls.com) - didn't work, tried internal site - worked fine. Then I changed Firefox proxy setting from using automated script to "Use system proxy settings" and tried www.kohls.com again and it worked. This is where log ends.
Status: RESOLVED → UNCONFIRMED
Resolution: WORKSFORME → ---
I just accidentally overwrote the log file by enabling logging again. I will upload a log once the issue reproduces and I capture a new one.
I have already been able to reproduce it several times.

Here's a link to a zipped log file:
https://1drv.ms/u/s!AjUMOm_pWRRB0T2D6bB6-pBsr2qU

Again, I tried a couple of external sites and they did not work; tried internal and it worked; changed setting (now to Automatic proxy script) and tried the same external sites and they worked.
I've created a new (shorter) log, so you don't have to comb through a long one.
I've tried accessing alexa.amazon.com and support.getvera.com. Both failed. Changed Proxy to "Automatic proxy configuration URL" (from "Use system proxy settings"), reloaded the same pages and it worked. Log file is uploaded
Attached file log.txt-main.2387.zip (obsolete) —
I don't see the proxy log.
Did you append proxy:5 to the MOZ_LOG environment variable, as in comment 2 (c)?
Flags: needinfo?(ypodolyan)
Here are the log sections I have
timestamp,sync,nsHttp:5,cache2:5,nsSocketTransport:5,nsHostResolver:5,proxy:5
Flags: needinfo?(ypodolyan)
Attached file log.txt.zip
OK, I verified and this one does have D/proxy in it
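
A quick way to check this kind of thing is to count log lines per module. The sketch below is only an illustration: it assumes the line layout seen in the log excerpt quoted in this bug (timestamp, `UTC`, `[pid:thread]`, then `D/<module>`), and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
# 2018-10-15 19:50:15.396455 UTC - [2387:Main Thread]: D/proxy AsyncApplyFilters ...
LINE_RE = re.compile(r"^\S+ \S+ UTC - \[[^\]]+\]: [A-Z]/(\S+) ")


def count_modules(lines):
    """Count log lines per module name (the part after "D/")."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Calling `count_modules(open("log.txt", errors="replace"))` and looking up the `"proxy"` key shows whether any D/proxy lines were captured.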
Attachment #9016461 - Attachment is obsolete: true
Hello Eugene,
Is the behavior like a time-out, or does the error appear immediately?

As far as I can tell, there is no response for those failures on the local network after mSocketOut::AsyncWait.
For the external network, Bad Socket 804b000d is reported, which means the connection is refused.

The PAC result seems right. I'm not sure why "reset pac thread" works for you.
Flags: needinfo?(ypodolyan)
Yes, it is a time-out. It tries connecting for several seconds and then times out, showing the boilerplate error page.
After I switch proxy setting to another one (e.g., if it was "Use system proxy setting", I switch to "Automatic proxy configuration URL" or vice versa - btw, system proxy is set to use the same automatic proxy config URL), it starts working again immediately!
Flags: needinfo?(ypodolyan)
Btw, all local traffic is unaffected, I can still access all local sites. The issue only happens with external. Restarting Firefox also helps every single time.
The proxyInfo is pruned, which makes the host unreachable.

PruneProxyInfo prunes two types of proxies:
(a) disallowed (like an FTP connection with an HTTP proxy)
(b) disabled (proxies that failed before)

The proxy string is the same every time and is allowed.
Hence, the only explanation is that the proxies failed before, and we're forced to try a direct connection.

It makes sense that all the proxies are enabled again after we re-read the PAC file.

Two ways to follow up:
(a) Can you provide a log that covers the transition from the good condition to the bad one?
That way we can see what made us disable the proxy earlier.

(b) Remove DIRECT for external links in the PAC file.
If we see that all the proxies are disabled, we'll enable them all again.

2018-10-15 19:50:15.396370 UTC - [2387:Main Thread]: D/nsHttp nsHttpChannel::ResolveProxy [this=0x7f08bc770000]
2018-10-15 19:50:15.396455 UTC - [2387:Main Thread]: D/proxy AsyncApplyFilters 0x7f089abc23d0
2018-10-15 19:50:15.396463 UTC - [2387:Main Thread]: D/proxy AsyncApplyFilters::AsyncProcess 0x7f089abc23d0 for req 0x7f08a7961d00
2018-10-15 19:50:15.396469 UTC - [2387:Main Thread]: D/proxy AsyncApplyFilters::ProcessNextFilter 0x7f089abc23d0 ENTER pi=0x7f08a04e2580
2018-10-15 19:50:15.396474 UTC - [2387:Main Thread]: D/proxy AsyncApplyFilters::Finish 0x7f089abc23d0 pi=0x7f08a04e2580
2018-10-15 19:50:15.396479 UTC - [2387:Main Thread]: D/proxy nsProtocolProxyService::PruneProxyInfo ENTER list=0x7f08a04e2580
2018-10-15 19:50:15.396479 UTC - [2387:ProxyResolution]: D/proxy Use proxy from PAC: PROXY www-proxy-adcq7-new.us.oracle.com:80; PROXY www-proxy-brmdc.us.oracle.com:80; DIRECT;
2018-10-15 19:50:15.396493 UTC - [2387:Main Thread]: D/proxy nsProtocolProxyService::PruneProxyInfo LEAVE list=(nil)
2018-10-15 19:50:15.396508 UTC - [2387:Main Thread]: D/proxy DoCallback::consumeFiltersResult this=0x7f08a7961d00, pi=(nil), async=0
2018-10-15 19:50:15.396514 UTC - [2387:Main Thread]: D/proxy pac thread callback PROXY www-proxy-adcq7-new.us.oracle.com:80; PROXY www-proxy-brmdc.us.oracle.com:80; DIRECT;
2018-10-15 19:50:15.396520 UTC - [2387:Main Thread]: D/nsHttp nsHttpChannel::OnProxyAvailable [this=0x7f08bc76c800 pi=(nil) status=0 mStatus=0]
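
The telling line in the excerpt above is `PruneProxyInfo LEAVE list=(nil)`. A minimal Python sketch for finding such events in a larger log follows; it assumes the line layout shown above and that an empty list is printed as `(nil)`, as in this Linux log. The function name is hypothetical.

```python
import re

# Captures the timestamp and the list pointer printed when pruning finishes.
PRUNE_LEAVE_RE = re.compile(
    r"^(\S+ \S+) UTC - .*PruneProxyInfo LEAVE list=(\S+)")


def empty_prune_events(lines):
    """Yield the timestamp of each PruneProxyInfo call that left no proxy."""
    for line in lines:
        match = PRUNE_LEAVE_RE.match(line)
        if match and match.group(2) == "(nil)":
            yield match.group(1)
```

The first timestamp this yields for a full log would mark roughly where the good condition turns bad, which is the region of interest for follow-up (a).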
Flags: needinfo?(ypodolyan)
I can run logging before connection goes bad, but the log file will be too big. Which sections do you want in the log other than proxy:5? Maybe I could reduce what I have to only what you need to limit the size of the log file.

As for removing DIRECT...the proxy config file is loaded from the web. I assume it does not change often, so I can just download it, remove DIRECT and set in the configuration. But all internal sites should be accessed without going through proxy. Won't removing DIRECT actually leave me with no access to internal ones?
Flags: needinfo?(ypodolyan) → needinfo?(juhsu)
(In reply to Eugene from comment #19)
> I can run logging before connection goes bad, but the log file will be too
> big. Which sections do you want in the log other than proxy:5? Maybe I could
> reduce what I have to only what you need to limit the size of the log file.

Actually we really need all the tags.
Maybe D/proxy plus the other log lines for a single connection would be enough, but those are hard to extract.

If you don't mind, you can send the whole log to juhsu at mozilla dot com
Alternatively, prepend rotate:200 to MOZ_LOG to limit the log size to 200MB.

> 
> As for removing DIRECT...the proxy config file is loaded from the web. I
> assume it does not change often, so I can just download it, remove DIRECT
> and set in the configuration. But all internal sites should be accessed
> without going through proxy. Won't removing DIRECT actually leave me with no
> access to internal ones?

That's true.
An ideal PAC file would be able to recognize whether a URL needs a proxy or not.
In your case, it should force local connections to DIRECT and external ones through the proxy.
However, that is not easy to do.

And Eugene,
if my theory is right, setting network.proxy.failover_timeout to 0 in about:config solves your problem.
Can you try it?
Flags: needinfo?(juhsu) → needinfo?(ypodolyan)
A proposed solution:
Assume we have at least one proxy other than DIRECT.
Note that DIRECT can't be disabled.
If all of the other proxies are disabled, then enable them all again.
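
The proposal can be illustrated with a small Python model. This is a simplified sketch, not the Gecko implementation: the `Proxy` type, the `prune` function, and the `failed_at` bookkeeping are all hypothetical, and `failover_timeout` merely stands in for the network.proxy.failover_timeout preference discussed in this bug.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proxy:
    kind: str        # "PROXY" or "DIRECT"
    host: str = ""   # e.g. "proxy-a.example:80"; empty for DIRECT


def prune(proxies, failed_at, now, failover_timeout):
    """Drop proxies that failed within the last `failover_timeout` seconds.

    DIRECT is never dropped. If every non-DIRECT proxy is disabled,
    the list is returned unpruned so that the proxies get retried.
    """
    def disabled(p):
        if p.kind == "DIRECT" or p.host not in failed_at:
            return False
        return now - failed_at[p.host] < failover_timeout

    non_direct = [p for p in proxies if p.kind != "DIRECT"]
    if non_direct and all(disabled(p) for p in non_direct):
        return list(proxies)  # proposed change: enable them all again
    return [p for p in proxies if not disabled(p)]
```

In this model, a list of two PROXY entries plus DIRECT where both proxies failed recently comes back unchanged instead of collapsing to DIRECT only. A `failover_timeout` of 0 means no proxy is ever treated as disabled, which is consistent with the workaround suggested earlier.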
I set network.proxy.failover_timeout to 0. Will report back soon.
Flags: needinfo?(ypodolyan)
Junior, I am assigning this to you for now.
Assignee: nobody → juhsu
Priority: -- → P2
Whiteboard: [necko-triaged]
Full day with no issues. Will keep watching on Wednesday (skipping 1 day). But looks like setting network.proxy.failover_timeout to 0 resolved the issue.
(In reply to Eugene from comment #24)
> Full day with no issues. Will keep watching on Wednesday (skipping 1 day).
> But looks like setting network.proxy.failover_timeout to 0 resolved the
> issue.

Thanks for your report.
That is good news :)

And it starts to make sense to implement Comment 21
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Had no issues yesterday with proxy. Setting network.proxy.failover_timeout to 0 resolved it.
In the proxy auto config file, we have 2 PROXY and 1 DIRECT listed. I think it makes sense to implement what Junior suggested (restore all proxies when only DIRECT remains after disabling them).
Keywords: checkin-needed
Pushed by nbeleuzu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/634d9ca93c94
don't prune proxy if all non-direct proxies are disabled r=bagder
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/634d9ca93c94
Status: ASSIGNED → RESOLVED
Closed: 1 year ago → 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
Can you check whether Nightly works for you, Eugene?
Flags: needinfo?(ypodolyan)
Will try tomorrow when I am back in the office.
Flags: needinfo?(ypodolyan)
Used Nightly (from Nov 7) with network.proxy.failover_timeout reverted to its default, and had no issues.
Super! Thanks for validating!
Comment on attachment 9021951 [details]
Bug 1494364 - don't prune proxy if all non-direct proxies are disabled

[ESR Uplift Approval Request]

If this is not a sec:{high,crit} bug, please state case for ESR consideration: Per bug 1513571 comment 0, some enterprises that use proxies may fail to connect to sites once all of their proxies have failed to connect at least once.

User impact if declined: ESR might fail to load any external site in some enterprise environments. Users would need to restart the browser or apply a workaround.

Fix Landed on Version: 65

Risk to taking this patch: Medium

Why is the change risky/not risky? (and alternatives if risky): Only validated in nightly, beta and by two reporters.

String or UUID changes made by this patch: No
Attachment #9021951 - Flags: approval-mozilla-esr60?
Comment on attachment 9021951 [details]
Bug 1494364 - don't prune proxy if all non-direct proxies are disabled

Fixes proxy connection issues for some enterprise users. Fix verified on Nightly and Beta. Approved for 60.5.0esr.
Attachment #9021951 - Flags: approval-mozilla-esr60? → approval-mozilla-esr60+
Duplicate of this bug: 1513571

Build ID:20190429215338
Agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0

The issue is back again. We are using a PAC file for network configuration, and after some time external addresses stop working due to timeouts. Opening the network configuration and pressing the reload button next to the PAC file location box restores connectivity.

Regressions: 1549678