Closed Bug 1121800 Opened 5 years ago Closed 3 years ago

Firefox doesn't recover from inaccessible proxy given by PAC

Categories

(Core :: Networking, defect)

35 Branch
x86_64
Windows 7
defect
Not set

Tracking

()

RESOLVED DUPLICATE of bug 1230803
mozilla44
Tracking Status
firefox44 --- fixed

People

(Reporter: hamfastgamgee, Assigned: bagder)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Whiteboard: [lame-network][proxy][necko-active])

Attachments

(2 files, 3 obsolete files)

Splitting out from bug 939318 by request of Daniel Stenberg:

Firefox appears to not be able to recover gracefully from a badly configured proxy PAC file that doesn't specify skipping the proxy when outside of the corporate network.  Our corporate PAC specifies direct connections for a number of local domains and addresses, and sends everything else through one of two proxies for load balancing.  Obviously this isn't best practice (it should be falling back to DIRECT if the proxies fail, or at least allow IPs not on the local subnets to use DIRECT), but it seems that Firefox can't handle this case, whereas both Chrome and IE seems to fall back to not using the proxy if it's not accessible, regardless of what the PAC specifies.  I think Firefox may be doing the "right" thing given the PAC, but it definitely isn't the best user experience.

Alternatively, it could be that the behavior is bad when the proxy PAC can't actually be found, since it is not hosted externally.  So it is found when I'm connected to the corporate network, but it's not accessible when I'm on a different network.

Either way, the consequence is that I have to switch back and forth between "Use system proxy" and "No proxy".
Depends on: 939318
(I think this is more related to proxy than DNS so I changed Component.)

Regarding the PAC you setup at work: when you run Firefox outside of work, can you still access that work-PAC URL?
Component: Networking: DNS → Networking
Whiteboard: [lame-network]
No, it's apparently not available outside of the corporate network. I remembered to check that after writing up everything about having no DIRECT fallback in the PAC and added the second paragraph quickly to mention it. Sorry about the stream of consciousness. :)

(To be clear, I have no control over the PAC file. I just can read its contents when inside the corporate network.)
When you switch to your outside-of-work network, do you then get an error page/message saying the proxy server is inaccessible/refusing connections ?
There's definitely an error page involved.  I can check at home, but I'm pretty sure it mentions timing out.  I don't recall for certain if it says the proxy timed out.
Is that behavior any different due to the network change or isn't that exactly how it behaves if you start your browser on the outside network?

When I tried some things here right now, it behaved like this:

1 - if I give a PAC URL that doesn't work, Firefox instead silently goes on and connects directly

2 - if I give a PAC URL that works, but the PAC offers a proxy that doesn't work (I had mine block the specific IP the client came from) it instead shows an error page.

Given the same exercise, Chrome silently ignores the problem in (2) and connects directly to the site instead.
think its ok to have an implicit fallback to DIRECT if the proxy is not available, which I think is what you're asking. You could argue its a policy violation (you told it to use a proxy damn it) but you point our we already fallback to DIRECT if the pac itself is not available, so I don't really think that's a new hole.
My behavior is in line with what comment #5 indicates.  If I open Firefox cleanly when I'm on my home network (i.e. the PAC is not able to be found), everything works fine.  If instead I have Firefox open while I'm at work (i.e. the PAC is resolvable) and then bring the laptop home without restarting Firefox, then things fail until I either restart Firefox or change it to "No proxy".  I'm not sure if Firefox caches the PAC file in memory in this situation and then tries the proxy based upon what the PAC said but can't resolve it (i.e. the scenario in comment #6) or if something else is going on.

I find it extraordinarily unlikely that my corporate security guys intentionally meant to block non-VPN home traffic from working entirely, and I'm not sure I would see a benefit to Firefox enforcing such a restriction since obviously just turning off proxy use gets around the issue anyway.  I think it's much more likely that they forgot to put in a DIRECT fallback and only tested IE (the corporate standard browser), which falls back in this situation like Chrome does.  (There's a comment in the PAC file suggesting that IPs not on the corporate network should fall through to DIRECT but they didn't actually write the file to implement it that way.)
hey Mark - I think we will make the change to fallback to DIRECT generally - but just to be clear about the concern in doing so: it's not about bypassing a policy someone has placed on you (corp security) it's about enforcing your own preference.. so if you have a proxy setup to do virus checking and we silently bypass that because we can't reach the proxy that's a problem if you think you're still getting virus checking.. so that's the downside.
daniel, do you want to tackle this?
Assignee: nobody → daniel
Yes, I already intended to so assigning it to me is just fine!
(In reply to Patrick McManus [:mcmanus] from comment #8)
> hey Mark - I think we will make the change to fallback to DIRECT generally -
> but just to be clear about the concern in doing so: it's not about bypassing
> a policy someone has placed on you (corp security) it's about enforcing your
> own preference.. so if you have a proxy setup to do virus checking and we
> silently bypass that because we can't reach the proxy that's a problem if
> you think you're still getting virus checking.. so that's the downside.

I wouldn't want my corporate proxy settings enforced at home anyway. That's when I'm actually able to go to video game sites. :)  (But point taken.)
Ok, here's a first dead simple approach that I'm sure is wrong in several ways.

It catches the case when all proxies fail to work and then add a "DIRECT" proxy instead. I tested it for my simple case when I just tell my Firefox to use a non-existing hostname etc.

I'm not that familiar with this part of the code so I'm not convinced this is the best way to accomplish this nor do I know how long this "fake" proxy list lives. I haven't really considered how or if the original list of proxies should be tested again if they possibly only failed for a little while so I would appreciate if someone has ideas about this.
Attachment #8581715 - Flags: feedback?(mcmanus)
Comment on attachment 8581715 [details] [diff] [review]
0001-bug-1121800-when-all-proxies-fail-try-DIRECT.patch

Review of attachment 8581715 [details] [diff] [review]:
-----------------------------------------------------------------

apologies for the delay. my mailer doesn't flag feedback they way it flags ni and r - I fixed that

man. some smart pointers in here would really help :)

The first issue is that a proxy only stays disabled for 30 minutes.. this goes back to darin's original design and I'm not sure its applicable to the modern scene - but its going to interfere with your alldisabled check (they will be enabled again every 30 minutes)... maybe we can just get rid of that timer beacuse your linkup/linkdown code will reset the list and re-enable things every time the link changes anyhow, right?

::: netwerk/base/nsProtocolProxyService.cpp
@@ +1962,5 @@
>  
> +    if (allDisabled) {
> +        LOG(("All proxies are disabled, try a DIRECT rule!"));
> +        (void) NewProxyInfo(NS_LITERAL_CSTRING("DIRECT"),
> +                            NS_LITERAL_CSTRING(""), -1, 0, 0, nullptr, list);

if only this module were that easy!

you're going to leak head if you do that.. so you need to free that list too.
also, you can return *list = nullptr instead of the explicit DIRECT rule.
Attachment #8581715 - Flags: feedback?(mcmanus)
(In reply to Patrick McManus [:mcmanus] from comment #13)

> The first issue is that a proxy only stays disabled for 30 minutes.. this
> goes back to darin's original design and I'm not sure its applicable to the
> modern scene - but its going to interfere with your alldisabled check (they
> will be enabled again every 30 minutes)...

Well, it only makes the test get done every 30 minutes, right? I'm not sure that's a problem in reality. Isn't there otherwise just a risk that there actually are users out there with flaky proxy servers who rely on this timeout and retry mechanism?

Of course one can also just question how often a 30 minute retry timeout helps since 30 minutes is way too long for anyone to wait in case of actual problems and if there are no problems then there's no point in trying again every 30 minutes.

> maybe we can just get rid of that
> timer beacuse your linkup/linkdown code will reset the list and re-enable
> things every time the link changes anyhow, right?

Yes, network changes will cause the PAC file to get reloaded etc. I figure the timeout would be for when the proxy comes and goes without it happening because of a network change.

I'll upload an updated patch in a sec.
Here's an updated version I believe shouldn't leak anything.

nsProtocolProxyService::GetFailoverForProxy() now also do the failover logic for "manual" proxies which could be a reason to change the condition at the top of that function. I made the failover apply for manual proxies as well only to make sure it gets disabled when not working as then this new "go DIRECT when all proxies fail" logic works even for that setup.

Thoughts?
Attachment #8581715 - Attachment is obsolete: true
Attachment #8597272 - Flags: review?(mcmanus)
Comment on attachment 8597272 [details] [diff] [review]
v2-0001-bug-1121800-when-all-proxies-fail-try-DIRECT-r-mc.patch

Review of attachment 8597272 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/base/nsProtocolProxyService.cpp
@@ +1367,4 @@
>                                              nsIProxyInfo **aResult)
>  {
>      // We only support failover when a PAC file is configured, either
>      // directly or via system settings

update the comment.

@@ +1981,5 @@
>      }
>  
> +    if (allDisabled) {
> +        LOG(("All proxies are disabled, try a DIRECT rule!"));
> +        NS_RELEASE(head);

unfortunately you need to release each element in the list.. take a look at the iterator on the else clause here and see how its releasing each disabled pi. Indeed you can probably just use that branch unconditionally and do a null check in the terminator.

like I said, some pointer management classes here would really help :)
Attachment #8597272 - Flags: review?(mcmanus) → review-
Possible dupe of bug 973169
Summary: Firefox can't seem to recover from bad proxy configuration → Firefox doesn't recover from inaccessible proxy given by PAC
Version 3. I'm doing more testing and try-runs with this.
Attachment #8597272 - Attachment is obsolete: true
Attachment #8665662 - Flags: review?(mcmanus)
Comment on attachment 8665662 [details] [diff] [review]
v3-0001-bug-1121800-when-all-proxies-fail-try-DIRECT-r-mc.patch

Review of attachment 8665662 [details] [diff] [review]:
-----------------------------------------------------------------

so this gets rid of the cycle where we keep banging our heads against the wall (good!), but I thought we were also going to see a lowered timeout for the discovery.. no?

::: netwerk/base/nsProtocolProxyService.cpp
@@ +1376,5 @@
>                                              nsIURI        *aURI,
>                                              nsresult       aStatus,
>                                              nsIProxyInfo **aResult)
>  {
> +    if (mProxyConfig == PROXYCONFIG_DIRECT)

braces

@@ +2003,5 @@
> +        // since we are about to use this proxy, make sure it is not on
> +        // the disabled proxy list.  we'll add it back to that list if
> +        // we have to (in GetFailoverForProxy).
> +        //
> +        // XXX(darin): It might be better to do this as a final pass.

we can kill this comment at this point
Flags: needinfo?(daniel)
(In reply to Patrick McManus [:mcmanus] from comment #19)

> so this gets rid of the cycle where we keep banging our heads against the
> wall (good!), but I thought we were also going to see a lowered timeout for
> the discovery.. no?

Yes, but since that wasn't exactly what this bug is about I took that effort to the separate bug 1207537 and it can in fact be considered independently of this or other proxy fixes.

[edited my local version of patch according to comments]
Flags: needinfo?(daniel)
Attachment #8665662 - Flags: review?(mcmanus) → review+
Rebased
Attachment #8665662 - Attachment is obsolete: true
Attachment #8671784 - Flags: review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/13db8516d365
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
Hey Daniel. I am not familiar with this bug but I think this commit may be the reason of recent Nightly crashes. Here's a recent crash report of me.

https://crash-stats.mozilla.com/report/index/50e15376-16b7-41af-a194-aa3032151015#extensions

This crash should be fixed 10 years ago in bug287956. It is crashing again and I found that the most recent committed changes to http://hg.mozilla.org/mozilla-central/annotate/607a236c2299/netwerk/base/nsProtocolProxyService.cpp#l1912 is from you.

So anyway maybe you could check the commit again or I will file a new bug.
Flags: needinfo?(daniel)
Interesting. I don't see how my change introduce any new pointer logic that can easily explain this problem. Do you have any way to reproduce this crash? What's your proxy situation like? And yes, I think a new bug makes sense.
Flags: needinfo?(daniel)
(In reply to Daniel Stenberg [:bagder] from comment #26)
> Interesting. I don't see how my change introduce any new pointer logic that
> can easily explain this problem. Do you have any way to reproduce this
> crash? What's your proxy situation like? And yes, I think a new bug makes
> sense.

I set up a local SSH proxy at `127.0.0.1:7070`, go to Preferences -> Network -> Proxy and set global HTML proxy as `127.0.0.1:7070` and `use it for all protocols`. Load google.com and wait for a few seconds, crash.

Here's the report on a clean and new profile.

https://crash-stats.mozilla.com/report/index/53ca4e7e-cb1d-4616-b8f0-30bd32151015

If you could reproduce this too, I would like to file a new bug.
Flags: needinfo?(daniel)
I just found that a local proxy is not even needed. Just set proxy to `127.0.0.1:7070`(which is invalid) and wait for the crash...
Seems to be bug 1214200
Flags: needinfo?(daniel)
I've posted a suggested patch now over in bug 1214200
Depends on: 1214200
It looks like this broke AWSY where we set an invalid proxy to avoid hitting the (non-local) network [1]. We're using the TP5 pageset, some of which tries to access non-existent ad servers, etc and are now getting test timeouts as the page tries to load inaccessible content. Any suggestions on a work around?

[1] https://github.com/mozilla/areweslimyet/blob/master/benchtester/MarionetteTest.py#L49-L53
Flags: needinfo?(daniel)
Option 1)

Originally mentioned over in Bug 1207798, where some users were concerned about the use-case where using the proxy is CRITICAL and we MUST NOT try accessing the internet directly for security and other concerns: provide a pref variable that when set prevents Firefox from trying the network without a proxy even if the proxy isn't accessible.

Option 2)

Your test case can use a fully functional proxy that simply returns something specific and well defined for everything it gets asked for (as that would be content you don't want for your test). Like a 404 or an empty 200.

Option 3)

Uh, something else!
Flags: needinfo?(daniel)
(In reply to Daniel Stenberg [:bagder] from comment #33)
> Option 1)
> 
> Originally mentioned over in Bug 1207798, where some users were concerned
> about the use-case where using the proxy is CRITICAL and we MUST NOT try
> accessing the internet directly for security and other concerns: provide a
> pref variable that when set prevents Firefox from trying the network without
> a proxy even if the proxy isn't accessible.

This would be preferable, I imagine other pieces of automation would like this as well. Although arguably I just want a "network-on-localhost-only" pref, I don't really care if that's done through fake proxies or not.

> Option 2)
> 
> Your test case can use a fully functional proxy that simply returns
> something specific and well defined for everything it gets asked for (as
> that would be content you don't want for your test). Like a 404 or an empty
> 200.

I've implemented this as a temporary workaround on our test machine.
Depends on: 1207798
Status: RESOLVED → REOPENED
Depends on: 1230803
Resolution: FIXED → ---
Blocks: 1207436
Whiteboard: [lame-network] → [lame-network][proxy]
Whiteboard: [lame-network][proxy] → [lame-network][proxy][necko-active]
Duplicate of this bug: 733688
Debugging wonky network situations[1] when using PAC/system proxy with Firefox
==============================================================================

I've recently added additional logging to the proxy module in Firefox [2], so
please try the following with a Firefox build that contains this addition
[3]. The idea is that with the added logging we should get more information
about what exactly is happening and failing in your particular case.

Enable HTTP logging as described here:

  https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging

The minimum amount of logging would probably include:

  export NSPR_LOG_MODULES=timestamp,proxy:5,nsNotifyAddr:5
  export NSPR_LOG_FILE=httpproxylog.txt

"proxy" is for the PAC and proxy logic to get logged and "nsNotifyAddr" is for
the "link monitor" logs that inform about detected network changes.

With this logging enabled, try to reproduce the problem you've reported and
then attach the produced HTTP log file to the corresponding bug report for the
bug you're working on.

[1] = this includes laptops coming back from sleep/hibernation etc as that is
basically a network change for Firefox internals.

[2] = Bug 1260407 landed in
https://hg.mozilla.org/mozilla-central/rev/4a7ab6d8c379

[3] = Firefox nightly as of April 1st (no joke) and later includes this
change.
Flags: needinfo?(hamfastgamgee)
I connected to a few different websites at work and then tried loading talkingpointsmemo.com when I got home, resulting in failures to connect.  You'll see the relevant messages for the "home" case at the bottom of the log.
Flags: needinfo?(hamfastgamgee)
Preventing a browser from making direct requests to the Internet is not security at all. Don't make any sense one say that it is a "critical" thing. 

If it is necessary to forbid your computers to access Internet directly, you should use a FIREWALL system to do so, not a configuration on the users' browsers. 

What any browser should do is precisely the opposite: try till its last breath to make the Internet connection work for any user. 

In this sense, what should be available is a way to fall from a "PAC" to a manual proxy setting, then to automatically discover a proxy setting, then to try a direct connection, then to put an error saying "sorry, I'm exhausted to try and I could not find an Internet connection for you".
Just adding some: a pac file can be unavailable for multiple reasons: the dns down, the web servers down, corruption of the file, etc. It seems that if Firefox cannot access the pac file, it doesn't try to discover the proxy automatically (through wpad protocol) nor to use the manual proxy setting. It just tries to go directly. And as in a good network a direct access to the Internet is forbidden on the security systems, it just fails. The infamous IE, for instance, acts as expected, find a proxy to use and doesn't let the user down. 

That is a behaviour to fix.
This turns out to be a question of general improved proxy behavior and as such I'll mark it as a dupe of the super bug for improved proxy and pac behavior: bug 1230803. In that bug we'll work on making proper flows on how to behave when certain things don't work, what to try next and not to try next and how to select backup behaviors.
Status: REOPENED → RESOLVED
Closed: 4 years ago3 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1230803
You need to log in before you can comment on or make changes to this bug.