Closed Bug 562917 Opened 10 years ago Closed 2 years ago

[meta] implement captive portal detection

Categories

(Core :: Networking, defect, P3)

RESOLVED FIXED

People

(Reporter: beltzner, Assigned: valentin)

References

(Depends on 3 open bugs, Blocks 6 open bugs)

Details

(Keywords: meta, privacy, sec-want, Whiteboard: [captive portal][lame-network][necko-backlog])

Attachments

(1 file)

Safari on iPhone, iPod Touch and iPad can detect when it's connected to the Internet via a captive portal. When that situation is detected, instead of trying to load network resources, it delays network loads and presents a window in which the user can authenticate through the portal before continuing.

We should do the same. Not sure if the code for this is in WebKit or not.
Webkit does not include a network stack.  In fact, Safari's network stack is closed-source last I checked.
(In reply to comment #1)
> Webkit does not include a network stack.  In fact, Safari's network stack is
> closed-source last I checked.

I'm not sure that it's happening at the level of the network stack, actually. I put the bug here, but it might be easiest to do this at the application level. Y'all'd know better than me :)
Joe says that Apple does this by hitting a known URL on apple.com and comparing results; if they're not as expected, it's captive portal time!
Hmm.  That shouldn't be too bad to do, I guess.
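The probe-and-compare approach described above can be sketched as a tiny classifier. The canary body "Success" and the three-way result are assumptions for illustration, not Apple's or Mozilla's actual scheme:

```python
# Illustrative sketch of "hit a known URL and compare results".
# EXPECTED_BODY is an assumed canary string, not the real endpoint's content.

EXPECTED_BODY = "Success"

def classify_probe(status, body, expected=EXPECTED_BODY):
    """Return 'open', 'captive', or 'down' for a probe response.

    status: HTTP status code, or None if the request failed entirely.
    body:   response body text, or None on failure.
    """
    if status is None:
        return "down"      # no response at all: no connectivity
    if status == 200 and body is not None and expected in body:
        return "open"      # the canary came back intact
    return "captive"       # redirected or rewritten content
```

A portal that intercepts the probe either redirects (non-200) or substitutes its own login page (body mismatch); both land in the "captive" bucket.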
While I know of the perverse incentives that bounties bring: a bottle of scotch to whomever comes up with a patch that passes reviews!
(In reply to comment #5)
And my axe!
See Also: → 603505
Whiteboard: [captive portal]
Blocks: 157555
Windows 7 also detects this case, FWIW.

I think one approach is to try to load www.example.com as well as the page you're trying to load; if you get similar content for both, or if www.example.com resolves to private address space or 1.1.1.1, then it's captive. But yeah, having a static page on mozilla.org that we expect to reach is sort of OK. There might be people who'd complain about privacy concerns.

Perhaps someone could use wireshark to see how Windows handles this :)
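The "resolves to private space" test suggested above is easy to express with Python's stdlib `ipaddress` module. Note that the 1.1.1.1 check reflects this 2010-era comment: some portals then used it as a placeholder, but it has since become a real public resolver, so a current implementation would drop it. A minimal sketch:

```python
# Sketch of the "did a public hostname resolve into private space?" check.
import ipaddress

def looks_hijacked(resolved_ip):
    """True if a public hostname resolved to an address suggesting
    interception: RFC 1918 private space, loopback, or the 1.1.1.1
    placeholder some portals used at the time of this comment."""
    addr = ipaddress.ip_address(resolved_ip)
    return addr.is_private or addr.is_loopback or resolved_ip == "1.1.1.1"
```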
Please consider air-gap networks, too, where reaching the public Internet is intentionally not possible.

Will air-gap networks resolve external hostnames to an internal landing page ("sorry, you cannot go there"), or will DNS say "not found"? I'm afraid both might be possible.
OS: Mac OS X → All
Hardware: x86 → All
What are the false positive possibilities here?

Is there a phone-home privacy issue with doing the equivalent of "hit a URL on apple.com" technique?

Could we do a DNS lookup for mozilla.com and see if it was in our netblock?

Gerv
(In reply to comment #10)

> Could we do a DNS lookup for mozilla.com and see if it was in our netblock?
> 

generally the dns is not impacted; all http gets a redirect while you're captive. but that's not 100% consistent.

pipeline will likely use a pre-test (bug 603505) so maybe the captive portal test can overlap with that one.. (and yes, that means pipelining gets disabled in captive portal situations which is a conservative way to start rolling that out).

lastly, this isn't just a startup time issue..
> Please consider air-gap networks, too

In what ways (that we care about) do air-gap networks act differently from captive portals?

> What are the false positive possibilities here?

If we use the phone-home technique, I believe we can do it in a way that will have no false positives as long as the system time is correct, by having the server return a signed response. 

> lastly, this isn't just a startup time issue..

To clarify what I think Patrick is saying: captive portals can mess things up at any point, not just during startup. Lots of captive portals implement timeouts that cause them to start intercepting your traffic in the middle of a session (e.g. in order to ask you to pay more). Also, the user's system can change networks any time to one that requires a (different) captive portal.
corporate networks often have working dns to the world but no connectivity outside the corporate network without using a proxy.

that means i can reach https://mail.myemployer.com and http://www.myemployer.com and a thousand other sites, while i can't reach http://www.google.com.

The captive portals I've met generally send very close to the same content for each page you request. Note that an air gap network is likely to result in no response for the request (socket timeout or whatever) which is rather different from providing wrong but identical content.
http://technet.microsoft.com/en-us/library/cc766017%28WS.10%29.aspx
http://blog.superuser.com/2011/05/16/windows-7-network-awareness/

Windows Vista and later already implement captive-portal detection. I believe MacOS/iOS has it too. We may be able to piggy-back on their detection when we can, in order to avoid the latency and the privacy implications of using our own.
Blocks: 340548
No longer blocks: 157555
Keywords: privacy
See Jo's comment in bug 688448 comment 1 regarding the difficulties in detecting this situation.
Blocks: 688448
(In reply to Mike Beltzner [:beltzner] from comment #3)
> Joe says that Apple does this by hitting a known URL on apple.com and
> comparing results; if they're not as expected, it's captive portal time!

That URL seems to be <http://www.apple.com/library/test/success.html> according to <http://www.mactalk.com.au/31/66812-iphone-3-0-wireless-captive-portal-support.html>. Note that the first comment on that page says that <http://www.apple.com/> is used, but that can't be right.

The Microsoft algorithm for network detection is described here: <http://blog.superuser.com/2011/05/16/windows-7-network-awareness/>. Interestingly, it's not just captive portal detection, but also network detection. If the HTTP request fails /and/ the DNS lookup fails, then there's no network access at all.
Some wireless networks display an advertisement for the very first HTTP request (possibly added to the page or as a popup, but it can also be a click-through page), but there's no real captive portal. That is, the second HTTP request (or the first one after the click-through page) will always go through.

How can we deal with this situation? Try 2 different URLs, and if the second goes through, then it's not a captive portal? If both URLs are captured, then it must be a captive portal (or a click-through ad).

PS: let's hope that Adblock Plus doesn't remove the ad, otherwise the user will not understand why the browser is always showing an empty page first :-)
Whiteboard: [captive portal] → [captive portal][lame-network]
Brian: can you expand on your uiwanted request?
Blocks: 725986
I think the UI for this should be done in a separate bug.

I propose to start by implementing a background service that we can query for the current captive portal state.

This feature could help us to improve the user experience when using stricter security checks (and I propose to discuss such UI in bug 725986).
Keywords: uiwanted
I believe letting the captive portal tell the browser where it is would be more useful than trying to guess whether a captive portal exists in the first place.

The IETF has defined
HTTP error 511 Network Authentication Required
for this case in its latest drafts

(see bug #728658)
(In reply to Nicolas Mailhot from comment #20)
> I believe letting the captive portal tell the browser where it is, would be
> more useful than trying to guess if a captive portal exists in the first
> place.
> 
> The ietf has defined
> HTTP error 511 Network Authentication Required
> for this case in its latest drafts
> 
> (see bug #728658)

-> <http://www.rfc-editor.org/queue2.html#draft-nottingham-http-new-status> (so the spec is approved and in the queue for publication as RFC)
From Bug 604975 Comment 4:

If we want to use the Windows APIs to detect this case then the information
below from from
https://blogs.msdn.com/b/ieinternals/archive/2011/05/18/how-windows-detects-a-captive-network-portal-and-prompts-to-open-a-browser.aspx
will help:

"Applications which want to make use of this information can use the Network
List Manager API. There's a get_IsConnectedToInternet method on the INetwork
interface and the captive portal flag can be found by looking at the VT_UINTs
named NA_InternetConnectivityV4 and NA_InternetConnectivityV6 in the INetwork's
property bag. If the flag NLM_INTERNET_CONNECTIVITY_WEBHIJACK is set, then a
captive portal was detected."
(In reply to Julian Reschke from comment #21)
> (In reply to Nicolas Mailhot from comment #20)
> > I believe letting the captive portal tell the browser where it is, would be
> > more useful than trying to guess if a captive portal exists in the first
> > place.
> > 
> > The ietf has defined
> > HTTP error 511 Network Authentication Required
> > for this case in its latest drafts
> > 
> > (see bug #728658)
> 
> -> <http://www.rfc-editor.org/queue2.html#draft-nottingham-http-new-status>
> (so the spec is approved and in the queue for publication as RFC)

-> <http://greenbytes.de/tech/webdav/rfc6585.html#status-511>
I think using the NLM_INTERNET_CONNECTIVITY_WEBHIJACK flag is the best solution to start with: it could be a simple fix that makes life better for our Windows users.

Can anyone tell me the rough location in the source code I should start looking at?
(In reply to Devdatta Akhawe from comment #24)
> Can anyone tell me the rough location in the source code I should start
> looking at ?

I'm invoking Aahz's Law: "The best way to get information on Usenet is not to ask a question, but to post the wrong information." - so what I'm about to say is likely wrong, but may prompt someone to reply with the correct answer :)

In comment #19, Kai said:
> I propose to start by implementing a background service that we can query
> for the current captive portal state.

So it seems there are actually 2 distinct tasks here - define the service interface, then implement that interface on the platforms we care about.  The former is likely to be more contentious.

One thought is that nsINetworkLinkService (http://mxr.mozilla.org/mozilla-central/source/netwerk/base/public/nsINetworkLinkService.idl) could grow new functionality for this purpose.  It sounds like we might just need one method returning a boolean (isCaptivePortal()?) and possibly define a new observer topic that could be sent as we detect a new state.  The other alternative is obviously to define a new interface specifically for this use-case.  Once the service interface has been thrashed out, I'd guess the windows implementation would end up in http://mxr.mozilla.org/mozilla-central/source/netwerk/system/win32/
or a new link status:

#define NS_NETWORK_LINK_DATA_WEBHIJACK "web mitm"

which will hopefully also map cleanly to the Windows style
It seems Mozilla currently uses the IP Helper API to figure out the status of network interfaces, instead of the Network List Manager API. (See http://msdn.microsoft.com/en-us/library/windows/desktop/ee663286(v=vs.85).aspx)

Switching to the Network List Manager API makes sense, I think, and should only require changes to the CheckAdaptersAddresses function in http://mxr.mozilla.org/mozilla-central/source/netwerk/system/win32/nsNotifyAddrListener.cpp#322 — replacing the code in CheckAdaptersAddresses with something like http://msdn.microsoft.com/en-us/library/windows/desktop/aa965303(v=vs.85).aspx should work, I think.
So the logic to ask Windows about the network status is simple, and I already have a prototype. The question is how/when this code gets called: Firefox seems to default to "assume the network is up". For perf reasons, we might not want to change that, and typically it doesn't seem to be a problem: HTTP sites redirect to the captive portal page.

I think the problem is for SSL connections. Maybe SSL code should call the network service to check if the link is a captive portal? 

@bsmith: what do you think?
Here's a simple patch that uses the NLM interface to look at the status of the connection. I am not quite sure what to do with the return value of get_IsConnectedToInternet.
Bug 752982 requests captive portal detection on Firefox OS. I propose a CaptivePortalDetector service that waits for network interface change events and posts a custom event to notify that login is required. We can reuse the captive portal detection algorithm across different applications and platforms, since the communication between the CaptivePortalDetector, the network interface, and the login page is done via nsIObserverService and CustomEvent.
What do you think about this approach?
Keywords: sec-want
Comment on attachment 636512 [details] [diff] [review]
Use new NLM Code for checking adaptor status

After talking with Devdatta and Brian, I learned that this patch doesn't do what we want to do yet. I don't think this code would be reached.

We might also want to figure out the user interactions here before starting on this type of implementation since the UX may bring with it extra requirements/constraints that would be good to know of ahead of time.
Attachment #636512 - Flags: feedback?(bsmith)
> How do you think about this approach?

@Shih-chiang: see Mark's comment #25. I agree with him that it is cleaner to just use the existing nsINetworkLinkService. That said, if you have the implementation of a new service ready, then that should be good too.
Blocks: 816866
No longer blocks: 816866
Blocks: 803202
Captive Portal, sounds good, but have a look at this:
Consider a user connected to internet from a wifi, leased line, over LAN proxy or whatever that requires authentication first. Now all of them has in common is their IP class(192.168.X.X or something else that comes under public IP). So a website with a static IP(say mozilla.org) will be re-routed to the authentication portal. Given the static IP of a website(DNS lookup, but sometime they too map the IP on the authentication portal, so we can have something saved in prefs that is updated once in a browsing session), we can easiely match the redirected IP and can check it for it being a public IP or not(just for the sake that if a user tries to open a website in the first go and it is been redirected to another one, TEST CASE).
That's bug 701823, which is dependent on this one.

Note that a captive portal doesn't redirect you to a different ipaddress per se. You just receive different content for your URL. Sometimes your DNS lookup will also be changed if you haven't logged in yet (and it's not necessarily a RFC 4193 ipaddress), that's why the DNS cache needs to be cleared.
I guess I have proposed a solution and not a bug; anyway:
Applying captive portal detection, there are 2 things to consider:
1. As in my previous comment, where redirection takes place.
2. As you said, where the link remains the same and just an NTLM security login pops up.
So any captive portal model that gets designed has to consider these 2 points.
Caution: implementing captive portal detection amounts to a low-scale DDoS on the website we connect to, since if hundreds of thousands of users surf the web at once, they are all sending queries to that server just to check whether they are online.
We're thinking about implementing a solution to this problem within HTTPS Everywhere because it's a major source of poor UX for our users ( https://www.eff.org/https-everywhere/faq#captiveportal ).  

One thing I'm curious about is whether a solution that we might implement in extension JS + XPCOM would be readily landable in Mozilla itself, or whether porting it over to native code would be necessary.
Captive portal detection using a server-side probe was implemented for B2G in bug 752982. The code is in toolkit, so it could theoretically be used by Firefox. See /toolkit/components/captivedetect and nsICaptivePortalDetector.
Depends on: 752982
I think determining whether or not any particular implementation could land would first require us to agree on what, exactly, that implementation would do.

At one point, some people on the Firefox team were investigating this issue. I am not sure if that is still happening and/or if it is happening in a different bug.

Here are a few notes I have about this, from a private email conversation:

Larissa Co wrote:
> The desirable UX outcomes from my POV are:
> 1. We never show the user any certificate error page
>
> 2. We try as much as possible to preserve the URL the user was on
> (currently, a captive portal redirects the URL, and there's no way
> for the user to navigate back to the page he was on without using
> history or the back button. I'd rather find a way to preserve all
> his tabs)
>
> 3. We detect the captive portal almost immediately after the user
> connects to WiFi so that he can recognize that it's a required step
> to be connected to the Internet (for example, so that he doesn't go
> into his email and try to send mail before he's fully connected)

Some operating systems, including especially Windows, have captive portal detection already. In fact, for B2G we mostly copied their strategy.

The patch in bug 562917 uses the Windows mechanism to be notified when Windows thinks that we're affected by a captive portal. But, based on my own experience, Windows tends to tell us that only right after we join a network. It doesn't try to do the detection after its first check, in my experience. Note that this captive portal detection is what causes Windows to pop open the "This network may require authentication" popup from the system tray.

The problem with certificate errors is that they are a symptom of a captive portal that happens after that initial detection. There are multiple dimensions to the detection problem: before the user starts browsing vs. during browsing, main page vs. subresources, captive portals that redirect vs. captive portals that don't redirect, redirects to HTTPS sites vs. redirects to non-HTTPS sites.

Before the user starts browsing vs. after the user starts browsing:

It seems like the case being described above by Larissa is mostly the "before the user starts browsing" case. That is the same case that Windows (and maybe Mac) already provide APIs for. I think at least, we need to make sure that we handle the case where the user clicks on the "This network may require authentication" popup correctly. IIRC, we currently do the wrong thing because we load about:home when the user clicks the tab (when Firefox isn't running). Instead, we should be loading the captive portal's page in that case.

If we fixed that bug, then on startup we can also do a captive portal check similar to Windows, and open up the captive portal's page instead of (or in addition to?) about:home. HOWEVER, my concern is that if we pause the loading of all the tabs the user has, to do the captive portal check, then we will create a HUGE startup time (time to load home page) performance regression. So, I think instead we need to have the captive portal detection logic running in parallel with our normal attempts to load the page. In that sense then, the "before the user starts browsing" case might actually just be a very slight specialization of the "after the user starts browsing" case.

I spend a lot of time dealing with captive portals and I can tell you that it is pretty common for a captive portal to cut off your network access during a browsing session in order to make you log in again. For example, many hotels will make you log in every 24 hours. So, if you keep your browser up more than 24 hours, then some sites will just stop working (with certificate error pages and/or failed AJAX loads) until you open a tab and log in again. So, I think it is important to handle the "after the user starts browsing" case too. Perhaps it shouldn't be as high of a priority as the "before the user starts browsing case," unless we end up implementing the "before the user starts browsing" case as just a special case of this. The "after the user starts browsing" case can be split up into main-page and subresource cases.

The easiest case to handle is when the captive portal blocks the loading of the main page. If the site you are trying to visit is not HTTPS then it "just works." In the case where the site is HTTPS then you get the untrusted certificate error page. Presumably, then, we just need to determine if the untrusted certificate error is caused by a captive portal or a bad guy, and then either open a new tab automatically to sign into the captive portal or show the user some UI to allow them to do so.

The harder case to handle is when the captive portal blocks the loading of a subresource of a page. For example, you might be reading your GMail or https://mail.mozilla.com mail and then everything just stops working due to hidden untrusted certificate errors. (We don't show the error page for untrusted certificate errors caused by subresource loads.) A possible solution here is to show an infobar or something similar with a button that allows the user to open a new tab to sign into the captive portal; perhaps you guys have a better solution.


So, I would like to propose an implementation strategy:

1. Make Necko (the Gecko networking component) responsible for detecting captive portals and reporting a special error code NS_ERROR_SEEMS_TO_BE_CAPTIVE_PORTAL from networking calls.

2. When the DocShell/Window's WebProgressListener (or whatever) sees that a network load failed with NS_ERROR_SEEMS_TO_BE_CAPTIVE_PORTAL, it does its stuff. For example, if the load was for the main page, then perhaps it shows some "Hey, things aren't working, probably because you need to log into this network. Do you want to do that in a new tab?" page, and if the load was for a subresource then perhaps it shows a "Hey, things aren't working, probably because you need to log into this network. Do you want to do that in a new tab?" infobar.

3. For the tab that we open the captive portal login page in, periodically check to see if the captive portal is still blocking stuff from loading. For example, we can do this with a timer and/or based on navigation events in that tab. When we detect that the captive portal isn't blocking stuff from loading any more, then we notify all the other tabs that the captive portal isn't being a bitch anymore. Then those tabs that had the main page blocked can automatically reload the page, and those tabs that had some sub-resource blocked can change their infobar (or whatever) to have a "hey, maybe you need to refresh this page or something" message. (It would be nice to automatically reload the subresources that were blocked, but often that won't completely fix the page and it might cause more harm than benefit.)


Now, I would like to answer the question "How to tell the difference between an attacker and a captive portal." I propose that we try to handle this as follows:

1. When Necko detects any "untrusted certificate" error, it sets that connection to be in "captive portal detection only" mode. That means necko will only use that connection for requests (nsIRequests) that have a special CAPTIVE_PORTAL_DETECTION_REQUEST flag set on them. Then, instead of sending the original request (e.g. GET / from Host: facebook.com) we send the captive portal detection request (e.g. GET / from Host: mozillacaptiveportaldetection.com) and then close that bad connection. The captive portal detection request will tell us (probably) whether the connection is blocked by a captive portal or not. There are multiple cases to consider:

Case 1: The response for the captive portal request was a (302) redirect to a https:// site that has a VALID certificate that has a different ETLD+1 from mozillacaptiveportaldetection.com. This is almost definitely a "safe" captive portal.

Case 2: The response for the captive portal request was a (302) redirect to a non-HTTPS site. Probably a valid captive portal but really it could be anything.

Case 3: The response for the captive portal request was a (302) redirect to a HTTPS site with an invalid certificate. Based on my experience, probably not a captive portal.

Case 4: The response was not a (302) redirect. Very difficult to tell.

Based on my recent experience, Case 1 is by far the most common within the US and Western Europe. So, I suggest that we try to optimize things for Case 1 and/or we try to get telemetry about how common the other cases are.

Also, in cases 1-3, Necko would be able to tell the caller what the redirected-to site would be. So, if we were to show a "helper page" and/or infobar or something, we could include the link to the redirected-to site in that page. But, IMO, it is probably not safe to automatically open a new page for cases 1-3 EXCEPT for the "before the user starts browsing" case. And, especially for cases 2, 3, 4, the tab we open (if we open one at all) should probably have some kind of warning indicator to explain the risk and/or why we opened that site for the user.
(In reply to Brian Smith (:bsmith) from comment #41)
 
> Case 2: The response for the captive portal request was a (302) redirect to
> a non-HTTPS site. Probably a valid captive portal but really it could be
> anything.

Only 302? I'd think at least 307 and maybe all 3xx redirects here

You'd also need to handle HTTP 511 there now it's been specified by the IETF
(In reply to Brian Smith (:bsmith) from comment #41)

> I spend a lot of time dealing with captive portals and I can tell you that
> it is pretty common for a captive portal to cut off your network access
> during a browsing session in order to make you log in again. 

Any properly designed portal will do that, infinite credentials is not something the captive portal manager will ever want to allow. And besides no captive portal hardware has 100% availability, they all restart from time to time.
(In reply to Brian Smith (:bsmith) from comment #41)

> The patch in bug 562917 uses the Windows mechanism to be notified when
> Windows thinks that we're affected by a captive portal. But, based on my own
> experience, Windows tends to tell us that only right after we join a
> network. It doesn't try to do the detection after its first check, in my
> experience. Note that this captive portal detection is what causes Windows
> to pop open the "This network may require authentication" popup from the
> system tray.

And why does it say the network *may*? Because detection at network join time is pretty useless. Captive portals are not always placed between the client and all network access. Quite often they only mediate access from the local network to the outside (a perimeter security system). So browser access to local sites won't need any portal auth; it is only triggered by access to restricted web sites (which may be everything, everything on the internet, or everything except some sites trusted by the portal operator). It's not a binary all-or-nothing state.

The only reliable way to trigger portal auth when it's needed, and not just in case, is to handle the captive portal responses to actual user requests, and not to a test URL that does not match what the user is going to do.
Something something perfect enemy of the good something something. Also, covering the 90% case, which is the "I am in a Starbucks and just got captive portal'd" scenario, will be a huge win. We can iterate towards perfect afterwards.
The design I was imagining is below.  I think it takes an easier route around the "detecting captive portals" problem than bsmith's proposal; I'm not sure whether it has any important drawbacks.

1. detect new network connection events via the OS. I think most or maybe even all of that can be done with an nsIObserver for "network:offline-status-changed" and the nsIWifiMonitor service.  If there's a new network connection, go to step 2.  Per bsmith's design ideas, a cert warning should probably also go to step 2.

2. fire DNS, HTTP and HTTPS requests at a special purpose domain like "hijack-me.com" that exists only to give standard responses for captive portal detection purposes.  If any of these responses look abnormal, go to step 3

3. pop up a window with differently themed chrome (perhaps red decoration, or yellow-and-black warning stripes) and an explanatory title like "this network is currently interfering with your connections", and continue the hijack-me.com load in that visible window

4. Similarly to bsmith's step 3, the hijack-me.com load will be hijacked in that window, and the user can click "I agree", enter their CC details, or whatever the captive portal demands

5. if that window redirects to the real hijack-me.com, close it and declare victory.  Each time a page load, form submission or redirect completes in that window, retry the connection to hijack-me.com in the background; if it succeeds without interference, close the warning window.
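Step 5's retry loop can be sketched as a tiny state machine over the portal window's navigation events; `probe` here is a caller-supplied callable standing in for the background hijack-me.com check:

```python
# Sketch of step 5: after each completed load/submission/redirect in the
# portal window, retry the background probe; close the window once it
# succeeds. probe() is assumed to return True when hijack-me.com answers
# without interference.

def portal_window_open(events, probe):
    """Process navigation events until the probe succeeds.

    Returns the number of events handled before the portal released us,
    or None if the event stream ended while still captive."""
    for n, _event in enumerate(events, start=1):
        if probe():
            return n   # probe succeeded: close the window, declare victory
    return None        # still captive when the window was closed
```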
Whatever we do, can we please complete the check before kicking off the loads of any restored tabs? 

I see a pattern where I connect to a wifi network and start Firefox. It loads (some of) my old tabs, which all then get redirected to the portal, but the portal is broken and so it doesn't take me back to the original page everywhere, once I've logged in on one of the page instances.

We can fix this by not kicking off loads of restored sites until we've completed captive portal detection. (We should investigate whether this will lead to a user-noticeable perf decrease.)

Gerv
(In reply to Peter Eckersley from comment #46)
> The design I was imagining is below.  I think it takes an easier route
> around the "detecting captive portals" problem than bsmith's proposal; I'm
> not sure whether it has any important drawbacks.
> 
> 1. detect new network connection events via the OS. I think most or maybe
> even all of that can be done with an nsIObserver for
> "network:offline-status-changed" and the nsIWifiMonitor service.  If there's
> a new network connection, go to step 2.  Per bsmith's design ideas, a cert
> warning should probably also go to step 2.

That assumes the gateway state is a network state. It's not: as Brian Smith stated, whether requests are allowed (or not) is a property of

1. the accessed URL (any big hotel or food chain is going to have its own services and entertainment portal in a cloud soonish, with free access to those services: the browser should not bother the user by default when the user only launched it to order laundry or lunch service on the hotel cloud)

2. the time elapsed since the login to the external world: login is not eternal

Network connection detection does not address those points

> 5. if that window redirects to the real hijack-me.com, close it and declare
> victory.  Each time a page load, form submission or redirect completes in
> that window, retry the connection to hijack-me.com in the background; if it
> succeeds without interference, close the warning window.

So to avoid handling status responses, you duplicate every call to a special domain. That will only result in this domain being blackholed or blacklisted on any sane network to limit network load.
(In reply to Nicolas Mailhot from comment #48)

> 
> That assumes gateway state in a network state. It's not a network state, as
> Brian Smith stated requests being allowed (or not) is a property of
> 
> 1. the accessed URL (any big hotel or food chain is going to have its own
> services and entertainment portal in a cloud soonish, with free accesses to
> those services: the browser should not bother the user by default, when the
> user only launched in to command laundry or lunch service on the hotel
> cloud) 

I fear this case is going to be pretty hard to handle without triggering captive portal detection.  Even in bsmith's design (which after a closer re-reading seems pretty similar to what I was thinking), update pings, safe browsing queries and extension requests in the background via HTTPS would cause detection of captivity unless all of those requests are flagged for necko to not perform captive portal detection.

> 
> 2. the time elapsed between the log in to the external world: login is not
> eternal

Agreed, one definitely needs to check at cert warning time too.

> So to avoid handling status responses, you duplicate every call to a special
> domain. That will only result in this domain being blackholed or blacklisted
> on any sane network to limit network load.

You wouldn't want to send more than one tiny request to hijack-me.com or mozillacaptiveportaldetection.com, except on captive networks. As soon as you see real responses from it, stop.

Well some method will be required to know whether the user's interaction with the captive portal (which I think we agree should be in a custom window with some kind of warning UI) was "successful" and should be considered complete.  Perhaps there's a zen way of doing this without making further test requests, but that isn't clear to me yet.
Also, one other case to consider:

I've seen a few cluefully implemented captive portal networks which allow HTTPS requests to occur without interference, but grab and redirect the first HTTP request they see.

If the detection code only fires for HTTPS requests it will miss these captive portals altogether.  It may be that this situation is unusual enough and innocuous enough that we can live without special UI for it.  It would still be nicer for the user to have the portal rendered in a custom window, if that's possible.
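One way to catch this class of portal is to probe plain HTTP and HTTPS separately and compare the outcomes. A hedged sketch: the probe results are passed in as `(status, body, was_redirected)` tuples, and the expected probe body is an assumption, not any real endpoint's contract.

```python
EXPECTED_BODY = "success"   # assumed well-known probe response

def classify_network(http_probe, https_probe):
    """Each probe is (status, body, was_redirected)."""
    def hijacked(probe):
        status, body, redirected = probe
        return redirected or status != 200 or body.strip() != EXPECTED_BODY

    http_bad, https_bad = hijacked(http_probe), hijacked(https_probe)
    if not http_bad and not https_bad:
        return "open"
    if http_bad and not https_bad:
        # The subtle case described above: HTTPS passes, HTTP is grabbed.
        return "http-only-portal"
    return "captive"

print(classify_network((302, "", True), (200, "success", False)))
# -> http-only-portal
```

A browser could then decide that "http-only-portal" deserves a quieter notification than a full captive-portal interstitial.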
I like Peter's design. Per Mike's comment, let's not make the perfect the enemy of the useful. I am only a bit concerned about Step 1's performance. For example, in my experience, Firefox takes quite some time before giving up and showing me a cert warning when I am behind a captive portal. If I am right, bsmith probably knows the reason and the place in PSM/NSS to short-circuit and fail faster. Similarly, I don't know whether network:offline-status-changed is fired every time a new wifi connection is made. IIRC from the time I wrote the patch above, Firefox took quite some time to decide that the network status had actually changed to offline (since it could just be a transient network connectivity issue).
(In reply to Devdatta Akhawe [:devd] from comment #51)
> I like Peter's design. Per Mike's comment, lets not make the perfect the
> enemy of the useful. I am only a bit concerned about Step 1's performance.
> For example, in my experience, Firefox takes quite some time before giving
> up and showing me a cert warning when I am behind a captive portal. If I am
> right, bsmith probably knows the reason and the place in PSM/NSS to
> short-circuit and fail faster.

It is usually because of OCSP and I have a plan for mitigating, which I will share soon on dev-tech-crypto and/or dev-security.

As was pointed out above, bug 752982 implemented nsICaptivePortalDetector, which does something similar to what peter suggested above. If that is not good enough, we should file more specific bugs regarding its deficiencies. But, I think nsICaptivePortalDetector may be good enough to enable people to move forward on fixing the other bugs that are "blocked" on this bug. Note these more specific bugs already on file:

Bug 878566 for designing the UI for captive portal login.
Bug 816866 for triggering captive portal detection in response to certificate errors.
Bug 728658 for triggering the captive portal UI for the HTTP 511 response code.
Bug 604975 about showing the captive portal login when the user starts (switches to) Firefox from the Windows captive portal system tray popup notification.
Bug 555551 about how captive portals make session restore suck.

A lot of the complexity of what I suggest in comment 41 is to provide what I think is a good UX. For example, when you try to load https://example.org in a tab, and that causes us to kick off the captive portal UX (a special new tab or window, or whatever), what does the tab that was trying to load https://example.org display? How does the docshell for that tab know it needs to automatically reload itself when it is notified (in whatever way) that the captive portal has been bypassed? That all depends on bug 878566.

If we can show an error page that says "reload this page after you've logged into the captive portal" in that tab, and just rely on the user to manually reload the tab, then we can solve most of this problem in a much simpler way: Just modify nsDocShell to use nsICaptivePortalDetector in response to various networking errors (certain cert errors and/or HTTP 511 responses), and if the nsICaptivePortalDetector returns "yep, there's a captive portal" then fire an Observer event that the UI later listens for, and then show a "reload this page after you log into the captive portal" error page. Until we have a native UI, an addon could provide a captive portal UX until we design a native one for Firefox. (Perhaps the developers of such addons could work with our UX team to prototype the eventual Firefox UX.)
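The simpler flow described above could look roughly like this. The names here are illustrative stand-ins, not Gecko interfaces; the real detector is nsICaptivePortalDetector and the real notification mechanism is the observer service.

```python
# Errors that should trigger a portal check (per the comment above:
# certain cert errors and HTTP 511 responses).
TRIGGER_ERRORS = {"cert-error", "http-511"}

def handle_load_error(error, detector_says_portal, notify):
    """On a qualifying error, consult the detector; if it confirms a
    portal, fire an observer-style event and pick the special error page
    that tells the user to reload after logging in."""
    if error in TRIGGER_ERRORS and detector_says_portal:
        notify("captive-portal-detected")
        return "portal-login-error-page"   # "reload after you log in"
    return "generic-error-page"

events = []
page = handle_load_error("http-511", True, events.append)
print(page, events)   # -> portal-login-error-page ['captive-portal-detected']
```

An add-on (or later, native UI) would listen for the notification and open the login window, while the tab just shows the error page.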
See Also: → 878566
(In reply to Peter Eckersley from comment #49)

> Well some method will be required to know whether the user's interaction
> with the captive portal (which I think we agree should be in a custom window
> with some kind of warning UI) was "successful" and should be considered
> complete.

If you implement portal detection properly (not via connections to a special domain, but by handling errors on real connections) you will detect success when a new connection to a previously blocked URL succeeds.

The correct algorithm is:

1. store a list of all the requests interrupted by a captive portal since connecting to the current network (on startup, or when portal auth expires, you may have *many requests* in flight before you detect a captive portal)

2. identify captive portals by the redirection target (for 3xx cases) or by the portal URL in the 511 error response. On a sufficiently big network, external accesses can be mediated by several gateways (for load balancing, availability, or because some web sites require special handling). It's no good retrying requests blocked by gateway B when you've cleared gateway A. On a 511 answer, if the portal URL uses https, use the portal certificate as the portal identifier. In DNS hijacking cases you can only have one gateway.

3. when the portal answers 403, that's too bad; it's no use retrying this particular request until the network changes

4. open one special window with restricted privileges for each detected captive portal. Mark each window with the portal id so SSO robots that do window scraping don't misidentify portal forms

5. as soon as interaction in one of those portals succeeds (the browser POSTed something and the portal answered with a success code), retry all the requests that were interrupted by this particular portal. If they succeed, or are interrupted by another portal, close this portal window and ask the user whether Firefox should remember the credentials sent to this portal, either for this session or forever (the trust level is a per-portal decision; the default save duration may be a preferences setting). If they don't succeed and are still interrupted by the same portal, the user didn't finish this particular portal's flow, so keep the portal window open and let the user continue. Repeat until it works.

6. later on, on the same network, when the same portal interrupts a request again, and if the user agreed, silently POST the same answer that cleared the previous block. If this POST does not succeed, invalidate the saved info and handle the portal as a new (unknown) portal

This should be correct enough to handle most portal configurations, from the simplest to the most complex. Since portal clearing results in a simple retry of the previous unmodified requests, the security risks should be minimal. The portal cannot inject any info into non-portal windows. The portal windows are isolated, and the browser can execute them in a restricted environment with the necessary chrome to make the user aware they are portal interactions.
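The bookkeeping in steps 1-5 might be modeled like this. The class and method names are invented for the sketch; the point is that interrupted requests are queued per portal id, and clearing one portal retries only that portal's queue.

```python
class PortalTracker:
    def __init__(self):
        self.blocked = {}                    # portal id -> list of URLs

    def on_interrupted(self, portal_id, url):
        """Steps 1-2: record a request blocked by a specific portal."""
        self.blocked.setdefault(portal_id, []).append(url)

    def on_portal_cleared(self, portal_id, retry):
        """Step 5: retry every request held for this portal; keep any
        that are still blocked (the user may not be done yet)."""
        still_blocked = [u for u in self.blocked.pop(portal_id, [])
                         if not retry(u)]
        if still_blocked:
            self.blocked[portal_id] = still_blocked
        return still_blocked

tracker = PortalTracker()
tracker.on_interrupted("portalA", "https://mail.example")
tracker.on_interrupted("portalB", "https://news.example")
left = tracker.on_portal_cleared("portalA", retry=lambda url: True)
print(left, sorted(tracker.blocked))   # -> [] ['portalB']
```

Clearing portalA does not touch portalB's queue, which is the multi-gateway property step 2 argues for.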
(In reply to Nicolas Mailhot from comment #53)

> 1. store a list of all the requests interrupted by a captive portal request
> since connexion to the current network 

(and actually, in the case of an HTTP 511 portal with an identifying certificate, there is no risk of portal spoofing, so you do not need to attach the portal to a specific network. When a location offers both wired and wireless access they are often gated via the same portal, so if the user switches connections it should not matter)
(In reply to Peter Eckersley from comment #49)
> (In reply to Nicolas Mailhot from comment #48)
> 
> > 
> > That assumes gateway state in a network state. It's not a network state, as
> > Brian Smith stated requests being allowed (or not) is a property of
> > 
> > 1. the accessed URL (any big hotel or food chain is going to have its own
> > services and entertainment portal in a cloud soonish, with free accesses to
> > those services: the browser should not bother the user by default, when the
> > user only launched in to command laundry or lunch service on the hotel
> > cloud) 
> 
> I fear this case is going to be pretty hard to handle without triggering
> captive portal detection.

It is not hard to handle if you don't try to trigger the captive portal with special requests, but only react to responses to actual user requests. If a portal blocks some browser background processing (extension updates, site reputation, etc.) the user should not be bothered; the requests should be put on hold and only retried once the user accesses and clears the portal for their own reasons.
(In reply to Nicolas Mailhot from comment #53)

> 5. as soon as interaction in one of those portals succeeds (the browser
> post-ed something and the portal answered with a success code) retry all the
> requests that were interrupted by this particular portal.

Another clear possibility is that the user refreshed or retried one of the blocked requests, and it passed. In that case all the other requests on hold for this portal should be retried, and cleared from the portal list if they succeed (this can occur when the user performed the portal dance out of band, for example in another browser instance)
(In reply to Nicolas Mailhot from comment #53)
> 2. identify captive portals by the redirection target (for 3xx cases)

With your suggested technique, how would we tell the difference between a redirect to a captive portal page and a redirect done by the origin website?

I do not see the problem with the way nsICaptivePortalDetector does things: making a connection to a domain we know will never redirect. It is very much like what other captive portal detection systems do. It is true that we have to be careful not to send too much traffic to that domain, but if we only do it in the cases listed in the "blocks" list then I suspect it won't be very much traffic at all.
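The "domain that never redirects" check amounts to comparing the probe response against a known-good answer. A sketch under assumed values: the probe URL and expected body below are placeholders, not Mozilla's real endpoint.

```python
PROBE_URL = "http://probe.example.com/check"   # assumed never-redirecting URL
EXPECTED = "success"                           # assumed well-known body

def classify_probe(final_url, status, body):
    """Classify the network from the probe's final URL, status, and body."""
    if final_url == PROBE_URL and status == 200 and body.strip() == EXPECTED:
        return "open"
    if final_url != PROBE_URL or status in (301, 302, 303, 307, 511):
        # We were redirected elsewhere, or got an explicit 511: captive.
        return "captive"
    # e.g. a transient 5xx from the probe server: treat as inconclusive.
    return "unknown"

print(classify_probe("http://portal.hotel.example/login", 302, "<html>"))
# -> captive
```

Because the expected answer is fixed, a single tiny request settles the question, which is why the traffic concern only bites on networks that keep intercepting it.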
(In reply to Brian Smith (:bsmith) from comment #57)
> (In reply to Nicolas Mailhot from comment #53)
> > 2. identify captive portals by the redirection target (for 3xx cases)
> 
> With your suggested technique, how would we tell the difference between a
> redirect to a captive portal page and a redirect done by the origin website?

That's why the IETF standardized error 511: you can't know for sure and you're in heuristics land. The heuristics that could be used:

1. the redirect target is a captive portal you've already detected by other means
2. the redirect occurs on an https access
3. many unrelated browser tabs suddenly redirect to the same target
4. a web site request to a public IP redirects to a private-IP target
5. the emitter of the redirect is in a different address class than the previous answer you just got from the web site
6. a 'safe' site redirects,

The best solution is probably a mix of all of the above. The simplest is probably to perform a request to a safe site on network change, note the target as 'probably a portal' if it redirects somewhere else, do nothing more, and then trigger the special portal handling logic if the same redirect occurs on an actual user request. (That does not mean you can skip the rest of the error handling for the other portal cases: this solution is not completely reliable, and a network operator that emits 511, for example, is likely to special-case your special URL to avoid network load.)

> It is
> true that we have to be careful to not send too much traffic to that domain,
> but if we're only doing it in the cases that are listed in the "blocks" list
> then I suspect it won't be very much traffic at all.

On a private corporate network, where people work all day on internal webapps and only occasionally on external sites, that won't be little traffic at all.
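The heuristics listed above could be combined into a single score rather than applied one at a time. The signal names, weights, and threshold below are arbitrary illustrative choices, not anything specified by the thread.

```python
# Weighted signals corresponding to heuristics 1-6 above.
WEIGHTS = {
    "target_is_known_portal": 3,   # heuristic 1
    "redirect_on_https": 2,        # heuristic 2
    "many_tabs_same_target": 2,    # heuristic 3
    "public_to_private_ip": 2,     # heuristic 4
    "different_address_class": 1,  # heuristic 5
    "safe_site_redirected": 2,     # heuristic 6
}

def portal_score(signals):
    """Sum the weights of the signals observed for a given redirect."""
    return sum(WEIGHTS[s] for s in signals)

def looks_like_portal(signals, threshold=3):
    return portal_score(signals) >= threshold

print(looks_like_portal({"redirect_on_https", "safe_site_redirected"}))
# -> True
```

Tuning the threshold trades false positives (flagging an origin site's own redirect) against missed portals, which is the trade-off the 511 status code was meant to eliminate.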
(In reply to Brian Smith (:bsmith) from comment #57)
> (In reply to Nicolas Mailhot from comment #53)
> > 2. identify captive portals by the redirection target (for 3xx cases)
> 
> With your suggested technique, how would we tell the difference between a
> redirect to a captive portal page and a redirect done by the origin website?
> 

I am /not/ advocating doing things this way, but one conceivable heuristic would be:

1. Instrument a large number of browsers to observe unexpected 30x redirects away from popular homepages.  Perhaps with the permission of those popular homepages.
2. The set of domains that are redirect destinations from mozilla.org, google.com, facebook.com, twitter.com, etc. are the captive portal domains.
3. Ship the entire list of known captive portals in the browser.

The downside is that you need to ship probably 100 KB - 1 MB of compressed captive portal domain names. The upside is that you could tell future captive portal implementors: "sorry, captive portals were a terrible idea; we whitelisted the ones that existed before 2013, but they're going away by 2020 and you can't make new ones" ;).
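At load time, step 3 of this scheme would reduce to a membership check on the redirect target's host against the shipped list. A sketch; the domains below are made-up examples, and a real implementation would likely use a compressed structure rather than a plain set.

```python
# Hypothetical shipped list of known portal domains.
KNOWN_PORTALS = {"wifi.hotel.example", "portal.airport.example"}

def redirect_is_known_portal(location_url):
    """Extract the host from a redirect Location and check the list."""
    host = location_url.split("//", 1)[-1].split("/", 1)[0].lower()
    return host in KNOWN_PORTALS

print(redirect_is_known_portal("http://wifi.hotel.example/login?next=x"))
# -> True
```

As the next comment points out, this only covers portals that actually redirect; transparently intercepted TCP connections never expose a portal domain to match against.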
Nope, won't work. I write router software for a living. Some captive portals are not recognizable by their URLs or even their IP addresses, because they're not redirects at all. The TCP connection is intercepted, sometimes modified on the fly. And yes, TLS won't work.

Most captive portals use a fake redirect though, like what you see in Wifi access points. Whether TLS is blocked or not, depends on the network operator. It's also possible that you receive different results for different URLs.
(In reply to Nicolas Mailhot from comment #58)
> (In reply to Brian Smith (:bsmith) from comment #57)
> > (In reply to Nicolas Mailhot from comment #53)
> > > 2. identify captive portals by the redirection target (for 3xx cases)
> > 
> > With your suggested technique, how would we tell the difference between a
> > redirect to a captive portal page and a redirect done by the origin website?
> 
> That's why the ietf normalized error 511: you can't know for sure and you're
> in heuristics land. The heuristics that may be used would be:

> 6. a 'safe' site redirects,

7. you get loads of redirects to an IP associated with a proxy in the loaded proxy.pac (sometimes the simplest heuristics are the best)
Depends on: 1048131
Blocks: 878566
Whiteboard: [captive portal][lame-network] → [captive portal][lame-network][necko-active]
Assignee: nobody → valentin.gosu
Duplicate of this bug: 583228
Duplicate of this bug: 664266
Duplicate of this bug: 688804
Duplicate of this bug: 863911
This depends on the front end implementing the user-facing part of this feature.
Assignee: valentin.gosu → nobody
Whiteboard: [captive portal][lame-network][necko-active] → [captive portal][lame-network][necko-backlog]
The frontend captive portal work that we are tracking includes the dependencies of bug 878566. Do we still need this bug? If not, do you want to turn it into a tracking bug?
(In reply to Panos Astithas [:past] from comment #69)
> The frontend captive portal work that we are tracking includes the
> dependencies of bug 878566. Do we still need this bug? If not, do you want
> to turn it into a tracking bug?

Yes, I think this should be a tracking bug. A lot of bugs are chained to this one and we should try to preserve that connection.
Keywords: meta
I think if the primary need here is to coordinate the front end with necko, Jason should be assigned. We don't want to drop this ball.

Valentin, there is more work we can do here (in other bugs that might not be filed yet), right? Like pausing HTTPS loads when we've confirmed we're behind a captive portal, etc. We should get out in front of that.
Assignee: nobody → jduell.mcbugs
Whiteboard: [captive portal][lame-network][necko-backlog] → [captive portal][lame-network][necko-active]
See Also: → 989193
Valentin will add "retest on cert failure" to the service, and then this is believed to be ready for the front end.
Assignee: jduell.mcbugs → valentin.gosu
Whiteboard: [captive portal][lame-network][necko-active] → [captive portal][lame-network][necko-next]
Front end bug is bug 989193 - Open captive portal automatically in a new tab when detected
Depends on: 1272587
Depends on: 1280599
No empty tab, and no indication other than a bad cert warning on every open tab when attempting to connect to WiFi on Scot Rail and Virgin Trains in Scotland and England this week. Using Firefox 49.0a2 (Dev. Ed.) and FF for iOS 4.0(11).

Also no way to open a tab and get to the portal authentication page.  Had to use Safari or Chrome on OS X and Safari on iOS.
Bug 989193 only landed in 50, so Developer Edition is not expected to work yet.
Depends on: 1303775
Blocks: 801278
Depends on: 1349286
Whiteboard: [captive portal][lame-network][necko-next] → [captive portal][lame-network][necko-backlog]
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P1 → P3
I notice Firefox now shows a notification below the address bar when a captive portal is detected. This has been working very well for me. This notification always appears as expected and when I follow the instructions, I'm always able to log into the network successfully.

Is this done?
Indeed, this was implemented, and this bug is now used as a meta bug to track other features.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Summary: implement captive portal detection → [meta] implement captive portal detection
You need to log in before you can comment on or make changes to this bug.