Closed Bug 1320088 Opened 8 years ago Closed 8 years ago

[Captive Portal] Detection fails when FF is open (in another connection) prior to engaging CP able Wifi

Product/Component: Core :: Networking
Type: defect
Priority: Not set
Severity: major
Status: RESOLVED WORKSFORME
Tracking Status:
firefox51 --- unaffected
firefox52 --- verified
firefox53 --- verified
Reporter: aflorinescu
Assigned: valentin
Blocks: 1 open bug
Whiteboard: [necko-active]
Attachments: 2 files

[Environment]:
OSes:
100% repro - Ubuntu 16.04
60% repro  - Windows 10 x64
NOT repro  - Mac OSX 10.10 and Mac 10.12

53.0a1 Build ID  20161121074938   (try-build with debug enabled)

[Description]: Captive Portal detection fails when FF is already started at the time of Wifi connection.


[Steps]:
    1. Connect to WAN (or any other direct connection).
    2. Start FF and browse a few pages.
    3. Disconnect all network connections.
    4. Enable wifi and attempt connecting to a Captive Portal SSID (do not finalize the connection by actually logging in).
    5. Focus FF or try to refresh any page.

[Actual Result]:
    The Captive Portal detection doesn't happen.
    Console logs: http://pastebin.com/rFG6wYQJ (the *** markers separate the two logs: the first is for the case specified in [Steps]; the second is for successful detection, where FF is opened after the wifi connection is initialized).

[Expected Result]:
    Detection should be consistent. On OSX, detection always succeeds with [Steps]; on Ubuntu the failure reproduces 100% of the time, and on Windows 10 60-70% of the time.


Note: Another interesting log is this one (from Windows 10).
After multiple disconnects and reconnects to and from the wireless CP network, I got into this loop: http://pastebin.com/daBXVg6a . I'm not sure exactly how I got it stuck; most likely it happened while trying to reproduce this bug. If I manage to get a clear STR for it, I'll log a different issue.
Severity: normal → major
Valentin, is this a necko bug?
Flags: needinfo?(valentin.gosu)
Thanks for the logs.

The second log: http://pastebin.com/daBXVg6a
From what I can tell from the logs, the connection times out. So either the Wifi quality isn't good enough, or the requests are blocked by the captive portal. I don't think this one is a problem.


The first log: http://pastebin.com/rFG6wYQJ
************* FF already started when disc. wan and connecting wireless: CP detection fails
-*- CaptivePortalDetector component: received content: success
-*- CaptivePortalDetector component: sendEvent "false"

This means the check actually completed successfully. I think detection failed because it was actually already logged into the CP from an earlier session. Maybe. 

************* FF startup with wan already disc. and wireless connected: CP detection successful.
This is how it should work.

Adrian, can I ask you to check a few things?
1. After step 5, does the loading of pages actually work, or do you get redirected to the login page but the UI doesn't detect the portal?
2. After step 5, could you also try opening Chrome or another browser, and see if you can load http://detectportal.firefox.com/success.txt
3. Wait for about 5 minutes. shift-reload a page every minute. See if the captive portal detection happens after 5 minutes.
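
The check in item 2 can also be reproduced outside any browser. The following is a minimal sketch, not Firefox's implementation: it assumes the canonical URL and the expected body ("success") that appear in the logs in this bug, and the names `classify_probe` and `probe` are hypothetical.

```python
import urllib.request

# Values taken from the CaptivePortalDetector log lines in this bug.
CANONICAL_URL = "http://detectportal.firefox.com/success.txt"
EXPECTED_BODY = "success"


def classify_probe(body, final_url, error=None):
    """Classify one probe result (hypothetical helper, not a Firefox API)."""
    if error is not None:
        # Timeout, DNS failure, refused connection: nothing came back.
        return "no-response"
    if final_url == CANONICAL_URL and body.strip() == EXPECTED_BODY:
        # The canonical content arrived untouched.
        return "not-captive"
    # Redirected elsewhere, or the body was replaced (e.g. a login page).
    return "captive"


def probe(timeout_s=5.0):
    """Fetch the canonical URL once and classify the outcome."""
    try:
        with urllib.request.urlopen(CANONICAL_URL, timeout=timeout_s) as resp:
            body = resp.read(1024).decode("utf-8", errors="replace")
            return classify_probe(body, resp.geturl())
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        return classify_probe("", "", error=exc)
```

Calling `probe()` while connected to the unauthenticated CP wifi would distinguish a portal that intercepts the request ("captive") from requests that simply never complete ("no-response"), which is the distinction the questions above are trying to draw.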

Thanks!
Flags: needinfo?(valentin.gosu) → needinfo?(adrian.florinescu)
Hello Valentin,
Sorry for the long wait.

Windows:

I can see a visible improvement in the detection process on Windows on a Nightly build. The comment 0 steps are not reproducible anymore. I can see the detection in the console as soon as I connect to the Captive Portal wireless connection (even when FF is minimized).

Side note:
On the debug-enabled build I try-pushed for testing, FF hangs and enters a Not Responding state after a few minimize/restore cycles and switching between 2-3 tabs. I didn't manage to get a crash report, and the console stops at:
-*- CaptivePortalDetector component: sendEvent "true" (usually this line gets refreshed over and over again in the console). No memory or processor consumption is visible.
Given that my try-push build runs normally on a connected wi-fi, I'm assuming something tricky happens, but this doesn't happen on a stock Nightly, so we can waive this behavior as not relevant.

Linux:

Unfortunately I can't give you any update yet, since I am having issues with my wireless setup on the machine I was testing on. As soon as I get it working and I can test, I'll reply on this bug.
Linux update:

> 1. After step 5, does the loading of pages actually work, or do you get
> redirected to the login page but the UI doesn't detect the portal?
Pages appear to be loading; no redirect to the login page, no portal detection.

> 2. After step 5, could you also try opening another Chrome or another
> browser, and see if you can load http://detectportal.firefox.com/success.txt
In chrome, I get redirected to the CP authentication page.

> 3. Wait for about 5 minutes. shift-reload a page every minute. See if the
> captive portal detection happens after 5 minutes.
No detection. See log below.

To sum up:
- if FF is opened prior to enabling Wifi, no detection: https://pastebin.mozilla.org/8952608
- for the steps in comment 0, no detection: https://pastebin.mozilla.org/8952607
- detection happens successfully if FF is opened after the wifi is enabled.

Note:
Although both logs are almost identical, I've added them both because the comment 0 one contains a "NetworkError when attempting to fetch resource.", but I don't know if that's generated by the attempt to fetch addons.
Flags: needinfo?(adrian.florinescu) → needinfo?(valentin.gosu)
Assigning valentin for now since QE would like this in 52.
Assignee: nobody → valentin.gosu
Component: General → Networking
Product: Firefox → Core
(In reply to Adrian Florinescu [:AdrianSV] from comment #4)
> Linux update:
> 
> > 1. After step 5, does the loading of pages actually work, or do you get
> > redirected to the login page but the UI doesn't detect the portal?
> Pages appear as to be loading, no redirect to the login page, no portal
> detection.

This is very strange. If the pages load, that suggests that the captive portal isn't actually intercepting the requests. PS: make sure to always try this with HTTP pages. Some captive portals just let HTTPS requests proceed as normal.
It might be useful if you could find out the maker of the captive portal.

> > 2. After step 5, could you also try opening another Chrome or another
> > browser, and see if you can load http://detectportal.firefox.com/success.txt
> In chrome, I get redirected to the CP authentication page.



> > 3. Wait for about 5 minutes. shift-reload a page every minute. See if the
> > captive portal detection happens after 5 minutes.
> No detection. See log below.
> 
> To sumup:
> -if FF is opened prior to Wifi enable, no detection.
> https://pastebin.mozilla.org/8952608

This log suggests that the detection times out. I can't figure out why.

> -for steps in comment 0 -no detection: https://pastebin.mozilla.org/8952607

It seems all of the requests time out. I am unsure why it should work after restarting the browser.

> for comment 0
> -detection happens successful if FF is opened after the wifi is enabled.
> 
> Note:
> Although the both logs are almost identical, I've added them both because in
> the comment0 one, there is a "NetworkError when attempting to fetch
> resource." , but I don't know if that's generated by the attempt to fetch
> addons.

Trying to update the addons also seems to time out. Again, I'm unsure why. Let me know where you have encountered this captive portal, and if possible the maker, and I will try to reproduce and figure out why it behaves like this.
Flags: needinfo?(valentin.gosu) → needinfo?(adrian.florinescu)
(In reply to Valentin Gosu [:valentin] from comment #6)
> (In reply to Adrian Florinescu [:AdrianSV] from comment #4)
> > Linux update:
> > 
> > > 1. After step 5, does the loading of pages actually work, or do you get
> > > redirected to the login page but the UI doesn't detect the portal?
> > Pages appear as to be loading, no redirect to the login page, no portal
> > detection.
> 
> This is very strange. If the pages load, that suggests that the captive
> portal isn't actually intercepting the requests. PS. make sure to always try
> this with HTTP pages. Some captive portals just let HTTPS request proceed as
> normal.
> It might be useful if you could find out the maker of the captive portal.
The pages do not load; they show the loading animation, but they shouldn't load, since the Captive Portal wi-fi is just connected but not authenticated, so there is no internet connection.
 
> > > 2. After step 5, could you also try opening another Chrome or another
> > > browser, and see if you can load http://detectportal.firefox.com/success.txt
> > In chrome, I get redirected to the CP authentication page.
> 
> 
> 
> > > 3. Wait for about 5 minutes. shift-reload a page every minute. See if the
> > > captive portal detection happens after 5 minutes.
> > No detection. See log below.
> > 
> > To sumup:
> > -if FF is opened prior to Wifi enable, no detection.
> > https://pastebin.mozilla.org/8952608
> 
> This log suggests that the detection times out. I can't figure out why.
> 
> > -for steps in comment 0 -no detection: https://pastebin.mozilla.org/8952607
> 
> It seems all of the requests time out. I am unsure why it should work after
> restarting the browser.

Same as the previous comment: the Captive Portal wi-fi is just connected, but not authenticated.
Note: nowhere in this bug do I authenticate to the wi-fi. All steps are prior to authentication, therefore no internet connection is available.

> > for comment 0
> > -detection happens successful if FF is opened after the wifi is enabled.
> > 
> > Note:
> > Although the both logs are almost identical, I've added them both because in
> > the comment0 one, there is a "NetworkError when attempting to fetch
> > resource." , but I don't know if that's generated by the attempt to fetch
> > addons.
> 
> Trying to update the addons also seems to time out. Again, I'm unsure why.
> Let me know where you have encountered this captive portal, and if possible
> the maker, and I will try to reproduce and figure out why it behaves like
> this.
I find it normal that trying to update the addons times out, since there is no internet connectivity (the CP is not authenticated).
I will try to find out the maker, but I do not believe that is relevant, since the same setup works fine on both Windows and OSX, so my assumption is that it is something related to Linux, in my case Ubuntu 16.04. I am almost sure that you will be able to replicate this issue on a Captive Portal wifi with a Linux distro.
Flags: needinfo?(adrian.florinescu)
(In reply to Adrian Florinescu [:AdrianSV] from comment #7)
> I am almost
> sure that you will be able to replicate this issue on a Captive Portal wifi
> with a Linux distro.

I use Gnome Ubuntu 16.10 on my primary laptop, and it worked flawlessly for me, including at the work week.
So from what I can understand, the problem is that all connections time out if the browser is opened before connecting to the CP Wifi. That may be related to the CP, or may be an unrelated networking issue.

Are you able to gather more logs? https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging#Using_aboutnetworking
Paste this in the log modules box:
timestamp,sync,nsHttp:5,nsSocketTransport:5,nsStreamPump:5,nsHostResolver:5,CaptivePortalService:5
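
For readers unfamiliar with the format, the log modules string above is a comma-separated list: entries of the form `name:level` select a logging module and its verbosity, and bare entries (`timestamp`, `sync`) are options. The sketch below only illustrates that structure; `parse_log_modules` is a hypothetical helper, not part of Firefox.

```python
def parse_log_modules(spec):
    """Split a log modules string into bare options and module levels."""
    options = []
    levels = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, level = item.partition(":")
        if sep:
            # "nsHttp:5" -> module "nsHttp" at verbosity 5
            levels[name] = int(level)
        else:
            # "timestamp", "sync" -> options without a level
            options.append(name)
    return options, levels


SPEC = ("timestamp,sync,nsHttp:5,nsSocketTransport:5,nsStreamPump:5,"
        "nsHostResolver:5,CaptivePortalService:5")
options, levels = parse_log_modules(SPEC)
```

Here every requested module is set to level 5, so the resulting logs cover HTTP, socket transport, stream pump, DNS resolution and the captive portal service together.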
Attached file log.txt-child.6167
Attached file log.txt-main.5550
Used a different machine: Ubuntu 14.04 x64

Comment 0 steps (about:networking, wikipedia.org and youtube opened from WAN).
The attached logs start after the pages are fully loaded on WAN and after the WAN is disconnected.

Basically the same behavior for me: no redirect to canonical, no CP notification bar. I've kept FF open for a few minutes, refreshing the pages.

After restarting FF, CP detected and CP notification bar displayed.
Whiteboard: [necko-active]
Depends on: 1320087
The try build that fixes 1320087 had no effect on this bug:

wifi disconnected:
-*- CaptivePortalDetector component: sendEvent "true"
-*- CaptivePortalDetector component: abort for captive-portal-inteface
-*- CaptivePortalDetector component: detach HttpObserver for login activity
-*- CaptivePortalDetector component: sendEvent "{"type":"captive-portal-login-abort","id":"0"}"
-*- CaptivePortalDetector component: remove running request

wi-fi reconnected:
-*- CaptivePortalDetector component: enter applyDetection(captive-portal-inteface)
-*- CaptivePortalDetector component: finish preparation phase for interface "captive-portal-inteface"
-*- CaptivePortalDetector component: startDetection {site=http://detectportal.firefox.com/success.txt,content=success
,time=5000}
-*- CaptivePortalDetector component: retry-Detection: 1/5
-*- CaptivePortalDetector component: startDetection {site=http://detectportal.firefox.com/success.txt,content=success
,time=5000}
-*- CaptivePortalDetector component: retry-Detection: 2/5
-*- CaptivePortalDetector component: startDetection {site=http://detectportal.firefox.com/success.txt,content=success
,time=5000}
-*- CaptivePortalDetector component: retry-Detection: 3/5
-*- CaptivePortalDetector component: startDetection {site=http://detectportal.firefox.com/success.txt,content=success
,time=5000}
-*- CaptivePortalDetector component: retry-Detection: 4/5
-*- CaptivePortalDetector component: startDetection {site=http://detectportal.firefox.com/success.txt,content=success
,time=5000}
-*- CaptivePortalDetector component: retry-Detection: 5/5
-*- CaptivePortalDetector component: startDetection {site=http://detectportal.firefox.com/success.txt,content=success
,time=5000}
-*- CaptivePortalDetector component: callback executed
-*- CaptivePortalDetector component: remove running request
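
The "wi-fi reconnected" log above shows one initial request followed by five retries before the detector gives up. The following sketch reproduces only that control flow as it appears in the log; it is not the detector's actual code, `run_detection` and `probe` are hypothetical names, and treating `time=5000` as a per-attempt timeout in milliseconds is an assumption.

```python
def run_detection(probe, max_retries=5):
    """Run one initial attempt plus up to max_retries retries.

    probe() is a caller-supplied function returning True when a
    response arrived and False when the attempt timed out.
    """
    log = []
    for attempt in range(max_retries + 1):
        if attempt > 0:
            log.append(f"retry-Detection: {attempt}/{max_retries}")
        log.append("startDetection")
        if probe():
            return True, log
    # All attempts timed out: the log above ends with this callback.
    log.append("callback executed")
    return False, log


# Every attempt times out, as in the log from the failing scenario.
detected, log = run_detection(lambda: False)
```

Under these assumptions, a failing run issues six requests in total, so with a 5000 ms timeout per attempt the detector would spend roughly 30 seconds before giving up.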
Hi Adrian,

I've looked at the logs some more, and I don't think this is a captive portal issue, but probably a DNS issue.
To confirm this, can you turn off CP detection by setting network.captive-portal-service.enabled to false, and then attempt the steps in comment 0 again? 
If you still encounter issues navigating and logging into the CP, we can start to narrow down the actual bug here.
Following up comment 13,
I've set network.captive-portal-service.enabled to false and redid the steps from comment 0 with the try build from bug 1320087 comment 6. After step 5 from comment 0 of this bug, the open pages finally time out after a few minutes with "Secure Connection Failed". I'm not redirected to the captive portal login page, although I'm not sure I'm supposed to be. When I open a new page and try to access any link, I'm sent to the Captive Portal login page. Every new page redirects to the login for the SSID wifi network (except the pages already opened in the other wifi connection).

What's puzzling for me is that I still get (with the pref false) {site=http://detectportal.firefox.com/success.txt,content=success
,time=5000} -*- CaptivePortalDetector component: retry-Detection: 1/5 ... sections in the logs.

I'm not sure I understand why we would reduce this to a DNS issue, since we have no problems when testing on Win/OSX environments, but in order to make sure, I will try to retest the above with several additional wi-fi SSIDs. (Is it OK if I use the try build that we used for bug 1320087?)

(Please note that for all the steps, for the target Captive Portal wifi, I am not finishing the login process at any point, I'm just interested to get to the SSID login page).
I've re-run the comment 0 scenario on Ubuntu, Windows 10 and Mac OSX 10.10 using the latest Nightly and Aurora builds (01.18).
The improvement from comment 3 has regressed, and the same behavior as on Linux is now present on Windows 10 with 100% reproducibility: no detection when Firefox is open prior to connecting to the captive portal SSID. Mac OSX 10.10 still has no problem and is still not affected by this bug.

I've also excluded at this point the assumption from comment 13 that the Captive Portal SSID I was testing on had DNS issues: I've tested on 4 separate public Captive Portal enabled SSIDs (one of which was hosted by Orange and one by Vodafone).

I will try tomorrow to come back with a regression range for the Windows regression.

Based on the fact that this bug is currently affecting Windows again, I'm updating the severity to blocker.
Severity: major → blocker
User Story: (updated)
Flags: needinfo?(valentin.gosu)
Flags: needinfo?(nhnt11)
When I started to search for a regression range, I realized an error I made last night.

(In reply to Adrian Florinescu [:AdrianSV] from comment #15)
> I've re-run the comment 0 scenario on Ubuntu / Windows 10 and Mac OSX 10.10
> using Latest Nightly and Aurora builds (01.18).
> We've regressed comment 3 and now the same behavior as for Linux is present
> for Windows 10 with 100% reproducibility: no detection when firefox is open
> prior to connecting to the SSID captive portal. - MacOSX 10.10 still has no
> problem and is still not affected by this bug.

The above comment is invalid. With the comment 0 scenario, we have a verified fix for Ubuntu on both Aurora and Nightly. The Windows 10 behavior is also consistent and verified on both Aurora and Nightly with the comment 0 scenario.

However, there is a second bug that was masked by this one (if you open FF when no connection is available).

So based on the above, I will set the severity of this bug back to major, mark it as WORKSFORME, and clone this bug into one that reflects the current state.
Severity: blocker → major
Status: NEW → RESOLVED
Closed: 8 years ago
User Story: (updated)
Flags: needinfo?(valentin.gosu)
Flags: needinfo?(nhnt11)
Resolution: --- → WORKSFORME
Summary: [Captive Portal] Detection fails when FF is open prior to engaging Wifi → [Captive Portal] Detection fails when FF is open (in another connection) prior to engaging CP able Wifi
Blocks: 1332271