Closed
Bug 1047258
Opened 9 years ago
Closed 7 years ago
WiFi with Captive Portal and data connection can mess up
Categories
(Firefox OS Graveyard :: Wifi, defect)
Tracking
(tracking-b2g:backlog)
RESOLVED
WONTFIX
People
(Reporter: gerard-majax, Unassigned)
Details
(Keywords: foxfood)
[Blocking Requested - why for this release]: This breaks network connectivity in real-life use.

I could get into a state where neither WiFi nor data works on my Nexus S running master, but I believe it is unrelated to the device.

STR:
0. Be in an area with imperfect captive portal coverage
1. Make sure WiFi and data are disabled
2. Enable WiFi
3. While it is requesting an IP address, enable data

Expected:
The data connection should be up and displayed in the status bar. When WiFi finally gets connected, I should get the captive portal notification. When connecting to a website, I should be redirected to the captive portal.

Actual:
I see the pending WiFi icon in the status bar. Once connected, going to a website (namely google.fr) ends in a network error/unreachable page displayed in the browser.

As soon as I disable the data connection, I can get to the captive portal by hitting the reload button. With the same STR but disabling WiFi and reloading the page, I get to google.fr on the data connection.
Updated•9 years ago
Component: RIL → Wifi
I don't think it's related to the captive portal. It's more likely that something goes wrong with the routing table in step 3, maybe a race condition while changing the routing table, because there might be async ops in step 3:

(data call) set IP info of data call -- while setting IP info -> (wifi connected) remove IP info of data call, then set IP info of wifi

Now we have two modules changing IP info at the same time, which might give an unexpected result. But I don't think it's a blocker, because it can be recovered by disabling one of the connections.
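To make the suspected interleaving concrete, here is a minimal Python sketch of it. It is a toy model only, not Gecko code: the `state` dictionary, the function names, and the `asyncio.sleep(0)` gap standing in for an asynchronous system call are all assumptions made for illustration. The DNS addresses are the ones that appear in the DNS properties quoted later in this bug.

```python
import asyncio

# Illustrative values only; these addresses appear in the DNS dump in this bug.
RMNET0_DNS = "168.95.1.1"
WLAN0_DNS = "10.247.75.5"


async def data_call_up(state, lock=None):
    """Toy data-call setup: two writes separated by an async gap."""
    if lock is not None:
        await lock.acquire()
    try:
        state["default_route"] = "rmnet0"
        await asyncio.sleep(0)  # stand-in for an asynchronous system call
        state["dns"] = RMNET0_DNS
    finally:
        if lock is not None:
            lock.release()


async def wifi_connected(state, lock=None):
    """Toy WiFi handler: replaces the data call's IP info with WiFi's."""
    if lock is not None:
        await lock.acquire()
    try:
        state["default_route"] = "wlan0"
        state["dns"] = WLAN0_DNS
    finally:
        if lock is not None:
            lock.release()


async def _run(serialize):
    state = {"default_route": None, "dns": None}
    # With serialize=True, one lock makes each handler run to completion.
    lock = asyncio.Lock() if serialize else None
    await asyncio.gather(data_call_up(state, lock), wifi_connected(state, lock))
    return state


def simulate(serialize):
    """Return the final toy network state for one run."""
    return asyncio.run(_run(serialize))
```

In this model, `simulate(False)` ends with the default route on `wlan0` but the DNS server of `rmnet0`, a mixed state, while `simulate(True)` ends with both pointing at WiFi. Whether the real modules interleave this way is exactly what remains unconfirmed here.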
How high is the reproduction rate? The behavior should be the same as quickly switching on the data call right after WiFi starts connecting, but I can't reproduce it that way so far.
Comment 4•9 years ago
(In reply to howie [:howie] from comment #3)
> may QA test what's the reproduce rate here? thanks.

Gerry - Could you assist with the QA request here? I don't think US QA has access to a captive portal setup on our side, but I think your team does.
Flags: needinfo?(gchang)
Comment 5•9 years ago
I could recreate the problem on Flame only once with the following build, and the reproduction rate is very low: I tested this more than 40 times and hit it only once. I think this is a timing issue, because we need to switch on the data call quickly right after WiFi starts connecting to the captive portal.

Gaia 54c3c19d439f7dbafda5c6cc3b4850b545a068ba
Gecko https://hg.mozilla.org/mozilla-central/rev/bd44d84142e8
BuildID 20140807160201
Version 34.0a1
Flags: needinfo?(gchang)
Comment 6•9 years ago (Reporter)
(In reply to Gerry Chang [:cfchang] from comment #5)
> I can recreate the problem on Flame only once with following build.
> And, the reproduce rate is very very low. I test this more than 40 times and
> only got once.
> I think this is about timing issue because we need quick switching on data
> call right after wifi is connecting to captive portal.
>
> Gaia 54c3c19d439f7dbafda5c6cc3b4850b545a068ba
> Gecko https://hg.mozilla.org/mozilla-central/rev/bd44d84142e8
> BuildID 20140807160201
> Version 34.0a1

And I do trigger this often enough that it is annoying my dogfooding. I'm sorry, I cannot help more than by saying that this happens at the train station with the WiFi and data connectivity available there. It can probably be triggered more easily in poor network conditions (WiFi AP with bad signal, crowded area, etc.).
Comment 7•9 years ago
We'd like to know if this is a regression. Gerry, thank you very much for this.
Flags: needinfo?(gchang)
Keywords: regressionwindow-wanted
Comment 8•9 years ago
Sorry for the wrong tag. Please run a branch test on 1.4 and 2.0, thanks.
Keywords: regressionwindow-wanted → qawanted
Comment 9•9 years ago
(In reply to Alexandre LISSY :gerard-majax from comment #6)
> And I do trigger this often enough so that it is annoying my dogfooding. I'm
> sorry, I cannot help more than saying this happens at the train station with
> the WiFi and data connectivity available there. It can probably be triggered
> more easily in not good network conditions (WiFi AP with bad signal, crowded
> area, etc.).

It is an issue that we need to fix. However, even if we fix this bug, the user still may not be able to surf the internet in such bad network conditions. So I wonder whether this is a blocker.
Comment 10•9 years ago (Reporter)
My experience shows that once connected, there is no problem surfing; it's totally usable even if not perfect.
Comment 11•9 years ago
Removing the qawanted keyword (as indicated in comment 4, we do not have access to a captive portal here). The NI to Gerry (thanks Gerry!) should be all that is necessary to get a branch check.
Keywords: qawanted
I tried to reproduce this by inserting code to enable the data connection at different steps of the WiFi connection process. I have tested at the steps of just associating [1], just connected [2], and right before running DHCP [3], but couldn't reproduce it at any of these steps.

[1] http://hg.mozilla.org/mozilla-central/file/d7e78f0c1465/dom/wifi/WifiWorker.js#l2076
[2] http://hg.mozilla.org/mozilla-central/file/d7e78f0c1465/dom/wifi/WifiWorker.js#l617
[3] http://hg.mozilla.org/mozilla-central/file/d7e78f0c1465/dom/wifi/WifiWorker.js#l627
Comment 13•9 years ago
I can't recreate this problem on 2.0, either. I used the build below:

Gaia 8b1b64ca3347e015d7a57df6d053f95cd26046ca
Gecko https://hg.mozilla.org/releases/mozilla-b2g32_v2_0/rev/2f288e8aea09
BuildID 20140813160201
Version 32.0
Flags: needinfo?(gchang)
There is a very short window, which I estimate at less than 500 ms, during which both the data connection and the WiFi interface are active, because NetworkManager makes sure WiFi is connected and then asks RIL to disable the data connection:

netcfg
> rmnet0 UP 111.81.199.208/27 0x00000041 00:00:00:00:00:00
> lo UP 127.0.0.1/8 0x00000049 00:00:00:00:00:00
> wlan0 UP 10.247.30.92/21 0x00001043 00:0a:f5:df:60:60

routing table
> Iface  Destination Gateway  Flags RefCnt Use Metric Mask     MTU Window IRTT
> wlan0  00000000    0118F70A 0003  0      0   0      00000000 0   0      0
> wlan0  0018F70A    00000000 0001  0      0   322    00F8FFFF 0   0      0
> rmnet0 C0C7516F    00000000 0001  0      0   0      E0FFFFFF 0   0      0
> rmnet0 01015FA8    D1C7516F 0007  0      0   0      FFFFFFFF 0   0      0
> rmnet0 01C05FA8    D1C7516F 0007  0      0   0      FFFFFFFF 0   0      0

DNS property
> [dhcp.wlan0.dns1]: [10.247.75.5]
> [dhcp.wlan0.dns2]: []
> [dhcp.wlan0.dns3]: []
> [dhcp.wlan0.dns4]: []
> [net.dns1]: [10.247.75.5]
> [net.dns2]: [168.95.192.1]
> [net.dnschange]: [7]
> [net.rmnet0.dns1]: [168.95.1.1]
> [net.rmnet0.dns2]: [168.95.192.1]
> [net.wlan0.dns1]: [10.247.75.5]
> [net.wlan0.dns2]: [0.0.0.0]

But the default route has already changed to WiFi by then, and the data connection interface then goes down:

netcfg
> rmnet0 DOWN 0.0.0.0/0 0x00000000 00:00:00:00:00:00
> lo UP 127.0.0.1/8 0x00000049 00:00:00:00:00:00
> wlan0 UP 10.247.30.92/21 0x00001043 00:0a:f5:df:60:60

routing table
> Iface  Destination Gateway  Flags RefCnt Use Metric Mask     MTU Window IRTT
> wlan0  00000000    0118F70A 0003  0      0   0      00000000 0   0      0
> wlan0  0018F70A    00000000 0001  0      0   322    00F8FFFF 0   0      0

DNS property
> [dhcp.wlan0.dns1]: [10.247.75.5]
> [dhcp.wlan0.dns2]: []
> [dhcp.wlan0.dns3]: []
> [dhcp.wlan0.dns4]: []
> [net.dns1]: [10.247.75.5]
> [net.dns2]: [168.95.192.1]
> [net.dnschange]: [7]
> [net.rmnet0.dns1]: []
> [net.rmnet0.dns2]: []
> [net.wlan0.dns1]: [10.247.75.5]
> [net.wlan0.dns2]: [0.0.0.0]

Since the default route and DNS do not change between the two states, captive portal detection should not be affected by the coexistence of the data connection and the WiFi interface: traffic always uses the WiFi interface once it's connected. Furthermore, captive portal detection is executed after the data connection is disabled (confirmed by adding a debug message). So I don't think the problem is caused by the window in which both interfaces are up.
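The routing-table dumps above look like `/proc/net/route` output, where the Destination, Gateway and Mask columns are hexadecimal words in little-endian byte order. The following Python sketch decodes them so the claim about the default route can be checked by eye. The helper names and the `ROWS` list (the first dump, reduced to four columns) are introduced here for illustration only.

```python
import socket
import struct


def hex_to_ipv4(hex_word):
    """Decode one little-endian hex address field into dotted-quad form."""
    return socket.inet_ntoa(struct.pack("<I", int(hex_word, 16)))


def prefix_len(hex_mask):
    """Count the set bits of a netmask field to get the prefix length."""
    return bin(int(hex_mask, 16)).count("1")


def default_route(rows):
    """Return (iface, gateway) of the first 0.0.0.0/0 entry, or None."""
    for iface, dest, gateway, mask in rows:
        if dest == "00000000" and mask == "00000000":
            return iface, hex_to_ipv4(gateway)
    return None


# (iface, destination, gateway, mask) copied from the first dump above.
ROWS = [
    ("wlan0", "00000000", "0118F70A", "00000000"),
    ("wlan0", "0018F70A", "00000000", "00F8FFFF"),
    ("rmnet0", "C0C7516F", "00000000", "E0FFFFFF"),
    ("rmnet0", "01015FA8", "D1C7516F", "FFFFFFFF"),
    ("rmnet0", "01C05FA8", "D1C7516F", "FFFFFFFF"),
]

if __name__ == "__main__":
    for iface, dest, gateway, mask in ROWS:
        print(iface, f"{hex_to_ipv4(dest)}/{prefix_len(mask)}",
              "via", hex_to_ipv4(gateway))
    print("default:", default_route(ROWS))
```

Decoded this way, the only default route is on `wlan0` via 10.247.24.1 even while `rmnet0` is still up, and the two `rmnet0` host routes point at 168.95.1.1 and 168.95.192.1, the same addresses listed as `net.rmnet0.dns1` and `net.rmnet0.dns2`.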
Another assumption is that the captive portal detector tries to send an HTTP request through WiFi to check whether a captive portal exists, and the request/response packet is lost due to poor signal, or WiFi is disconnected and the device switches to the data call for a very short time. As far as I remember, all traffic is handled by necko, and I am not sure whether necko would get blocked in this case. But I have to figure out how to create such a scenario first.
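As a rough illustration of why a lost probe matters, here is a small Python sketch of how a generic captive portal check might classify one probe result. It is not the Gecko detector: the function name, the three result labels, and the expected body `"success\n"` are assumptions for the example, and a `status` of `None` stands for a request whose response never arrived.

```python
def classify_probe(status, body, expected_body="success\n"):
    """Classify one probe of a known URL (hypothetical helper).

    status: HTTP status code, or None if the request timed out or was lost.
    body:   response body text, or None if there was no response.
    """
    if status is None:
        # No answer at all: the probe cannot tell "portal" from "open".
        return "unknown"
    if status == 200 and body == expected_body:
        # The known content came back untouched: no portal in the way.
        return "open"
    # A redirect or rewritten content suggests a portal intercepted the probe.
    return "captive"
```

In this sketch a dropped packet yields `"unknown"` instead of `"captive"`, so no portal notification would be raised until some later probe succeeds. Whether the real detector retries in that situation is not established in this bug.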
Comment 16•9 years ago
Triage: Not a blocker due to the reproduction rate and user impact, but we will keep investigating and tracking this.
blocking-b2g: 2.1? → backlog
Updated•8 years ago
blocking-b2g: backlog → ---
tracking-b2g: --- → backlog