Closed Bug 1154690 Opened 5 years ago Closed 5 years ago

[Shinano][Aries] Sometimes it's hard to connect to some WiFi network

Categories

(Firefox OS Graveyard :: GonkIntegration, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:2.5+, b2g-master fixed)

RESOLVED FIXED
2.2 S12 (15may)
blocking-b2g 2.5+
Tracking Status
b2g-master --- fixed

People

(Reporter: gerard-majax, Assigned: gerard-majax)

References

Details

(Keywords: regression, Whiteboard: [spark])

Attachments

(8 files)

And I don't know why. All I could observe was:
 - Settings app reports connection status change with huge delay (5 secs between the moment I tap a SSID and the moment it's marked as Connecting)
 - Settings app reports sometime wrong SSID being in Connecting state
That might be because of a bug in the wifi driver, as documented on http://www.google.com/url?q=http%3A%2F%2Fissues.webos-ports.org%2Fissues%2F273&sa=D&sntz=1&usg=AFQjCNGXUTnozjR7BfLhjkq76wReb44UjQ
Summary: [Shinano][Aries] Sometimes it's hard to connect to Mozilla Guest → [Shinano][Aries] Sometimes it's hard to connect to some WiFi network
I also note it's being an intermittent problem: one day, no issue, another day, problems.
Attached file wifi_debug.log
with network.debugging.enabled and wifi.debugging.enabled
Henry, can you help me and have a look there?
Flags: needinfo?(hchang)
Blocks: spark
Another strange behavior: I'm trying to associate to Mozilla Guest. The network is not highlighted in the Settings app, but when I tap onto it, wpa_supplicant says I'm already associated.
Maybe a regression introduced by bug 1148671 ?
Depends on: 1148671
So far, since I locally reverted bug 1148671, I'm able to connect 100% of the time on Mozilla Guest. As a side note, it's indeed true that all the networks I am using and that had issues are providing IPv6.
Flags: needinfo?(jjong)
Keywords: regression
Jessica, I'll keep a look on this, but I have high confidence that my revert of bug 1148671 helps a lot there.

I did a couple of tests, with rebooting the device a couple of times:
 - with bug 1148671 landed, I'm never able to get network reliably
 - with bug 1148671 reverted, I'm able to get network very reliably
I forgot to mention that my devices reproducing this are all running kitkat release. I've only been able to reproduce on my Z3 Compact devices, mostly because it takes some time to reproduce and I don't use my Flame long enough.
(In reply to Alexandre LISSY :gerard-majax from comment #10)
> I forgot to mention that my devices reproducing this are all running kitkat
> release. I've only been able to reproduce on my Z3 Compact devices, mostly
> because it takes some time to reproduce and I don't use my Flame long enough.

Hi Alexandre,

The code touched by bug 1148671 would be only used by lollipop.
Since you have high confidence for the tests, can you turn the 
debug flag [1] on and attach the log for us to debug with? I
don't have Z3 Compact to reproduce the issue :(.

Thanks!



[1] https://dxr.mozilla.org/mozilla-central/source/dom/system/gonk/NetworkUtils.cpp#30
Flags: needinfo?(hchang)
Alexandre,

As Henry mentioned, bug 1148671 affects only lollipop releases. We'll still try to reproduce it here in Taipei. If you could help reproduce it again, besides from wifi and NetworkUtils (see comment 11), please also enable network logs in Developer Settings -> Network Output in ADB. Thanks.

Keeping the NI for tracking.
Would be much more easier if you get a Z3 or a Z3 Compact. I know some people in the Taipei office do have some.
Attached file wifi_debug_more_1.log
Attached file wifi_debug_more_2.log
So attachment 8594634 [details] and attachment 8594635 [details] are logs I produced during my debugging with the flags you mentionned. It's quite verbose, I'm not sure how much useful this will be.
From attachment 8593935 [details], NetworkManager did not receive any CONNECTED event from wifi. attachment 8594634 [details] and attachment 8594635 [details] do not have NetworkManager/NetworkService logs (you have to enable it from Developer Settings -> Network Output in ADB), but from NetworkUtils log, we can tell that it never ran into bug 1148671 changes.

Yes, we've got a Z3 and we are setting up the environment here to see if we can reproduce it.
(In reply to Jessica Jong [:jjong] [:jessica] from comment #17)
> From attachment 8593935 [details], NetworkManager did not receive any
> CONNECTED event from wifi. attachment 8594634 [details] and attachment
> 8594635 [details] do not have NetworkManager/NetworkService logs (you have
> to enable it from Developer Settings -> Network Output in ADB), but from
> NetworkUtils log, we can tell that it never ran into bug 1148671 changes.

It's possible the first one was with bug 1148671 applied and the second one was without. I don't remember exactly, sorry, but I have lost lots of time because of this issue while working on other things.

> 
> Yes, we've got a Z3 and we are setting up the environment here to see if we
> can reproduce it.
(In reply to Alexandre LISSY :gerard-majax from comment #18)
> (In reply to Jessica Jong [:jjong] [:jessica] from comment #17)
> > From attachment 8593935 [details], NetworkManager did not receive any
> > CONNECTED event from wifi. attachment 8594634 [details] and attachment
> > 8594635 [details] do not have NetworkManager/NetworkService logs (you have
> > to enable it from Developer Settings -> Network Output in ADB), but from
> > NetworkUtils log, we can tell that it never ran into bug 1148671 changes.
> 
> It's possible the first one was with bug 1148671 applied and the second one
> was without. I don't remember exactly, sorry, but I have lost lots of time
> because of this issue while working on other things.

You mean that the second one, you reproduced it without bug 1148671?
We have been reproducing this here with Henry, we did find some issues on Shinano, like wifi state kind of messed up when tapping quickly on the SSIDs in Settings. But we were able to reproduce it with or without bug 1148671 applied, and as mentioned before, bug 1148671 changes only affects Lollipop (SDK VERSION >= 20), so we strongly believe is unrelated to bug 1148671.

I think I can't help much regarding to wifi states.

> 
> > 
> > Yes, we've got a Z3 and we are setting up the environment here to see if we
> > can reproduce it.
Flags: needinfo?(jjong)
After testing on Z3 device on hand, I found at least the following issues:

1) Wifi might be not turned on after reboot. In this case, wpa_supplicant
   is not running and we would never get any scan result.

2) "CTRL-EVENT-ASSOC-REJECT: network error" appears very often. In this case,
   the network keeps showing 'connecting'. It's probably the case is this
   bug.

3) Connecting across remembered networks back and forth makes gaia a mess.
   The network which shows 'connected' is not the one we actually are connecting.
(In reply to Henry Chang [:henry] from comment #20)
> After testing on Z3 device on hand, I found at least the following issues:
> 
> 1) Wifi might be not turned on after reboot. In this case, wpa_supplicant
>    is not running and we would never get any scan result.
> 
> 2) "CTRL-EVENT-ASSOC-REJECT: network error" appears very often. In this case,
>    the network keeps showing 'connecting'. It's probably the case is this
>    bug.
> 
> 3) Connecting across remembered networks back and forth makes gaia a mess.
>    The network which shows 'connected' is not the one we actually are
> connecting.

Yeah, I already filed a bug for this. That was bug 1144623.
(In reply to Henry Chang [:henry] from comment #20)
> After testing on Z3 device on hand, I found at least the following issues:
> 
> 1) Wifi might be not turned on after reboot. In this case, wpa_supplicant
>    is not running and we would never get any scan result.

Never seen this.

> 
> 2) "CTRL-EVENT-ASSOC-REJECT: network error" appears very often. In this case,
>    the network keeps showing 'connecting'. It's probably the case is this
>    bug.

I don't think so, I do see a lot of CTRL-EVENT-CONNECTED against Mozilla Guest for instance.

Anyway, after a couple of days without bug 1148671, I do confirm that I don't have any WiFi connection issue anymore.

The fact that all the networks I reproduce the issue have IPv4 and IPv6 and that without this patch I don't have issue at all cannot be just a coincidence. Even if it's supposed to be only impacting Lollipop.
Contributors with Flame reported WiFi connectivity issues, with one symptom being this visible in logcat: "NetworkUtils: addHostRouteLegacy does not support empty gateway."

After checking my logs from the present bug, it looks like I do have the same kind of error messages in my case when I apply bug 1148671.
So, Kamil has a very very similar issue on a Flame, and his WiFi network is also dual stack IPv4 and IPv6. He's testing right now a local revert of bug 1148671 to confirm.
Flags: needinfo?(kamil0717)
Summary: [Shinano][Aries] Sometimes it's hard to connect to some WiFi network → [Kitkat] Sometimes it's hard to connect to some WiFi network
I am using shinano to try to reproduce the issues and following are
some updates in comment 20:

> 1) Wifi might be not turned on after reboot. In this case, wpa_supplicant
>   is not running and we would never get any scan result.

I still see this issue. After digging deeper, it's caused by wifi_load_driver()
not returning in the very initial stage of |WifiWokrer.js|. Since shinano
uses proprietary libhardware_legacy.so, I cannot effectively debug with this 
at the moment.

Alexandre, can you help confirm if all the Z3 users don't have this issue?

2) "CTRL-EVENT-ASSOC-REJECT: network error" appears very often. In this case,
   the network keeps showing 'connecting'. It's probably the case is this
   bug.

I do not see this event after flashing the image from what I build with some
vendor proprietary libraries.

3) Connecting across remembered networks back and forth makes gaia a mess.
   The network which shows 'connected' is not the one we actually are connecting.

Is this the second issue that you mentioned in comment 0?

-------------------------------------------------------------------------------

As for the original issue you stated in the first comment:

"- Settings app reports connection status change with huge delay (5 secs between the moment I tap a SSID and the moment it's marked as Connecting)"

I printed the log (as following):

# 00:37:40.810: SSID was tapped and we sent command to wpa_supplicant to connect to openwireless.org
# 00:37:40.810: Then wpa_supplicant goes to "SCANNING" state.
# 00:37:45.130: In 5 seconds, wpa_supplicant replies "Trying to associate with SSID 'openwireless.org'".
                This is the information that we use to fire "connecting" event to gaia.
  
04-22 00:37:40.810   317   317 I Gecko   : -*- WifiWorker component: Event coming in: CTRL-EVENT-STATE-CHANGE id=0 state=3 BSSID=00:00:00:00:00:00 SSID=openwireless.org
04-22 00:37:40.810   317   317 I Gecko   : -*- WifiWorker component: State change: INACTIVE -> SCANNING
04-22 00:37:40.920   317   317 I Gecko   : -*- WifiWorker component: Event coming in: CTRL-EVENT-SCAN-STARTED 
04-22 00:37:45.100   317   317 I Gecko   : -*- WifiWorker component: Event coming in: CTRL-EVENT-BSS-ADDED 98 3c:94:d5:76:17:42
04-22 00:37:45.110   317   317 I Gecko   : -*- WifiWorker component: Event coming in: CTRL-EVENT-BSS-ADDED 99 3c:94:d5:7c:11:82
04-22 00:37:45.110   317   317 I Gecko   : -*- WifiWorker component: Event coming in: CTRL-EVENT-BSS-ADDED 100 3c:94:d5:7b:87:42
04-22 00:37:45.120   317   317 I Gecko   : -*- WifiWorker component: Event coming in: CTRL-EVENT-BSS-ADDED 101 3c:94:d5:7b:b8:02
04-22 00:37:45.120   317   317 I Gecko   : -*- WifiWorker component: Event coming in: CTRL-EVENT-BSS-ADDED 102 dc:38:e1:82:8a:02
04-22 00:37:45.120   317   317 I Gecko   : -*- WifiWorker component: Event coming in: CTRL-EVENT-BSS-ADDED 103 3c:94:d5:7b:40:80
04-22 00:37:45.120   317   317 I Gecko   : -*- WifiWorker component: Event coming in: CTRL-EVENT-BSS-ADDED 104 dc:38:e1:82:8a:04
04-22 00:37:45.120   317   317 I Gecko   : -*- WifiWorker component: Event coming in: CTRL-EVENT-BSS-ADDED 105 dc:38:e1:82:8a:00
04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component: Event coming in: CTRL-EVENT-BSS-ADDED 106 dc:38:e1:82:19:85
04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component: Event coming in: CTRL-EVENT-BSS-ADDED 107 00:e0:4c:ac:16:4f
04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component: Event coming in: CTRL-EVENT-BSS-ADDED 108 00:18:25:17:59:d1
04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component: Event coming in: CTRL-EVENT-BSS-ADDED 109 dc:38:e1:82:8a:08
04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component: Event coming in: CTRL-EVENT-BSS-ADDED 110 dc:38:e1:83:69:87
04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component: Event coming in: CTRL-EVENT-BSS-ADDED 111 00:18:25:02:5f:60
04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component: Event coming in: CTRL-EVENT-SCAN-RESULTS 
04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component: Notifying of scan results available
04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component: Scan results available, but we don't need them
04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component: Event coming in: WPS-AP-AVAILABLE 
04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component: Event coming in: Trying to associate with SSID 'openwireless.org'
04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component: Matched: openwireless.org
04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component: Event coming in: CTRL-EVENT-STATE-CHANGE id=0 state=5 BSSID=00:00:00:00:00:00 SSID=openwireless.org
04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component: State change: SCANNING -> ASSOCIATING
04-22 00:37:49.450   317   317 I Gecko   : -*- WifiWorker component: Event coming in: CTRL-EVENT-STATE-CHANGE id=0 state=6 BSSID=00:00:00:00:00:00 SSID=openwireless.org
04-22 00:37:49.450   317   317 I Gecko   : -*- WifiWorker component: State change: ASSOCIATING -> ASSOCIATED
04-22 00:37:49.460   317   317 I Gecko   : -*- WifiWorker component: Event coming in: Associated with 3c:94:d5:76:17:46
04-22 00:37:49.460   317   317 I Gecko   : -*- WifiWorker component: Event coming in: CTRL-EVENT-CONNECTED - Connection to 3c:94:d5:76:17:46 completed (auth) [id=0 id_str=]
04-22 00:37:49.490   317   317 I Gecko   : -*- WifiWorker component: Event coming in: CTRL-EVENT-STATE-CHANGE id=0 state=9 BSSID=3c:94:d5:76:17:46 SSID=openwireless.org
04-22 00:37:49.490   317   317 I Gecko   : -*- WifiWorker component: State change: ASSOCIATED -> COMPLETED
(In reply to Henry Chang [:henry] from comment #25)
> I am using shinano to try to reproduce the issues and following are
> some updates in comment 20:
> 
> > 1) Wifi might be not turned on after reboot. In this case, wpa_supplicant
> >   is not running and we would never get any scan result.
> 
> I still see this issue. After digging deeper, it's caused by
> wifi_load_driver()
> not returning in the very initial stage of |WifiWokrer.js|. Since shinano
> uses proprietary libhardware_legacy.so, I cannot effectively debug with this 
> at the moment.
> 
> Alexandre, can you help confirm if all the Z3 users don't have this issue?

I don't know.

> 
> 2) "CTRL-EVENT-ASSOC-REJECT: network error" appears very often. In this case,
>    the network keeps showing 'connecting'. It's probably the case is this
>    bug.
> 
> I do not see this event after flashing the image from what I build with some
> vendor proprietary libraries.
> 
> 3) Connecting across remembered networks back and forth makes gaia a mess.
>    The network which shows 'connected' is not the one we actually are
> connecting.
> 
> Is this the second issue that you mentioned in comment 0?

That might be yes.

> 
> -----------------------------------------------------------------------------
> --
> 
> As for the original issue you stated in the first comment:
> 
> "- Settings app reports connection status change with huge delay (5 secs
> between the moment I tap a SSID and the moment it's marked as Connecting)"
> 
> I printed the log (as following):
> 
> # 00:37:40.810: SSID was tapped and we sent command to wpa_supplicant to
> connect to openwireless.org
> # 00:37:40.810: Then wpa_supplicant goes to "SCANNING" state.
> # 00:37:45.130: In 5 seconds, wpa_supplicant replies "Trying to associate
> with SSID 'openwireless.org'".
>                 This is the information that we use to fire "connecting"
> event to gaia.
>   
> 04-22 00:37:40.810   317   317 I Gecko   : -*- WifiWorker component: Event
> coming in: CTRL-EVENT-STATE-CHANGE id=0 state=3 BSSID=00:00:00:00:00:00
> SSID=openwireless.org
> 04-22 00:37:40.810   317   317 I Gecko   : -*- WifiWorker component: State
> change: INACTIVE -> SCANNING
> 04-22 00:37:40.920   317   317 I Gecko   : -*- WifiWorker component: Event
> coming in: CTRL-EVENT-SCAN-STARTED 
> 04-22 00:37:45.100   317   317 I Gecko   : -*- WifiWorker component: Event
> coming in: CTRL-EVENT-BSS-ADDED 98 3c:94:d5:76:17:42
> 04-22 00:37:45.110   317   317 I Gecko   : -*- WifiWorker component: Event
> coming in: CTRL-EVENT-BSS-ADDED 99 3c:94:d5:7c:11:82
> 04-22 00:37:45.110   317   317 I Gecko   : -*- WifiWorker component: Event
> coming in: CTRL-EVENT-BSS-ADDED 100 3c:94:d5:7b:87:42
> 04-22 00:37:45.120   317   317 I Gecko   : -*- WifiWorker component: Event
> coming in: CTRL-EVENT-BSS-ADDED 101 3c:94:d5:7b:b8:02
> 04-22 00:37:45.120   317   317 I Gecko   : -*- WifiWorker component: Event
> coming in: CTRL-EVENT-BSS-ADDED 102 dc:38:e1:82:8a:02
> 04-22 00:37:45.120   317   317 I Gecko   : -*- WifiWorker component: Event
> coming in: CTRL-EVENT-BSS-ADDED 103 3c:94:d5:7b:40:80
> 04-22 00:37:45.120   317   317 I Gecko   : -*- WifiWorker component: Event
> coming in: CTRL-EVENT-BSS-ADDED 104 dc:38:e1:82:8a:04
> 04-22 00:37:45.120   317   317 I Gecko   : -*- WifiWorker component: Event
> coming in: CTRL-EVENT-BSS-ADDED 105 dc:38:e1:82:8a:00
> 04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component: Event
> coming in: CTRL-EVENT-BSS-ADDED 106 dc:38:e1:82:19:85
> 04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component: Event
> coming in: CTRL-EVENT-BSS-ADDED 107 00:e0:4c:ac:16:4f
> 04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component: Event
> coming in: CTRL-EVENT-BSS-ADDED 108 00:18:25:17:59:d1
> 04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component: Event
> coming in: CTRL-EVENT-BSS-ADDED 109 dc:38:e1:82:8a:08
> 04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component: Event
> coming in: CTRL-EVENT-BSS-ADDED 110 dc:38:e1:83:69:87
> 04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component: Event
> coming in: CTRL-EVENT-BSS-ADDED 111 00:18:25:02:5f:60
> 04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component: Event
> coming in: CTRL-EVENT-SCAN-RESULTS 
> 04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component:
> Notifying of scan results available
> 04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component: Scan
> results available, but we don't need them
> 04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component: Event
> coming in: WPS-AP-AVAILABLE 
> 04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component: Event
> coming in: Trying to associate with SSID 'openwireless.org'
> 04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component:
> Matched: openwireless.org
> 04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component: Event
> coming in: CTRL-EVENT-STATE-CHANGE id=0 state=5 BSSID=00:00:00:00:00:00
> SSID=openwireless.org
> 04-22 00:37:45.130   317   317 I Gecko   : -*- WifiWorker component: State
> change: SCANNING -> ASSOCIATING
> 04-22 00:37:49.450   317   317 I Gecko   : -*- WifiWorker component: Event
> coming in: CTRL-EVENT-STATE-CHANGE id=0 state=6 BSSID=00:00:00:00:00:00
> SSID=openwireless.org
> 04-22 00:37:49.450   317   317 I Gecko   : -*- WifiWorker component: State
> change: ASSOCIATING -> ASSOCIATED
> 04-22 00:37:49.460   317   317 I Gecko   : -*- WifiWorker component: Event
> coming in: Associated with 3c:94:d5:76:17:46
> 04-22 00:37:49.460   317   317 I Gecko   : -*- WifiWorker component: Event
> coming in: CTRL-EVENT-CONNECTED - Connection to 3c:94:d5:76:17:46 completed
> (auth) [id=0 id_str=]
> 04-22 00:37:49.490   317   317 I Gecko   : -*- WifiWorker component: Event
> coming in: CTRL-EVENT-STATE-CHANGE id=0 state=9 BSSID=3c:94:d5:76:17:46
> SSID=openwireless.org
> 04-22 00:37:49.490   317   317 I Gecko   : -*- WifiWorker component: State
> change: ASSOCIATED -> COMPLETED
I tested reverting commit abb00a, but problem still exists, after connecting to network, I have an immediate disconnect.

I attached logcat from before and after reverting that commit
(In reply to Alexandre LISSY :gerard-majax from comment #0)
> And I don't know why. All I could observe was:
>  - Settings app reports connection status change with huge delay (5 secs
> between the moment I tap a SSID and the moment it's marked as Connecting)

When wpa_supplicant tries to associate with specified AP, it tries to scan if the AP is nearby before connecting to it. It takes about 5 seconds for the wpa_drv_scan to get the results from wifi driver. It's probably because the driver scans each channel and takes longer time to get the scan results. As far as I know, one channel takes about 200 ~ 300 ms and total channel number which depended on region is 11 ~ 14.

Compare with the Flame device, the driver returns scan result for about 2 second. Not sure what's the difference between Flame and Z3-c  yet. 

[1] http://androidxref.com/5.1.0_r1/xref/external/wpa_supplicant_8/wpa_supplicant/driver_i.h#68
On Android with the same device, connecting to the same networks is nearly instant (< 2 secs).
(In reply to Alexandre LISSY :gerard-majax from comment #31)
> On Android with the same device, connecting to the same networks is nearly
> instant (< 2 secs).
I know, will try to check the log in Android. Not quite sure if we pull and use the same wifi driver and wpa_supplicant from z3. Or it's sort of porting issue and can be fixed in BoardConfig.mk.
So, is someone investigating why bug 1148671 can mess up like this?
Flags: needinfo?(vchang)
Flags: needinfo?(jjong)
Wesley, can you please see if we can get a fix for this asap? It is a hard blocker for the Z3C builds that need to be ready in the next few weeks.
Flags: needinfo?(whuang)
Priority: -- → P1
Assignee: nobody → vchang
Flags: needinfo?(vchang)
Some quick updates, 

1. Even if I bypass the scan command [1] and use old scan results instead. The connection may be blocked in the follow-up driver command.    

2. Found some missing parameters in z3c device defined in [2], I don't find some of them in our device's file system. Not quite sure if it is related to the 4 ~ 5 seconds driver delays. 

3. I am also noticed by "dmesg" command that shows some CFG80211 error when trying to connect to "Mozilla Guest"

<6>[  309.797023] CFG80211-ERROR) wl_cfg80211_connect : Connectting withff:ff:ff channel (0) ssid "Mozilla Guest", len (13)

[1] http://androidxref.com/5.1.0_r1/xref/external/wpa_supplicant_8/wpa_supplicant/events.c#1529
[2] https://github.com/CyanogenMod/android_device_sony_shinano-common/blob/cm-11.0/BoardConfigCommon.mk#L125
Whiteboard: [spark]
[Blocking Requested - why for this release]:
blocking-b2g: --- → spark?
Adding "+" per drs on IRC.
blocking-b2g: spark? → spark+
I think Vincent and Henry are looking into the root cause here.
Flags: needinfo?(jjong)
clear ni for myself and cc Aaron.
Flags: needinfo?(whuang)
Thanks Wesley! I will follow up with Vincent as Spark priority fix.
The patch improves scan and connect command delay(Item 1 in comment 1). I found that android z3c also has a 4~5 seconds delay while doing the scan in wifi driver. But Android does periodical scan and associate the AP in the middle of scanning to prevent waiting for scan results. The patch also reduces the delay for connect command by specifying bssid before associate an AP. 

LISSY, can you help to apply the patch and see if it helps to improve the delay problem. 

BTW, I still saw some strange behaviors while doing the test. Like Henry mentions in comment 20. Will continue to find out the root causes.
Flags: needinfo?(lissyx+mozillians)
Do you have any rationale on why this would help and more specifically how it relates to bug 1148671 ?
Flags: needinfo?(lissyx+mozillians)
Flags: needinfo?(vchang)
(In reply to Vincent Chang[:vchang] from comment #42)
> The patch improves scan and connect command delay(Item 1 in comment 1). I
> found that android z3c also has a 4~5 seconds delay while doing the scan in
> wifi driver. But Android does periodical scan and associate the AP in the
> middle of scanning to prevent waiting for scan results. The patch also
> reduces the delay for connect command by specifying bssid before associate
> an AP. 
> 
> LISSY, can you help to apply the patch and see if it helps to improve the
> delay problem. 
> 
> BTW, I still saw some strange behaviors while doing the test. Like Henry
> mentions in comment 20. Will continue to find out the root causes.

And I already filled this as bug 1144623 some time ago ...
(In reply to Alexandre LISSY :gerard-majax from comment #43)
> Do you have any rationale on why this would help and more specifically how
> it relates to bug 1148671 ?

As I said in comment 42, scan command takes about 4 ~ 5 seconds to get response from driver, the connect command with broadcast address also takes 4 ~ 5 seconds. These delay are happened in Z3c. I don't observe the same behavior in others device so far. The patch tries to reduce the delay generated by wifi driver. 

However, even if we can reduce the connection time, we still need to polish the settings app. Currently, the wifi connection status in settings app is depended on onstatuschange callback [1]. Since it still takes 2 seconds for scan results after applying the patch, user may still need to wait and reflect the status change.     

Without my patch, 

User connect=>associate=>scan=>wait scan results are presented(takes about 4 seconds)=>connect(with broadcast Mac address, it sometimes takes 4 seconds to connect)=>get ipaddress

With my patch, 

User connect=>scan=>associate(skip ongoing scan request)=>scan results are presented, get APs info from scan results(reduce about 2 seconds)=>connect(with AP's Mac address, it saves 4 seconds)  

[1] https://dxr.mozilla.org/mozilla-central/source/dom/webidl/MozWifiManager.webidl#327
Flags: needinfo?(vchang)
Flags: needinfo?(kamil0717)
Thanks Vincent, but I don't understand how this kind of delay can mess up with WiFi connection itself and especially with the changes from  bug 1148671.

Sorry to be picky, but I don't want to play a blind test.
Flags: needinfo?(vchang)
(In reply to Alexandre LISSY :gerard-majax from comment #44)
> (In reply to Vincent Chang[:vchang] from comment #42)
> > The patch improves scan and connect command delay(Item 1 in comment 1). I
> > found that android z3c also has a 4~5 seconds delay while doing the scan in
> > wifi driver. But Android does periodical scan and associate the AP in the
> > middle of scanning to prevent waiting for scan results. The patch also
> > reduces the delay for connect command by specifying bssid before associate
> > an AP. 
> > 
> > LISSY, can you help to apply the patch and see if it helps to improve the
> > delay problem. 
> > 
> > BTW, I still saw some strange behaviors while doing the test. Like Henry
> > mentions in comment 20. Will continue to find out the root causes.
> 
> And I already filled this as bug 1144623 some time ago ...

The major state changes to connect to the AP. 
scan -> associating -> associated -> connect -> completed -> run dhcpdc to obtain ip address  

I found that wifi driver may send DISCONNECT event actively even if wifi is in completed state. After that, wpa_supplicant automatically reconnect to AP, but always gets ASSOC_REJECT event. It's really frustrated to debug such problems without partner's help. Maybe we should use built-in driver in linux kernel?


The other case is that dhcpdc sends DHCP DISCOVER packet with no response and timeout. I guess that the packets are not sent out and blocked in the driver.
Flags: needinfo?(vchang)
We do have the out-of-tree driver sources, you know ? Those are in vendor/broadcom/wlan/. So the delay will make us getting an IP address sooner ?

And I still don't understand why bug 1148671 interacts here.
Flags: needinfo?(vchang)
BTW the builtin kernel bcmhd driver which is in-tree for this kernel is older than the outoftree one we have and are using.
(In reply to Alexandre LISSY :gerard-majax from comment #46)
> Thanks Vincent, but I don't understand how this kind of delay can mess up
> with WiFi connection itself and especially with the changes from  bug
> 1148671.
> 
> Sorry to be picky, but I don't want to play a blind test.

The patch doesn't help to fix mess up of wifi connection and the problems I found in comment 47. But it helps to reduce the connection time. I am still trying to find out why the wifi connection is messed up. But my strong opinion is that wifi driver or firmware is quite unstable and may has certain traps while doing scanning to cause the problem.
Flags: needinfo?(vchang)
(In reply to Alexandre LISSY :gerard-majax from comment #49)
> BTW the builtin kernel bcmhd driver which is in-tree for this kernel is
> older than the outoftree one we have and are using.

Do you think we should give it a shoot for built-in wifi driver in the kernel even if it is older?
I saw a lot of compiler flag for outoftree one, not sure we make every flags right.
Dropping the block on the spark meta-bug since this is blocking-b2g:spark+.
No longer blocks: spark
No longer depends on: 1148671
Keywords: regression
I changed the repository of bcmdhd wifi driver to https://github.com/sonyxperiadev/bcmdhd. Most of unstable, weird wifi driver issues are gone. Do you want to give it a shot?
Flags: needinfo?(lissyx+mozillians)
If we have a newer driver that is provided by Sony, with source code, and that it fixes the issues, I really don't see any objection ...

But when you are referring to "most of unstable, weird wifi driver issues", what are you exactly pointing at:
 - the side effects documented with the strange feedback of status,
 - the issue that is being tracked here, i.e., the regression from bug 1148671 ?

BTW I've reset the tracking flags, because until proved otherwise, this is a regression from bug 1148671.
Blocks: 1148671
Flags: needinfo?(lissyx+mozillians)
Keywords: regression
Flags: needinfo?(vchang)
(In reply to Alexandre LISSY :gerard-majax from comment #54)
> If we have a newer driver that is provided by Sony, with source code, and
> that it fixes the issues, I really don't see any objection ...
> 
> But when you are referring to "most of unstable, weird wifi driver issues",
> what are you exactly pointing at:
>  - the side effects documented with the strange feedback of status,
>  - the issue that is being tracked here, i.e., the regression from bug
> 1148671 ?
> 
> BTW I've reset the tracking flags, because until proved otherwise, this is a
> regression from bug 1148671.

I really wish it's regression then we can backout Bug 1148671 and close this bug..
(In reply to Henry Chang [:henry] from comment #55)
> (In reply to Alexandre LISSY :gerard-majax from comment #54)
> > If we have a newer driver that is provided by Sony, with source code, and
> > that it fixes the issues, I really don't see any objection ...
> > 
> > But when you are referring to "most of unstable, weird wifi driver issues",
> > what are you exactly pointing at:
> >  - the side effects documented with the strange feedback of status,
> >  - the issue that is being tracked here, i.e., the regression from bug
> > 1148671 ?
> > 
> > BTW I've reset the tracking flags, because until proved otherwise, this is a
> > regression from bug 1148671.
> 
> I really wish it's regression then we can backout Bug 1148671 and close this
> bug..

We would only know this if someone investigates why bug 1148671 breaks badly. I already documented this on comment 9.

Meanwhile, I'm going to test a build with bug 1148671 applied, and new WiFi driver. Sadly, it will take several days before I can give any conclusive feedback.
So in case anyone else might want to give a try, here is the patch to apply.
Vincent, Jessica, can you help me understanding one point here? How does the bad WiFi behavior explains that bug 1148671 breaks us?

I'm curious, because with the new driver, for now, it seems to be pretty stable and working very well.
Assignee: vchang → lissyx+mozillians
Component: Wifi → GonkIntegration
Flags: needinfo?(vchang) → needinfo?(jjong)
Summary: [Kitkat] Sometimes it's hard to connect to some WiFi network → [Shinano][Aries] Sometimes it's hard to connect to some WiFi network
(In reply to Alexandre LISSY :gerard-majax from comment #58)
> Vincent, Jessica, can you help me understanding one point here? How does the
> bad WiFi behavior explains that bug 1148671 breaks us?
> 
> I'm curious, because with the new driver, for now, it seems to be pretty
> stable and working very well.

Sorry, we can't explain how bug 1148671 breaks this because from the beginning we have thought, and we still think, they are unrelated. We have had wifi issues in z3c, with or without the patches in bug 1148671, that's why vincent and henry have been looking for the real root cause in the driver.
Flags: needinfo?(jjong)
After five days, I don't experience the issue anymore on all the WiFi networks I used to experience it. IPv6 still does work too. Let's land this.
Attached file Manifest PR
Attachment #8605221 - Flags: review?(mwu)
Status: NEW → ASSIGNED
Attachment #8605221 - Flags: review?(mwu) → review+
https://github.com/mozilla-b2g/b2g-manifest/commit/2e23003de93ac64970c1194461b606d6ac12de4e
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
blocking-b2g: spark+ → 2.5+
Target Milestone: --- → 2.2 S12 (15may)
Duplicate of this bug: 1223352
You need to log in before you can comment on or make changes to this bug.