Closed
Bug 809663
Opened 12 years ago
Closed 6 years ago
B2G Wifi: Make sure we successfully turn on background scanning (redux, again)
Categories
(Firefox OS Graveyard :: General, defect, P1)
Tracking
(blocking-b2g:-)
People
(Reporter: mrbkap, Unassigned)
References
Details
(Whiteboard: [awaiting partner help] [label:networking])
Attachments
(2 files)
patch, 454 bytes
patch, 129.62 KB
+++ This bug was initially created as a clone of Bug #807148 +++

This bug won't die. This tracks bug 807148 comment 8. I'm going to give it to Vincent since he found it :-)
Comment 1•12 years ago
Blake, are we blocking on this, or did that flag just get carried over from the clone?
Reporter
Comment 2•12 years ago
Well, the flag was accidentally carried over; but I think this needs to block (at least until we understand it better) since this bug can result in wifi not working until the phone is rebooted.
Updated•12 years ago
blocking-basecamp: ? → +
Comment 3•12 years ago
Marking for C2, given this meets the criteria of known P1/P2 blocking-basecamp+ bugs at the end of C1.
Target Milestone: --- → B2G C2 (20nov-10dec)
Comment 4•12 years ago
This can't always be reproduced, so this bug is being used for monitoring. We're not quite sure how to reproduce the issue.
Keywords: qawanted
Updated•12 years ago
QA Contact: atsai
Comment 5•12 years ago
According to the log I found earlier, it seems that the wifi driver got stuck and a "Failed to initiate AP scan" event was sent to WifiWorker. Reloading the wifi driver seems to help the device recover. I'm not sure whether we need this workaround. mrbkap, what do you think?

685:D/wpa_supplicant(647): nl80211: Scan trigger failed: ret=-16 (Device or resource busy)
686:W/wpa_supplicant( 647): wlan0: Failed to initiate AP scan
687:I/Gecko(109): -*- WifiWorker component: Event coming in: Failed to initiate AP scan
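
For monitoring purposes, the failure signature in the log lines quoted in this comment could be spotted mechanically. The following is a minimal sketch, assuming the logcat output has been captured as text. The function name `find_scan_trigger_failures` and the regular expression are illustrative assumptions and are not part of Gecko or wpa_supplicant. On Linux, return code -16 corresponds to EBUSY, which matches the "Device or resource busy" text.

```python
import errno
import re

# Matches the wpa_supplicant line quoted in this comment, e.g.
# "nl80211: Scan trigger failed: ret=-16 (Device or resource busy)".
SCAN_FAIL_RE = re.compile(r"nl80211: Scan trigger failed: ret=(-?\d+)")


def find_scan_trigger_failures(log_text):
    """Return the symbolic errno name for every failed scan trigger in log_text."""
    names = []
    for match in SCAN_FAIL_RE.finditer(log_text):
        # The log reports a negative errno, so take the absolute value.
        code = abs(int(match.group(1)))
        names.append(errno.errorcode.get(code, "UNKNOWN"))
    return names


# Sample text copied from the log excerpt in this comment.
sample = (
    "685:D/wpa_supplicant(647): nl80211: Scan trigger failed: "
    "ret=-16 (Device or resource busy)\n"
    "686:W/wpa_supplicant( 647): wlan0: Failed to initiate AP scan\n"
)
print(find_scan_trigger_failures(sample))
```

A check like this only detects the symptom; whether to reload the driver in response is the open question raised in this comment.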
Comment 6•12 years ago
I tried re-enabling unloadDriver in WifiWorker.js so that it actually unloads the driver. It works fine with the current kernel and driver. I don't remember why we didn't unload the driver in the first place.
Comment 7•12 years ago
Blake, do you have any comments on comment 5? Thanks.
Flags: needinfo?(mrbkap)
Comment 8•12 years ago
Even though we tried hard in Bug 796640 and Bug 807148 to prevent the wifi scan from getting stuck by using the background scan command (SET pno 1/0), I found that we may still fail to do a wifi scan. Even worse, the wifi driver may return -16 (Device or resource busy) to wpa_supplicant. In that case the wifi driver seems to get stuck, and only reloading the wifi driver can recover it.

"D/wpa_supplicant( 739): nl80211: Scan trigger failed: ret=-16 (Device or resource busy)"

The initial problem we tried to fix is that when the device moves out of the coverage of the connected AP, it is disconnected. After the device moves back into the coverage of a known AP, it should connect to that AP automatically. When I was testing this, I saw the supplicant realize the connection had been lost, disconnect, scan once (maybe), start scanning a second time, and get stuck in the scheduled scan. This also meant that further attempts to schedule scans failed, leading to an empty network list, because we use the default setting ap_scan=1 and rely on wpa_supplicant to initiate scanning and AP selection.

I saw that there is a flag "sched_scan_supported" defined in external/hostap/src/drivers/driver_nl80211.c. The value of this flag comes from the driver's capability query results and is set to 1 on unagi. Since the problem is related to the scheduled scan, I set it to 0 manually and disabled the background scan patch we did in Bug 807148. After that, I no longer observed the scheduled scan bug.

This makes me think that this bug should be fixed in wpa_supplicant or at the wifi driver level. The wpa_supplicant repo seems to be linked to codeaurora. I'm not sure if our friends can help verify and fix it. mvines, may I have your comments?
Comment 9•12 years ago
Use Android's private command to do sched_scan
Reporter
Comment 10•12 years ago
Hey Mikes, it looks like our backs are to the wall here. We've tried really hard to work around this apparent driver bug, but the most resilient fix is to simply patch the supplicant (note: the best fix would be to make the driver play nicely with wpa_supplicant, but it isn't clear if that is a possible avenue before the release). How bad/hard is it for us to carry a patch to wpa_supplicant?
Flags: needinfo?(mrbkap) → needinfo?(mwu)
Reporter
Updated•12 years ago
Flags: needinfo?(mvines)
Comment 11•12 years ago
There is a patching mechanism that lets us put in B2G specific patches. We don't use it currently but it's available if all else fails. However, we might be able to get this fixed on hamachi - I'll need to ask around.
Flags: needinfo?(mwu)
Comment 12•12 years ago
Reporter
Comment 13•12 years ago
I'm going to give this to mwu since it sounds like he's the best-placed guy to get movement from the driver authors.
Assignee: vchang → mwu
Comment 14•12 years ago
:mwu, do you need help here? What's the update on this?
Comment 15•12 years ago
Michael Vines said he and our other friends can help but not for a bit as they're busy with other things right now.
Comment 16•11 years ago
I'm honestly not sure this is b-b+, since it may not be resolved by our code ship if this is a driver issue. Do we have another way of tracking partner work?
Target Milestone: B2G C2 (20nov-10dec) → B2G C3 (12dec-1jan)
Updated•11 years ago
Flags: needinfo?(mvines)
Comment 17•11 years ago
m1, can we get https://bug809663.bugzilla.mozilla.org/attachment.cgi?id=688188 added to the patches list? I would like to get the driver investigated and fixed here, but I have neither access to the kernel source nor much time.
Flags: needinfo?(mvines)
Comment 18•11 years ago
Yep, will take this up in the new year once back in the office.
Flags: needinfo?(mvines)
m1, feel free to close this after you've enqueued the patch.
Comment 20•11 years ago
10-4. Still in the queue. You may flog me in person next week if it's still outstanding (50/50 chance at this point I'd say).
Comment 21•11 years ago
I'm assigning to m1 just so we're clear.
Assignee: mwu → mvines
Whiteboard: [label:networking] → [awaiting partner help] [label:networking]
Updated•11 years ago
Target Milestone: B2G C3 (12dec-1jan) → B2G C4 (2jan on)
Comment 23•11 years ago
/flog
Updated•11 years ago
blocking-b2g: --- → tef+
blocking-basecamp: + → -
I can't seem to reproduce this issue.... removing qawanted.
Keywords: qawanted
Comment 27•11 years ago
This issue has not come up in our WLAN test, so at this point I'm a little uncomfortable landing this patch in our tree. Can I get some clear STR?
Reporter
Comment 29•11 years ago
(In reply to Michael Vines [:m1] from comment #27)
> This issue has not come up in our WLAN test, so at this point I'm a little
> uncomfortable landing this patch in our tree. Can I get some clear STR?

Hey Mike,

This is a very intermittent problem: I've never seen it on my own devices, but I have seen it on other folks' devices. The symptom is that b2g won't automatically connect to other networks and in some cases, it'll refuse to scan at all.
Updated•11 years ago
Flags: needinfo?(ggrisco)
Comment 30•11 years ago
(greg, can you please try to reproduce this here)
Comment 31•11 years ago
(In reply to Andrew Overholt [:overholt] from comment #28)
> Vincent, can you provide a reliable STR here?

STR for the scheduled scan getting stuck (reproducible every time):
0. Disable the scheduled scan workaround by setting manager.schedScanRecovery = false in http://mxr.mozilla.org/mozilla-central/source/dom/wifi/WifiWorker.js#57. You need to recompile and flash the code.
1. Connect to an AP.
2. Leave the coverage of the connected AP.
   => sched_scan fails here. The supplicant tries to do a sched_scan but stops there. The normal scan command is rejected because a sched_scan operation is in progress.
3. After about 12 seconds, move back into the coverage of the AP.

Expected: the phone connects to the AP automatically.
Actual: the scheduled scan gets stuck, so the phone doesn't reconnect to the AP automatically.

STR for the wifi driver returning a "no resource" or "device busy" error (not reproducible every time):
0. Make sure manager.schedScanRecovery is true (the default setting).
1. Connect to AP1.
2. Connect to AP2.
3. Turn off AP2; the phone will connect to AP1 automatically.
4. Press the power button to turn off the phone's screen and wait about 2 minutes. The phone will go into sleep mode.
5. Turn on the screen.

Expected: the phone is connected to AP1 automatically.
Actual: wpa_supplicant gets stuck and shows "Failed to initiate AP scan" in logcat. Only rebooting the device can recover it.
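
The first STR in this comment says a normal scan is rejected while a sched_scan operation is in progress. The following toy model is only a sketch of that bookkeeping, not driver or wpa_supplicant code. The class `ToyScanState` and its methods are hypothetical, and returning -EBUSY for the rejected scan is an assumption borrowed from the "Device or resource busy" error seen elsewhere in this bug.

```python
import errno


class ToyScanState:
    """Toy model of the stuck scheduled scan (hypothetical, not driver code)."""

    def __init__(self):
        self.sched_scan_in_progress = False

    def start_sched_scan(self):
        # In the failing case the scheduled scan starts and never completes.
        self.sched_scan_in_progress = True

    def stop_sched_scan(self):
        self.sched_scan_in_progress = False

    def trigger_scan(self):
        # Assumption: a rejected scan reports a negative errno, as in the logs.
        if self.sched_scan_in_progress:
            return -errno.EBUSY
        return 0


state = ToyScanState()
state.start_sched_scan()
stuck_result = state.trigger_scan()      # rejected while sched_scan is pending
state.stop_sched_scan()
recovered_result = state.trigger_scan()  # accepted once sched_scan is cleared
```

In this model, nothing ever clears the flag unless something stops the scheduled scan explicitly, which is why every later scan request keeps failing.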
Flags: needinfo?(vchang)
Comment 32•11 years ago
More information: I just got a chance to try the wpa_supplicant binary from the unagi vendor. With it, the scheduled scan bug does not occur.
Comment 33•11 years ago
(In reply to Vincent Chang[:vchang] from comment #32)
> More information, just get a chance to try wpa_supplicant binary from unagi
> vendor.
> It works very well on schedule scan bug.

So what's left to be done here? Presumably we won't be providing the wpa_supplicant binary for shipping devices?
Flags: needinfo?(vchang)
Comment 34•11 years ago
(In reply to Andrew Overholt [:overholt] from comment #33)
> (In reply to Vincent Chang[:vchang] from comment #32)
> > More information, just get a chance to try wpa_supplicant binary from unagi
> > vendor.
> > It works very well on schedule scan bug.
>
> So what's left to be done here? Presumably we won't be providing the
> wpa_supplicant binary for shipping devices?

The wpa_supplicant for unagi and otoro comes from the codeaurora repo. Apparently there is a compatibility bug between wpa_supplicant_codeaurora and the wifi driver, because I don't observe the scheduled scan bug when I replace wpa_supplicant_codeaurora with wpa_supplicant_unagi_vendor. It would be nice if our friends could reproduce the problem and help fix it, so that we don't need the workaround in gecko, which might make the wifi driver get stuck. We need to reboot the device when we encounter that situation.
Flags: needinfo?(vchang)
Comment 35•11 years ago
Why do the moz builds not just use the wpa_supplicant from the vendor, as that's ultimately what is being used in the product?
Comment 36•11 years ago
(In reply to Michael Vines [:m1] from comment #35)
> Why do the moz builds not just use the wpa_supplicant from the vendor as
> that's what is ultimately what is being used in the product?

Makes sense to me. We should be using whichever libs are closest to the shipping version.
Comment 37•11 years ago
(In reply to Michael Vines [:m1] from comment #35)
> Why do the moz builds not just use the wpa_supplicant from the vendor as
> that's what is ultimately what is being used in the product?

Who would do this work? Do the right people have access to the wpa_supplicant binary? Is it okay from a legal standpoint for us to do this?
Flags: needinfo?
Comment 40•11 years ago
(In reply to Andrew Overholt [:overholt] from comment #37)
> (In reply to Michael Vines [:m1] from comment #35)
> > Why do the moz builds not just use the wpa_supplicant from the vendor as
> > that's what is ultimately what is being used in the product?
>
> Who would do this work? Do the right people have access to the
> wpa_supplicant binary? Is it okay from a legal standpoint for us to do this?

If the original Otoro/Unagi Android image contains the right wpa_supplicant binary, we can simply extract it in the build script without any legal issue, just like we extract other blobs.

Vincent, do you know whether the wpa_supplicant received from the partner is the same as the one in the Android image?
Comment 41•11 years ago
> If the original Otoro/Unagi Android image contains the right wpa_supplicant
> binary, we can simply extract it in build script without any legal issue,
> just like what we extract other blobs.
>
> Vincent, do you know whether the wpa_supplicant received from partner is
> same as the one in Android image?
The checksums are different for these two blobs, but the wpa_supplicant binary from the Android image seems to work well.
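
A comparison like the one described in this comment can be reproduced with a few lines of Python. This is a minimal sketch, assuming both binaries have been pulled to local files. The helper names `sha256_of` and `same_blob` are illustrative, and the comment does not say which checksum algorithm was actually used, so SHA-256 here is an assumption.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_blob(path_a, path_b):
    """Return True when both files have identical SHA-256 digests."""
    return sha256_of(path_a) == sha256_of(path_b)
```

For example, `same_blob("wpa_supplicant.partner", "wpa_supplicant.android")` would compare two local copies; both file names are hypothetical. Differing checksums only show that the binaries are not byte-identical, not which source changes separate them.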
Reporter
Comment 42•11 years ago
(In reply to Vincent Chang[:vchang] from comment #41)
> The checksum are different for these two blobs. But wpa_supplicant binary
> from Android image seems working well.

I thought you'd tried this in Berlin and it didn't work. I wonder what changed.
Comment 43•11 years ago
Does this even need to block? If it's a vendor thing that we can't control ...
Comment 44•11 years ago
(In reply to Blake Kaplan (:mrbkap) from comment #42)
> (In reply to Vincent Chang[:vchang] from comment #41)
> > The checksum are different for these two blobs. But wpa_supplicant binary
> > from Android image seems working well.
>
> I thought you'd tried this in Berlin and it didn't work. I wonder what
> changed.

I used the wpa_supplicant from otoro in Berlin, but I used the wpa_supplicant from unagi in yesterday's attempt. I'm not sure what the difference between these two binaries is.
Comment 45•11 years ago
This doesn't appear to need to block since there's nothing we can do and things are in the OEM's hands. Re-nomming.
blocking-b2g: tef+ → tef?
Updated•11 years ago
blocking-b2g: tef? → -
blocking-basecamp: - → ---
Updated•11 years ago
Assignee: mvines → nobody
Flags: needinfo?(ggrisco)
Updated•11 years ago
Flags: needinfo?(mwu)
Comment 46•6 years ago
Firefox OS is not being worked on
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX