Closed
Bug 1061993
Opened 11 years ago
Closed 11 years ago
Closing an invalid fd in WiFi
Categories
(Firefox OS Graveyard :: Wifi, defect)
Tracking
(blocking-b2g:2.0+, firefox33 wontfix, firefox34 fixed, firefox35 fixed, b2g-v2.0 fixed, b2g-v2.0M fixed, b2g-v2.1 fixed, b2g-v2.2 fixed)
People
(Reporter: bhargavg1, Assigned: hchang)
References
Details
(Keywords: crash, Whiteboard: [b2g-crash][caf-crash 335][caf priority: p1][CR 716601])
Attachments
(5 files, 3 obsolete files)
591.49 KB,
text/plain
|
Details | |
536.89 KB,
text/plain
|
Details | |
1.71 MB,
application/zip
|
Details | |
5.36 KB,
patch
|
bajaj
:
approval-mozilla-b2g32+
|
Details | Diff | Splinter Review |
2.98 KB,
patch
|
bajaj
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
When using the patch from Bug 1057220 to catch all invalid fds being closed, we see a crash when operating the WiFi module
You can see the point of crash in the logs as,
libc : close() ret -1 errno -9 fd 213
In most of the cases when the crash happens see,
D DHCP : failed to set prefixLength 0: Cannot assign requested address
this crash has been seen 10 times in automation
We need a stack here.
Flags: needinfo?(bhargavg1)
extra file at time of crash
Please look for "close()" string in the file for the exact point
Flags: needinfo?(bhargavg1)
Flags: needinfo?(dlee)
Attachment #8483130 -
Attachment is obsolete: true
[Blocking Requested - why for this release]:
blocking-b2g: --- → 2.0?
Suspect to be a race condition while calling ifc_* functions because they all share same variable |ifc_ctl_sock| while doing ioctl() instead of request socket on their own.
Assignee | ||
Comment 8•11 years ago
|
||
There is a checkin-needed patch (see Bug 1038531) which may fix
the race condition by serializing all libnetutil command to one
thread (NetworkWorker). We can rerun the automation test while
the patch lands.
Flags: needinfo?(dlee)
(In reply to Henry Chang [:henry] from comment #8)
> There is a checkin-needed patch (see Bug 1038531) which may fix
> the race condition by serializing all libnetutil command to one
> thread (NetworkWorker). We can rerun the automation test while
> the patch lands.
Do you have a patch for v2.0? we can verify the same if available
There are two threads running ifc_* functions at same time after wifi disabled:
1. Network Worker thread [1]: ifc_act_on_ipv4_route()
2. Wifi Control thread [2]: ifc_configure() -> ifc_reset_connection()
The log shows crash happen in thread(30) running |close()| in |ifc_act_on_ipv4_route()|, while another thread(15) running |ioctl()| in |ifc_reset_connection()|.
Because ifc_* shares ifc_ctl_sock[3] between functions, the crash might be caused by ifc_act_on_ipv4_route() and ifc_configure() calls close() almost at same time, leaves no time for the set and check of ifc_ctl_sock in ifc_close()[4] to run.
Since ifc_* functions are not thread-safe, executing all ifc_* functions in single thread, as the patch mentioned in comment 8, might resolve this issue.
[1] http://mxr.mozilla.org/mozilla-central/source/dom/system/gonk/NetworkWorker.cpp#226
[2] http://mxr.mozilla.org/mozilla-central/source/dom/wifi/WifiProxyService.cpp#269
[3] https://android.googlesource.com/platform/system/core/+/master/libnetutils/ifc_utils.c#58
[4] https://android.googlesource.com/platform/system/core/+/master/libnetutils/ifc_utils.c#148
Updated•11 years ago
|
Whiteboard: [CR 716601] → [caf priority: p1][CR 716601]
Updated•11 years ago
|
Whiteboard: [caf priority: p1][CR 716601] → [b2g-crash][caf-crash 335][caf priority: p1][CR 716601]
Comment 11•11 years ago
|
||
Observed on:
Device: msm8226
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.076
Moz BuildID: 20140829000203
B2G Version: 2.0
Gecko Version: 32.0
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=5bf3b8cdea15e62ce7bf77a15085a18e24e33c44
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=f2ab70e9ab18439dae25e6eb21f186708b2888de
Assignee | ||
Comment 12•11 years ago
|
||
(In reply to bhargavg1 from comment #9)
> (In reply to Henry Chang [:henry] from comment #8)
> > There is a checkin-needed patch (see Bug 1038531) which may fix
> > the race condition by serializing all libnetutil command to one
> > thread (NetworkWorker). We can rerun the automation test while
> > the patch lands.
>
> Do you have a patch for v2.0? we can verify the same if available
I believe the latest patch in Bug 1038531 cannot apply to 2.0.
Maybe one of the obsoleted patch could (such as attachment 8462323 [details] [diff] [review])
Updated•11 years ago
|
Assignee: nobody → hchang
Assignee | ||
Comment 13•11 years ago
|
||
Comment 14•11 years ago
|
||
(In reply to Henry Chang [:henry] from comment #13)
> Created attachment 8483383 [details] [diff] [review]
> Patch_for_V2_0.patch
This is a large change and i think too risky for 2.0 at this point. Can we have an alternate solution for 2.0 which is less risky?
Updated•11 years ago
|
Flags: needinfo?(hchang)
Assignee | ||
Comment 15•11 years ago
|
||
(In reply to Inder from comment #14)
> (In reply to Henry Chang [:henry] from comment #13)
> > Created attachment 8483383 [details] [diff] [review]
> > Patch_for_V2_0.patch
>
> This is a large change and i think too risky for 2.0 at this point. Can we
> have an alternate solution for 2.0 which is less risky?
Probably moving only the function which causes the race condition
is enough. Will provide a lighter patch as soon as possible.
Flags: needinfo?(hchang)
Assignee | ||
Comment 16•11 years ago
|
||
A more lightweight solution: adding exclusive lock to ifc_* functions [1]
[1] http://hg.mozilla.org/mozilla-central/file/776fa9cf70cd/dom/network/NetUtils.cpp#l88
Assignee | ||
Comment 17•11 years ago
|
||
Chuck,
Could you please take a look at this patch? I add a lock to ifc functions
to avoid race condition and it has lower rick to the release branch. Thansk!
Attachment #8483928 -
Flags: review?(chulee)
Comment on attachment 8483928 [details] [diff] [review]
Less risky patch
Review of attachment 8483928 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good, thanks!
::: dom/network/src/NetUtils.cpp
@@ +60,5 @@
> DEFINE_DLFUNC(ifc_remove_host_routes, int32_t, const char*)
> DEFINE_DLFUNC(ifc_remove_default_route, int32_t, const char*)
> DEFINE_DLFUNC(dhcp_stop, int32_t, const char*)
>
> +NetUtils::NetUtils()
Nit: Extra space
Attachment #8483928 -
Flags: review?(chulee) → review+
Updated•11 years ago
|
blocking-b2g: 2.0? → 2.0+
Comment 19•11 years ago
|
||
CAF is picking this up today to test and ensure the issue is fixed. NI inder so it does not fall off the plate.
Flags: needinfo?(ikumar)
Comment 20•11 years ago
|
||
It was built into our AU: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.081. We will update with our findings shortly.
Flags: needinfo?(ikumar)
Comment 21•11 years ago
|
||
We want to see if crash reproduces with "Less risky patch" and bionic close() debug patch
Assignee | ||
Comment 22•11 years ago
|
||
Removed the trailing white space and renamed the patch since it's not risky at all :p
Attachment #8483383 -
Attachment is obsolete: true
Attachment #8483928 -
Attachment is obsolete: true
Assignee | ||
Comment 23•11 years ago
|
||
No news is good news!
Comment 24•11 years ago
|
||
Let's wait for final confirmation from our side. Adding ni on Greg to confim once we get results on this.
Flags: needinfo?(ggrisco)
Comment 25•11 years ago
|
||
Hi Greg and Inder, any update on this one? Thank you.
Flags: needinfo?(ikumar)
Flags: needinfo?(ggrisco)
Hi Howie,
We are not seeing this issue anymore in our stability test. Please go ahead and land this patch. We will create a new bug if it comes again :) .
Flags: needinfo?(ikumar)
Flags: needinfo?(ggrisco)
Comment 27•11 years ago
|
||
Hi Henry, please help to land this on master first. After it's fixed, request approval flag: approval‑mozilla‑b2g32 (v2.0). If the patch also apply to v2.1, please request the following approval too: approval‑mozilla‑aurora.
Flags: needinfo?(hchang)
Assignee | ||
Comment 28•11 years ago
|
||
(In reply to howie [:howie] from comment #27)
> Hi Henry, please help to land this on master first. After it's fixed,
> request approval flag: approval‑mozilla‑b2g32 (v2.0). If the patch also
> apply to v2.1, please request the following approval too:
> approval‑mozilla‑aurora.
This patch will only be applied to branched releases since this issue
can be fixed whenever Bug 1038531 lands.
Flags: needinfo?(hchang)
Assignee | ||
Comment 29•11 years ago
|
||
Comment on attachment 8487039 [details] [diff] [review]
No-risk patch (r+'d) (for 2.0 only)
NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): Wifi
User impact if declined: Switching wifi off may cause invalid fd to be closed.
Testing completed: Yes.
Risk to taking this patch (and alternatives if risky): No
String or UUID changes made by this patch: No
Attachment #8487039 -
Attachment description: No-risk patch (r+'d) → No-risk patch (r+'d) (for 2.0/2.1 only)
Attachment #8487039 -
Flags: approval-mozilla-b2g30?
Attachment #8487039 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 30•11 years ago
|
||
Assignee | ||
Comment 31•11 years ago
|
||
By the way, since the patch is not harmful to master, I also attached patch for m-c.
Thanks.
Assignee | ||
Comment 32•11 years ago
|
||
Comment on attachment 8487039 [details] [diff] [review]
No-risk patch (r+'d) (for 2.0 only)
Withdraw mozilla-aurora approval request since this patch needs to rebase to apply to mozilla-aurora
Attachment #8487039 -
Flags: approval-mozilla-aurora?
Assignee | ||
Updated•11 years ago
|
Attachment #8487039 -
Attachment description: No-risk patch (r+'d) (for 2.0/2.1 only) → No-risk patch (r+'d) (for 2.0 only)
Assignee | ||
Updated•11 years ago
|
Attachment #8489894 -
Attachment description: Bug1061993_for_mc.patch → Bug1061993 (applicable for 2.1 and m-c)
Comment 33•11 years ago
|
||
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → 2.1 S5 (26sep)
Comment 35•11 years ago
|
||
Comment on attachment 8487039 [details] [diff] [review]
No-risk patch (r+'d) (for 2.0 only)
Correct the nom in Comment 29 to approval‑mozilla‑b2g32 (v2.0)
Attachment #8487039 -
Flags: approval-mozilla-b2g30? → approval-mozilla-b2g32?
Comment 36•11 years ago
|
||
There needs to be an Aurora approval request for the v2.1 uplift as well.
status-b2g-v2.0:
--- → affected
status-b2g-v2.1:
--- → affected
status-b2g-v2.2:
--- → fixed
status-firefox33:
--- → wontfix
status-firefox34:
--- → affected
status-firefox35:
--- → fixed
Flags: needinfo?(hchang)
Updated•11 years ago
|
Attachment #8487039 -
Flags: approval-mozilla-b2g32? → approval-mozilla-b2g32+
Comment 38•11 years ago
|
||
https://hg.mozilla.org/releases/mozilla-b2g32_v2_0/rev/5cf783171d5c
(In reply to Inder from comment #37)
> Bhavana -- We need this fixed for 2.1 as well.
Needs an approval request first. See comment 35.
Flags: needinfo?(bbajaj)
Comment 39•11 years ago
|
||
status-b2g-v2.0M:
--- → fixed
Comment 40•11 years ago
|
||
Henry, can you please help with the needed aurora patch here?
Comment 41•11 years ago
|
||
(In reply to bhavana bajaj [:bajaj] from comment #40)
> Henry, can you please help with the needed aurora patch here?
The patch applies cleanly to Aurora. Just needs an approval request...
Assignee | ||
Comment 42•11 years ago
|
||
Comment on attachment 8489894 [details] [diff] [review]
Bug1061993 (applicable for 2.1 and m-c)
Approval Request Comment
[Feature/regressing bug #]: Wifi
[User impact if declined]: Turning off wifi may cause to close invalid fd.
[Describe test coverage new/current, TBPL]: Bug 1057220 and wifi test cases
[Risks and why]: No
[String/UUID change made/needed]: No
Attachment #8489894 -
Flags: approval-mozilla-aurora?
Flags: needinfo?(hchang)
Assignee | ||
Comment 43•11 years ago
|
||
Sorry for late response. Already requested approval.
Thanks!
Updated•11 years ago
|
Attachment #8489894 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 44•11 years ago
|
||
Comment 45•11 years ago
|
||
Unable to verify as it is a back-end issue.
QA Whiteboard: [QAnalyst-Triage?][QAnalyst-verify-]
Flags: needinfo?(ktucker)
Updated•11 years ago
|
QA Whiteboard: [QAnalyst-Triage?][QAnalyst-verify-] → [QAnalyst-Triage+][QAnalyst-verify-]
Flags: needinfo?(ktucker)
You need to log in
before you can comment on or make changes to this bug.
Description
•