Closed
Bug 1001897
Opened 10 years ago
Closed 10 years ago
crash in wifi_connect_on_socket_path
Categories
(Firefox OS Graveyard :: Wifi, defect, P1)
Tracking
(blocking-b2g:1.4+)
People
(Reporter: tkundu, Assigned: hchang)
Details
(Keywords: crash, Whiteboard: [caf priority: p1][CR 655397][b2g-crash][p=3])
Attachments
(9 files)
Test steps:
1. Made an MO call and got connected.
2. Sent multiple MO SMS messages.
3. Turned BT on and paired with another device.
4. Transferred video files via BT.
5. Downloaded some games via Wi-Fi.
6. While toggling Wi-Fi on/off, the device crashed.
Reporter
Updated•10 years ago
blocking-b2g: --- → 1.4?
Reporter
Comment 1•10 years ago
Updated•10 years ago
Severity: normal → critical
blocking-b2g: 1.4? → 1.4+
Keywords: crash
Whiteboard: [CR 655397] → [CR 655397][b2g-crash]
Assignee
Comment 2•10 years ago
This looks like it is caused by a mismatched C library function call. The stack trace:
 0 libhardware_legacy.so!wifi_connect_on_socket_path [wifi.c : 653 + 0x2]
   ==> JB/KK wifi.c (because wifi_connect_on_socket_path only exists in JB/KK)
 1 libxul.so!ICSWpaSupplicantImpl::do_wifi_connect_to_supplicant(char const*) [WifiUtils.cpp : 182 + 0x1]
   ==> ICS
For ICS, we call wifi.c::wifi_connect_to_supplicant with no argument ( http://dxr.mozilla.org/mozilla-central/source/dom/wifi/WifiUtils.cpp#182 ), but the wifi.c::wifi_connect_to_supplicant actually being called seems to be the JB/KK version, which requires an "interface" argument.
Comment 4•10 years ago
Unfortunately I don't think it's quite that. In KK, the parameter to wifi_connect_on_socket_path() was dropped again.
* https://www.codeaurora.org/cgit/quic/la/platform/hardware/libhardware_legacy/tree/wifi/wifi.c?h=kk_3.5#n656
* https://www.codeaurora.org/cgit/quic/la/platform/hardware/libhardware_legacy/commit/wifi?id=0f330488afcfff031bc0ba88de826f998f7cfaf9
I don't see much to go on right now. I'm hoping that we'll catch the crash again soon and maybe get lucky with the logs.
Assignee
Comment 5•10 years ago
(In reply to Michael Vines [:m1] [:evilmachines] from comment #4)
> Unfortunately I don't think it's quite that. In KK, the parameter to
> wifi_connect_on_socket_path() was dropped again.
Yes, you're right. If this crash occurred on KK, then the missing parameter is not the root cause. Are we able to get the wifi.c used in the build that crashed? Or is it just the unchanged AOSP KK wifi.c?
Comment 6•10 years ago
(In reply to Henry Chang [:henry] from comment #5)
> > * https://www.codeaurora.org/cgit/quic/la/platform/hardware/libhardware_legacy/tree/wifi/wifi.c?h=kk_3.5
>
> Are we able to get the wifi.c being used for this crash?
Yep, I linked to it in comment 4. The reference to line 653 in the backtrace doesn't help me, at least :-/
Assignee
Comment 7•10 years ago
(In reply to Michael Vines [:m1] [:evilmachines] from comment #6)
> Yep, I linked to it in comment 4. The reference to line 653 in the
> backtrace doesn't help me at least :-/
Sadly, I cannot reproduce this issue on a Nexus 5 with the codebase I checked out last night.
Assignee
Comment 8•10 years ago
I also cannot reproduce it on a Nexus 4 (KK) with the manifest https://github.com/mozilla-b2g/b2g-manifest/blob/master/nexus-4-kk.xml, except for:
1) Gecko 1.4: http://hg.mozilla.org/releases/mozilla-b2g30_v1_4/
2) hardware/libhardware_legacy/wifi/wifi.c @ rev kk_3.5
Comment 9•10 years ago
Hi Michael, per comments 7 and 8, can you check this on your side first and confirm that it crashes in wifi.c? Also, what is the reproduction rate of this bug?
Flags: needinfo?(mvines)
Comment 10•10 years ago
We have seen this crash twice, and it is not something you will be able to reproduce easily with a couple of minutes of testing. I'm hoping that the next time it reproduces we'll have more to go on.
Flags: needinfo?(mvines)
Comment 11•10 years ago
Hi Jason, this issue is still not reproducible. Should we remove the blocking flag and then investigate it when there's more info from CAF? Thanks.
Flags: needinfo?(jsmith)
Comment 12•10 years ago
We've instrumented our build and hopefully the next time this reproduces there'll be more to go on. In the meantime we are in a holding pattern so if misusing this bug is the preferred Moz workflow then go for it!
Comment 13•10 years ago
Per comment 12, marking this as 1.4?. Thanks.
blocking-b2g: 1.4+ → 1.4?
Flags: needinfo?(jsmith)
Updated•10 years ago
blocking-b2g: 1.4? → backlog
Comment 14•10 years ago
Comment 15•10 years ago
Updated•10 years ago
blocking-b2g: backlog → 1.4?
Comment 16•10 years ago
I meant to add these additional minidump/extra files to bug 1007766, which is related. I will update that bug now.
Comment 17•10 years ago
Henry - Is the new information Greg has provided enough to go off of to fix this bug?
Flags: needinfo?(hchang)
Assignee
Comment 18•10 years ago
(In reply to Jason Smith [:jsmith] from comment #17)
> Henry - Is the new information Greg has provided enough to go off of to fix
> this bug?
There are two different kinds of core dump so far:
1) A stack check failure after wpa_ctrl_attach_helper() returns. It's most likely due to corruption of the stack frame of wpa_ctrl_attach_helper(). However, I checked the wpa_supplicant source code (kk_3.5; not sure if that is what's used in the test) and found no clue.
2) An invalid memory access in wpa_ctrl_recv, in bug 1007766. We suspect it's due to a race around the terminating event. For this dump, we are waiting for the test with the patch applied, as mentioned in bug 1007766 comment 15.
I am not sure if both of them come from the same root cause. (I hope they do...)
Greg, are the revisions of libhardware_legacy and wpa_supplicant both kk_3.5? Have you tried the same test run on any device we can get in the Taipei office? We tried a Nexus 4 (KK) and a Nexus 5, but still haven't seen the crash... Thanks!
Flags: needinfo?(hchang) → needinfo?(ggrisco)
Assignee
Updated•10 years ago
Whiteboard: [CR 655397][b2g-crash] → [CR 655397][b2g-crash][p=3]
Target Milestone: --- → 2.0 S2 (23may)
Comment 19•10 years ago
Triage: We are waiting for Greg to come back here before making a (non)blocking call.
Comment 20•10 years ago
(In reply to Henry Chang [:henry] from comment #18)
> Greg, is the revision of libhardware_legacy as well as wpa_supplicant kk_3.5?
> Have you tried the same test run with any device that we can get in Taipei
> office?
> We tried nexus 4 kk and nexus 5 but the crash still not seen....
Henry,
Can we please try the same on a QRD? Is it reproducible on that device?
Comment 21•10 years ago
(In reply to Preeti Raghunath(:Preeti) from comment #20)
> Can we please try the same on a QRD?
> Is it reproducible on that device?
I'm not sure we need to do this again, because this bug comes from the partner, and the partner only uses QRD for testing.
Comment 22•10 years ago
(In reply to Henry Chang [:henry] from comment #18)
> Greg, is the revision of libhardware_legacy as well as wpa_supplicant kk_3.5?
> Have you tried the same test run with any device that we can get in Taipei
> office?
Yes, these are both kk_3.5. I'm waiting for test results from applying the patch in bug 1007766 and will report back once I have them. We haven't tried other devices; I think the test team only has QRD.
Comment 23•10 years ago
Crash observed on:
Device: msm8226
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.5.01.04.00.113.097
Moz BuildID: 20140511000204
B2G Version: 1.4
Gecko Version: 30.0
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=17fb44880e95bc7ae363a609d811bf5a9a067b5b
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=2f11e3aba98eb785ec24504fe9988ab61a03b64d
Assignee
Comment 24•10 years ago
(In reply to cafbot (PoC: ggrisco) from comment #23)
> Crash observed on:
> Device: msm8226
> Gonk Version: AU_LINUX_GECKO_B2G_KK_3.5.01.04.00.113.097
> Moz BuildID: 20140511000204
This crash occurred on Gecko without the patch I mentioned applied. We expect this bug could also be resolved by that patch.
Assignee
Updated•10 years ago
Whiteboard: [CR 655397][b2g-crash][p=3] → [CR 655397][b2g-crash][p=3][ETA: 5/27]
Target Milestone: 2.0 S2 (23may) → 2.0 S6 (18july)
Comment 25•10 years ago
Flags: needinfo?(ggrisco)
Comment 26•10 years ago
Comment 27•10 years ago
Comment 28•10 years ago
Comment 29•10 years ago
Marking this dependent on bug 1005775, since that bug should fix this issue.
blocking-b2g: 1.4? → 1.4+
Depends on: 1005775
Comment 30•10 years ago
(In reply to Jason Smith [:jsmith] from comment #29)
> Marking this dependent on bug 1005775, since that bug should fix this issue.
Actually, this is a different crash from bug 1005775, and that patch does not fix it. I had the patch applied when running the tests that produced minidumps #3 and #4 (attached). Sorry, I should have made that clearer. Crashes whose signature contains "wifi_connect_on_socket_path" are not fixed by the patch from bug 1005775.
Flags: needinfo?(jsmith)
Comment 32•10 years ago
Details for minidump #4, which had the patch for bug 1005775 applied:
Device: msm8226
Moz BuildID: 20140518000201
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.5.01.04.00.113.105
B2G Version: 1.4
Gecko Version: 30.0
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=7019efdbcfa58d3ff4702b018420db3d8753bb93
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=46b48309fc782275104e128f58a601123c21922e
Updated•10 years ago
Flags: needinfo?(ikumar)
Comment 34•10 years ago
changing ni to Preeti for ETA.
Flags: needinfo?(ikumar) → needinfo?(praghunath)
Comment 35•10 years ago
ni Eric for ETA. I'm now keen to understand next steps.
Flags: needinfo?(praghunath)
Assignee
Comment 36•10 years ago
We found that calling FD_SET [1] with a socket fd of 1024 or larger causes the stack overflow we observe in the crash log. So we suspect a file descriptor leak, either in Wi-Fi or in the entire system. We are also still trying to reproduce this crash. In the meantime, are you able to attach gdb to capture this kind of crash and check whether too many files are open at the time of the crash (for example, the b2g process opening tons of sockets to wpa_supplicant)? Thanks!
[1] http://androidxref.com/4.4.2_r2/xref/external/wpa_supplicant_8/src/common/wpa_ctrl.c#454
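A minimal C sketch of the failure mode described above, assuming a typical glibc/Bionic fd_set: a fixed bitmap of FD_SETSIZE (1024) bits, with no bounds checking inside FD_SET(). The helper names fd_fits_in_fd_set() and wait_readable_select() are hypothetical and not taken from wpa_ctrl.c; the point is only that a descriptor numbered 1024 or higher has to be rejected before it reaches FD_SET() on a stack-allocated set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/select.h>

/* An fd_set is a fixed bitmap of FD_SETSIZE bits (1024 on glibc and
 * Bionic), and FD_SET() does not check bounds. FD_SET(fd, &set) with
 * fd >= FD_SETSIZE therefore writes past the end of `set`, which for a
 * local variable means overwriting other data on the stack. */
static bool fd_fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical helper: waits up to timeout_ms for `fd` to become readable
 * with select(). Returns 1 if readable, 0 on timeout, and -1 on error or
 * if `fd` cannot be stored in an fd_set. */
static int wait_readable_select(int fd, int timeout_ms)
{
    if (!fd_fits_in_fd_set(fd))
        return -1;                        /* refuse instead of overflowing */

    fd_set rfds;                          /* lives on the stack */
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int res = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (res < 0)
        return -1;
    return (res > 0 && FD_ISSET(fd, &rfds)) ? 1 : 0;
}
```

With a guard like this, a leaked or simply high-numbered socket yields an error return instead of silently corrupting the caller's frame, which is the kind of damage a stack-protector check (as in the "stack check failure" from comment 18) would later report.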
Flags: needinfo?(hchang)
Comment 37•10 years ago
Adding ni to address questions raised by Henry. PS: Please ni for faster response.
Flags: needinfo?(tkundu)
Flags: needinfo?(ggrisco)
Reporter
Comment 38•10 years ago
(In reply to Henry Chang [:henry] from comment #36)
> We found FD_SET [1] with socket fd larger than 1024 would cause the stack
> overflow which we observe in the crash log.
Thanks for your help. We are trying to find the fd leak by looking at the |lsof| output. We will update here soon.
Assignee
Comment 39•10 years ago
We reproduced the crash after a 10-hour Wi-Fi toggle test on a Nexus 4 with m-c revision [1], and found a large number of ashmem allocations (counted with "adb shell lsof | grep b2g | grep -c ashmem"). Bug 1004191 addressed this issue and was supposed to resolve this leak, but we still see it on mozilla-central (the patch for bug 1004191 is in the tree we use).
[1] http://hg.mozilla.org/mozilla-central/log?rev=41a54c8add09
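The lsof | grep -c counts used in this thread can also be taken from inside a process. A Linux/Android-oriented sketch, assuming procfs is mounted at /proc (as it is on Android): it counts the entries of /proc/<pid>/fd, optionally only those whose link target contains a substring such as "ashmem". The function name count_fds() is hypothetical, and the exact link-target text for ashmem descriptors may differ between kernel versions.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: counts the open descriptors of process `pid`
 * ("self" for the caller) by listing /proc/<pid>/fd. If `needle` is not
 * NULL, only descriptors whose link target contains it are counted
 * (e.g. "ashmem"). Returns -1 if the directory cannot be opened.
 * Note: when pid is "self", the descriptor opendir() uses is included. */
static int count_fds(const char *pid, const char *needle)
{
    char dirpath[64];
    snprintf(dirpath, sizeof dirpath, "/proc/%s/fd", pid);

    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')
            continue;                     /* skip "." and ".." */
        if (needle == NULL) {
            count++;
            continue;
        }
        char linkpath[384];
        char target[512];
        snprintf(linkpath, sizeof linkpath, "%s/%s", dirpath, ent->d_name);
        ssize_t n = readlink(linkpath, target, sizeof target - 1);
        if (n < 0)
            continue;                     /* fd closed since readdir() */
        target[n] = '\0';                 /* readlink() does not terminate */
        if (strstr(target, needle) != NULL)
            count++;
    }
    closedir(dir);
    return count;
}
```

Sampling count_fds("self", "ashmem") periodically and logging the value gives the same trend as the adb command, without depending on lsof being present on the device.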
Comment 40•10 years ago
Henry, so we will continue the investigation, right? Do we want the partner to test with the patch on bug 1004191?
Flags: needinfo?(hchang)
Comment 41•10 years ago
Crash observed on:
Device: msm8226
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.5.01.04.00.113.097
Moz BuildID: 20140511000204
B2G Version: 1.4
Gecko Version: 30.0
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=17fb44880e95bc7ae363a609d811bf5a9a067b5b
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=2f11e3aba98eb785ec24504fe9988ab61a03b64d
Comment 42•10 years ago
Crash observed on:
Device: msm8226
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.5.01.04.00.113.097
Moz BuildID: 20140511000204
B2G Version: 1.4
Gecko Version: 30.0
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=17fb44880e95bc7ae363a609d811bf5a9a067b5b
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=2f11e3aba98eb785ec24504fe9988ab61a03b64d
Comment 43•10 years ago
Crash observed on:
Device: msm8226
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.5.01.04.00.113.102
Moz BuildID: 20140515000202
B2G Version: 1.4
Gecko Version: 30.0
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=2e97bee6bb79d3577dba1bf2a1bbfcba64ee99ab
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=35f27a8e9b3f651748aa22095553024556272de8
Comment 44•10 years ago
Sorry for the spam; please ignore comments 41-43.
Flags: needinfo?(ggrisco)
Assignee
Comment 45•10 years ago
(In reply to Preeti Raghunath(:Preeti) from comment #40)
> Henry
>
> So we will continue investigation right? Did we want partner to test with
> patch on bug 1004191?
The patch on bug 1004191 is for mozilla-central. I found another bug about an fd leak, bug 998504, which was caught and has been resolved on 1.4. My top priority is to confirm whether the crashes observed by the partner were really caused by an fd leak. I am pretty sure an fd leak could lead to the same crash (stack overflow), but there is no evidence that the partner's crashes were caused by one until we can capture the crash and run "adb shell lsof" at the time it occurs. Thanks!
Flags: needinfo?(hchang)
Comment 46•10 years ago
Crash observed on:
Device: msm8226
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.5.01.04.00.113.102
Moz BuildID: 20140515000202
B2G Version: 1.4
Gecko Version: 30.0
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=2e97bee6bb79d3577dba1bf2a1bbfcba64ee99ab
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=35f27a8e9b3f651748aa22095553024556272de8
Comment 47•10 years ago
Hi cafbot, can you also dump the file descriptor list with the "adb shell lsof" command when the crash happens? It's a really important clue for identifying whether the crash is caused by a file descriptor leak.
Comment 48•10 years ago
(In reply to Vincent Chang[:vchang] from comment #47)
> Hi cafbot, can you also dump the file descriptor list using "adb shell lsof"
> command when crash happened ? It's really an important clue to identify if
> the crash is caused by file descriptor leak.
We're actually working on this feature; it's not currently available, though.
Comment 49•10 years ago
Does that mean you are working on fixing the file descriptor leak, or on something else?
Comment 50•10 years ago
(In reply to Vincent Chang[:vchang] from comment #49) > Does it mean that you are working on fixing the file descriptor leak or > something else ? We're working on adding 'can you also dump the file descriptor list using "adb shell lsof" command when crash happened ?' to the crash logs. We are not working on a Gecko file descriptor leak at this time.
Comment 51•10 years ago
(In reply to Michael Vines [:m1] [:evilmachines] from comment #50)
> We're working on adding 'can you also dump the file descriptor list using
> "adb shell lsof" command when crash happened ?' to the crash logs. We are
> not working on a Gecko file descriptor leak at this time.
Thanks for your prompt response. :-) Maybe you can use the command watch -n 1 "adb shell lsof | grep b2g | grep -c ashmem" to observe whether the count keeps increasing. In our testing, it seems to increase by 1 every 30 seconds. By a rough estimate, the fd count may exceed 1024 in less than 10 hours and cause the crash.
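A worked version of that rough estimate, assuming the leak stays linear at one descriptor per 30 seconds and starts from B descriptors already open (B is an assumption here, not a figure from the logs), with 1024 taken as the FD_SETSIZE threshold:

$$
t_{\text{overflow}} \approx (1024 - B) \times 30\ \text{s}, \qquad
B = 0:\quad t \approx 30720\ \text{s} \approx 8.5\ \text{h} < 10\ \text{h}
$$

Any nonzero baseline only shortens this time.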
Assignee
Comment 52•10 years ago
An easier way to prove the fd leak: apply this patch and run the test until it crashes. If the crash is caused by an fd leak, the log should contain this line:
"It's going to have a stack overflow since monitor_conn->s >= 1024"
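A sketch of the kind of check such a diagnostic patch could add, written as a hypothetical standalone helper: check_ctrl_socket() is not from wifi.c, and the attachment's actual contents are not reproduced here. It writes the marker line before an out-of-range FD_SET() would run; stdio stands in for Android's logging so the example is self-contained.

```c
#include <stdio.h>
#include <sys/select.h>

/* Hypothetical diagnostic helper: returns 0 if socket `s` can safely be
 * passed to FD_SET(), otherwise returns -1. For s >= FD_SETSIZE it first
 * writes the marker line to `log`, so a later crash can be matched to an
 * out-of-range descriptor. stdio is used here instead of Android logging. */
static int check_ctrl_socket(FILE *log, int s)
{
    if (s >= 0 && s < FD_SETSIZE)
        return 0;
    if (s >= FD_SETSIZE) {
        fprintf(log, "It's going to have a stack overflow since "
                     "monitor_conn->s >= %d (s = %d)\n", FD_SETSIZE, s);
        fflush(log);
    }
    return -1;
}
```

Flushing right after the write matters for this use: if the process crashes a moment later, the marker should already be in the log rather than in a buffer.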
Comment 53•10 years ago
:tk, can you please check out Henry's patch.
Comment 54•10 years ago
Crash observed on:
Device: msm8226
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.5.01.04.00.113.105
Moz BuildID: 20140518000201
B2G Version: 1.4
Gecko Version: 30.0
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=7019efdbcfa58d3ff4702b018420db3d8753bb93
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=46b48309fc782275104e128f58a601123c21922e
Comment 55•10 years ago
Crash observed on:
Device: msm8226
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.5.01.04.00.113.105
Moz BuildID: 20140518000201
B2G Version: 1.4
Gecko Version: 30.0
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=7019efdbcfa58d3ff4702b018420db3d8753bb93
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=46b48309fc782275104e128f58a601123c21922e
Reporter
Comment 56•10 years ago
(In reply to Henry Chang [:henry] from comment #52)
> Created attachment 8428544 [details] [diff] [review]
> wifi.c.patch
>
> Easier way to prove the fd leak: Apply this patch and run the test until
> crash.
Our stability team is testing a new build with your patch, and I will update here when they reproduce it next time.
I also uploaded logs from the last stability test (which also include lsof output every 30 seconds) in [1].
Crash timestamp in the logs: 2014-05-26 03:38:38
I can see that the total number of FDs opened by the b2g process is 420 at the time of the crash.
Could you please take a look and advise?
[1] https://drive.google.com/file/d/0B1cSMS8_GuAEYW1MMXZMZnRuWXc/edit?usp=sharing
Flags: needinfo?(tkundu) → needinfo?(hchang)
Assignee
Comment 57•10 years ago
(In reply to Tapas Kumar Kundu from comment #56)
> Crash timestamp is : 2014-05-26 03:38:38 in logs
>
> I can see total number of FD opened by b2g process is 420 at the time of
> crash.
I checked the log, and it's not the same crash point as before [1]. The lsof output looks fine at the time of the crash...
[1]
Thread 12 (crashed)
 0  libc.so!strstr [strstr.c : 49 + 0x0]
     r0 = 0xfffffff1   r1 = 0xb6d25714   r2 = 0x00000012   r3 = 0x0000005a
     r4 = 0x9f900000   r5 = 0x9f900000   r6 = 0xb6d25714   r7 = 0x00000063
     r8 = 0x00000012   r9 = 0x9f8fe050  r10 = 0xb6eae394  r12 = 0x01000000
     fp = 0xb6eac2ec   sp = 0xb08fff20   lr = 0xb6e8b6fd   pc = 0xb6e8b6e8
    Found by: given as instruction pointer in context
 1  libhardware_legacy.so!update_ctrl_interface [wifi.c : 448 + 0x9]
     r4 = 0x9f8fe050   r5 = 0x000000d0   r6 = 0xb6d25d80   r7 = 0x00000072
     r8 = 0xb6eae394   r9 = 0x9f8fe050  r10 = 0xb6eae394   fp = 0xb6eac2ec
     sp = 0xb08fff38   pc = 0xb6d2391f
    Found by: call frame info
 2  libhardware_legacy.so!ensure_config_file_exists [wifi.c : 495 + 0x5]
     r4 = 0xb6d25d80   r5 = 0xb6eae394   r6 = 0xb6d2554d   r7 = 0xb6eae394
     r8 = 0xb6eae394   r9 = 0xb63c67f0  r10 = 0x00000000   fp = 0xb6eac2ec
     sp = 0xb0900030   pc = 0xb6d23ae5
    Found by: call frame info
Flags: needinfo?(hchang)
Reporter
Comment 58•10 years ago
(In reply to Henry Chang [:henry] from comment #57)
> (In reply to Tapas Kumar Kundu from comment #56)
> > Crash timestamp is : 2014-05-26 03:38:38 in logs
Sorry, the crash timestamp in the logs is 2014-05-25 03:38:38. Sorry for the confusion. Please let me know if you checked the lsof logs for some other timestamp.
> I checked the log and it's not the same crash point as before [1].
We guessed that crash [1] may be the same as this crash, but our guess may be wrong.
> [1]
> https://drive.google.com/file/d/0B1cSMS8_GuAEYW1MMXZMZnRuWXc/edit?usp=sharing
Flags: needinfo?(hchang)
Assignee
Comment 59•10 years ago
(In reply to Tapas Kumar Kundu from comment #58)
> Sorry, crash timestamp is 2014-05-25 03:38:38 in logs . Sorry for confusion.
> Please let me know if you checked lsof logs for some other timestamp.
So, the lsof dump at 2014-05-25 03:38:38 does show an fd leak (most likely "anon_inode:dmabuf"). That is definitely a bug, and it will definitely result in the stack overflow.
> > I checked the log and it's not the same crash point as before [1].
>
> We guessed that crash [1] may be same as this crash. But our guess may be
> wrong.
I am not saying they have different root causes, since I cannot explain what really caused strstr to crash in the log you attached. But the points where the crashes occurred are different. I am now trying to connect the strstr crash to the fd leak. Thanks!
Flags: needinfo?(hchang)
Assignee
Comment 60•10 years ago
Filed bug 1017589 for the fd leak observed in the log in comment 56. Once we prove that the wifi_connect_on_socket_path crash is caused by the fd leak, we can mark this bug as a duplicate of bug 1017589.
Comment 61•10 years ago
(In reply to Tapas Kumar Kundu from comment #56)
> (In reply to Henry Chang [:henry] from comment #52)
> > "It's going to have a stack overflow since monitor_conn->s >= 1024"
Tapas, why is there a limit of 1024? On B2G, the b2g process's file descriptor count can rise above 4000 in some use cases; see bug 877495. Therefore, the b2g process's file descriptor limit is currently set to 8192 by the following:
https://github.com/mozilla-b2g/gonk-misc/blob/master/b2g.sh#L6
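To see why 1024 is not a process-wide ceiling, a process can compare its own RLIMIT_NOFILE soft limit against FD_SETSIZE. A small sketch using the POSIX getrlimit() call; the helper name is hypothetical. When the soft limit is above FD_SETSIZE (as with the 8192 limit mentioned above), descriptors that FD_SET() cannot hold are perfectly legal, so any select() call that may receive them is a latent overflow even without a leak.

```c
#include <stdbool.h>
#include <sys/resource.h>
#include <sys/select.h>

/* Hypothetical helper: returns true if this process may legally open
 * descriptors numbered FD_SETSIZE or higher (valid fds run from 0 to
 * rlim_cur - 1), i.e. fds that an fd_set cannot hold. Returns false if
 * the limit cannot be read. */
static bool nofile_limit_exceeds_fd_setsize(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return false;
    return rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > (rlim_t)FD_SETSIZE;
}
```

Under a limit like 8192 this returns true, which matches the point above: a descriptor number of 1024 or more is not evidence of a leak by itself, only of code that still relies on fd_set.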
Flags: needinfo?(tkundu)
Comment 62•10 years ago
> I can see total number of FD opened by b2g process is 420 at the time of
> crash.
As in comment 61, we cannot say that 420 is a huge number for b2g, so it cannot be taken as evidence of a leak.
Reporter
Comment 63•10 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #61)
> Tapas, why it has a limitation of 1024? On b2g, b2g process' file descriptor
> number could rise up to more than 4000 in some use cases. See Bug 877495.
> Therefore, current b2g process's file descriptor's limitation is set to 8192
> by the following.
>
> https://github.com/mozilla-b2g/gonk-misc/blob/master/b2g.sh#L6
I am seeing the fd limit as 8192 for the b2g process in v1.4:
https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla-b2g/gonk-misc/tree/b2g.sh?h=mozilla/v1.4#n6
I am curious to know why you think 1024 is the fd limit for the b2g process.
BTW, it seems to me that the patch from comment 52 is just a hypothesis for this bug, which has not been proven yet.
Flags: needinfo?(sotaro.ikeda.g)
Comment 64•10 years ago
I already answered the question from comment 63 in bug 1017589 comment 10.
Flags: needinfo?(tkundu)
Flags: needinfo?(sotaro.ikeda.g)
Comment 65•10 years ago
Crash observed on:
Device: msm8226
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.5.01.04.00.113.114
Moz BuildID: 20140528000201
B2G Version: 1.4
Gecko Version: 30.0
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=cd595be0a8e975559e8938830df5face89bec3e8
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=d591b0c691da6847dcb9a4f626211b597e8807fe
Assignee
Comment 66•10 years ago
According to bug 1017589 comment 15, wpa_supplicant is going to be fixed to avoid the fd_set stack overflow, since an fd of 1024 or higher is a legitimate value in B2G.
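One common way to remove the FD_SETSIZE ceiling entirely is to replace select() with poll(), which takes an array of struct pollfd rather than a fixed-size bitmap. Whether the wpa_supplicant fix mentioned above took this route is not stated in this bug; the sketch below only illustrates the technique, and wait_readable_poll() and demo_pipe() are hypothetical names.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Waits up to timeout_ms for `fd` to become readable using poll(). The
 * descriptor is passed in a struct pollfd, not a bitmap, so there is no
 * FD_SETSIZE ceiling. Returns 1 if readable, 0 on timeout, and -1 on error,
 * invalid fd, or hang-up with no data. On EINTR the full timeout restarts. */
static int wait_readable_poll(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int res;

    do {
        res = poll(&pfd, 1, timeout_ms);
    } while (res < 0 && errno == EINTR);

    if (res < 0)
        return -1;
    if (res == 0)
        return 0;
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Usage demo: an empty pipe times out, and after one byte is written
 * the read end is reported readable. Returns 1 if both hold. */
static int demo_pipe(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return 0;

    int empty = wait_readable_poll(fds[0], 10);   /* expect 0 */
    int wrote = write(fds[1], "x", 1) == 1;
    int ready = wait_readable_poll(fds[0], 10);   /* expect 1 */

    close(fds[0]);
    close(fds[1]);
    return wrote && empty == 0 && ready == 1;
}
```

Because struct pollfd carries the descriptor number as a plain int, a control socket numbered 1024 or higher is handled exactly like fd 3, with no stack-allocated bitmap to overrun.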
Updated•10 years ago
Whiteboard: [CR 655397][b2g-crash][p=3][ETA: 5/27] → [caf priority: p1][CR 655397][b2g-crash][p=3][ETA: 5/27]
Comment 67•10 years ago
Hi TK, according to bug 1017589 comment 14, have you fixed wpa_supplicant? Has it brought any improvement for this bug?
Flags: needinfo?(tkundu)
Reporter
Comment 68•10 years ago
(In reply to Ken Chang[:ken] from comment #67)
> Hi TK, according to Bug 1017589 Comment 14, have you fixed the
> wap_supplicant? have it any improvement for this bug?
Yes. We have landed a fix for wpa_supplicant, and we are trying to find more use cases where the b2g process calls select(). I will update here soon.
Comment 69•10 years ago
Waiting for Partner's feedback.
Whiteboard: [caf priority: p1][CR 655397][b2g-crash][p=3][ETA: 5/27] → [caf priority: p1][CR 655397][b2g-crash][p=3]
Reporter
Comment 70•10 years ago
We are not seeing this issue anymore in our testing, but we are seeing bug 1025414.
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(tkundu)
Resolution: --- → WONTFIX