Closed Bug 1010292 Opened 11 years ago Closed 10 years ago

crash in strstr | update_ctrl_interface

Categories

(Firefox OS Graveyard :: Vendcom, defect)

ARM
Gonk (Firefox OS)
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1025414

People

(Reporter: rkuhlman, Assigned: vchang)

Details

(Keywords: crash, Whiteboard: [b2g-crash])

Crash Data

Attachments

(3 files)

This bug was filed from the Socorro interface and is 
report bp-eb6df8a9-38af-4919-9987-d00c42140514.
=============================================================

There is no particular STR for this crash. I flashed today's (5/14) v1.4 Flame build on an Open C. The crash occurred after the flashing process had completed, but before the user was able to interact with the device. After dismissing the crash notification, the user is returned to the choose-language screen of the FTU.

v1.4 Environmental Variables:
Device: Open C v1.4
BuildID: 20140514000204
Gaia: b40103dec34a147c9018a1af76eb21c3184f2f93
Gecko: 7788969f70b0
Version: 30.0
Firmware Version: P821A10v1.0.0B06_LOG_DL
Unable to reproduce issue after resetting or reflashing phone. repro rate 1/3.
Steps wanted
Keywords: steps-wanted
Component: General → Wifi
Whiteboard: [b2g-crash]
We're seeing this on our Flame, in automation, on master:

https://crash-stats.mozilla.com/report/index/64512080-78d8-4769-975d-0857f2140516

The STR for us are automation-specific, but it's running test_settings_wifi.py multiple times, and almost every run now crashes in at least one test iteration out of 20.
Attached file logcat.txt
Here's the logcat which corresponds to the crash I mentioned, above.
Is the wifi crash seen in automation only happening on 2.0? Or is this also seen on 1.4?
Flags: needinfo?(stephen.donner)
(In reply to Jason Smith [:jsmith] from comment #4)
> Is the wifi crash seen in automation only happening on 2.0? Or is this also
> seen on 1.4?

We don't have Flames in automation set up with builds/jobs for 1.4, so I don't know.
Flags: needinfo?(stephen.donner)
Hi Stephen, can you help apply the patch in Bug 1005775 and see if it helps fix the crash?
Flags: needinfo?(stephen.donner)
(In reply to Vincent Chang[:vchang] from comment #6)
> Hi Stephen, can you help apply the patch in Bug 1005775 and see if it
> helps fix the crash?

Sorry, I don't build.
Flags: needinfo?(stephen.donner)
(In reply to Vincent Chang[:vchang] from comment #6)
> Hi Stephen, can you help apply the patch in Bug 1005775 and see if it
> helps fix the crash?

Hi Vincent, I am curious about how the patch in Bug 1005775 relates to update_ctrl_interface.
If the wifi.c used by this build is:

http://androidxref.com/4.3_r2.1/xref/hardware/libhardware_legacy/wifi/wifi.c

Since |pbuf| is a local variable pointing to the heap and isn't changed after malloc,
the only way I can see to get a segmentation fault is a stack overflow in its neighbor |ifc| overwriting it....
Assignee: nobody → vchang
(In reply to rkuhlman from comment #1)
> Unable to reproduce issue after resetting or reflashing phone. repro rate
> 1/3.
> Steps wanted

Can we have the source code of this build? Or is it the unmodified AOSP? Thanks!
Hi Stephen, do you still encounter this problem when running test_settings_wifi.py on Flame? I have been running the same test for a couple of hours, but have had no luck reproducing the crash.
Flags: needinfo?(stephen.donner)
(In reply to Henry Chang [:henry] from comment #8)
> (In reply to Vincent Chang[:vchang] from comment #6)
> > Hi Stephen, can you help apply the patch in Bug 1005775 and see if it
> > helps fix the crash?

> Hi Vincent, I am curious about how the patch in Bug 1005775 relates to
> update_ctrl_interface.

Oh, you are right, that patch doesn't relate to this crash. 

> Since |pbuf| is a local variable pointing to the heap and isn't changed after
> malloc, the only way I can see to get a segmentation fault is a stack
> overflow in its neighbor |ifc| overwriting it....

Nice catch, but the code snippet for ifc below seems fine.  

    if (!strcmp(config_file, SUPP_CONFIG_FILE)) {
        property_get("wifi.interface", ifc, WIFI_TEST_INTERFACE);
    } else {
        strcpy(ifc, CONTROL_IFACE_PATH);
    }
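
A rough standalone sketch of why the strcpy() branch looks safe: it cannot overflow |ifc| as long as CONTROL_IFACE_PATH, including its terminating NUL, fits in the buffer. The values below (PROPERTY_VALUE_MAX of 92 and a CONTROL_IFACE_PATH of "/data/misc/wifi/sockets") are assumptions recalled from AOSP and not verified against this build, and copy_iface() is a hypothetical helper, not something in wifi.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed values; check them against the wifi.c and cutils headers of the actual build. */
#define PROPERTY_VALUE_MAX 92
#define CONTROL_IFACE_PATH "/data/misc/wifi/sockets"

/*
 * Hypothetical helper (not in wifi.c): copy src into dst only if it fits,
 * including the terminating NUL.
 * Returns 0 on success, -1 if the copy would overflow dst.
 */
int copy_iface(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    assert(dst != NULL && src != NULL);
    len = strlen(src);
    if (len + 1 > dst_size)
        return -1;              /* would write past the end of dst */
    memcpy(dst, src, len + 1);  /* copies the NUL terminator as well */
    return 0;
}
```

With these assumed values, copy_iface(ifc, sizeof(ifc), CONTROL_IFACE_PATH) succeeds for a char ifc[PROPERTY_VALUE_MAX], which is consistent with the observation that this branch on its own should not overflow the stack buffer.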
I ran an adapted version of test_settings_wifi.py over the weekend without observing the crash, so I am going to close the bug based on that result.
Feel free to reopen if you observe the same problem again.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
Attached file logcat.txt
Flags: needinfo?(stephen.donner)
(In reply to Vincent Chang[:vchang] from comment #12)
> Created attachment 8428473 [details]
> adapted test_settings_wifi_enable.py
> 
> I ran an adapted version of test_settings_wifi.py over the weekend without
> observing the crash, so I am going to close the bug based on that result.
> Feel free to reopen if you observe the same problem again.

What was the adaptation?  Can you tell if there's a crash in the logcat I just uploaded?  We won't be able to tell if we're still crashing until we physically look at the device, which due to a U.S. holiday, won't be until Tuesday.
You can see the difference in the attachment I uploaded in comment 12. It just toggles wifi on/off continually without rebooting the device, which I think could help reproduce the crash more quickly. I attached GDB on the Flame and ran this test over the weekend, but I didn't observe the crash.

I skimmed through the logcat you just uploaded and saw some processes restart, but it looks like a reboot and not a crash. Please double-check tomorrow whether the device has crashed, and feel free to reopen if you find any problem.
Note - if we end up seeing this crash again, feel free to reopen.
Keywords: steps-wanted
We can't reproduce this locally or with several runs in our lab, either, on master.  Given that we _were_ able to, though, and there was a fix over in bug 1005775, should we mark this a duplicate of that, and update that bug with the signature, for tracking?
Flags: needinfo?(jsmith)
Sure.
Flags: needinfo?(jsmith)
Resolution: WORKSFORME → DUPLICATE
I would like to reopen this bug because of comment 57 in Bug 1001897. Since QC reproduced this on the 1.4 branch, I'll first try to reproduce it on 1.4 using a Nexus 5.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
No luck reproducing the problem yet; I'll keep running the test.
We're seeing this in our master/mozilla-central automation on Flame: https://crash-stats.mozilla.com/report/index/99c332e9-69a7-4a95-b985-531492140624 - for those in Taipei (QA, at least) who can see this, here it is: http://selenium.qa.mtv2.mozilla.com:8080/job/b2g.flame.mozilla-central.unittests/177/HTML_Report/ 

Vincent/TPE QA, can you try running the Python unittests on Flame?  Since we can also reproduce locally, is there anything we can do to help you debug this further?
Stephen - How often are we seeing our wifi test fail due to this crash?
Flags: needinfo?(stephen.donner)
As far as I can tell, the majority of wifi test failures are due to the white screen of death and not crashing. Looking at the handful of builds, it seems like they started crashing in b2g.flame.mozilla-central.unittests (http://selenium.qa.mtv2.mozilla.com:8080/job/b2g.flame.mozilla-central.unittests) at build 167. Build 163 was the first build to use the same gaia/gecko as 167. So looking at builds 163-177, there were 12 builds that passed completely (35 tests), 1 that didn't run because the device had already crashed, and the crashes were the following:

#167 26 pass then crash 9 fail
#170 19 pass then crash 16 fail
#177 10 pass then crash 25 fail

It's crashing on different tests each time, and it crashed on 20% of the 15 builds. The crashes seem random, so I treated it as a geometric distribution where each test run in a build is independent, and estimated that every time a test runs there's a 3% chance of it failing. Since it's only a sample size of 15, though, take it with a grain of salt.
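
A minimal sketch of one way to make this kind of estimate, assuming every test is an independent trial with the same crash probability. Under that model the maximum-likelihood estimate is simply the number of crashes divided by the number of tests actually executed. estimate_crash_rate() and struct build_run are hypothetical names. How the runs are counted is also an assumption: with 12 passing builds of 35 tests each, and the crashing test taken as the one right after the last pass (the 27th, 20th and 11th), the estimate is 3/478, about 0.6% per test. Counting only the three crashing builds gives 3/58, about 5%, so the figure depends heavily on which runs are included.

```c
#include <assert.h>
#include <stddef.h>

/* One build's outcome: how many tests were executed and whether the run crashed. */
struct build_run {
    int tests_executed;   /* for a crashing build this includes the crashing test */
    int crashed;          /* 1 if the build crashed, 0 otherwise */
};

/*
 * Hypothetical helper: maximum-likelihood estimate of the per-test crash
 * probability, assuming independent tests with a constant crash probability.
 * Returns crashes / total tests executed, or 0.0 if no tests were executed.
 */
double estimate_crash_rate(const struct build_run *runs, size_t n)
{
    long tests = 0, crashes = 0;
    size_t i;

    assert(runs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        tests += runs[i].tests_executed;
        crashes += runs[i].crashed ? 1 : 0;
    }
    return tests > 0 ? (double)crashes / (double)tests : 0.0;
}
```

Builds that pass completely still contribute their 35 tests to the denominator, which is what pulls the estimate down compared with looking at the crashing builds alone.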

The gaia/gecko between builds 163 and 170 are the same, and I'm having trouble discerning the changes made between them. Though build 162 did have several FTU code changes.
Flags: needinfo?(stephen.donner)
The same crash is described and identified in Bug 1025414. It seems to be a JB/KK gonk bug.
According to comment 24.
Status: REOPENED → RESOLVED
Closed: 10 years ago
Component: Wifi → Vendcom
Resolution: --- → FIXED
Sorry, I want to mark this bug as a duplicate of 1025414.
Resolution: FIXED → DUPLICATE
An alternative to modifying wifi.c is to modify the template wpa_supplicant.conf, which is typically located at system/etc/wifi/wpa_supplicant.conf:

Change ctrl_interface=wlan0 to ctrl_interface=/data/misc/wifi/sockets
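
For context, a sketch of why this change would sidestep the rewrite in update_ctrl_interface(), assuming the build uses the AOSP wifi.c linked earlier in this bug. As I read that file, a ctrl_interface= entry is rewritten only when its value is not a directory, meaning it neither uses the DIR= form nor starts with "/". needs_ctrl_interface_rewrite() is a hypothetical helper that mirrors that check on a NUL-terminated config string; it is not a function in wifi.c.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the directory check described above.
 * conf must be a NUL-terminated copy of the config file contents.
 * Returns 1 if a "ctrl_interface=<value>" entry exists and <value> is not a
 * directory (no "DIR=" form, no leading "/"), otherwise 0.
 */
int needs_ctrl_interface_rewrite(const char *conf)
{
    assert(conf != NULL);
    return strstr(conf, "ctrl_interface=") != NULL &&
           strstr(conf, "ctrl_interface=DIR=") == NULL &&
           strstr(conf, "ctrl_interface=/") == NULL;
}
```

With ctrl_interface=wlan0 the helper returns 1, so the entry would be rewritten. With ctrl_interface=/data/misc/wifi/sockets it returns 0, so the rewriting code path is skipped. Whether that fully avoids the crash depends on the root cause identified in bug 1025414.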