Closed
Bug 1179219
Opened 9 years ago
Closed 9 years ago
Phone randomly crashes and stays in a crashed state; exposes a QHSUSB__BULK peripheral to a connected computer
Categories
(Firefox OS Graveyard :: GonkIntegration, defect)
Tracking
(b2g-v2.5 affected, b2g-master affected)
RESOLVED
WONTFIX
People
(Reporter: clement.lefevre, Unassigned)
References
Details
(Keywords: crash, foxfood, regression)
Attachments
(5 files)
On Flame, on master, the phone can currently crash at random. When the crash occurs, the screen stays black if the phone was sleeping, or turns fully white if the screen was on. The next message will contain a video of the screen suddenly turning white when it happens. It is not even possible to restart the phone with the power button: the battery has to be removed and reinserted before the phone can be restarted.

Moreover, if the phone is connected to a computer when it happens and you are notified of connected USB devices, the device, usually named "Firefox OS", becomes named "QHSUSB__BULK" and keeps disconnecting and reconnecting, as you will see in the second attached video.

I tried to gather logcat while the crash happens, but it shows nothing useful: adb just stops when it happens. The same goes for /proc/kmsg.

While the crash was very occasional in some situations, at my current workplace, where network reception is *very* bad (switching between no network and 0/1 bars, sometimes 4/5, but rarely), the crash happens very often: several times every day. This is a real problem for everyday use. Going into or coming out of the metro can trigger this crash too.

Information about my device on the nightly build:
Build ID: 20150619010205
Build Type: user
Gaia Revision: a0df9c367a68764bdcf2e2e1c4d27f0d6ee165b8
Gaia Date: 2015-06-18 18:49:14
Gecko Revision: https://hg.mozilla.org/mozilla-central/rev/2694ff2ace6a
Gecko Version: 41.0a1
Device ID: flame
Firmware(Release): 4.4.2
Firmware(Incremental): eng.cltbld.20150212.043653
Firmware Date: Thu Feb 12 04:37:04 EST 2015
Bootloader: L1TC000118D0

From what I have heard, someone using 2.1 builds seems to have the same issue; I can ask him to add details, if relevant.
Reporter
Comment 1•9 years ago
Reporter
Comment 2•9 years ago
Comment 3•9 years ago
Even though we don't get exactly the same USB device as in bug 1122119, we're facing a kernel panic. Gabriele, do you know of a race condition in the kernel that might be related to low cell coverage? I'm not sure 2.1 has this issue with low cell coverage: I dogfooded that version until last month, took the Parisian subway many times where I mostly had no coverage, and never encountered this bug.
Flags: needinfo?(gsvelto)
Updated•9 years ago
Keywords: qawanted → steps-wanted
Comment 4•9 years ago
Interesting. Is it possible this is linked to the crash-during-call bug we had on the Open C? I know that the buggy code is not limited to the Open C's modem, so maybe... Sadly, we have not been able to get last_kmsg working (see bug 1025265).
Flags: needinfo?(clement.lefevre)
Comment 5•9 years ago
I see random crashes too, and the phone restarts when I press the power button for ~10 s. I can usually still connect with adb shell and do a restart, but I can't confirm right now as I can't reproduce it. It started at roughly the same time I experienced battery issues (bug 1178869).
Reporter
Comment 6•9 years ago
(In reply to Alexandre LISSY :gerard-majax from comment #4)
> Interesting. Is it possible this is linked to the crash during call bug we
> had on Open C ? I know that the buggy code is not limited to the Open C's
> modem so maybe ...
>
> Sadly we have not been able to get last_kmsg working (see bug 1025265)

I don't know if this can be related: the bug described here happens while no action at all is being performed on the phone. Most of the time it occurs while the phone is sleeping; for this video, I deliberately disabled screen sleep so the crash could be seen happening.

As far as I can remember, it has never happened during a phone call.

(In reply to Mark Trompell from comment #5)
> I see random crashes too, and phone restarts when pressing powerbutton for
> ~10 sec.
> I can usually still connect with adb shell and do a restart. But can't
> confirm right now as I can't reproduce.
> It startet roughly the same time I experienced issues with the battery (bug
> 1178869)

Next time I'll try pressing the power button for ~10 s, but I don't think it will work: in my case, I had adb logcat running while waiting for the crash. When the crash happens, adb logcat stops instantly, and then every adb command to the phone fails.

Alexandre's guess was that the modem was going into a special state. I don't know how to check this, though, as kmsg and logcat gave pretty much no information about it.
Flags: needinfo?(clement.lefevre)
Comment 7•9 years ago
(In reply to Clément Lefèvre from comment #6)
> (In reply to Alexandre LISSY :gerard-majax from comment #4)
> > Interesting. Is it possible this is linked to the crash during call bug we
> > had on Open C ? I know that the buggy code is not limited to the Open C's
> > modem so maybe ...
> >
> > Sadly we have not been able to get last_kmsg working (see bug 1025265)
>
> I don't know if this can be related: The described bug here is happening
> while no actions at all are performed on the phone. Most of the time it
> occurs while the phone is sleeping, here I disabled screen sleeping as a
> choice for this video to see it happening and show it.

Does it never occur with the screen lock disabled? If so, I'd tend to agree that it's totally unrelated, and it may be exposing cgroup issues like the ones Gabriele already had fun playing with :)

>
> It never happened during a phone call as far as I can remember.
>
> (In reply to Mark Trompell from comment #5)
> > I see random crashes too, and phone restarts when pressing powerbutton for
> > ~10 sec.
> > I can usually still connect with adb shell and do a restart. But can't
> > confirm right now as I can't reproduce.
> > It startet roughly the same time I experienced issues with the battery (bug
> > 1178869)
>
> I'll try next time to press the powerbutton for ~10s but I don't think it
> will work: in my case, I was having adb logcat running while waiting for the
> crash. When the crash happen, adb logcat is stopping instantly, and then any
> adb command to the phone is failing
>
> Alexandre's guess was that the modem was going into a special state. I don't
> know how to check this though as kmsg and logcat were giving pretty much no
> informations about this.

That's the fun part: I have no idea how we can expose this.
Flags: needinfo?(clement.lefevre)
Reporter
Comment 8•9 years ago
(In reply to Alexandre LISSY :gerard-majax from comment #7)
>
> It never ever occurs with the screenlock disabled ? If yes, then I'd tend to
> agree that it's totally unrelated and it may be exposing issues around
> cgroups like what Gabriele already had fun to play with :)
>
> That's the fun part: I have no idea how we can expose this.

Yes, I tried with the screen lock disabled, and it reproduces too.
Flags: needinfo?(clement.lefevre)
Comment 9•9 years ago
(In reply to Clément Lefèvre from comment #8)
> (In reply to Alexandre LISSY :gerard-majax from comment #7)
> >
> > It never ever occurs with the screenlock disabled ? If yes, then I'd tend to
> > agree that it's totally unrelated and it may be exposing issues around
> > cgroups like what Gabriele already had fun to play with :)
> >
> > That's the fun part: I have no idea how we can expose this.
>
> Yes, I tried with disabled screenlock, and it's reproducing too.

Then there's still a chance it's the same bug as on the Open C's modem, even if the triggering conditions look different: the hardware is close.
Comment 10•9 years ago
To help find which factors might help reproduce this, can you give more information about:
* Wi-Fi: have you ever been able to reproduce with Wi-Fi off?
* USB: same question with the device not plugged in at all.
* Do you have an SD card in your device?

Thanks!
Flags: needinfo?(clement.lefevre)
Reporter
Comment 11•9 years ago
(In reply to Johan Lorenzo [:jlorenzo] (QA) from comment #10)
> In order to help to find what factors might help to repro, can you give more
> infomation about:
> * Wi-Fi, have you ever been able to repro with Wi-Fi off?
> * USB, same question with a device not plugged at all.
> * Do you have an SD card in your device?
>
> Thanks!

So, after testing for the missing information: this device has never contained an SD card. The crash happens whether Wi-Fi is enabled or not, and whether the phone is plugged into a computer or not.
Flags: needinfo?(clement.lefevre)
Comment 12•9 years ago
I noticed in the past, maybe two or three times, that when I tried to wake my phone after it had been on standby for a while, it just didn't turn on at all and I was forced to pull the battery to get it working again. IIRC, keeping the power button pressed for a long time also forced the phone to reboot. I never bothered to check what caused this before this bug report, but if it happens again I'll double-check. In my case this happens on my main phone, a Flame I use for dogfooding with both SIMs populated, running the latest base build plus current master.
Flags: needinfo?(gsvelto)
Updated•9 years ago
status-b2g-master:
--- → affected
status-firefox-esr31:
--- → wontfix
Reporter
Comment 13•9 years ago
(In reply to Gabriele Svelto [:gsvelto] from comment #12)
> I noticed in the past, maybe two or three times, that when I tried turning
> on my phone after it being on standby for a while the phone just didn't turn
> on at all and I was forced to pull the battery to have it working again.
> IIRC keeping the power button pushed for a long time also forced the phone
> to reboot. I've never bothered to check what caused this before this but
> report but if it happens again I'll double-check it. In my case this is
> happening on my main phone which is a Flame I use for dogfooding with both
> SIMs populated, running the latest base build plus the current master.

Given how frequently I reproduce it at my workplace on this Flame, it looks related to very bad network conditions/reception, moving easily between no network, bad quality and good quality.
My guess is that, since I mainly reproduce this inside a building with bad reception in a city like Paris, where several cells are around, all with random reception quality because of the building, the phone jumps from cell to cell and this causes some issues. This is still just a random guess. Gabriele, do you have an idea about this, or a way to check it? Or maybe I'm just mistaken and this has nothing to do with it.
Reporter
Updated•9 years ago
status-b2g-master:
--- → affected
status-firefox-esr31:
--- → wontfix
Updated•9 years ago
status-firefox-esr31:
wontfix → ---
Comment 14•9 years ago
(In reply to Clément Lefèvre from comment #13)
[...]
>
> Looking at how frequently I'm reproducing it at my work place on this Flame,
> it looks like very bad network conditions/reception, moving easily between
> no network, a bad quality and a good quality.
> A guess can be that, as I mainly reproduce this in a building with bad
> reception in a city like Paris, but as there are several cells around, all
> with random reception quality because of the building, the phone jump from
> cell to cell and this is causing some issues.
>

Sadly, I have a Flame that I carry with me and that has never reproduced this kind of behavior, yet I'm often on high-speed trains, where network connectivity is as bad as you describe. That does not mean it's not what triggers it in your case.
Comment 15•9 years ago
(In reply to Gabriele Svelto [:gsvelto] from comment #12)
> I noticed in the past, maybe two or three times, that when I tried turning
> on my phone after it being on standby for a while the phone just didn't turn
> on at all and I was forced to pull the battery to have it working again.
> IIRC keeping the power button pushed for a long time also forced the phone
> to reboot. I've never bothered to check what caused this before this but
> report but if it happens again I'll double-check it. In my case this is
> happening on my main phone which is a Flame I use for dogfooding with both
> SIMs populated, running the latest base build plus the current master.

Are you still able to connect with adb when that happens and reboot from there, like I am?
Comment 16•9 years ago
Gathering a RIL log might be helpful if this is network-related. It can be enabled from the prefs; see: https://wiki.mozilla.org/B2G/QA/Tips_And_Tricks#RIL_Debugging
Flags: needinfo?(gsvelto)
Reporter
Comment 17•9 years ago
These logs are from just before the crash happened, with RIL debugging enabled through the graphical settings as described on the wiki page.
Reporter
Comment 18•9 years ago
(In reply to Gabriele Svelto [:gsvelto] from comment #16)
> Gathering a RIL log might be helpful if this is network-related, it can be
> done from the prefs, see here:
>
> https://wiki.mozilla.org/B2G/QA/Tips_And_Tricks#RIL_Debugging

These logs are from just before the crash, gathered with the procedure described on the wiki page that modifies config files. NI to you, Gabriele: can you see anything useful in these logs?
Flags: needinfo?(gsvelto)
Comment 19•9 years ago
I can't see anything that should crash the phone, but I'm not super-familiar with the RIL's internals. Hsin-Yi, can you see anything suspicious in those logs? What are the chances that something network-related could take down the phone like this?
Flags: needinfo?(gsvelto) → needinfo?(htsai)
Comment 20•9 years ago
I encountered the same behavior as Clément at work, where network reception is *very* bad too. Since I changed the date & time setting from automatic to manual two weeks ago, I haven't had the issue anymore. Could it be related to bug 1110010?
Comment 21•9 years ago
(In reply to Gabriele Svelto [:gsvelto] from comment #19)
> I can't see anything that should crash the phone but I'm not super-familiar
> with the RIL's internals. Hsin-Yi, can you see something suspicious in those
> logs? What are the chances of something network-related to be able to take
> down the phone like this?

Let me ask for Edgar's help :P
Flags: needinfo?(htsai) → needinfo?(echen)
Comment 22•9 years ago
Both logs show the device losing the network and receiving a NITZ update just before the crash. After the NITZ update is received, the system time is set back slightly:

log_RIL_graphical_debug.txt: 07-03 13:08:43.802 -> 07-03 13:08:43.055
log_RIL_config_files_debug.txt: 07-06 15:03:53.869 -> 07-06 15:03:53.051

I'm not sure whether this behaviour has any connection with the crash.

log_RIL_graphical_debug.txt:
> 07-03 13:07:55.241 209 636 I Gecko : RIL Worker: [0] Handling parcel as UNSOLICITED_NITZ_TIME_RECEIVED
> 07-03 13:07:55.241 209 636 I Gecko : RIL Worker: [0] DateTimeZone string 15/07/03,11:07:55+08,00
> 07-03 13:08:43.752 209 636 I Gecko : RIL Worker: [0] Handling parcel as UNSOLICITED_NITZ_TIME_RECEIVED
> 07-03 13:08:43.752 209 636 I Gecko : RIL Worker: [0] DateTimeZone string 15/07/03,11:08:43+08,00
> 07-03 13:08:43.772 209 636 I Gecko : RIL Worker: [0] Handling parcel as REQUEST_VOICE_REGISTRATION_STATE
> 07-03 13:08:43.772 209 636 I Gecko : RIL Worker: [0] Received voiceRegistrationState network info.
> 07-03 13:08:43.782 209 636 I Gecko : RIL Worker: [0] Still missing some more network info, not notifying main thread.
> 07-03 13:08:43.782 209 636 I Gecko : RIL Worker: [0] voice registration state: 12,,,0,,,,0,,,,,,0,
> 07-03 13:08:43.782 209 636 I Gecko : RIL Worker: [0] Queuing voiceRegistrationState network info message: {"regState":12,"state":"searching","connected":false,"roaming":false,"emergencyCallsOnly":true,"cell":{"gsmLocationAreaCode":-1,"gsmCellId":-1},"radioTech":0,"type":null,"rilMessageType":"voiceregistrationstatechange"}
> 07-03 13:08:43.055 209 209 I GeckoDump: [system] [TimeCore][1209905.923] handling moztimechange

log_RIL_config_files_debug.txt:
> 07-06 15:03:53.819 211 625 I Gecko : RIL Worker: [0] Handling parcel as UNSOLICITED_NITZ_TIME_RECEIVED
> 07-06 15:03:53.829 211 625 I Gecko : RIL Worker: [0] DateTimeZone string 15/07/06,13:03:53+08,00
> 07-06 15:03:53.051 211 211 I GeckoDump: [system] [TimeCore][1476015.942] handling moztimechange
> 07-06 15:03:53.051 211 625 I Gecko : RIL Worker: [0] Handling parcel as REQUEST_VOICE_REGISTRATION_STATE
> 07-06 15:03:53.051 211 625 I Gecko : RIL Worker: [0] Received voiceRegistrationState network info.
> 07-06 15:03:53.051 211 625 I Gecko : RIL Worker: [0] Still missing some more network info, not notifying main thread.
> 07-06 15:03:53.051 211 625 I Gecko : RIL Worker: [0] voice registration state: 12,,,0,,,,0,,,,,,0,
> 07-06 15:03:53.051 211 625 I Gecko : RIL Worker: [0] Queuing voiceRegistrationState network info message: {"regState":12,"state":"searching","connected":false,"roaming":false,"emergencyCallsOnly":true,"cell":{"gsmLocationAreaCode":-1,"gsmCellId":-1},"radioTech":0,"type":null,"rilMessageType":"voiceregistrationstatechange"}
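The two observations in comment 22 (a NITZ string carrying UTC time, and logcat timestamps that jump backwards right after it) can be checked mechanically on any captured log. The Python sketch below is not part of B2G or its tooling: `parse_nitz` and `find_backward_jumps` are hypothetical helpers. It assumes the Android RIL convention for NITZ strings (`yy/mm/dd,hh:mm:ss(+/-)tz,dt`, time in UTC, `tz` in quarter-hours) and the `MM-DD HH:MM:SS.mmm` logcat prefix seen in the excerpts, which carries no year. Under those assumptions, `15/07/03,11:07:55+08,00` decodes to 11:07:55 UTC with a +2 h offset, i.e. 13:07:55 local time, which matches the logcat timestamp of the line carrying it.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed Android RIL NITZ format: "yy/mm/dd,hh:mm:ss(+/-)tz[,dt]" in UTC,
# where tz is the UTC offset in quarter-hours and dt is an optional DST hint.
NITZ_RE = re.compile(
    r"(\d{2})/(\d{2})/(\d{2}),(\d{2}):(\d{2}):(\d{2})([+-])(\d+)(?:,(\d+))?"
)

# Assumed logcat "threadtime" prefix: "MM-DD HH:MM:SS.mmm" (no year).
LOGCAT_TS_RE = re.compile(r"(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})\.(\d{3})")


def parse_nitz(nitz):
    """Return (utc_time, utc_offset) for a NITZ string; dt is ignored."""
    m = NITZ_RE.fullmatch(nitz)
    if m is None:
        raise ValueError(f"not a NITZ string: {nitz!r}")
    yy, mo, dd, hh, mi, ss, sign, tz, _dt = m.groups()
    utc_time = datetime(2000 + int(yy), int(mo), int(dd),
                        int(hh), int(mi), int(ss), tzinfo=timezone.utc)
    offset = timedelta(minutes=15 * int(tz))
    return utc_time, (-offset if sign == "-" else offset)


def find_backward_jumps(lines, year=2015):
    """Yield (previous, current, line) whenever a logcat timestamp goes back.

    Hypothetical helper: the year is assumed because logcat omits it, and
    lines not starting with a timestamp (after an optional "> ") are skipped.
    """
    previous = None
    for line in lines:
        m = LOGCAT_TS_RE.match(line.lstrip("> "))
        if m is None:
            continue
        mo, dd, hh, mi, ss, ms = map(int, m.groups())
        current = datetime(year, mo, dd, hh, mi, ss, ms * 1000)
        if previous is not None and current < previous:
            yield previous, current, line
        previous = current
```

Fed the config-files excerpt above, `find_backward_jumps` flags the step from 15:03:53.829 back to 15:03:53.051 (778 ms). The helper only reports non-monotonic timestamps; it says nothing about whether the time change caused the crash.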
Flags: needinfo?(echen)
Reporter
Comment 23•9 years ago
(In reply to Mark Trompell from comment #5)
> I see random crashes too, and phone restarts when pressing powerbutton for
> ~10 sec.
> I can usually still connect with adb shell and do a restart. But can't
> confirm right now as I can't reproduce.
> It startet roughly the same time I experienced issues with the battery (bug
> 1178869)

So, I double-checked this:
- Yes, pressing the power button for ~10-15 s restarts the phone (maybe not exactly the same way as removing the battery, though: it felt like it needed more time to get the time back from the network, but maybe I was just unlucky).
- No, it doesn't let me connect to the phone with adb, especially since the phone shows up on the computer as a different device in a different state.
Comment 24•9 years ago
I see different behaviour, so I filed bug 1179679: I can still connect via adb, and the issue is independent of the date-time setting. It may still have the same root cause, maybe even connected to the battery issues some people experience, like a process going wild and the OOM killer killing the wrong one.
Comment 25•9 years ago
Now that bug 1154072 is done, can you give this new base image a try? Make sure you back up your data, of course :)
Flags: needinfo?(clement.lefevre)
Reporter
Comment 26•9 years ago
(In reply to Alexandre LISSY :gerard-majax from comment #25)
> Now that bug 1154072 is done, can you give a try to this new base image?
> Make sure you backup your data of course :)

So, after flashing the new v18D v3, the crash still happens quickly under the same conditions. Attached is a new logcat gathered with this new base image, in case it offers more information.
Flags: needinfo?(clement.lefevre)
Updated•9 years ago
QA Whiteboard: [foxfood-triage]
Reporter
Comment 27•9 years ago
Recently I noticed what is probably a side effect: if a recurring alarm is set, after one of these crashes the alarm still appears set when you check, but it no longer rings unless you unset it and set it again. This is currently the most common behavior. Some time ago, I also noticed it ringing once on reboot and then never again. Either way, the crash does seem to have side effects on alarms.
Comment 29•9 years ago
Gregor, can you please comment on the nomination and block accordingly? In the triage session we do not have the right audience to decide on blocking this.
Flags: needinfo?(anygregor)
Comment 30•9 years ago
I don't know this part of the code. We need Hsin-Yi to weigh in on how bad this random crash is.
Flags: needinfo?(anygregor) → needinfo?(htsai)
Comment 31•9 years ago
(In reply to Gregor Wagner [:gwagner] from comment #30)
> I don't know this part of the code. We need Hsin-Yi to weight in about how
> bad this random crash is.

I couldn't see any obvious crash-related messages from the RIL, so I can't provide more comments.
Flags: needinfo?(htsai)
Comment 32•9 years ago
As Hsin-Yi can't find anything, I'm removing the nomination. Please re-nominate if it is reproduced. Thanks
blocking-b2g: 2.5? → ---
Updated•9 years ago
status-b2g-v2.5:
--- → affected
Comment 33•9 years ago
WONTFIX as unreproducible; feel free to provide a better scenario.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
Comment 34•9 years ago
This is a kernel panic, and we had a bug about it a while ago (bug 1130035). At first we were able to reproduce it reliably by visiting the website mentioned in that bug, then we were no longer able to reproduce it. I still see this from time to time on the Flame and doubt that we will ever get steps closer than what we had in bug 1130035.
Flags: needinfo?(jmercado)
Keywords: steps-wanted
Updated•9 years ago
Flags: needinfo?(jmercado)