Closed Bug 841041 Opened 13 years ago Closed 13 years ago

[B2G][OTA] Ridiculously janky / unresponsive behavior after OTA update (making it hard to even unlock the phone)

Categories

(Firefox OS Graveyard :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(blocking-b2g:tef+, firefox19 wontfix, firefox20 wontfix, firefox21 fixed, b2g18 fixed, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 fixed)

VERIFIED FIXED
B2G C4 (2jan on)
blocking-b2g tef+
Tracking Status
firefox19 --- wontfix
firefox20 --- wontfix
firefox21 --- fixed
b2g18 --- fixed
b2g18-v1.0.0 --- wontfix
b2g18-v1.0.1 --- fixed

People

(Reporter: nkot, Assigned: dhylands)

References

Details

(Keywords: smoketest)

Attachments

(5 files, 1 obsolete file)

Attached file log file
Description: Homescreen does not display after OTA update without having to restart device

Repro Steps:
1) Go to Settings => Device information => Software updates => Check now
2) Install new System Update 2013-02-13-071150
3) Wait for the homescreen to appear or press power button if display went off

Expected:
* Locked Homescreen displays, user is able to unlock and use device

Actual:
* Homescreen appears black - see screenshot
* Homescreen displays successfully after restart

Repro frequency: 100%, 3/3 devices
* screenshot attached
* log file attached

Notes:
Updated from:
Gecko: 70c8f2cf813626e8c7b0f89676e1a62fe4ddfcae
Gaia: ecca2ee860825547d5e1109436b50b74dfe9261e
Build ID: 20130212070205
Attached image screenshot
That sounds bad. Can you consistently reproduce?
blocking-b2g: --- → tef?
Component: Gaia → Gaia::Homescreen
Something most definitely blew up here. Weird logcat errors:

02-13 08:46:05.212: I/Gecko(108): ###!!! [Parent][AsyncChannel] Error: Channel error: cannot send/recv
02-13 08:46:11.508: E/GeckoConsole(20200): [JavaScript Error: "formatURLPref: Couldn't get pref: app.update.url.details" {file: "jar:file:///system/b2g/omni.ja!/components/nsURLFormatter.js" line: 126}]
Marshall - Any ideas?
Flags: needinfo?(marshall)
blocking-b2g: tef? → ---
I ran into this today. :overholt did too.
Actually maybe I ran into a related but different issue. In my case the screen stayed entirely blank, then I got the boot image for quite a while, then back to entirely blank.
I ran into this yesterday and today. I have to pull the battery to get anything other than the "hardware" buttons to light up.
blocking-b2g: --- → tef?
Component: Gaia::Homescreen → General
Gah, should probably be shira?
blocking-b2g: tef? → shira?
(In reply to Andrew Overholt [:overholt] from comment #8)
> Gah, should probably be shira?

Why shira? Isn't this going to impact tef?
I've applied the update locally, and I also see "app.update.url.details" errors, but I'm able to get into the phone -- albeit _very_ slowly. When the screen is off, pressing the power button to turn it back on takes ~15-20 seconds on my device, and then the screen almost immediately goes back off. If you tap the screen some while after you press the power button, you can avoid the timeout, but then you have to patiently drag the lock drawer up and press the unlock button while the device remains extremely unresponsive.

Doing a quick top, I noticed that the b2g process is eating 97-98% of the CPU!

2577 0 98% S 36 163016K 53780K fg root /system/b2g/b2g

Looking into why that might be..
Flags: needinfo?(marshall)
(In reply to Jason Smith [:jsmith] from comment #9)
> (In reply to Andrew Overholt [:overholt] from comment #8)
> > Gah, should probably be shira?
>
> Why shira? Isn't this going to impact tef?

Agreed
Assignee: nobody → marshall
blocking-b2g: shira? → tef?
(See also bug 841517, which might be a dupe of this bug.)
So this has basically bricked the device for me. Restarting doesn't help, I just get a black screen. I can barely get the lockscreen to display.
What does: top -t -m 5 show? (from an adb shell)
adb doesn't even see the device, for me at least. (adb shell says "error: device not found") (I have udev rules set up, and I ran adb w/ sudo for good measure - so I'm not getting blocked from seeing the device due to lack-of-privileges)
It showed /system/b2g/b2g at 96%. I ended up killing that process, and after it restarted the phone was responsive again. I'm not sure why yanking the battery out previously didn't fix it, unless I just gave it more time to finish whatever it was trying to do.
(In reply to Daniel Holbert [:dholbert] from comment #15)
> adb doesn't even see the device, for me at least. (adb shell says "error:
> device not found")
>
> (I have udev rules set up, and I ran adb w/ sudo for good measure - so I'm
> not getting blocked from seeing the device due to lack-of-privileges)

I ran into this at first. I think it was due to the device being so behind/bogged down that the daemon wasn't running (yet).
(In reply to Lucas Adamski from comment #16)
> It showed /system/b2g/b2g at 96%.

I was hoping to see the whole line, so we could tell which thread was consuming the CPU (top -t shows individual threads).
(In reply to Daniel Holbert [:dholbert] from comment #15)
> adb doesn't even see the device, for me at least. (adb shell says "error:
> device not found")
>
> (I have udev rules set up, and I ran adb w/ sudo for good measure - so I'm
> not getting blocked from seeing the device due to lack-of-privileges)

adb might be disabled. It is by default for dogfooding. You can enable it by enabling:
Settings->Device Information->More Information->Developer->Remote Debugging
(In reply to Dave Hylands [:dhylands] from comment #18)
> I was hoping to see the whole line, so we could tell which thread was
> consuming the CPU (top -t shows individual threads).

Sorry, I'd closed the window by the time I saw your question. :(
Spoke too soon. Checked for updates, applied it, now stuck again:

User 83%, System 16%, IOW 0%, IRQ 0%
User 272 + Nice 0 + Sys 54 + Idle 0 + IOW 0 + IRQ 0 + SIRQ 0 = 326

  PID   TID PR CPU% S      VSS      RSS PCY UID   Thread      Proc
  526   550  0  24% R  157880K   52444K  fg root  DOM Worker  /system/b2g
  526   549  0  24% R  157880K   52444K  fg root  DOM Worker  /system/b2g
  526   555  0  24% R  157880K   52444K  fg root  DOM Worker  /system/b2g
  526   546  0  24% R  157880K   52444K  fg root  DOM Worker  /system/b2g
  617   617  0   1% R    1088K     444K  fg root  top         top
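For anyone comparing several of these captures, here is a minimal sketch that totals the CPU% column per thread name. It assumes the column order shown in this comment (PID TID PR CPU% S VSS RSS PCY UID Thread Proc); the function name cpu_by_thread is made up for illustration and is not part of any tool mentioned in this bug.

```python
from collections import defaultdict

# Thread rows copied from the `top -t` capture in this comment.
TOP_ROWS = """\
PID TID PR CPU% S VSS RSS PCY UID Thread Proc
526 550 0 24% R 157880K 52444K fg root DOM Worker /system/b2g
526 549 0 24% R 157880K 52444K fg root DOM Worker /system/b2g
526 555 0 24% R 157880K 52444K fg root DOM Worker /system/b2g
526 546 0 24% R 157880K 52444K fg root DOM Worker /system/b2g
617 617 0 1% R 1088K 444K fg root top top
"""


def cpu_by_thread(rows):
    """Sum CPU% per (process, thread name) pair.

    Assumes the process is the last field and the thread name (which may
    contain spaces) is everything between the UID column and the process.
    """
    totals = defaultdict(int)
    for line in rows.splitlines():
        fields = line.split()
        # Skip the header, summary lines, and anything too short to be a row.
        if len(fields) < 11:
            continue
        cpu_field = fields[3]
        if not (cpu_field.endswith("%") and cpu_field[:-1].isdigit()):
            continue
        thread = " ".join(fields[9:-1])
        totals[(fields[-1], thread)] += int(cpu_field[:-1])
    return dict(totals)


if __name__ == "__main__":
    ranked = sorted(cpu_by_thread(TOP_ROWS).items(), key=lambda kv: -kv[1])
    for (proc, thread), cpu in ranked:
        print(f"{cpu:3d}%  {proc}  {thread}")
```

Grouping by thread name rather than TID makes it easy to see that the four worker threads together account for nearly all of the CPU time in this capture.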
So yeah dholbert is seeing the same thing. 4 DOM Workers consuming most of the CPU. And they're all in the main process. I don't know what the DOM Workers do.
I tried flashing https://pvtbuilds.mozilla.org/pub/mozilla.org/b2g/nightly/mozilla-b2g18-unagi/2013/02/2013-02-12-07-02-05/ and then OTA updating to my locally built version, and it didn't reproduce (my locally built version is a v1-train).

I reflashed and OTA updated as in comment 23 and it reproduced (so not a one-off).
I was able to get into gdb and get some back traces. I'm not sure of the validity of the symbols since the image that was being wasn't from the tree I was in.
[clarifying summary]
Summary: [B2G][OTA] Homescreen fails to display after OTA update → [B2G][OTA] Ridiculously janky / unresponsive behavior after OTA update (making it hard to even unlock the phone)
(In reply to Dave Hylands [:dhylands] from comment #25)
> I'm not sure of the validity of the symbols since the image that was being
> wasn't from the tree I was in.

Yeah... The traces don't look right to me.
Do you guys need more information to debug this issue? I just reproduced these symptoms myself when updating to:

Gecko http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/d1288313218e
Gaia 6544fdb8dddc56f1aefe94482402488c89eeec49
BuildID 20130214070203
Version 18.0
For what it's worth, if I pull the battery and reboot, I can recover the device back into a usable state. Just fodder for triage drivers to consider.
blocking-b2g: tef? → tef+
I was able to reproduce using a local build.

STR:
1 - Modify gecko/toolkit/content/UpdateChannel.sh near the end to override the channel:
    channel = "foobar";
    return channel;
2 - Build.
3 - Create an update:
    ./build.sh gecko-update-full
4 - Set up the phone to use the update:
    tools/update-tools/test-update.py ${GECKO_OBJDIR}/dist/b2g-update/b2g-gecko-update.mar
5 - Do a Check Now and then do the update.

This message:
    AUS:SVC UpdateManager:get activeUpdate - channel has changed, reloading default preferences to workaround bug 802022
from here:
    https://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/nsUpdateService.js#2795
seems to be the key. I'm hypothesising that the reload-default-prefs is the actual trigger.
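To check whether a given device hit the same code path, one option is to scan a saved log for the message quoted above. This is a minimal sketch: the helper name find_channel_change and the sample log lines are hypothetical, and it assumes the log was saved to text first (for example with adb logcat -d).

```python
import re

# Message text as quoted in this comment; only the stable prefix is matched.
TRIGGER = re.compile(
    r"AUS:SVC UpdateManager:get activeUpdate - channel has changed"
)


def find_channel_change(lines):
    """Return (1-based line number, text) for each line with the trigger message."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if TRIGGER.search(line)
    ]


if __name__ == "__main__":
    # Hypothetical sample lines; a real run would iterate over a saved log file.
    sample = [
        "I/Gecko(108): unrelated line\n",
        "I/GeckoConsole(108): AUS:SVC UpdateManager:get activeUpdate - "
        "channel has changed, reloading default preferences to workaround "
        "bug 802022\n",
    ]
    for number, text in find_channel_change(sample):
        print(f"{number}: {text}")
```

A hit only shows that the channel-change path ran; it does not by itself confirm that reloading the default prefs is what pegs the CPU.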
Attachment #714262 - Attachment is obsolete: true
I let the process run for a bit longer and grabbed another set of backtraces
I picked one of the DOM Workers and just hit n over and over in gdb. It wound up doing these 4 lines continuously:

3428 #endif
(gdb)
3408 WorkerRunnable* event;
(gdb)
3410 MutexAutoLock lock(mMutex);
(gdb)
3412 while (!mControlQueue.Pop(event) && !syncQueue->mQueue.Pop(event)) {
(In reply to Tony Chung [:tchung] from comment #29)
> Do you guys need more information to debug this issue?

I can reproduce at will now, so now it's mostly just trying to figure out what's going on.
Attached patch patch
Assignee: marshall → anygregor
Attachment #714586 - Flags: review?(bent.mozilla)
Blocks: 828887
Attachment #714586 - Flags: review?(bent.mozilla) → review+
thx jdm! https://hg.mozilla.org/integration/mozilla-inbound/rev/424de8168602 dhylands mentions that the bug is not completely gone.
Assignee: anygregor → dhylands
Whiteboard: leave-open
Blocks: 841962
I filed bug 841962 to follow up on this problem and removed the leave-open on this bug. At least with the patch applied in this bug, the phone now just takes a lot longer to boot up after an update-channel change, but it seems to perform ok.
Whiteboard: leave-open
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Blocks: 847511
Verifying fix on V1-train branch - OTA goes smoothly. Tested with the following:

1. Manually flashed to Unagi build 2013-03-20-070206
   Gecko http://hg.mozilla.org/releases/mozilla-b2g18/rev/778da49486f0
   Gaia 6c3767c2dea43b5e9aff7d156d36d69649005621
2. revertNightly
3. OTA to Build 2013-03-21-070203
   Gecko http://hg.mozilla.org/releases/mozilla-b2g18/rev/7508c5a1026b
   Gaia 7af427d35c4d557c75b2060022815f07851acc28

The issue still seems to occur when doing an OTA from V.1.0.1 builds, but other bugs cover that; please refer to bugs 847511 and 842932 (note that switching update channels takes place here).
Status: RESOLVED → VERIFIED