Closed Bug 841041 Opened 10 years ago Closed 10 years ago

[B2G][OTA] Ridiculously janky / unresponsive behavior after OTA update (making it hard to even unlock the phone)

Categories

(Firefox OS Graveyard :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(blocking-b2g:tef+, firefox19 wontfix, firefox20 wontfix, firefox21 fixed, b2g18 fixed, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 fixed)

VERIFIED FIXED
B2G C4 (2jan on)
blocking-b2g tef+
Tracking Status
firefox19 --- wontfix
firefox20 --- wontfix
firefox21 --- fixed
b2g18 --- fixed
b2g18-v1.0.0 --- wontfix
b2g18-v1.0.1 --- fixed

People

(Reporter: nkot, Assigned: dhylands)

References

Details

(Keywords: smoketest)

Attachments

(5 files, 1 obsolete file)

Attached file log file
Description:
Homescreen does not display after OTA update without having to restart device

Repro Steps:
1) Go to Settings => Device information => Software updates => Check now
2) Install new System Update 2013-02-13-071150
3) Wait for the homescreen to appear or press power button if display went off

Expected:
*Locked Homescreen displays, user is able to unlock and use device

Actual:
*Homescreen appears black - see screenshot
*Homescreen displays successfully after restart

Repro frequency:
100%
3/3 devices

*screenshot attached
*log file attached

Notes:
Updated from:
Gecko:70c8f2cf813626e8c7b0f89676e1a62fe4ddfcae
Gaia:ecca2ee860825547d5e1109436b50b74dfe9261e
Build ID:20130212070205
Attached image screenshot
That sounds bad. Can you consistently reproduce?
blocking-b2g: --- → tef?
Component: Gaia → Gaia::Homescreen
Something most definitely blew up here.

Weird logcat errors:

02-13 08:46:05.212: I/Gecko(108): ###!!! [Parent][AsyncChannel] Error: Channel error: cannot send/recv

02-13 08:46:11.508: E/GeckoConsole(20200): [JavaScript Error: "formatURLPref: Couldn't get pref: app.update.url.details" {file: "jar:file:///system/b2g/omni.ja!/components/nsURLFormatter.js" line: 126}]
Marshall - Any ideas?
Flags: needinfo?(marshall)
blocking-b2g: tef? → ---
I ran into this today.  :overholt did too.
Actually maybe I ran into a related but different issue.  In my case the screen stayed entirely blank, then I got the boot image for quite a while, then back to entirely blank.
I ran into this yesterday and today.  I have to pull the battery to get anything other than the "hardware" buttons to light up.
blocking-b2g: --- → tef?
Component: Gaia::Homescreen → General
Gah, should probably be shira?
blocking-b2g: tef? → shira?
(In reply to Andrew Overholt [:overholt] from comment #8)
> Gah, should probably be shira?

Why shira? Isn't this going to impact tef?
I've applied the update locally, and I also see "app.update.url.details" errors, but I'm able to get into the phone -- albeit _very_ slowly.

When the screen is off, pressing the power button to turn it back on takes on the order of ~15-20 seconds on my device, and then almost immediately goes back off.

If you tap the screen some while after you press the power button, you can avoid the timeout, but then you have to patiently drag the lock drawer up, and press the unlock button while the device remains extremely unresponsive.

Doing a quick top, I noticed that the b2g process is eating between 97% and 98% of the CPU!
 2577  0  98% S    36 163016K  53780K  fg root     /system/b2g/b2g

Looking into why that might be..
Flags: needinfo?(marshall)
(In reply to Jason Smith [:jsmith] from comment #9)
> (In reply to Andrew Overholt [:overholt] from comment #8)
> > Gah, should probably be shira?
> 
> Why shira? Isn't this going to impact tef?

Agreed
Assignee: nobody → marshall
blocking-b2g: shira? → tef?
(See also bug 841517, which might be a dupe of this bug.)
So this has basically bricked the device for me.  Restarting doesn't help, I just get a black screen.  I can barely get the lockscreen to display.
What does:

top -t -m 5

show? (from an adb shell)
adb doesn't even see the device, for me at least.  (adb shell says "error: device not found")

(I have udev rules set up, and I ran adb w/ sudo for good measure - so I'm not getting blocked from seeing the device due to lack-of-privileges)
It showed /system/b2g/b2g at 96%.  

I ended up killing that process, and after it restarted the phone was responsive again.  I'm not sure why yanking the battery out previously didn't fix it, unless I just gave it more time to finish whatever it was trying to do.
(In reply to Daniel Holbert [:dholbert] from comment #15)
> adb doesn't even see the device, for me at least.  (adb shell says "error:
> device not found")
> 
> (I have udev rules set up, and I ran adb w/ sudo for good measure - so I'm
> not getting blocked from seeing the device due to lack-of-privileges)

I ran into this at first. I think it was due to the device being so bogged down that the adb daemon wasn't running (yet).
(In reply to Lucas Adamski from comment #16)
> It showed /system/b2g/b2g at 96%.  

I was hoping to see the whole line, so we could tell which thread was consuming the CPU (top -t shows individual threads).
(In reply to Daniel Holbert [:dholbert] from comment #15)
> adb doesn't even see the device, for me at least.  (adb shell says "error:
> device not found")
> 
> (I have udev rules set up, and I ran adb w/ sudo for good measure - so I'm
> not getting blocked from seeing the device due to lack-of-privileges)

adb might be disabled. It is by default for dogfooding. You can enable it by enabling:

Settings->Device Information->More Information->Developer->Remote Debugging
(In reply to Dave Hylands [:dhylands] from comment #18)
> 
> I was hoping to see the whole line, so we could tell which thread was
> consuming the CPU (top -t shows individual threads).

Sorry, I'd closed the window by the time I saw your question. :(
Spoke too soon.  Checked for updates, applied it, now stuck again:

User 83%, System 16%, IOW 0%, IRQ 0%
User 272 + Nice 0 + Sys 54 + Idle 0 + IOW 0 + IRQ 0 + SIRQ 0 = 326

  PID   TID PR CPU% S     VSS     RSS PCY UID      Thread          Proc
  526   550  0  24% R 157880K  52444K  fg root     DOM Worker      /system/b2g
  526   549  0  24% R 157880K  52444K  fg root     DOM Worker      /system/b2g
  526   555  0  24% R 157880K  52444K  fg root     DOM Worker      /system/b2g
  526   546  0  24% R 157880K  52444K  fg root     DOM Worker      /system/b2g
  617   617  0   1% R   1088K    444K  fg root     top             top
So yeah dholbert is seeing the same thing. 4 DOM Workers consuming most of the CPU.
And they're all in the main process. I don't know what the DOM Workers do.
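To see at a glance which threads dominate, the `top -t` rows above can be totalled per thread name. The following is a minimal Python sketch, not part of any B2G tooling. It assumes the column layout shown in this bug (nine fixed columns, then a thread name that may contain spaces, then a process name without spaces), and the function name `cpu_by_thread` is made up for illustration. The percentages printed by top are rounded, so the totals are approximate.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Sample rows copied from the `top -t` output in this bug.
SAMPLE = """\
  PID   TID PR CPU% S     VSS     RSS PCY UID      Thread          Proc
  526   550  0  24% R 157880K  52444K  fg root     DOM Worker      /system/b2g
  526   549  0  24% R 157880K  52444K  fg root     DOM Worker      /system/b2g
  526   555  0  24% R 157880K  52444K  fg root     DOM Worker      /system/b2g
  526   546  0  24% R 157880K  52444K  fg root     DOM Worker      /system/b2g
  617   617  0   1% R   1088K    444K  fg root     top             top
"""


def cpu_by_thread(top_output: str) -> Dict[Tuple[str, str], int]:
    """Sum the CPU% column per (process, thread name) pair."""
    totals = defaultdict(int)
    for line in top_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric PID and TID; skip headers and summaries.
        if len(fields) < 11 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        cpu = int(fields[3].rstrip("%"))
        thread = " ".join(fields[9:-1])  # thread names such as "DOM Worker" contain spaces
        proc = fields[-1]
        totals[(proc, thread)] += cpu
    return dict(totals)


if __name__ == "__main__":
    ranked = sorted(cpu_by_thread(SAMPLE).items(), key=lambda kv: -kv[1])
    for (proc, thread), cpu in ranked:
        print(f"{cpu:3d}%  {thread}  ({proc})")
```

Grouping by thread name rather than by TID makes it obvious when several threads of the same kind are busy at once, which is the pattern reported here.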
I tried flashing https://pvtbuilds.mozilla.org/pub/mozilla.org/b2g/nightly/mozilla-b2g18-unagi/2013/02/2013-02-12-07-02-05/ and then OTA updating to my locally built version, and it didn't reproduce (my locally built version is a v1-train build).

I reflashed and OTA updated as in comment 23 and it reproduced (so not a one off).
I was able to get into gdb and get some back traces.

I'm not sure of the validity of the symbols since the image that was being run wasn't from the tree I was in.
[clarifying summary]
Summary: [B2G][OTA] Homescreen fails to display after OTA update → [B2G][OTA] Ridiculously janky / unresponsive behavior after OTA update (making it hard to even unlock the phone)
(In reply to Dave Hylands [:dhylands] from comment #25)
> I'm not sure of the validity of the symbols since the image that was being
> run wasn't from the tree I was in.

Yeah... The traces don't look right to me.
Do you guys need more information to debug this issue?  I just reproduced these symptoms myself when updating to:

Gecko  http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/d1288313218e
Gaia   6544fdb8dddc56f1aefe94482402488c89eeec49
BuildID 20130214070203
Version 18.0
For what it's worth, if I pull the battery and reboot, I can recover the device back into a usable state.  Just fodder for the triage drivers to consider.
blocking-b2g: tef? → tef+
I was able to reproduce using a local build.

STR:
1 - Modify gecko/toolkit/content/UpdateChannel.jsm near the end to override the channel:

    channel = "foobar";
    return channel;

2 - build.

3 - Create an update
    ./build.sh gecko-update-full

4 - Setup the phone to use the update
    tools/update-tools/test-update.py ${GECKO_OBJDIR}/dist/b2g-update/b2g-gecko-update.mar

5 - Do a Check Now and then do the update

This message:

AUS:SVC UpdateManager:get activeUpdate - channel has changed, reloading default preferences to workaround bug 802022

from here:
https://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/nsUpdateService.js#2795

seems to be the key. I'm hypothesising that the reload-default-prefs is the actual trigger.
Attachment #714262 - Attachment is obsolete: true
I let the process run for a bit longer and grabbed another set of backtraces
I picked one of the DOM Workers and just hit n over and over in gdb.

It wound up doing these 4 lines continuously:

3428	#endif
(gdb) 

3408	    WorkerRunnable* event;
(gdb) 
3410	      MutexAutoLock lock(mMutex);
(gdb) 
3412	      while (!mControlQueue.Pop(event) && !syncQueue->mQueue.Pop(event)) {
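The loop at line 3412 pops from a control queue first and a regular queue second, and the worker is expected to sleep when both are empty. For comparison, here is a hedged Python sketch of a loop with the same shape that blocks on a condition variable. It is not Gecko's code: the class `WorkerQueues`, its methods, and the `wakeups` counter are hypothetical names used only for illustration. If the wait in such a loop returned immediately every time, the thread would spin at full CPU; that would be consistent with the trace above, but the trace alone does not confirm it.

```python
import threading
from collections import deque


class WorkerQueues:
    """Hypothetical stand-in for a worker's control queue and regular queue."""

    def __init__(self):
        self._cond = threading.Condition()
        self._control = deque()
        self._normal = deque()
        self.wakeups = 0  # number of times the wait loop body ran

    def post(self, event, control=False):
        """Queue an event and wake one waiting consumer."""
        with self._cond:
            (self._control if control else self._normal).append(event)
            self._cond.notify()

    def next_event(self, timeout=None):
        """Return the next event, preferring control events.

        Blocks while both queues are empty; returns None if `timeout`
        seconds pass without anything being queued.
        """
        with self._cond:
            while not self._control and not self._normal:
                self.wakeups += 1
                # wait() releases the lock and sleeps, so an idle worker
                # uses no CPU; it returns False on timeout.
                if not self._cond.wait(timeout):
                    return None
            if self._control:
                return self._control.popleft()
            return self._normal.popleft()


if __name__ == "__main__":
    q = WorkerQueues()
    q.post("run script")
    q.post("terminate", control=True)
    print(q.next_event())  # control events are served first
    print(q.next_event())
```

The `wakeups` counter is only there to make spinning visible: in a healthy loop it grows by one per idle wait, whereas a loop whose wait never sleeps would increment it continuously.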
(In reply to Tony Chung [:tchung] from comment #29)
> Do you guys need more information to debug this issue?

I can reproduce at will now, so now it's mostly just trying to figure out what's going on.
Attached patch patch
Assignee: marshall → anygregor
Attachment #714586 - Flags: review?(bent.mozilla)
Blocks: 828887
Attachment #714586 - Flags: review?(bent.mozilla) → review+
thx jdm!
https://hg.mozilla.org/integration/mozilla-inbound/rev/424de8168602

dhylands mentions that the bug is not completely gone.
Assignee: anygregor → dhylands
Whiteboard: leave-open
Blocks: 841962
I filed bug 841962 to followup on this problem and removed the leave-open on this bug.

At least with the patch applied in this bug, the phone now just takes a lot longer to boot up after an update-channel change, but it seems to perform ok.
Whiteboard: leave-open
https://hg.mozilla.org/mozilla-central/rev/424de8168602
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Duplicate of this bug: 842222
Blocks: 847511
Verifying fix on V1-train branch - OTA goes smoothly

tested with the following:
1. manually flashed to Unagi build 2013-03-20-070206
Gecko  http://hg.mozilla.org/releases/mozilla-b2g18/rev/778da49486f0
Gaia   6c3767c2dea43b5e9aff7d156d36d69649005621

2. revertNightly 

3. OTA to Build 2013-03-21-070203
Gecko  http://hg.mozilla.org/releases/mozilla-b2g18/rev/7508c5a1026b
Gaia   7af427d35c4d557c75b2060022815f07851acc28

The issue still seems to occur when updating OTA from v1.0.1 builds, but other bugs cover that case; please refer to bugs 847511 and 842932 (note that a switch of update channels takes place in that case).
Status: RESOLVED → VERIFIED