Closed Bug 828731 Opened 11 years ago Closed 11 years ago

Updates don't restart properly; require battery pull.

Categories

(Firefox OS Graveyard :: General, defect)

x86
macOS
defect
Not set
critical

Tracking

(blocking-b2g:tef+, blocking-basecamp:-, firefox19 wontfix, firefox20 wontfix, firefox21 fixed, b2g18+ fixed, b2g18-v1.0.0 fixed, b2g18-v1.0.1 fixed)

VERIFIED FIXED
B2G C4 (2jan on)
blocking-b2g tef+
blocking-basecamp -
Tracking Status
firefox19 --- wontfix
firefox20 --- wontfix
firefox21 --- fixed
b2g18 + fixed
b2g18-v1.0.0 --- fixed
b2g18-v1.0.1 --- fixed

People

(Reporter: cww, Assigned: dhylands)

Details

(Whiteboard: [triage:1/16])

Attachments

(7 files)

I started the latest B2G update (was on the 1/04 nightly, moving to 1/09). Battery was full (and possibly charging).

Started the update and it started downloading and I stopped paying attention (I may have started charging the phone). A short while later, I noticed the phone screen was black (off) but I couldn't turn it on. Holding the power button did nothing, tapping did nothing. Nada.

I had to pull the battery and re-insert it to get the phone to boot properly.

There's corroborating feedback: "Upgrading the os just now required me to pull the battery."

Have not tried to reproduce yet.
I hit this just now. I was able to recover by holding the power button for 5 seconds, then pressing it again to reboot the phone.
blocking-basecamp: --- → ?
There's some feedback suggesting this is related to charging the phone and not updates:

"After recharging the phone overnight, the phone was hung. Pressing the power button didn't do anything. After unplugging the phone from the charger, the charging light remained solid green, further indicating that the phone was truly hung. I had to remove the battery to start the phone."
I filed this exact bug a while back. This has been happening for some time, but it is not 100% reproducible.

I remember something being said about the screen shutting off being a factor, but I would have to dig through the bugs.

Geo just did an OTA update with the phone in his hand in Haxxor and had no issues.
The related bug is bug 825598
Should this be duped to bug 825598? That one is already bb+'d.
We don't think this is a dupe of bug 825598.

Without an STR we can't block at this time.

Marshall, can we put the 4th and 9th builds into your testing system and see if we can reproduce?
blocking-basecamp: ? → -
tracking-b2g18: --- → +
Flags: needinfo?(marshall)
I have seen this too a couple of times in the past. Workaround here is to wait until the decompressing step has been finished. It can take up to 5 minutes. Once that is done the power button will behave normally again.

As I was told yesterday this should be related to the cpu load the uncompressing step actually uses. There should be a bug around which will add support for `nice` and which should fix that.
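
As a rough illustration of the `nice` idea mentioned above (this is not the change from the bug being referred to, whose actual approach isn't described in this thread), a process can lower its own scheduling priority before starting CPU-heavy work by calling POSIX `nice()`. The helper name `lower_own_priority` is hypothetical.

```cpp
#include <cerrno>
#include <unistd.h>

// Hypothetical helper: raise this process's nice value by `increment`
// (a higher nice value means a lower scheduling priority).
// Returns true on success.
bool lower_own_priority(int increment) {
    // nice() can legitimately return -1, so errno has to be cleared first
    // and checked afterwards to tell a real failure from a valid result.
    errno = 0;
    int result = nice(increment);
    return !(result == -1 && errno != 0);
}
```

A decompression step could call something like `lower_own_priority(10)` before it starts, so that input handling (such as the power button) gets scheduled ahead of it.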
(In reply to Andrew Overholt [:overholt] from comment #6)
> Marshall, can we put the 4th and 9th builds into your testing system and see
> if we can reproduce?

Sure thing, I'll give it a spin and see what happens

(In reply to Henrik Skupin (:whimboo) from comment #7)
> As I was told yesterday this should be related to the cpu load the
> uncompressing step actually uses. There should be a bug around which will
> add support for `nice` and which should fix that.

This is Bug 802423.
Flags: needinfo?(marshall)
(In reply to Marshall Culpepper [:marshall_law] from comment #8)
> (In reply to Andrew Overholt [:overholt] from comment #6)
> > Marshall, can we put the 4th and 9th builds into your testing system and see
> > if we can reproduce?
> 
> Sure thing, I'll give it a spin and see what happens

Did you get any results, Marshall?
Flags: needinfo?(marshall)
Can we leave this on the nomination list so that it stays on triage's radar? Not saying this isn't getting the attention it needs, but it's probably not explicitly blocking-basecamp- yet given the fact that multiple people just ran into this problem
blocking-basecamp: - → ?
bb- without an STR.  Marshall is working on another bug which fixes the two "brickers" he knows about.
blocking-basecamp: ? → -
Flags: needinfo?(marshall)
blocking-b2g: --- → tef?
blocking-b2g: tef? → tef+
Marshall seems to be on PTO, will anyone be able to take this?
Assignee: nobody → dhylands
I hit this again last night, during the 'uncompressing' phase of updating from 20130110 to 20130114 (beta builds) on unagi. Update started downloading, got to uncompressing, phone went to sleep. Couldn't wake it up by pushing buttons. Left it for a few minutes, tried again, and it eventually responded and turned the screen on again.
Whiteboard: [triage:1/16]
(In reply to Ralph Giles (:rillian) from comment #14)
> I hit this again last night, during the 'uncompressing' phase of updating
> from 20130110 to 20130114 (beta builds) on unagi. Update started
> downloading, got to uncompressing, phone went to sleep. Couldn't wake it up
> by pushing buttons. Left it for a few minutes, tried again, and it
> eventually responded and turned the screen on again.

That particular problem (phone being unresponsive during uncompressing) should be addressed by bug 802423
Ok, thanks Dave. I'll track the issue there and leave this one for actual hangs.
I had this problem just now (though it has happened several times before). This was during the downloading phase. It says 'Downloading updates' but is frozen on the same screen (it keeps showing 0.00 bytes downloaded and doesn't go beyond that). But there is a small triangle image near the Wifi symbol in the top right corner that animates to indicate a download is in progress, which shows that some activity is going on.
The phone freezing part should go away once bug 802423 lands (currently waiting for review).

The download being stuck at zero shouldn't be happening if your image was newer than the 13th. If your image was newer than the 13th, then it would be really useful to get a logcat (but since you're updated, it's probably too late).
I am blocked by bug 832205; somehow I bricked my unagi phone with that build... not sure how.  I'll try to get it fixed when I get back to the states.
(In reply to Naoki Hirata :nhirata from comment #20)
> I am blocked by bug 832205; somehow I bricked my unagi phone with that
> build... not sure how.  I'll try to get it fixed when I get back to the
> states.

adb shell rm /sdcard/updates/0/update.mar
reboot
Manually check for updates
I worked with jst who had his phone in this condition (b2g didn't restart after performing an update).

I got jst to grab a bunch of information from the phone, and I'll attach some raw logs and extract some stuff from them.

b2g-ps only showed:

b2g              root      1442  1     32496  6992  c025cf64 400c17b0 D /system/b2g/b2g

I've attached the full logcat, and full ps output, and output from doing

echo t > /proc/sysrq-trigger
dmesg

The "D" in the ps output is significant. According to the docs it means "D is waiting in uninterruptible disk sleep"
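
The same state letter is also available from /proc/<pid>/stat (which was grabbed for pid 1442 as well). A minimal sketch of pulling it out, assuming the usual "pid (comm) state ..." layout of that file; the function names are made up for illustration:

```cpp
#include <string>

// Returns the one-letter state field from a /proc/<pid>/stat line,
// or '?' if the line does not look like "pid (comm) state ...".
char state_from_stat_line(const std::string& line) {
    // comm may itself contain spaces or parentheses, so look for the
    // LAST closing parenthesis rather than the first.
    std::string::size_type close = line.rfind(')');
    if (close == std::string::npos || close + 2 >= line.size()) {
        return '?';
    }
    if (line[close + 1] != ' ') {
        return '?';
    }
    return line[close + 2];
}

// 'D' is the uninterruptible sleep state discussed above.
bool is_uninterruptible(char state) {
    return state == 'D';
}
```

A process in this state cannot act on signals until the kernel call it is blocked in returns, so it is worth checking before concluding that a process is merely idle.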

The dmesg output shows us:

   D c053493c  5296  1442      1 0x00000000
(__schedule+0x38c/0x3f8) from [<c0535124>] (schedule_timeout+0x1c/0x208)
(schedule_timeout+0x1c/0x208) from [<c0534cfc>] (wait_for_common+0x110/0x1c8)
(wait_for_common+0x110/0x1c8) from [<c0534de4>] (wait_for_completion_killable+0x18/0x24)
(wait_for_completion_killable+0x18/0x24) from [<c025cf64>] (mdp_dsi_video_update+0x130/0x154)
(mdp_dsi_video_update+0x130/0x154) from [<c025d6c4>] (mdp_dma_pan_update+0x50/0x54)
(mdp_dma_pan_update+0x50/0x54) from [<c02460d0>] (msm_fb_pan_display+0x234/0x284)
(msm_fb_pan_display+0x234/0x284) from [<c023f438>] (fb_pan_display+0xe4/0x120)
(fb_pan_display+0xe4/0x120) from [<c02404a8>] (fb_set_var+0x1ec/0x290)
(fb_set_var+0x1ec/0x290) from [<c02406b4>] (do_fb_ioctl+0x168/0x5e0)
(do_fb_ioctl+0x168/0x5e0) from [<c013b338>] (do_vfs_ioctl+0x520/0x59c)
(do_vfs_ioctl+0x520/0x59c) from [<c013b3e8>] (sys_ioctl+0x34/0x54)
(sys_ioctl+0x34/0x54) from [<c003e180>] (ret_fast_syscall+0x0/0x30)

so it appears that the video driver is hung for some reason.

In logcat at 16:19:50.675 we see b2g wanting to restart in order to apply the update. jst says that he had just left his phone, and this tells me it timed out waiting for the user to press Install and went ahead anyways.

I normally see the ###!!! [Parent][AsyncChannel] Error: Channel error: cannot send/recv messages when b2g restarts.

This message:

16:19:51.036 E/GeckoConsole(  495): Content JS INFO at app://system.gaiamobile.org/js/window_manager.js:1114 in createFrame: %%%%% Launching Homescreen as remote (OOP)

however seems unusual. I think it's the last message from process 495 (which would have been the b2g process prior to the restart).

We then see: 

01-22 16:19:52.948 E/AudioHardwareMSM76XXA( 1441): failed to open AUDIO_NORMAL_FILTER /system/etc/AudioFilter.csv: No such file or directory (2).
01-22 16:19:52.948 E/AudioHardwareMSM76XXA( 1441): failed to open AUTO_VOLUME_CONTROL /system/etc/AutoVolumeControl.txt: No such file or directory (2)
01-22 16:19:52.988 E/QualcommCamera( 1441): Qint android::get_number_of_cameras(): E


which is normal startup. Note that the PID is 1441.

I got jst to do a ps, and b2g is showing as 1442. /system/b2g/updated exists, which tells me that it hasn't done the restart to actually apply the update. /data/local/updates/0/update.status contains "applied" which is consistent with this.
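
The manual check above can be written down as a small sketch. It only mirrors the reasoning in this comment (the updated directory still exists and update.status reads "applied"); it is not how the updater itself decides anything, and the function names are hypothetical.

```cpp
#include <fstream>
#include <string>
#include <sys/stat.h>

// True if something exists at `path` (file or directory).
bool path_exists(const std::string& path) {
    struct stat st;
    return stat(path.c_str(), &st) == 0;
}

// Returns the first line of the file, or an empty string if unreadable.
std::string read_first_line(const std::string& path) {
    std::ifstream in(path.c_str());
    std::string line;
    if (in) {
        std::getline(in, line);
    }
    return line;
}

// Hypothetical check: the update has been staged ("applied") but the
// restart that switches to it has not happened yet.
bool restart_still_pending(const std::string& updated_dir,
                           const std::string& status_file) {
    return path_exists(updated_dir) &&
           read_first_line(status_file) == "applied";
}
```

On the phone described here the call would be `restart_still_pending("/system/b2g/updated", "/data/local/updates/0/update.status")`.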

I got jst to grab /proc/1442/stat /proc/1442/status and then got him to do:

kill 1442

This caused b2g to restart and it properly applied the update and started up.

I put needinfo on mvines to see if he can shed any light on being stuck in the video driver.
Flags: needinfo?(mvines)
Attached file pid 1442 logcat output
Attached file pid 1442 dmesg output
Attached file pid 1442 ps output
Hey Sushil, does this look familiar to you?
Flags: needinfo?(mvines) → needinfo?(sushilchauhan)
Any progress here?
(Sushil is still trying to get internal support on this sadly)
Some more clues:

- b2g only has a single thread.

lsof yields (this time b2g is pid 616 - output edited to fit better)

CMD PID USER   FD TYPE DEVICE  SIZE/OFF   NODE NAME
b2g 616 root  exe ???    ???       ???    ??? /system/b2g/b2g
b2g 616 root    0 ???    ???       ???    ??? /dev/null
b2g 616 root    1 ???    ???       ???    ??? /dev/null
b2g 616 root    2 ???    ???       ???    ??? /dev/null
b2g 616 root    3 ???    ???       ???    ??? /sys/power/wake_lock
b2g 616 root    4 ???    ???       ???    ??? /sys/power/wake_unlock
b2g 616 root    5 ???    ???       ???    ??? /sys/power/state
b2g 616 root    6 ???    ???       ???    ??? /dev/graphics/fb0
b2g 616 root    8 ???    ???       ???    ??? /dev/__properties__ (deleted)
b2g 616 root  mem ???  b3:13         0  16633 /system/b2g/b2g
b2g 616 root  mem ???  b3:13     94208  16633 /system/b2g/b2g
b2g 616 root  mem ???  b3:13         0    667 /system/lib/libstagefright_amrnb_common.so
b2g 616 root  mem ???  b3:13     49152    667 /system/lib/libstagefright_amrnb_common.so
b2g 616 root  mem ???  b3:13         0    566 /system/lib/libc.so
b2g 616 root  mem ???  b3:13    270336    566 /system/lib/libc.so

This phone has no /system/media/BootAnimation.zip file.

And this has happened twice on jst's unagi
We still turn on the screen and open the framebuffer very early in startup, regardless of whether we're actually going to run the boot animation.

Since we've opened /dev/graphics/fb0 we're almost certainly hanging at

http://mxr.mozilla.org/mozilla-central/source/b2g/app/BootAnimation.cpp#633
OK, the problem seems to be that if the phone reboots (after applying the update) while the screen is off, it hangs.

adb reboot causes the TurkCell splash to appear which seems to not cause the problem.

However if you do the following:

- boot the phone
- adb shell b2g-ps
- wait for the screen to go off
- kill -9 PID   (use the pid of the b2g process)

then b2g will hang as in comment 22.
These STR don't result in a hang on my hardware.  Sounding more like a device-specific problem now.
Flags: needinfo?(sushilchauhan)
That's why we added the code to turn the screen on before initializing the framebuffer: this used to bite all the time with uploading gecko builds when the screen was off.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #35)
> That's why we added the code to turn the screen on before initializing the
> framebuffer: this used to bite all the time with uploading gecko builds when
> the screen was off.

It seems that adding a delay between set_screen_enabled(1) and the call to FramebufferNativeWindow makes the problem go away.

A 1 second delay works.
I'm working on smaller delays to see just how much of a delay is required.
We could try checking the screen enabled-ness before delaying to not hurt that case.
If there was an API, I would.

I'm trying /sys/power/wait_for_fb_wake and if that doesn't work, I'll try a timer that restarts if the timer goes off.
WooHoo.

/sys/power/wait_for_fb_wake seems to work (thanks mwu). And looking at the logs, it takes about 1/2 second for the case where it would hang, and no delay for the cases where it wouldn't.

Now to clean up and create a patch.
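
For reference, the idea is roughly the following. This is a standalone sketch rather than the patch itself: it assumes, based on the observations above, that a read from /sys/power/wait_for_fb_wake does not return until the framebuffer is awake. The function name is made up, and the path is a parameter only so the logic can be tried against an ordinary file.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Blocks until a one-byte read from `path` completes.
// Returns false if the file cannot be opened or the read fails.
bool block_until_fb_awake(const char* path = "/sys/power/wait_for_fb_wake") {
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return false;
    }
    char buf;
    ssize_t len;
    // Retry if a signal interrupts the read before it completes.
    do {
        len = read(fd, &buf, 1);
    } while (len < 0 && errno == EINTR);
    close(fd);
    return len >= 0;
}
```

The caller would do this after turning the screen on and before touching the framebuffer. Unlike a fixed sleep, it adds no delay in the cases where the framebuffer is already awake.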
I've tested this on the unagi, and will test it on the otoro and the emulator.
Attachment #706680 - Flags: review?(mwu)
Comment on attachment 706680 [details] [diff] [review]
Wait for the framebuffer to become ready before trying to use it

Review of attachment 706680 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine though I would prefer to avoid logging if there are no errors

::: b2g/app/BootAnimation.cpp
@@ +640,5 @@
> +        do {
> +            len = read(fd.get(), &buf, 1);
> +        } while (len < 0 && errno == EINTR);
> +        if (len < 0) {
> +            LOGE("BootAnitmation: wait_for_fb_sleep failed errno: %d", errno);

typo here
Attachment #706680 - Flags: review?(mwu) → review+
I'll remove the 2 non-error logs.
The otoro doesn't seem to suffer from the hanging problem, although reading from /sys/power/wait_for_fb_wake delays for about 400 msec under those circumstances (compared to about 500 msec for the unagi).

The emulator seems to work with this patch applied as well.
https://hg.mozilla.org/mozilla-central/rev/6a073316b972
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [triage:1/16] → [triage:1/16][needs-b2g-v1.0 uplift]
https://hg.mozilla.org/releases/mozilla-b2g18_v1_0_0/rev/b5d3899496a2
Whiteboard: [triage:1/16][needs-b2g-v1.0 uplift] → [triage:1/16]
Target Milestone: --- → B2G C4 (2jan on)
Happened to me once again today.
Severity: normal → critical
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I think that you actually got hit by bug 841041 rather than this one.
File a new bug for what you are hitting - or if bug 841041 is what you are hitting, use that one.
Status: REOPENED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
I'm almost sure that the screen was fully black.
Build ID:20130219070200
Gecko  http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/98354c0298ab
Gaia   edaca00b1eb7534120b6255db5d5200fb1d86d65
Kernel: Dec 5

Verified on "Unagi" device, this issue no longer reproduces on nightly build.
Status: RESOLVED → VERIFIED