Closed Bug 924129 Opened 11 years ago Closed 11 years ago

[B2G] Buri device stuck in boot process after flashing from 1.3 branch, showing the firefox logo and then rebooting.

Categories

(Firefox OS Graveyard :: Gaia, defect)

ARM
Gonk (Firefox OS)
defect
Not set
blocker

Tracking

(blocking-b2g:1.3+, firefox27 fixed)

RESOLVED FIXED
blocking-b2g 1.3+
Tracking Status
firefox27 --- fixed

People

(Reporter: jzimbrick, Assigned: pchang)

References

Details

(Keywords: regression, smoketest)

Attachments

(2 files)

Attached file Logcat during reboots.
Description:
Buri device is stuck on the Firefox loading screen: the screen progresses to a certain point, freezes, and then the device reboots in an endless loop.

Repro Steps:
1. Update Buri device to 1.3 moz build 20131007040201.
2. Observe that the phone will get to the Firefox logo, progress a bit, freeze, and then reboot. It does this until the device is powered off.

Actual:
Device cannot get past the Firefox logo and reboots until powered off.

Expected:
Device progresses normally past the Firefox logo and on to the FTU.

Environmental Variables
Device: Buri 1.3 Moz Ril
Build ID: 20131007040201
Gecko: 4960e672451b929388f431ebb0b805fa8d82e324

Notes:
Repro frequency: 100%

See attached: Logcat
*Correction to the environment variables section: the Gecko field is actually the build's Gaia revision, pulled from the sources.xml file.
blocking-b2g: --- → 1.3?
Severity: normal → blocker
Just wanted to note that I think this might be different from 924130.  I tried running on my buri with m-c prior to the problem commit in 924130 and I still see my device failing to boot.
I think I have confirmed this is unrelated to 924130.  I was able to bisect it as far as here:

  http://hg.mozilla.org/mozilla-central/rev/f191c70fcbfb

But that is a merge from m-i to m-c.  We need to identify which commit within the merge is the problem.

Unfortunately I'm out of time for the evening, so it would be great if someone else could continue the investigation or back out the entire m-i merge.
As for the regression range, we know this was working on last Friday's build (10/4/2013) and is failing on today's build.
Notable logcat statements.

10-07 08:27:35.259: I/Gecko(743): ###!!! [Child][MessageChannel::SendAndWait] Error: Channel error: cannot send/recv

10-07 08:27:36.199: E/GeckoConsole(803): Could not read chrome manifest 'file:///system/b2g/distribution/bundles/libqc_b2g_ril.version/chrome.manifest'.

10-07 08:27:43.230: I/GeckoDump(803): Crash reporter : Can't fetch app.reportCrashes. Exception: [Exception... "Component returned failure code: 0x8000ffff (NS_ERROR_UNEXPECTED) [nsIPrefBranch.getBoolPref]"  nsresult: "0x8000ffff (NS_ERROR_UNEXPECTED)"  location: "JS frame :: chrome://browser/content/shell.js :: shell_reportCrash :: line 122"  data: no]
Just a quick note. I only have Unagi at hand, so I flashed the build with the same date (2013-10-07-04-02-01) and my Unagi booted successfully. 

Gecko: (mercurial) 5f0569c3cb8f
20131005040201 - Good build
gaia:     c9090021f7d642bae1db73a1093ab3dbb5078642
Gecko:    http://hg.mozilla.org/mozilla-central/rev/b5d24ef1eb37
BuildID   20131005040201
Version   27.0a1

20131006040201 - Bad build
gaia:     0579e4ec4903344d1e92f6a02037ad133c6d974d
Gecko:    http://hg.mozilla.org/mozilla-central/rev/f191c70fcbfb
BuildID   20131006040201
Version   27.0a1


I'll try to figure out which patch broke it.
I just synced the latest hamachi code from m-c and was able to reproduce the booting problem.

I disabled the hwcomposer on the hamachi device with the following commands, and the device could then boot to the homescreen.

Jason, does it work for you? I'm checking the HWComposer issue on the hamachi device.

peter@peter-desktop:~$ adb remount
remount succeeded
peter@peter-desktop:~$ adb shell mv /system/lib/hw/hwcomposer.msm7627a.so /system/lib/hw/hwcomposer.msm7627a.so_bak
peter@peter-desktop:~$ adb shell stop b2g
peter@peter-desktop:~$ adb shell start b2g
Great, it works for me. I can boot up successfully after following the instructions in Comment 8.
Found the problem: mList is null inside the HwcComposer2D module.

Working on patch to fix it.

http://mxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp#624
Assignee: nobody → pchang
Fix crash on hwcomposer because mList is not initialized when HwcComposer2D::PrepareLayerList returns false.
Attachment #814308 - Flags: review?(ncameron)
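
For readers following along, the failure mode described above can be sketched roughly as follows. This is not the attached patch and not the real HwcComposer2D code: HwcListSketch, ComposerSketch, HasList and the layersSupported flag are hypothetical names, and the assumption that the list is only allocated once a layer is accepted is mine. It only illustrates why writing through mList on the failure path needs a null check.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical stand-in for the HWC layer list; only the field discussed
// in this bug is modelled.
struct HwcListSketch {
  std::size_t numHwLayers = 0;
};

// Hypothetical stand-in for the composer. The list is allocated lazily,
// so it can still be null when PrepareLayerList() bails out early.
class ComposerSketch {
public:
  bool TryRender(bool layersSupported) {
    if (!PrepareLayerList(layersSupported)) {
      // Without this guard, an early failure would dereference a null list.
      if (mList) {
        mList->numHwLayers = 0;  // signal "nothing for HWC to compose"
      }
      return false;
    }
    return true;
  }

  bool HasList() const { return mList != nullptr; }

private:
  bool PrepareLayerList(bool layersSupported) {
    if (!layersSupported) {
      return false;  // fails before any allocation has happened
    }
    if (!mList) {
      mList.reset(new HwcListSketch());
    }
    mList->numHwLayers = 1;
    return true;
  }

  std::unique_ptr<HwcListSketch> mList;
};
```

Calling TryRender(false) on a fresh ComposerSketch returns false without touching the still-null list, which is the situation the boot crash corresponds to under the assumptions above.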
Comment on attachment 814308 [details] [diff] [review]
fix crash on hwcomposer

Review of attachment 814308 [details] [diff] [review]:
-----------------------------------------------------------------

I don't see why we need this mList->numHwLayers = 0; line at all - we do the same thing (already guarded) a couple of lines higher and (I think) it cannot get set in PrepareLayerList if it returns false. Maybe it should be asserted that it is still 0 rather than setting it? (However, it could be set by TryHwComposition if that returns false, so if the invariant here is that mList->numHwLayers is 0 when TryRender returns false, then that needs to be looked at).
Attachment #814308 - Flags: review?(ncameron)
This looks to be caused by bug 919676. Given that this causes a boot failure, I'm going to ask for that patch to be backed out.
Blocks: 919676
Fixed by backout of bug 919676.
Status: NEW → RESOLVED
blocking-b2g: 1.3? → 1.3+
Closed: 11 years ago
Resolution: --- → FIXED
> I don't see why we need this mList->numHwLayers = 0; line at all

All renders (including GPU composition) are now done through HwcComposer2D. This is a way to signal that no layers shall be composed by HWC.

I have updated the patch in bug 919676 to do a null check. If anyone with the device that broke (buri?) could try it out I'd appreciate it.
(In reply to Diego Wilson [:diego] from comment #15)
> > I don't see why we need this mList->numHwLayers = 0; line at all
> 
> All renders (including GPU composition) are now done through HwcComposer2D.
> This is a way to signal that no layers shall be composed by HWC.

Sorry, I didn't mean we didn't need the line at all, but that I don't see why we need it here _and_ a few lines above. Just once looks like it ought to be enough.
(In reply to Nick Cameron [:nrc] from comment #16)
> Sorry, I didn't mean we didn't need the line at all, but that I don't see
> why we need it here _and_ a few lines above. Just once looks like it ought
> to be enough.

Oh, that's because PrepareLayerList() could've added layers before it failed, and it's kind of all or nothing right now.
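
A rough sketch of that "all or nothing" behaviour, for illustration only: PrepareLayersSketch, TryRenderSketch, LayerListSketch and the vector of per-layer flags are hypothetical and do not mirror the real HwcComposer2D signatures. The point is just that a prepare step can count some layers before it fails, so the count is reset a second time on the failure path.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the HWC layer list.
struct LayerListSketch {
  std::size_t numHwLayers = 0;
};

// Counts layers until it meets one that cannot be handled. Layers seen
// before the failure have already been added to the list at that point.
bool PrepareLayersSketch(const std::vector<bool>& supported,
                         LayerListSketch& list) {
  for (bool ok : supported) {
    if (!ok) {
      return false;
    }
    ++list.numHwLayers;
  }
  return true;
}

bool TryRenderSketch(const std::vector<bool>& supported,
                     LayerListSketch& list) {
  list.numHwLayers = 0;  // first reset: start from an empty list
  if (!PrepareLayersSketch(supported, list)) {
    list.numHwLayers = 0;  // second reset: discard the partial result
    return false;
  }
  return true;
}
```

With this shape, a failed render always leaves numHwLayers at 0 regardless of how many layers were counted before the failure.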