Closed Bug 1036670 Opened 10 years ago Closed 10 years ago

[Flame][Memory] Cannot always boot successfully in 273MB RAM

Categories

(Firefox OS Graveyard :: Performance, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:2.0+, firefox31 wontfix, firefox32 fixed, firefox33 fixed, b2g-v1.3 unaffected, b2g-v1.4 affected, b2g-v2.0 fixed, b2g-v2.1 fixed)

RESOLVED INVALID
2.0 S6 (18july)
blocking-b2g 2.0+
Tracking Status
firefox31 --- wontfix
firefox32 --- fixed
firefox33 --- fixed
b2g-v1.3 --- unaffected
b2g-v1.4 --- affected
b2g-v2.0 --- fixed
b2g-v2.1 --- fixed

People

(Reporter: pcheng, Assigned: ting)

References

Details

(Keywords: memory-footprint, perf, regression, Whiteboard: [MemShrink][273MB-Flame-Support][c=memory p= s= u=2.0])

Attachments

(4 files, 1 obsolete file)

Related bug 1008050.

To Repro:
Set Flame memory to 273MB by entering the following commands in a console:
$ adb reboot bootloader
$ fastboot oem mem 273
$ fastboot reboot

Expected: Flame boots up correctly to lockscreen or homescreen.

Actual: Flame sometimes fails to boot and gets stuck on the Firefox boot animation screen.

- Frequency of hitting this issue:
v122 base only = 0 out of 5 attempts
v122 base + v1.4 = 2 out of 5 attempts (1.4 build ID: 20140709075131)
v122 base + v2.0 = 2 out of 5 attempts (2.0 build ID: 20140709105132)
v122 base + v2.1 = 2 out of 5 attempts (2.1 master build ID: 20140709131330)


Notes:
1) This issue is hit reliably on v1.4, v2.0 and v2.1 after flashing gecko & gaia, or after factory resetting the phone via Settings. For each version, I made sure each of these two conditions (flashing and resetting) was tested once within those 5 attempts.

2) For testers running into this issue, doing an '$ adb reboot' should reset the device and you should be able to boot up eventually.
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(jmitchell)
Blocks: 1036675
Nominating for 2.0 (not 1.4, because I believe 2.0 is the focus of these 256/273MB memory tests); the issue is clearly severe, since the device cannot boot correctly.
blocking-b2g: --- → 2.0?
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(jmitchell)
Can we get a logcat?
QA Wanted for a logcat.
Keywords: footprint, perf, qawanted
QA Whiteboard: [QAnalyst-Triage+]
Keywords: regression
Logcat attached.
QA Whiteboard: [QAnalyst-Triage?]
Keywords: qawanted
QA Contact: pcheng
Flags: needinfo?(jmitchell)
So this is a regression between 1.3 and 1.4, not 1.4 and 2.0, correct?
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(jmitchell)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #5)
> So this is a regression between 1.3 and 1.4, not 1.4 and 2.0, correct?

Right.
These errors from attachment 8453841 seem odd:

  01-02 08:20:59.109   292   292 E GeckoConsole: [JavaScript Error: "TypeError: shell.contentBrowser is null" {file: "chrome://b2g/content/shell.js" line: 78}]
  07-10 03:18:48.460   292   292 E GeckoConsole: [JavaScript Error: "TypeError: this.contentBrowser is null" {file: "chrome://b2g/content/shell.js" line: 332}]

How should I try again if the STR in comment 1 doesn't reproduce it: set the memory back to 1024, flash/reset, set the memory to 273, and reboot?
(In reply to Ting-Yu Chou [:ting] from comment #7)
> How should I try again if STR at comment 1 doens't repro, set the memory
> back to 1024, flash/reset, set memory to 273, and reboot?

Basically, you need to either flash or factory reset the phone via Settings with the device ALREADY set to 273MB of memory. So the order you used is incorrect.

To put it simply: set the device to 273MB of memory, then go to Settings > Device Information > Reset Phone.
blocking-b2g: 2.0? → 2.0+
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage+][lead-review+]
Thinker, do you have any idea about this bug? How should we go next? Need reproducible environment or something else? Thank you!
Flags: needinfo?(tlee)
Ting-Yu, are you the best person to own this bug? Thank you.
Flags: needinfo?(tchou)
I am not sure whether I am the best fit, but yes, I am trying to figure out what is going on.
Flags: needinfo?(tchou)
PiWei, it seems I am not reproducing the issue correctly. Following the STR in comment 8, after a factory reset:

1) the device always stops at the "T2Mobile" screen, which does not match the Firefox animation screen you mentioned in comment 1
2) the device stops at the "T2Mobile" screen regardless of whether I set the device memory to 273 or 1024 MB, which also does not match the prerequisite

gecko: f6aad75c6ac3d3da06406f3570833f29771e217d
gaia:  9d822c4666c507da8ffe9967df530aa6980acbf8
Flags: needinfo?(pcheng)
Brian, as far as I know, your team is also using 273MB Flame to do MTBF testing, did you encounter this kind of issues? Thanks!
Flags: needinfo?(brhuang)
We did encounter this issue sometimes but it's not always reproducible.
Flags: needinfo?(brhuang)
As factory reset clears webapps, I tried to repro by:

  1. $ fastboot oem mem 273
  2. $ flash.sh
  3. go back to step 2 if I see FTU

I repeated step 2 ten times but haven't reproduced it yet; I will keep trying tomorrow.
(In reply to Ting-Yu Chou [:ting] from comment #15)
>   1. $ fastboot oem mem 273
>   2. $ flash.sh
>   3. go back to step 2 if I see FTU

I tried 20 more times today and haven't reproduced it. I also tried "adb reboot" 20 times as below, and still can't reproduce it:

  1. fastboot oem mem 273, and complete the FTU
  2. $ adb reboot
  3. go back to step 2 if I see the lock screen

gecko:    deea1599293617731dcc0f2104fc0eb79e1c1c28
gaia :    f0d9c5b39735bcbc63405281642437d4fc186cdb
firmware: v122
platform: 33.0a1
Flags: needinfo?(tlee)
After double-checking with :mlien, I can reproduce it now.

QA updates only gecko/gaia on top of base image v122, whereas I had been using "./flash.sh" to flash my local build. If I update only gecko/gaia with "./flash.sh gecko && ./flash.sh gaia", I can reproduce it.

From what :mlien told me, the gonk/kernel of T2M's base image is from v1.3 (needs to be confirmed).

Here's the about:memory report from when the issue occurred. heap-unclassified in the b2g process is really high; I will enable DMD tomorrow to see what it is.

Main Process (pid 2281)
Explicit Allocations

90.63 MB (100.0%) -- explicit
├──72.78 MB (80.30%) ── heap-unclassified
├──10.35 MB (11.43%) -- js-non-window
│  ├───6.61 MB (07.30%) -- zones
│  │   ├──4.70 MB (05.19%) -- zone(0xb2b69c00)
│  │   │  ├──2.49 MB (02.75%) ++ compartment([System Principal])
│  │   │  └──2.21 MB (02.44%) ++ (34 tiny)
│  │   ├──1.71 MB (01.89%) -- zone(0xb6a78000)
│  │   │  ├──1.69 MB (01.86%) -- strings/string(<non-notable strings>)
│  │   │  │  ├──1.12 MB (01.23%) ── malloc-heap
│  │   │  │  └──0.57 MB (00.63%) ── gc-heap
│  │   │  └──0.02 MB (00.03%) ++ (4 tiny)
│  │   └──0.20 MB (00.22%) ++ zone(0xb6a79c00)
│  ├───3.65 MB (04.03%) -- runtime
│  │   ├──2.08 MB (02.30%) ── script-data
│  │   ├──1.08 MB (01.20%) ── atoms-table
│  │   └──0.48 MB (00.53%) ++ (10 tiny)
│  └───0.09 MB (00.10%) ++ gc-heap
├───3.26 MB (03.60%) ++ (19 tiny)
├───2.34 MB (02.58%) -- heap-overhead
│   ├──1.69 MB (01.87%) ── bin-unused
│   └──0.65 MB (00.71%) ++ (3 tiny)
└───1.89 MB (02.09%) -- workers/workers()
    ├──1.40 MB (01.54%) ++ worker(resource://gre/modules/ril_worker.js, 0xafdcd800)
    └──0.50 MB (00.55%) ++ worker(resource://gre/modules/nfc_worker.js, 0xafdc7000)
Clearing the needinfo since I can reproduce it now.
Flags: needinfo?(pcheng)
Whiteboard: [MemShrink] → [MemShrink][273MB-Flame-Support]
How do I even get it to adb reboot? Can anyone help with that process? I'm pressing the volume button and the up arrow, and reaching the reboot screen- what do I do now?
See Also: → 1038854
Unreported: 45 blocks in stack trace record 1 of 302
 73,912,320 bytes (73,872,000 requested / 40,320 slop)
 86.58% of the heap (86.58% cumulative);  96.62% of unreported (96.62% cumulative)
 Allocated at
   replace_malloc[libdmd.so +0x3898] 0xb6fce898
   ???[/system/b2g/b2g +0xBC42] 0x13c42
   ???[/system/b2g/b2g +0xC656] 0x14656
   __thread_entry[libc.so +0xCB64] 0xb6f7ab64
   pthread_create[libc.so +0xCCE0] 0xb6f7ace0

This comes from the boot animation, which consists of 45 PNGs at 480x854, and AnimationFrame::ReadPngFrame() is called with output format HAL_PIXEL_FORMAT_RGBA_8888. Isn't ~70MB too much for a boot animation?
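As a rough check on the ~70MB figure, the sketch below multiplies out the frame count and resolution given above, assuming every frame is fully decoded to RGBA_8888 (4 bytes per pixel) and all frames stay resident at the same time. The constant and function names are illustrative only.

```javascript
// Back-of-the-envelope estimate of the decoded boot animation size.
// Assumption: all 45 frames are decoded to RGBA_8888 and kept in memory at once.
const FRAMES = 45;
const WIDTH = 480;
const HEIGHT = 854;
const BYTES_PER_PIXEL = 4; // HAL_PIXEL_FORMAT_RGBA_8888: 8 bits each for R, G, B, A

// Hypothetical helper: total bytes for `frames` images of width x height pixels.
function decodedBytes(frames, width, height, bpp) {
  return frames * width * height * bpp;
}

const estimate = decodedBytes(FRAMES, WIDTH, HEIGHT, BYTES_PER_PIXEL);
const dmdRequested = 73872000; // "requested" byte count from the DMD record above

// Express the gap to the DMD figure in units of one RGBA row per frame.
const extraRowsPerFrame =
  (dmdRequested - estimate) / (FRAMES * WIDTH * BYTES_PER_PIXEL);

console.log(`estimate: ${estimate} bytes (${(estimate / 1048576).toFixed(2)} MiB)`);
console.log(`DMD requested: ${dmdRequested} bytes, gap: ${dmdRequested - estimate} bytes`);
console.log(`gap expressed as extra rows per frame: ${extraRowsPerFrame}`);
```

Under those assumptions the frames alone come to about 70.4 MiB. The 86,400-byte gap to the DMD "requested" figure happens to equal exactly one extra 480-pixel RGBA row per frame; this report does not say where that row comes from.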
Because the boot animation consumes so much memory, memory-pressure events are fired. If getContentWindow() is called to dispatch the custom event |mozmemorypressure| before shell_start() has appended |systemAppFrame|, it fails with this error:

  E/GeckoConsole( 1216): [JavaScript Error: "TypeError: shell.contentBrowser is null" {file: "chrome://b2g/content/shell.js" line: 80}]

and the lazy getter contentBrowser() deletes itself on that first call:

  get contentBrowser() {
    delete this.contentBrowser;
    return this.contentBrowser = document.getElementById('systemapp');
  },

Later, when shell_start() reaches this line:

  this.contentBrowser.addEventListener('mozbrowserloadstart', this, true);

it gets another error:

  E/GeckoConsole( 1216): [JavaScript Error: "TypeError: this.contentBrowser is null" {file: "chrome://b2g/content/shell.js" line: 338}]

because the property has been permanently set to null. When this happens, StopBootAnimation() is never called, so the phone stays on the animation screen.
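The failure mode can be reproduced outside Gecko with a small Node sketch. It keeps the shape of the self-deleting getter quoted above, but replaces the DOM with a hypothetical `fakeDocument` whose `getElementById()` returns null until a hypothetical `appendSystemApp()` runs; neither helper exists in shell.js.

```javascript
// Minimal stand-in for the DOM: the element only exists after it is appended.
// fakeDocument and appendSystemApp are hypothetical helpers for illustration.
const elements = new Map();
const fakeDocument = {
  getElementById(id) {
    return elements.has(id) ? elements.get(id) : null;
  },
};

function appendSystemApp() {
  elements.set("systemapp", { id: "systemapp" });
}

// Same shape as the lazy getter quoted above: on first access it deletes the
// accessor and replaces it with a plain data property holding the lookup result.
const shell = {
  get contentBrowser() {
    delete this.contentBrowser;
    return (this.contentBrowser = fakeDocument.getElementById("systemapp"));
  },
};

// 1. A memory-pressure handler touches shell.contentBrowser too early.
const early = shell.contentBrowser; // lookup fails, null is cached

// 2. The frame is appended afterwards...
appendSystemApp();

// 3. ...but the getter is gone, so the cached null is still returned.
const later = shell.contentBrowser;

console.log(early, later);
```

Both values are null even though `fakeDocument.getElementById("systemapp")` succeeds after step 2, which mirrors the state shell_start() runs into.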

I am cooking a patch for this.
Attached patch patch (obsolete)
Removed the getter contentBrowser(), which may set the property to null; the property is now assigned after |systemAppFrame| is appended instead.

I didn't add any error handling in getContentWindow(), since there will still be error messages if anything tries to access |shell.contentBrowser| before it is assigned.
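A simplified sketch of the approach described in the patch: drop the lazy getter and assign the property only once the frame has been appended. The object shapes and the body of `getContentWindow()` below are hypothetical stand-ins, not the actual shell.js code.

```javascript
// Hypothetical sketch: no lazy getter; contentBrowser is assigned explicitly
// once the system app frame exists.
const shell = {
  start() {
    // Stand-in for creating |systemAppFrame| and appending it to the document.
    const systemAppFrame = { id: "systemapp", contentWindow: { name: "system" } };
    // Assign only after the frame has been appended.
    this.contentBrowser = systemAppFrame;
  },
};

function getContentWindow() {
  // Before start() runs, contentBrowser is undefined, so this throws a TypeError,
  // matching the note that early access still produces error messages.
  return shell.contentBrowser.contentWindow;
}

// An early memory-pressure handler still fails...
let earlyError = null;
try {
  getContentWindow();
} catch (e) {
  earlyError = e;
}

// ...but nothing caches the failed lookup, so start() proceeds normally.
shell.start();
const win = getContentWindow();

console.log(earlyError instanceof TypeError, win.name);
```

An early caller still gets an error, but because no null is cached, the later assignment and any listeners registered on the frame are unaffected.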
Attachment #8456751 - Flags: review?(fabrice)
Assignee: nobody → tchou
Status: NEW → ASSIGNED
Priority: -- → P1
Whiteboard: [MemShrink][273MB-Flame-Support] → [MemShrink][273MB-Flame-Support][c=memory p= s= u=2.0]
Severity: normal → blocker
Attachment #8456751 - Flags: review?(fabrice) → review+
Attached patch patch
Added r=fabrice to commit message, carry r+.

I found that the try run from comment 23 ran only a few tests, so I added the b2g ICS emulator, and it looks good: https://tbpl.mozilla.org/?tree=Try&rev=0409d889dec8.
Attachment #8457809 - Flags: review+
Attachment #8456751 - Attachment is obsolete: true
Attached patch patch-v1.4
Attached patch patch-v2.0
Please let me know if I should request approval for uplifting.
Keywords: checkin-needed
2.0 blockers are uplifted automatically.
https://hg.mozilla.org/mozilla-central/rev/7ee1774b1703
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → 2.0 S6 (18july)
Dear PiWei,
According to comment 17, are there any preconditions for verifying this issue?
Flags: needinfo?(pcheng)
(In reply to Shine from comment #32)
> Dear PiWei,
> According to the Comment 17, is there any precondition when we verify this
> issue?

I believe this bug has been fixed, since it was marked fixed. Also, we've increased the base memory to 319MB and are no longer using 273MB, so I'm closing this bug as invalid.
Flags: needinfo?(pcheng)
Resolution: FIXED → INVALID