Closed
Bug 1109794
Opened 10 years ago
Closed 9 years ago
Device not booting up due to b2g process getting killed by LMK on boot up continuously
Categories
(Firefox OS Graveyard :: General, defect, P1)
Tracking
(Not tracked)
RESOLVED
INVALID
People
(Reporter: anshulj, Assigned: ntroast)
References
Details
(Whiteboard: [MemShrink:P3])
Attachments
(2 files)
The issue is easily reproducible on a Flame device with the following SHAs:
Gecko: 2145ba8738a56c235efc211b461272edede6fb84
Gaia: e04ab7651b1e0c67516e1cef7aa4bc6072529885
The last known good SHAs are below to help narrow down a regression window:
Gecko: bd2404ce8db2ca13b484a7f3c3b3db31239cf904
Gaia: e5d666d6f62480ced56c6d9352f5e12befb5a862
Updated•10 years ago
Summary: Device not botting up due to b2g process getting killed by LMK on boot up continuously → Device not booting up due to b2g process getting killed by LMK on boot up continuously
Comment 2•10 years ago
Anshul, this is the pushlog I have for the above SHAs: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a61cacb954c5&tochange=be1f49e80d2d. It is rather too big; it would help if you could narrow it down further.
Bhavna, I have narrowed it down to the Gaia commits between b384220eb54329397af53ee6819cc13bd7b641f1 and b1edd64173cd48d130f697e0b0b2adf2523ad57f. I am having a hard time bisecting further because I get compilation errors when I try to bisect any more.
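When intermediate commits do not compile, `git bisect run` can step over them: a script that exits with code 125 marks the current commit as untestable, 0 marks it good, and 1 marks it bad. The following is a minimal Python sketch of such a script. The build and boot-check commands are placeholders (assumptions), not the actual Gaia build steps.

```python
"""Helper for `git bisect run`: skip commits that fail to build."""
import subprocess

# Hypothetical commands; substitute the real build and boot-check steps.
BUILD_CMD = ["make"]
TEST_CMD = ["./check_boot.sh"]

SKIP = 125  # git bisect run treats 125 as "this commit cannot be tested"


def bisect_exit_code(build_ok, test_ok):
    """Map build/test outcomes to the exit code git bisect run expects."""
    if not build_ok:
        return SKIP
    return 0 if test_ok else 1


def succeeded(cmd):
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(cmd).returncode == 0


def main():
    """Build, then test only if the build worked, and return the exit code."""
    build_ok = succeeded(BUILD_CMD)
    test_ok = build_ok and succeeded(TEST_CMD)
    return bisect_exit_code(build_ok, test_ok)
```

Assuming the file is saved as bisect_step.py (a hypothetical name) as an untracked file in the repository root, one way to use it is `git bisect start <bad> <good>` followed by `git bisect run python3 -c "import sys, bisect_step; sys.exit(bisect_step.main())"`.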
Comment 4•10 years ago
Last time I flashed all images from the pvt server but could not reproduce this issue. I will use v188 as the base and flash only gecko/gaia to try to reproduce it, and will update later.
Comment 5•10 years ago
I synced the Flame B2G source tree based on the manifest provided by Michael to generate local build images. With these images, however, I cannot see the same symptom, even with a SIM card and an SD card inserted.
ftp://ftp.mozilla.org/pub/mozilla.org/b2g/manifests/nightly/2.2.0/2014-12-09-16/source_flame-kk_2014-12-09-16.xml

b2g process information from b2g-procrank, with about 5 seconds between samples:

APPLICATION  PID  Vss      Rss     Pss     Uss     cmdline
b2g          210   99524K  34568K  29833K  26952K  /system/b2g/b2g
b2g          210  132276K  52124K  41677K  35940K  /system/b2g/b2g
b2g          210  170240K  68852K  60475K  55528K  /system/b2g/b2g
b2g          210  171416K  67932K  59484K  54472K  /system/b2g/b2g
b2g          210  207848K  79056K  71978K  67788K  /system/b2g/b2g
b2g          210  218500K  41032K  34087K  30796K  /system/b2g/b2g
b2g          210  219708K  35168K  28432K  26200K  /system/b2g/b2g
b2g          210  218240K  42592K  36269K  33952K  /system/b2g/b2g
b2g          210  218240K  37752K  32520K  30708K  /system/b2g/b2g
b2g          210  217408K  36044K  32798K  31608K  /system/b2g/b2g
b2g          210  217472K  40996K  38020K  36728K  /system/b2g/b2g
b2g          210  216192K  41568K  39178K  38084K  /system/b2g/b2g

Gaia-Rev 4cdeee67b449db90aae9384337311547c280093c
Gecko-Rev 4e5dbde020e7101d711aca751bb1ff3af40e32b4
Build-ID 20141211135752
Version 37.0a1
Device-Name flame
FW-Release 4.4.2
FW-Incremental eng.rexmax.20141211.135325
FW-Date Thu Dec 11 13:53:43 CST 2014
Bootloader L1TC00011880
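Repeated b2g-procrank samples like these are easier to compare once parsed. The sketch below is one possible way to do that in Python; it assumes each row has a single-token application name followed by PID, Vss, Rss, Pss, Uss and cmdline, as in the b2g rows of this comment. The function names are illustrative only.

```python
import re

# Matches a row such as:
#   b2g 210 99524K 34568K 29833K 26952K /system/b2g/b2g
ROW = re.compile(
    r"^(?P<app>\S+)\s+(?P<pid>\d+)\s+(?P<vss>\d+)K\s+(?P<rss>\d+)K\s+"
    r"(?P<pss>\d+)K\s+(?P<uss>\d+)K\s+(?P<cmdline>\S+)$"
)


def parse_samples(text, app="b2g"):
    """Return one dict of integer K values per matching row, in input order."""
    samples = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m and m.group("app") == app:
            samples.append({k: int(m.group(k)) for k in ("vss", "rss", "pss", "uss")})
    return samples


def peak(samples, field="uss"):
    """Return (index, value) of the largest reading for the given field."""
    idx = max(range(len(samples)), key=lambda i: samples[i][field])
    return idx, samples[idx][field]


# Four of the samples quoted in this comment.
log = """\
b2g 210 99524K 34568K 29833K 26952K /system/b2g/b2g
b2g 210 132276K 52124K 41677K 35940K /system/b2g/b2g
b2g 210 207848K 79056K 71978K 67788K /system/b2g/b2g
b2g 210 218500K 41032K 34087K 30796K /system/b2g/b2g
"""
samples = parse_samples(log)
print(peak(samples))  # (2, 67788)
```

In this run the Uss peaks during startup and then drops back, which is consistent with the device booting normally rather than growing until the LMK kills b2g.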
Comment 6•10 years ago
Hi Anshul,
I did two tests on a 256MB Flame but still CANNOT reproduce this issue, which is weird! My test configurations and environment are as follows:

Test 1:
1. Flash v188 as the base.
2. Update gaia/gecko to the commits below:
   Gecko: 2145ba8738a56c235efc211b461272edede6fb84
   Gaia: e04ab7651b1e0c67516e1cef7aa4bc6072529885
3. No SIM and no SD card.

Test 2:
1. Flash v188 as the base.
2. Re-sync the code with the manifest file source_flame-kk_2014-12-09-16.xml and rebuild the images.
3. Flash system.img, userdata.img and boot.img.
4. No SIM and no SD card.

Should I use a different base image or do any pre-configuration on the Flame?

If you can still reproduce this issue, here are some suggestions to narrow down the problem:
1. Use gdb to debug.
2. Use the top command to check whether any thread of b2g is busy (maybe something has entered an endless loop and keeps allocating memory):
   adb shell top -m 15 -d 1 -t
3. Use get_about_memory.py to find the memory leak in b2g.
Flags: needinfo?(anshulj)
I have narrowed the issue down to bug 1101158. Reverting bug 1101158 locally fixes the issue.
Flags: needinfo?(anshulj)
Comment 8•10 years ago
Hi Anshul, we'd like to investigate this issue but we could not reproduce it on a Moz build. Could we have your own build for the Flame device? Best regards,
Comment 9•10 years ago
I've done most of the development related to the LMK, so I wanted to investigate this bug, but alas comment 7 points to a bug that has been marked as confidential. I believe that is wrong, since said bug was fixed and landed on gaia/master [1]. Mozilla requires that all bugs associated with code landed publicly must also be public. Making it public will obviously also make it easier for people to help with it.
[1] https://github.com/mozilla-b2g/gaia/commit/cd0cb6aa5322e6d98633cb034513136e5e470246
Reporter
Comment 10•10 years ago
Gabriele, just so you know, I don't have access to the bug mentioned in comment #7 either. I can see the change however in the git history by searching for bug 1101158.
Comment 11•10 years ago
(In reply to Anshul from comment #10)
> Gabriele, just so you know, I don't have access to the bug mentioned in
> comment #7 either. I can see the change however in the git history by
> searching for bug 1101158.

Meh, this is really bad :-( I'll try pinging :kgrandon because he seems to have authored the change.
Comment 12•10 years ago
Hi Anshul, per the request in comment #8, could you provide us with your own build for the Flame device? We'd like to investigate this issue further. Thank you.
Flags: needinfo?(anshulj)
Comment 13•10 years ago
Anshul, try grabbing the stock v188 build from https://developer.mozilla.org/en-US/Firefox_OS/Developer_phone_guide/Flame and shallow flashing a gecko/gaia built from the sha1s that reproduce the issue here on it.
Comment 14•10 years ago
(In reply to Gabriele Svelto [:gsvelto] from comment #11)
> (In reply to Anshul from comment #10)
> > Gabriele, just so you know, I don't have access to the bug mentioned in
> > comment #7 either. I can see the change however in the git history by
> > searching for bug 1101158.
>
> Meh, this is really bad :-( I'll try pinging :kgrandon because he seems to
> have authored the change.

NI'ing Kevin here to look into the suspected patch.
Flags: needinfo?(kgrandon)
Comment 15•10 years ago
It seems unlikely to me that bug 1101158 could cause these kinds of symptoms, but I'll look into it. It also seems like no one at moz is able to reproduce this yet. Anshul - Are you able to get a logcat here during the reboot?
Flags: needinfo?(kgrandon)
Reporter
Comment 16•10 years ago
Please find attached the android log as requested.
Flags: needinfo?(anshulj)
Comment 17•10 years ago
Anshul - thanks for the logs. I couldn't immediately see anything spit out by gaia that would be causing this, but there might be something in there more telling of the platform. Any chance that we've tried the suggested steps in comment 13? Does the issue reproduce after flashing the stock build and then shallow flashing?
Flags: needinfo?(anshulj)
Reporter
Comment 18•10 years ago
(In reply to Michael Vines [:m1] [:evilmachines] from comment #13)
> Anshul, try grabbing the stock v188 build from
> https://developer.mozilla.org/en-US/Firefox_OS/Developer_phone_guide/Flame
> and shallow flashing a gecko/gaia built from the sha1s that reproduce the
> issue here on it.

With the latest v188 image on the Flame and gecko/gaia from moz central shallow flashed on top, I am able to reproduce the issue. Once I revert bug 1101158, the Flame device boots up fine. So this again confirms that bug 1101158 is the offending bug.
Flags: needinfo?(anshulj)
Comment 19•10 years ago
(In reply to Kevin Grandon :kgrandon from comment #17)
> Anshul - thanks for the logs. I couldn't immediately see anything spit out
> by gaia that would be causing this, but there might be something in there
> more telling of the platform.

This sounds like we're hitting some kind of corner case which is causing memory consumption to hit a peak. Looking at the code in your change, however, I can't really tell what might be causing it.
Comment 20•10 years ago
Anshul - are there any actions performed before the device gets stuck in a reboot loop? Is this with manual execution or a marionette test? If there are some actions before the reboot loop, please let us know what they are. It also sounds like a memory report would be useful if it's possible to get one before the device reboots.
Flags: needinfo?(anshulj)
Reporter
Comment 21•10 years ago
Kevin, no specific action is being taken besides simply trying to boot up the phone.
Flags: needinfo?(anshulj)
Reporter
Comment 22•10 years ago
Please find attached procrank logs for a run on an internal device (not the Flame, as b2g-procrank doesn't seem to be working on the Flame for me).
Reporter
Comment 23•10 years ago
Output of adb shell cat /proc/meminfo on the Flame device:

MemTotal: 935400 kB
MemFree: 271200 kB
Buffers: 6152 kB
Cached: 55516 kB
SwapCached: 1052 kB
Active: 571284 kB
Inactive: 39480 kB
Active(anon): 548192 kB
Inactive(anon): 2112 kB
Active(file): 23092 kB
Inactive(file): 37368 kB
Unevictable: 1120 kB
Mlocked: 0 kB
HighTotal: 270336 kB
HighFree: 636 kB
LowTotal: 665064 kB
LowFree: 270564 kB
SwapTotal: 196604 kB
SwapFree: 183960 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 550212 kB
Mapped: 32640 kB
Shmem: 88 kB
Slab: 21528 kB
SReclaimable: 7672 kB
SUnreclaim: 13856 kB
KernelStack: 3368 kB
PageTables: 2924 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 664304 kB
Committed_AS: 820628 kB
VmallocTotal: 245760 kB
VmallocUsed: 21660 kB
VmallocChunk: 79364 kB

Every time I run it, Active memory keeps going up and MemFree keeps going down until b2g gets killed.
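The trend described here (Active rising, MemFree falling) can be tracked by diffing successive /proc/meminfo snapshots. The Python sketch below shows one way to do it. The first snapshot uses values from this comment; the second snapshot is invented purely for illustration, and the function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field name: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def delta(before, after, fields=("MemFree", "Active", "AnonPages")):
    """Return after - before for each requested field present in both."""
    return {f: after[f] - before[f] for f in fields if f in before and f in after}


# Values taken from the output in this comment.
first = parse_meminfo("""\
MemFree: 271200 kB
Active: 571284 kB
AnonPages: 550212 kB
""")

# Made-up later snapshot, for illustration only.
second = parse_meminfo("""\
MemFree: 250000 kB
Active: 590000 kB
AnonPages: 570000 kB
""")

print(delta(first, second))
# {'MemFree': -21200, 'Active': 18716, 'AnonPages': 19788}
```

The input text could be captured with the same `adb shell cat /proc/meminfo` command used above, run at fixed intervals during boot.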
Updated•10 years ago
Whiteboard: [MemShrink]
Updated•9 years ago
Whiteboard: [MemShrink] → [MemShrink:P3]
Updated•9 years ago
Assignee: nobody → ntroast
Comment 24•9 years ago
We've root-caused this to a sinister landmine in our build environment. I feel queasy. Thanks for the debug help all!
Status: NEW → RESOLVED
blocking-b2g: 2.2? → ---
Closed: 9 years ago
Resolution: --- → INVALID
Comment 25•9 years ago
Hi Michael, I am still curious about what kind of landmine in your environment could result in a memory leak in the b2g process with this Gaia commit. Maybe there are some lessons we can learn to identify this kind of problem more efficiently next time. Could you kindly let us know what you found in more detail? Thank you.
Comment 26•9 years ago
Heh, so we cache the b2g_sdk locally because it's very annoying that the Gaia build wants to download it from ftp.mozilla.org every time. We had a subtle bug in our cache such that we were not using the latest b2g_sdk. We've had this bug for years now but just never triggered it until recently. For some reason, that particular gaia patch was causing the older b2g_sdk to generate a bogus system app zip file, which triggered the LMK at boot. The best way to avoid this particular build mismatch between Moz/CAF in the future would be for Mozilla to store the b2g_sdk in a git project so that it can be properly versioned like the rest of the build, and then we can stop using our local cache to avoid the ftp download. I know other partners have requested this in the past as well.
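A stale cached artifact of this kind can be caught by verifying the cached copy before reuse. The Python sketch below is only an illustration under stated assumptions: the thread does not describe how the CAF cache worked, and it is assumed here that an expected SHA-256 digest for the wanted b2g_sdk archive is available from a trusted source. All names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_copy_is_current(cached_path, expected_sha256):
    """True only if the cached file exists and matches the expected digest."""
    p = Path(cached_path)
    return p.is_file() and sha256_of(p) == expected_sha256.lower()
```

A build wrapper could call `cached_copy_is_current()` and fall back to a fresh download whenever it returns False, so an outdated cache entry is never used silently.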
Blocks: CAF-v3.0-FL-metabug
No longer blocks: CAF-v3.0-FL-metabug