Closed Bug 898990 Opened 11 years ago Closed 11 years ago

[Leo] b2g takes high cpu resource (>90%) and system too slow, due to long CC pauses

Categories

(Firefox OS Graveyard :: General, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:leo+, b2g18 fixed)

RESOLVED FIXED
1.1 QE6
blocking-b2g leo+
Tracking Status
b2g18 --- fixed

People

(Reporter: leo.bugzilla.gecko, Unassigned)

References

Details

(Keywords: meta, perf, Whiteboard: [MemShrink:P1] [c=memory u=1.1])

Attachments

(7 files, 1 obsolete file)

STR> 1) run app from first icon of homescreen. 2) load wait 10 seconds. 3) go homescreen by home key 4) do test 1)~3) by end icon of homescreen (leo has a 23 app on homescreen including operator apps) 5) go to step 1) and never kill app by user. if it has no memory, LMK will kill any apps to run new one. the purpose of this test is checking stability of target. after doing test for 3~4 days. target is very slow. delayed launch application and delayed response as my action.
basically, this version included the patch of Bug 893012. i am testing to investigation the patch of Bug 893012.
Attached file log of top 10 process
$> adb shell top -m 10 > log_top.txt periodically, b2g takes high cpu resource almost 92%~95%.
Attached file b2g-ps.txt
when i was long-pressed home key, there are no running app. but, b2g-ps displays APPLICATION USER PID PPID VSIZE RSS WCHAN PC NAME Browser app_8025 8025 140 71960 21572 ffffffff 40091430 S /system/b2g/plugin-container
after do this. $> kill -9 8025 no chage anything. remains problem.
What is "periodically"? Also, do we have profiler working for b2g? Would be good to see where the cpu time is spent.
The root process seems to have nearly 50000 event listeners. Clearly not right. Something is adding event listener all the time but not remove after use? Or one should probably add just one event listener, but because of some bug in gaia ends up adding listener all the time.
Looks like the listeners are added to 0x4c25c1c0 [rc=7] FragmentOrElement (xhtml) iframe class='' app://system.gaiamobile.org/index.html So, some iframe element in app://system.gaiamobile.org/index.html document.
There is also some JS array with over 20000 items in it, and each item is an Object ... Looks like property called momentums keeps that array alive.
Component: General → Gaia
Looks like "momentums" is something from BrowserElementPanning: http://mxr.mozilla.org/mozilla-central/ident?i=momentums&filter=
Er, wait, wrong number for event listeners. Not 50000, but "only" about 15000
Comment on attachment 782463 [details] after testing for 4 days, take a memory-report / dmd / gc / cc I'm marking this attachment obsolete, per comment 3.
Attachment #782463 - Attachment is obsolete: true
Well, the event listeners aren't a mystery; BrowserElementPanning calls addEventListener but never calls removeEventListener. :-/
(In reply to leo.bugzilla.gecko from comment #0) > STR> > 1) run app from first icon of homescreen. > 2) load wait 10 seconds. > 3) go homescreen by home key > 4) do test 1)~3) by end icon of homescreen > (leo has a 23 app on homescreen including operator apps) > 5) go to step 1) > > and never kill app by user. if it has no memory, LMK will kill any apps to > run new one. > the purpose of this test is checking stability of target. > > after doing test for 3~4 days. target is very slow. > delayed launch application and delayed response as my action. Can you tell which Gecko version you are using? Is it mozillab2g-18? In this case smaug mentionned that it could be useful to port the small patch of bug 899089 to this branch in order to have more useful debug informations.
> Can you tell which Gecko version you are using? Is it mozillab2g-18? Almost certainly; it's Leo. > In this case smaug mentionned that it could be useful to port the small patch of bug > 899089 to this branch in order to have more useful debug informations. Keep in mind that it takes them many days to run this test. > Well, the event listeners aren't a mystery; BrowserElementPanning calls addEventListener > but never calls removeEventListener. :-/ I guess this isn't necessarily the problem, although it's certainly /a/ problem.
Oh, I'm wrong; BrowserElementPanning does (try to) remove its event listener.
over 7000 JS Object (Function - apploaded)
And about 2400 JS Object (Function - ignoreClearCacheOn) but I don't see that in mxr (gaia or mozilla-central)
I wrote a little script to analyze the GC log for the names of things rooted by things with names starting with "nsXPCWrappedJS[nsIDOMEventListener" and the large ones are these: 7133 Function apploaded 2391 Function visibilityChangeHandler 2391 Function ignoreClearCacheOn 2391 Function ignoreClearCacheOff Which is basically what you came up with, with the addition of ignoreClearCacheOff. :)
ignoreClearCacheOn and ignoreClearCacheOff are in the bindingArray for the script at resource://gre/modules/BrowserElementParent.jsm:1, along with visibilityChangeHandler, which seems to mean that they are local variables or arguments for the script. Unfortunately the line number appears bogus, but that does suggest that it is something in BEP related to visibilityChangeHandler, which makes sense given that they appear as event listeners the exact same number of times. There's also a function BrowserElementParent.prototype._ownerIgnoreClearCacheOff.
Based on cursory examination, I think all of the visibilityChangeHandler and ignoreClear* are being held alive by this nsGlobalWindow #3 from this path: [nsXPCWrappedJS (nsIDOMEventListener)] <--[mListeners[i] mListener]-- [nsEventListenerManager] <--[mListenerManager]-- [nsGlobalWindow #3] and I'm guessing that all of the apploaded event listeners are being kept alive by this path: [nsXPCWrappedJS (nsIDOMEventListener)] <--[mListeners[i] mListener]-- [nsEventListenerManager] <--[[via hash] mListenerManager]-- [FragmentOrElement (xhtml) iframe class='' app://system.gaiamobile.org/index.html] eg via the iframe that Olli mentioned. But I think that's expected, so that's not very useful.
> There's also a function BrowserElementParent.prototype._ownerIgnoreClearCacheOff. Do we think this is a vendor customization?
Blocks: 899328
Leo, have you modified BrowserElementParent to add _ownerIgnoreClearCacheOff? We can't find "_ownerIgnoreClearCacheOff" in any of our code.
Flags: needinfo?(leo.bugzilla.gecko)
blocking-b2g: --- → leo?
Component: Gaia → General
Summary: [Leo] b2g takes high cpu resource (>90%) and system too slow. → [Leo] b2g takes high cpu resource (>90%) and system too slow, due to long CC pauses
Leo, can you also please comment on whether a fix is desired for B2G 1.1? Moz triage: This bug will cause our battery life to gradually worsen and will cause the phone to gradually become jankier as the user uses the phone for a while (a few weeks?) without rebooting.
I'm going to call this [MemShrink] even though the primary effect here is on snappiness and battery life because we believe the root cause is a leak (actually, a few separate leaks).
Whiteboard: [MemShrink]
> And ~2400 JS Object (Function - visibilityChangeHandler) > http://mxr.mozilla.org/mozilla-central/source/dom/browser-element/BrowserElementParent.jsm#186 ? If the relevant BEPs have been collected, these should go away when you turn the screen on/off. I'm not sure how to do better here, but ideas are welcome.
Depends on: 899354
No longer blocks: 899328
Depends on: 899328
sorry for confusing of ignoreClearCache* ignoreClearCacheOn/ignoreClearCacheOff is custimization by leo. it is applied to fix screen is bliking as changing screen on/off. and also, it was changed following as Bug 894147 which it use weakBEP instead of directly bind() on this. i am attaching Diff_BrowserElementParent.jsm.patch between LEO and gecko-18
Flags: needinfo?(leo.bugzilla.gecko)
Does Leo have plans to make this entire repository available to Mozilla and the world, as required by MPL?
At the very least, can we please see the changes to BrowserElementChild?
> Leo, can you also please comment on whether a fix is desired for B2G 1.1? > > Moz triage: This bug will cause our battery life to gradually worsen and > will cause the phone to gradually become jankier as the user uses the phone > for a while (a few weeks?) without rebooting. leo will be tested to check stability for 5days from next monday. although leo has no crash by caused leaked memory. if it has other problems like this, the stability test will be also failed. IMO, it should be fixed for B2G 1.1
(In reply to Justin Lebar [:jlebar] from comment #32) > At the very least, can we please see the changes to BrowserElementChild? there are no diff between BrowserElementChild.js and gecko-18's one
(In reply to Justin Lebar [:jlebar] from comment #26) > Leo, have you modified BrowserElementParent to add > _ownerIgnoreClearCacheOff? We can't find "_ownerIgnoreClearCacheOff" in any > of our code. Hi Justin, I added the IgnoreClearCacheOn/Off listeners in BEP for bug 871919/871413/862913/825174 at few month ago. > And ~2400 JS Object (Function - visibilityChangeHandler) > And about 2400 JS Object (Function - ignoreClearCacheOn) And it has exactly same path with visibilityChangeHandler. So, if adding visibilityChangeHandler has the problem (piled up listeners in memory), it also have the same problem obviously..
> if it has other problems like this, the stability test will be also failed. I think the visibilitychanged handler is only a problem if you launch thousands of apps without ever turning the screen on or off, so I think this issue would likely not be a problem in the field. The other listeners that Leo added may or may not be safe in the same way; I'd have to look more closely to say for sure (and I'd need the full tree, not just the changes to this one file). But I'm not clear on Mozilla's policy on reviewing code in repositories like this, so I'm going to hold off until I hear more from management.
leo+ per comment 33.
blocking-b2g: leo? → leo+
Depends on: 899801
Leo should change the patch here to match the idiom introduced in bug 899354. The fix in bug 899354 won't have a big impact on users -- it fixes a "leak" which only occurs if the user launches hundreds of apps without turning off the screen, which is unlikely to happen in the real world. But the fix may be more important for the listeners added in the vendor customization here. I can't tell for sure how important this fix is, because I don't know when these events are fired; I would need to see the complete source code. Note that matching the idiom from bug 899354 will not fix the fact that the patch here adds a new web-facing API on <iframe mozbrowser>. I really hope that we will have an opportunity to revisit this before the Leo device ships.
Keywords: perf
Whiteboard: [MemShrink] → [MemShrink] [c=memory u=1.1]
(In reply to Justin Lebar [:jlebar] from comment #38) > Leo should change the patch here to match the idiom introduced in bug 899354. we make a decision to remove leo cusmization on BEP. and it already applied on our version(v08o) Never mind those thing. thanks :jlebar for your concern.
This may have been fixed. Can you test again? Thanks.
Flags: needinfo?(jeremy.kim.leo)
Whiteboard: [MemShrink] [c=memory u=1.1] → [MemShrink:P1] [c=memory u=1.1]
sorry for delay because of my vacation on last week. i was tested the latest leo with those patch (bug 899328, bug 899354, bug 899801) (for refrerence, those patches not yet applied on leo's main stream.) there are some strange problems. leo was freezed after 12 hours. the freeze means never response for user input(touch, home-key, power-key) STR> infinitly, run all apps one by one. because of limitation of memory, if it closes limitation, LMK will be activated and some process will be killed to run new app. when i was checked the information of processes. $adb shell> ps app_961 961 141 0 0 c00bc32c 00000000 Z Facebook app_839 839 141 0 0 c00bc32c 00000000 Z Communications app_4403 4403 141 0 0 c00bc32c 00000000 Z Wikipedia root 4465 141 0 0 c00bc32c 00000000 Z Gecko_IOThread i guess some child process of b2g didn't rightly killed by kernel or somthing. i can't make sure this problems caused by those patched so, i will try one more time to compare those patches target 1 - applied patches which is bug 899328, bug 899354, bug 899801 target 2 - not applied anything.
this is "target 2 - not applied anything." and report after 15 hours test. it shows on gc log of b2g process. 3838 Function apploaded 971 Function visibilityChangeHandler
this is "target 1 - applied patches which is bug 899328, bug 899354, bug 899801" after 22 hours test. it looks very good compare with before(Attachment 793215 [details]). in result, it was verfied me.
Flags: needinfo?(jeremy.kim.leo)
Let's call this fixed, then! Thanks, Jeremy.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
FYI, here are the top objects in the CC logs, before and after the fix. == CC log == Items marked with * either disappeared after the fix, or otherwise decreased significantly. Before: > $ grep '^[^>]' cc-edges.log | cut -f '2-10' -d ' ' | sort | uniq -c | sort -rg | head -n20 > 5677 [gc.marked] JS Script > * 5186 [rc=2] nsXPCWrappedJS (nsIDOMEventListener) > 5114 [gc.marked] JS Object (Object) > 5041 [gc.marked] JS Object (Function) > * 4073 [gc] JS Object (Call) > * 3836 [gc] JS Object (Function - apploaded) > 3327 [gc] JS Object (Object) > * 1282 [rc=1] XPCWrappedNative > 1241 [gc.marked] JS Object (Block) > * 1053 [gc] JS Object (XPCWrappedNative_NoHelper) > * 967 [gc] JS Object (Function - visibilityChangeHandler) > * 967 [gc] JS Object (Function - get) > 882 [rc=1] nsGenericDOMDataNode > 767 [gc.marked] JS Object (Array) > 617 [gc] JS Object (Array) > 573 [gc.marked] JS Object (Call) > 556 [known=1] > 512 [gc] JS Object (Function) > 283 [gc] JS Script > 251 [rc=1] XPCWrappedNative (nsJSIID) After: > $ grep '^[^>]' cc-edges.log | cut -f '2-10' -d ' ' | sort | uniq -c | sort -rg | head -n20 > 5490 [gc.marked] JS Script > 5155 [gc.marked] JS Object (Object) > 4997 [gc.marked] JS Object (Function) > 4432 [gc] JS Object (Object) > 1205 [gc.marked] JS Object (Block) > 980 [rc=1] nsGenericDOMDataNode > 798 [gc.marked] JS Object (Array) > 742 [gc] JS Object (Array) > 691 [known=1] > 637 [gc] JS Object (Function) > 537 [gc.marked] JS Object (Call) > 487 [gc] JS Script > * 418 [rc=2] nsXPCWrappedJS (nsIDOMEventListener) > 345 [known=2] > * 324 [rc=1] XPCWrappedNative > * 304 [gc] JS Object (Call) > 297 [rc=2] nsGenericDOMDataNode > 291 [gc] JS Object (Proxy) > 258 [rc=1] XPCWrappedNative (nsJSIID) > 258 [gc.marked] JS Object (nsJSIID)
Also, anything with [gc.marked] in it shouldn't contribute to CC times.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: