Last Comment Bug 898990 - [Leo] b2g takes high cpu resource (>90%) and system too slow, due to long CC pauses
: [Leo] b2g takes high cpu resource (>90%) and system too slow, due to long CC ...
Status: RESOLVED FIXED
[MemShrink:P1] [c=memory u=1.1]
: meta, perf
Product: Firefox OS
Classification: Client Software
Component: General (show other bugs)
: unspecified
: ARM Gonk (Firefox OS)
: P1 critical (vote)
: 1.1 QE6
Assigned To: Nobody; OK to take it and work on it
:
Mentors:
Depends on: 899328 899354 899801
Blocks:
  Show dependency treegraph
 
Reported: 2013-07-29 00:49 PDT by leo.bugzilla.gecko
Modified: 2013-08-21 13:21 PDT (History)
15 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
leo+
fixed


Attachments
after testing for 4 days, take a memory-report / dmd / gc / cc (8.60 MB, application/x-compressed-tar)
2013-07-29 00:52 PDT, jeremy.kim.leo [:jeremykimleo]
no flags Details
log of top 10 process (23.86 KB, text/plain)
2013-07-29 00:55 PDT, jeremy.kim.leo [:jeremykimleo]
no flags Details
instead Comment #1, do use this memory-report/dmd/gc/cc log. (8.59 MB, application/x-compressed-tar)
2013-07-29 02:23 PDT, jeremy.kim.leo [:jeremykimleo]
no flags Details
b2g-ps.txt (474 bytes, text/plain)
2013-07-29 02:26 PDT, jeremy.kim.leo [:jeremykimleo]
no flags Details
Diff_BrowserElementParent.jsm.patch (4.93 KB, patch)
2013-07-29 17:17 PDT, jeremy.kim.leo [:jeremykimleo]
no flags Details | Diff | Splinter Review
memory-report-898990.tgz (5.03 MB, application/x-compressed-tar)
2013-08-19 17:50 PDT, jeremy.kim.leo [:jeremykimleo]
no flags Details
memory-report-before-898990-15hours.tgz (4.82 MB, application/x-compressed-tar)
2013-08-20 17:25 PDT, jeremy.kim.leo [:jeremykimleo]
no flags Details
memory-report-after-898990-22hours.tgz (5.86 MB, application/x-compressed-tar)
2013-08-20 17:49 PDT, jeremy.kim.leo [:jeremykimleo]
no flags Details

Description leo.bugzilla.gecko 2013-07-29 00:49:33 PDT
STR>
1) run app from first icon of homescreen.
2) load wait 10 seconds.
3) go homescreen by home key
4) do test 1)~3) by end icon of homescreen 
(leo has a 23 app on homescreen including operator apps)
5) go to step 1) 

and never kill app by user. if it has no memory, LMK will kill any apps to run new one.
the purpose of this test is checking stability of target.

after doing test for 3~4 days. target is very slow.
delayed launch application and delayed response as my action.
Comment 1 jeremy.kim.leo [:jeremykimleo] 2013-07-29 00:52:58 PDT
Created attachment 782463 [details]
after testing for 4 days, take a memory-report / dmd / gc / cc

basically, this version included the patch of Bug 893012.
i am testing to investigation the patch of Bug 893012.
Comment 2 jeremy.kim.leo [:jeremykimleo] 2013-07-29 00:55:13 PDT
Created attachment 782464 [details]
log of top 10 process

$> adb shell top -m 10 > log_top.txt

periodically, b2g takes high cpu resource almost 92%~95%.
Comment 3 jeremy.kim.leo [:jeremykimleo] 2013-07-29 02:23:06 PDT
Created attachment 782498 [details]
instead Comment #1, do use this memory-report/dmd/gc/cc log.

instead Attachment 782463 [details], use this.
Comment 4 jeremy.kim.leo [:jeremykimleo] 2013-07-29 02:26:54 PDT
Created attachment 782499 [details]
b2g-ps.txt

when i was long-pressed home key, there are no running app. 
but, b2g-ps displays 

APPLICATION      USER     PID   PPID  VSIZE  RSS     WCHAN    PC         NAME
Browser          app_8025  8025  140   71960  21572 ffffffff 40091430 S /system/b2g/plugin-container
Comment 5 jeremy.kim.leo [:jeremykimleo] 2013-07-29 02:38:18 PDT
after do this.
$> kill -9 8025

no chage anything. remains problem.
Comment 6 Olli Pettay [:smaug] 2013-07-29 04:46:09 PDT
What is "periodically"? Also, do we have profiler working for b2g?  Would be good to see where
the cpu time is spent.
Comment 7 Olli Pettay [:smaug] 2013-07-29 06:42:42 PDT
Oh, this is about https://bugzilla.mozilla.org/show_bug.cgi?id=893012#c65
Comment 8 Olli Pettay [:smaug] 2013-07-29 06:59:26 PDT
The root process seems to have nearly 50000 event listeners. Clearly not right.
Something is adding event listener all the time but not remove after use?
Or one should probably add just one event listener, but because of some bug in gaia ends up
adding listener all the time.
Comment 9 Olli Pettay [:smaug] 2013-07-29 07:00:59 PDT
Looks like the listeners are added to
0x4c25c1c0 [rc=7] FragmentOrElement (xhtml) iframe class='' app://system.gaiamobile.org/index.html

So, some iframe element in app://system.gaiamobile.org/index.html document.
Comment 10 Olli Pettay [:smaug] 2013-07-29 07:06:32 PDT
There is also some JS array with over 20000 items in it, and each item is an Object ...
Looks like property called momentums keeps that array alive.
Comment 11 Andrew McCreight [:mccr8] 2013-07-29 07:09:18 PDT
Looks like "momentums" is something from BrowserElementPanning:
  http://mxr.mozilla.org/mozilla-central/ident?i=momentums&filter=
Comment 12 Olli Pettay [:smaug] 2013-07-29 07:12:40 PDT
Hmm, the momemtum thing might not be a gaia bug, but seems to come from http://mxr.mozilla.org/mozilla-central/source/dom/browser-element/BrowserElementPanning.js#701
Comment 13 Olli Pettay [:smaug] 2013-07-29 07:38:22 PDT
Er, wait, wrong number for event listeners. Not 50000, but "only" about 15000
Comment 14 Andrew McCreight [:mccr8] 2013-07-29 08:46:12 PDT
Comment on attachment 782463 [details]
after testing for 4 days, take a memory-report / dmd / gc / cc

I'm marking this attachment obsolete, per comment 3.
Comment 15 Justin Lebar (not reading bugmail) 2013-07-29 08:56:16 PDT
Well, the event listeners aren't a mystery; BrowserElementPanning calls addEventListener but never calls removeEventListener.  :-/
Comment 16 Vivien Nicolas (:vingtetun) (:21) - (NOT reading bugmails, needinfo? please) 2013-07-29 09:18:02 PDT
(In reply to leo.bugzilla.gecko from comment #0)
> STR>
> 1) run app from first icon of homescreen.
> 2) load wait 10 seconds.
> 3) go homescreen by home key
> 4) do test 1)~3) by end icon of homescreen 
> (leo has a 23 app on homescreen including operator apps)
> 5) go to step 1) 
> 
> and never kill app by user. if it has no memory, LMK will kill any apps to
> run new one.
> the purpose of this test is checking stability of target.
> 
> after doing test for 3~4 days. target is very slow.
> delayed launch application and delayed response as my action.

Can you tell which Gecko version you are using? Is it mozillab2g-18?

In this case smaug mentionned that it could be useful to port the small patch of bug 899089 to this branch in order to have more useful debug informations.
Comment 17 Justin Lebar (not reading bugmail) 2013-07-29 09:20:14 PDT
> Can you tell which Gecko version you are using? Is it mozillab2g-18?

Almost certainly; it's Leo.

> In this case smaug mentionned that it could be useful to port the small patch of bug 
> 899089 to this branch in order to have more useful debug informations.

Keep in mind that it takes them many days to run this test.

> Well, the event listeners aren't a mystery; BrowserElementPanning calls addEventListener 
> but never calls removeEventListener.  :-/

I guess this isn't necessarily the problem, although it's certainly /a/ problem.
Comment 18 Justin Lebar (not reading bugmail) 2013-07-29 09:24:50 PDT
Oh, I'm wrong; BrowserElementPanning does (try to) remove its event listener.
Comment 19 Olli Pettay [:smaug] 2013-07-29 10:08:52 PDT
over 7000 JS Object (Function - apploaded)
Comment 21 Olli Pettay [:smaug] 2013-07-29 10:15:44 PDT
And about 2400 JS Object (Function - ignoreClearCacheOn) but I don't see that in mxr (gaia or mozilla-central)
Comment 22 Andrew McCreight [:mccr8] 2013-07-29 10:43:30 PDT
I wrote a little script to analyze the GC log for the names of things rooted by things with names starting with "nsXPCWrappedJS[nsIDOMEventListener" and the large ones are these:
 7133 Function apploaded
 2391 Function visibilityChangeHandler
 2391 Function ignoreClearCacheOn
 2391 Function ignoreClearCacheOff
Which is basically what you came up with, with the addition of ignoreClearCacheOff. :)
Comment 23 Andrew McCreight [:mccr8] 2013-07-29 11:09:23 PDT
ignoreClearCacheOn and ignoreClearCacheOff are in the bindingArray for the script at resource://gre/modules/BrowserElementParent.jsm:1, along with visibilityChangeHandler, which seems to mean that they are local variables or arguments for the script. Unfortunately the line number appears bogus, but that does suggest that it is something in BEP related to visibilityChangeHandler, which makes sense given that they appear as event listeners the exact same number of times.  There's also a function BrowserElementParent.prototype._ownerIgnoreClearCacheOff.
Comment 24 Andrew McCreight [:mccr8] 2013-07-29 11:49:40 PDT
Based on cursory examination, I think all of the visibilityChangeHandler and ignoreClear* are being held alive by this nsGlobalWindow #3 from this path:
    [nsXPCWrappedJS (nsIDOMEventListener)] <--[mListeners[i] mListener]-- [nsEventListenerManager] <--[mListenerManager]-- [nsGlobalWindow #3]

and I'm guessing that all of the apploaded event listeners are being kept alive by this path:
  [nsXPCWrappedJS (nsIDOMEventListener)] <--[mListeners[i] mListener]-- [nsEventListenerManager] <--[[via hash] mListenerManager]-- [FragmentOrElement (xhtml) iframe class='' app://system.gaiamobile.org/index.html]

eg via the iframe that Olli mentioned.

But I think that's expected, so that's not very useful.
Comment 25 Justin Lebar (not reading bugmail) 2013-07-29 12:03:36 PDT
> There's also a function BrowserElementParent.prototype._ownerIgnoreClearCacheOff.

Do we think this is a vendor customization?
Comment 26 Justin Lebar (not reading bugmail) 2013-07-29 15:14:23 PDT
Leo, have you modified BrowserElementParent to add _ownerIgnoreClearCacheOff?  We can't find "_ownerIgnoreClearCacheOff" in any of our code.
Comment 27 Justin Lebar (not reading bugmail) 2013-07-29 15:28:26 PDT
Leo, can you also please comment on whether a fix is desired for B2G 1.1?

Moz triage: This bug will cause our battery life to gradually worsen and will cause the phone to gradually become jankier as the user uses the phone for a while (a few weeks?) without rebooting.
Comment 28 Justin Lebar (not reading bugmail) 2013-07-29 15:31:11 PDT
I'm going to call this [MemShrink] even though the primary effect here is on snappiness and battery life because we believe the root cause is a leak (actually, a few separate leaks).
Comment 29 Justin Lebar (not reading bugmail) 2013-07-29 15:45:16 PDT
> And ~2400 JS Object (Function - visibilityChangeHandler)
> http://mxr.mozilla.org/mozilla-central/source/dom/browser-element/BrowserElementParent.jsm#186 ?

If the relevant BEPs have been collected, these should go away when you turn the screen on/off.

I'm not sure how to do better here, but ideas are welcome.
Comment 30 jeremy.kim.leo [:jeremykimleo] 2013-07-29 17:17:54 PDT
Created attachment 782872 [details] [diff] [review]
Diff_BrowserElementParent.jsm.patch

sorry for confusing of ignoreClearCache*
ignoreClearCacheOn/ignoreClearCacheOff is custimization by leo.
it is applied to fix screen is bliking as changing screen on/off.

and also, it was changed following as Bug 894147 which it use weakBEP instead of directly bind() on this.

i am attaching Diff_BrowserElementParent.jsm.patch between LEO and gecko-18
Comment 31 Justin Lebar (not reading bugmail) 2013-07-29 17:22:08 PDT
Does Leo have plans to make this entire repository available to Mozilla and the world, as required by MPL?
Comment 32 Justin Lebar (not reading bugmail) 2013-07-29 17:23:35 PDT
At the very least, can we please see the changes to BrowserElementChild?
Comment 33 jeremy.kim.leo [:jeremykimleo] 2013-07-29 17:25:34 PDT
> Leo, can you also please comment on whether a fix is desired for B2G 1.1?
> 
> Moz triage: This bug will cause our battery life to gradually worsen and
> will cause the phone to gradually become jankier as the user uses the phone
> for a while (a few weeks?) without rebooting.

leo will be tested to check stability for 5days from next monday.
although leo has no crash by caused leaked memory. if it has other problems like this, the stability test will be also failed.

IMO, it should be fixed for B2G 1.1
Comment 34 jeremy.kim.leo [:jeremykimleo] 2013-07-29 17:27:22 PDT
(In reply to Justin Lebar [:jlebar] from comment #32)
> At the very least, can we please see the changes to BrowserElementChild?

there are no diff between BrowserElementChild.js and gecko-18's one
Comment 35 Jinho Hwang [:jeffhwang] 2013-07-29 17:35:38 PDT
(In reply to Justin Lebar [:jlebar] from comment #26)
> Leo, have you modified BrowserElementParent to add
> _ownerIgnoreClearCacheOff?  We can't find "_ownerIgnoreClearCacheOff" in any
> of our code.

Hi Justin, 
I added the IgnoreClearCacheOn/Off listeners in BEP for bug 871919/871413/862913/825174 at few month ago. 

> And ~2400 JS Object (Function - visibilityChangeHandler)
> And about 2400 JS Object (Function - ignoreClearCacheOn)

And it has exactly same path with visibilityChangeHandler. So, if adding visibilityChangeHandler has the problem (piled up listeners in memory), it also have the same problem obviously..
Comment 36 Justin Lebar (not reading bugmail) 2013-07-29 18:15:30 PDT
> if it has other problems like this, the stability test will be also failed.

I think the visibilitychanged handler is only a problem if you launch thousands of apps without ever turning the screen on or off, so I think this issue would likely not be a problem in the field.

The other listeners that Leo added may or may not be safe in the same way; I'd have to look more closely to say for sure (and I'd need the full tree, not just the changes to this one file).  But I'm not clear on Mozilla's policy on reviewing code in repositories like this, so I'm going to hold off until I hear more from management.
Comment 37 Justin Lebar (not reading bugmail) 2013-07-30 10:27:50 PDT
leo+ per comment 33.
Comment 38 Justin Lebar (not reading bugmail) 2013-07-31 13:23:36 PDT
Leo should change the patch here to match the idiom introduced in bug 899354.

The fix in bug 899354 won't have a big impact on users -- it fixes a "leak" which only occurs if the user launches hundreds of apps without turning off the screen, which is unlikely to happen in the real world.  But the fix may be more important for the listeners added in the vendor customization here.  I can't tell for sure how important this fix is, because I don't know when these events are fired; I would need to see the complete source code.

Note that matching the idiom from bug 899354 will not fix the fact that the patch here adds a new web-facing API on <iframe mozbrowser>.  I really hope that we will have an opportunity to revisit this before the Leo device ships.
Comment 39 jeremy.kim.leo [:jeremykimleo] 2013-08-04 21:45:28 PDT
(In reply to Justin Lebar [:jlebar] from comment #38)
> Leo should change the patch here to match the idiom introduced in bug 899354.

we make a decision to remove leo cusmization on BEP. and it already applied on our version(v08o)
Never mind those thing. 
thanks :jlebar for your concern.
Comment 40 Nicholas Nethercote [:njn] 2013-08-06 16:49:09 PDT
This may have been fixed.  Can you test again?  Thanks.
Comment 41 jeremy.kim.leo [:jeremykimleo] 2013-08-19 17:50:00 PDT
Created attachment 792536 [details]
memory-report-898990.tgz

sorry for delay because of my vacation on last week.

i was tested the latest leo with those patch (bug 899328, bug 899354, bug 899801)
(for refrerence, those patches not yet applied on leo's main stream.)

there are some strange problems. leo was freezed after 12 hours. 
the freeze means never response for user input(touch, home-key, power-key)

STR> 
infinitly, run all apps one by one.
because of limitation of memory, if it closes limitation, 
LMK will be activated and some process will be killed to run new app.

when i was checked the information of processes.
$adb shell> ps
app_961   961   141   0      0     c00bc32c 00000000 Z Facebook
app_839   839   141   0      0     c00bc32c 00000000 Z Communications
app_4403  4403  141   0      0     c00bc32c 00000000 Z Wikipedia
root      4465  141   0      0     c00bc32c 00000000 Z Gecko_IOThread
i guess some child process of b2g didn't rightly killed by kernel or somthing.

i can't make sure this problems caused by those patched 
so, i will try one more time to compare those patches 
target 1 - applied patches which is bug 899328, bug 899354, bug 899801                                            
target 2 - not applied anything.
Comment 42 jeremy.kim.leo [:jeremykimleo] 2013-08-20 17:25:09 PDT
Created attachment 793215 [details]
memory-report-before-898990-15hours.tgz

this is "target 2 - not applied anything." and report after 15 hours test.

it shows on gc log of b2g process.
3838 Function apploaded
971  Function visibilityChangeHandler
Comment 43 jeremy.kim.leo [:jeremykimleo] 2013-08-20 17:49:42 PDT
Created attachment 793230 [details]
memory-report-after-898990-22hours.tgz

this is "target 1 - applied patches which is bug 899328, bug 899354, bug 899801" after 22 hours test.

it looks very good compare with before(Attachment 793215 [details]).
in result, it was verfied me.
Comment 44 Justin Lebar (not reading bugmail) 2013-08-20 17:52:16 PDT
Let's call this fixed, then!  Thanks, Jeremy.
Comment 45 Justin Lebar (not reading bugmail) 2013-08-21 12:32:07 PDT
FYI, here are the top objects in the CC logs, before and after the fix.

== CC log ==

Items marked with * either disappeared after the fix, or otherwise decreased
significantly.

Before:

> $ grep '^[^>]' cc-edges.log  | cut -f '2-10' -d ' ' | sort | uniq -c | sort -rg | head -n20
>    5677 [gc.marked] JS Script
> *  5186 [rc=2] nsXPCWrappedJS (nsIDOMEventListener)
>    5114 [gc.marked] JS Object (Object)
>    5041 [gc.marked] JS Object (Function)
> *  4073 [gc] JS Object (Call)
> *  3836 [gc] JS Object (Function - apploaded)
>    3327 [gc] JS Object (Object)
> *  1282 [rc=1] XPCWrappedNative
>    1241 [gc.marked] JS Object (Block)
> *  1053 [gc] JS Object (XPCWrappedNative_NoHelper)
> *   967 [gc] JS Object (Function - visibilityChangeHandler)
> *   967 [gc] JS Object (Function - get)
>     882 [rc=1] nsGenericDOMDataNode
>     767 [gc.marked] JS Object (Array)
>     617 [gc] JS Object (Array)
>     573 [gc.marked] JS Object (Call)
>     556 [known=1]
>     512 [gc] JS Object (Function)
>     283 [gc] JS Script
>     251 [rc=1] XPCWrappedNative (nsJSIID)

After:

> $ grep '^[^>]' cc-edges.log | cut -f '2-10' -d ' ' | sort | uniq -c | sort -rg | head -n20
>    5490 [gc.marked] JS Script
>    5155 [gc.marked] JS Object (Object)
>    4997 [gc.marked] JS Object (Function)
>    4432 [gc] JS Object (Object)
>    1205 [gc.marked] JS Object (Block)
>     980 [rc=1] nsGenericDOMDataNode
>     798 [gc.marked] JS Object (Array)
>     742 [gc] JS Object (Array)
>     691 [known=1]
>     637 [gc] JS Object (Function)
>     537 [gc.marked] JS Object (Call)
>     487 [gc] JS Script
> *   418 [rc=2] nsXPCWrappedJS (nsIDOMEventListener)
>     345 [known=2]
> *   324 [rc=1] XPCWrappedNative
> *   304 [gc] JS Object (Call)
>     297 [rc=2] nsGenericDOMDataNode
>     291 [gc] JS Object (Proxy)
>     258 [rc=1] XPCWrappedNative (nsJSIID)
>     258 [gc.marked] JS Object (nsJSIID)
Comment 46 Andrew McCreight [:mccr8] 2013-08-21 13:21:09 PDT
Also, anything with [gc.marked] in it shouldn't contribute to CC times.

Note You need to log in before you can comment on or make changes to this bug.