Closed Bug 924337 (less-js-at-startup) Opened 7 years ago Closed 2 years ago

[b2g][meta] Remove all much JS as possible from applications startup path.

Categories

(Firefox OS Graveyard :: Performance, defect, P2)

defect

Tracking

(tracking-b2g:backlog)

RESOLVED WONTFIX
tracking-b2g backlog

People

(Reporter: vingtetun, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: perf, Whiteboard: [c=progress p= s= u=][perf-wanted])

Attachments

(3 files)

There is a few places where some of the code in the startup path is written in JS. It seems that this code hurts a lot startup time for different reasons.

Replacing this JS by cpp where it is possible should help here.
Some of the possible places that can be rewritten in CPP are:
 - /content/base/src/contentSecurityPolicy.js
  It hurts a lot. I may give measurements in to a dependent bug but by just switching the pref("security.csp.enable", false); in $GAIA/profile/user.js and pushing it to the device with |adb push profile/user.js /data/local/user.js| before restarting proves it.
 I guess one of the main reason is that is not very expensive to call shouldLoad, it offens cost between 0,1 or 2 ms but it is called soo many times during app startup...

 - /netwerk/protocol/app/AppProtocolHandler.js
  Sounds the same root issue than contentSecurityPolicy.js. It it not very expensive to call newURI or newChannel but it is called (often between 0-1 ms) but it is called so many times...

 - /dom/browser-element/BrowserElementParent.js
  The first thing which is done before loading an application is to create a <iframe mozbrowser mozapp> and this cost between 20-40ms just to build it.


 - $GAIA/apps/system/js/window_manager.js
  - Here are the Gaia part that manages windows opening/closing. It seems like the JS running here cost around 30-40ms(ignoring the creating of the <iframe mozbrowser>. Obviously this one can not be rewritten in cpp but it worth trying to optimize (again) this code path.
For fun I just rewrite a little AppProtocolHandler in cpp. It saves around 50ms on the dialer app...
(In reply to Vivien Nicolas (:vingtetun) (:21) from comment #2)
> For fun I just rewrite a little AppProtocolHandler in cpp. It saves around
> 50ms on the dialer app...

fwiw with this and csp disable (someone needs to rewrite it in cpp, I think Christoph Kerschbaumer has volunteered to do that) it shaves more than 120ms of startup time for the dialer.
There is a CSP cache to reduce the impact of CSP on startup time.  Maybe somebody could investigate that to see if it could be improved.  For instance, bug 897745 might make the cache faster.
(In reply to Andrew McCreight [:mccr8] from comment #4)
> There is a CSP cache to reduce the impact of CSP on startup time.  Maybe
> somebody could investigate that to see if it could be improved.  For
> instance, bug 897745 might make the cache faster.

I think the issue is not about the cache but about the xpconnect bridges (2) and context switch. Also even if the function is super fast (1ms) it is called so many times (once for each js, css, images) that it hurts at the end. The AppProtocolHandler as the same type of behavior and by rewriting it in cpp it is obvious that it saves a lot.
Keywords: perf
Another note on CSP which might be relevant to Haida: while investigating the performance results of bug 889899 a couple of months back we noticed that when a large number of iframes were present in a page the impact of CSP was growing in a disproportionate way. It turned out that there was a loop somewhere in the code that for every check iterated over all the iframes yielding an n^2 complexity in our case. I don't remember exactly where the problem appeared but I can dig out the profile. If we're rewriting CSP code it might be a good idea to avoid these kind of pitfalls (unless it's been fixed already in the meantime).
This is awesome. ++Vivien
This is the wip of the app protocol handler in cpp. It works only for core apps right now and does not ask anything to the appsservice, so the path should always be /data/local/webapps if you want to try it (it means it does not work with a PRODUCTION=1) build.
(In reply to Vivien Nicolas (:vingtetun) (:21) from comment #8)
> Created attachment 814394 [details] [diff] [review]
> wip appprotocolhandler.cpp.patch
> 
> This is the wip of the app protocol handler in cpp. It works only for core
> apps right now and does not ask anything to the appsservice, so the path
> should always be /data/local/webapps if you want to try it (it means it does
> not work with a PRODUCTION=1) build.

How does that even work? Content processes should not be able to read directly from /data/local/webapps but you're not using the remoted urls (jar:remoteopenfile://...). Either we have a bad security issue, or you wip is not current...

Overall I'm not surprised by the results. Back in the 1.0 days we optimized that a lot with cjones but since gaia apps have many more files to lazy load all the things and that exercises these paths much more.
(In reply to Vivien Nicolas (:vingtetun) (:21) from comment #1)
>  - $GAIA/apps/system/js/window_manager.js
>   - Here are the Gaia part that manages windows opening/closing. It seems
> like the JS running here cost around 30-40ms(ignoring the creating of the
> <iframe mozbrowser>. Obviously this one can not be rewritten in cpp but it
> worth trying to optimize (again) this code path.

Looks like bug 902766 but I cannot make sure 30-40ms is reduce-able.
(In reply to Vivien Nicolas (:vingtetun) (:21) from comment #1)
> Some of the possible places that can be rewritten in CPP are:

>  - /dom/browser-element/BrowserElementParent.js
>   The first thing which is done before loading an application is to create a
> <iframe mozbrowser mozapp> and this cost between 20-40ms just to build it.

Yep, we should really rewrite the browser element API. Did you time the BrowserElementParent() constructor in BrowserElementParent.jsm ?
Any chance these changes will affect bug 920713 (and maybe 915083)?
(In reply to Mike Habicher [:mikeh] from comment #12)
> Any chance these changes will affect bug 920713 (and maybe 915083)?

Yes, but we stil should identify the regressions for these two bugs.
(In reply to Fabrice Desré [:fabrice] from comment #9)
> (In reply to Vivien Nicolas (:vingtetun) (:21) from comment #8)
> > Created attachment 814394 [details] [diff] [review]
> > wip appprotocolhandler.cpp.patch
> > 
> > This is the wip of the app protocol handler in cpp. It works only for core
> > apps right now and does not ask anything to the appsservice, so the path
> > should always be /data/local/webapps if you want to try it (it means it does
> > not work with a PRODUCTION=1) build.
> 
> How does that even work? Content processes should not be able to read
> directly from /data/local/webapps but you're not using the remoted urls
> (jar:remoteopenfile://...). Either we have a bad security issue, or you wip
> is not current...

Sounds like there is a fast path for certified app in the JS version.

> 
> Overall I'm not surprised by the results. Back in the 1.0 days we optimized
> that a lot with cjones but since gaia apps have many more files to lazy load
> all the things and that exercises these paths much more.

Yeah, apps were very different at that time and since for any new files the regression is 1 or 2ms in the worst case it is very hard to noticed it.
(In reply to Mike Habicher [:mikeh] from comment #12)
> Any chance these changes will affect bug 920713 (and maybe 915083)?

Yes and no. Converting BrowserElementParent.js to cpp and fixing a little bit the gaia system app should help but the numbers announced in https://bugzilla.mozilla.org/show_bug.cgi?id=920713#c0 are crazy and the win you can expect from the 2 changes I mentionned will will likely be around 50ms, not that much.

Also moving the csp and the app protocol handler to cpp is helpful but not for your use case (you may win around 10-15ms though) but is really targetting apps that loads a lot of files (css, html, js, images). For example the dialer likely load 50 files (numbers on top of my hat) but your use case will load only 2 files (the default index.html and bg.png).
(In reply to Vivien Nicolas (:vingtetun) (:21) from comment #14)
> (In reply to Fabrice Desré [:fabrice] from comment #9)
> > (In reply to Vivien Nicolas (:vingtetun) (:21) from comment #8)
> > > Created attachment 814394 [details] [diff] [review]
> > > wip appprotocolhandler.cpp.patch
> > > 
> > > This is the wip of the app protocol handler in cpp. It works only for core
> > > apps right now and does not ask anything to the appsservice, so the path
> > > should always be /data/local/webapps if you want to try it (it means it does
> > > not work with a PRODUCTION=1) build.
> > 
> > How does that even work? Content processes should not be able to read
> > directly from /data/local/webapps but you're not using the remoted urls
> > (jar:remoteopenfile://...). Either we have a bad security issue, or you wip
> > is not current...
> 
> Sounds like there is a fast path for certified app in the JS version.

Only in production builds which you're not using.
(In reply to Alive Kuo [:alive] from comment #10)
> (In reply to Vivien Nicolas (:vingtetun) (:21) from comment #1)
> >  - $GAIA/apps/system/js/window_manager.js
> >   - Here are the Gaia part that manages windows opening/closing. It seems
> > like the JS running here cost around 30-40ms(ignoring the creating of the
> > <iframe mozbrowser>. Obviously this one can not be rewritten in cpp but it
> > worth trying to optimize (again) this code path.
> 
> Looks like bug 902766 but I cannot make sure 30-40ms is reduce-able.

We should look at some ways of making that smaller, even if we shave 10 or 15ms this is already a win. If we combine that with some other optimizations (we know that the time between the click on the icon and the webapps-launch message to the system app is around 10-15ms as well for example) it quickly grows to 50ms and this is not negligeable.

The time spent in the system app could be because we are reading some expensive informations, or doing other small stuffs that are expensive. It worth looking at it.
(In reply to Fabrice Desré [:fabrice] from comment #16)
> Only in production builds which you're not using.

Nice, sounds like there is a separate bug here :)
Summary: [b2g][meta] Remove all much JS from possible to the application startup path. → [b2g][meta] Remove all much JS as possible from applications startup path.
btw if someone wants to steal and finish the conversion of the app protocol handler from js to cpp, it would be awesome.
(In reply to Vivien Nicolas (:vingtetun) (:21) from comment #19)
> btw if someone wants to steal and finish the conversion of the app protocol
> handler from js to cpp, it would be awesome.

Assign it to me, I kind of know the edge cases :P
(In reply to Vivien Nicolas (:vingtetun) (:21) from comment #17)
> (In reply to Alive Kuo [:alive] from comment #10)
> > (In reply to Vivien Nicolas (:vingtetun) (:21) from comment #1)
> > >  - $GAIA/apps/system/js/window_manager.js
> > >   - Here are the Gaia part that manages windows opening/closing. It seems
> > > like the JS running here cost around 30-40ms(ignoring the creating of the
> > > <iframe mozbrowser>. Obviously this one can not be rewritten in cpp but it
> > > worth trying to optimize (again) this code path.
> > 
> > Looks like bug 902766 but I cannot make sure 30-40ms is reduce-able.
> 
> We should look at some ways of making that smaller, even if we shave 10 or
> 15ms this is already a win. If we combine that with some other optimizations
> (we know that the time between the click on the icon and the webapps-launch
> message to the system app is around 10-15ms as well for example) it quickly
> grows to 50ms and this is not negligeable.
> 
> The time spent in the system app could be because we are reading some
> expensive informations, or doing other small stuffs that are expensive. It
> worth looking at it.

We are doing:
0) webapps-launch
1) Fetch the manifest and parse the final manifest
  1.a Get name from manifest helper
  1.b Traverse entry point if any
  1.c string match (browser app, would be removed in 1.3) to see who isn't OOP
2) Preload icon splash in a standalone image
3) createFrame: String match again(ya I know but it's because the process is not optimized.)
4) new BrowserFrame: String match to see if its mozapptype is special
5) appendFrame: String match to see if its mozapptype is special (Ya, My fault!)
6) Append the mozbrowser iframe to document.
(In reply to Fabrice Desré [:fabrice] from comment #20)
> (In reply to Vivien Nicolas (:vingtetun) (:21) from comment #19)
> > btw if someone wants to steal and finish the conversion of the app protocol
> > handler from js to cpp, it would be awesome.
> 
> Assign it to me, I kind of know the edge cases :P

<3. Lets track the work into a blocking bug of this one to not pollute too much this meta bug then :)
(In reply to Alive Kuo [:alive] from comment #21)
> (In reply to Vivien Nicolas (:vingtetun) (:21) from comment #17)
> > (In reply to Alive Kuo [:alive] from comment #10)
> > > (In reply to Vivien Nicolas (:vingtetun) (:21) from comment #1)
> > > >  - $GAIA/apps/system/js/window_manager.js
> > > >   - Here are the Gaia part that manages windows opening/closing. It seems
> > > > like the JS running here cost around 30-40ms(ignoring the creating of the
> > > > <iframe mozbrowser>. Obviously this one can not be rewritten in cpp but it
> > > > worth trying to optimize (again) this code path.
> > > 
> > > Looks like bug 902766 but I cannot make sure 30-40ms is reduce-able.
> > 
> > We should look at some ways of making that smaller, even if we shave 10 or
> > 15ms this is already a win. If we combine that with some other optimizations
> > (we know that the time between the click on the icon and the webapps-launch
> > message to the system app is around 10-15ms as well for example) it quickly
> > grows to 50ms and this is not negligeable.
> > 
> > The time spent in the system app could be because we are reading some
> > expensive informations, or doing other small stuffs that are expensive. It
> > worth looking at it.
> 
> We are doing:
> 0) webapps-launch
> 1) Fetch the manifest and parse the final manifest
>   1.a Get name from manifest helper
>   1.b Traverse entry point if any
>   1.c string match (browser app, would be removed in 1.3) to see who isn't
> OOP

Does fetching the manifest has a cache or do we have to ask the same information all the time to the webapps API? We can compute this information once and update it only when there is an app update.

But before doing that it worth checking how long this stuff takes. 

> 2) Preload icon splash in a standalone image
> 3) createFrame: String match again(ya I know but it's because the process is
> not optimized.)
> 4) new BrowserFrame: String match to see if its mozapptype is special
> 5) appendFrame: String match to see if its mozapptype is special (Ya, My
> fault!)
> 6) Append the mozbrowser iframe to document.

I wonder if we are trying to read some layout informations sometimes (in the stuff that do the resizing for example) even if this is not needed?
(In reply to Vivien Nicolas (:vingtetun) (:21) from comment #23)
> Does fetching the manifest has a cache or do we have to ask the same
> information all the time to the webapps API? We can compute this information
> once and update it only when there is an app update.
> 
> But before doing that it worth checking how long this stuff takes. 

Applications.js caches the information I think.
> 
> > 2) Preload icon splash in a standalone image
> > 3) createFrame: String match again(ya I know but it's because the process is
> > not optimized.)
> > 4) new BrowserFrame: String match to see if its mozapptype is special
> > 5) appendFrame: String match to see if its mozapptype is special (Ya, My
> > fault!)
> > 6) Append the mozbrowser iframe to document.
> 
> I wonder if we are trying to read some layout informations sometimes (in the
> stuff that do the resizing for example) even if this is not needed?

Resize does occur when we're opening the app now, right after we append the frame.
We may be able to delay the resize call after the app is loaded and just use 100%/100% as default w/h.

BTW how do you estimate these value? If I know I could try.
(In reply to Alive Kuo [:alive] from comment #21)
> 5) appendFrame: String match to see if its mozapptype is special (Ya, My
> fault!)

Lemme fix this first in another bug.
https://bugzilla.mozilla.org/show_bug.cgi?id=924553
Depends on: 924702
Time to launch (ms) on a Keon mc + master
------------------------------------------------+
                [1]      [2]      [3]     delta |
------------------------------------------------+
Dialer       :  586   |  512   |  470   |  116  |
Messages     :  614   |  559   |  486   |  128  |
Contacts     :  731   |  622   |  573   |  158  |
Browser      :  426   |  449   |  364   |   62  |
Camera       :  422   |  384   |  373   |   49  |
Gallery      :  439   |  392   |  379   |   60  |
Settings     :  695   |  653   |  612   |   83  |
fm           :  447   |  395   |  359   |   88  |
Calendar     :  644   |  610   |  564   |   80  |
Clock        :  530   |  483   |  430   |  100  |
Cost Control :  866   |  782   |  723   |  143  |
Music        :  530   |  484   |  439   |   91  |
Video        :  455   |  451   |  408   |   47  |
------------------------------------------------+

[1] with csp and app protocol handler in JS
[2] with csp and app protocol handler in cpp
[3] csp disabled + app protocol handler in cpp


Those numbers are from a very small numbers of runs so they may vary from run to run and needs to be taken carefully, but in all cases they are going in the right direction.
It seems like the numbers vary a lot depending on the application, which I believe is related to the number of files they load.
Also when you look at those numbers I think it is also important to remember that some of the win are not reflected here. 

Why? Most apps lazy load some files. Those lazy loaded files normally does not appears in the cold launch time but they are loaded right after. Shaving some times to load those files should help for responsiveness right after the app is launched.
(In reply to Alive Kuo [:alive] from comment #25)
> (In reply to Alive Kuo [:alive] from comment #21)
> > 5) appendFrame: String match to see if its mozapptype is special (Ya, My
> > fault!)
> 
> Lemme fix this first in another bug.
> https://bugzilla.mozilla.org/show_bug.cgi?id=924553

Alive, it seems like the 3 calls that consume some times in the system app lives in openWindow and are:
 1. app.resize
 2. app.setRotateTransition
 3. HomescreenLauncher.getHomescreen.close();

For 1 and 2 the reasons are obvious, we touching so geometry informations here, also we're relying on SoftwareButtonManager.getHeight to cache the information for us but the way it is written does not work because if the height is 0 then it will always call getBoundingClientRect (same for StatusBar).
Then the code defines the width/height of the window dynamically with JS but for the cases where the most common cases it would be good to have a fast path with CSS in system.css or even to create those CSS rules dynamically. Then the CSS rule will be used all the times until the configuration (orientation, software button, statusbar) changes and that would be lighter.

Doing this should make those calls more or less free.

About 3. I'm not sure yet why it is expensive.

But by removing / hacking around those 3 calls I have some results like this (sometimes this is very unstable):
I/GeckoDump(  107): Time before calling appendFrame is: 1381324981842 1
I/GeckoDump(  107): Time after calling appendFrame is: 1381324981881 40
I/GeckoDump(  107): Time before calling setDisplayedApp is: 1381324981881 40
I/GeckoDump(  107): Time after calling setDisplayedApp is: 1381324981890 49

Where most of the time in appendFrame is actually when the <iframe mozbrowser mozapp> is added to the dom.

Where most of the
(In reply to Vivien Nicolas (:vingtetun) (:21) from comment #28)
> (In reply to Alive Kuo [:alive] from comment #25)
> > (In reply to Alive Kuo [:alive] from comment #21)
> > > 5) appendFrame: String match to see if its mozapptype is special (Ya, My
> > > fault!)
> > 
> > Lemme fix this first in another bug.
> > https://bugzilla.mozilla.org/show_bug.cgi?id=924553
> 
> Alive, it seems like the 3 calls that consume some times in the system app
> lives in openWindow and are:
>  1. app.resize
>  2. app.setRotateTransition
>  3. HomescreenLauncher.getHomescreen.close();
> 
> For 1 and 2 the reasons are obvious, we touching so geometry informations
> here, also we're relying on SoftwareButtonManager.getHeight to cache the
> information for us but the way it is written does not work because if the
> height is 0 then it will always call getBoundingClientRect (same for
> StatusBar).
> Then the code defines the width/height of the window dynamically with JS but
> for the cases where the most common cases it would be good to have a fast
> path with CSS in system.css or even to create those CSS rules dynamically.
> Then the CSS rule will be used all the times until the configuration
> (orientation, software button, statusbar) changes and that would be lighter.

Sounds sane, will try the orientation part in https://bugzilla.mozilla.org/show_bug.cgi?id=908601

> Doing this should make those calls more or less free.
> 
> About 3. I'm not sure yet why it is expensive.
> 
> But by removing / hacking around those 3 calls I have some results like this
> (sometimes this is very unstable):
> I/GeckoDump(  107): Time before calling appendFrame is: 1381324981842 1
> I/GeckoDump(  107): Time after calling appendFrame is: 1381324981881 40
> I/GeckoDump(  107): Time before calling setDisplayedApp is: 1381324981881 40
> I/GeckoDump(  107): Time after calling setDisplayedApp is: 1381324981890 49
> 
> Where most of the time in appendFrame is actually when the <iframe
> mozbrowser mozapp> is added to the dom.

I don't understand of conclusion of this paragraph?
(In reply to Alive Kuo [:alive] from comment #29)
> > But by removing / hacking around those 3 calls I have some results like this
> > (sometimes this is very unstable):
> > I/GeckoDump(  107): Time before calling appendFrame is: 1381324981842 1
> > I/GeckoDump(  107): Time after calling appendFrame is: 1381324981881 40
> > I/GeckoDump(  107): Time before calling setDisplayedApp is: 1381324981881 40
> > I/GeckoDump(  107): Time after calling setDisplayedApp is: 1381324981890 49
> > 
> > Where most of the time in appendFrame is actually when the <iframe
> > mozbrowser mozapp> is added to the dom.
> 
> I don't understand of conclusion of this paragraph?

The conclusion of this paragraph is:
 if we fix the 3 points mentioned in my previous comments then I think we are clear in the system app and the next set thing to fix is BrowserElementParent.js and the time needed to forward the app.launch() from the homescreen to the system app.
(In reply to Vivien Nicolas (:vingtetun) (:21) from comment #30)
> (In reply to Alive Kuo [:alive] from comment #29)
> > > But by removing / hacking around those 3 calls I have some results like this
> > > (sometimes this is very unstable):
> > > I/GeckoDump(  107): Time before calling appendFrame is: 1381324981842 1
> > > I/GeckoDump(  107): Time after calling appendFrame is: 1381324981881 40
> > > I/GeckoDump(  107): Time before calling setDisplayedApp is: 1381324981881 40
> > > I/GeckoDump(  107): Time after calling setDisplayedApp is: 1381324981890 49
> > > 
> > > Where most of the time in appendFrame is actually when the <iframe
> > > mozbrowser mozapp> is added to the dom.
> > 
> > I don't understand of conclusion of this paragraph?
> 
> The conclusion of this paragraph is:
>  if we fix the 3 points mentioned in my previous comments then I think we
> are clear in the system app and the next set thing to fix is
> BrowserElementParent.js and the time needed to forward the app.launch() from
> the homescreen to the system app.

fwiw the bug for the long delay from app.launch() to the system app handler is bug 924032.
(In reply to Vivien Nicolas (:vingtetun) (:21) from comment #30)
> (In reply to Alive Kuo [:alive] from comment #29)
> The conclusion of this paragraph is:
>  if we fix the 3 points mentioned in my previous comments then I think we
> are clear in the system app and the next set thing to fix is
> BrowserElementParent.js and the time needed to forward the app.launch() from
> the homescreen to the system app.

Gotcha, please file a bug and assign to me.
Whiteboard: [c=progress p= s= u=]
It seems like the time spend in BrowserElementParent.js is due to the sync message from BrowserElementchild.js. Creating the BrowserElementParent cost only 4ms so moving to cpp will not help here :/

I noticed a delta of ~80ms between loadstart/loadend of the template app though, seems like that 26ms are spent in the JS handler. The more expensive one is the loadend since it does a getComputedStyle of the body to retrieve the background color. I wonder if it worth rewritting at least the progress listener in cpp...

I/Gecko   ( 8210): CreateBrowserElementParent: 4
I/Gecko   ( 8441): BrowserElementChild:SyncMessage: 17

I/Gecko   ( 8441): BrowserElementChildPreload.js loadstart: 3
I/Gecko   ( 8441): BrowserElementChildPreload.js setvisible: 0
I/Gecko   ( 8441): BrowserElementChildPreload.js: windowcreatedhandler: 3
I/Gecko   ( 8441): BrowserElementChildPreload.js locationchange: 3
I/Gecko   ( 8441): BrowserElementChildPreload.js titlechanged: 2
I/Gecko   ( 8441): BrowserElementChildPreload.js loadend: 12
(In reply to Vivien Nicolas (:vingtetun) (:21) from comment #33)
> It seems like the time spend in BrowserElementParent.js is due to the sync
> message from BrowserElementchild.js. Creating the BrowserElementParent cost
> only 4ms so moving to cpp will not help here :/
> 
> I noticed a delta of ~80ms between loadstart/loadend of the template app
> though, seems like that 26ms are spent in the JS handler. The more expensive
> one is the loadend since it does a getComputedStyle of the body to retrieve
> the background color. I wonder if it worth rewritting at least the progress
> listener in cpp...
> 

I wonder do we really need the background color info here? Is it to avoid the UI flicking?
(In reply to Alive Kuo [:alive] from comment #34)
> (In reply to Vivien Nicolas (:vingtetun) (:21) from comment #33) 
> 
> I wonder do we really need the background color info here? Is it to avoid
> the UI flicking?

I know you like portrait/landscape stuff, so feel free to look again at bug 869903 ;)
(In reply to Vivien Nicolas (:vingtetun) (:21) from comment #33)
> It seems like the time spend in BrowserElementParent.js is due to the sync
> message from BrowserElementchild.js. Creating the BrowserElementParent cost
> only 4ms so moving to cpp will not help here :/
> 
> I noticed a delta of ~80ms between loadstart/loadend of the template app
> though, seems like that 26ms are spent in the JS handler. The more expensive
> one is the loadend since it does a getComputedStyle of the body to retrieve
> the background color. I wonder if it worth rewritting at least the progress
> listener in cpp...
> 
> I/Gecko   ( 8210): CreateBrowserElementParent: 4
> I/Gecko   ( 8441): BrowserElementChild:SyncMessage: 17
> 
> I/Gecko   ( 8441): BrowserElementChildPreload.js loadstart: 3
> I/Gecko   ( 8441): BrowserElementChildPreload.js setvisible: 0
> I/Gecko   ( 8441): BrowserElementChildPreload.js: windowcreatedhandler: 3
> I/Gecko   ( 8441): BrowserElementChildPreload.js locationchange: 3
> I/Gecko   ( 8441): BrowserElementChildPreload.js titlechanged: 2
> I/Gecko   ( 8441): BrowserElementChildPreload.js loadend: 12

It seems like some of the times taken comes from the call to sendAsyncMessage:
I/Gecko   (  107): CreateBrowserElementParent: 4
I/Gecko   (  477): Will InitChild
I/Gecko   (  477): InitChild in : 16
I/Gecko   (  477): SendAsyncMessage took: 1
I/Gecko   (  477): BrowserElementChildPreload.js onstatechange: 3
I/Gecko   (  477): BrowserEleentChildPreload.js setvisible: 0
I/Gecko   (  477): BrowserElementChildPreload.js: windowcreatedhandler: 10
I/Gecko   (  477): SendAsyncMessage took: 3
I/Gecko   (  477): BrowserElementChildPreload.js locationchange: 3
I/Gecko   (  477): SendAsyncMessage took: 1
I/Gecko   (  477): BrowserElementChildPreload.js titlechanged: 3
I/Gecko   (  477): SendAsyncMessage took: 5
I/Gecko   (  477): BrowserElementChildPreload.js onstatechange: 14

Not sure who is a good candidate to fix that? Bent?
I rewrote a little progress listener handler in cpp and it does not seems to help that much, so it should be something related to the the inner of sendAsyncMessage.
(In reply to Vivien Nicolas (:vingtetun) (:21) from comment #37)
> I rewrote a little progress listener handler in cpp and it does not seems to
> help that much, so it should be something related to the the inner of
> sendAsyncMessage.

hmm actually it seems to help cost control which is one of the place where there are multiple loadstart/loadend fire due to the fact that it embeds a few iframes.
(In reply to Vivien Nicolas (:vingtetun) (:21) from comment #37)
> I rewrote a little progress listener handler in cpp and it does not seems to
> help that much, so it should be something related to the the inner of
> sendAsyncMessage.

See also bug 920804 where I try to improve the performance of the frame message manager (especially how we dispatch in the internal receiveMessage())
(In reply to Vivien Nicolas (:vingtetun) (:21) from comment #38)
> (In reply to Vivien Nicolas (:vingtetun) (:21) from comment #37)
> > I rewrote a little progress listener handler in cpp and it does not seems to
> > help that much, so it should be something related to the the inner of
> > sendAsyncMessage.
> 
> hmm actually it seems to help cost control which is one of the place where
> there are multiple loadstart/loadend fire due to the fact that it embeds a
> few iframes.

It could also be because I receive the event from one of the inner iframe and my hack does not support it.
Alive,

I have created a small patch to see reflows (and the time spent reflowing) in bug 926371. Here are the reflows happening when I launch the template app:

I/GeckoDump(  569): uninterruptible Reflow (8.026123999999982) for :null
I/GeckoDump(  569): Stack: SoftwareButtonManager.height@app://system.gaiamobile.org/js/software_button_manager.js:25|aw_resize@app://system.gaiamobile.org/js/window.js:504|openWindow@app://system.gaiamobile.org/js/window_manager.js:435|setDisplayedApp@app://system.gaiamobile.org/js/window_manager.js:741|windowLauncher@app://system.gaiamobile.org/js/window_manager.js:1187|awf_publish@app://system.gaiamobile.org/js/app_window_factory.js:75|awf_handleEvent@app://system.gaiamobile.org/js/app_window_factory.js:46|
I/GeckoDump(  569): uninterruptible Reflow (1.2207029999954102) for :null
I/GeckoDump(  569): Stack: SoftwareButtonManager.height@app://system.gaiamobile.org/js/software_button_manager.js:25|aw_setRotateTransition@app://system.gaiamobile.org/js/window.js:397|openWindow@app://system.gaiamobile.org/js/window_manager.js:458|setDisplayedApp@app://system.gaiamobile.org/js/window_manager.js:741|windowLauncher@app://system.gaiamobile.org/js/window_manager.js:1187|awf_publish@app://system.gaiamobile.org/js/app_window_factory.js:75|awf_handleEvent@app://system.gaiamobile.org/js/app_window_factory.js:46|
I/GeckoDump(  569):   interruptible Reflow (6.530762000002142) for :null
I/GeckoDump(  569): uninterruptible Reflow (10.467529999998078) for :null
I/GeckoDump(  569): Stack: windowOpened@app://system.gaiamobile.org/js/window_manager.js:361|frameAnimationend/onWindowReady@app://system.gaiamobile.org/js/window_manager.js:240|frameAnimationend@app://system.gaiamobile.org/js/window_manager.js:258|
I/GeckoDump(  569):   interruptible Reflow (8.33129899999767) for :null

So those are likely the things to look at before switching from JS to CSS for this part.
Here the patch used to see reflows in the system app.
Depends on: 927078
With the JS fixes about reflows in the system app, most of the time is shaped without having to do complicated changes. It stay one reflows that comes from the platform and I will ask help from the gfx/layout guy next week (as they will be in Paris).

An other strange things too is that I'm seeing 3 reflows in the template app when opening it. Will ask for help as well:

I/Gecko   ( 4274):   interruptible Reflow (0.030518000001393375) for :main
I/Gecko   ( 4274): uninterruptible Reflow (0.2746580000002723) for :main
I/Gecko   ( 4274): uninterruptible Reflow (2.2583009999998467) for :main
(In reply to Vivien Nicolas (:vingtetun) (:21) from comment #43)
> An other strange things too is that I'm seeing 3 reflows in the template app
> when opening it. Will ask for help as well:
> 
> I/Gecko   ( 4274):   interruptible Reflow (0.030518000001393375) for :main
> I/Gecko   ( 4274): uninterruptible Reflow (0.2746580000002723) for :main
> I/Gecko   ( 4274): uninterruptible Reflow (2.2583009999998467) for :main

I've made a quick profile of the Template app with 1ms granularity:

http://people.mozilla.org/~bgirard/cleopatra/#report=4b96d266dc9f56b37a3233aee0f1312cbd3963c3

Specifically looking for layout::Flush events shows two reflows (both uninterruptible) with one happening within the other and taking almost no time. The third layout::Flush call is for style processing and it might be the first one that you're seeing:

http://people.mozilla.org/~bgirard/cleopatra/#report=4b96d266dc9f56b37a3233aee0f1312cbd3963c3&search=layout%253A%253AFlush

In any case only the first uninterruptible reflow is taking a meaningful amount of time and that's the initial layout of the page.

An aspect of the profile that hit me is that IPDL:PBrowser:SendPRenderFrameConstructor IPC call is taking ~75ms and those are made of pure waiting. Being a synchronous call I assume that all that time is spent in the main process:

http://dxr.mozilla.org/mozilla-central/source/dom/ipc/PBrowser.ipdl#l232
Depends on: 927493
Vivien, Fabrice, lets add a counter next to FPS that counts reflows! Do you guys want to do it or do you need help?
(In reply to Andreas Gal :gal from comment #45)
> Vivien, Fabrice, lets add a counter next to FPS that counts reflows! Do you
> guys want to do it or do you need help?

I opened bug 928447. I think I know all I need to make it happens.
(In reply to Gabriele Svelto [:gsvelto] from comment #44)
> An aspect of the profile that hit me is that
> IPDL:PBrowser:SendPRenderFrameConstructor IPC call is taking ~75ms and those
> are made of pure waiting. Being a synchronous call I assume that all that
> time is spent in the main process:
> 
> http://dxr.mozilla.org/mozilla-central/source/dom/ipc/PBrowser.ipdl#l232

Sorry to not have seen this before that would have saved me an email! For the record I will post the email here so at least others may know the story around that:

------
Hey,

I spent a bunch of times profiling the template app yesterday (the template app, not the template process) and found a lot of sync messages.

 1. http://mxr.mozilla.org/mozilla-central/source/dom/browser-element/BrowserElementChild.js#20

 2. http://mxr.mozilla.org/mozilla-central/source/dom/ipc/TabChild.cpp#2278

 3. http://mxr.mozilla.org/mozilla-central/source/dom/ipc/TabChild.cpp#2289

 4. http://mxr.mozilla.org/mozilla-central/source/dom/ipc/TabChild.cpp#2194

Those were making the profile hard to read so I removed the first 3 one with dirty hacks to have a cleaner profile (http://people.mozilla.org/~bgirard/cleopatra/#report=4c9bf38dfeab19d485579367ee2e7a913c3ea9f0).

The interesting part starts starts from 12200 to 12401. (The timing is usually lower but I'm profiling with an interval of 1ms which slow down things a bit).

One interesting things about removing those 3 sync calls was that the timing of the template app didn't changed much but there is more idle time in the template app. It seems like the reason for that is because the template app is sitting here, waiting for OnInputStreamReady to fire. (The IPC call for onInputStreamReady is located at 12292).

OnInputStreamReady starts right after RecvCacheFileDescriptor (at 12289).

This CacheFileDescriptor is the IPC message that contains a File Descriptor to the zip package containing the application. It is send from http://mxr.mozilla.org/mozilla-central/source/dom/ipc/TabParent.cpp#134 a few runnables after the call to load a URL has been made (http://mxr.mozilla.org/mozilla-central/source/dom/ipc/TabParent.cpp#403).

I was curious why the the FD is received so late (89ms after the start of the process).

My first thoughts was. Maybe it takes time to get a FD from the disk before sending it from the parent to the child process. So I started creating a small hash of path -> FD to send the FD already prepared after the first opening.

The result was that it didn't change much.

And so I end up looking at when this SendCacheFileDescriptor/RecvCacheFileDescriptor has been implemented hoping that they were some explanations in the bug. (bug 835698)

In this bug https://bugzilla.mozilla.org/show_bug.cgi?id=835698#c6 states that they should be more or less for free except if they was a low memory situations) because we fadvice()'d
 /system
 /system/b2g
 /system/b2g/webapps
 /data
 /data/local
 /data/local/webapps

Then it was unclear to me why the RecvCacheFileDescriptor is receive so late. It seems like https://bugzilla.mozilla.org/show_bug.cgi?id=835698#c0 explains that this patch was meant to help if bug 835681 and bug 835679 was resolved. (In https://bugzilla.mozilla.org/show_bug.cgi?id=835698#c54 it also states that there is bug 839154).


Those bugs are mostly about removing the sync messages I described at the beginning of this email. They intend to remove (1), (2) and (4).
(3) was not here at that point but should be removed as well.

Those are not fixed yet (which is the reason of this email) and with those fixed most of the time in the child process in early stage will be idle time waiting for the FD to come back from the master process.

Now if you look at the profile, (4) is still here and delay the first load until the sync call is resolved. Now if it is removed (along with (1), (2) and (3) the theory is that the FD can be retrieved sooner which in returns can make the OnInputStreamReady() call to fire sooner (as a reminder it fires currently 92ms after the start of the child process).

That was the intention of bug 835698 to send the FD as soon as possible to reduce this wait. But this one has been resolved but not the dependencies.

I can not say what will be the exact win but https://bugzilla.mozilla.org/show_bug.cgi?id=835698#c54 from Chris Jones says that they bring a 'pretty significant win'. And imho even if the win is not 'pretty significant' and the child is sitting down waiting for the FD to be received that will:
 - results into cleaner profile, easier to read without fake positives
 - more idle time that we can use to preload some expensive stuff like RIL if the process has the RIL permission, or warming the mozSettings database if the process has the mozSettings permission (the RecvSetProcessPrivileges is the first thing receive by the process (at 12200).

So fixing those should bring us a win because the FD will be receive sooner and in the worst case I assume that we can exploit idle time by preloading expensive stuff.


The current situation is that the bugs have stalled since February mostly because Chris Jones left and one of the bug was something that Justin Lebar may have resolved...

But let's go forward! So now that bug 835698 (pre-send a FD) has been resolved it stays:
 a. Bug 839154 - Cache results from AppsService.getAppBy*() that are used immutably

 b. bug 835681 - Remove the sync messages from BrowserElementChild startup

 c. Bug 835679 - Make PRenderFrame ctor async and parent->child and remove synchronous GetDPI() (which can be altered by also removing the synchronous GetDefaultScale() added later)

Bug 835679 has 3 patches r+'ed already since months but is blocked by bug 763602, so adding it to the list.

 d. Bug 763602 - Decouple TabChild's creation of a window from TabChild::Show 

------
For now the results of this bug are very 'magnifique'!. See attachment.
Removing the last 2 JS reflows should be a nice win as well(bug 926871) . Roc has proposed me some help for the last reflow that comes from the platform tomorrow. will see!
Depends on: 930686
(In reply to Vivien Nicolas (:vingtetun) (:21) from comment #49)
> Removing the last 2 JS reflows should be a nice win as well(bug 926871) .
> Roc has proposed me some help for the last reflow that comes from the
> platform tomorrow. will see!

The last reflow likely come from nsSubDocumentFrame.cpp when the <iframe mozbrowser> is created and inserted into the DOM. No way to fix that :(

I also have a long tail of small JS handlers that consumes some startup time. Will open more bugs later.
Alias: less-js-at-startup
(In reply to Vivien Nicolas (:vingtetun) (:21) from comment #50)
> (In reply to Vivien Nicolas (:vingtetun) (:21) from comment #49)
> > Removing the last 2 JS reflows should be a nice win as well(bug 926871) .
> > Roc has proposed me some help for the last reflow that comes from the
> > platform tomorrow. will see!
> 
> The last reflow likely come from nsSubDocumentFrame.cpp when the <iframe
> mozbrowser> is created and inserted into the DOM. No way to fix that :(
> 
> I also have a long tail of small JS handlers that consumes some startup
> time. Will open more bugs later.

I finally found a way to avoid the reflow when the application is launched by moving it at some other point. Basically the hack consist of creating a <iframe mozbrowser> in the system app a bit after the preallocated process has been created. The iframe creation can be done when the system app has some idle time to not interfered too much with a possible user interaction.

Doing this also shave some app initialization time since it some of the sync calls have done at that point. For the template app it shaves around *40/50ms* of startup time. For some reasons it seems to shave a little bit less for other apps, like *30ms* but I didn't investigate that too much yet.

The tricky part by doing this is that the platform assume that the <iframe mozbrowser> is immutable once it has been initialized which makes it tricky to implement. I think the immutable state is fine, but then we can redefined what means initialization for an <iframe mozbrowser>. Preloading it can put it into a semi-initialized state that is mutable. It requires some changes to content/base/src/nsFrameLoader.cpp, dom/ipc/ContentParent.cpp, dom/ipc/TabParent.cpp, dom/ipc/ContentChild.cpp and dom/ipc/TabChild.cpp, and Likely some weird tricks with the preallocated process.

Doing this also change a lot the profile for application startup, so I mentionned in my previous comment that there is a long tail of JS handlers that slow down app startup but those are still here but a bit different due to the 2 steps initialization. So I think it makes more sense to look at them with a profile that is done after the 2 steps initialization has been implemented.

I have some dirty Gaia/Gecko patches that implement this, but they break application permissions (Seems like the permissions of an application are read only once during nsPermissionManager init), so it would be nice if there is a volunteer to help here makes the POC something real!

Anyone?
(In reply to Vivien Nicolas (:vingtetun) (:21) from comment #51)

> I have some dirty Gaia/Gecko patches that implement this, but they break
> application permissions (Seems like the permissions of an application are
> read only once during nsPermissionManager init), so it would be nice if
> there is a volunteer to help here makes the POC something real!
> 
> Anyone?

Post your patches to another bug and I'll take a look. I also wonder how/if that will change once we enable nuwa processes...
(In reply to Fabrice Desré [:fabrice] from comment #52
> I also wonder how/if that will change once we enable nuwa processes...

We can already turn on nuwa processes via a pref, right? If that's the case I'd be glad to help with profiling and comparing the two setups.
It's not a pref, it's a compile time flag. See bug 930282 and blockers.
(In reply to Fabrice Desré [:fabrice] from comment #52)
> (In reply to Vivien Nicolas (:vingtetun) (:21) from comment #51)
> 
> > I have some dirty Gaia/Gecko patches that implement this, but they break
> > application permissions (Seems like the permissions of an application are
> > read only once during nsPermissionManager init), so it would be nice if
> > there is a volunteer to help here makes the POC something real!
> > 
> > Anyone?
> 
> Post your patches to another bug and I'll take a look. I also wonder how/if
> that will change once we enable nuwa processes...

As far as I have understood the nuwa process does not take over the preload stuff we are doing for now. It forks at some point before that, and then preload.js is loaded. So that should be fine for now.

I assume that this stuff should change though since it will likely let us share more memory but I wonder if we won't end up with a 2 steps preload.js. Something that happens before the Nuwa freeze and something that happens after. I don't know enough though.

But in any case I think that's fine, since the preallocated <iframe mozbrowser> will also connects to a process that has been fork'ed from the template process, and not the template process itself.
(In reply to Fabrice Desré [:fabrice] from comment #52)
> Post your patches to another bug and I'll take a look. I also wonder how/if
> that will change once we enable nuwa processes...

I have added some wip in bug 939695. Thanks for the help. I need to find you a name like Midas or similar, or the guy that turns POC into real patches!
Component: General → Performance
Priority: -- → P2
See Also: → AppStartup
Whiteboard: [c=progress p= s= u=] → [c=progress p= s= u=][perf-wanted]
[Tracking Requested - why for this release]:
Closing this old B2G bug.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.