Profile the startup time of the template app

RESOLVED WORKSFORME

Status

Firefox OS
Gaia
RESOLVED WORKSFORME
6 years ago
5 years ago

People

(Reporter: vingtetun, Assigned: gsvelto)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

It would be useful to have a profile of the template app to see what is between the click of a user and the opening of an empty application.
(Assignee)

Updated

6 years ago
Assignee: nobody → gsvelto
(Assignee)

Comment 1

6 years ago
The profile is available here:

http://people.mozilla.com/~bgirard/cleopatra/#report=d626a627f5e676ba0dd4797d457d5ba96a3e1ec6

A few remarks:

- The total startup time takes roughly one second

- Disregard the BrowserElementChild.prototype._recvGetScreenshot/takeScreenshotClosure() call, it is executed only after the startup has finished and the application has gone idle so it doesn't impact startup time (see bug 799595 for more info)

- BrowserElementChild.js startup takes ~500ms, of which ~380ms are spent in the _init() function and the rest in various native calls

- Loading and executing contentSecurityPolicy.js seems to take ~100ms

- CSS processing and page layout seem to account for only ~50ms
Status: NEW → ASSIGNED
Blocks: 797189
Depends on: 810526
Created attachment 680487 [details] [diff] [review]
wip

Gabriele, can you run a new profile with this patch applied and see if we get any improvemen?
(Assignee)

Comment 3

6 years ago
(In reply to Fabrice Desré [:fabrice] from comment #2)
> Gabriele, can you run a new profile with this patch applied and see if we
> get any improvement?

Sure, here's the new profile:

http://people.mozilla.com/~bgirard/cleopatra/#report=e71580ec82cce5d8c9362e173ff6d3e2fd9526d1

I can see that end-to-end startup now takes 800ms so it looks like a very nice improvement! I'll try re-running the test a few times later to confirm if the measurement is consistent.
I don't get why this makes things faster.

Or at least, I believe that coalescing the two sync messages into one would have some effect.  But we're not making setting up the window (i.e., touching content) any faster -- we're simply putting it off until later in the app-startup process -- so I don't see how that should matter.
Coalescing the sync messages was worth ~40ms in most of my timings.

I also concur with justin on the second point. Maybe we're not measuring what we should, and we need a more "visual feedback" test like eiderticker?
(Assignee)

Comment 6

6 years ago
I re-run a bunch of times the profiles to see what I would get and there is a fair bit of variance so these results should be taken with a grain of salt. Shame on me for not having re-run those earlier. What I get is the following:

- Without the patch anything from 820 to 920 ms from the first sample after the last wait and until before the screenshot is taken (i.e. first-idle)

- With the patch values between 800 and 870 ms

- One of the quirks is that I get samples of nsAppShell::ProcessNextNativeEvent::Wait within the sample range

- Another issue is that I also got a couple of timings that were completely off the scale, ~600ms, that doesn't make me very confident of the other results
Can you see if there's a difference between Fabrice's full patch and a patch which only coalesces the two sync messages into one?  He or I can produce that smaller patch for you if you're not sure how to change that.
(Assignee)

Comment 8

6 years ago
I trimmed down the patch to only replace the two sync messages and re-run the test. I did five runs both with and without the patch and this time I tried to be as careful as possible to minimize variation; unfortunately this is a very time-consuming process so doing even more runs would be impractical. Here's what I got:

Run | Unpatched |  Patched  |
 1  |     810ms |     820ms |
 2  |     820ms |     810ms |
 3  |     800ms |     710ms |
 4  |     800ms |     810ms |
 5  |     880ms |     700ms |

The version with the patch applied looks better but there's still lots of variation, one of the issues is that I can get 5-10 samples in nsAppShell::ProcessNextNativeEvent::Wait which are pretty much meaningless since we're waiting on an external event - whatever that might be. So I tried to drill down and check only the startup time for BrowserElementChild.js, this is what I got:

Run | Unpatched |  Patched  |
 1  |     300ms |     220ms |
 2  |     250ms |     270ms |
 3  |     210ms |     150ms |
 4  |     300ms |     190ms |
 5  |     320ms |     160ms |

Now the difference between the two looks more marked and is more likely to be a direct effect of the patch. I have kept all the traces and I can upload them if you want to check them out personally.
(In reply to Vivien Nicolas (:vingtetun) from comment #0)
> It would be useful to have a profile of the template app to see what is
> between the click of a user and the opening of an empty application.

I think I have the answer to that now.

The bottom line is:

It currently takes ~1000-1100ms to load Template app.

 - tap: 0ms
 - navigationStart: 400ms
 - domLoading: 710ms
 - loadEvent: 810ms
 - document ready: 1100ms


I used three methods in combination:
 - hfs camera - external to software stack
 - system numbers from gecko - external to app, in system
 - in app numbers

and those three layers fit quite nicely together so I take it as a token of reliableness :)

The number is coming from a hfs camera recording between the moment of when the tap event is recognized on the app icon and the moment when the app is loaded and focus is given to it.

Now, digging into this:

First ~400ms are taken by gecko to initialize the document. I have no idea what is going on there, but it is only 400ms into the process when performance.timing.navigationStart happens, so my assumption is that it's everything that gecko needs to set up the iframe, sandbox it and initialize navigation.

Next ~10ms are between navigationStart and responseEnd, so it seems that fetching the document is fast.

At around 410ms into the boot process there's another black hole of around 300ms. I have no number from that.
The only data I have is that responseEnd is fired 10ms after navigationStart, and the next event - domLoading - fires 310ms after navigationStart.

So at the point when performance.timing.domLoading is recorded, we're ~710ms into the boot.

Then, it takes ~100ms before DOM is complete, and loadEvent starts.

That puts us 810ms into the boot, and then we have ~200 static cost of startup animation. (bug 822965).

It seems that there are two things on Gecko end that we could investigate:
 - the initial 400ms overhead
 - the 300ms between responseEnd and domLoading
more insight into the early stages of the univer... erm... app launching.

The app starts by registering tap event in homescreen's page.js.
Then it spawns webapp-launch event through Gecko and system app reacts to that.

It creates a new remote (OOP) iframe, appends it to the window and starts the transition.

At the same time in the new process, magic happens (bug 831135) and then gecko loads the document - which is documented above.

It looks more or less like this (number is miliseconds since tap event):

homescreen app:
   . tap                             0    
   . icon.app.launch                 3    


system app:
   . Reacting to webapps-launch      31   
   . appendFrame starts              37   
   . createFrame starts              39   
   . createFrame ends                42   
   . appendFrame ends                75   
   . About to launch setDisplayedApp 77   
   . firing "appwillopen"            80   
   . end of openWindow               99   
   . end of setDisplayedApp          101  
   . end of transitionOpenCallback   179  
   . transitioned starts             337  
   . transitioned starts             448  
   . transitioned and opening        451  
   . windowOpened start              452  
   . catching mozbrowserloadstart    635  
   . catching mozbrowserloadend      1112 

Template app:
   . onload                          1090
   . navigationStart was ago         471.649169  

More about the numbers:

createFrame is where the iframe is created and it's src set. It probably does not spawn the OOP just yet, because the iframe is not in DOM. Somewhere before appendFrame ends the iframe is being injected into DOM and then it should start the new process.
At this point, we're ~70ms into the boot time.

From appwillopen till transitionOpenCallback it's pure JS code that we can probably rearrange to optimize for perf.
Then, around 179ms we start the transition animation that ends around 451ms and fires windowOpened function that ends with firing "appopen" event somewhere around 500ms.

Now, that's the main process. Around 635ms into the boot the mozbrowserloadstart event is dispatched which marks the beginning of document loading in the child process and this takes ~470ms for Template app and ends with mozbrowserloadend.
That matches nicely internal readings from performance API from inside of the child process.

Now, the remaining magic is what happens between we start the new process ~70ms and mozbrowserloadstart at 635ms. That's 550ms of time that goes into something.

Jlebar is trying to get more numbers from it in bug 831611 and we track the whole thing in bug 831135.
We have a very good handle on no-op app startup now.
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.