Avoid/ Postpone unnecessary startup work for mobile

Status: RESOLVED INCOMPLETE (enhancement)
Opened 12 years ago, closed 9 years ago

People: Reporter: skumar, Assigned: skumar
Keywords: meta, mobile, perf
Firefox Tracking Flags: Not tracked
Attachments: 9 attachments, 1 obsolete attachment

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12

We did a timeline profile of startup; our findings follow. We would appreciate comments and suggestions from experts on how to improve startup speed.
(On a Debian PC the findings are similar, but the Minefield window appears after 3 seconds.)

- It takes 12 seconds to show the minefield window (warm start)
  Sometimes the startup time is 15 seconds (when this happens, a large number of nsProxyObjectCallInfo events are observed).

- Around 25% (3 sec) of the time is spent before entering the event processing loop (appStartup->Run ())
  When the startup time is 15 seconds, this becomes 30% (5 sec)
  Let this step be called Initialization.

- Following are the major contributors to Initialization (when the startup takes 15 secs)
  - startup notifier - 2 sec
  - cmdline->Run - 1.4 sec
  - XPCOM startup (Initialization, doAutoReg, setwindowCreator) - 780 ms
  - appStartup->CreateHiddenWindow () - 400 ms
  - Extension Manager Start, Notify Observers ("final-ui-startup") - 200 ms
  - dirProvider->DoStartup - 160 ms

- Startup Notifier
  There are 13 services that are created during this time. Please search "app-startup" in compreg.dat.
  Are all these needed for Mobile? Which ones can be disabled? Delayed?

- cmdline->Run ()
  Six command-line-handler services and one command-line-validator service are created, and Handler () and Validate () are called on them respectively. The actual components can be found in compreg.dat (search for command-line-handler and command-line-validator under CATEGORIES).

  Are all these needed for Mobile? Which ones can be disabled? Which ones can be started later? What is the procedure for disabling them?

- cmdline related code.
  There is a lot of work done for the command line in XRE_main. Can it be skipped if argc < 1?

- CreateHiddenWindow ()
  What is hidden window for? Is it needed for Mobile? How and what can be optimized?

- Extension Manager start, notify ("final-ui-startup").
  Any optimizations here? How to find the list of observers for "final-ui-startup"? For example, nsNavHistory is initialized during this time. How to postpone this to after the window is shown?

- XPCOM Initialization
  Can we avoid some fragmentation and save some time by allocating big arrays of modules, factories, interfaces, typelibraries at once rather than allocating each? We might know the number of these from compreg.dat or xpti.dat

  Can we break this work into two parts: first, the minimal work needed to show the window so the user can start input, and second, the rest, done in the background?

With touchscreen mobile chrome, the startup time is ~7 seconds and all the above listed steps take proportional time.

Reproducible: Always

Steps to Reproduce:
1. --enable-timeline
2. build on N800
3.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Assignee: nobody → skumar
Keywords: perf
- Can we disable MOZ_ENABLE_XREMOTE for mobile?
Status: NEW → ASSIGNED
Keywords: mobile
- XPCOM Initialization
  Increase the initial size of the following hash tables. During initialization these hash tables were observed to be resized, and resizing a hash table means allocating a new one and moving all the old entries into it. Resizing also happens in steps (the table doubles every time it grows). For example, to go from 32 to 256, it goes from 32 to 64 to 128 to 256 (each time allocating a new table, moving all the entries and freeing the old one).
  Obviously this will increase the heap usage, but isn't that acceptable, given that the tables will grow to this size eventually anyway?
  Also, how can this change be made platform dependent? For example, Firefox on Debian might have more components than Firefox mobile.

   - ComponentManager->mContractIDs (from 1024 to 2048)
   - ComponentManager->mAutoRegEntries (from 32 to 256).
   - xptiWorkingSet.cpp : XPTI_HASHTABLE_SIZE (from 128 to 2048)
   - prefapi.cpp:pref_Init ()::gHashTable (from 1024 to 2048)
   - nsAtomTable.cpp - gAtomTable (from 2048 to 4096)
   - nsScriptNameSpaceManager.cpp: mGlobalNames (from 128 to 1024)
   - nsPersistentProperties.cpp: mTable (from 32 to 512)

   This might not give observable speed improvement, but this will avoid unnecessary work and might reduce fragmentation. I tried this and saw that calls to ChangeTable () are gone.
   Any comments?
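
To make the cost of stepwise growth concrete, here is a minimal Python sketch. It is not the actual Mozilla hash table code: it assumes a table that doubles only when it is completely full and moves every existing entry on each growth, whereas real tables grow at a lower load factor. The function name `growth_cost` is made up for illustration, and the numbers are only indicative.

```python
def growth_cost(initial_capacity, final_entries):
    """Count the reallocations and entry moves needed when a table that
    starts at initial_capacity doubles each time it fills up.

    Simplifying assumption: the table grows only when completely full.
    """
    capacity = initial_capacity
    resizes = 0
    entries_moved = 0
    while capacity < final_entries:
        # Growing means: allocate a table twice as big, move every
        # existing entry into it, then free the old table.
        entries_moved += capacity
        capacity *= 2
        resizes += 1
    return resizes, entries_moved


# mAutoRegEntries example from the list above: 32 -> 256.
print(growth_cost(32, 256))   # (3, 224): 32 -> 64 -> 128 -> 256
# Pre-sizing the table to 256 avoids all of that work.
print(growth_cost(256, 256))  # (0, 0)
```

Under these assumptions the saving per table is small (a few allocations and a few hundred moves), which is consistent with the expectation that the change mainly avoids unnecessary work rather than giving an observable speedup.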
This attachment was generated as follows:
- profile the event processing using dtrace (start and close firefox)
- use a python script to sort the event stacks by total processing time (also sorting output from multiple threads)

The output contains stack traces followed by processing times in nanoseconds.
Eg:  nsRunnableMethod<nsBindingManager>::Run()
     nsBindingManager::DoProcessAttachedQueue()
     nsBindingManager::ProcessAttachedQueue(unsigned int)
     nsCOMPtr_base::assign_with_AddRef(nsISupports*)
     nsXULDocument::AddRef()
     nsXMLDocument::AddRef()
     nsDocument::AddRef()
     nsDocument::UnblockOnload(int)
     nsDocument::DoUnblockOnload()
     nsCOMPtr_base::~nsCOMPtr_base()

[4542884475L, 463939, 437077]

indicates that this call stack was observed two times and that the total is 4542884475 nanoseconds.

At the end, there are some unfinished stacks (for which dtrace did not detect a return from nsThread::ProcessNextEvent ()).
More about the attachment:
  + indicates recursion. 
  +2+ indicates recursion 2 times.
  stacks are separated by --------
  [T, n1, n2, n3] indicate that this stack was observed 3 times and n1, n2, n3 are the durations each time and T is the total. (Duration is calculated with vtimestamp dtrace variable)
  Some events during exit are also reported as there is no specific place to stop profiling after start up.
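
For anyone post-processing the attachment, the record format described above can be read with a few lines of Python. This is only a sketch of one way to do it, not the script that produced the attachment; `parse_record` and `sort_by_total` are hypothetical helper names, and the trailing `L` is assumed to be the Python 2 long-integer suffix.

```python
def parse_record(line):
    """Parse a '[T, n1, n2, ...]' record into (total, durations)."""
    fields = line.strip().strip("[]").split(",")
    # Python 2 prints long integers with a trailing 'L'; drop it.
    values = [int(field.strip().rstrip("L")) for field in fields]
    return values[0], values[1:]


def sort_by_total(records):
    """Sort (stack_name, record_line) pairs, most expensive stack first."""
    return sorted(records, key=lambda item: parse_record(item[1])[0], reverse=True)


total, durations = parse_record("[4542884475L, 463939, 437077]")
# The number of durations is the number of times the stack was observed.
print(total, len(durations))  # 4542884475 2
```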

Observations from attachment (https://bugzilla.mozilla.org/attachment.cgi?id=308429)

- Can we increase the duration of some timer callbacks (so they won't fire during startup)?
  For example, mozJSComponentLoader::CloseFastLoad(nsITimer*, void*). The interval is 5 seconds, and it is called multiple times. Can we increase the interval to 15 seconds for mobile, so that the overhead of opening and closing the fastload system can be avoided?

- It would be ideal if there were a place in the code that indicates completion of startup. At that point all the timers can be started, services like nsNavHistory can be started, etc.

- nsRunnableMethod<nsBindingManager>::Run()
  ....
  nsDocument::UnblockOnload(int)
  is the most time consuming event. Any optimizations?

- Any other optimizations obvious from this output?
Yes, I think that increasing the jscomponentloader fastload timeout is quite reasonable... please submit that patch (here or elsewhere)

Lots of work under UnblockOnload is a side-effect of being done with the document, so that it calls into JS to fire the onload() DOM event, which then does the work of finishing up browser initialization. So, other than "make browser initialization faster or delay some things", no, there's no simple answer.

Are you testing this running *Firefox* on a mobile device, or some chrome-free browser? You almost certainly don't want to be using the Firefox UI code for this, since it does a lot of work with urlbar/tab initialization that doesn't make sense on mobile devices.

And, since you're using dtrace, are you profiling across JS calls? see bug 388564 for details on how you can trace entrances/exits from JS functions in dtrace, so that you can profile our UI code more effectively.

As for "place in the code that indicates completion of startup"... we have several notifications like this, depending on which phase of startup is complete. However, we explicitly want to initialize places in Firefox before we show the first window, so that the bookmarks toolbar is correct and you don't have it flash in "later". You may want to change this behavior in the mobile browser UI, since you almost certainly won't have a bookmarks toolbar. See http://developer.mozilla.org/en/docs/Observer_Notifications which has at least app-startup and final-ui-startup.
>> Are you testing this running *Firefox* on a mobile device...
Yes. I was using the trunk build of Firefox on Mac. Maybe I will try with the mobile chrome.

>> has at least app-startup and final-ui-startup.

No. These happen earlier than wanted. "xul-window-visible" sounds good. Is it a good place?
xul-window-visible is not "reliable"... it's quite possible for an application to launch and never open any windows. So you have to at least be careful that if you're starting system components, they fully start up even if the app doesn't open any windows.
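
The caveat above (deferred work must still run even if no window ever opens) can be illustrated with a small Python sketch. This is a language-neutral analogy, not Mozilla's observer service: the `DeferredStartup` class, the task names and the "fallback-timer" trigger are all hypothetical.

```python
class DeferredStartup:
    """Queue of startup tasks that run exactly once, triggered either by a
    window-visible notification or by a fallback (hypothetical design)."""

    def __init__(self):
        self._tasks = []
        self.started_by = None  # name of the trigger that ran the tasks

    def register(self, task):
        """Queue a callable; run it immediately if startup already finished."""
        if self.started_by is not None:
            task()
        else:
            self._tasks.append(task)

    def start(self, reason):
        """Run all queued tasks once. Returns True if this call ran them."""
        if self.started_by is not None:
            return False
        self.started_by = reason
        tasks, self._tasks = self._tasks, []
        for task in tasks:
            task()
        return True


log = []
deferred = DeferredStartup()
deferred.register(lambda: log.append("history"))
deferred.register(lambda: log.append("spellchecker"))

# Normal case: the window-visible notification triggers the deferred work.
deferred.start("window-visible")
# Fallback for an app that never opens a window: calling start() again
# (for example from a timer) is a no-op if the tasks already ran.
deferred.start("fallback-timer")
print(log, deferred.started_by)
```

The point of the design is that the fallback and the notification share one idempotent entry point, so a component is never left half-started just because no window appeared.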
Just to see the impact, I
- disabled the following in the app-startup category:
   - nsBlocklistService.js
   - nsSessionStartup.js
   - nsSessionStore.js
   - nsTryToClose.js
   - nsBrowserGlue.js
   - nsUpdateService.js
   - WebContentConverter.js

- increased mozJSComponentLoader::CloseFastload timeout to 15 seconds.
- Delayed "final-ui-startup" notification. I started a 5 sec timer in XRE_main () and from timer callback, I did "final-ui-startup" notification.

- Increased the hash table sizes (mentioned above)

With these changes, I observed at least 700 ms improvement on N800.
Intention was to identify any components/ services that can be avoided or delayed.
** filename in the output indicates that the IID was not found in xpti.dat. Any ideas?

This is with previously mentioned changes and --enable-chrome-format=flat
Content policy stuff (nsIContentPolicy) can probably be delayed until we actually load anything other than the about:blank page if we're clever.

We'd possibly need to still instantiate the nsIContentPolicy itself, but we shouldn't need to go looking for which content policy listeners are installed and instantiate those.
Can somebody point me to an extensive list of mozilla preferences? I am thinking we might be able to change defaults for mobile so it will be faster.
Delay initializations for nsIFormFillController & nsIAutoCompleteController. 
Also, things like SpellChecker can be initialized later.
List of preferences accessed during startup. The output is of the following format.
[type] : [count] : prefname
Previous attachment missed services created using GetServiceByContractID ()
Attachment #308887 - Attachment is obsolete: true
I wanted to see how much time can be saved by avoiding loading of unnecessary js files. So I deleted all .js files except nsDefaultCLH.js from bin/components/ dir.

The startup time came down to 4.15 sec from 6.6 sec on N800. (I have some other modifications, but I think this saved at least 1.5 sec).

This implies that all these js components can be loaded after the window is visible. Now I need to figure out how to postpone these.

NOTE: All the measurements (except when mentioned explicitly) are for touchscreen chrome on N800.
The extension manager initializes an entire large JS component on every startup. I have made experimental patches to split it up into two parts:

1.) A small piece that registers it for various notifications.
2.) The actual extension management piece.

I did both 1 and 2 in JS, but it might make sense to rewrite #1 in C++.
Used DTrace to get ustack, CID/Contract ID and IID on each component creation. Used DTrace to trace javascript stack too. Processed the output with Python to produce Native & JS stacks together in XML format for readability. Component creation statistics are also included. 

NOTES:
1. DTrace does not seem to produce the stacks in the order they are executed. For example, the output can contain a-b-c-d, a-e-f-h and then a-b-c-x. So, please do not use this output to analyze execution flow.

2. Reg. XML viewing: Firefox 2.0.x.x skips some output randomly. Firefox built from trunk shows it nicely and correctly. Safari is the worst. It shows only CDATA part and the output is worst. IE6 output is correct but not clean.
NOTES:
3. The function names might look strange. This is because I used mangled names in D script (DTrace does not work otherwise) and hence it produced the output in mangled form. Then I used c++filt and in my Python script I replaced all non XML tag characters (like :, (, etc..) with _ (underscore).
4. Explanation of output:
(2) Name : nsFileInputStream CTID : @mozilla.org/network/file-input-stream;1 IF : nsIFileInputStream
Means, this is the second time (2) nsFileInputStream class is created.
5. getService exec="fuelApplication.js:497" def="fuelApplication.js:495"
means, this function is defined at fuelApplication.js:495 and the current function call is from line 497
6. ExecStart exec="js:0" def="fuelApplication.js:1" is produced by D script probe: javascript*:::execute-start
7. func_0x1 & sktrace - tags like this are created by me to produce viewable XML.
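
A component-creation line in the format shown in note 4 can be picked apart with a regular expression. The sketch below assumes every such line follows exactly the layout of that one example; the pattern and the `parse_creation` helper are illustrative and were not part of the original scripts.

```python
import re

# Matches lines such as:
# (2) Name : nsFileInputStream CTID : @mozilla.org/network/file-input-stream;1 IF : nsIFileInputStream
CREATION_RE = re.compile(
    r"\((?P<count>\d+)\)\s*Name\s*:\s*(?P<name>\S+)\s+"
    r"CTID\s*:\s*(?P<contract_id>\S+)\s+IF\s*:\s*(?P<interface>\S+)"
)


def parse_creation(line):
    """Return a dict describing one component creation, or None if no match."""
    match = CREATION_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["count"] = int(info["count"])  # how many times this class was created so far
    return info


example = ("(2) Name : nsFileInputStream CTID : "
           "@mozilla.org/network/file-input-stream;1 IF : nsIFileInputStream")
print(parse_creation(example))
```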
Summary: Disable unnecessary startup work for mobile → Avoid/ Postpone unnecessary startup work for mobile
Could you attach the script used to produce attachment 313637 [details]? The DTrace toolkit from Sun contains a sample JavaScript dtrace script that shows time spent as well as a callgraph. A similar presentation would be really valuable here.
Following are my experiments so far in reducing startup time (on N800)
Following mozconfig options are used.
----
--disable-crashreporter
--disable-debug
--enable-optimize
--enable-strip
--disable-logging
--disable-inspector-apis
--disable-xpcom-obsolete
--disable-mochitest
--disable-accessibility
--disable-tests
--enable-chrome-format=flat
--disable-javaxpcom
----
With the latest xulrunner, the mobile chrome takes 15 sec to start for the first time after the device restart. (Dynamic Loader, populating page cache, building fastload files, etc.)

Warm start up is about 5.8 sec.

Then I removed things like HTML FormFill controller, Secure Browser UI, Browser Navigation History, Content Area Drag and Drop, etc. from browser.xml.

And I increased the initial hash table sizes (bug 423633) - Might not be a big contributor.

- 5.5 sec.

Removed all .js files from bin/components/ except nsDefaultCLH.js

- 4.60 sec.
 
Removed following .css files as these are observed to be loaded during startup.
(html.css, quirk.css, toolbarbutton.css, autocomplete.css, dropmarker.css, missingPlugin.css, forms.css, popup.css, scrollbars.css). I know that removing them is not the right way; I am just experimenting to improve startup time. Maybe I will find a way to load these later.

- 4.1 sec

Simplified ua.css and xul.css

- 3.70 sec

Commented out creation of DragService in nsPresShell.cpp, and creation of the SocketTransport Service, DNS service and Network Link Service in nsIOService.cpp

- 3.6 sec

Commented out hidden window creation, Startup Notifier and HAVE_DESKTOP_ID in nsAppRunner.cpp:XRE_main ()

- 3.34 sec

Removed popup.xml, listbox.xml, autocomplete.xml and disabled XREMOTE

- 3.1 sec.
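
The per-step savings implied by the measurements above can be tabulated with a short Python sketch. The times are copied from this comment (warm start on the N800); the step labels are abbreviations and the `savings` helper is only illustrative.

```python
# Warm-start times in seconds on the N800, copied from the comment above.
measurements = [
    ("baseline (latest xulrunner, mobile chrome)", 5.8),
    ("removed browser.xml features, larger hash tables", 5.5),
    ("removed .js components except nsDefaultCLH.js", 4.60),
    ("removed startup .css files", 4.1),
    ("simplified ua.css and xul.css", 3.70),
    ("commented out DragService and network services", 3.6),
    ("commented out hidden window, startup notifier, HAVE_DESKTOP_ID", 3.34),
    ("removed popup/listbox/autocomplete .xml, disabled XREMOTE", 3.1),
]


def savings(steps):
    """Return (label, seconds saved versus the previous step) for each step."""
    return [
        (label, round(prev_time - time, 2))
        for (_, prev_time), (label, time) in zip(steps, steps[1:])
    ]


# Print the steps ordered by how much time each one saved.
for label, saved in sorted(savings(measurements), key=lambda s: s[1], reverse=True):
    print(f"{saved:5.2f} s  {label}")
print(f"total: {round(measurements[0][1] - measurements[-1][1], 2)} s")
```

Ordered this way, removing the .js components is the largest single step (0.9 s), followed by the .css removals, with no one step accounting for most of the 2.7 s total.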

I experimented with mozconfig options I found over the internet, but they did not make a lot of difference. (--disable-pedantic, --disable-composer, --disable-profile-sharing --disable-profile-locking, --disable-permissions, --disable-mailnews?, --disable-gopher, --with-system-png, --with-system-jpeg, --with-system-zlib)

Also, I tried --enable-libxul, --enable-static --disable-shared some time back. These did not affect my startup time.

I have slightly older xulrunner code to which I applied dougt's configure.in patch for --with-embedded-profile=basic (bug 423277), and once I saw a startup time of 2.96 sec! But I think the basic profile is not going to be enough for mobile Firefox.

For the things that need to be delayed (socket transport service, etc.), I am thinking of doing it in delayedStartup () (bug 364304). For the things that should be avoided (JS console, etc.), I need suggestions on how to disable them for the mobile build. I am not sure how much of the .css, .xml and .js stuff I can disable or delay.

Any suggestions?
All the measurements are warm startup times except when explicitly mentioned.
(In reply to comment #22)
> Following are my experiments so far in reducing startup time (on N800)

This is cool stuff.  Do you have profiles before/after?  I think we're going to want to put some of those things back (browser navigation history is pretty important), so we'll want to know whether the cost is inherent in the activity, or if we can make it more efficient.

> With the latest xulrunner, the mobile chrome takes 15 sec to start for the
> first time after the device restart. (Dynamic Loader, populating page cache,
> building fastload files, etc.)

Hmm, if we're building fastload files every time the device restarts, we should look at that -- probably an easy fix, and could have a big effect on startup perf.  Good find!

> Any suggestions?

If we're spending a lot of time initializing things that you think we don't need to initialize, we should figure out where that time's going.  A set of profiles showing the progressions as you remove things would help us target effort well, I think.

Thanks for the digging!
> Do you have profiles before/after?
On the device, the only profile I can get easily is NS_TIMELINE, and I will generate those.

> Hmm, if we're building fastload files every time the device restarts, we should
> look at that

No. I removed .mozilla, copied the new xulrunner/dist/bin to device and measured. Only if there is no .mozilla, the fastload files are created (or if something in the chrome changes, etc.). It takes around 8 seconds if the fastload files are already there (if everything in .mozilla is valid). Sorry I forgot to mention this.

Using preloader solves the page cache population. I guess, something like prelink should be used to solve dynamic loader lag. 

> A set of profiles showing the progressions as you remove things would help us target effort well, I think.
 
Can you please give some details about what type of profiles will be useful? (oprofile, dtrace, NS_Timeline?) then I will generate the profiles.
I wouldn't think that prelink would help a _lot_, since we're only linked against libxul and a handful of other libraries.

If the effects you're seeing are mirrored on desktop (removing a given chunk of code gives roughly the same proportionate speedup), then getting profiles on a desktop with oprofile would probably be quite useful.

Are we targeting a given first-start-ever speed?  I suspect that we could figure out how to ship a fastload file, so even out-of-the-box device power-on wouldn't hurt, so I think we probably want to profile with fastload in place.
I tried oprofile (--no-vmlinux --image=xulrunner-bin), but the data does not seem to tell a lot. All I see is 
new 80%, 
nsHashTableET 20%.
(with opreport -l)

This might be due to the fact that on PC (Xeon, Dual processor), xulrunner takes less than a sec to show the mobile chrome.
I tried increasing the sampling rate by --event=GLOBAL_POWER_EVENTS:1000000 (default is 100000). Did not help. Also, I tried 6000, because the oprofiled manual suggests (excerpt):
 binary didn't run long enough

    Remember OProfile is a statistical profiler - you're not guaranteed to get samples for short-running programs. You can help this by using a lower count for the performance counter, so there are a lot more samples taken per second.
(In reply to comment #27)
> I tried oprofile (--no-vmlinux --image=xulrunner-bin), but the data does not
> seem to tell a lot. All I see is 
> new 80%, 
> nsHashTableET 20%.
> (with opreport -l)

Sounds like you might not have symbols available?

> This might be due to the fact that on PC (Xeon, Dual processor), xulrunner
> takes less than a sec to show the mobile chrome.

Indeed, slower machines are better for profiling.  Do you have a low-end ARM system available, perhaps?  Something closer to the device's end of the 15x spectrum between PC and device, at any rate.

> I tried increasing the sampling rate by --event=GLOBAL_POWER_EVENTS:1000000
> (default is 100000).

My oprofile is rusty, but I think that would _decrease_ the sampling rate, since you would be taking a sample every 1e6 events rather than every 1e5 events.
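
The relationship is easy to check with arithmetic: one sample is taken every `count` events, so a larger count means fewer samples per second. The Python sketch below assumes a purely hypothetical event rate of 2e9 events per second (roughly what a 2 GHz clock-cycle-like event would produce); the actual rate on the Xeon used here is not stated in this bug.

```python
def samples_per_second(events_per_second, count):
    """One sample is taken every `count` occurrences of the event."""
    return events_per_second / count


# Hypothetical event rate, for illustration only.
EVENT_RATE = 2_000_000_000

for count in (1_000_000, 100_000, 6_000):
    rate = samples_per_second(EVENT_RATE, count)
    print(f"count={count:>9,}: about {round(rate):,} samples/sec")
```

Under that assumption, moving from the default count of 100000 to 1000000 cuts the sampling rate tenfold, while a count of 6000 raises it by more than an order of magnitude.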

> Did not help. Also, I tried 6000 (because the oprofiled
> manual suggests (excerpt)

Sampling every 6000 clock cycles seems like it should give you quite a few samples!  How many total samples does oprofile report that it's taken in that sub-second period?   If you hack in a busy loop that runs for ~5 sec on the PC, does it show up with oprofile?  My suspicion is that you don't have symbols in place, possibly because your build configuration is stripping them, or possibly because you're not asking for them to be generated in the first place.

You'll want these in your config:
--enable-optimize
--enable-libxul
--enable-debugger-info-modules
--disable-debug


The statistics I'd really like to see are a comparative profile:

1) stock XULRunner, warm start with components already registered
2) stock XULRunner with only jsconsole-clhandler.js removed.

It might be worthwhile to introduce a timeline point/dtrace probe/sharksomething at
http://mxr.mozilla.org/mozilla-central/source/toolkit/components/commandlines/src/nsCommandLine.cpp#565 so that you can profile each command-line handler in isolation: "entry" contains the category entry name.

We do call into jsconsole-clhandler.js at startup, but it should short-circuit almost immediately: http://mxr.mozilla.org/mozilla-central/source/toolkit/components/console/jsconsole-clhandler.js#49

So that comparative analysis should give us a pretty good understanding of how much it costs to load a fastloaded JS component, and where we spend that time. I'm concerned that "removing JS components" is probably not a good long-term solution, and we should focus on improving those load times, perhaps through JS engine XDR or XPCOM fastload improvements.
> Sounds like you might not have symbols available?
I do have symbols. I compiled with -g option (--enable-optimize=" -g" and --disable-strip). I don't have --enable-debugger-info-modules. Is this mandatory?

> I think that would _decrease_ the sampling rate
Thanks. It was not clear for me from the documentation.

> so that you can profile each command-line handler in isolation: "entry"
> contains the category entry name.

Benjamin: Probably jsconsole-clhandler.js won't take a long time. But do we need it for mobile? Or does it need to be initialized before the window is visible? If not, I would like to unregister it from the "app-startup" notification and register it for the "window-visible" notification.

> I'm concerned that "removing JS components" is probably not a good long-term
> solution,....

I agree. I am not asking to remove all JS components, but trying to see which ones can be avoided for _mobile_, which ones can be created later (after window-visible) and which ones can be created on demand.

> and we should focus on improving those load times, perhaps through JS
> engine XDR or XPCOM fastload improvements.

I agree that all possible optimizations should be done to improve firefox on all platforms (devices, PCs, etc.)

I am not an expert, but after working on this for ~6 months, I think the return on investment by delaying or avoiding unnecessary stuff is much better/quicker than by optimization. 

Also, I think there is no single bottleneck or hotspot. Problem is a lot of stuff is being loaded. I guess the goal for desktop firefox was to initialize everything and be ready for all possibilities. The goal for mobile browser, in my mind, should be to do just enough work to make user experience better and do the rest in background or on demand.

It will take years for me to reach 3 sec startup time goal if I go the optimization route, I suppose ;-)
> Benjamin: Probably jsconsole-clhandler.js won't take a long time. But do we
> need that for mobile? Or Does it need to be initialized before the window is
> visible? If not, I would like to unregister it from "app-startup" notification
> and register for "window-visible" notification.

The jsconsole commandline handler isn't app-startup, I don't think. It's a command-line handler, which by nature has to be run during the startup sequence, because that's what tells you what windows to open! Now, you could delay the actual work of opening the JS console until the browser window is open (since the browser window is more important)... but that's not the issue here.

> I agree. I am not asking for removing all JS components, but trying to see
> which  ones can be avoided for _mobile_ and which ones can be created later
> (after window-visible) or which ones can be created on demand.

I am very worried about creating an entirely separate startup sequence and component set for mobile, which is where you seem to be going: there are lots of hidden pitfalls about components assuming other components are loaded, or that the profile is in a certain state, etc.  I think that we should try to keep the mobile platform as close to the desktop platform as we can, for ease of maintenance, shared development and testing, etc.  This is why I think we should be as specific as possible about profile data. If we can delay bringing up the network until the first browser window is open, we should do that in general, I think, not just for mobile (and we should write tests to make sure it stays that way).

> I am not an expert, but after working on this for ~6 months, I think the return
> on investment by delaying or avoiding unnecessary stuff is much better/quicker
> than by optimization. 

Perhaps so: but if that's the case, we should do it for desktop and mobile, and not go diverging the two without a firm understanding of what the cost/benefit tradeoff is.

The particular concern I have here is: it appears that we're not actually running much JS code in the case of the JSConsole command-line handler. If that very small amount of code is taking a long time to complete, we need to know that, hence the need for profiles.

> everything and be ready for all possibilities. The goal for mobile browser, in
> my mind, should be to do just enough work to make user experience better and do
> the rest in background or on demand.

There are two variables here:

* Time it takes to load component X
* The amount of time it takes for component X to do its duties

I'd really like to get profile data about these separately: if simply *loading* components takes time, we have to solve a very different architectural problem than simply delaying certain tasks until after a window is visible.
(In reply to comment #30)
> > Sounds like you might not have symbols available?
> I do have symbols. I compiled with -g option (--enable-optimize=" -g" and
> --disable-strip). I don't have --enable-debugger-info-modules. Is this
> mandatory?

Don't set --enable-optimize="-g" -- that turns off all the compiler optimization flags!

I don't know why you would only see two symbols, one of them "operator new", if you have the appropriate symbols in the resulting binary.  Does nm report a lot of symbols in the xulrunner binary?  Does oprofile give you results on a simple test program with busy loops in various functions?

> Benjamin: Probably jsconsole-clhandler.js won't take a long time. But do we
> need that for mobile? Or Does it need to be initialized before the window is
> visible? If not, I would like to unregister it from "app-startup" notification
> and register for "window-visible" notification.

The point is to find out what the overhead is of having a null JS component, so we know if it's fruitful to remove (and risk bad dependency interactions later, as history teaches us to fear) or simply short-circuit more aggressively.  That's why jsconsole-clhandler was chosen, as Benjamin explained: it does basically nothing, so we'll know if the issue is "work being done" or "work being checked-for", basically.

(The clhandler only makes sense before window-visible, though, to answer your question.)

> I am not an expert, but after working on this for ~6 months, I think the return
> on investment by delaying or avoiding unnecessary stuff is much better/quicker
> than by optimization. 

What is the investment required to delay these things?  Benjamin and I are telling you that we believe it to be very high, and that we want to know where the time is being spent so as to help direct effort most effectively.  Maybe we have to rewrite some of the early-start components in C++, if the cost of thunking to a component is inherently too high.

> Also, I think there is no single bottleneck or hotspot. Problem is a lot of
> stuff is being loaded. 

I don't know how you can tell that there is no single bottleneck or hotspot, or that the problem is "stuff being loaded" without measurement.  (It's very likely that we'll have to improve performance in a number of areas in order to meet the startup targets, which is why getting good data is critical to making good decisions about where to point the effort.)

> I guess the goal for desktop firefox was to initialize
> everything and be ready for all possibilities. The goal for mobile browser, in
> my mind, should be to do just enough work to make user experience better and do
> the rest in background or on demand.

No, that's the same goal we have with Firefox -- get to the ready-for-user-action state as quickly as practical, without sacrificing necessary functionality.  "Mobile" is also not just a single set of use cases, and making people fracture the XULRunner platform differently in each way is not where we want to go.

> It will take years for me to reach 3 sec startup time goal if I go the
> optimization route, I suppose ;-)

It's going to take even longer if we're hacking stuff out without understanding where the costs go, because the experimental matrix gets to be enormous.
bsmedberg, mshaver: Thanks for your insightful comments and correcting me.
I was not thinking of completely diverging for mobile. I thought doing this cleanly would be possible in most cases, maybe with new chrome (css, xul, xbl), config options, etc.

I understand analyzing profiles is important, and I am working on them.

> The jsconsole commandline handler isn't app-startup, I don't think

My mistake. You are right.

> Don't set --enable-optimize="-g"

Sorry. While writing comments here, I wrote like that, but I have other options like -O2, etc. Anyway, your suggestion is better.

> Does nm report a lot of symbols in the xulrunner binary?
Yes.
Thank _you_ for digging into this.  I think the results are going to be pretty great, we just need to understand the terrain a bit better before we start picking a path.
This is the output I got from opreport -l.

I did the following:
* opcontrol --no-vmlinux --image=<path-to-xulrunner-bin> --event=GLOBAL_POWER_EVENTS:6000 
* opcontrol --start-daemon

Then I wrote the following shell script so that the shutdown sequence is omitted from the profile data. xulrunner is run 100 times (see the script), since a single run did not produce much output.
---
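#!/bin/sh
# Profile only xulrunner startup: sample for about 2 seconds per launch,
# then kill the process so the shutdown sequence never reaches the profile.

export LD_LIBRARY_PATH=.

i=1
while [ $i -le 100 ]
do
    sudo opcontrol --start          # begin collecting samples
    sleep 1
    ./xulrunner-bin --app ~/tbrowser/application.ini &
    tpid=$!                         # PID of the backgrounded xulrunner
    sleep 2                         # let startup run while being profiled
    sudo opcontrol --dump           # flush collected samples to disk
    sudo opcontrol --stop           # stop profiling before the kill
    sleep 1
    kill -9 $tpid
    i=`expr $i + 1`
done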
---

I used the following mozconfig with xulrunner code (I did not make any changes)

----
. $topsrcdir/xulrunner/config/mozconfig

mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/obj-dbg
ac_add_options --disable-crashreporter
ac_add_options --disable-debug
ac_add_options --enable-optimize
ac_add_options --enable-libxul
ac_add_options --enable-debugger-info-module
ac_add_options --disable-strip
ac_add_options --disable-logging
ac_add_options --disable-inspector-apis
ac_add_options --disable-xpcom-obsolete
ac_add_options --disable-mochitest
ac_add_options --disable-accessibility
ac_add_options --disable-tests
ac_add_options --enable-chrome-format=flat
ac_add_options --disable-javaxpcom
ac_add_options --disable-printing
----

After the build was done, I ran tar -chzf bin.tgz bin in xulrunner/objdir/dist/, then tar -xzf bin.tgz somewhere else, and ran the script from that bin/ directory, just to make sure that all the .so files are in this directory.

I see a lot of [heap] (...) (no-symbols). How to fix this?
Is this type of data useful or should I archive and upload the whole oprofile session data for each run?
Thanks to my colleague Guoxin Fan for generating oprofile results. Attached zip file contains the following.
- readme.txt : Explains the changes made in each step (test1, test2, etc..) and the steps followed to produce this output.
- optchrome : Modified files
- t0..t10 : Directories which contain the profile results for test0.. test10 respectively. Guoxin ran the oprofile 3 times for each test.

With --enable-debuginfo-modules, oprofile was not displaying all the symbols, so -g is used.

As the attachment size is limited to 512KB, I split the output into 2.
It probably makes sense to link the xpt files:
http://www.mozilla.org/scriptable/typelib_tools.html#xpt_link
Product: Firefox → Toolkit
Blocks: 431824
tracking-fennec: --- → 1.0b2+
(In reply to comment #31)
> I'd really like to get profile data about these separately: if simply *loading*
> components takes time, we have to solve a very different architectural problem
> than simply delaying certain tasks until after a window is visible.

A small note related to loading times and Symbian OS, a platform Mozilla is also being ported to (https://wiki.mozilla.org/Mobile/Symbian).

On Symbian OS binary executables are loaded entirely in memory before they are executed. This applies also to the entire static dependency tree of a component. Time spent loading code is a typical start-up time issue on Symbian OS, also for small components with big dependency trees.

An effective way to tackle the problem (in Symbian OS versions prior to 9.4, see below) is to break the components into a part required during start-up and a part required later. Static dependencies can be broken with dynamic on-demand component loading. Although the quote above seems to be related to Javascript, the similarity is that reducing the amount of code *loaded* on Symbian OS is a very different architectural problem to solve than just limiting the amount of code *executed* during start-up.

Note that Symbian OS v9.4 will provide (still limited) support for on-demand paging of executable code: http://developer.symbian.com/main/downloads/papers/whatsnew9.4/What's_new_for_developers_v9.4.pdf
tracking-fennec: 1.0b2+ → 1.0+
We don't have any real actionable tasks in this bug. We have lots of great profile info though.

Vote for not blocking 1.0.
removing blocking flags from meta bugs
tracking-fennec: 1.0+ → ---
Keywords: meta
Closing this bug. The N810 is no longer a supported platform for Fennec. We can, and should, reopen any specific startup perf tasks in new bugs for current platforms.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → INCOMPLETE