Closed Bug 629513 Opened 11 years ago Closed 3 years ago

[meta] improve page load time

Categories

(Firefox for Android Graveyard :: General, defect, P1)

ARM
Android
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: tarend, Unassigned)

References

()

Details

(Keywords: meta, mobile, perf)

Attachments

(5 files, 2 obsolete files)

Comparing launch time - see video at http://bit.ly/gjD3Kk (left: B3 right:B4)
(1) Set up two identical Nexus S devices, one with Beta 3, the other with Beta 4
(2) Set both browser start pages to http://www.mozilla.com/en-US/m/
(3) cold-restart browser on both devices

Result: Beta 4 brings up chrome a little faster but seems to take longer to render the page. Overall, from tap of the icon to showing content, Beta 3 is slightly faster.
Expected: Beta 4 launches and renders first page significantly faster than Beta 3

Can we do some detailed performance profiling?
Is this first run? or have you let each device create a profile already? We haven't done much for first run in this dev cycle.
This is not first run. Profiles were created on both phones before. I saw some variation in my tests (of course, this is not an exact method to measure performance) - in most cases the Beta 3 version showed the web page faster, as can be seen in the video. Only sometimes both versions seemed to show the web page at about the same time.
tracking-fennec: --- → ?
tracking-fennec: ? → 2.0b5+
Summary: Launch time of Beta 4 has not improved compared to Beta 3 → page load time of Beta 4 has not improved compared to Beta 3
I will get ipc profiling (bug 596725) to work on device and see what that shows.
Assignee: nobody → azakai
Bug 632005 landed and gave us a 14.6% Tp improvement. The improvement came from a reduction in ipc STATE messages and a reduction in XPConnect calls into a JS implementation of a webprogress listener.

I have been looking for more wins, but haven't found any yet.

Bug 632037 has some general optimization changes that improve Tp by another ~15%.
Attached file startup ipc profiling
Startup IPC profiles (2 runs).

Not sure there is anything useful here. The latency just shows the child is busy while the parent floods it with stuff to do.
Attachment #511237 - Attachment mime type: application/octet-stream → text/plain
Some IPC profiling data for loading google.com (the nonmobile version).
IPC profile data for doing a google search for 'mozilla' in the awesomebar. Result is a google mobile page.
In google.com, we suffer

PStorageChild|Msg_GetValue [SYNC/RPC]    total latency:     0.0149    total processing:     0.6279

that is, 0.63 seconds are spent processing web storage. If we disable web storage, this would go away.

Otherwise nothing seems to jump out of the data.
alon, could you create a followup to disable storage.
(In reply to comment #9)
> alon, could you create a followup to disable storage.

Whoa - let's not be hasty. What storage are we talking about?
What is this about:

PHttpChannelParent|Msg_OnStatus               total latency:     8.5068    total processing:     0.0199    total: 45:45:45
PHttpChannelParent|Msg_OnProgress               total latency:     1.8206    total processing:     0.0126    total: 4:4:4

They are at the top of the list and it jumps out at me
(In reply to comment #9)
> alon, could you create a followup to disable storage.

We can do that in bug 627635, where discussion about disabling web storage already took place. I posted about the data from here in that bug.

Disabling web storage is very easy, it's just a matter of deciding to do so. (Or we can just pref it off by default.)
(In reply to comment #11)
> What is this about:
> 
> PHttpChannelParent|Msg_OnStatus               total latency:     8.5068   
> total processing:     0.0199    total: 45:45:45
> PHttpChannelParent|Msg_OnProgress               total latency:     1.8206   
> total processing:     0.0126    total: 4:4:4
> 
> They are at the top of the list and it jumps out at me

These do have a lot of latency, but the definition of latency is "*total* of |amount of time between sending the message, to starting to do something about it on the other side|".

The reported amount should possibly be 'average latency' and not total. Average latency for the first is 0.18, for the second 0.45. That is still significant, and we should perhaps look more into this, but I suspect it is just that there is a burst of HTTP traffic, and a lot of messages end up overlapping.
Attached image parent IPC traffic
Here is a picture of IPC traffic (parent process), just to clarify the latency issue. Each line is a single message. As you can see, there is a lot of overlap.
On simple sites, like google.com, I don't see much OnProgress and OnStatus message traffic. But on CNN or Yahoo, I see several hundred message sent.

I still can't determine if removing the messages makes the Tp any faster. My tools are failing me at the moment.
Filed bug 633442 to try to reduce the messages. I found removing them yields a 8.7% Tp improvement
Attached file parent event profiling (obsolete) —
Here is some preliminary data from a new event profiling patch I am working on. Patch is at http://hg.mozilla.org/users/azakai_mozilla.com/patches/file/tip/enventigating_no_rtti

The patch automatically rewrites the C++ codebase so that we can identify all runnables (and, partially, events). The output shows what runnables/events took up our time.

This file is for the parent process, and is for warm startup + load wikipedia's 'wiki' page.

Notes:

* main|side means which thread the event/runnable ran in.

Caveats:

* No checking for recursion. If one runnable ends up running additional runnables inside it, its total time will include its children.
* nsTimerEvent is not very informative yet, since we don't know which timer event is active here. I will try later to use my wakeup patch to help with that.
* DoWorkRunnable is a runnable that does IPC stuff. So pretty much all IPC stuff will look like this. But we have separate profiling data for IPC data.
* Only events/runnables that take longer than 0.01 seconds are noticed.
* Events are identified by their numerical identifiers, not string names yet.
Attached file child event profiling (obsolete) —
Child process stuff.
Moving this investigation to blocking final. We have got 2 improvements from this work in for beta 5, which is good. Hopefully, more will come.
tracking-fennec: 2.0b5+ → 2.0+
Attached file Profiling data
More raw profiling data than you can shake a stick at. Includes timer identifications.
Attachment #511833 - Attachment is obsolete: true
Attachment #511834 - Attachment is obsolete: true
Some commentary on this data:

1. Sync spends ~0.5 seconds on first startup, and ~0.1 seconds on warm startup later, on the main thread. This is without setting anything up for sync.

2. SessionStore takes some time in first startup and subsequent startups, sometimes quite a lot (2.3 seconds in one case!), but mostly 0.1 seconds or so. On every page load in the parent process, however, we spend some ~0.05 seconds in SessionStore (and sometimes significantly more), in the main thread.

3. |./dom/base/nsJSEnvironment.cpp:3591| and |./content/events/src/nsEventStateManager.cpp:809| are the GC timer. This takes ~0.3 secs on first startup, then 0.1-0.3 seconds every few page loads (not every one) and on first warm startup.

4. First warm startup after first run is not very fast. Subsequent warm startups are much faster. Visually, one difference is that the "Add-ons for your Fennec" thingie spins for a long time here. In the profiling data, |./dom/base/nsGlobalWindow.cpp:8826| is high, which is a setTimeout or Interval. Does the Addons thingie do a lot of stuff in a timeout? If so can we make it a web worker?

5. AsyncAssociateIconToPage and InsertVisitedURIs can take a second or more in some cases, less (or even 0) in others. At least they are on a side thread, but still a lot of time is spent in these two.

6. icanhascheezburger.com: Very intensive website. The parent process actually has little activity in it - no more than other websites - which is very good, and shows how responsive the browser is. In the child process, almost all the work is in PaintTask.
> 1. Sync spends ~0.5 seconds on first startup, and ~0.1 seconds on warm startup
> later, on the main thread. This is without setting anything up for sync.

I'm having trouble building for Android now for some reason, but on Linux desktop built today, I don't see sync anymore (unless I do something with it), confirming what dougt said to me on IRC. So it seems this is not an issue.
FWIW:

Runnable|./mobile/components/SessionStore.js:378

This is where the sessions store does a delayed write to disk of the current session. I'm not sure why it's triggered at startup. At most, it should wait 2 secs before writing out the state. If a new state request is triggered while we're waiting to write out the state, we ignore it.
Making this a meta bug for page load issues
tracking-fennec: 2.0+ → ---
Summary: page load time of Beta 4 has not improved compared to Beta 3 → [meta] improve page load time
Keywords: meta, mobile, perf
Blocks: 459117
Assignee: azakai → nobody

No activity for 8 years, let's close it.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.