Closed Bug 1038176 Opened 10 years ago Closed 10 years ago

[Meta] SMS app launch latency regressed in v2.0

Categories

(Firefox OS Graveyard :: Gaia::SMS, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:2.0+, b2g-v2.0 fixed, b2g-v2.0M fixed, b2g-v2.1 fixed, b2g-v2.2 fixed)

RESOLVED FIXED
2.1 S4 (12sep)

People

(Reporter: tkundu, Assigned: julienw)

References

Details

(Keywords: meta, perf, regression, Whiteboard: [caf priority: p2][c=regression p= s= u=2.0] [CR 694012] )

Attachments

(1 file)

SMS app launch latency has regressed compared to v1.3.

For v1.3:
gaia: https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gaia/commit/?h=mozilla/v1.3&id=e08beb0297aff472b5c84437e9d7eaf8c0400a29
gecko: https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gecko/commit/?h=mozilla/v1.3&id=e0bfab94fd20e4dcbb2fecf19d7c2dc606189f81




For v2.0:
gaia: https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gaia/commit/?h=mozilla/v2.0&id=ef67af27dff3130d41a9139d6ae7ed640c34d922
gecko: https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gecko/commit/?h=mozilla/v2.0&id=d3eae03cdf4e6944e646d05938a22dc1380a0d95


Test Steps:
1) Flash the v1.3 and v2.0 builds on two Flame (512 MB) devices. Launch the Messages app on both and compare side by side.

2) Launch latency = touchdown to displaying the Messages view. The Messages app contains 80 messages.
   In v1.3 the launch latency is 694 ms.
   In v2.0 the launch latency is 1694 ms.
blocking-b2g: --- → 2.0?
Whiteboard: [CR 694012]
NI :mlee here as well to get started with investigation.
Flags: needinfo?(mlee)
Keywords: perf
blocking-b2g: 2.0? → 2.0+
David,
Can you have someone on the SMS app team look into this?
Flags: needinfo?(mlee) → needinfo?(dscravaglieri)
Whiteboard: [CR 694012] → [c=regression p= s= u=2.0] [CR 694012]
Here's the data we've gathered using Eideticker:


----- Forwarded Message -----
From: "William Lachance" <wlachance@mozilla.com>
Subject: Re: Regressions in App startup time in 2.0

Ok did a run, results below. Videos, etc. at

http://people.mozilla.org/~wlachance/eideticker-startup-test/

(click through to the index page, then click on the bars for videos and an
option to dig into the detail of what we're measuring... e.g.

http://people.mozilla.org/~wlachance/eideticker-startup-test/14-results/contacts-startup-14/#/timetostableframe
->
http://people.mozilla.org/~wlachance/eideticker-startup-test/14-results/contacts-startup-14/detail.html?id=90d314340b8f11e4b65210ddb19eacac#/framediffsums)

Remember this is a comparison between 1.4 and 2.0, so not the same as
what Qualcomm was measuring.

Still:

1. We see a small regression with contacts
2. An *improvement* in settings
3. An *improvement* in camera.

So not what they were seeing.

Hope this helps and let me know if you need more help interpreting this
data,

Will

----

v2.0 aurora (build on 14 July):

=== Results on b2g-contacts-startup for FirefoxOS ===
   Times to first stable frame (seconds):
   [1.8833333333333333, 1.95, 1.9, 1.9833333333333334, 1.8666666666666667]

=== Results on b2g-messages-startup for FirefoxOS ===
   Times to first stable frame (seconds):
   [4.683333333333334, 4.466666666666667, 4.633333333333334,
4.466666666666667, 4.55]

=== Results on b2g-settings-startup for FirefoxOS ===
   Times to first stable frame (seconds):
   [2.316666666666667, 2.1333333333333333, 2.2666666666666666, 2.1,
2.283333333333333]

=== Results on b2g-camera-startup for FirefoxOS ===
   Times to first stable frame (seconds):
   [2.6333333333333333, 2.2333333333333334, 2.95, 2.966666666666667, 2.2]

v1.4 (build on 14 July):

=== Results on b2g-contacts-startup for FirefoxOS ===
   Times to first stable frame (seconds):
   [1.8, 1.65, 1.65, 1.8166666666666667, 1.5666666666666667]

=== Results on b2g-messages-startup for FirefoxOS ===
   Times to first stable frame (seconds):
   [4.016666666666667, 4.0, 4.133333333333334, 4.133333333333334,
3.9833333333333334]

=== Results on b2g-settings-startup for FirefoxOS ===
   Times to first stable frame (seconds):
   [2.6166666666666667, 2.5, 2.8666666666666667, 2.1333333333333333, 2.6]

=== Results on b2g-camera-startup for FirefoxOS ===
   Times to first stable frame (seconds):
   [3.3, 2.933333333333333, 3.0, 2.8833333333333333, 2.816666666666667]
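As a sanity check on the raw Eideticker numbers above, the per-app means can be computed with a few lines of JavaScript (a throwaway sketch; the arrays are copied from the b2g-messages-startup results, the one app that clearly regressed in this data set):

```javascript
// Mean time-to-first-stable-frame, using the b2g-messages-startup arrays above.
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

const messagesV20 = [4.683333333333334, 4.466666666666667, 4.633333333333334,
                     4.466666666666667, 4.55];
const messagesV14 = [4.016666666666667, 4.0, 4.133333333333334,
                     4.133333333333334, 3.9833333333333334];

// v2.0 averages ~4.56 s vs ~4.05 s on v1.4: a ~0.5 s regression for
// Messages, even though settings and camera improved.
const regression = mean(messagesV20) - mean(messagesV14);
```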
Could you please retry with the 'Test Steps' I mentioned in comment 0?

The delay should be visible to the naked eye if you compare the v1.3 and v2.0 devices side by side.
Flags: needinfo?(mlee)
Tapas, can you please give us the start and end points for your measurements?

We _did_ change how the startup works, and we know this makes things _look_ worse in automated measurements, but in real life it makes things (slightly) better: the data is displayed earlier. See bug 1002847 comment 1 and bug 1002847 comment 7 for more information.

Basically, I think the meaningful time is from "tap on the icon" to "display the first thread", and _this_ time should be equal or slightly better on average. Moreover, the new behavior should show the empty white screen for a much shorter time.

Tapas, can we have the side-by-side video you talked about?

Note that we do have plans to make this a lot better in v2.1; we did the background work during the v2.0 timeframe. If we need to improve this in v2.0, we can, but since it would be risky I'd rather not.

Renominating 2.0? until we get a clearer understanding of the issue.
blocking-b2g: 2.0+ → 2.0?
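For reference, the measurement suggested above ("tap on the icon" to "display the first thread") can be captured with explicit marks. This is only a sketch; `makeTimeline` and the mark names are made up here, and the clock is injected for testability (in an app you would pass `performance.now.bind(performance)`):

```javascript
// Minimal user-timing-style recorder: mark named instants, then measure
// the elapsed time between any two marks.
function makeTimeline(now) {
  const marks = {};
  return {
    mark(name) { marks[name] = now(); },
    measure(from, to) { return marks[to] - marks[from]; }
  };
}

// Usage sketch: mark 'tap' in the homescreen launch handler and
// 'firstThread' right after the first thread node is appended.
```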
(NI Tapas for the questions in previous comment)
Flags: needinfo?(tkundu)
Echoing Julien's question about the profiling method, because we have no idea where these launch-time figures come from. We have app launch-time profiling in Datazilla; the environment is different, but I can't see this tremendous latency (up to 1 s) between the two branches:

Message app cold-load time in v1.3:
https://datazilla.mozilla.org/?start=1394523254&stop=1405405587&product=B2G&repository=v1.3&test=messages&page=cold_load_time&project=b2g

Message app cold-load time in v2.0:
https://datazilla.mozilla.org/?start=1404800787&stop=1405405587&product=B2G&repository=v2.0&test=messages&page=cold_load_time&project=b2g
On flame:
https://datazilla.mozilla.org/b2g/?branch=v2.0&device=flame&range=90&test=cold_load_time&app_list=marketplace,messages,sms&app=messages&gaia_rev=2c6c413ed729d465&gecko_rev=d32649a24965&plot=avg

Note that we actually collect more performance-testing events now, but v1.3 only has cold-load time for comparison. If we want to compare the time until messages are displayed, maybe we can check the 'above-the-fold-ready' event for a more accurate result?
(In reply to Tapas Kumar Kundu from comment #4)
> Could you please try with 'TestSteps' which I mentioned in Comment 0. 
> 
> Delay should be visible by open eyes if you compare side by side v1.3 and
> v2.0 devices.

Tapas,

You can see videos with frame by frame stills of the Messages app launch process by clicking on any of the bars in the charts for 1.4 at:

http://people.mozilla.org/~wlachance/eideticker-startup-test/14-results/messages-startup-14/#/timetostableframe

and 2.0 at:

http://people.mozilla.org/~wlachance/eideticker-startup-test/20-results/messages-startup-20/#/timetostableframe

Click the "Detail view" link below the video preview to see individual frames. Is this not matching what your tests are seeing?
Flags: needinfo?(mlee)
Hi Mike,

The Eideticker videos look interesting. Can you please load the same SMS database (80 messages) on both v1.3 and v2.0 devices and get Eideticker results for that? Please also post the exact Gaia and Gecko revisions you are using for this test.


We saw a regression between v1.3 and v2.0 (see comment 0). I will also try to get a side-by-side comparison video for you.
Flags: needinfo?(tkundu) → needinfo?(mlee)
Hi Tapas,

Please post your comparison videos and respond to Julien's comment 5. We don't have FxOS 1.3 Flame builds that support the automation used to capture the data presented by Eideticker. 1.4 is the earliest version for which we have this support.

Thanks,
Mike
Flags: needinfo?(mlee) → needinfo?(tkundu)
Severity: blocker → normal
Priority: P1 → P2
Whiteboard: [c=regression p= s= u=2.0] [CR 694012] → [c=regression p= s= u=] [CR 694012]
Here is my side by side comparison: https://www.youtube.com/watch?v=LlfUCCNl4AE

Please note that my test environment is a Flame with 1 GB RAM and the latest Mozilla build. We can see a delay from tapping the icon to the start of the app animation, and the animation itself is also slightly longer on v2.0. In the Messages app, the duration from app startup to the first thread showing up seems to take more time too, but the overall latency should not have increased by as much as 1 s. I can revisit the possible bottlenecks in master (although we made some performance improvements after 1.4, we also added lots of features compared to 1.3), and I'm also flagging qawanted for more details on the exact same test environment.
Keywords: qawanted
The average time (from DOM loaded to the first batch of threads rendered in the view) for the two branches:
v1.3 : 554ms | master : 978ms

There is indeed some extra latency in master, but the gap isn't as huge as the one the reporter measured. Details for each part:

From dom loaded to module init complete:
v1.3 : 83ms  | master : 173ms

Some regression in initialization is understandable because we now have more things to set up, but we may need more inspection to decide whether this ~100ms delay is reasonable.

From getThread to start rendering:
v1.3 : ~50ms | master : ~200ms

I'll investigate the regression from getThreads to the start of rendering first, because we expected an improvement in this part, not a regression.

Another noticeable part is thread appending: it takes 40~50 ms per thread in v1.3 but has increased to 70~120 ms in master.
Hi Bevis,
After some inspection of the getThreads part, I suspect the bottleneck is in the mozMobileMessage.getThreads API. I narrowed the scope down to the window from setting up the cursor to the success callback returning, and the latency is evident: on v1.3 it only takes around 50 ms, but it needs 200 ms on master. Another interesting discovery: when I clear the message DB, it still takes nearly 200 ms on master, whereas on v1.3 it is less than 5 ms. Do you have any idea about this regression?
Flags: needinfo?(btseng)
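The "cursor set to success callback" window described above can be instrumented with a small wrapper. This is a sketch only: `timeFirstResult` is a hypothetical helper, and in the app the factory argument would be something like `() => navigator.mozMobileMessage.getThreads()`; the clock is injected so the helper stays testable:

```javascript
// Times the gap between requesting a DOMCursor-style object and its first
// onsuccess firing.
function timeFirstResult(makeCursor, now, report) {
  const start = now();
  const cursor = makeCursor();
  cursor.onsuccess = function () {
    if (report) {
      report(now() - start); // latency of the first record only
      report = null;         // ignore subsequent results
    }
  };
  return cursor;
}
```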
Hey Vicamo,

Since Bevis is away, can you have a look to comment 12 and comment 13 here?
Flags: needinfo?(vyang)
Flags: needinfo?(btseng)
(In reply to Steve Chung [:steveck] from comment #12)

> Another noticeable part is thread appending. It's 40~50 ms for each in v1.3
> but increased up to 70~120ms in master.

It would be useful to know whether this differs between master and v2.0, because it could be a result of the new l10n library using Mutation Observers (in master only) and of the fact that we're not very smart when inserting threads. We did measure this before landing those changes, though; maybe we measured badly.
Something else to look at: the WindowManager patch [1] that landed in bug 1028002 would have lengthened the duration of the splash screen.

[1] https://github.com/mozilla-b2g/gaia/commit/0b42b82e00e4aff63ab0aacba10b1e923cf6e77e
blocking-b2g: 2.0? → 2.0+
Severity: normal → blocker
Priority: P2 → P1
Whiteboard: [c=regression p= s= u=] [CR 694012] → [c=regression p= s= u=2.0] [CR 694012]
Assigning this to you, Steve, since you're actively investigating it. If Vicamo or Bevis are better owners, please reassign to them. We need owners for all 2.0 blockers.
Assignee: nobody → schung
Status: NEW → ASSIGNED
(In reply to Mike Lee [:mlee] from comment #17)
> Assigning this to you Steve since you're actively investigating this. If
> Vicamo or Bevis are better owners please reassign to them .

We'll see after Vicamo's answer.

> We need owners
> for all 2.0 blockers.

Unnecessary: the bug is moving forward without an assignee here, so what's the point in assigning someone?
Julien,

This may seem unnecessary to you, but there's a significant amount of time being spent by b2g-drivers tracking these releases. Each issue without an assignee requires someone to click into it and read through multiple comments to get a picture of where the issue stands and whom to contact if more info is needed. This doesn't scale given the volume of issues being tracked. Setting an assignee makes it clear at a high level who owns an issue at its current stage, whether that's investigation or implementation.
(Master branch) In summary, when returning a thread object:

child DOM =(10 ms)=> parent service
          =(10 ms)=> cursor ready
          =(10 ms)=> native thread object created and sent to IPDL
          =(100 ms)=> child process received that notification
          =(70 ms)=> DOMCursor::FireSuccess()

It takes 200 ms for the first record to arrive in Gaia. Of that, 100 ms is contributed by the context switch when sending the message back from the parent process, and another 70 ms by DOMCursor::FireSuccess() alone!

======= Raw log prints ========
07-18 11:57:12.141 I/Gecko   ( 1302): MobileMessageManager::GetThreads begin
07-18 11:57:12.151 I/Gecko   ( 1198): MobileMessageDB: Getting thread list
07-18 11:57:12.151 I/Gecko   ( 1302): MobileMessageManager::GetThreads end
07-18 11:57:12.151 I/Gecko   ( 1198): MobileMessageDB: newTxn ready
07-18 11:57:12.161 I/Gecko   ( 1198): MobileMessageDB: threadStore cursor.onsuccess
07-18 11:57:12.161 I/Gecko   ( 1198): MobileMessageDB: GetThreadsCursor::getThreadTxn createThread
07-18 11:57:12.171 I/Gecko   ( 1198): MobileMessageDB: GetThreadsCursor::getThreadTxn notifyCursorResult
07-18 11:57:12.171 I/Gecko   ( 1198): MobileMessageCursorParent::NotifyCursorResult
07-18 11:57:12.171 I/Gecko   ( 1198): MobileMessageCursorParent::SendNotifyResult
07-18 11:57:12.171 I/Gecko   ( 1198): MobileMessageDB: threadStore cursor.onsuccess done
07-18 11:57:12.271 E/GeckoConsole( 1302): Content JS LOG at app://sms.gaiamobile.org/js/message_manager.js:118 in mm_getThreads: Gaia: begin calling getThreads
07-18 11:57:12.271 E/GeckoConsole( 1302): Content JS LOG at app://sms.gaiamobile.org/js/message_manager.js:120 in mm_getThreads: Gaia: end calling getThreads
07-18 11:57:12.271 I/Gecko   ( 1302): MobileMessageCursorChild::RecvNotifyResult
07-18 11:57:12.271 I/Gecko   ( 1302): MobileMessageCursorCallback::NotifyCursorResult
07-18 11:57:12.271 I/Gecko   ( 1302): MobileMessageCursorCallback::NotifyCursorResult FireSuccess
07-18 11:57:12.341 I/Gecko   ( 1198): MobileMessageCursorParent::NotifyCursorDone
07-18 11:57:12.341 I/Gecko   ( 1302): MobileMessageCursorCallback::NotifyCursorResult done
07-18 11:57:12.642 E/GeckoConsole( 1302): Content JS LOG at app://sms.gaiamobile.org/js/message_manager.js:130 in onsuccess: Gaia: getThreads cursor.onsuccess
07-18 11:57:12.662 I/Gecko   ( 1302): MobileMessageCursorChild::Recv__delete__
07-18 11:57:12.662 I/Gecko   ( 1302): MobileMessageCursorCallback::NotifyCursorDone
07-18 11:57:12.672 I/Gecko   ( 1302): MobileMessageCursorCallback::NotifyCursorDone done
07-18 11:57:12.892 E/GeckoConsole( 1302): Content JS LOG at app://sms.gaiamobile.org/js/message_manager.js:130 in onsuccess: Gaia: getThreads cursor.onsuccess
Flags: needinfo?(vyang)
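The IPC gaps quoted above come from subtracting logcat timestamps; a small helper (a sketch, not part of any existing tooling) makes this mechanical:

```javascript
// Extracts the HH:MM:SS.mmm timestamp from a logcat line and converts it
// to milliseconds since midnight, so two lines can be diffed directly.
function logTimeMs(line) {
  const m = line.match(/(\d{2}):(\d{2}):(\d{2})\.(\d{3})/);
  if (!m) return null;
  const [, h, min, s, ms] = m.map(Number);
  return ((h * 60 + min) * 60 + s) * 1000 + ms;
}

// Millisecond delta between two logcat lines (assumes same day).
function deltaMs(earlier, later) {
  return logTimeMs(later) - logTimeMs(earlier);
}
```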
Hi, everyone,

Here are the results I get from Gaia-UI-test with high speed camera.
It is similar to eideticker.

@ Steps:
  1. Launch the message app
  2. Calculate the time from first splash frame to the first message displayed.

@ Data:
  200 message

@ Flame1GB Memory
  + v2.0
   1. 0.91
   2. 0.93
   3. 0.95
   4. 0.93
   5. 0.93
   6. 0.98
   Average: 0.938 seconds

  + v1.3
   1. 0.63
   2. 0.61
   3. 0.65
   4. 0.6
   5. 0.63
   6. 0.64
   Average: 0.626 seconds

@ Flame 512 MB Memory
  + v2.0
   1. 1.02
   2. 1.00
   3. 1.04
   4. 1.03
   5. 1.02
   6. 1.07
   Average: 1.03 seconds

  v1.3
   1. 0.67
   2. 0.63
   3. 0.68
   4. 0.67
   5. 0.71
   6. 0.66
   Average: 0.67 seconds

@ Build information:
 + v2.0
  - Gaia      8a8622d58af2c24ef8df1c7fc7aa9cda50ac0085
  - Gecko     https://hg.mozilla.org/releases/mozilla-aurora/rev/05c9d0a949bc
  - BuildID   20140717160202
  - Version   32.0a2

 + v1.3
  - Gaia      e1b7152715072d27e0880cdc6b637f82fa42bf4e
  - Gecko     e181a36ebafaa24e5390db9f597313406edfc794
  - BuildID   20140616171114
  - Version   28.0

@ Demo video:
 1. V2.0
  - https://www.youtube.com/watch?v=aLeLxUZnyqs&list=UUtUMK--rGTCnUDybexmp4aA
 2. V1.3
  - https://www.youtube.com/watch?v=6SkrtHopfy8&list=UUtUMK--rGTCnUDybexmp4aA
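A quick check of the stopwatch data above (a throwaway script; the numbers are copied from the Flame 1 GB runs): the computed averages match the reported ones, and v2.0 comes out roughly 1.5x slower than v1.3:

```javascript
const mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length;

// Flame 1 GB runs copied from the tables above (seconds).
const v20 = [0.91, 0.93, 0.95, 0.93, 0.93, 0.98];
const v13 = [0.63, 0.61, 0.65, 0.6, 0.63, 0.64];

const slowdown = mean(v20) / mean(v13); // ~1.5x slower on v2.0
```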
On v1.3, I still observed a 90ms IPC delay and DOMCursor::FireSuccess() takes 50ms.
(In reply to Mike Lee [:mlee] from comment #19)
> Julien,
> 
> This may seem unnecessary to you but there's a significant of time being
> spent by b2g-drivers tracking these releases. Each issue without an assignee
> requires someone to click into it and read through multiple comments to get
> a picture of where the issue is and who to contact if more info is needed.
> This doesn't scale when looking at the volume of issues being tracked.
> Setting an assignee makes it clear at a high level who owns an issue at its
> current stage whether that's investigation or implementation.

For blockers, if the bug is unassigned, the component owner owns the issue. There is no need to read several comments to know who the component owner is.
Keywords: qawanted
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #20)
> Takes 200ms for the first record to arrive Gaia. Among them, 100ms is
> contributed by the context switch when sending message back from parent
> process, and another 70ms contributed by DOMCursor::FireSuccess() alone!

Last time I checked the mozMobileMessage Gecko code it looked pretty inefficient, creating an entire IndexedDB transaction per cursor iteration which is what you're seeing in the log too.
No longer blocks: CAF-v2.0-FC-metabug
Vicamo, per Comment 24, it seems more investigation is needed on mozMobileMessage on Gecko
Flags: needinfo?(dscravaglieri) → needinfo?(vyang)
(In reply to David Scravaglieri [:scravag] from comment #25)
> Vicamo, per Comment 24, it seems more investigation is needed on
> mozMobileMessage on Gecko

That's not true.
Flags: needinfo?(vyang)
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #26)
> (In reply to David Scravaglieri [:scravag] from comment #25)
> > Vicamo, per Comment 24, it seems more investigation is needed on
> > mozMobileMessage on Gecko
> 
> That's not true.

Per comment 20 and comment 22, IPC contributes most of the latency, and v2.0 and v1.3 show close results. Also, there is no evident logic change in the mozMobileMessage cursor code between v1.3 and v2.0, so it's still not obvious that the mozMobileMessage API caused the regression.

However, since Vicamo's analyses were done on an Unagi, we had better profile again on a Flame with the same scenario Steve used.

Vicamo, please help again, thank you!
Flags: needinfo?(vyang)
(In reply to William Hsu [:whsu] from comment #21)
> Hi, everyone,
> 
> Here are the results I get from Gaia-UI-test with high speed camera.
> It is similar to eideticker.
> 
> @ Steps:
>   1. Launch the message app
>   2. Calculate the time from first splash frame to the first message
> displayed.
> 
In our case, we measure 'launch latency' only from touching the SMS icon to displaying a full page of messages.
Thanks a lot for your comparison video. It's good to see that you are also observing a noticeable difference between v1.3 and v2.0, although it does not match my observation in comment 0.
Tapas, what does "displaying a full page" mean? Is it "the application is displayed but no threads are displayed", "the first thread is displayed", "the first page of threads is displayed", or "all threads are loaded somewhere"?
Depends on: 1041967
As discussed in our sprint planning, we'll work on some launch-time improvements in bug 1041967 and see if that makes things better.

In this bug we'll still investigate the other possible reasons.
Flame, v1.3 [1]
Gecko: b11acf33a9a7e8b30f8b4e9224ee4570ff502455
Gaia: 23f55be856cef53c6604a6fe4aeb09061afbc897

<<<< No message in DB [2] >>>>>
07-22 04:36:03.570 I/Gecko   ( 1921): MobileMessageManager::GetThreads begin
07-22 04:36:03.570 I/Gecko   ( 1921): MobileMessageManager::GetThreads end
07-22 04:36:03.570 I/Gecko   ( 1631): MobileMessageDatabaseService: Getting thread list
07-22 04:36:03.570 I/Gecko   ( 1631): MobileMessageDatabaseService: newTxn ready
07-22 04:36:03.580 I/Gecko   ( 1631): MobileMessageDatabaseService: threadStore cursor.onsuccess done
07-22 04:36:03.580 I/Gecko   ( 1631): MobileMessageCursorParent::NotifyCursorDone
07-22 04:36:03.640 I/Gecko   ( 1921): MobileMessageCursorChild::Recv__delete__
07-22 04:36:03.640 I/Gecko   ( 1921): MobileMessageCursorCallback::NotifyCursorDone
07-22 04:36:03.650 I/Gecko   ( 1921): MobileMessageCursorCallback::NotifyCursorDone done
07-22 04:36:03.650 E/GeckoConsole( 1921): Content JS LOG at app://sms.gaiamobile.org/js/message_manager.js:305 in mm_getThreads: Gaia: begin calling getThreads
07-22 04:36:03.650 E/GeckoConsole( 1921): Content JS LOG at app://sms.gaiamobile.org/js/message_manager.js:307 in mm_getThreads: Gaia: end calling getThreads
07-22 04:36:03.650 E/GeckoConsole( 1921): Content JS LOG at app://sms.gaiamobile.org/js/message_manager.js:314 in onsuccess: Gaia: getThreads cursor.onsuccess
==========
Gecko round-trip time: 80 ms, including 60 ms in the IPC channel.

Flame, v2.0
Gecko ac566ed31fe9ba0c11365acbf51ac5830ffe7541
Gaia: b9d19011123487009c80d1200937652d58c434a0

<<<<< No message in DB >>>>>
07-22 16:45:27.474 I/Gecko   ( 1156): MobileMessageManager::GetThreads begin
07-22 16:45:27.474 I/Gecko   ( 1156): MobileMessageManager::GetThreads end
07-22 16:45:27.474 I/Gecko   (  285): MobileMessageDB: Getting thread list
07-22 16:45:27.474 I/Gecko   (  285): MobileMessageDB: newTxn ready
07-22 16:45:27.484 I/Gecko   (  285): MobileMessageDB: threadStore cursor.onsuccess done
07-22 16:45:27.484 I/Gecko   (  285): MobileMessageCursorParent::NotifyCursorDone
07-22 16:45:27.624 I/Gecko   ( 1156): MobileMessageCursorChild::Recv__delete__
07-22 16:45:27.624 I/Gecko   ( 1156): MobileMessageCursorCallback::NotifyCursorDone
07-22 16:45:27.634 I/Gecko   ( 1156): MobileMessageCursorCallback::NotifyCursorDone done
07-22 16:45:27.644 E/GeckoConsole( 1156): Content JS LOG at app://sms.gaiamobile.org/js/message_manager.js:129 in mm_getThreads: Gaia: begin calling getThreads
07-22 16:45:27.644 E/GeckoConsole( 1156): Content JS LOG at app://sms.gaiamobile.org/js/message_manager.js:131 in mm_getThreads: Gaia: end calling getThreads
07-22 16:45:27.644 E/GeckoConsole( 1156): Content JS LOG at app://sms.gaiamobile.org/js/message_manager.js:141 in onsuccess: Gaia: getThreads cursor.onsuccess
============
Gecko round-trip time doubles to 160 ms, including 140 ms in the IPC channel.

<<<<< One message in DB >>>>>
07-22 16:59:45.774 I/Gecko   ( 1263): MobileMessageManager::GetThreads begin
07-22 16:59:45.774 I/Gecko   ( 1263): MobileMessageManager::GetThreads end
07-22 16:59:45.784 I/Gecko   (  285): MobileMessageDB: Getting thread list
07-22 16:59:45.784 I/Gecko   (  285): MobileMessageDB: newTxn ready
07-22 16:59:45.784 I/Gecko   (  285): MobileMessageDB: threadStore cursor.onsuccess
07-22 16:59:45.794 I/Gecko   (  285): MobileMessageDB: GetThreadsCursor::getThreadTxn createThread
07-22 16:59:45.794 I/Gecko   (  285): MobileMessageDB: GetThreadsCursor::getThreadTxn notifyCursorResult
07-22 16:59:45.794 I/Gecko   (  285): MobileMessageCursorParent::NotifyCursorResult
07-22 16:59:45.794 I/Gecko   (  285): MobileMessageCursorParent::SendNotifyResult
07-22 16:59:45.794 I/Gecko   (  285): MobileMessageDB: threadStore cursor.onsuccess done
07-22 16:59:45.914 E/GeckoConsole( 1263): Content JS LOG at app://sms.gaiamobile.org/js/message_manager.js:129 in mm_getThreads: Gaia: begin calling getThreads
07-22 16:59:45.914 E/GeckoConsole( 1263): Content JS LOG at app://sms.gaiamobile.org/js/message_manager.js:131 in mm_getThreads: Gaia: end calling getThreads
07-22 16:59:45.914 I/Gecko   ( 1263): MobileMessageCursorChild::RecvNotifyResult
07-22 16:59:45.914 I/Gecko   ( 1263): MobileMessageCursorCallback::NotifyCursorResult
07-22 16:59:45.914 I/Gecko   ( 1263): MobileMessageCursorCallback::NotifyCursorResult FireSuccess
07-22 16:59:45.944 I/Gecko   ( 1263): MobileMessageCursorCallback::NotifyCursorResult done
07-22 16:59:45.944 I/Gecko   (  285): MobileMessageCursorParent::NotifyCursorDone
07-22 16:59:46.084 E/GeckoConsole( 1263): Content JS LOG at app://sms.gaiamobile.org/js/message_manager.js:141 in onsuccess: Gaia: getThreads cursor.onsuccess
07-22 16:59:46.084 I/Gecko   ( 1263): MobileMessageCursorChild::Recv__delete__
07-22 16:59:46.084 I/Gecko   ( 1263): MobileMessageCursorCallback::NotifyCursorDone
07-22 16:59:46.094 I/Gecko   ( 1263): MobileMessageCursorCallback::NotifyCursorDone done
07-22 16:59:46.204 E/GeckoConsole( 1263): Content JS LOG at app://sms.gaiamobile.org/js/message_manager.js:141 in onsuccess: Gaia: getThreads cursor.onsuccess
============
120 ms and 140 ms in the IPC channel for the 1st and 2nd results, respectively. DOMCursor::FireSuccess takes 30 ms.

[1]: There is no v1.3 manifest for Flame at all. I built the v1.3 Gecko against the v1.4 Gonk and flashed Gecko/Gaia onto the device instead.
[2]: DISCLAIMER: after flashing the v1.3 Gecko, the RIL doesn't work, and I have no intention of debugging, introspecting, or doing anything to try to fix it.
Flags: needinfo?(vyang)
ni? to Ben as this might have something to do with IPC.

ni? to Jason. This may not be easy, but is it possible to get a regression-window?
Flags: needinfo?(jsmith)
Flags: needinfo?(bent.mozilla)
Whiteboard: [c=regression p= s= u=2.0] [CR 694012] → [caf priority: p2][c=regression p= s= u=2.0] [CR 694012]
Datazilla shows an apparent regression around 24-26 June. I don't know whether this is due to a change in how we gather the data for Datazilla or a real regression.
(In reply to Mike Lee [:mlee] Paris: 2014.07.21 - 25 from comment #10)

I shared some profiling data via email to you. Please let us know if you need more information from us. Thanks for your help :)
Flags: needinfo?(tkundu)
(In reply to Wesley Huang [:wesley_huang] from comment #32)
> ni? to Ben as this might have something to do with IPC.
> 
> ni? to Jason. This may not be easy, but is it possible to get a
> regression-window?

I know it would be hard to find a window using end-user stopwatch measurements. Let me ask Mike whether someone on his team thinks `make test-perf` could let us find a window here easily.
Flags: needinfo?(jsmith) → needinfo?(mlee)
Steve, I'm taking this because I'll dig into the Homescreen and System apps. We can clearly see in the video that the launch animation takes more time to appear.
Assignee: schung → felash
See Also: → 1038179
Keywords: regression
(In reply to Jason Smith [:jsmith] from comment #35)
> (In reply to Wesley Huang [:wesley_huang] from comment #32)
> > ni? to Ben as this might have something to do with IPC.
> > 
> > ni? to Jason. This may not be easy, but is it possible to get a
> > regression-window?
> 
> I know it would be hard to do a window using end-user stopwatch
> measurements. Let me ask Mike if someone on his team thinks that make test
> perf could allow us to do a window here easily.

Dave,
Do we have bisection automation that can do this?
Flags: needinfo?(mlee) → needinfo?(dhuseby)
I tried various things and added logs; TL;DR, it looks like everything takes more time in v2.0. I used a Buri.

There is an obvious improvement from changing the event to launch apps from 'click' to 'touchend' in Verticalhome, but this saves only about 20ms.

I tried:
* running the old homescreen (the version from the v2.0 branch): no real difference once I use 'touchend'
* running the SMS app from v1.3 on an otherwise all-v2.0 build

Everything takes longer on v2.0:
1. Receiving "webapps-launch" once we've called "app.launch" in the homescreen. The code in Webapps.jsm didn't change much, except that it uses more Promises.
2. Inserting an iframe once we receive "webapps-launch". The code in System changed a lot.
3. Displaying each thread in SMS looks slower (even when using the SMS app from v1.3 on Gecko v2.0).
4. But SMS v1.3 on Gecko v2.0 is faster than SMS v2.0 on Gecko v2.0.

About 4), I investigated further. Notably, removing the Facebook Datastore check in contacts.js makes inserting one thread as fast in SMS v2.0 as in SMS v1.3 (still running on a v2.0 Gecko). I also noticed that we start rendering threads a little later in SMS v2.0. I don't have an explanation for this yet.
No longer blocks: CAF-v2.0-FC-metabug
Depends on: 1043293
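One way to take a non-critical lookup like the Facebook Datastore check off the first-render path is to initialize it lazily. This is only an illustrative sketch; the `lazy` helper and the check function named in the comment are hypothetical, not the actual contacts.js code:

```javascript
// Runs an expensive initializer at most once, and only on first use,
// so it never executes during the initial thread rendering.
function lazy(init) {
  let value;
  let done = false;
  return function () {
    if (!done) {
      value = init();
      done = true;
    }
    return value;
  };
}

// e.g. const isFbEnabled = lazy(() => doExpensiveFacebookCheck());
// The first call pays the cost; later calls return the cached result.
```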
needinfo'ing myself to provide profiles for comparison between the different versions
Flags: needinfo?(gsvelto)
I just talked with Julien and decided that bisecting isn't necessary at this point.  Jan noticed a small gap in time, in v2.0, between the first paint and the first content being painted in the SMS app.  That gap in time is not present in v1.3 and Julien is going to try to figure out where it is coming from.  He thought it might be related to the increase in JS and possibly the addition of calls to the Facebook contacts API.  Julien will report back when he knows more.
Flags: needinfo?(dhuseby)
Here are some profiles of the launch; I waited until about 10 threads were added.

SMS v2.0 on Gecko v2.0: https://people.mozilla.org/~bgirard/cleopatra/#report=f915068c02ba89e3c605ca453a6a7eeb0f16685f
SMS v1.3 on Gecko v2.0: https://people.mozilla.org/~bgirard/cleopatra/#report=976b4cd91bb808f19bffe7a47c2256b939b6a438

I don't find anything obvious...
Here's my first round of profiling. I've focused on the time needed to display the first thread, which has increased by 400-600ms from v1.3 to v2.0. Here are two profiles I find representative of this issue (I've taken quite a few more to make sure that what I was seeing was not just variability):

- Gaia v1.3 on Gecko v1.3, time to first thread is ~1190ms
  http://people.mozilla.org/~bgirard/cleopatra/#report=92d4fe6350bb5532605fdb3e5c29f0705a58b7fd

- Gaia v2.0 on Gecko v2.0, time to first thread is ~1750ms
  http://people.mozilla.org/~bgirard/cleopatra/#report=426886aae7b82c01a1c9aeb2b1ca73a8cdd2d098

The startup path is now much longer but most of the slowness boils down to two things:

- Loading and parsing JS/HTML/CSS code has increased from ~190ms to ~460ms. That's a huge increase, and not all of it comes from JS parsing (which has increased by ~125ms anyway). It seems we've added a lot of code, and this is having a very measurable impact on startup time. BTW, all loading operations are slower, not just parsing, which is consistent with an increased footprint. The only solution here is to find a way to cut back on the amount of stuff we load at startup.

- Painting on the startup path has become a *lot* slower. In v1.3 we had two quick paint events that displayed the empty view (the first two events marked in both profiles), followed by a third paint when the first thread was displayed. On v2.0 we have two additional refresh driver ticks burning ~165ms worth of time (see the samples between 20416 and 20796). The first of these events is synchronously caused by FontSizeUtils._registerHeadersInSubtree(), and specifically by FontSizeUtils._getCachedContext(), which draws into a canvas, so no surprise there. The second event is caused by CSS recomputation. Unfortunately I cannot tell what caused it; it could have to do with us reading a pref or something, as I can see some Gecko settings activity just before it, but I'm not sure it's related. Are we lazy-loading some CSS or creating some element after having loaded the DOM but before having added the first thread? Could this be caused by l10n code modifying the DOM?

Last but not least, painting a single thread has become quite a bit slower. I need a more fine-grained profile to analyze that, so I'll try to do it on Monday, as I still have other things to do today.
Flags: needinfo?(gsvelto)
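For reference, the caching idea behind FontSizeUtils._getCachedContext() looks roughly like the following. This is a generic sketch, not the actual Gaia code: the canvas-context factory is injected so it stays testable, whereas in Gaia it would be something like `() => document.createElement('canvas').getContext('2d')`:

```javascript
// Reuses one measuring context per font string instead of creating a new
// canvas (and potentially triggering extra paint work) for every header.
function makeContextCache(createContext) {
  const cache = new Map();
  return function getCachedContext(font) {
    let ctx = cache.get(font);
    if (!ctx) {
      ctx = createContext();
      ctx.font = font;
      cache.set(font, ctx);
    }
    return ctx;
  };
}
```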
Some early comments:
* The FontSizeUtils work comes from the new headers, so it should affect all apps. It would be nice if we could fix this.
* About the JS code: v1.4 has an amount of JS code close to v2.0's, and I don't see this much latency in v1.4.
* However, we have more CSS files, and we don't concatenate/minify them, so maybe this has an impact. Any idea why we don't concatenate them, by the way?


If you can, I'd also like you to compare Gaia v1.3 on Gecko v2.0, and Gaia v1.4 on Gecko v2.0. It would help separate Gaia issues from Gecko issues.
(In reply to Julien Wajsberg [:julienw] from comment #44)
> Some early comments:
> * FontSizeUtils stuff comes from the new headers. It should affect all apps.
> Could be nice if we can fix this.

Yes, I'll investigate if we have a bug covering it already. It's a small but nasty regression in startup time.

> * about JS code: v1.4 has an amount of JS code closer to v2.0, and I don't
> see so much latency in v1.4.
> * However, we have more CSS files, and we don't concatenate/minify them. So
> maybe this could have an impact. Any idea why we don't concatenate them by
> the way?

Not sure, I haven't dug much in gaia's build system lately, it's... complicated :)

> If you can, I'd also like that you compare with v1.3 on Gecko v2.0 and v1.4
> and Gecko v2.0. It would help separating the issues between Gaia and Gecko.

Sure, I'll try. But I'm interested in comparing the time to paint a single thread element first because that seems to have regressed by as much as 50%.
> Sure, I'll try. But I'm interested in comparing the time to paint a single thread element 
> first because that seems to have regressed by as much as 50%.

Steve thought that this could come from the fact that we do more contact lookup operations (Facebook lookups), which could compete with the MobileMessaging operations since they all run on the main thread.

But if you can confirm or invalidate this supposition, that would be nice :)
Summary: SMS app launch latency regressed in v2.0 → [Meta] SMS app launch latency regressed in v2.0
Here's the profiles with 1ms granularity for drawing threads

- Gaia v1.3 on Gecko v1.3, average time to insert and paint a thread element is ~77ms
  http://people.mozilla.org/~bgirard/cleopatra/#report=faa2e65637533b7bd52fefa98453b7e444fd947a

- Gaia v2.0 on Gecko v2.0, average time to insert and paint a thread element is ~115ms
  http://people.mozilla.org/~bgirard/cleopatra/#report=dbad650d4733c325a6fac2d8c31270d9c06df379

I've zoomed in on each individual thread insertion event to figure out what's going on. On v2.0 there's some small additional overhead per sample that seems to boil down to the FB stuff; it also introduces some extra variation, but overall the overhead is very small. The real problem seems to be related to painting. Not CSS processing (which we still do excessively) nor layout, but pure painting. PresShell::Paint events are ~45ms long on v2.0, while the same events on v1.3 were ~12.5ms long.

Figuring out why paint time ballooned won't be easy. First of all I need to profile v1.3 on top of a gecko master to see if this is a gecko regression - which it might be. Then I also have to compare between different devices. My tests were done on the Hamachi, but painting may be influenced by external factors such as the underlying GL driver, so I need to verify that this isn't a device-specific issue. If both the previous assumptions are wrong (it's not a gecko regression and it's not device-specific), our Gaia changes might have hit a slow, non-optimized path in Gecko, but somehow I doubt it.

I'll try to do further digging next week and report back on Monday.
To summarize a conversation we had on IRC, the issue here seems to boil down to painting the contact picture element that was introduced as part of the Facebook code. The account image has rounded corners and a drop shadow, which are both quite expensive to paint and most likely account for the majority of the regression in thread element paint time.
Yeah, I tried removing them and the improvement was not that obvious.

Another culprit is that in v1.3 we were using two IndexedDB databases (for messages and contacts) while now we are using four (for messages, contacts, Facebook contacts, and drafts). I'm not sure yet how to change this besides saving the information for the first panel separately in a single database.
Depends on: 1048839
Depends on: 1048841
Depends on: 1048843
I'm not aware of anything that would have slowed down ipc in general.
Flags: needinfo?(bent.mozilla)
Hey Inder/Tapas,

Given we have had two dependencies land here (bug 1041967, bug 1043293) to address this meta bug, can we get feedback from CAF on whether we are heading in the right direction, giving us confidence that the patches have helped?
Flags: needinfo?(tkundu)
Flags: needinfo?(ikumar)
(In reply to bhavana bajaj [:bajaj] [On PTO, Aug 1] from comment #52)
> Hey Inder/Tapas,
> 
> Given we have had two dependencies landed here(1041967 1043293) to address
> the meta bug, can we get feedback from CAF if we heading in the right
> direction giving us confidence the patches may have helped ?

We will ask our internal test team to confirm this. Please expect an update from us by Friday.
Flags: needinfo?(ikumar)
Keywords: meta
Depends on: 1049741
With all patches in, I don't see much difference between v1.3 and master to display the first panel now.

Displaying the whole list still takes more time though, but this is expected, given we lazyload the JS files once the first panel is displayed.

We need feedback to know if this is enough for resolving this issue.
(In reply to Julien Wajsberg [:julienw] from comment #54)
> With all patches in, I don't see much difference between v1.3 and master to
> display the first panel now.
> 
> Displaying the whole list still takes more time though, but this is
> expected, given we lazyload the JS files once the first panel is displayed.
> 
> We need feedback to know if this is enough for resolving this issue.


Here is the launch latency of the SMS app from the latest build [1]: 1536ms
This was measured on an msm8610 512MB build [1].

If we compare this with Comment 0, then we are seeing only 1694-1536=158ms improvement in [1]. 
But in Android KK, we are seeing this launch latency as 554ms
and in FFOS v1.3 launch latency is : 694 msecs (See Comment 0)


[1] https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gaia/commit/?h=mozilla/v2.0&id=9e5907995c9327f14cb5d182cee5ff16b1743ed4
[2] https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gecko/commit/?h=mozilla/v2.0&id=ed87f0a54baf646eb0b20b4debc090c8016d2104

We still need a lot of improvement to make it comparable with Android or FFOS v1.3.
Flags: needinfo?(tkundu)
(In reply to Tapas Kumar Kundu from comment #55)
>................
> But in Android KK, we are seeing this launch latency as 554ms
Sorry, Android KK is showing 1100ms for SMS app launch, and our goal is to meet at least 1100ms here.

So, we still need to improve at least 1536ms-1100ms = 436ms here.
Flags: needinfo?(felash)
Tapas, I'd like to confirm with you what you are measuring exactly. You didn't answer my question in comment 29: what does "full page" mean to you?
Flags: needinfo?(felash) → needinfo?(tkundu)
Tapas, I'd also like to know if your target device has one or two CPUs. Thanks
(In reply to Julien Wajsberg [:julienw] from comment #58)
> Tapas, I'd like also to know if your target device has one or two CPU. Thanks

It has two CPUs.
(In reply to Julien Wajsberg [:julienw] from comment #57)
> Tapas, I'd like to confirm with you what you are measuring exactly. You
> didn't answer my question in comment 29: what does "full page" means for you?

I have shared data in mail for this.
Flags: needinfo?(tkundu)
From the video, I understand that "full page" means "wait that all threads are added in the page".

From a UX point of view, we assume that the important information is the most recent threads, i.e. about the first 10 threads, and we used to optimize for this. We assume that the older threads are less important and therefore the user can wait a little longer for them.

So, yes, the improvements we made in bug 1041967 were focusing on displaying the 9 or 10 first threads. We do the same in bug 1048841 (in that bug, we'll investigate if we can work with the full list of threads though).

Another thing we plan to do is _not_ adding older threads at startup in this view, but display them only when we scroll. How would that work with your measurement? Would the measurement stop when the screen stops changing (with the scrollbar)?

Tapas, please answer the additional questions here, and comment on what is important from the UX point of view. NI Jenny too to comment on the UX part. We need to have the same expectations.
Flags: needinfo?(tkundu)
Flags: needinfo?(jelee)
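To illustrate the only-add-older-threads-on-scroll idea, here is a logic-only sketch (all names hypothetical, not the actual SMS app code): the first chunk is rendered at startup, and further chunks are appended only from a scroll handler when the list nears its bottom edge.

```javascript
// Roughly one screenful of thread list items.
const CHUNK_SIZE = 10;

function createIncrementalRenderer(threads, renderItem) {
  let nextIndex = 0;

  function renderNextChunk() {
    const end = Math.min(nextIndex + CHUNK_SIZE, threads.length);
    for (; nextIndex < end; nextIndex++) {
      renderItem(threads[nextIndex]);
    }
    return nextIndex < threads.length; // true if more chunks remain
  }

  return {
    // Called once at startup: paint only the first screenful.
    start: renderNextChunk,
    // Called from a scroll handler when the list nears its bottom edge.
    onNearBottom: renderNextChunk,
    get renderedCount() { return nextIndex; }
  };
}

// Example: 25 threads, rendered in three chunks (10 + 10 + 5).
const rendered = [];
const r = createIncrementalRenderer(
  Array.from({ length: 25 }, (_, i) => ({ id: i })),
  t => rendered.push(t.id)
);
r.start();        // first 10 threads at startup
r.onNearBottom(); // next 10 on first scroll
r.onNearBottom(); // last 5 on second scroll
console.log(rendered.length); // 25
```

Under this scheme the measurement question above matters: the screen stops changing after `start()`, even though older threads arrive later.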
(In reply to Julien Wajsberg [:julienw] from comment #61)
> From the video, I understand that "full page" means "wait that all threads
> are added in the page".
> 
Could you please clarify what you mean by "threads" here? I understand that the SMS app shows a list of messages. Do you mean each message is one thread?

> So, yes, the improvements we made in bug 1041967 were focusing on displaying
> the 9 or 10 first threads. We do the same in bug 1048841 (in that bug, we'll
> investigate if we can work with the full list of threads though).
> 
I didn't understand the term "thread" properly. Do you mean that the SMS app will try to display the first 9 to 10 messages quickly?

> Another thing we plan to do is _not_ adding older threads at startup in this
> view, but display them only when we scroll. How would that work with your
> measurement? Would the measurement stop when the screen stops changing (with
> the scrollbar)?
> 
I will confirm it once you answer my questions. Thanks a lot for your help.

> Tapas, please answer the additional questions here, and comment on what is
> important from the UX point of view. NI Jenny too to comment on the UX part.
> We need to have the same expectations.

I will be very happy to answer any questions, but please clarify what you mean by "threads".
Flags: needinfo?(tkundu) → needinfo?(felash)
Basically, the SMS app is organized like this:

* first panel (called Inbox or Thread List) shows the list of recipients/senders, ordered by most recent received/sent first.
* second panel (called Thread Panel) shows the list of sent/received messages for a sender/recipient.

What we call a "thread" is the list of messages sent/received for a specific contact or phone number. This is what the first panel is showing.

I hope this is clearer, but please ask more questions if it is not!
Flags: needinfo?(felash) → needinfo?(tkundu)
(In reply to Julien Wajsberg [:julienw] from comment #61)
> From the video, I understand that "full page" means "wait that all threads
> are added in the page".
> 
Yes. I also agree. 

Example:
For a 320x480 device, launch latency means the time from touching the SMS icon to displaying all the messages (thread list) that fit within the 320x480 display without scrolling (please see the video).


> On a UX point of view, we assume that the important information are the most
> recent threads, that means about the 10 first threads, and we used to
> optimize for this. We assume that the older threads are less important and
> therefore the user can wait a little more for them.

I also agree. But I would suggest that instead of considering only the first 10 threads (10 messages), we optimize for all the threads (messages) a user can see on their display without scrolling during app launch.

> 
> So, yes, the improvements we made in bug 1041967 were focusing on displaying
> the 9 or 10 first threads. We do the same in bug 1048841 (in that bug, we'll
> investigate if we can work with the full list of threads though).
> 

Nice.

> Another thing we plan to do is _not_ adding older threads at startup in this
> view, but display them only when we scroll. How would that work with your
> measurement? 

Yes. It will be fine.

> Would the measurement stop when the screen stops changing (with
> the scrollbar)?

No, we don't consider scrollbar activity during measurement. Think from an end user's point of view: the user touches the SMS icon and waits for the device display to fill with messages (this varies with device resolution).

Please ask more questions if you have doubts :)
Flags: needinfo?(tkundu) → needinfo?(felash)
(In reply to Tapas Kumar Kundu from comment #64)
> (In reply to Julien Wajsberg [:julienw] from comment #61)
> > From the video, I understand that "full page" means "wait that all threads
> > are added in the page".
> > 
> Yes. I also agree. 
> 
> Example :
> for 320x480 device,launch latency means touching SMS icon to displaying all
> messages
> (thread list) which you can display within 320x480 display without scrolling
> (please see video). 
> 
> 
> > On a UX point of view, we assume that the important information are the most
> > recent threads, that means about the 10 first threads, and we used to
> > optimize for this. We assume that the older threads are less important and
> > therefore the user can wait a little more for them.
> 
> I also agree. But I would suggest instead of considering only 1st 10 threads
> (10 messages), we should optimize all threads (messages) which any user can
> see in his display without scrolling during app launch.

On a 320x480 device, it's ~6 threads; on a device with a bigger screen, it's ~10 threads. That's why we chose 10 threads; if there is a real difference between 6 and 10, we can try to optimize in a different way.

> > Would the measurement stop when the screen stops changing (with
> > the scrollbar)?
> 
> No, we don't consider scroll bar activity during measurement. Think from a
> end user point of view. End user will touch SMS icon and wait to see when
> his device display will become full of messages (It will vary based on
> device resolution). 

So in the video, the start of measurement is at 23sec and the end of the measurement is at 39sec?


Thanks for your answers!
Flags: needinfo?(felash) → needinfo?(tkundu)
(In reply to Julien Wajsberg [:julienw] from comment #65)
> So in the video, the start of measurement is at 23sec and the end of the
> measurement is at 39sec?
> 

Yes, you are correct :).
Flags: needinfo?(tkundu) → needinfo?(felash)
Thanks, then I think we're on the same page (pun intended) :)
Flags: needinfo?(felash)
(In reply to Julien Wajsberg [:julienw] from comment #61)

> On a UX point of view, we assume that the important information are the most
> recent threads, that means about the 10 first threads, and we used to
> optimize for this. We assume that the older threads are less important and
> therefore the user can wait a little more for them.
> 
This makes sense!

> Another thing we plan to do is _not_ adding older threads at startup in this
> view, but display them only when we scroll. How would that work with your
> measurement? Would the measurement stop when the screen stops changing (with
> the scrollbar)?
> 
Right now I'm fine either way, as long as it gives the user the impression that messages load fast (I will have to actually try how things work to decide whether not adding older threads at startup is better). Thanks!
Flags: needinfo?(jelee)
Depends on: 1051852
QA Contact: gmealer
Flags: in-moztrap?
I believe this is already covered in Moztrap test case https://moztrap.mozilla.org/manage/case/13751/
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
NI : Tapas to help with testing/feedback to see if patch in  https://bugzilla.mozilla.org/show_bug.cgi?id=1051852#c1 helps here.
Flags: needinfo?(tkundu)
Flags: in-moztrap? → in-moztrap?(bzumwalt)
Test case can be found here: https://moztrap.mozilla.org/manage/case/13751/
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
Flags: in-moztrap?(bzumwalt)
Flags: in-moztrap+
(In reply to bhavana bajaj [:bajaj] from comment #70)
> NI : Tapas to help with testing/feedback to see if patch in 
> https://bugzilla.mozilla.org/show_bug.cgi?id=1051852#c1 helps here.

I asked our internal performance measurement team to measure launch latency with this patch. 

I will update here again after 2 working days.
(In reply to bhavana bajaj [:bajaj] from comment #70)
> NI : Tapas to help with testing/feedback to see if patch in 
> https://bugzilla.mozilla.org/show_bug.cgi?id=1051852#c1 helps here.


We are seeing SMS launch latency as 1442ms on 512MB msm8610 .

If we compare this number with Comment 55 and Comment 56 then you will see that we still need 
to improve 1442-1100=342 ms to match Android numbers.
Flags: needinfo?(tkundu)
Tapas, I'd like to double-check with you the v1.3 measurement you have in comment 0. From my personal tests, I see v1.3 is still faster than v2.0, but not twice as fast as you suggest. 694ms looks very, very quick to me.

If possible, it would be useful to have a video of v1.3 on the same hardware.
Flags: needinfo?(tkundu)
(In reply to Julien Wajsberg [:julienw] from comment #74)
> Tapas, I'd like to double-check with you the measurement from v1.3 you have
> in comment 0. From my personal tests, I see v1.3 is still faster than v2.0,
> however not twice as fast like you suggest. 694ms looks very very quick to
> me.
> 
> If possible, it would be useful to have a video of v1.3 on the same hardware.

Did you use a Flame (msm8610) 512MB with ONLY 80 SMS loaded?
Flags: needinfo?(tkundu) → needinfo?(felash)
No, I mainly use a Buri (slower) with more SMS taken from my personal DB (to make the loading issues more noticeable).

Using the Flame makes it difficult to see the differences without a high-speed camera.
Flags: needinfo?(felash)
So, I tested again these configurations with my own DB (97 threads, most of them with 3/4 messages, some of them with _a lot_ of messages):

1. SMS master on Gaia master on Gecko master
2. SMS v1.3 on Gaia master on Gecko master
3. SMS v1.3 on Gaia v1.3 on Gecko v1.3

here is what I find:

* 1) is way faster than 2). This means that the remaining issues are likely not in the SMS app.
* 3) is way faster than 2) too. This means that either Gecko, Homescreen or System regressed performance.
* 3) is still faster than 1), but not by that much currently.

My next experiment is:

4. SMS master with old homescreen (master) on Gaia master on Gecko master
=> no obvious difference

When I first worked on this bug I also looked at System, adding some logs, and I did not see obvious differences.

Next experiment is:

5. SMS master on Gaia v1.3 on Gecko v1.3
=> clearly the fastest option of all. It's way faster than 1) (all master) and a little faster than 3) (all v1.3).

All this seems to point to a Gecko issue, and/or possibly a System app issue, even though I think the System app has no issue here.


Vicamo, I know you already looked at it, but can you please have a second look, especially when there are more messages in the DB?
Flags: needinfo?(vyang)
What about building blocks? Is it possible that some changes there made us slower? If my old measurement (higher painting times) was true then that might have been a possibility.
(In reply to Gabriele Svelto [:gsvelto] from comment #78)
> What about building blocks? Is it possible that some changes there made us
> slower? If my old measurement (higher painting times) was true then that
> might have been a possibility.

Since SMS master (which uses the building blocks from master) on Gecko v1.3 is faster than SMS v1.3 on Gecko v1.3, I don't think the building blocks account for much of the regression here.
> Vicamo, I know you already looked at it, but can you please have a second look,
> especially when there are more messages in the DB?

One thought, if there is nothing obvious we can optimize in the API:
* instead of returning one thread at each cursor step, we can return a chunk of threads (say: 5 or 10 threads). Maybe it's possible to test it quickly with the current API if Steve and you can team up?
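For illustration, chunking can be prototyped app-side on top of the current one-result-per-step cursor by buffering results before touching the DOM. This is only a sketch: the cursor here is a simplified stand-in (`consumeInChunks` and `fakeCursor` are hypothetical names, not the real MobileMessage API), and the point is just amortizing per-thread layout/paint work over a batch.

```javascript
// Buffer CHUNK cursor results, then render them in one batch.
const CHUNK = 5;

function consumeInChunks(cursor, renderBatch) {
  const buffer = [];
  cursor.onsuccess = function () {
    if (cursor.result) {
      buffer.push(cursor.result);
      if (buffer.length === CHUNK) {
        renderBatch(buffer.splice(0)); // render a full batch at once
      }
      cursor.continue();
    } else if (buffer.length) {
      renderBatch(buffer.splice(0)); // flush the tail
    }
  };
}

// Minimal synchronous fake cursor over 12 threads to exercise the logic.
function fakeCursor(items) {
  let i = 0;
  function step() {
    cursor.result = i < items.length ? items[i++] : null;
    cursor.onsuccess();
  }
  const cursor = {
    result: null,
    continue: step,
    start: step // a real request would fire onsuccess asynchronously
  };
  return cursor;
}

const batches = [];
const c = fakeCursor(Array.from({ length: 12 }, (_, i) => ({ id: i })));
consumeInChunks(c, b => batches.push(b.length));
c.start();
console.log(batches); // [ 5, 5, 2 ]
```

With 12 threads and a chunk size of 5, the DOM would be touched three times instead of twelve.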
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage+][2.0-signoff-need+]
(In reply to Julien Wajsberg [:julienw] from comment #77)
> Vicamo, I know you already looked at it, but can you please have a second
> look, especially when there are more messages in the DB?

Will do.

(In reply to Julien Wajsberg [:julienw] from comment #80)
> One thought, if there is nothing obvious we can optimize in the API:
> * instead of returning one thread at each cursor step, we can return a chunk
> of threads (say: 5 or 10 threads). Maybe it's possible to test it quickly
> with the current API if Steve and you can team up?

We just had a discussion, and that's also something that has been on my mind for a long time. My major concern is memory pressure: currently we keep only an array of numeric message/thread IDs. With such a read-ahead mechanism, the worst case is keeping all the messages/threads somewhere, which tends toward OOM. However, this is an interesting topic, and we can easily control it with a "dom.sms.dbReadAhead" preference. Let's see if we can have a quick patch for this. :)
Here is a new experimentation.

I changed the SMS app to _not_ display the threads, and merely iterate the thread data. I measured the difference between the moment we call getThreads and the moment we get each callback. For each callback, I measure the timestamp at the start and end of the callback.

I used the exact same code for the SMS app on Firefox OS master and Firefox OS v1.3 (On Buri), with the exact same data.

Note that I get similar results when I retry so I pasted here one particular result.
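The instrumentation is roughly the following sketch (hypothetical helper names; the actual change is the attached debug-sms.patch). Each cursor callback's start and end are stamped relative to the moment getThreads() was called.

```javascript
// Stamp each callback's start/end relative to the initial call.
function instrument(run) {
  const t0 = Date.now();
  const log = [];
  let n = 0;
  run(function onEachThread(processThread) {
    n++;
    const start = Date.now() - t0;
    processThread(); // the work normally done in the cursor callback
    log.push({ n, start, end: Date.now() - t0 });
  });
  return log;
}

// Dummy driver standing in for the real getThreads cursor: three
// "callbacks", each doing a bit of busy work.
const log = instrument(function (each) {
  for (let i = 0; i < 3; i++) {
    each(function () { for (let j = 0; j < 1e5; j++) {} });
  }
});
console.log(log.length); // 3
```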

SMS master on Firefox OS master:

start of cursor callback 1  113 ms
end of cursor callback   1  116 ms
start of cursor callback 2  143 ms
end of cursor callback   2  144 ms
start of cursor callback 3  258 ms
end of cursor callback   3  270 ms
start of cursor callback 4  347 ms
end of cursor callback   4  348 ms
start of cursor callback 5  394 ms
end of cursor callback   5  395 ms
start of cursor callback 6  446 ms
end of cursor callback   6  447 ms
start of cursor callback 7  488 ms
end of cursor callback   7  489 ms
start of cursor callback 8  527 ms
end of cursor callback   8  527 ms
start of cursor callback 9  564 ms
end of cursor callback   9  745 ms
start of cursor callback 10 1129 ms
end of cursor callback   10 1130 ms


SMS master on Firefox OS v1.3:

start of cursor callback 1  172 ms
end of cursor callback   1  177 ms
start of cursor callback 2  299 ms
end of cursor callback   2  300 ms
start of cursor callback 3  354 ms
end of cursor callback   3  355 ms
start of cursor callback 4  371 ms
end of cursor callback   4  372 ms
start of cursor callback 5  388 ms
end of cursor callback   5  390 ms
start of cursor callback 6  400 ms
end of cursor callback   6  404 ms
start of cursor callback 7  421 ms
end of cursor callback   7  423 ms
start of cursor callback 8  433 ms
end of cursor callback   8  435 ms
start of cursor callback 9  448 ms
end of cursor callback   9  515 ms
start of cursor callback 10 812 ms
end of cursor callback   10 822 ms


What do we see?

* There is a huge gap between thread 9 and thread 10. This is expected because that's where we lazy-load more stuff.
* With Gecko master, we get the first thread _earlier_ (by about 60ms). This is good!
* But... with Gecko master, we get the 9th thread _later_ (by about 110ms)! This means we lose about 170ms compared to v1.3!
Attached patch debug-sms.patch (Splinter Review)
Patch to generate the data from comment 82.
(In reply to KTucker (:KTucker) from comment #71)
> Test case can be found here: https://moztrap.mozilla.org/manage/case/13751/

We should probably rely on automation here rather than a manual test case, given that we need reliable, consistent measurements to determine performance. Can we disable this test case?
Flags: needinfo?(ktucker)
Disabled the test case in moztrap.
Flags: needinfo?(ktucker)
Flags: in-moztrap+ → in-moztrap-
Some more experiments:

https://github.com/julienw/gaia/compare/speed-up-1
https://github.com/julienw/gaia/compare/speed-up-2
https://github.com/julienw/gaia/compare/speed-up-3

The latest seems faster, but should not gain more than 50ms (which is already good!)
mmm speed-up-3 has an issue that I fixed in speed-up-4, but now it's not faster anymore. :'(
speed-up-2 is still faster though.
Actually speed-up-1 and 2 have the same issue as 3: the lazy-loading is not done, so the app is broken. I'm not sure it will still be as fast once the issue is fixed :( I can't do it now.
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #81)
> We just had a discuss and that's also one thing has been in my mind for a
> long time. The major concern for me is the memory pressure. Currently we
> keep an array of numeric message/thread IDs only. With such read-ahead
> mechanism, the worst case is keeping all the messages/threads somewhere and
> results in a tendency to oom. However, this is an interesting topic and we
> can easily solve that by a "dom.sms.dbReadAhead" preference. Let's see if we
> may have a quick patch for this. :)

So far had a WIP branch for this in https://github.com/vicamo/b2g_mozilla-central/tree/experimental/mmdb-read-ahead . Major functionality to be completed in MobileMessageDB.jsm.
QA Whiteboard: [QAnalyst-Triage+][2.0-signoff-need+] → [QAnalyst-Triage+][lead-review+][2.0-signoff-need+]
Thinker, can your team have someone check whether there is any IPC issue, as identified in comment 31? Thanks.
Flags: needinfo?(tlee)
I can take a look at the issue in comment #31. How can I reproduce it locally? Is it already in adb logcat or should I apply some patch to see the log as in comment #31 during app loading?
Flags: needinfo?(kchang)
You can get this info through logcat by enabling the message debug flag. I will ask Steve and Vicamo to work with you on this.
Flags: needinfo?(kchang)
Flags: needinfo?(tlee)
another gaia-only experiment: https://github.com/mozilla-b2g/gaia/pull/23026
I could not shave more than 3ms on the Flame (it was more on the Buri).
Steve shared some of the risk profile over email in case we have to make more improvements here. The effort sounds like a lot of new implementation on the Gecko side involving WebAPI changes; the scope isn't trivial, may not fit the expected timeline, and we lack a clear understanding of how much it would benefit. I am leaving an NI on him to better explain the technical details and weigh in on the risk.
Flags: needinfo?(schung)
To improve first-view-ready by 300ms, the only way we could think of is returning an array of thread data instead of a single thread. Reducing the number of thread queries might be helpful because each iteration costs 30~70ms (or even more). But this idea needs changes in both Gecko and Gaia.

On the Gecko side, the logic for returning the threads would be more complicated, and please note this involves API changes (returning a thread -> an array of threads). Patches involving API changes usually take longer in the review process. The Gaia changes would have much less impact than the Gecko ones because it's simply changing the rendering behavior from a single list item to a batch of list items. Once the WIP in comment 89 is ready, we could provide some profiles and let release management decide whether we should take it or not.
Flags: needinfo?(schung)
I got Vicamo's debug instrumentation from https://github.com/vicamo/b2g_mozilla-central/tree/debug/mobilemessage-getThreads-round-trip-time and also turned on IPC logging. From my test results I conclude that IPC is not responsible for the increased launch time.

Note that the measurement spans from one actor sending a message to the other actor receiving it. Taking the PMobileMessageCursor protocol as an example, four threads are involved in sending one message:
1. Parent process' main thread: MobileMessageCursorParent::Send__delete__()
2. Parent process' IPC thread receives the request and sends the message over the wire
3. Child process' IPC thread receives the message and dispatches it to the main thread
4. MobileMessageCursorChild::Recv__delete__() is called

Generally it takes < 1 ms from step 1 to step 3. Step 3 to 4 takes almost all the messaging delay.

When the main thread of the receiving side is not busy, messaging can be as quick as 1 ms:

> 08-20 18:56:09.626 I/GeckoIPC(15662): [time:1408532169631199][15662->16448][PMobileMessageCursorParent] Sending Msg_NotifyResult([TODO])
> 08-20 18:56:09.626 I/GeckoIPC(16448): [time:1408532169632000][16448<-15662][PMobileMessageCursorChild] Received Msg_NotifyResult([TODO])

But if the receiver's main thread is busy, the message can be queued in the main thread longer:

> 08-20 18:56:13.036 I/Gecko   (15662): MobileMessageCursorParent::NotifyCursorDone
> 08-20 18:56:13.036 I/GeckoIPC(15662): [time:1408532173044202][15662->16448][PMobileMessageCursorParent] Sending Msg___delete__([TODO])
> 08-20 18:56:13.036 I/Gecko   (16448): MobileMessageCursorCallback::NotifyCursorResult done
> 08-20 18:56:13.036 I/GeckoIPC(16448): [time:1408532173044931][16448<-15662][PContentChild] Received Msg_AsyncMessage([TODO])
... (some lines omitted for brevity)
> 08-20 18:56:13.076 I/Gecko   (16448): MobileMessageCursorChild::Recv__delete__

After the parent sent out the delete message and before the message was received, there's still at least PContentChild::RecvAsyncMessage(), not to mention other jobs (e.g. layout) that might delay the delivery of the message but are not visible in the log.

So I think IPC is the wrong direction for tracing the increased app launch time. From the discussion with :vicamo and :schung, optimizing the operation of getting message threads may be the right direction to take.
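For reference, the messaging delay quoted above can be computed directly from the [time:...] fields in the log lines, which are microsecond timestamps. A small parsing sketch (hypothetical helper name), using the two Msg_NotifyResult lines from this comment:

```javascript
// Extract the microsecond [time:...] stamp from a GeckoIPC log line.
function ipcTimeUs(line) {
  const m = line.match(/\[time:(\d+)\]/);
  return m ? Number(m[1]) : NaN;
}

const sent = '08-20 18:56:09.626 I/GeckoIPC(15662): [time:1408532169631199]' +
  '[15662->16448][PMobileMessageCursorParent] Sending Msg_NotifyResult([TODO])';
const received = '08-20 18:56:09.626 I/GeckoIPC(16448): [time:1408532169632000]' +
  '[16448<-15662][PMobileMessageCursorChild] Received Msg_NotifyResult([TODO])';

// Send-to-receive delta, converted from microseconds to milliseconds.
const delayMs = (ipcTimeUs(received) - ipcTimeUs(sent)) / 1000;
console.log(delayMs); // 0.801
```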
(In reply to Steve Chung [:steveck] from comment #95)
> In gecko side, logic of returning the threads would be more complicate, and
> please note this involved API changes(returning a thread -> array of
> thread). Patches related to API changes usually takes longer time in review
> process. Changes in gaia would be much less impact then gecko because it's
> simply changing the rendering behavior from single list item to batch of
> list. Once the WIP in comment 89 is ready, we could provide some profiles
> and the let the release management to decide whether we should take it or
> not.

Couldn't this be hidden in the implementation? I.e. the parent process would return an array of threads to the child and the iteration over the various threads would then be done directly on the child side w/o any further IPC. AFAIK that should be possible with the current interface.
Flags: needinfo?(mvikram)
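A minimal sketch of that idea (hypothetical object shapes, not the real MobileMessageCursorParent/Child classes): the parent ships an array of threads in one message, and the child-side cursor replays it one result per continue(), so the DOM-facing cursor API is unchanged and no extra IPC round-trips happen.

```javascript
// Child-side cursor that drains a locally queued batch.
function makeChildCursor() {
  const queue = [];
  let done = false;
  const cursor = {
    result: null,
    onsuccess: null,
    continue() { deliver(); }
  };

  function deliver() {
    if (queue.length) {
      cursor.result = queue.shift();
    } else if (done) {
      cursor.result = null; // end of results
    } else {
      return; // wait for the next batch from the parent
    }
    cursor.onsuccess();
  }

  return {
    cursor,
    // Called when a batch arrives over IPC from the parent process.
    receiveBatch(threads, isLast) {
      queue.push(...threads);
      done = isLast;
      if (cursor.result === null) deliver(); // kick off first delivery
    }
  };
}

const { cursor, receiveBatch } = makeChildCursor();
const seen = [];
cursor.onsuccess = () => {
  if (cursor.result !== null) {
    seen.push(cursor.result);
    cursor.continue();
  }
};
receiveBatch(['t1', 't2', 't3'], true);
console.log(seen); // [ 't1', 't2', 't3' ]
```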
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #89)
> So far had a WIP branch for this in
> https://github.com/vicamo/b2g_mozilla-central/tree/experimental/mmdb-read-
> ahead . Major functionality to be completed in MobileMessageDB.jsm.

Moved to https://github.com/vicamo/b2g_mozilla-central/tree/bugzilla/1038176/mmdb-read-ahead

Changes:
1. Address Gabriele's suggestion,
2. Use the settings key "ril.sms.maxReadAheadEntries" to control read-ahead aggressiveness. A positive value means a finite maximum of read-ahead entries, a negative value means read to the end, and zero means no read-ahead (the current and default behaviour).
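The pref semantics boil down to a small policy function. A logic-only sketch (`entriesToPrefetch` is a hypothetical name; the real implementation lives in MobileMessageDB.jsm on the branch above):

```javascript
// How many entries to prefetch beyond the current cursor position:
//   maxReadAheadEntries > 0  => at most that many (finite window)
//   maxReadAheadEntries < 0  => everything to the end
//   maxReadAheadEntries == 0 => no read-ahead (current default)
function entriesToPrefetch(maxReadAheadEntries, cursorPos, totalEntries) {
  const remaining = totalEntries - cursorPos - 1;
  if (maxReadAheadEntries === 0) return 0;          // no read-ahead
  if (maxReadAheadEntries < 0) return remaining;    // read to end
  return Math.min(maxReadAheadEntries, remaining);  // finite window
}

// With 80 threads in the DB (as in comment 0):
console.log(entriesToPrefetch(0, 3, 80));   // 0
console.log(entriesToPrefetch(-1, 3, 80));  // 76
console.log(entriesToPrefetch(7, 3, 80));   // 7
console.log(entriesToPrefetch(7, 75, 80));  // 4
```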
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #98)
> https://github.com/vicamo/b2g_mozilla-central/tree/bugzilla/1038176/mmdb-
> read-ahead

I'm still fixing a problem when maxReadAheadEntries is a positive number, but I didn't see any significant improvement in app load time with negative values or zero. From moz-chrome-dom-loaded to |fistPanelCount === 0|, the Messages app always takes about 800ms, with or without the read-ahead patch, at negative or zero aggressiveness. But it does improve total load time significantly, from 6.5 seconds to 3.5 seconds. More experiments needed.
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #99)
> But it does improve total load time significantly,
> from 6.5 seconds to 3.5 seconds. Need more experiments.

That's sweet!
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #99)
> Still fixing a problem when maxReadAheadEntries is a positive number,

Fixed and updated.

> but I
> didn't see any significant improvement to app load time with negative ones
> and zero. From moz-chrome-dom-loaded to |fistPanelCount === 0|, messages app
> always takes about 800ms with or without the read-ahead patch, negative
> aggressiveness or zero. But it does improve total load time significantly,
> from 6.5 seconds to 3.5 seconds. Need more experiments.

          aggressiveness | unlimited |   7 [3]|   1  |  0
-------------------------+-----------+--------+------+-----
first panel load time [1]|       9x0 |    760 |  860 |  880
-------------------------+-----------+--------+------+-----
     cursor done time [2]|      2200 |   3340 | 6500 | 6600

unit: ms

[1]: From moz-chrome-dom-loaded to |fistPanelCount === 0|
[2]: From "Getting thread list" to "notifyCallback(results[0], 0)"
[3]: Flame shows 7 threads on thread list.
Flags: needinfo?(vyang)
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #101)
>           aggressiveness | unlimited |   7 [3]|   1  |  0
> -------------------------+-----------+--------+------+-----
> first panel load time [1]|       9x0 |    760 |  860 |  880
> -------------------------+-----------+--------+------+-----
>      cursor done time [2]|      2200 |   3340 | 6500 | 6600

By the way, these statistics were done with medium reference workload.
From moz-chrome-dom-loaded to onThreadsRendered, it takes 2200+750ms with unlimited read-ahead aggressiveness while 6600+150ms with 0 aggressiveness.
Hi Vicamo, thanks for the great effort on this improvement! But it seems we could only reduce ~100ms for first view in this case. Just curious about the first cursor return time for different aggressiveness case, maybe you should also adjust fistPanelCount to 7 to make you measurement precisely. Because the original first view ready will be fired when 9 threads rendered, but this means first view ready event will be fired at 2nd time cursor return. maybe we could change the count to measure the time only for 1st return and rendering time(in aggressiveness === 7 case)
(In reply to Steve Chung [:steveck](PTO 8/22) from comment #104)
> Hi Vicamo, thanks for the great effort on this improvement! But it seems we
> could only reduce ~100ms for first view in this case. Just curious about the
> first cursor return time for different aggressiveness case, maybe you should
> also adjust fistPanelCount to 7 to make you measurement precisely. Because
> the original first view ready will be fired when 9 threads rendered, but
> this means first view ready event will be fired at 2nd time cursor return.
> maybe we could change the count to measure the time only for 1st return and
> rendering time(in aggressiveness === 7 case)

With aggressiveness = firstPanelCount = 7, first panel load time lowers to ~520ms. No much difference in cursor done time.
Depends on: 1057915
I'm moving review process of the multi-chunk thread idea to bug 1057915.
Hey Tapas,
Can you take the patch from Bug 1057915 and test the performance improvement?
Just want to buy some time in case there would be further adjustment to be made.
Flags: needinfo?(tkundu)
As I said in bug 1057915 comment 17, I am waiting to get a fix for v2.0 . You need to rebase that fix for v2.0. I will update here asap once I get that fix.
Flags: needinfo?(tkundu) → needinfo?(whuang)
Our internal test team has confirmed that msm8610 512MB device has SMS app launch latency as 1194ms with fix from bug 1057915. In Android kitkat, SMS app launch latency is 1100 ms . We are almost doing as good as Android now in 512MB.

But in 256MB configuration, we are not seeing any improvement in SMS app launch latency. It may be because of increased memory usage either by SMS app or some other apps. I am trying to break down memory usage of all FFOS apps to confirm this.
(In reply to Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) from comment #110)
> Our internal test team has confirmed that msm8610 512MB device has SMS app
> launch latency as 1194ms with fix from bug 1057915. In Android kitkat, SMS
> app launch latency is 1100 ms . We are almost doing as good as Android now
> in 512MB.
> 
> But in 256MB configuration, we are not seeing any improvement in SMS app
> launch latency. It may be because of increased memory usage either by SMS
> app or some other apps. I am trying to break down memory usage of all FFOS
> apps to confirm this.

Ken/vicamo, any input here on the difference we are seeing in 256MB devices, given that's the one of the config 2.0 will ship-to ?

Tapas, do you have the memory usage report?
Flags: needinfo?(vyang)
Flags: needinfo?(tkundu)
(In reply to bhavana bajaj [:bajaj] from comment #111)
> (In reply to Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) from
> comment #110)
> > Our internal test team has confirmed that msm8610 512MB device has SMS app
> > launch latency as 1194ms with fix from bug 1057915. In Android kitkat, SMS
> > app launch latency is 1100 ms . We are almost doing as good as Android now
> > in 512MB.
> > 
> > But in 256MB configuration, we are not seeing any improvement in SMS app
> > launch latency. It may be because of increased memory usage either by SMS
> > app or some other apps. I am trying to break down memory usage of all FFOS
> > apps to confirm this.
> 
> Ken/vicamo, any input here on the difference we are seeing in 256MB devices,
> given that's the one of the config 2.0 will ship-to ?
> 
> Tapas, do you have the memory usage report?

We are seeing 1535 ms for "STAND ALONE sms app launch use case" on msm8610 256MB device (with your fix from bug 1057915) and this is a different observation from Comment 110. 

Comment 110 was for launching SMS app after launching four other apps. In 256MB, SMS app launch latency becomes degraded if we launch SMS app after launching other apps. But that is a separate issue which is related to FFOS memory usage of many other apps. We will are still analyzing this use case and we will create new bug for memory usage if needed.


bug 1038176 can be closed now as we have seen at least 300ms improvement with fix from bug 1057915. Thanks for all the good work :)
Flags: needinfo?(tkundu)
Flags: needinfo?(mvikram)
(In reply to Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) from comment #112)
> (In reply to bhavana bajaj [:bajaj] from comment #111)
> > (In reply to Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) from
> > comment #110)
> > > Our internal test team has confirmed that msm8610 512MB device has SMS app
> > > launch latency as 1194ms with fix from bug 1057915. In Android kitkat, SMS
> > > app launch latency is 1100 ms . We are almost doing as good as Android now
> > > in 512MB.
> > > 
> > > But in 256MB configuration, we are not seeing any improvement in SMS app
> > > launch latency. It may be because of increased memory usage either by SMS
> > > app or some other apps. I am trying to break down memory usage of all FFOS
> > > apps to confirm this.
> > 
> > Ken/vicamo, any input here on the difference we are seeing in 256MB devices,
> > given that's the one of the config 2.0 will ship-to ?
> > 
> > Tapas, do you have the memory usage report?
> 
> We are seeing 1535 ms for "STAND ALONE sms app launch use case" on msm8610
> 256MB device (with your fix from bug 1057915) and this is a different
> observation from Comment 110. 
> 
> Comment 110 was for launching SMS app after launching four other apps. In
> 256MB, SMS app launch latency becomes degraded if we launch SMS app after
> launching other apps. But that is a separate issue which is related to FFOS
> memory usage of many other apps. We will are still analyzing this use case
> and we will create new bug for memory usage if needed.
> 
> 
> bug 1038176 can be closed now as we have seen at least 300ms improvement
> with fix from bug 1057915. Thanks for all the good work :)

Thanks. Please go ahead and file the memory usage bug for tracking.
I think we can close bug 1038176 after bug 1057915 uplifted to 2.0.
Flags: needinfo?(whuang)
(In reply to bhavana bajaj [:bajaj] from comment #111)
> (In reply to Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) from
> comment #110)
> > But in 256MB configuration, we are not seeing any improvement in SMS app
> > launch latency. It may be because of increased memory usage either by SMS
> > app or some other apps. I am trying to break down memory usage of all FFOS
> > apps to confirm this.
> 
> Ken/vicamo, any input here on the difference we are seeing in 256MB devices,
> given that's the one of the config 2.0 will ship-to ?

Memory pressure is always the major concern for SMS read-ahead as stated in comment 81. However, so far we only read at most 7 more entries, not all of them. It's hard to believe that it's the only reason that dispels the improvements made by bug 1057915.

Maybe the lack of cached db raw data in memory becomes the problem in this case. From above diggings we know our IndexedDB is so fast that most time was wasted in IPC ping-pongs. But what if IndexedDB is not fast anymore because it has to reload file blob into memory from slow non-volatile flash again and again? It makes sense to me, but it's only based on blind guess. Need a 256 MB device to investigate more anyway.
Flags: needinfo?(vyang)
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #114)
> But what if IndexedDB is not fast anymore
> because it has to reload file blob into memory from slow non-volatile flash
> again and again? It makes sense to me, but it's only based on blind guess.
> Need a 256 MB device to investigate more anyway.

Also, is that device a regular 256 MB one (like the Hamachi) or does it use some form of swap space like the Tarako? On the Tarako we've seen cases where IndexedDB performance dropped considerably under memory pressure because it was fighting with I/O requests generated for swapping memory pages.
So after some discussion on IRC and taking into account comment 112 I'm closing. If we need to investigate other more specific startup issues such as the one on 256MiB devices mentioned before let's open a follow up with a detailed STR so we can focus on the problem at hand.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → 2.1 S4 (12sep)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: