[META] SMS app launch latency is higher than LA in v2.1

RESOLVED WONTFIX

Status

defect
P1
blocker
RESOLVED WONTFIX
5 years ago
4 years ago

People

(Reporter: tkundu, Assigned: bobby.chien+bugzilla)

Tracking

({meta, perf})

unspecified
ARM
Gonk (Firefox OS)
Dependency tree / graph
Bug Flags:
in-moztrap -

Firefox Tracking Flags

(blocking-b2g:-, tracking-b2g:-)

Details

(Whiteboard: [caf priority: p2][CR 732361])

Attachments

(1 attachment)

Hi,

We are seeing a higher launch latency for SMS app in FFOS 2.1 compared to Android on MSM8926 (4 cpu cores and 1GB memory with no zram) .

SMS app launch latency :
Android : 638 ms
FFOS : 1100 ms

gaia/gecko used : 

https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gaia/commit/?h=mozilla/v2.1&id=713448b8963cd53c561f4b38640f8c63b655ce33
https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gecko/commit/?h=mozilla/v2.1&id=395370094caa386ebf8fe8831d030bc5484fa599

Please also look into bug 1038176 for clarification of how we are measuring SMS app launch latency.

Please also note that bug 1038176 comment 110 has improved SMS app launch latency to 1194ms on msm8610 (it has 2 cpu cores) and launch latency is almost as good as Android for msm8610 platform

But bug 1038176 comment 110 does not show enough improvement for msm8926 (It has 4 cpu cores and 1 GB memory ) and Android takes only 638ms for SMS app launching on msm8926 platform .
[Blocking Requested - why for this release]:
blocking-b2g: --- → 2.1?
_Again_ we have this requirement very late in the cycle. Why didn't we have the requirement earlier? It's not like we didn't request this in the past !
Frankly, I have absolutely no idea on how to improve more on the Gaia side. I have no idea if we leverage the 4 cores correctly in Gecko.
Whiteboard: [CR 732361}
Whiteboard: [CR 732361} → [CR 732361]
Whiteboard: [CR 732361] → [caf priority: p2][CR 732361]
(In reply to Julien Wajsberg [:julienw] from comment #3)
> Frankly, I have absolutely no idea on how to improve more on the Gaia side.
> I have no idea if we leverage the 4 cores correctly in Gecko.

Totally agree, I think we've done tons of effort in bug 1038176, and we have no more idea to improve in gaia side currently. Hi thinker, do we have any plan or bug for quadcore device tuning?
Flags: needinfo?(tlee)
triage: per comment 4, I'm setting minus unless there's clear actions we can do at this moment.
Also ni? Fabrice to get some comments.
blocking-b2g: 2.1? → -
Flags: needinfo?(fabrice)
Duplicate of this bug: 1075410
(In reply to Steve Chung [:steveck] from comment #4)
> no more idea to improve in gaia side currently. Hi thinker, do we have any
> plan or bug for quadcore device tuning?

There is a plan about retaining display list, and offline rendering.  I could not give a exact plan until I do talk to some others.
Flags: needinfo?(tlee)
Profile: https://people.mozilla.org/~bgirard/cleopatra/#report=f8d965b9b95656a79a8c5b3d6299d85908f89625

It was captured on a 512MB Flame with reference-workload-medium installed, running base image v184 and b2g v2.1:

gecko: 81d98fe40d10d5a8a48664aa60913491db215a4f
gaia: 9861c61ec302fb0316c753a2e1c0f592180515fd
(In reply to Ting-Yu Chou [:ting] from comment #8)
> Profile:
> https://people.mozilla.org/~bgirard/cleopatra/
> #report=f8d965b9b95656a79a8c5b3d6299d85908f89625

The refresh 0~3 take ~350ms, any ideas what the restyle & reflow for?
Hi Ting-Yu, may I know how you figure out the refresh penalty from profile?
Flags: needinfo?(tchou)
Grabbed another profile which includes b2g process: http://people.mozilla.org/~bgirard/cleopatra/#report=fcdd51c148c93cf102beabf92d51e0e6554020b8

It seems b2g process is not really busy in the beginning, though I couldn't tell where did it start launching Message app.
Talked to Steve directly, as we are not quite sure how was 1100ms measured, it's not clear which part of profile result we should look into.
Flags: needinfo?(tchou)

Updated

5 years ago
Blocks: 1079506
Flags: needinfo?(fabrice)
We tried with fix from bug 1069452 and we are still seeing 1125ms for sms app launch latency on msm8226(4 cores and 1GB RAM) with following gaia/gecko. 

https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gaia/commit/?h=mozilla/v2.1&id=9861c61ec302fb0316c753a2e1c0f592180515fd
https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gecko/commit/?h=mozilla/v2.1&id=81d98fe40d10d5a8a48664aa60913491db215a4f

here is my analysis for debug build of SMS app launch usecase : 

1) touchdown to nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream *stream) called by SMS app pid is 297ms 
     This means that SMS app is delaying to parse its own html code after touching sms icon by 297ms .
2) nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream *stream) is executed by sms app pid to parse html code and it takes 144ms
3) nsJSUtils::EvaluateString()  is executed by sms app pid to parse html code and it takes 200 ms 
4) nsStyleSheet:FileRules() is executed by sms app pid to parse html code and it takes 40 ms
5) Paint/Reflow activity executed by SMS app pid takes 472 ms

SMS app launch activity mostly uses two cpu cores. This is the reason why we are not seeing big improvements when we are moving from msm8610 (2 cores) to msm8226 (4 cores) platform.
Tapas, do you have a cleopatra profile that shows the results from your last comment?

The SMS app is loading many resources at startup, and that should happen in parallel. Maybe we are i/o bound there and so having 4 cores doesn't help us.

nsJSUtils::EvaluateString() parses and evaluates *js* code, not HTML, and nsStyleSheet::FileRules() deals with css rules.
(In reply to Ting-Yu Chou [:ting] from comment #12)
> Talked to Steve directly, as we are not quite sure how was 1100ms measured,
> it's not clear which part of profile result we should look into.

I think that we have a long discussion on bug 1038176 and Mozilla got clear idea about what exactly we are measuring during long discussion on that bug (We also shared video on this already) .

Comment 16

5 years ago
Thinker, Has your investigations on retaining display list, and offline rendering show any possibility of improvement for SMS app on 4 cores?
Flags: needinfo?(tlee)
(In reply to Fabrice Desré [:fabrice] from comment #14)
> Tapas, do you have a cleopatra profile that shows the results from your last
> comment?
> 
> The SMS app is loading many resources at startup, and that should happen in
> parallel. Maybe we are i/o bound there and so having 4 cores doesn't help us.
> 
> nsJSUtils::EvaluateString() parses and evaluates *js* code, not HTML, and
> nsStyleSheet::FileRules() deals with css rules.

Thanks for clarification . 

I followed https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Profiling_with_the_Built-in_Profiler#Start_the_Profiler

and run

./profile.sh start -p b2g -t Compositor && ./profile.sh start -p Settings

but my device hanged after this and I am seeing 3 b2g process after this:

 1356    96108K   33416K   19087K   15120K  /system/b2g/b2g
 1138    72980K   22872K   10509K    7512K  /system/b2g/b2g
 1495    62388K   19684K    8522K    5900K  /system/b2g/b2g

I guess that something has broken most likely on tip build(See Comment 13 for gaia/gecko SHA1s). 

I would suggest you to profile it on any 4 core device and this may help you to conclude faster. 

If you are not sure about what we are measuring then please look into videos shared in bug 1038176
(In reply to Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) from comment #13)
> 1) touchdown to nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream
> *stream) called by SMS app pid is 297ms 
>      This means that SMS app is delaying to parse its own html code after
> touching sms icon by 297ms .
> 2) nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream *stream) is
> executed by sms app pid to parse html code and it takes 144ms
> 3) nsJSUtils::EvaluateString()  is executed by sms app pid to parse html
> code and it takes 200 ms 
> 4) nsStyleSheet:FileRules() is executed by sms app pid to parse html code
> and it takes 40 ms
> 5) Paint/Reflow activity executed by SMS app pid takes 472 ms
> 
> SMS app launch activity mostly uses two cpu cores. This is the reason why we
> are not seeing big improvements when we are moving from msm8610 (2 cores) to
> msm8226 (4 cores) platform.

This is a well-known issue for quad-core.  There are already some improvements are going on or in plan.

I guess retaining display list doesn't improve step 5 since it is the first time of doing rendering. Some one is working on off-main thread painting that may help step 5.
Flags: needinfo?(tlee)
(In reply to Thinker Li [:sinker] from comment #18)

> > SMS app launch activity mostly uses two cpu cores. This is the reason why we
> > are not seeing big improvements when we are moving from msm8610 (2 cores) to
> > msm8226 (4 cores) platform.
> 
> This is a well-known issue for quad-core.  There are already some
> improvements are going on or in plan.

Do we have bugs for these improvements? If not, we should file them (even if work has not started yet) and make them block a meta "improve performance on devices with more than 2 cores".
blocking-b2g: - → 2.2?
(In reply to Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) from comment #15)
> I think that we have a long discussion on bug 1038176 and Mozilla got clear
> idea about what exactly we are measuring during long discussion on that bug
> (We also shared video on this already) .

FWIW, I just read all the comments, and found this in bug 1038176 comment 64:

<quote>
launch latency means touching SMS icon to displaying all messages (thread list) which you can display within 320x480 display without scrolling
</quote>

But the video Tapas mentioned seems was shared through email offline.
Tapas, should we remove this bug from the CAF blocker?
Flags: needinfo?(tkundu)

Comment 22

5 years ago
We can remove it from 2.1.
No longer blocks: CAF-v2.1-FC-metabug, 1079506
Flags: needinfo?(tkundu)

Updated

5 years ago
Blocks: 1079506
I added some profiler markers since I want to have clearer range of the measurement showing on Cleopatra, and then captured a profile as following:

  1. $ ./profile.sh start -p b2g && ./profile.sh start -p Homescreen && ./profile.sh start -p 3410
  2. Touch messages app icon on homescreen (reference-workload-medium installed)
  3. Wait for scrollbar stop moving
  4. $ ./profile.sh capture

Note pid 3410 is the Preallocated process.

But the result profile does not contain all the markers and seems unaligned:

  http://people.mozilla.org/~bgirard/cleopatra/#report=47cfa6e1480108739a6920cd1d8b629d2abe06db

  10-15 07:03:06.112 D/x       ( 2739): PresShell::DispatchTouchEvent NS_TOUCH_START
  10-15 07:03:06.112 D/x       ( 2850): PresShell::DispatchTouchEvent NS_TOUCH_START
  10-15 07:03:06.202 D/x       ( 2739): PresShell::DispatchTouchEvent NS_TOUCH_END
  10-15 07:03:06.202 D/x       ( 2850): PresShell::DispatchTouchEvent NS_TOUCH_END
  10-15 07:03:06.252 D/x       ( 2739): nsElementFrameLoaderOwner::LoadSrc
  10-15 07:03:06.252 D/x       ( 2739): nsElementFrameLoaderOwner::LoadSrc
  10-15 07:03:06.442 D/x       ( 3410): nsDocShell::LoadURI
  10-15 07:03:07.153 D/x       ( 3410): MobileMessageCursor::FireSuccessWithNextPendingResult
  10-15 07:03:07.233 D/x       ( 3410): MobileMessageCursor::FireSuccessWithNextPendingResult
  10-15 07:03:07.243 D/x       ( 3410): MobileMessageCursor::FireSuccessWithNextPendingResult
  ...

BenWa, did I do anything wrong or is this a known issue? I grabbed the profile on a Nexus4 (quad-core) with the latest v2.1.
Flags: needinfo?(bgirard)
This is the intended use case of markers. We use them a lot in Graphics and AFAIK there's no outstanding bugs with them. You have to make sure:
1) The thread that the marker is being inserted to is being profiled. You're profiling the main thread of 3 processes but no secondary thread.
2) The time at which you inserted the marker is still in the profile. Since the storage is limited when the space runs out previous data, including markers will be truncated in FIFO order. This varies per thread, in your profile you got 4 seconds for one thread.

If 2) is a problem and you're only interested in the markers then use -i 50 with profile.sh start when profiling. Make sure to reboot the phone in between.
Flags: needinfo?(bgirard) → needinfo?(tchou)
Thanks BenWa, that helps. I grabbed another profile with interval 10ms:

http://people.mozilla.org/~bgirard/cleopatra/#report=43d5c3cd9b28dbb7216e8c6cff92d332d115e003&select=1100,2499

device: Nexus4
gecko: 8ce5544ad315cc17231c2ab41ad860de1d08a87d + attachment 8505423 [details] [diff] [review]
gaia: 379ea4c9dd6d3f8ca2f79ce59c15f6afe6e557c3
workload: reference-workload-medium
Flags: needinfo?(tchou)
A first glance of the profile in comment 25:

1. The RecvShow() in Messages process [1270,1419] delays RecvLoadURL(), note it includes sending sync messages.
2. Seesms refresh 3 & 4 stops indexDB from firing success.
3. In [1796,1990] there's a js::RunScript takes ~200ms, which are proto.styleHack and Startup.init.
Used command |$ systrace.py -t 5 sched| to cpature a systrace with the same configuration of comment 25, remember to open it by Chrome:

http://www.googledrive.com/host/0B0yKxuGl8-XARTlacTNZa0xZakU
(In reply to Ting-Yu Chou [:ting] from comment #26)
> A first glance of the profile in comment 25:
> 
> 1. The RecvShow() in Messages process [1270,1419] delays RecvLoadURL(), note
> it includes sending sync messages.
> 2. Seesms refresh 3 & 4 stops indexDB from firing success.
> 3. In [1796,1990] there's a js::RunScript takes ~200ms, which are
> proto.styleHack and Startup.init.

I'll assume that when you say Startup.init it's really an inner function that is the DOMContentLoaded handler. Looking closely at what the handler does: calling init for some other objects. These init functions do a lot of DOM operations, and especially extracting templates data from the DOM. I wonder if we could try to lazy init those instead of init this now.

For example we can try to change the Template constructor in shared/js/template.js to not extract the data in the constructor, but at first use instead.

I don't know about proto.styleHack, it seems to come from gaia-theme but I don't know how it works.
(In reply to Julien Wajsberg [:julienw] from comment #28)
> I'll assume that when you say Startup.init it's really an inner function
> that is the DOMContentLoaded handler.

No, it is https://github.com/mozilla-b2g/gaia/blob/v2.1/apps/sms/js/startup.js#L85. Note there's a "startup.js:87" in the entry of Startup.init.

> I don't know about proto.styleHack, it seems to come from gaia-theme but I
> don't know how it works.

I will ask Steve if he has any ideas, this one takes longer which I think is more important.
(In reply to Ting-Yu Chou [:ting] from comment #29)
> (In reply to Julien Wajsberg [:julienw] from comment #28)
> > I'll assume that when you say Startup.init it's really an inner function
> > that is the DOMContentLoaded handler.
> 
> No, it is
> https://github.com/mozilla-b2g/gaia/blob/v2.1/apps/sms/js/startup.js#L85.
> Note there's a "startup.js:87" in the entry of Startup.init.

So yeah, line 87, it's exactly what I said: the DOMContentLoaded handler :) You see we call init at the end of the handler, and that init does what I said.

> 
> > I don't know about proto.styleHack, it seems to come from gaia-theme but I
> > don't know how it works.
> 
> I will ask Steve if he has any ideas, this one takes longer which I think is
> more important.

Ah, in v2.1, there is a styleHack in gaia-header, so this is likely what takes time.  We can ask Yan Or or Wilson Page.
Flags: needinfo?(yor)
Flags: needinfo?(wilsonpage)
(In reply to Ting-Yu Chou [:ting] from comment #29)
> (In reply to Julien Wajsberg [:julienw] from comment #28)
> > I don't know about proto.styleHack, it seems to come from gaia-theme but I
> > don't know how it works.
> 
> I will ask Steve if he has any ideas, this one takes longer which I think is
> more important.

Just talked to Steve, proto.runFontFit() seems is calculating the font size for the app header. Wonder does this need to be calculated every time.

Note it takes ~160ms from the profile.
(In reply to Ting-Yu Chou [:ting] from comment #31)
> 
> Just talked to Steve, proto.runFontFit() seems is calculating the font size
> for the app header. Wonder does this need to be calculated every time.
> 
> Note it takes ~160ms from the profile.

Like julien said in comment 30, runFontFit belongs to gaia-header component and maybe wilson or yor have answer for it. Not sure if we could have a method that force not to execute runFontFit.
(In reply to Julien Wajsberg [:julienw] from comment #30)
> Ah, in v2.1, there is a styleHack in gaia-header, so this is likely what
> takes time.  We can ask Yan Or or Wilson Page.

`styleHack()` involves cloning <gaia-header>'s shadow-dom <style> into the root of the component in the light-dom. We have to do this to work around the lack of `::content` (bug 992249) and `:host` (bug 992245) selectors. Once both those bugs land, we'll be able to remove `styleHack`.

(In reply to Steve Chung [:steveck] from comment #32)
> Like julien said in comment 30, runFontFit belongs to gaia-header component
> and maybe wilson or yor have answer for it. Not sure if we could have a
> method that force not to execute runFontFit

AFAIK font-fit currently causes one forced synchronous reflow as it needs to measure the available space. I didn't personally write the font-fit library, so don't know a great deal about it. I believe it is the same logic that was being used in v2.0 for the headers.css building-block.

Perhaps there is still margin for performance optimization. Do we know specifically where time is being spent in Gecko, is it in reflow?
Flags: needinfo?(wilsonpage)
Look at this profile: https://people.mozilla.org/~bgirard/cleopatra/#report=5640ed7c1ccb8a34f39190abdb414a58f6d517ef&search=runfontfit running m-c + master with an empty sms database.

This reflow is really hurting us a lot. We should try to get traction in bug 1005348 during 2.2
From systrace of comment 27, cpu3 is mostly idle, I am not sure what else can be moved out from the main thread of b2g and content process, nsRefreshDriver? Is it possible?
triage: propose to nominate as new feature
blocking-b2g: 2.2? → ---
feature-b2g: --- → 2.2?
(In reply to Ting-Yu Chou [:ting] from comment #26)
> 1. The RecvShow() in Messages process [1270,1419] delays RecvLoadURL(), note
> it includes sending sync messages.

There're three:

1. SendPRenderFrameConstructor()
2. SendPLayerTransactionConstructor()
3. sendSyncMessage('browser-element-api:call',{'msg_name':'hello'})

b2g process is busy on a restyle from aw__render() and nsRefreshDriver::Tick() when 1 and 3 are pending, wonder if we can eliminate them from system app.
fabrice: I've done some playing around with gaia-header to try to avoid forced reflows. This Gaia branch [0] uses the alternative gaia-header and makes some small changes to SMS app. I'm interested to know if this improves startup profiles at all.

[0] https://github.com/wilsonpage/gaia/tree/1074783
blocking-b2g: --- → 2.2?
feature-b2g: 2.2? → ---
Flags: needinfo?(fabrice)
Wilson, moving from "DOMContentLoaded" to "load" to init our objects will likely hurt the launch time :/
(In reply to Julien Wajsberg [:julienw] from comment #39)
> Wilson, moving from "DOMContentLoaded" to "load" to init our objects will
> likely hurt the launch time :/

Yes this is likely to delay init slightly, but if we save > 4 force sync reflows in the process, we could improve performance...
(In reply to Ting-Yu Chou [:ting] from comment #37)

> 1. SendPRenderFrameConstructor()
> 2. SendPLayerTransactionConstructor()
> 3. sendSyncMessage('browser-element-api:call',{'msg_name':'hello'})
> 
> b2g process is busy on a restyle from aw__render() and
> nsRefreshDriver::Tick() when 1 and 3 are pending, wonder if we can eliminate
> them from system app.

Ting-Yu, I'm working on a patch to remove 3. (I need to fix the test failures from https://tbpl.mozilla.org/?tree=Try&rev=0f820e420696).

(In reply to Wilson Page [:wilsonpage] from comment #38)
> fabrice: I've done some playing around with gaia-header to try to avoid
> forced reflows. This Gaia branch [0] uses the alternative gaia-header and
> makes some small changes to SMS app. I'm interested to know if this improves
> startup profiles at all.
> 
> [0] https://github.com/wilsonpage/gaia/tree/1074783

Ok, will test.
Flags: needinfo?(fabrice)
(In reply to Ting-Yu Chou [:ting] from comment #37)
> (In reply to Ting-Yu Chou [:ting] from comment #26)
> > 1. The RecvShow() in Messages process [1270,1419] delays RecvLoadURL(), note
> > it includes sending sync messages.
> 
> There're three:
> 
> 1. SendPRenderFrameConstructor()
> 2. SendPLayerTransactionConstructor()

Sotaro, do you have any ideas how can we improve those two?
Flags: needinfo?(sotaro.ikeda.g)
(In reply to Ting-Yu Chou [:ting] from comment #37)
> b2g process is busy on a restyle from aw__render()

I enabled restyle logging, there are 3 pending restyles in the first call to RestyleTracker::DoProcessRestyles() after touching Messages icon, and the one affects "windows" div really takes time, I don't know what triggers the restyle yet:

10-20 23:43:59.998 D/x       ( 4339): Processing 3 pending restyles with 3 restyle roots for app://system.gaiamobile.org/index.html
10-20 23:43:59.998 D/x       ( 4339):   processing style root div@ae34c880 data-z-index-level="gesture-panel" class="gesture-panel" id="right-panel" Block(div)(103)@af105520 at index 2
10-20 23:43:59.998 D/x       ( 4339):     aRestyleHint = eRestyle_Self, aChangeHint = NS_STYLE_HINT_NONE
10-20 23:43:59.998 D/x       ( 4339):       RestyleSelf Block(div)(103)@af105520, aRestyleHint = eRestyle_Self
10-20 23:43:59.998 D/x       ( 4339):         parentContext = aba95b70
10-20 23:43:59.998 D/x       ( 4339):         oldContext = ae2ecd90, newContext = aa14a868
10-20 23:43:59.998 D/x       ( 4339):         CaptureChange, ourChange = NS_STYLE_HINT_VISUAL, aChangeToAssume = NS_STYLE_HINT_NONE
10-20 23:43:59.998 D/x       ( 4339):           appending change NS_STYLE_HINT_VISUAL
10-20 23:43:59.998 D/x       ( 4339):           mHintsNotHandledForDescendants = NS_STYLE_HINT_NONE
10-20 23:43:59.998 D/x       ( 4339):         continuing restyle since there is different style data: Visibility
10-20 23:43:59.998 D/x       ( 4339):         swapping style structs between ae2ecd90 and aa14a868
10-20 23:43:59.998 D/x       ( 4339):         setting new style context
10-20 23:43:59.998 D/x       ( 4339):         returning eRestyleResult_Continue
10-20 23:43:59.998 D/x       ( 4339):       RestyleContentChildren
10-20 23:43:59.998 D/x       ( 4339):   processing style root div@ae34c820 data-z-index-level="gesture-panel" class="gesture-panel" id="left-panel" Block(div)(101)@af105288 at index 1
10-20 23:43:59.998 D/x       ( 4339):     aRestyleHint = eRestyle_Self, aChangeHint = NS_STYLE_HINT_NONE
10-20 23:43:59.998 D/x       ( 4339):       RestyleSelf Block(div)(101)@af105288, aRestyleHint = eRestyle_Self
10-20 23:43:59.998 D/x       ( 4339):         parentContext = aba95b70
10-20 23:43:59.998 D/x       ( 4339):         oldContext = ae2ec920, newContext = ae2ecd90
10-20 23:43:59.998 D/x       ( 4339):         CaptureChange, ourChange = NS_STYLE_HINT_VISUAL, aChangeToAssume = NS_STYLE_HINT_NONE
10-20 23:43:59.998 D/x       ( 4339):           appending change NS_STYLE_HINT_VISUAL
10-20 23:43:59.998 D/x       ( 4339):           mHintsNotHandledForDescendants = NS_STYLE_HINT_NONE
10-20 23:44:00.008 D/x       ( 4339):         continuing restyle since there is different style data: Visibility
10-20 23:44:00.008 D/x       ( 4339):         swapping style structs between ae2ec920 and ae2ecd90
10-20 23:44:00.008 D/x       ( 4339):         setting new style context
10-20 23:44:00.008 D/x       ( 4339):         returning eRestyleResult_Continue
10-20 23:44:00.008 D/x       ( 4339):       RestyleContentChildren
10-20 23:44:00.008 D/x       ( 4339):   processing style root div@b1542be0 data-z-index-level="app" id="windows" HTMLScroll(div)(14)@ae5cf7a8 at index 0
10-20 23:44:00.008 D/x       ( 4339):     aRestyleHint = eRestyle_Subtree, aChangeHint = NS_STYLE_HINT_NONE
10-20 23:44:00.008 D/x       ( 4339):       RestyleSelf HTMLScroll(div)(14)@ae5cf7a8, aRestyleHint = eRestyle_Subtree
10-20 23:44:00.008 D/x       ( 4339):         parentContext = aba95b70
10-20 23:44:00.008 D/x       ( 4339):         oldContext = ac37aed0, newContext = ac37aed0 (same)
10-20 23:44:00.008 D/x       ( 4339):         returning eRestyleResult_Continue
10-20 23:44:00.008 D/x       ( 4339):       RestyleUndisplayedChildren: undisplayed->mContent = b039f880
10-20 23:44:00.008 D/x       ( 4339):       RestyleContentChildren...
(In reply to Ting-Yu Chou [:ting] from comment #43)
> (In reply to Ting-Yu Chou [:ting] from comment #37)
> > b2g process is busy on a restyle from aw__render()
> 
> I enabled restyle logging, there are 3 pending restyles in the first call to
> RestyleTracker::DoProcessRestyles() after touching Messages icon, and the
> one affects "windows" div really takes time, I don't know what triggers the
> restyle yet:

aw__render() calls appendChild() on the "windows" div, not sure is it. Will double check tomorrow.
Attachment #8505423 - Attachment is patch: true
Attachment #8505423 - Attachment mime type: text/x-patch → text/plain
Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(sotaro.ikeda.g)
(In reply to Ting-Yu Chou [:ting] from comment #42)
> (In reply to Ting-Yu Chou [:ting] from comment #37)
> > (In reply to Ting-Yu Chou [:ting] from comment #26)
> > > 1. The RecvShow() in Messages process [1270,1419] delays RecvLoadURL(), note
> > > it includes sending sync messages.
> > 
> > There're three:
> > 
> > 1. SendPRenderFrameConstructor()
> > 2. SendPLayerTransactionConstructor()
> 
> Sotaro, do you have any ideas how can we improve those two?

One idea is use asynchronize technique like AsyncTransactionTracker. But more modification seems necessary to address interdependency.
Anyway, more than 100ms in this function is huge, we need to improve it.
Flags: needinfo?(sotaro.ikeda.g)
Depends on: 1085655
Bug 1085655 is created for TabChild::InitRenderingState().
Depends on: 1003848
(In reply to Ting-Yu Chou [:ting] from comment #44)
> aw__render() calls appendChild() on the "windows" div, not sure is it. Will
> double check tomorrow.

Yes, it is http://lxr.mozilla.org/gaia/source/apps/system/js/app_window.js#574 triggers RestyleManager::RestyleForAppend() which falls to the path of NODE_HAS_SLOW_SELECTOR and restyle the subtree of div #windows.
dbaron, do you have any suggestions on how to avoid a container restyle when appendChild()? Thanks.
Flags: needinfo?(dbaron)
Avoid using :nth-last-child(), or :last-child selectors that could match children of that element, or :-moz-empty-except-children-with-localname() selectors that could match the element itself.
Flags: needinfo?(dbaron)
After taking these two rules off from sound_manager.css:

  #volume section div:nth-last-child(2)
  #volume[data-channel="bt_sco"] section div:nth-last-child(2)

the RestyleManager::RestyleForAppend() from aw__render() falls to path NODE_HAS_EDGE_CHILD_SELECTOR.

But strangely, the parent container of #volume (https://github.com/mozilla-b2g/gaia/blob/master/apps/system/index.html#L1200) is div #system-overlay not div #windows (https://github.com/mozilla-b2g/gaia/blob/master/apps/system/index.html#L640), I don't know why it is #CallscreenWindow_0 matches the rules and set #windows NODE_HAS_SLOW_SELECTOR.

10-22 17:30:29.110 D/x       (  206): ePseudoClass_nthLastChild #volume section div:nth-last-child(2)
10-22 17:30:29.110 D/x       (  206): nthChildGenericMatches elem=div@ac3a4b80 id="CallscreenWindow_0" class="appWindow attentionWindow callscreenWindow hidden render" loading-state="false" style="background-image: none;" aria-hidden="true" parent=0xabde5b80
10-22 17:30:47.250 I/GeckoDump(  206): containerElement > appendChild
10-22 17:30:47.260 D/x       (  206): RestyleManager::RestyleForAppend NODE_HAS_SLOW_SELECTOR container=div@abde5b80 data-z-index-level="app" id="windows"
10-22 17:30:47.260 I/GeckoDump(  206): containerElement < appendChild

dbaron, do you think this is a bug?
Flags: needinfo?(dbaron)
BTW, after launching an app, the first come nsRefreshDriver::Tick() on b2g process really takes time (>90ms). I thought the only painting is splash icon enlarge animation which can be done from hardware and shouldn't take too much time on b2g main thread. Not sure how can I check what are painted in the tick.
Depends on: 1087329
I created bug 1087329 to track improvements that could be in Gaia only. I think I have some leads already.
(In reply to Ting-Yu Chou [:ting] from comment #25)
> Thanks BenWa, that helps. I grabbed another profile with interval 10ms:
> 
> http://people.mozilla.org/~bgirard/cleopatra/
> #report=43d5c3cd9b28dbb7216e8c6cff92d332d115e003&select=1100,2499
> 

From the above profile, before Message app's first LoadUri, b2g process is very busy, it seems to block gecko ipc from Message app's main thread to b2g process's main thread.
(In reply to Sotaro Ikeda [:sotaro] from comment #53)
> From the above profile, before Message app's first LoadUri, b2g process is
> very busy, it seems to block gecko ipc from Message app's main thread to b2g
> process's main thread.

Yes, the things I noticed are comment 47 and comment 51.
(In reply to Ting-Yu Chou [:ting] from comment #50)
> After taking these two rules off from sound_manager.css:
> 
>   #volume section div:nth-last-child(2)
>   #volume[data-channel="bt_sco"] section div:nth-last-child(2)
> 
> the RestyleManager::RestyleForAppend() from aw__render() falls to path
> NODE_HAS_EDGE_CHILD_SELECTOR.
> 
> But strangely, the parent container of #volume
> (https://github.com/mozilla-b2g/gaia/blob/master/apps/system/index.
> html#L1200) is div #system-overlay not div #windows
> (https://github.com/mozilla-b2g/gaia/blob/master/apps/system/index.
> html#L640), I don't know why it is #CallscreenWindow_0 matches the rules and
> set #windows NODE_HAS_SLOW_SELECTOR.
> 
> 10-22 17:30:29.110 D/x       (  206): ePseudoClass_nthLastChild #volume
> section div:nth-last-child(2)
> 10-22 17:30:29.110 D/x       (  206): nthChildGenericMatches
> elem=div@ac3a4b80 id="CallscreenWindow_0" class="appWindow attentionWindow
> callscreenWindow hidden render" loading-state="false"
> style="background-image: none;" aria-hidden="true" parent=0xabde5b80
> 10-22 17:30:47.250 I/GeckoDump(  206): containerElement > appendChild
> 10-22 17:30:47.260 D/x       (  206): RestyleManager::RestyleForAppend
> NODE_HAS_SLOW_SELECTOR container=div@abde5b80 data-z-index-level="app"
> id="windows"
> 10-22 17:30:47.260 I/GeckoDump(  206): containerElement < appendChild
> 
> dbaron, do you think this is a bug?

I don't understand the question, but I suspect that it's not a bug.
Flags: needinfo?(dbaron)
dbaron is right, it's not a bug.

Since sound_manager.css is applied to the index.html of systeam app, when CallscreenWindow calls insertAdjacentHTML() to insert its HTML to #windows div, the rules from sound_manager.css are actually checked and that's when nthChildGenericMatches() sets NODE_HAS_SLOW_SELECTOR to #windows div.

I'll see if we can limit the scope of sound_manager.css.
Ting-Yu, how do you enable these logs? Could be useful to check this during the SMS launch period too.
I added them manually...
Depends on: 1087905
Created bug 1087905 for the heavy restyle of #windows div when aw__render().
Duplicate of this bug: 1079582
(In reply to Ting-Yu Chou [:ting] from comment #51)
> I thought the only painting is splash
> icon enlarge animation which can be done from hardware and shouldn't take
> too much time on b2g main thread.

I found there're other updates: .gesture-panel (left/right), #statusbar, .statusbar-shadow, .notifications-shadow, .sb-icon-*.
Depends on: 1089920
Filed bug 1089920 to investigate the issue with gaia header.
I tested Wilson's patch from comment 38 and it seems to improve the situation a bit (but not at every launch). The SMS app performance is still worse than the v2.0 application on the same gecko (note that I compared v2.0 with wilson's branch, not v2.1 + wilson's patch, because there were conflicts. We could test again with a v2.1 specific patch). 

We'll really need to profile both versions to see what changed.
Depends on: 1096745
triage: feature, instead of a blocking defect.
blocking-b2g: 2.2? → backlog
I had a talk with Steve a while back, and Thinker would like me to post it here.

Ting:  Have we tried to batch the messages read from IndexedDB and insert them
       (maybe a page of them) to DOM once?
Steve: Yes, but lauch time is not improved.
Ting:  Thinker has an idea that what if we remember the font size of gaia-header,
       so we don't need runFontFit() at every launch?
Steve: Probably can do, but there're things to consider such as you'll need to
       know when to recalculate it e.g., when language is changed.

Note Julien has found gaia-header is behaving particularly badly in the Messages app, see bug 1089920.
How would you "remember" the font size? Using localStorage? IndexedDB? A cookie? All these potential solutions have also issues :/ (and of course the language issue)

The main issue with gaia header is not how it's calculating, it's how it gets the data to calculate, which triggers sync reflows. My experimentation is about giving this data as a parameter instead of taking it from the DOM. We'll see :)
(In reply to Julien Wajsberg [:julienw] from comment #66)
> gets the data to calculate, which triggers sync reflows. My experimentation
> is about giving this data as a parameter instead of taking it from the DOM.
> We'll see :)

Great, please let me know if there's anything I can help ;)
(In reply to Julien Wajsberg [:julienw] from comment #66)
> How would you "remember" the font size? Using localStorage? IndexedDB? A
> cookie? All these potential solutions have also issues :/ (and of course the
> language issue)
Are you talking the latency of these APIs?  If it is, we could try to improve it for launch time.
Do you think that the b2g process sending cookie string to the content along with or immediately after LoadURI() is helpful?
I just find that we have already do preloading for localstorage.  It should be fast enough.
(In reply to Thinker Li [:sinker] from comment #70)
> I just find that we have already do preloading for localstorage.  It should
> be fast enough.

But not at save time.

FTR we already investigated a lot storing things to be faster at startup, it always led to worse peformance.
Seems BenWa is working on bug 1098495 which improves app switching animation.
Julien, since you are working on this bug, could I assign it to you?
Actually I see this bug more as a meta. Not sure it shuold have an assignee :)
Keywords: meta
Summary: SMS app launch latency is higher than LA in v2.1 → [META] SMS app launch latency is higher than LA in v2.1
Depends on: 835679
Depends on: 1102154
2.2 feature tracking.
feature-b2g: --- → 2.2+
No longer blocks: B2G-Multicore
No longer depends on: 1102154
Depends on: 1107259
No longer blocks: AppStartup
Depends on: AppStartup
Blocks: AppStartup
No longer depends on: AppStartup
Depends on: 1102154
please remove 2.2+ since this bug is not in the scope.
Flags: needinfo?(bchien)
(Assignee)

Comment 77

4 years ago
As discussion with Kevin and Ken, this is in v2.2 feature scope. Marked this for scope tracking.
Flags: needinfo?(bchien)
Flags: in-moztrap?(edchen)
QA Contact: edchen
QA Whiteboard: [COM=Gaia::SMS][2.2-feature-qa+]
(Assignee)

Updated

4 years ago
Assignee: nobody → bchien
blocking-b2g: backlog → ---
Depends on: 1119626

Updated

4 years ago
blocking-b2g: --- → 2.2?
feature-b2g: 2.2+ → ---
Triage: Blocking on a meta-bug is too risky: you can't know how many blockers you have left before shipping and it makes the approval process harder to follow.

We recommend to nominate all the actual performance bugs.
blocking-b2g: 2.2? → -
Flags: in-moztrap?(edchen) → in-moztrap-

Updated

4 years ago

Updated

4 years ago
No longer blocks: CAF-v3.0-FL-metabug

Comment 79

4 years ago
CAF (Inder) expressed a preference to track performance meta bugs like these instead of their more granular blockers since, in their opinion, the individual blockers tend to change and don't always reflect a direct connection to the primary performance issue. CAF will track and update (close) these bugs when the reported performance issue is resolved.
(Assignee)

Comment 80

4 years ago
Cold launch time (Visually Loaded) on Flame:
Datazilla    Master    v2.2    v2.1
Camera       1375      1383    1634
Clock        1266      1164    1041
Contacts      860       806     883
FM Radio      627       554     542
Gallery      1000       940     969
Messages     1392      1255    1262
Music        1312      1164     929 
Phone         615       554     567
Settings     2537      2573    2586
Video        1054      1022     964

Updated

4 years ago
No longer blocks: CAF-v2.2-metabug
(Assignee)

Updated

4 years ago
No longer blocks: 1079506
(Assignee)

Comment 82

4 years ago
Main stream is moving to 2.5 on Flame and Aries (Sony Z3C). FxOS 2.1 is low priority now. Mark as wontfix. Please reopen if any special request.

Please follow up bug 1181017 for Message app in v2.5.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
tracking-b2g: --- → -
Flags: needinfo?(yor)
Resolution: --- → WONTFIX
See Also: → 1181017
You need to log in before you can comment on or make changes to this bug.