Open Bug 1164539 Opened 9 years ago Updated 2 years ago

[meta] postMessage during document load is slow

Categories

(Core :: General, defect, P5)

ARM
Gonk (Firefox OS)
defect

Tracking

()

tracking-b2g backlog

People

(Reporter: wilsonpage, Unassigned, NeedInfo)

References

(Depends on 3 open bugs)

Details

(Keywords: meta, perf, Whiteboard: [platform])

Attachments

(4 files)

window.postMessage() between window and iframe during document load appears to be delayed.

TEST-CASE: https://github.com/wilsonpage/post-message-startup-perf

1. git clone
2. open index.html in desktop (or flash app to phone)
3. Check console for timings
4. Press 'reconnect' button in app
5. Check console for timings

---

We have observed the test to be up to 10x slower at startup.
Chrome seems to have consistent postMessage timings, during load and loaded.
I wonder if we should just have a separate runnable queue for window <-> window messaging.
Called around the end of a main thread task.
Split test-case [1] into BroadcastChannel example and window.postMessage() example.

BroadcastChannel appears to have a similar delay on document load.

[1] https://github.com/wilsonpage/post-message-startup-perf
See Also: → messages-nga-services
Sounds like also Chrome just uses main thread the way we do, so using a separate queue for 
post messages might be too regression risky.

What I don't understand where we spend time between posting the message and receiving the message on the other end, and why chrome doesn't take any time there even during page load.
See Also: → 1170644
From the profile [1] Wilson captured, I see two big chunks at B2G process:

- [3602, 3662] system app handling mozbrowserloadend event
- [3677, 3758] nsRefreshDriver::Tick

The long refresh driver tick seems relate to bug 1110625, bug 1163471 is trying to fix it.

[1] https://people.mozilla.org/~bgirard/cleopatra/#report=d6b4d410427c59d9240294571fc6ca7eb93c3941
(In reply to Ting-Yu Chou [:ting] from comment #6)
> From the profile [1] Wilson captured, I see two big chunks at B2G process:
> 
> - [3602, 3662] system app handling mozbrowserloadend event
> - [3677, 3758] nsRefreshDriver::Tick
> 
> The long refresh driver tick seems relate to bug 1110625, bug 1163471 is
> trying to fix it.
> 
> [1]
> https://people.mozilla.org/~bgirard/cleopatra/
> #report=d6b4d410427c59d9240294571fc6ca7eb93c3941

I just learnt from Cervantes that broadcast channel message is dispatched to listeners by B2G process background thread, not main thread, and will be handled in content process main thread. So these two chunks are not important at all.
Since the sender and receiver are the same content process, yield after postMessage() as early as possible to receive the message earlier. Off load JS parsing and CSS rule matching should also help since this is during app startup.

I'll double check the profile once I reach a PC.
I see 4 postMessage() and 4 onmessage() in process Contactsproto:13409, I assume they're in pair:

1. [3426, 3493]
  1a. [3427, 3432] nsInputStreamPump::OnStateStop() / threads.js is still running after postMessage()
  1b. [3438, 3481] nsHtml5TreeOpExecutor::RunFlushLoop() / list.js, contacts.js, and styles
  1c. [3482, 3491] Timer::Fire() / PresShell::Flush(Flush_Layout)
2. [3632, 3781]
  2a. [3640, 3658] PCompositor::RecvDidComposite() / PresShell::Paint
  2b. [3660, 3672] nsInputStreamPump::OnStateStart() / AppsService.js
  2c. [3687, 3759] nsInputStreamPump::OnStateStop() / parse threads.js and run
  2d. [3766, 3779] nsHTML5TreeOpExecutor::RunFlushLoop() / contacts.js
3. [3752, 3814]
  2c.
  2d.
  3a. [3783, 3795] nsRefreshDriver::Tick / PresShell::Flush(Flush_InterruptibleLayout)
  3b. [3800, 3813] nsRefreshDriver::Tick / PresShell::Paint
4. [3854, 3864]

One thing I noticed is 1a and 2c seems exactly the same thing, parsing and loading threads.js but from 2 different iframes. :smaug, is it ok to skip the 2nd parsing?

Another idea came up is the nested OOP so running scripts at outer iframe does not block the inner iframe to receive broadcast message. I just asked Kanru, it can be enabled by the pref "dom.ipc.tabs.nested.enabled", but it's not complete yet, he says using IndexedDB and Blob could crash it.
Flags: needinfo?(bugs)
(In reply to Ting-Yu Chou [:ting] from comment #9)
> One thing I noticed is 1a and 2c seems exactly the same thing, parsing and
> loading threads.js but from 2 different iframes. :smaug, is it ok to skip
> the 2nd parsing?

This could be fixed by JS 'bytecode-caching' (bug 1164389)

> Another idea came up is the nested OOP so running scripts at outer iframe
> does not block the inner iframe to receive broadcast message. I just asked
> Kanru, it can be enabled by the pref "dom.ipc.tabs.nested.enabled", but it's
> not complete yet, he says using IndexedDB and Blob could crash it.

I don't really understand what this means :)
(In reply to Wilson Page [:wilsonpage] from comment #10)
> > Another idea came up is the nested OOP so running scripts at outer iframe
> > does not block the inner iframe to receive broadcast message. I just asked
> > Kanru, it can be enabled by the pref "dom.ipc.tabs.nested.enabled", but it's
> > not complete yet, he says using IndexedDB and Blob could crash it.
> 
> I don't really understand what this means :)

When there's only one process for nested iframes, if the main thread is running JS of outer iframe, even the message is arrived, it will be queued til the JS finishs. But if there're two processes (one for each iframe), then the inner iframe will not affect by outer iframe running whatever JS.
We are nowhere near having support for oop-iframe in general, and it will probably never work in case
the iframe is same-domain as its parent.


And yes, something like Bug 1164389 could help quite a bit when loading large script libraries to
several iframes.

But for now, any chance to split threads.js to smaller pieces? Some small part which initiates the communication between iframes and so, and then load large library when needed?
Flags: needinfo?(bugs)
(In reply to Olli Pettay [:smaug] from comment #12)
> But for now, any chance to split threads.js to smaller pieces? Some small
> part which initiates the communication between iframes and so, and then load
> large library when needed?

This is completely possible. I've already done some tests with smaller pieces and we can see some improvements in app load time. If/when we do have byte-code caching, it might be better to use the single large library so the other frames get super quick load?
(In reply to Olli Pettay [:smaug] (Reviewing overload, no new review requests before June 7, please) from comment #12)
> We are nowhere near having support for oop-iframe in general, and it will
> probably never work in case
> the iframe is same-domain as its parent.

Could you please explain me a bit, I don't understand why it won't work in this case? Thanks.
Because the web relies on that JS in window A and B run in the same thread in case A and B can access each others (which is the case when A for example has B in an iframe and A and B are from the same domain).
(In reply to Olli Pettay [:smaug] (Reviewing overload, no new review requests before June 7, please) from comment #15)
> Because the web relies on that JS in window A and B run in the same thread
> in case A and B can access each others (which is the case when A for example
> has B in an iframe and A and B are from the same domain).

If iframes are 'sandboxed' windows can't access each others' internal state. Could run in separate processes/threads under these conditions?
My crude diagnosis so far:

1. Script parse is quite expensive and should be minimised on startup. Byte-code caching (bug 1164389) could help here.
2. Creating new documents is quite expensive, especially on b2g where injected scripts do stuff to the document (eg. BrowserElementChildPreload.js bug 1170644)
3. There's some random GC happening on startup sometimes and this is blocking.
4. For an empty document we're painting too often (need gfx contact for more info here)
5. 'Main Thread IO!' is happening a lot (dunno if that's normal)
6. BroadcastChannel.postMessage takes longer to reach .onmessage handler than window.postMessage. This is like due to requiring two tasks to process as opposed to one.
7. Layout may be invalidated when injecting hidden iframe into document (probably shouldn't?).

Latest profile: http://people.mozilla.org/~bgirard/cleopatra/#report=b5ea5763e27bcc8238d4e15ef8af2b67c84c7fa2
Bug 1154987 and bug 1166173 would also be helpful.
benwa: Can you take a look at comment 17 and see if there is anything unusual gfx wise during startup of this demo app [1]. The app has no styling and simply loads one <iframe> from initial markup and then another via script.

Right now it's taking to long for us to receive a data from the last iframe via postMessage and we're trying to figure out why.

[1] https://github.com/wilsonpage/contacts_prototype/tree/perf/using-threads-library
Flags: needinfo?(bgirard)
What if move broadcast message listener to a worker, can we do the work off main thread?
(In reply to Wilson Page [:wilsonpage] from comment #17)
> My crude diagnosis so far:

I think your diagnosis is correct and would be nice to measure each point of this list.
 
> 2. Creating new documents is quite expensive, especially on b2g where
> injected scripts do stuff to the document (eg. BrowserElementChildPreload.js
> bug 1170644)

Are we planning to rewrite it in C++? Would this help?
NI fabrice for this.

> 5. 'Main Thread IO!' is happening a lot (dunno if that's normal)

Can you give us a list of those?
Flags: needinfo?(fabrice)
(In reply to Andrea Marchesini (:baku) from comment #21)
> (In reply to Wilson Page [:wilsonpage] from comment #17)
> > My crude diagnosis so far:
> 
> I think your diagnosis is correct and would be nice to measure each point of
> this list.
>  
> > 2. Creating new documents is quite expensive, especially on b2g where
> > injected scripts do stuff to the document (eg. BrowserElementChildPreload.js
> > bug 1170644)
> 
> Are we planning to rewrite it in C++? Would this help?
> NI fabrice for this.
> 
> > 5. 'Main Thread IO!' is happening a lot (dunno if that's normal)
> 
> Can you give us a list of those?

You can see them in the profile linked to [1] in comment 17. Make sure you have the 'Content (Contactsprotot:9304)' process highlighted (blue).

[1] http://people.mozilla.org/~bgirard/cleopatra/#report=b5ea5763e27bcc8238d4e15ef8af2b67c84c7fa2
The main thread I/O is a bug in the profiler. All the IPC calls cause it, and BenWa said on IRC to just not care about it (except it there is actual real file I/O or some such happening).
Kanru, do you think the loading of BrowserElementChildPreload.js can be a problem here?
any plan to rewrite Browser Element API in C++?
Flags: needinfo?(fabrice) → needinfo?(kchen)
(In reply to Andrea Marchesini (:baku) from comment #24)
> Kanru, do you think the loading of BrowserElementChildPreload.js can be a
> problem here?
> any plan to rewrite Browser Element API in C++?

I don't think the loading of BrowserElementChildPreload.js could be a problem other than the sync reflow it causes. Yes, there was a plan to rewrite Browser Element API in C++ and part of it has already been converted. Looks like it's time to convert the remaining parts to C++ too.
Flags: needinfo?(kchen)
Is rewriting the Browser Element API in C++ the only thing we can do here?
Split threads.js to smaller pieces so that we don't load massive amount scripts before doing anything.
Bug 1154987 may help, if there are spare cores on the device, and Bug 1164389 should help when
loading the same script to several documents.

From comment 17 (4) is an interesting case. Someone more familiar with layout or graphics should take a look. Why are we painting so often?

(I don't know what (7) from that list means. visibility: hidden or display: none; or what?)
(In reply to Olli Pettay [:smaug] from comment #27)
> From comment 17 (4) is an interesting case. Someone more familiar with
> layout or graphics should take a look. Why are we painting so often?

I NI'd :benwa, but no response as of yet.

> (I don't know what (7) from that list means. visibility: hidden or display:
> none; or what?)

`display: none`
When CC or GC happens, are we dealing with a new process or loading some new html document to existing process? Since CC/GC are normally postponed while starting up a new process.
(In reply to Olli Pettay [:smaug] from comment #29)
> When CC or GC happens, are we dealing with a new process or loading some new
> html document to existing process? Since CC/GC are normally postponed while
> starting up a new process.

My test-case just works with iframes, so assuming all in same process.
Well, Contactsproto is some separate process.
Yes, please ignore the main thread IO unless there's actual file activity happening in the timeline there.

(In reply to Wilson Page [:wilsonpage] from comment #19)
> benwa: Can you take a look at comment 17 and see if there is anything
> unusual gfx wise during startup of this demo app [1]. The app has no styling
> and simply loads one <iframe> from initial markup and then another via
> script.

I don't see anything wrong. I just see that app running too much scripts and that being the bottleneck to starting fast. There's a fraction of gfx work to put our a frame here and there but nothing that really holds back the timeline.

Let me know if you have something in specific you want information on.
Flags: needinfo?(bgirard)
Oli can you relay your paint concerns to Benwa?
Flags: needinfo?(bugs)
I think it is more about layout and invalidation stuff. So, some layout developer might be better to answer why we need to paint so often.
dholbert, maybe you have some input to comment 17 (4).
Flags: needinfo?(bugs) → needinfo?(dholbert)
(In reply to Olli Pettay [:smaug] from comment #34)
> dholbert, maybe you have some input to comment 17 (4).
(In reply to Wilson Page [:wilsonpage] from comment #17)
> 4. For an empty document we're painting too often (need gfx contact for more
> info here)
> http://people.mozilla.org/~bgirard/cleopatra/#report=b5ea5763e27bcc8238d4e15ef8af2b67c84c7fa2

We're painting 60 times a second; for an emptyish document, I'd hope we could skip some of those. We're spending nonzero time in style recalculation / display list construction, so we apparently think there are *some* style changes that are happening & need flushing (which trigger repaints).

I can't really figure out anything more from this profile, though, since it doesn't have C++ backtraces. Do we have any sort of sampling profiler (like the gecko profiler add-on) which works on B2G?
Flags: needinfo?(dholbert)
You can use -f stackwalk to get stackwalk on b2g. This will give you less accurate sampling but with more details. The cause for the style flush is coming from the (previous?) tick. So there's definitely something occurring.
Also, immediately before each style flush, the waterfall shows a teeny "scripts" blip, which corresponds to this section:
http://mxr.mozilla.org/mozilla-central/source/layout/base/nsRefreshDriver.cpp?rev=f52c18aac7ce&mark=1540-1567#1540
... which I think means we're running some requestAnimationFrame (rAF) callbacks.

Those rAF scripts could conceivably be restyling something in a way that makes us flush styles.

Alternately/also: since this is at document load-time, we might be progressively loading the document's stylesheets, and we're getting more and more of the styles loaded every refresh-driver tick, and that impacts the <body> or something in such a way that we think we need to repaint.

(These are semi-shots in the dark; not sure if either is on the right track here.)
We should have all of the UA style sheets available in the style sheet cache, so I wouldn't expect we need to wait for any loads there.  If there are other style sheets loaded by the page/iframe then that's another matter.
Depends on: 1170644
No longer depends on: 1150172
Depends on: 1150172
Devtools performance profile of an empty document with an empty <iframe hidden> sending a child-ping<->parent-pong postMessage() during document load.

We can see it's taking >70ms to receive the 'pong' message back from the parent window. This is almost instant when the document is idle.

There is a small 'layout' event and a 'paint' event that could be partially responsible for blocking the 'message' event here.

There are a succession of small layout and recalc-style events at the start of the profile which don't seem necessary considering we don't have any nodes to layout (excl. <html> and <body>).
I will follow this up with a proper Gecko profile, just waiting for B2G to build...
(In reply to Daniel Holbert [:dholbert] (vacation 8/27-9/13) from comment #35)
> (In reply to Olli Pettay [:smaug] from comment #34)
> > dholbert, maybe you have some input to comment 17 (4).
> (In reply to Wilson Page [:wilsonpage] from comment #17)
> > 4. For an empty document we're painting too often (need gfx contact for more
> > info here)
> > http://people.mozilla.org/~bgirard/cleopatra/#report=b5ea5763e27bcc8238d4e15ef8af2b67c84c7fa2
> 
> We're painting 60 times a second; for an emptyish document, I'd hope we
> could skip some of those. We're spending nonzero time in style recalculation
> / display list construction, so we apparently think there are *some* style
> changes that are happening & need flushing (which trigger repaints).
> 
> I can't really figure out anything more from this profile, though, since it
> doesn't have C++ backtraces. Do we have any sort of sampling profiler (like
> the gecko profiler add-on) which works on B2G?

How can I get C++ backtraces included in my profile?
Flags: needinfo?(dholbert)
It appears from this profile that this function within BrowserElementChildPreload.js [1] is blocking the main thread for 33ms.

I'm not exactly sure what the job of this function is, but wondering if it could be delayed until after we've got content painted?

[1] https://github.com/mozilla/gecko-dev/blob/master/dom/browser-element/BrowserElementChildPreload.js#L1583
After our app has finished doing its thing we can see ~50ms is spent in System app:

33ms - spent in window management, 22ms of which is statusbar.js which is causing a sync reflow.

11ms - Spent inside the l10n.js responding to DOM mutations.
Between BrowserElementChild and BrowserElementParent scripts we're adding ~100ms to the critical-path of every fxos app. Although this work must be done, I'm sure majority of it isn't as critical as letting the app render its content.

FULL PROFILE: https://people.mozilla.org/~bgirard/cleopatra/#report=9b2be1600fbc5bb79d5468ffc515896905a26631
Other interesting parts:

52ms - Painting empty document
49ms - Layout of empty document (Flush_InterruptibleLayout)
30ms - In clock_gettime (system/lib/libc.so)
(In reply to Wilson Page [:wilsonpage] from comment #42)
> Created attachment 8657076 [details]
> ping-pong-gecko-profile.png
> 
> It appears from this profile that this function within
> BrowserElementChildPreload.js [1] is blocking the main thread for 33ms.
> 
> I'm not exactly sure what the job of this function is, but wondering if it
> could be delayed until after we've got content painted?
> 
> [1]
> https://github.com/mozilla/gecko-dev/blob/master/dom/browser-element/
> BrowserElementChildPreload.js#L1583

That is bug 926452
Depends on: 926452
Depends on: 1179723
> > I can't really figure out anything more from this profile, though, since it
> > doesn't have C++ backtraces. Do we have any sort of sampling profiler (like
> > the gecko profiler add-on) which works on B2G?
> 
> How can I get C++ backtraces included in my profile?

I don't know (on b2g), sorry. (I was asking the same question, RE a sampling profiler, which would have c++ backtraces)
Flags: needinfo?(dholbert)
(In reply to Daniel Holbert [:dholbert] (vacation 8/27-9/13) from comment #48)
> > How can I get C++ backtraces included in my profile?
> 
> I don't know (on b2g), sorry. (I was asking the same question, RE a sampling
> profiler, which would have c++ backtraces)

You need to include "stackwalk" [1] in the start command, e.g., ./profile.sh -f stackwalk,js

[1] https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Profiling_with_the_Built-in_Profiler#Stackwalk
Wilson, the clock_gettime() you asked me yesterday is the regression from bug 1166207 and Cervantes is fixing it.
Depends on: 1166207
blocking-b2g: --- → 2.5+
Making it a P1. This bug is a must have for Music App, and as listed, it makes the app 10 times slower.
Priority: -- → P1
Given that this is a meta bug, moving from DOM to General.
Component: DOM → General
After Cervantes fixes bug 1166207, does Music app still have performance regression bugs like what comment 51 mentions?
Flags: needinfo?(hkoka)
Whiteboard: [platform]
(In reply to Ken Chang[:ken] from comment #53)
> After Cervantes fixes bug 1166207, does Music app still have performance
> regression bugs like what comment 51 mentions?

We've worked around the issue by using a tip from :smaug that gives us synchronous postMessage-like behaviour using window.dispatchEvent().

As I said this is a 'workaround' and masks the problem. Our postMessage task is just one in a long queue of 'runnables' during document load. This means message can't be delivered until the other tasks ahead of it have completed.

To speed this up we have some options:

- Do less stuff on startup (tricky when you have  UI to draw)
- Move some of these blocking tasks of main-thread (dependencies of this bug tracking some of these efforts)
- Somehow prioritise postMessage tasks over others on the main-thread

We'd need someone with in-depth platform knowledge in this area to take this further.
Keywords: meta
Summary: postMessage during document load is slow → [meta] postMessage during document load is slow
(In reply to Ken Chang[:ken] from comment #53)
> After Cervantes fixes bug 1166207, does Music app still have performance
> regression bugs like what comment 51 mentions?

As Wilson mentioned in the recent comment, we have a workaround and as a result this bug will not block Music app for 2.5. 

I am Ni'ing Francisco also here to make sure Messages and Contacts are not affected by this one. 

Thanks
hema
Flags: needinfo?(hkoka) → needinfo?(fjordano)
Keywords: perf
OS: Unspecified → Gonk (Firefox OS)
Hardware: Unspecified → ARM
Flags: needinfo?(fjordano) → needinfo?(francisco)
I've been double checking with Julien and Oleg and they confirm they are not using the postMessage until visually completed.

This will change after 2.5, but definitely we won't be blocking here for the branching.
Flags: needinfo?(francisco)
(In reply to Francisco Jordano [:arcturus] [:francisco] from comment #56)
> I've been double checking with Julien and Oleg and they confirm they are not
> using the postMessage until visually completed.
> 
> This will change after 2.5, but definitely we won't be blocking here for the
> branching.

Based on the above and the implementation of workaround, this is not currently blocking for Nov 2nd RA for 2.5 release. However, please put this on the priority list for the next release. 

Thanks
Hema
blocking-b2g: 2.5+ → ---
[Tracking Requested - why for this release]:

Adding the tracking flag as this is a Priority to handle post 2.5
Depends on: 1218844
No longer depends on: 1218844
Decreasing the priority as no update for the last 2 years on this bug.
See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage 
about the priority meaning.
Priority: P1 → P5
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: