Closed
Bug 581930
Opened 14 years ago
Closed 14 years ago
[E10S] IPC messages overhead in TabChild/Parent progress listener
Categories
(Core :: IPC, defect)
RESOLVED
DUPLICATE
of bug 582840
People
(Reporter: romaxa, Unassigned)
Details
(Keywords: perf)
Attachments
(3 files)
- 5.36 KB, text/plain
- 27.30 KB, text/plain
- 6.72 KB, patch
I've found that on a mobile device we load some pages ~2x slower.
I was testing on HS.fi and found two problems:
1) The time between the LoadURL call and the Title signal is ~2x longer on e10s Fennec compared to normal Fennec. Most of the time is wasted sending progress messages from child to parent.
For hs.fi, between the start of loading and the title change we have approximately:
>TabChild:OnStatusChange - 350 calls
>TabChild:OnStateChange - 356 calls
>TabChild:OnProgressChange - 118 calls
I think we should do something about it.
Reporter
Comment 1 • 14 years ago
The second problem is plugins: the time between title-change and first paint increased (by 2 seconds) because of plugin initialization and loading... I think it would be nice to delay plugin loading: paint the initial page content first, and then start plugin initialization (process execution, etc.).
Reporter
Comment 2 • 14 years ago
56 - Progress
237 - OnStateChange::511 state:c0010 - 9 different flags/status combinations
168 - OnStatusChange - 9 different status/message combinations
Here are the time measurement results:
Normal Fennec: START->TITLE - 4 sec, TITLE->PAINT - 6 sec
E10S Fennec: START->TITLE - 7 sec, TITLE->PAINT - 8 sec
E10S Fennec with "status/state/progress" messages disabled: START->TITLE - 4.3 sec, TITLE->PAINT - 7.2 sec
Reporter
Comment 3 • 14 years ago
I think we must simplify the progress system: send only valid (useful) messages, send them only when the parent is listening for them, and don't repeatedly send the same message with the same values.
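The deduplication rule described above could be sketched roughly like this. `StateChangeFilter` and `ShouldSend` are hypothetical names for illustration, not actual TabChild code: the filter remembers the last (stateFlags, status) pair it let through and suppresses any send that would repeat it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the dedup idea: remember the last
// (stateFlags, status) pair sent over IPC and drop repeats.
class StateChangeFilter {
public:
    StateChangeFilter() : mHaveLast(false), mLastFlags(0), mLastStatus(0) {}

    // Returns true only when the message differs from the previous one.
    bool ShouldSend(uint32_t aStateFlags, uint32_t aStatus) {
        if (mHaveLast && mLastFlags == aStateFlags && mLastStatus == aStatus) {
            return false;  // identical to the last message: skip the IPC send
        }
        mHaveLast = true;
        mLastFlags = aStateFlags;
        mLastStatus = aStatus;
        return true;
    }

private:
    bool mHaveLast;
    uint32_t mLastFlags;
    uint32_t mLastStatus;
};
```

A filter like this collapses runs of identical OnStateChange notifications, but (as comment 9 below points out) it cannot help with status messages whose string payload changes every time.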
Reporter
Comment 4 • 14 years ago
Also, if I disable progress/state/status message sending in TabChild, Fennec still works, and the progress/start/stop messaging still functions because of the content.js script loaded in the child process... The question is: do we still need this TabChild/TabParent progress messaging system? Is it still in use?
Comment 5 • 14 years ago
(In reply to comment #4)
> Also if I disable progress/state/status messages sending in TabChild fennec
> still works, and progress/start/stop messaging system working because of
> content.js script loaded on child process...
>
> Question is do we still need this TabChild/Parent progress messaging system? is
> still in use?
We actually want to stop using the webprogress message in content.js, if possible. We would ideally only use the TabParent/TabChild webprogress message system.
Comment 6 • 14 years ago
This information provides some interesting clues as to the effect of message load on web page loading times.
Reporter
Comment 7 • 14 years ago
Then we should definitely simplify and reduce the number of messages between parent and child, because sending
43 'Transferring data from www.hs.fi…'
45 'Waiting for www.hs.fi…'
54 'Waiting for hs.fi…'
IPC messages in 3 seconds is definitely wrong behavior.
Reporter
Comment 8 • 14 years ago
> We actually want to stop using the webprogress message in content.js, if
> possible.
The content.js approach looks much more flexible to me, because we can filter useless messages on the content side and send only useful status info to the UI via the IPC bus.
Reporter
Comment 9 • 14 years ago
Filtering also will not help here, because it is possible to filter messages like this:
...
>Func:OnStateChange::511 state:10001, status:0
>Func:OnStateChange::511 state:10001, status:0
...
but it is not possible to filter these messages universally:
>Func:OnStatusChange::544 status:804b000a, msg:'Waiting for hs.fi…'
>Func:OnStatusChange::544 status:804b000a, msg:'Waiting for blogit.hs.fi…'
>Func:OnStatusChange::544 status:804b0006, msg:'Transferring data from blogit.hs.fi…'
>Func:OnStateChange::511 state:10004, status:0
>Func:OnProgressChange::529 cs:2561, ms:2561, ct:1159528, mt:1159528
>Func:OnStateChange::511 state:10010, status:0
>Func:OnStatusChange::544 status:804b000a, msg:'Waiting for hs.fi…'
>Func:OnStatusChange::544 status:804b000a, msg:'Waiting for blogit.hs.fi…'
>Func:OnStatusChange::544 status:804b0006, msg:'Transferring data from blogit.hs.fi…'
>Func:OnStateChange::511 state:10004, status:0
I think the client-side approach is more optimal for this case... We could also create a new browser webListener IPC interface and send only specific messages, like "start", "progress 1 per sec", "stop".
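The "progress 1 per sec" idea above could look roughly like the following. `ProgressThrottle` is a hypothetical helper for illustration, not Mozilla code: it forwards a progress message only if a minimum interval has elapsed since the last forwarded one, so an arbitrary burst of progress notifications costs at most one IPC message per interval.

```cpp
#include <chrono>

// Hypothetical time-based throttle for progress messages: forward one
// message per `interval` at most, dropping everything in between.
class ProgressThrottle {
public:
    explicit ProgressThrottle(std::chrono::milliseconds interval)
        : mInterval(interval), mHaveLast(false) {}

    // The caller passes the current time; returns true if the message
    // should be forwarded over IPC.
    bool ShouldForward(std::chrono::steady_clock::time_point now) {
        if (mHaveLast && now - mLast < mInterval) {
            return false;  // too soon after the previous forwarded message
        }
        mHaveLast = true;
        mLast = now;
        return true;
    }

private:
    std::chrono::milliseconds mInterval;
    bool mHaveLast;
    std::chrono::steady_clock::time_point mLast;
};
```

Taking the timestamp as a parameter (rather than calling `now()` internally) keeps the sketch testable; a real listener would likely sample the clock itself.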
Comment 10 • 14 years ago
These are async messages, right? 350 async messages shouldn't really show up as a factor. 350 sync messages sure would, though.
Comment 11 • 14 years ago
(In reply to comment #10)
> These are async messages, right? 350 async messages shouldn't really show up as
> a factor. 350 sync messages sure would, though.
The progress messages sent in the Fennec code (chrome/content/bindings/browser.js) are all async.
(In reply to comment #7)
> Sending IPC 43 'Transferring data from www.hs.fi…'
> 45 'Waiting for www.hs.fi…'
> 54 'Waiting for hs.fi…'
We should not even send those messages, since Fennec does not display any status messages. We really need a way to make the filter (set via WebProgress.addProgressListener) work in the child.
Reporter
Comment 12 • 14 years ago
(In reply to comment #10)
> These are async messages, right? 350 async messages shouldn't really show up as
> a factor. 350 sync messages sure would, though.
We have a mobile device with one processor, and the number of messages per second in the IPC channel causes a noticeable perf hit on the CPU, bus, etc.
Comment 13 • 14 years ago
We need to know what this means a lot more exactly: sending an async message should be really cheap, so that you can pass thousands per second with minimal overhead. The sequence of actions involved is:
1) content process main thread posts to the I/O event queue: one lock, thread continues immediately
2) context-switch to the I/O thread
3) I/O thread locks and pulls events off the queue, posts them to the IPC pipe. If there are multiple enqueued messages they can all be processed in one go
4) chrome process I/O thread unblocks, receives messages, posts them to the main-thread event queue (one lock)
5) chrome process main thread receives messages as part of normal event processing
If you schedule all four threads involved for every one of those messages, that's a lot of context switching. But that *shouldn't* be the case, and if it is we should figure out how to just fix it. If you're at all CPU-bound, multiple progress events can be batched at almost any of these junction points, so that you can process multiple events in one thread context. That's one of the many advantages of asynchronous processing.
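The batching point at the junctions above can be illustrated with a minimal sketch. `BatchedQueue` is hypothetical and much simpler than the actual Mozilla I/O-thread machinery: the consumer takes the lock once and drains every pending message, so N enqueued messages cost one wakeup and one lock acquisition on the receiving side rather than N.

```cpp
#include <mutex>
#include <queue>
#include <vector>

// Illustrative batched message queue: producers enqueue individually,
// the consumer drains everything pending under a single lock.
template <typename Msg>
class BatchedQueue {
public:
    void Post(Msg aMsg) {
        std::lock_guard<std::mutex> lock(mLock);
        mPending.push(std::move(aMsg));
    }

    // Pull every pending message in one go; multiple enqueued messages
    // are processed in a single thread context.
    std::vector<Msg> DrainAll() {
        std::lock_guard<std::mutex> lock(mLock);
        std::vector<Msg> batch;
        while (!mPending.empty()) {
            batch.push_back(std::move(mPending.front()));
            mPending.pop();
        }
        return batch;
    }

private:
    std::mutex mLock;
    std::queue<Msg> mPending;
};
```

In this shape, a CPU-bound consumer naturally amortizes context switches: however far it falls behind, it catches up on the whole backlog in one pass.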
Reporter
Comment 14 • 14 years ago
Probably the problem here is not the large number of messages, but the message content... I've made a simple filter in OnStateChange:
+ if (mPrevStateFlags != aStateFlags) {
+   SendNotifyStateChange(aStateFlags, aStatus);
+   mPrevStateFlags = aStateFlags;
+ }
With that filter and StatusChange/ProgressChange disabled: 7 sec -> 4.5 sec. If I enable ProgressChange messages, I get a 4.5 sec -> 5.3 sec perf hit. If I enable StatusChange messages (string content), then 5.3 -> 7 sec.
Comment 15 • 14 years ago
Added bug 581958 to help filter webprogress messages.
Comment 16 • 14 years ago
One general note: no matter how fast our IPC code is, sending messages will never be free (obviously). We shouldn't be sending unnecessary or superfluous messages, just like we shouldn't be delivering unnecessary or superfluous observer notifications, running unnecessary or superfluous timers, etc.

To investigate perf here more, we need a few things. First, the attached patch, which gives us a rough idea of the best-case raw perf of our C++ IPC code. This test is best-case because it runs with a quiet event loop, no XPCOM. For reference, below are some numbers I get on my desktop machine (16 HW thread Xeon X5570 @ 2.93GHz, tons of RAM) with an opt build. I see rather high variance test-to-test, it's worth trying a few times; these numbers seem to be fairly typical. (Note that you need to build --enable-ipdl-tests to run this.)

$ ~/mozilla/mp-opt/dist/bin/run-mozilla.sh ~/mozilla/mp-opt/dist/bin/ipdlunittest TestLatency
TEST-INFO | running test | TestLatency
timing resolution: 9e-08 seconds
[snip]
TEST-PASS | TestLatency |
average #ping-pong/sec: 13245.5
average #ping5-pong5/sec: 7180.98
average #RPC call-answer/sec: 25095.7
average #spams/sec: 537804

The most relevant numbers here are "#RPC call-answer/sec", which tests making an RPC parent->child in a tight loop ('sync' would be about the same), and "#spams/sec", sending async messages parent->child in a tight loop. We're doing 25k/sec and 500k/sec respectively on my machine. I would expect these numbers to be about an order of magnitude lower on the N900.

Second, we want similar best-case numbers for messagemanager spam. Note that smaug has a basic test for this at [1]. The numbers will be strictly lower for this test than for #spams/sec above because of JS+JSON+XPCOM-event-loop overhead. On my machine, I repeatedly get around 1,000 messages in 90-100 ms, so ~10,000/sec.

Third, we need a more detailed profile of the bottlenecks.
Which test setup to look at first depends on which numbers look most skewed compared to desktop. Oleg, can you help with this? We have a few options for profiling: something like Vlad's function timer setup for finding bottlenecks; zwol's (awesome) PerfMeasurement tool in bug 568863; and (obviously) "perf" or oprofile or something similar.

I know of a few existing dumb overheads in the IPC code. One is bug 552009; we use atomic refcounting even though we don't need it. The second isn't filed yet, but the problem is that in RPCChannels, we make a full copy of each Message that goes through so as to provide runtime reflection on the IPC message stack (we only need to copy the Message::Header, which is much, much cheaper. There's also an unnecessary Message copy that happens from IO->main thread, and there may be others.) The last is that messagemanager JSON-stringifies its message object (apparently). Benjamn has a pure-IPDL alternative to JSON [2] that, in addition to almost certainly being faster than toJSON(), is strictly more powerful: it can serialize self-referential objects.

[1] http://hg.mozilla.org/mozilla-central/file/d1183b4f57b1/dom/ipc/test.xul#l151
[2] http://hg.mozilla.org/mozilla-central/file/d1183b4f57b1/js/ipc/PObjectWrapper.ipdl#l52
Comment 17 • 14 years ago
Er oops, [2] was supposed to be http://hg.mozilla.org/mozilla-central/file/f9ce465b7276/js/jetpack/PJetpack.ipdl#l46 .
Comment 18 • 14 years ago
Pushed spam test: http://hg.mozilla.org/mozilla-central/rev/01cd3b3fc201
Reporter
Comment 19 • 14 years ago
Bug 582840 fixes that code.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE
Comment 20 • 14 years ago
(In reply to comment #16)
> $ ~/mozilla/mp-opt/dist/bin/run-mozilla.sh ~/mozilla/mp-opt/dist/bin/ipdlunittest TestLatency
> TEST-INFO | running test | TestLatency
> timing resolution: 9e-08 seconds
> [snip]
> TEST-PASS | TestLatency |
> average #ping-pong/sec: 13245.5
> average #ping5-pong5/sec: 7180.98
> average #RPC call-answer/sec: 25095.7
> average #spams/sec: 537804
>
> The most relevant numbers here are "#RPC call-answer/sec", which tests making
> an RPC parent->child in a tight loop ('sync' would be about the same), and
> "#spams/sec", sending async messages parent->child in a tight loop. We're
> doing 25k/sec and 500k/sec respectively on my machine. I would expect these
> numbers to be about an order of magnitude lower on the N900.
FTR, here are some apparently typical results from this test on the N900 (copied by hand >:( )
$ LD_LIBRARY_PATH="/home/user/fennec" ./ipdlunittest TestLatency
TEST-INFO | running test | TestLatency
timing resolution: 1e-09 seconds
[YAH RIGHT!!! TimeStamp_posix isn't dealing well with really slow devices that tell clock_getres() fibs]
[snip]
TEST-PASS | TestLatency |
average #ping-pong/sec: 1121.82
average #ping5-pong5/sec: 270.162
average #RPC call-answer/sec: 1436.92
average #spams/sec: 7557.47