Closed
Bug 581930
Opened 14 years ago
Closed 14 years ago
[E10S] IPC messages overhead in TabChild/Parent progress listener
Categories
(Core :: IPC, defect)
RESOLVED
DUPLICATE
of bug 582840
People
(Reporter: romaxa, Unassigned)
Details
(Keywords: perf)
Attachments
(3 files)
- 5.36 KB, text/plain
- 27.30 KB, text/plain
- 6.72 KB, patch
I've found that on a mobile device we load some pages ~2x slower.
I was testing on HS.fi and found two problems:
1) The time between the LoadURL call and the Title signal is ~2x longer on e10s Fennec compared to normal Fennec. Most of the time is wasted sending progress messages from child to parent.
For hs.fi, between the start of loading and the title change we have approximately:
>TabChild:OnStatusChange - 350 calls
>TabChild:OnStateChange - 356 calls
>TabChild:OnProgressChange - 118 calls
I think we should do something about it.
Reporter
Comment 1 • 14 years ago
The second problem is plugins: the time between title-change and first paint increased (by 2 seconds) because of plugin initialization and loading... I think it would be nice to delay plugin loading: paint the initial page content first, and then start plugin initialization (process execution, etc.).
Reporter
Comment 2 • 14 years ago
56 - Progress
237 - OnStateChange::511 state:c0010 - 9 different flags/status combinations
168 - OnStatusChange - 9 different status/message combinations
Here are the time measurement results:
Normal Fennec: START->TITLE - 4 sec, TITLE->PAINT - 6 sec
E10S Fennec: START->TITLE - 7 sec, TITLE->PAINT - 8 sec
E10S Fennec with "status/state/progress" messages disabled: START->TITLE - 4.3 sec, TITLE->PAINT - 7.2 sec
Reporter
Comment 3 • 14 years ago
I think we must simplify the progress system: send only valid (useful) messages, send them only when the parent is listening for them, and don't repeatedly send the same message with the same values.
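The deduplication rule described above could be sketched roughly like this. `StateChangeFilter` and `ShouldSend` are hypothetical names for illustration, not actual TabChild code: the filter remembers the last (stateFlags, status) pair it let through and suppresses any send that would repeat it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the dedup idea: remember the last
// (stateFlags, status) pair sent over IPC and drop repeats.
class StateChangeFilter {
public:
    StateChangeFilter() : mHaveLast(false), mLastFlags(0), mLastStatus(0) {}

    // Returns true only when the message differs from the previous one.
    bool ShouldSend(uint32_t aStateFlags, uint32_t aStatus) {
        if (mHaveLast && mLastFlags == aStateFlags && mLastStatus == aStatus) {
            return false;  // identical to the last message: skip the IPC send
        }
        mHaveLast = true;
        mLastFlags = aStateFlags;
        mLastStatus = aStatus;
        return true;
    }

private:
    bool mHaveLast;
    uint32_t mLastFlags;
    uint32_t mLastStatus;
};
```

A filter like this collapses runs of identical OnStateChange notifications, but (as comment 9 below points out) it cannot help with status messages whose string payload changes every time.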
Reporter
Comment 4 • 14 years ago
Also, if I disable progress/state/status message sending in TabChild, Fennec still works, and the progress/start/stop messaging still functions because of the content.js script loaded in the child process... The question is: do we still need this TabChild/TabParent progress messaging system? Is it still in use?
Comment 5 • 14 years ago
(In reply to comment #4)
> Also if I disable progress/state/status messages sending in TabChild fennec
> still works, and progress/start/stop messaging system working because of
> content.js script loaded on child process...
>
> Question is do we still need this TabChild/Parent progress messaging system? is
> still in use?
We actually want to stop using the webprogress message in content.js, if possible. We would ideally only use the TabParent/TabChild webprogress message system.
Comment 6 • 14 years ago
This information provides some interesting clues as to the effect of message load on web page loading times.
Reporter
Comment 7 • 14 years ago
Then we should definitely simplify and reduce the number of messages between parent and child, because sending
43 'Transferring data from www.hs.fi…'
45 'Waiting for www.hs.fi…'
54 'Waiting for hs.fi…'
IPC messages in 3 seconds is definitely wrong behavior.
Reporter
Comment 8 • 14 years ago
> We actually want to stop using the webprogress message in content.js, if
> possible.
The content.js approach looks much more flexible to me, because we can filter useless messages on the content side and send only useful status info to the UI via the IPC bus.
Reporter
Comment 9 • 14 years ago
Filtering also will not help here, because it is possible to filter messages like this:
...
>Func:OnStateChange::511 state:10001, status:0
>Func:OnStateChange::511 state:10001, status:0
...
but it is not possible to filter these messages universally:
>Func:OnStatusChange::544 status:804b000a, msg:'Waiting for hs.fi…'
>Func:OnStatusChange::544 status:804b000a, msg:'Waiting for blogit.hs.fi…'
>Func:OnStatusChange::544 status:804b0006, msg:'Transferring data from blogit.hs.fi…'
>Func:OnStateChange::511 state:10004, status:0
>Func:OnProgressChange::529 cs:2561, ms:2561, ct:1159528, mt:1159528
>Func:OnStateChange::511 state:10010, status:0
>Func:OnStatusChange::544 status:804b000a, msg:'Waiting for hs.fi…'
>Func:OnStatusChange::544 status:804b000a, msg:'Waiting for blogit.hs.fi…'
>Func:OnStatusChange::544 status:804b0006, msg:'Transferring data from blogit.hs.fi…'
>Func:OnStateChange::511 state:10004, status:0
I think the client-side approach is more optimal for this case... We could also create a new browser webListener IPC interface and send only specific messages, like "start", "progress 1 per sec", "stop".
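The "progress 1 per sec" idea above could look roughly like the following. `ProgressThrottle` is a hypothetical helper for illustration, not Mozilla code: it forwards a progress message only if a minimum interval has elapsed since the last forwarded one, so an arbitrary burst of progress notifications costs at most one IPC message per interval.

```cpp
#include <chrono>

// Hypothetical time-based throttle for progress messages: forward one
// message per `interval` at most, dropping everything in between.
class ProgressThrottle {
public:
    explicit ProgressThrottle(std::chrono::milliseconds interval)
        : mInterval(interval), mHaveLast(false) {}

    // The caller passes the current time; returns true if the message
    // should be forwarded over IPC.
    bool ShouldForward(std::chrono::steady_clock::time_point now) {
        if (mHaveLast && now - mLast < mInterval) {
            return false;  // too soon after the previous forwarded message
        }
        mHaveLast = true;
        mLast = now;
        return true;
    }

private:
    std::chrono::milliseconds mInterval;
    bool mHaveLast;
    std::chrono::steady_clock::time_point mLast;
};
```

Taking the timestamp as a parameter (rather than calling `now()` internally) keeps the sketch testable; a real listener would likely sample the clock itself.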
Comment 10 • 14 years ago
These are async messages, right? 350 async messages shouldn't really show up as a factor. 350 sync messages sure would, though.
Comment 11 • 14 years ago
(In reply to comment #10)
> These are async messages, right? 350 async messages shouldn't really show up as
> a factor. 350 sync messages sure would, though.
The progress messages sent in the Fennec code (chrome/content/bindings/browser.js) are all async.
(In reply to comment #7)
> Sending IPC 43 'Transferring data from www.hs.fi…'
> 45 'Waiting for www.hs.fi…'
> 54 'Waiting for hs.fi…'
We should not even send those messages, since Fennec does not display any status messages. We really need a way to make the filter (set via WebProgress.addProgressListener) work in the child.
Reporter
Comment 12 • 14 years ago
(In reply to comment #10)
> These are async messages, right? 350 async messages shouldn't really show up as
> a factor. 350 sync messages sure would, though.
We have a mobile device with one processor, and the number of messages per second in the IPC channel causes a noticeable perf hit on the CPU, bus, etc.
Comment 13 • 14 years ago
We need to know what this means a lot more exactly: sending an async message should be really cheap, so that you can pass thousands per second with minimal overhead. The sequence of actions involved is:
1) content process main thread posts to the I/O event queue: one lock, thread continues immediately
2) context-switch to the I/O thread
3) I/O thread locks and pulls events off the queue, posts them to the IPC pipe. If there are multiple enqueued messages they can all be processed in one go
4) chrome process I/O thread unblocks, receives messages, posts them to the main-thread event queue (one lock)
5) chrome process main thread receives messages as part of normal event processing
If you schedule all four threads involved for every one of those messages, that's a lot of context switching. But that *shouldn't* be the case, and if it is we should figure out how to just fix it. If you're at all CPU-bound, multiple progress events can be batched at almost any of these junction points, so that you can process multiple events in one thread context. That's one of the many advantages of asynchronous processing.
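The batching point at the junctions above can be illustrated with a minimal sketch. `BatchedQueue` is hypothetical and much simpler than the actual Mozilla I/O-thread machinery: the consumer takes the lock once and drains every pending message, so N enqueued messages cost one wakeup and one lock acquisition on the receiving side rather than N.

```cpp
#include <mutex>
#include <queue>
#include <vector>

// Illustrative batched message queue: producers enqueue individually,
// the consumer drains everything pending under a single lock.
template <typename Msg>
class BatchedQueue {
public:
    void Post(Msg aMsg) {
        std::lock_guard<std::mutex> lock(mLock);
        mPending.push(std::move(aMsg));
    }

    // Pull every pending message in one go; multiple enqueued messages
    // are processed in a single thread context.
    std::vector<Msg> DrainAll() {
        std::lock_guard<std::mutex> lock(mLock);
        std::vector<Msg> batch;
        while (!mPending.empty()) {
            batch.push_back(std::move(mPending.front()));
            mPending.pop();
        }
        return batch;
    }

private:
    std::mutex mLock;
    std::queue<Msg> mPending;
};
```

In this shape, a CPU-bound consumer naturally amortizes context switches: however far it falls behind, it catches up on the whole backlog in one pass.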
Reporter
Comment 14 • 14 years ago
Probably the problem here is not the large number of messages, but the message content... I've made a simple filter in OnStateChange:
+ if (mPrevStateFlags != aStateFlags) {
+   SendNotifyStateChange(aStateFlags, aStatus);
+   mPrevStateFlags = aStateFlags;
+ }
With that filter and StatusChange/ProgressChange disabled: 7 sec -> 4.5 sec. If I enable ProgressChange messages, I get a 4.5 sec -> 5.3 sec perf hit. If I enable StatusChange messages (string content), then 5.3 -> 7 sec.
Comment 15 • 14 years ago
Added bug 581958 to help filter webprogress messages.
Comment 16 • 14 years ago
One general note: no matter how fast our IPC code is, sending messages will never be free (obviously). We shouldn't be sending unnecessary or superfluous messages, just like we shouldn't be delivering unnecessary or superfluous observer notifications, running unnecessary or superfluous timers, etc.

To investigate perf here more, we need a few things. First, the attached patch, which gives us a rough idea of the best-case raw perf of our C++ IPC code. This test is best-case because it runs with a quiet event loop, no XPCOM. For reference, below are some numbers I get on my desktop machine (16 HW thread Xeon X5570 @ 2.93GHz, tons of RAM) with an opt build. I see rather high variance test-to-test, it's worth trying a few times; these numbers seem to be fairly typical. (Note that you need to build --enable-ipdl-tests to run this.)

$ ~/mozilla/mp-opt/dist/bin/run-mozilla.sh ~/mozilla/mp-opt/dist/bin/ipdlunittest TestLatency
TEST-INFO | running test | TestLatency
timing resolution: 9e-08 seconds
[snip]
TEST-PASS | TestLatency |
average #ping-pong/sec: 13245.5
average #ping5-pong5/sec: 7180.98
average #RPC call-answer/sec: 25095.7
average #spams/sec: 537804

The most relevant numbers here are "#RPC call-answer/sec", which tests making an RPC parent->child in a tight loop ('sync' would be about the same), and "#spams/sec", sending async messages parent->child in a tight loop. We're doing 25k/sec and 500k/sec respectively on my machine. I would expect these numbers to be about an order of magnitude lower on the N900.

Second, we want similar best-case numbers for messagemanager spam. Note that smaug has a basic test for this at [1]. The numbers will be strictly lower for this test than for #spams/sec above because of JS+JSON+XPCOM-event-loop overhead. On my machine, I repeatedly get around 1,000 messages in 90-100 ms, so ~10,000/sec.

Third, we need a more detailed profile of the bottlenecks.
Which test setup to look at first depends on which numbers look most skewed compared to desktop. Oleg, can you help with this? We have a few options for profiling: something like Vlad's function timer setup for finding bottlenecks; zwol's (awesome) PerfMeasurement tool in bug 568863; and (obviously) "perf" or oprofile or something similar.

I know of a few existing dumb overheads in the IPC code. One is bug 552009; we use atomic refcounting even though we don't need it. The second isn't filed yet, but the problem is that in RPCChannels, we make a full copy of each Message that goes through so as to provide runtime reflection on the IPC message stack (we only need to copy the Message::Header, which is much, much cheaper. There's also an unnecessary Message copy that happens from IO->main thread, and there may be others.) The last is that messagemanager JSON-stringifies its message object (apparently). Benjamn has a pure-IPDL alternative to JSON [2] that, in addition to almost certainly being faster than toJSON(), is strictly more powerful: it can serialize self-referential objects.

[1] http://hg.mozilla.org/mozilla-central/file/d1183b4f57b1/dom/ipc/test.xul#l151
[2] http://hg.mozilla.org/mozilla-central/file/d1183b4f57b1/js/ipc/PObjectWrapper.ipdl#l52
Comment 17 • 14 years ago
Er oops, [2] was supposed to be http://hg.mozilla.org/mozilla-central/file/f9ce465b7276/js/jetpack/PJetpack.ipdl#l46 .
Comment 18 • 14 years ago
Pushed spam test: http://hg.mozilla.org/mozilla-central/rev/01cd3b3fc201
Reporter
Comment 19 • 14 years ago
Bug 582840 fixes that code.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE
Comment 20 • 14 years ago
(In reply to comment #16)
> $ ~/mozilla/mp-opt/dist/bin/run-mozilla.sh ~/mozilla/mp-opt/dist/bin/ipdlunittest TestLatency
> TEST-INFO | running test | TestLatency
> timing resolution: 9e-08 seconds
> [snip]
> TEST-PASS | TestLatency |
> average #ping-pong/sec: 13245.5
> average #ping5-pong5/sec: 7180.98
> average #RPC call-answer/sec: 25095.7
> average #spams/sec: 537804
>
> The most relevant numbers here are "#RPC call-answer/sec", which tests making
> an RPC parent->child in a tight loop ('sync' would be about the same), and
> "#spams/sec", sending async messages parent->child in a tight loop. We're
> doing 25k/sec and 500k/sec respectively on my machine. I would expect these
> numbers to be about an order of magnitude lower on the N900.
FTR, here are some apparently typical results from this test on the N900 (copied by hand >:( )
$ LD_LIBRARY_PATH="/home/user/fennec" ./ipdlunittest TestLatency
TEST-INFO | running test | TestLatency
timing resolution: 1e-09 seconds
[YAH RIGHT!!! TimeStamp_posix isn't dealing well with really slow devices that tell clock_getres() fibs]
[snip]
TEST-PASS | TestLatency |
average #ping-pong/sec: 1121.82
average #ping5-pong5/sec: 270.162
average #RPC call-answer/sec: 1436.92
average #spams/sec: 7557.47