Bug 926447 (opened 7 years ago, updated 5 years ago)

sendAsyncMessage is expensive

Categories: Core :: DOM: Content Processes, defect
People: Reporter vingtetun, Unassigned

Keywords: perf

Attachments: (1 file)

I was looking at the time spent in the JS handler of a mozbrowser iframe and I noticed that most of the time is spent in sendAsyncMessage. I'm not sure whether it is the code that clones the data, but making it faster would help startup time a bit.

The time spent in that code is a few ms each time.

Smaug can you suggest someone to help here?
Flags: needinfo?(bugs)
Is the call itself expensive irrespective of the type of message sent? Thinker Li opened bug 915733 to try to improve our IPC performance; there's been no action there, but it might be worth linking these two bugs.
Is there a testcase for this, something which can be run on desktop?
Would need to get some profile here.
Flags: needinfo?(21)
(In reply to Olli Pettay [:smaug] from comment #2)
> Is there a testcase for this, something which can be run on desktop?
> Would need to get some profile here.

Gabriele is much better at profiling than me. Can you help here?
Flags: needinfo?(21) → needinfo?(gsvelto)
I'm happy to profile, if I had a testcase for desktop.
Flags: needinfo?(bugs)
(In reply to Olli Pettay [:smaug] from comment #4)
> I'm happy to profile, if I had a testcase for desktop.

The thing is that on desktop you probably won't notice the 3ms taken by sendAsyncMessage :/
That doesn't matter. If I have a test case to profile, the profiler tells me what is taking most of the time. Whether it is 30ms or 30ns doesn't matter.
Though running the slow thing several times helps profiling.
(In reply to Vivien Nicolas (:vingtetun) (:21) from comment #3)
> Gabriele is much better at doing profile than me. Can you help here?

Instead of using the profiler, whose granularity might not be enough, it might be worth doing some ad-hoc timing of the sendAsyncMessage method at the Gecko level. I'll try to do that and report back here with the timings (and the patch I used to take them).
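[Editorial note] The ad-hoc timing proposed here could look roughly like the sketch below. It is a minimal illustration with a stubbed sendAsyncMessage (the real frame-script API only exists inside Gecko), so the absolute numbers mean nothing; only the measurement pattern is the point.

```javascript
// Stub standing in for the real frame-script API, which serializes
// `data` and crosses the IPC boundary inside Gecko.
function sendAsyncMessage(name, data) {
  JSON.stringify({ name, data });
}

// Run `fn` many times and return the average cost per call in ms.
function timeCalls(fn, iterations) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) fn();
  const end = process.hrtime.bigint();
  return Number(end - start) / 1e6 / iterations;
}

const perCallMs = timeCalls(
  () => sendAsyncMessage("browser-element-api:call", { msg: "loadstart" }),
  10000
);
console.log(`average per call: ${perCallMs.toFixed(4)} ms`);
```

Averaging over many iterations, as comment 5 suggests, smooths out timer granularity and scheduling noise.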
I would still like to see a testcase for this.
(Note, I'm not using the Gecko profiler but Zoom, which gives better granularity; on OS X one could use Shark or whatever has replaced Shark in recent OS X releases.)
I guess something like 
data:, function perftest() { for (var i = 0; i < 10000; ++i) { sendAsyncMessage("foo"); } }; content.setInterval(perftest, 50);
in the test-ipc.xul 
http://mxr.mozilla.org/mozilla-central/source/dom/ipc/test.xul
might work.
Or do I need some more complicated message?
Based on the profile:
- we should WebIDL-ify the message manager at some point, but that's not quite possible yet; we need
  to wait for window to be WebIDL-ified. WebIDL-ification should make us spend less time in XPConnect.
- most time is still spent under DispatchAsyncMessage
  - creating and destroying nsString takes time. Need to optimize that. IPDL at least used to require
    nsStrings and didn't accept nsAString, but need to check if that is fixed these days.
  - 35+% is IPDL/IPC-level stuff. Plenty of allocation and deallocation.
(In reply to Olli Pettay [:smaug] from comment #10)
> - most time is still spent under DispatchAsyncMessage
>   - creating and destroying nsString takes time. Need to optimize that.
>     IPDL at least used to require nsStrings and didn't accept nsAString,
>     but need to check if that is fixed these days.

It's not.  Probably ought to be, and shouldn't be too awful to fix.

>   - 35+% is IPDL/IPC level stuff. Plenty of allocation and deallocation.

Are there any hotspots for this?
The Message ctor is rather heavy (well, the Pickle under it). Allocation is just always slow.
Could we recycle Messages? Or stack-allocate them?
ProcessLink::SendMessage also seems to be very malloc-happy.

So in general it looks like we should do some recycling, and not rely on malloc to give good performance.
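[Editorial note] The recycling idea above can be sketched as a simple free-list pool: instead of asking the allocator for a fresh message on every send, reuse released ones. Names here (MessagePool, acquire, release) are illustrative, not Gecko APIs.

```javascript
// Minimal object-pool sketch: recycle message objects instead of
// allocating a new one per send.
class MessagePool {
  constructor() {
    this.free = [];
  }
  acquire() {
    // Reuse a recycled message if one is available; allocate otherwise.
    return this.free.pop() ?? { name: "", data: null };
  }
  release(msg) {
    // Reset fields and return the object to the free list for reuse.
    msg.name = "";
    msg.data = null;
    this.free.push(msg);
  }
}

const pool = new MessagePool();
const a = pool.acquire();   // fresh allocation (pool is empty)
pool.release(a);
const b = pool.acquire();   // recycled: same object, no new allocation
console.log(a === b);       // true
```

The trade-off is the usual one for pools: reduced allocator pressure on a hot path versus holding memory alive between sends.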
(In reply to Olli Pettay [:smaug] from comment #10)
>   - 35+% is IPDL/IPC level stuff. Plenty of allocation and deallocation.

ProcessLink::SendMessage() creates a runnable method for each invocation, and that constructor in turn seems to create quite a bit of stuff:

http://dxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/task.h#l333
Flags: needinfo?(gsvelto)
Currently there's no point in IPDL using nsAString& instead of nsString&, because it immediately saves the value into an nsString; string sharing should make this as cheap as we can get at the moment.

I'm interested in the allocation situation, though: other than a single Message I didn't think we'd have much allocation involved for this stuff.
Blocks: 927078
(In reply to Olli Pettay [:smaug] from comment #9)
> I guess something like 
> data:, function perftest() { for (var i = 0; i < 10000; ++i) {
> sendAsyncMessage("foo"); } }; content.setInterval(perftest, 50);
> in the test-ipc.xul 
> http://mxr.mozilla.org/mozilla-central/source/dom/ipc/test.xul
> might work.
> Or do I need some more complicated message?

Yeah, this sounds good enough. We are not doing anything fancy, just sending stuff like sendAsyncMessage('browser-element-api:call', {'msg': 'loadstart'});
sendAsyncMessage('browser-element-api:call', {'msg': 'loadstart', 'backgroundColor': '#000000'});

The difference from your testcase is that in our case there is always some data.
OK, I'll profile a case where there is some data.
data:, function perftest() { for (var i = 0; i < 10000; ++i) { sendAsyncMessage("foo", {'msg': 'loadstart'}); } }; content.setInterval(perftest, 50);


And with that the profile is quite different, since structured clone serialization takes as much
time as dispatching the message. Someone from the JS engine team should look at that.
Nicolas, do you know anything about that?
Flags: needinfo?(nicolas.b.pierron)
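[Editorial note] The per-payload cost can be illustrated with the standard structuredClone global (available in modern browsers and Node 17+). This is not the Gecko-internal serialization path discussed in the profile, just an indicative micro-benchmark of cloning a small payload like the ones in comment 15.

```javascript
// Time structured-clone serialization of a small payload, many times over.
function timeClones(payload, iterations) {
  const start = process.hrtime.bigint();
  let last;
  for (let i = 0; i < iterations; i++) last = structuredClone(payload);
  const end = process.hrtime.bigint();
  // Average cost per clone in milliseconds, plus the last clone for sanity.
  return { perCallMs: Number(end - start) / 1e6 / iterations, last };
}

const { perCallMs, last } = timeClones({ msg: "loadstart" }, 10000);
console.log(`structuredClone: ${perCallMs.toFixed(4)} ms per call`);
console.log(last.msg === "loadstart"); // true: the clone preserves the data
```

Even for a two-field object, the clone walks the object graph and copies it, which is why the profile shifts as soon as the testcase sends data.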
Where is sendAsyncMessage defined?  I cannot find it anywhere in the code base.

(In reply to Olli Pettay [:smaug] from comment #9)
> data:, function perftest() { for (var i = 0; i < 10000; ++i) {
> sendAsyncMessage("foo"); } }; content.setInterval(perftest, 50);
> in the test-ipc.xul 
> http://mxr.mozilla.org/mozilla-central/source/dom/ipc/test.xul
> might work.

Note that XUL code does not run with the compileAndGo flag in the JS engine, which means it will not be compiled with IonMonkey and will remain in Baseline only.  But I don't think that would be the issue here.

(In reply to Olli Pettay [:smaug] from comment #18)
> Created attachment 817466 [details]
> jsstructuredclone part of the profile

Looking at the profile and at the description of sendAsyncMessage[1], I guess this might be related to what Yoric reported during the JS work week (see Bug 873293)

Jason, do you have any idea what we could do better?  I guess we could move the Rooted outside the loop in JSStructuredCloneWriter::write, as we did in the interpreter.  Rooted instances add themselves to a linked list of markable memory areas, so moving them outside the loop will reduce the cost.  Also, we might be able to do a switch on the type tag in JSStructuredCloneWriter::startWrite, instead of a long else-if chain.

Vivien, are the messages always identical? Is it always the same object, or a new but similar one each time?

[1] https://wiki.mozilla.org/Content_Process_Event_Handlers
Flags: needinfo?(nicolas.b.pierron) → needinfo?(jorendorff)
I don't think we've made any serious effort to profile or optimize the structured cloning code.

One part that is definitely unnecessarily slow is how we serialize arrays. We treat them like objects and build an in-memory array of all their properties.

Another possibly slow part is the JSObject::getGeneric call in JSStructuredCloneWriter::write. We could probably optimize that away in most cases.

Anyone could hack on this; it doesn't take a JS engine hacker. Knowing how to work a profiler is more relevant.
Flags: needinfo?(jorendorff)
Keywords: perf
FWIW, I timed JSStructuredCloneWriter::write() during app startup and it's very low overhead compared to everything else: at most 3ms overall.
Component: IPC → DOM: Content Processes