Last Comment Bug 853864 - Expose IPC message information to profiler
: Expose IPC message information to profiler
Status: ASSIGNED
p=8 [qa?]
:
Product: Core
Classification: Components
Component: Plug-ins (show other bugs)
: Trunk
: All All
: P2 normal with 2 votes (vote)
: mozilla23
Assigned To: Nobody; OK to take it and work on it
:
Mentors:
Depends on:
Blocks: 890735 921301
  Show dependency treegraph
 
Reported: 2013-03-22 08:00 PDT by Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3]
Modified: 2015-11-01 13:12 PST (History)
15 users (show)
mmucci: firefox‑backlog+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Part 1: add IPC message tracing (65.76 KB, patch)
2013-06-07 16:48 PDT, Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3]
no flags Details | Diff | Review
Part 1: add IPC message tracing, v2 (58.93 KB, patch)
2013-06-10 07:10 PDT, Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3]
no flags Details | Diff | Review
Part 1: add IPC message tracing, v3 (60.11 KB, patch)
2013-06-13 15:10 PDT, Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3]
no flags Details | Diff | Review
Part 2: Test IPC message tracing. (6.31 KB, patch)
2013-06-13 15:13 PDT, Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3]
benjamin: feedback+
Details | Diff | Review
Perf test (7.49 KB, patch)
2013-07-01 02:28 PDT, Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3]
no flags Details | Diff | Review
Part 1: add IPC message tracing, v4 (59.04 KB, patch)
2013-07-01 02:41 PDT, Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3]
no flags Details | Diff | Review
Part 2: Test IPC message tracing, v2 (5.62 KB, patch)
2013-07-01 02:51 PDT, Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3]
no flags Details | Diff | Review
Part 1 rebased (59.25 KB, patch)
2013-07-07 12:28 PDT, Benoit Girard (:BenWa)
no flags Details | Diff | Review
Perf test for plugin streaming scenario (9.87 KB, patch)
2013-07-18 10:37 PDT, Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3]
no flags Details | Diff | Review
Part 1: add IPC message tracing, v5 (59.79 KB, patch)
2013-07-18 13:50 PDT, Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3]
bent.mozilla: review-
Details | Diff | Review
Add mfbt/FunctionName.h (19.35 KB, patch)
2013-09-03 09:06 PDT, Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3]
jwalden+bmo: review+
Details | Diff | Review
Perf test v2, child spams async messages (9.00 KB, patch)
2013-09-10 11:24 PDT, Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3]
no flags Details | Diff | Review
Perf test v3, updated to ipdl changes (8.63 KB, patch)
2013-12-04 08:26 PST, Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3]
no flags Details | Diff | Review
Add IPC message tracing, v6 (70.27 KB, patch)
2013-12-04 08:33 PST, Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3]
no flags Details | Diff | Review
WIP: enable tracing via pref, requiring restart (72.70 KB, patch)
2014-01-10 14:30 PST, Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3]
no flags Details | Diff | Review
WIP: enable tracing via pref, requiring restart, v2 (73.41 KB, patch)
2014-01-14 11:06 PST, Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3]
no flags Details | Diff | Review

Description Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-03-22 08:00:17 PDT
To diagnose possible IPC

For start & end we want observable events: 
 ipdl-tracing-start - activates the tracing
 ipdl-tracing-capture - collect the data in JSON; channels tell the caller to wait for them to complete the data collection
 ipdl-tracing-capture-completed - thrown by the channels when they completed the data collection; contains the data
 ipdl-tracing-end - stops the tracing

We want to trace the last N messages in two circular in-memory buffers per channel/link (parent and child process).
Required field information:
 * Channel/link id
 * message id / sequence number
 * message delivery type: rpc, sync, async
 * probe type: post-task, send, recv, dequeue, complete, (optional return)
    ... instead of return we could add info to map the original sync message to it's reply message
 * timestamp

 Process A --- PT --- S --------------------
                       \
                        \
 Process B ------------- R --- Dq --- Comp -

This means we need to have a protocol extension to 
a) tell the child processes to start/stop the data collection (async)
b) tell the child processes to send their buffer to the parent
c) for the child processes to perform sending their buffer to their parent (async)
Once the parent side of the process received the buffer from the child, it merges it with it's own buffer and sends it off via "ipdl-tracing-capture-completed".
Comment 1 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-03-22 08:01:57 PDT
Benoit, Benjamin, does this look about right to you?
Comment 2 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-03-22 08:03:58 PDT
(In reply to Georg Fritzsche [:gfritzsche] [at Performance work-week] from comment #0)
> To diagnose possible IPC

... to diagnose possible IPC problems like message starving, congestion, ... we want to have data on the actual times it takes for messages to work their way through our IPC infrastructure and to get processed.
Comment 3 Benoit Girard (:BenWa) 2013-03-22 08:29:15 PDT
(In reply to Georg Fritzsche [:gfritzsche] [at Performance work-week] from comment #0)
>     ... instead of return we could add info to map the original sync message
> to it's reply message

Yes that sounds better in fact.

(In reply to Georg Fritzsche [:gfritzsche] [at Performance work-week] from comment #0)
> This means we need to have a protocol extension to 

benjamin suggests that we create a protocol like crash reporter that is managed everywhere instead of having to extend all the top level actors.
Comment 4 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-06-07 16:48:43 PDT
Created attachment 760012 [details] [diff] [review]
Part 1: add IPC message tracing

This is mainly IPC specific. bent will delegate if something is better reviewed by someone else.

This adds IPC message tracing in our IPC glue, with 2 ways to use it:
* a NSPR_LOG_MODULE MessageTrace, so this can be used without additional tools baked in
* the previously discussed, observer-based message-collection/-gathering mechanism

I'm pretty sure that the naming and placement is not entirely ideal yet, but i went with what seemed like it matched the style of already existing parts.

Still missing: tests - i'm not sure yet how we can best cover this test-wise.
Comment 5 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-06-07 17:00:10 PDT
BenWa, for the "ipdl-tracing-capture" notification notifying the caller to wait for them to send them data... how did you intend this to work? Something via the subject or data arg?

For sending the data i ended up with this JSON format:
{ 
  "channel": <channel id>,
  "data": [
    "seq_no": <sequence number>,
    "msg_type": <number: async=0, sync=1, rpc=2>,
    "probe_type": <number: PostTask=0, Send=1, Receive=2, Dequeue=3, Complete=4>,
    "is_reply": <bool>,
    "timestamp": <number>,
  ],
}

... any objections on this?
Comment 6 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-06-10 07:10:22 PDT
Created attachment 760384 [details] [diff] [review]
Part 1: add IPC message tracing, v2

Removed some code duplication.
Comment 7 Benoit Girard (:BenWa) 2013-06-10 09:14:02 PDT
(In reply to Georg Fritzsche [:gfritzsche] from comment #5)
> BenWa, for the "ipdl-tracing-capture" notification notifying the caller to
> wait for them to send them data... how did you intend this to work?
> Something via the subject or data arg?
> 
> For sending the data i ended up with this JSON format:
> { 
>   "channel": <channel id>,
>   "data": [
>     "seq_no": <sequence number>,
>     "msg_type": <number: async=0, sync=1, rpc=2>,
>     "probe_type": <number: PostTask=0, Send=1, Receive=2, Dequeue=3,
> Complete=4>,
>     "is_reply": <bool>,
>     "timestamp": <number>,
>   ],
> }
> 
> ... any objections on this?

Looks good. In my opinion msg_type should be a string rather then a number.
Comment 8 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-06-13 15:10:33 PDT
Created attachment 762337 [details] [diff] [review]
Part 1: add IPC message tracing, v3

* Fixed some things that came up via the test
* Incorporated Benoits request
Comment 9 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-06-13 15:13:31 PDT
Created attachment 762341 [details] [diff] [review]
Part 2: Test IPC message tracing.

Adds some test-coverage for the notification-driven part of the message tracing.
Comment 10 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-06-13 15:14:12 PDT
And a try run for the current versions:
http://hg.mozilla.org/mozilla-central/rev/6852e7c47edf
Comment 11 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-06-13 15:19:49 PDT
Comment on attachment 762341 [details] [diff] [review]
Part 2: Test IPC message tracing.

Review of attachment 762341 [details] [diff] [review]:
-----------------------------------------------------------------

Benjamin, this re-enables the dom/ipc/tests for Cocoa, which you turned off in [1].
Bug 581335 didn't contain more details what to watch out for and it seems fine on try.
Do you have any objections against reactivating this? Am i waking some sleeping dragons i just haven't noticed yet?

[1] http://hg.mozilla.org/mozilla-central/rev/6852e7c47edf
Comment 12 Benjamin Smedberg [:bsmedberg] 2013-06-13 15:31:42 PDT
Comment on attachment 762341 [details] [diff] [review]
Part 2: Test IPC message tracing.

At the time, there was no cocoa backend for DOM-IPC. If there is a cocoa backend now, go for it.
Comment 13 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-06-13 15:43:03 PDT
Minor warning-as-error-bustage on try, new run:
https://tbpl.mozilla.org/?tree=Try&rev=f8b87be67828
Comment 14 Ben Turner (not reading bugmail, use the needinfo flag!) 2013-06-17 19:32:53 PDT
Comment on attachment 762337 [details] [diff] [review]
Part 1: add IPC message tracing, v3

Review of attachment 762337 [details] [diff] [review]:
-----------------------------------------------------------------

Overall this patch looks like it's heading in the right direction! Sorry it took a while to review this first round. I'll be much faster the next time now that I understand what this is doing.

Buried in here are a few big concerns, specifically around simplifying the protocol and worry over threading issues and perf in AsyncChannel.

General style nit stuff that I didn't bother marking:

1. Functions in dom/ipc should have args named "aFoo", not "foo".
2. All your virtual functions should be marked as virtual and have MOZ_OVERRIDE on them.

::: dom/ipc/ContentParent.cpp
@@ +1119,5 @@
>          unused << SendAppInfo(version, buildID);
>      }
> +
> +    if (!IPCMessageTracerParent::CreateIPCMessageTracer(this)) {
> +      NS_ERROR("failed to create ipc message tracer");

This can't be asserted.

@@ +2606,5 @@
>  
> +IPCMessageTracerParent*
> +ContentParent::MessageTracer()
> +{
> +    return static_cast<IPCMessageTracerParent*>(ManagedPIPCMessageTracerParent()[0]);

This is not guaranteed to have anything in it, you must check the array length before indexing into it.

::: dom/ipc/ContentParent.h
@@ +412,5 @@
>      virtual bool RecvSystemMessageHandled() MOZ_OVERRIDE;
>  
>      virtual void ProcessingError(Result what) MOZ_OVERRIDE;
>  
> +    IPCMessageTracerParent* MessageTracer();

Anything that could return null should have "Get" prepended by convention.

::: dom/ipc/IPCMessageTracerChild.cpp
@@ +1,5 @@
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +

Nit: extra newline

::: dom/ipc/IPCMessageTracerChild.h
@@ +13,5 @@
> +class IPCMessageTracerChild : public PIPCMessageTracerChild
> +{
> +public:
> +  IPCMessageTracerChild();
> +  ~IPCMessageTracerChild();

This should only be created/destroyed from ContentChild/PluginModuleChild, so I prefer forward-declaring and then adding those as friend classes and making the constructor/destructor private. Plus the destructor should be marked virtual.

Same goes for the parent version.

::: dom/ipc/IPCMessageTracerParent.cpp
@@ +1,5 @@
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +

Nit: extra newline

@@ +44,5 @@
> +  return "?";
> +}
> +
> +static nsString
> +MessageInfoToJson(const MessageInfo& msg)

Please rework this as:

  static void
  AppendMessageInfoToJson(const MessageInfo& msg, nsString& json)
  {
    json.AppendPrintf("...", ...);
  }

Or better yet just inline it into AppendMessageInfos?

@@ +57,5 @@
> +
> +static void
> +AppendMessageInfos(nsString& json, const InfallibleTArray<MessageInfo>& data)
> +{
> +  for (size_t i=0; i<data.Length(); ++i) {

nsTArray uses uint32_t for iteration.

@@ +66,5 @@
> +  }
> +}
> +
> +bool
> +IPCMessageTracerParent::RecvDataCollected(const InfallibleTArray<MessageInfo>& childData)

In here you're not doing anything to handle the case where the circular buffer wrapped around (oldest MessageInfo might be somewhere in the middle of the array). Does the profiler frontend handle this somehow? Or should you be putting them in order here?

@@ +74,5 @@
> +    NS_ERROR("no observer service");
> +    return true;
> +  }
> +
> +  nsTArray<MessageInfo> parentData;

Nit: Move this down a line to right before you use it.

@@ +79,5 @@
> +  int32_t channelId = GetIPCChannel()->GetTraceChannelId();
> +  GetIPCChannel()->GetMessageTracingData(parentData);
> +
> +  nsString json;
> +  json.Adopt(nsTextFormatter::smprintf(

Please use AppendPrintf() instead.

@@ +82,5 @@
> +  nsString json;
> +  json.Adopt(nsTextFormatter::smprintf(
> +    NS_LITERAL_STRING("{\"channel\": %d, \"data\": [").get(), channelId));
> +
> +  AppendMessageInfos(json, parentData);

Please Clear() and Compact() after you use this to bring peak memory down

@@ +86,5 @@
> +  AppendMessageInfos(json, parentData);
> +  json.Append(',');
> +  AppendMessageInfos(json, childData);
> +
> +  json.Append(NS_LITERAL_STRING("]}"));

Please use AppendLiteral() here.

@@ +93,5 @@
> +  return true;
> +}
> +
> +void
> +IPCMessageTracerParent::StartTracing()

Assert that you're on the main thread here and in Stop

Also, up to this point in the file you've been using 2-space indent. Now you have 4. Need consistency.

@@ +95,5 @@
> +
> +void
> +IPCMessageTracerParent::StartTracing()
> +{
> +    sMessageTracingEnabled = true;

I also think you should enforce a single Start/Stop pair here.

@@ +97,5 @@
> +IPCMessageTracerParent::StartTracing()
> +{
> +    sMessageTracingEnabled = true;
> +    if (!SendStartTracing()) {
> +        NS_ERROR("failed to send start tracing to child");

You can't assert this, the child could have crashed and the parent must always handle it. Same in Stop and CollectTracingData.

@@ +99,5 @@
> +    sMessageTracingEnabled = true;
> +    if (!SendStartTracing()) {
> +        NS_ERROR("failed to send start tracing to child");
> +    }
> +    GetIPCChannel()->StartMessageTracing();

I think you should change the order here, the child could send several messages between the time you call SendStartTracing and StartMessageTracing.

@@ +116,5 @@
> +void
> +IPCMessageTracerParent::CollectTracingData()
> +{
> +    if (!sMessageTracingEnabled) {
> +        return;

You should at least warn if this happens.

@@ +125,5 @@
> +    }
> +}
> +
> +class MessageTracingObserver MOZ_FINAL : public nsIObserver,
> +                                         public nsSupportsWeakReference

It doesn't look like you're using the weak reference at all, nor do you really need to, so let's remove this.

@@ +131,5 @@
> +public:
> +    NS_DECL_ISUPPORTS
> +    NS_DECL_NSIOBSERVER
> +
> +    explicit MessageTracingObserver(IPCMessageTracerParent* parent)

Nit: The explicit thing isn't really needed here.

@@ +133,5 @@
> +    NS_DECL_NSIOBSERVER
> +
> +    explicit MessageTracingObserver(IPCMessageTracerParent* parent)
> +      : mParent(parent)
> +    {}

Nit: Assert that parent is non-null.

@@ +144,5 @@
> +
> +NS_IMETHODIMP
> +MessageTracingObserver::Observe(nsISupports *aSubject,
> +                                const char *aTopic,
> +                                const PRUnichar *aData)

Nit: * on the left here.

@@ +147,5 @@
> +                                const char *aTopic,
> +                                const PRUnichar *aData)
> +{
> +      if (nsCRT::strcmp(aTopic, kMessageTracingStartObserverTopic) == 0) {
> +            mParent->StartTracing();

Nit: too much indent in this function.

::: dom/ipc/IPCMessageTracerParent.h
@@ +5,5 @@
> +#ifndef mozilla_dom_IPCMessageTracerParent_h
> +#define mozilla_dom_IPCMessageTracerParent_h
> +
> +#include "mozilla/dom/PIPCMessageTracerParent.h"
> +#include "nsIObserver.h"

Nit: Just forward-declare and move the include to the cpp

@@ +13,5 @@
> +
> +extern const char* const kMessageTracingStartObserverTopic;
> +extern const char* const kMessageTracingStopObserverTopic;
> +extern const char* const kMessageTracingCaptureObserverTopic;
> +extern const char* const kMessageTracingCaptureCompletedObserverTopic;

As far as I can tell these are only ever used in the cpp file so there's no need to expose them here.

@@ +33,5 @@
> +  CreateIPCMessageTracer(Toplevel* actor);
> +
> +protected:
> +  void Initialize();
> +  void Shutdown();

These two methods aren't really needed, since they can't fail they could just be inlined into the constructor/destructor.

@@ +41,5 @@
> +};
> +
> +template<class Toplevel>
> +/* static */ bool
> +IPCMessageTracerParent::CreateIPCMessageTracer(Toplevel* actor)

If you make the protocol changes I suggested then this can go away entirely.

@@ +43,5 @@
> +template<class Toplevel>
> +/* static */ bool
> +IPCMessageTracerParent::CreateIPCMessageTracer(Toplevel* actor)
> +{
> +  PIPCMessageTracerParent* pp =

Nit: s/pp/protocolActor/

@@ +46,5 @@
> +{
> +  PIPCMessageTracerParent* pp =
> +      actor->CallPIPCMessageTracerConstructor();
> +  if (!pp) {
> +    NS_ERROR("Error creating ipc message tracer actor");

This will happen if a child crashes so you can't assert this.

@@ +50,5 @@
> +    NS_ERROR("Error creating ipc message tracer actor");
> +    return false;
> +  }
> +
> +  IPCMessageTracerParent* p = static_cast<IPCMessageTracerParent*>(pp);

Nit: s/p/actor/

::: dom/ipc/MessageInfo.h
@@ +14,5 @@
> +  MessageType_Rpc,
> +  MessageType_Count,
> +};
> +
> +enum ProbeType {

These enum names aren't very helpful. I'm sympathetic to keeping them short, but I think we can do a little better:

  ProbeType_DispatchToIOThread,
  ProbeType_SendOnIOThread,
  ProbeType_ReceiveOnIOThread,
  ProbeType_BeginOnMessageReceived,
  ProbeType_EndOnMessageReceived

How do those sound?

I'm actually curious why you want to collect data for the last one (ProbeType_Complete/ProbeType_EndOnMessageReceived). Does that tell you anything that the already available profiler data can't?

@@ +32,5 @@
> +  bool is_reply;
> +
> +  MessageInfo()
> +    : timestamp(0),
> +      seq_no(0)

Nit: These are out of order... but you're not initializing everything either.

@@ +54,5 @@
> +namespace IPC {
> +
> +template <>
> +struct ParamTraits<mozilla::dom::MessageType> :
> +  public EnumSerializer<mozilla::dom::MessageType, mozilla::dom::MessageType_Async, mozilla::dom::MessageType_Count> {};

Nit: Wrap this (and the other below) to under 80 chars per line.

::: dom/ipc/PContent.ipdl
@@ +361,5 @@
>      NotifyProcessPriorityChanged(ProcessPriority priority);
>      MinimizeMemoryUsage();
>      CancelMinimizeMemoryUsage();
>  
> +    rpc PIPCMessageTracer();

I don't understand why this needs to be RPC. Async (the default) should be just fine, right?

::: dom/ipc/PIPCMessageTracer.ipdl
@@ +7,5 @@
> +
> +include "mozilla/dom/MessageInfo.h";
> +
> +using mozilla::dom::MessageType;
> +using mozilla::dom::ProbeType;

Are these two actually required? I think IPDL only cares about the types you use in the message declarations...

@@ +13,5 @@
> +
> +namespace mozilla {
> +namespace dom {
> +
> +sync protocol PIPCMessageTracer {

I think you can simplify this protocol a lot:

  protocol PIPCMessageTracer
  {
    manager PContent or PPluginModule;
  child:
    StopTracing();
  parent:
    __delete__(MessageInfo[] info);
  };

Then when you send the PIPCMessageTracer() constructor message (from PContent or PPluginModule) tracing starts immediately. When you want to stop the parent sends the StopTracing() message and the child responds with the destructor message including the trace data. Sound ok?

Each trace would require a separate actor, but that doesn't immediately sound bad to me.

> sync protocol PIPCMessageTracer {

Also, this is not a sync protocol (you don't have any sync messages defined in it). All messages are async unless otherwise annotated.

::: dom/ipc/moz.build
@@ +25,5 @@
>      'ContentProcess.h',
>      'CrashReporterChild.h',
>      'CrashReporterParent.h',
> +    'IPCMessageTracerChild.h',
> +    'IPCMessageTracerParent.h',

It doesn't look like these need to be exported, they're only used locally in dom/ipc right?

::: dom/plugins/ipc/PPluginModule.ipdl
@@ +88,5 @@
>  
>    rpc GeckoGetProfile()
>      returns (nsCString aProfile);
>  
> +  rpc PIPCMessageTracer();

This does not need to be RPC.

::: dom/plugins/ipc/PluginModuleParent.cpp
@@ +115,5 @@
>          return nullptr;
>      }
>  #endif
>  
> +    if (!IPCMessageTracerParent::CreateIPCMessageTracer(parent.get())) {

This can't be asserted.

@@ +1804,5 @@
> +
> +IPCMessageTracerParent*
> +PluginModuleParent::MessageTracer()
> +{
> +    return static_cast<IPCMessageTracerParent*>(ManagedPIPCMessageTracerParent()[0]);

Check the length first.

::: dom/plugins/ipc/PluginModuleParent.h
@@ +323,5 @@
>      void ShutdownPluginProfiling();
>  #endif
>  
> +    void InitMessageTracing();
> +    void ShutdownMessageTracing();

It doesn't look like these two are ever defined or used.

::: ipc/glue/AsyncChannel.cpp
@@ +186,5 @@
> +AsyncChannel::ProcessLink::DispatchSend(Message* msg)
> +{
> +    MOZ_TRACE_IPC_MESSAGE(mChan, *msg, mozilla::dom::ProbeType_Send);
> +
> +    mTransport->Send(msg);

Is it worth tracking how long this Send call takes? I'd expect there will be times when it is not instantaneous...

@@ +195,5 @@
>  {
>      mChan->AssertWorkerThread();
>      mChan->mMonitor->AssertCurrentThreadOwns();
>  
> +    MOZ_TRACE_IPC_MESSAGE(mChan, *msg, mozilla::dom::ProbeType_PostTask);

This is a nested lock... As long as we make sure that TraceMessage() never grabs the channel's mMonitor it's ok, but we need to add some big scary comment there saying it may or may not be holding the monitor.

@@ +284,2 @@
>  
>  AsyncChannel::AsyncChannel(AsyncListener* aListener)

You're not initializing mNextSeqno.

@@ +297,5 @@
>  {
>      MOZ_COUNT_CTOR(AsyncChannel);
> +
> +    if (!gMessageTraceLog) {
> +        gMessageTraceLog = PR_NewLogModule("MessageTrace");

This should probably be "IPCMessage". "MessageTrace" is sort of vague and "Trace" is kinda redundant.

@@ +461,5 @@
>      mMonitor->AssertNotCurrentThreadOwns();
>      NS_ABORT_IF_FALSE(MSG_ROUTING_NONE != msg->routing_id(), "need a route");
>  
> +    if (msg->seqno() == 0) {
> +      msg->set_seqno(NextSeqno());

I don't understand this. Async messages have never had a sequence number (they're not part of any sequence, after all). Why would you change this?

@@ +908,5 @@
>      printf("NOTE: %s process received `Goodbye', closing down\n",
>             mChild ? "child" : "parent");
>  }
>  
> +void AsyncChannel::SetTraceChannelId(int32_t channelId)

You should assert !mChild and !mTraceChannelId here.

@@ +913,5 @@
> +{
> +    mTraceChannelId = channelId;
> +}
> +
> +void AsyncChannel::StartMessageTracing()

I think it's pretty reasonable to assert that this (and Stop) only get called on the main thread (AssertWorkerThread())

@@ +916,5 @@
> +
> +void AsyncChannel::StartMessageTracing()
> +{
> +    MutexAutoLock lock(mMessageTraceMutex);
> +    static const size_t kMessageTraceBufferSize = 100000;

This might be too big for memory-constrained devices like B2G. Maybe this should be pref-able?

@@ +919,5 @@
> +    MutexAutoLock lock(mMessageTraceMutex);
> +    static const size_t kMessageTraceBufferSize = 100000;
> +
> +    if (mMessageTraceEnabled) {
> +        return;

I think you should just assert this rather than bailing out. If you ever have two calls to StartMessageTracing you're more than likely going to get two calls to StopMessageTracing and that probably won't work.

@@ +922,5 @@
> +    if (mMessageTraceEnabled) {
> +        return;
> +    }
> +
> +    mMessageTraceData.SetLength(kMessageTraceBufferSize);

This isn't really efficient, you run a bunch of constructors that you might not ever need. I think you should use SetCapacity here instead, then change TraceMessage to call AppendElement up to the max.

@@ +932,5 @@
> +{
> +    MutexAutoLock lock(mMessageTraceMutex);
> +
> +    mMessageTraceEnabled = false;
> +    mMessageTraceData.SetLength(0);

Nit: Clear()

@@ +937,5 @@
> +    mMessageTraceData.Compact();
> +}
> +
> +static const char* const
> +GetMessageTypeString(const IPC::Message& msg)

Nit: Let's move all these helper functions out of the middle of the AsyncChannel methods and up into the anonymous namespace block at the top of the file. You can remove the 'static' then too.

@@ +959,5 @@
> +        E(Complete);
> +        default: return "???";
> +    }
> +#undef E
> +}

Nit: Add MOZ_NOT_REACHED here.

@@ +974,5 @@
> +
> +void
> +AsyncChannel::TraceMessage(const IPC::Message& msg, mozilla::dom::ProbeType probeType, const char* const func)
> +{
> +    const int64_t now = JS_Now();

Please use the TimeStamp class for any and all time things.

@@ +977,5 @@
> +{
> +    const int64_t now = JS_Now();
> +
> +    PR_LOG(gMessageTraceLog, PR_LOG_ALWAYS,
> +           ("(msgtrace) channel=%d, type=%s, probeno=%d, probe=%s, reply=%s, seqno=%d, stamp=%llu, name=%s, func=%s\n",

All of these %d should be %ld since they're all 32-bit.

@@ +980,5 @@
> +    PR_LOG(gMessageTraceLog, PR_LOG_ALWAYS,
> +           ("(msgtrace) channel=%d, type=%s, probeno=%d, probe=%s, reply=%s, seqno=%d, stamp=%llu, name=%s, func=%s\n",
> +             mTraceChannelId,
> +             GetMessageTypeString(msg),
> +             int(probeType),

Nit: please always use static_cast when you can.

@@ +982,5 @@
> +             mTraceChannelId,
> +             GetMessageTypeString(msg),
> +             int(probeType),
> +             GetProbeTypeString(probeType),
> +             msg.is_reply() ? "true":"false",

Nit: spaces around :

@@ +988,5 @@
> +             now,
> +             msg.name(),
> +             func));
> +
> +    if (!mMessageTraceEnabled) {

This isn't safe, you must check this after acquiring the mutex or else you will randomly corrupt memory.

@@ +992,5 @@
> +    if (!mMessageTraceEnabled) {
> +        return;
> +    }
> +
> +    MOZ_ASSERT(mMessageTraceData.Length() > 0);

This too, and imo asserting "!IsEmpty()" is more readable.

@@ +995,5 @@
> +
> +    MOZ_ASSERT(mMessageTraceData.Length() > 0);
> +    MutexAutoLock lock(mMessageTraceMutex);
> +
> +    ++mMessageTraceCount;

You're always skipping index 0 here.

Also, why keep an ever-increasing count here rather that just the next index into the array? It doesn't look like you ever use the count for anything.

@@ +1001,5 @@
> +      MessageInfo(msg.seqno(), GetMessageType(msg), probeType, now, msg.is_reply());
> +}
> +
> +void
> +AsyncChannel::GetMessageTracingData(nsTArray<mozilla::dom::MessageInfo>& aData)

This method copies a potentially huge array but it's really a single-use thing that could be swapped out instead. Let's rename it to 'ForgetMessageTracingData' and use SwapElements() on the array.

@@ +1003,5 @@
> +
> +void
> +AsyncChannel::GetMessageTracingData(nsTArray<mozilla::dom::MessageInfo>& aData)
> +{
> +    if (!mMessageTraceEnabled) {

Check after acquiring the mutex.

::: ipc/glue/AsyncChannel.h
@@ +17,2 @@
>  #include "mozilla/ipc/Transport.h"
> +#include "mozilla/dom/MessageInfo.h"

Can you just forward-declare this?

@@ +19,5 @@
>  
>  //-----------------------------------------------------------------------------
>  
> +#if defined(_MSC_VER)
> +#define FULLFUNCTION __FUNCSIG__

Hm, this header gets included in lots of places and I kinda hate adding internal-ish #defines to widely used headers... Rather than defining FULLFUNCTION can you just have three nearly-identical MOZ_TRACE_IPC_MESSAGE defines here?

@@ +26,5 @@
> +#else
> +#define FULLFUNCTION __FUNCTION__
> +#endif
> +
> + #define MOZ_TRACE_IPC_MESSAGE(channel, message, probetype) \

Nit: extra space at column 0

@@ +305,5 @@
>      int mWorkerLoopID;
> +
> +    // This is only accessed from the worker thread; seqno's are
> +    // completely opaque to the IO thread.
> +    int32_t mNextSeqno;

I can't see why you've decided to move this from SyncChannel to AsyncChannel. It doesn't make sense really.

@@ +308,5 @@
> +    // completely opaque to the IO thread.
> +    int32_t mNextSeqno;
> +
> +    // do we trace IPC message routing?
> +    bool mMessageTraceEnabled;

Nit: Stick this down below the mutex and add a comment to the block of new vairables you added saying that they are protected by this mutex.

@@ +310,5 @@
> +
> +    // do we trace IPC message routing?
> +    bool mMessageTraceEnabled;
> +
> +    Mutex mMessageTraceMutex;

I'm a little worried about using a mutex to keep everything synchronized here since I expect this all to be pretty hot code. You're almost guaranteed to have contention since you're trying to measure things that happen close together in time.

I don't care too much about performance penalties for when this code is being used (e.g. when mMessageTraceEnabled is true) but I'm really concerned that this will slow things down when mMessageTraceEnabled is false.

I think there are lots of opportunities for optimization here, but I don't want to optimize prematurely here. What measurements have you done to gauge the performance impacts here?

@@ +312,5 @@
> +    bool mMessageTraceEnabled;
> +
> +    Mutex mMessageTraceMutex;
> +    nsTArray<MessageInfo> mMessageTraceData;
> +    uint32_t mMessageTraceIndex;

This looks unused.

::: ipc/glue/RPCChannel.cpp
@@ +266,5 @@
>                      "rpc", true);
>              }
>  
> +            // can't add traces in all calling code, so trace the reply as completed here
> +            MOZ_TRACE_IPC_MESSAGE(this, recvd, mozilla::dom::ProbeType_Complete);

I don't understand this really. ProbeType_Complete means different things for normal messages vs. RPC messages?

@@ +716,5 @@
>          MOZ_ASSERT(mPending.back().compress());
> +
> +        // mark this so we don't get holes in our ipc message trace
> +        MOZ_TRACE_IPC_MESSAGE(this, mPending.back(), mozilla::dom::ProbeType_Dequeue);
> +        MOZ_TRACE_IPC_MESSAGE(this, mPending.back(), mozilla::dom::ProbeType_Complete);

I think this should probably get annotated in a different way. Otherwise looking at a profile you're going to see several of the same kind of message that seem to take no time at all and then one that takes way longer.

::: ipc/glue/SyncChannel.cpp
@@ +122,5 @@
>                        "unexpected sync message");
>  
> +    MOZ_TRACE_IPC_MESSAGE(this, mRecvd, mozilla::dom::ProbeType_Dequeue);
> +    // can't add traces in all calling code, so trace the reply as completed here
> +    MOZ_TRACE_IPC_MESSAGE(this, mRecvd, mozilla::dom::ProbeType_Complete);

Same here. This seems too confusing.
Comment 15 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-06-18 04:42:34 PDT
(In reply to ben turner [:bent] from comment #14)
> Overall this patch looks like it's heading in the right direction! Sorry it
> took a while to review this first round. I'll be much faster the next time
> now that I understand what this is doing.

Thanks for the thorough review, looks like i missed a bit more than i expected.


> @@ +66,5 @@
> > +  }
> > +}
> > +
> > +bool
> > +IPCMessageTracerParent::RecvDataCollected(const InfallibleTArray<MessageInfo>& childData)
> 
> In here you're not doing anything to handle the case where the circular
> buffer wrapped around (oldest MessageInfo might be somewhere in the middle
> of the array). Does the profiler frontend handle this somehow? Or should you
> be putting them in order here?

I was thinking that we don't need special handling here: as the messages are uniquely identified per channel, overlap shouldn't matter. But maybe that's not the best path?


> ::: dom/ipc/PContent.ipdl
> @@ +361,5 @@
> >      NotifyProcessPriorityChanged(ProcessPriority priority);
> >      MinimizeMemoryUsage();
> >      CancelMinimizeMemoryUsage();
> >  
> > +    rpc PIPCMessageTracer();
> 
> I don't understand why this needs to be RPC. Async (the default) should be
> just fine, right?


I went with what e.g. PCrashReporter() does in PPluginModule - with async i'd have to synchronize on construction before sending e.g. StartTracing messages, no?

> @@ +13,5 @@
> > +
> > +namespace mozilla {
> > +namespace dom {
> > +
> > +sync protocol PIPCMessageTracer {
> 
> I think you can simplify this protocol a lot:
> 
>   protocol PIPCMessageTracer
>   {
>     manager PContent or PPluginModule;
>   child:
>     StopTracing();
>   parent:
>     __delete__(MessageInfo[] info);
>   };
> 
> Then when you send the PIPCMessageTracer() constructor message (from
> PContent or PPluginModule) tracing starts immediately. When you want to stop
> the parent sends the StopTracing() message and the child responds with the
> destructor message including the trace data. Sound ok?

But then i'd need to duplicate the notification and some initialization handling in PluginModule and Content. This way i can expect them to have a IPCMessageTracer, which implements everything that's common. Or am i missing a better route here?

> ::: ipc/glue/AsyncChannel.cpp
> @@ +186,5 @@
> > +AsyncChannel::ProcessLink::DispatchSend(Message* msg)
> > +{
> > +    MOZ_TRACE_IPC_MESSAGE(mChan, *msg, mozilla::dom::ProbeType_Send);
> > +
> > +    mTransport->Send(msg);
> 
> Is it worth tracking how long this Send call takes? I'd expect there will be
> times when it is not instantaneous...

Right, i'll add that unless proven irrelevant.

> @@ +461,5 @@
> >      mMonitor->AssertNotCurrentThreadOwns();
> >      NS_ABORT_IF_FALSE(MSG_ROUTING_NONE != msg->routing_id(), "need a route");
> >  
> > +    if (msg->seqno() == 0) {
> > +      msg->set_seqno(NextSeqno());
> 
> I don't understand this. Async messages have never had a sequence number
> (they're not part of any sequence, after all). Why would you change this?

We need a unique identifier for messages across parent- and child-processes. We already have the seqno field, so i thought filling it for async messages as well would be the next-best option.

> @@ +916,5 @@
> > +
> > +void AsyncChannel::StartMessageTracing()
> > +{
> > +    MutexAutoLock lock(mMessageTraceMutex);
> > +    static const size_t kMessageTraceBufferSize = 100000;
> 
> This might be too big for memory-constrained devices like B2G. Maybe this
> should be pref-able?

Thinking about this again it's probably best to let the client set this via the start notification and have a much smaller default - different data collection intervals as well as tracing scenarios may need different buffer sizes etc.

> @@ +977,5 @@
> > +{
> > +    const int64_t now = JS_Now();
> > +
> > +    PR_LOG(gMessageTraceLog, PR_LOG_ALWAYS,
> > +           ("(msgtrace) channel=%d, type=%s, probeno=%d, probe=%s, reply=%s, seqno=%d, stamp=%llu, name=%s, func=%s\n",
> 
> All of these %d should be %ld since they're all 32-bit.

Hm, int is 32bit across LLP64, LP64 & ILP32 - am i missing something here?
I should really check though if we can't get a drop-in for inttypes.h for MSVC so this and other places can just use PRId32 etc.


> @@ +988,5 @@
> > +             now,
> > +             msg.name(),
> > +             func));
> > +
> > +    if (!mMessageTraceEnabled) {
> 
> This isn't safe, you must check this after acquiring the mutex or else you
> will randomly corrupt memory.


> @@ +1003,5 @@
> > +
> > +void
> > +AsyncChannel::GetMessageTracingData(nsTArray<mozilla::dom::MessageInfo>& aData)
> > +{
> > +    if (!mMessageTraceEnabled) {
> 
> Check after acquiring the mutex.

> @@ +310,5 @@
> > +
> > +    // do we trace IPC message routing?
> > +    bool mMessageTraceEnabled;
> > +
> > +    Mutex mMessageTraceMutex;
> 
> I'm a little worried about using a mutex to keep everything synchronized
> here since I expect this all to be pretty hot code. You're almost guaranteed
> to have contention since you're trying to measure things that happen close
> together in time.
> 
> I don't care too much about performance penalties for when this code is
> being used (e.g. when mMessageTraceEnabled is true) but I'm really concerned
> that this will slow things down when mMessageTraceEnabled is false.

To address both concerns above, could we do a early-return on !mMessageTraceEnabled for perf and another check after aquiring the mutex for correctness?

> I think there are lots of opportunities for optimization here, but I don't
> want to optimize prematurely here. What measurements have you done to gauge
> the performance impacts here?

I wasn't worried about perf so far due to the early-returns making the impact seem trivial compared to the rest of the ipc/glue code. Shall i take this through some specific tests?

> ::: ipc/glue/RPCChannel.cpp
> @@ +266,5 @@
> >                      "rpc", true);
> >              }
> >  
> > +            // can't add traces in all calling code, so trace the reply as completed here
> > +            MOZ_TRACE_IPC_MESSAGE(this, recvd, mozilla::dom::ProbeType_Complete);
> 
> I don't understand this really. ProbeType_Complete means different things
> for normal messages vs. RPC messages?

For async & parts of sync messages the handler code can be assumed complete after |mListener->OnMessageReceived(msg)|. Other parts have their handlers on the stack above channel code (ipdl actor code etc.), so i can't know generally when the handling code completed without inserting probes all over.

> I'm actually curious why you want to collect data for the last one
> (ProbeType_Complete/ProbeType_EndOnMessageReceived). Does that tell you
> anything that the already available profiler data can't?

Maybe the Complete probe isn't really useful here with those limitations and this is better measured with other means. I think i'll just remove it.

> @@ +716,5 @@
> >          MOZ_ASSERT(mPending.back().compress());
> > +
> > +        // mark this so we don't get holes in our ipc message trace
> > +        MOZ_TRACE_IPC_MESSAGE(this, mPending.back(), mozilla::dom::ProbeType_Dequeue);
> > +        MOZ_TRACE_IPC_MESSAGE(this, mPending.back(), mozilla::dom::ProbeType_Complete);
> 
> I think this should probably get annotated in a different way. Otherwise
> looking at a profile you're going to see several of the same kind of message
> that seem to take no time at all and then one that takes way longer.

Right, adding a flag or probe for them being compressed is probably better.
Comment 16 Ben Turner (not reading bugmail, use the needinfo flag!) 2013-06-18 15:58:42 PDT
(In reply to Georg Fritzsche [:gfritzsche] from comment #15)
> I was thinking that we don't need special handling here: as the messages are
> uniquely identified per channel, overlap shouldn't matter. But maybe that's
> not the best path?

I don't understand I guess. In most cases one channel roughly corresponds to one process. It's quite possible that we could have a long-running process that wraps around the buffer. Unless the frontend is somehow sorting by timestamp then your messages may show up out of order. I think that has to be handled somewhere, and I personally would think it would be best here in c++, but that's something I guess you and benwa can discuss.

> I went with what e.g. PCrashReporter() does in PPluginModule

In general anything plugin-related should be ignored by everyone ;) Plugins are horrible for a number of reasons but they're pretty much the only thing that should ever need RPC messages. Nothing that blocks the parent process is allowed in new code so you'll have to use async.

> with async i'd have to synchronize on construction before sending e.g. StartTracing
> messages, no?

I don't know what you mean really, but maybe this helps: Message order is always guaranteed across channels (with some caveats about different message types). So, if we open a channel and then send two async messages (A and B) you're guaranteed that the other process will process A before B.

> But then i'd need to duplicate the notification and some initialization
> handling in PluginModule and Content.

How so? You can still keep your MessageTracingObserver, it would just then call PIPCMessageTracerConstructor instead of StartTracing. Rather than holding a PIPCMessageTracer pointer it would just hold Toplevel pointer instead.

> We need a unique identifier for messages across parent- and child-processes.
> We already have the seqno field, so i thought filling it for async messages
> as well would be the next-best option.

No, if this is supposed to be a unique id then the seqno field is not what you want. First, several messages can have the same seqno (for sync messages it's always a message and its reply, but for RPC messages they can be anything). Also, it could easily overflow since it's a signed 32bit int and the sign matters.

I'd really hate to add an additional 64-bit id to all messages too since we push a bunch of these around and there's no easy way to only inflate them when tracing is enabled...

> Thinking about this again it's probably best to let the client set this via
> the start notification and have a much smaller default

Making the allocation fallible will also be good then.

> Hm, int is 32bit across LLP64, LP64 & ILP32 - am i missing something here?
> I should really check though if we can't get a drop-in for inttypes.h for
> MSVC so this and other places can just use PRId32 etc.

Not sure, I think maybe you misunderstood. NSPR printf-style functions expect "%d" to mean 16-bit, "%ld" to mean 32-bit, and "%lld" for 64-bit. These should be 32-bit ints so you should use "%ld" in your format string.

> To address both concerns above, could we do a early-return on
> !mMessageTraceEnabled for perf and another check after aquiring the mutex
> for correctness?

No, double-checked locking isn't guaranteed to be safe (at high optimization levels the compiler can do funny things like use that space for temporary values and such). You're only guaranteed to get the results you want when you write race-free code.

> I wasn't worried about perf so far due to the early-returns making the
> impact seem trivial compared to the rest of the ipc/glue code. Shall i take
> this through some specific tests?

Unfortunately the way you have this set up now will require you to enter the mutex lots of times on more than one thread so you're very likely to have contention. That's not necessarily bad, but coupled with the fact that we expect this to be very hot code I would say we definitely need to run this through some performance tests.

I'm not sure that we really have much written now so you'll be on your own probably :( I'd recommend either writing an IPDL c++ test (see http://mxr.mozilla.org/mozilla-central/source/ipc/ipdl/test/cxx/README.txt) or maybe using xpcshell with the sendCommand function. Then you can send a ton of messages (hopefully with replies) and get baseline timing. Then apply your patch and see what changes?

> Maybe the Complete probe isn't really useful here with those limitations and
> this is better measured with other means. I think i'll just remove it.

I think that is best.
Comment 17 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-07-01 02:28:34 PDT
Created attachment 769610 [details] [diff] [review]
Perf test

The test i used to time the differences in raw message delivery speed (without the observer-based tracing activated).
This produces separate timings for RPC, nested RPC, async & sync calls.

While the timing increase with the message tracing patch applied stays mostly below 5%, async messages take around 25% longer on average. 
I presume this is much more pronounced with async because the message handling code path there is less complex and hence the message tracing overhead more noticable.

More detailed numbers here:
https://docs.google.com/spreadsheet/ccc?key=0AhKmHVpmAzfcdDNCTUhxZTRmb3ppMjBiUHEzRXduOVE
Comment 18 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-07-01 02:41:34 PDT
Created attachment 769613 [details] [diff] [review]
Part 1: add IPC message tracing, v4

Addressing review, will add comments in splinter shortly (hopefully making it much easier to address the locations compared to trying to respond further in comments).

Giving the rough performance results above, the biggest question is if we
a) can live with that overhead
b) need to optimize now
c) need to make this optional via a build flag
Comment 19 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-07-01 02:51:56 PDT
Created attachment 769618 [details] [diff] [review]
Part 2: Test IPC message tracing, v2
Comment 20 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-07-01 03:06:39 PDT
Comment on attachment 769613 [details] [diff] [review]
Part 1: add IPC message tracing, v4

Review of attachment 769613 [details] [diff] [review]:
-----------------------------------------------------------------

Clarifying notes.

::: dom/ipc/PIPCMessageTracer.ipdl
@@ +15,5 @@
> +protocol PIPCMessageTracer
> +{
> +  manager PContent or PPluginModule;
> +child:
> +  CollectData();

Note that we need this and DataCollected() as the data can be requested in regular intervals (e.g. profiler view updates).

::: ipc/glue/AsyncChannel.cpp
@@ +224,5 @@
> +AsyncChannel::ProcessLink::DispatchSend(Message* msg)
> +{
> +    MOZ_TRACE_IPC_MESSAGE(mChan, *msg, mozilla::dom::ProbeType_SendOnIOThread);
> +
> +    mTransport->Send(msg);

Note that the msg ownership was taken by Send() and i'm not keen on copying every message here.
If we'd really need to measure this at some point we should probably do it differently.

@@ +978,5 @@
> +AsyncChannel::TraceMessage(const IPC::Message& aMsg, mozilla::dom::ProbeType aProbeType, const char* const aFunc)
> +{
> +    // NOTE: we may or may not hold the monitor - we can't rely on this!
> +
> +    const int64_t now = JS_Now();

I don't see how TimeStamp would improve this. It seems to be great for measuring durations, but for just storing time samples it seems a little awkward here (e.g. would have to go through creating a duration from some arbitrary TimeStamp offset just to print it).

::: ipc/glue/AsyncChannel.h
@@ +17,2 @@
>  #include "mozilla/ipc/Transport.h"
> +#include "mozilla/dom/MessageInfo.h"

This header defines enums used below and enums can't be forward declared.
Comment 21 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-07-01 03:10:40 PDT
(In reply to Georg Fritzsche [:gfritzsche] from comment #18)
> Created attachment 769613 [details] [diff] [review]
> Part 1: add IPC message tracing, v4

(In reply to Georg Fritzsche [:gfritzsche] from comment #19)
> Created attachment 769618 [details] [diff] [review]
> Part 2: Test IPC message tracing, v2

Try run for these:
https://tbpl.mozilla.org/?tree=Try&rev=dbeffe01a7d9
Comment 22 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-07-01 07:46:33 PDT
jduell, given that networking implementations are likely to be the heaviest usage of IPC/IPDL...
Could you maybe suggest a performance testing scenario that mirrors real-world usage of the IPC/IPDL infrastructure?

While i have measurements above, they are rather artificial (plain IPC message delivery speed).
Comment 23 Benoit Girard (:BenWa) 2013-07-07 12:28:21 PDT
Created attachment 771862 [details] [diff] [review]
Part 1 rebased

I rebased part 1. Posting here to avoid someone else the work.
Comment 24 Benoit Girard (:BenWa) 2013-07-07 15:01:45 PDT
Comment on attachment 769613 [details] [diff] [review]
Part 1: add IPC message tracing, v4

Review of attachment 769613 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/ipc/IPCMessageTracerParent.h
@@ +93,5 @@
> +      return NS_OK;
> +    }
> +
> +    IPCMessageTracerParent* tracer = mActor->GetMessageTracer();
> +    MOZ_ASSERT(tracer);

This crashes a debug build if the user incorrectly tries to dump a profile without starting the capture first. I think we should just ignore the message if we're not capturing, not assert and return NS_OK. Maybe a warning at worse.
Comment 25 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-07-18 10:37:13 PDT
Created attachment 777922 [details] [diff] [review]
Perf test for plugin streaming scenario

So far i haven't had input on any interesting scenarios for perf tests so i decided checking into the impact on plugin streaming.

As it turns out there seems to be no measurable impact over timing value fluctuation (at least in this scenario):
https://docs.google.com/spreadsheet/ccc?key=0AhKmHVpmAzfcdHlFWXdtV2dwZ0E4UU90ZE9wLTJSSlE&usp=sharing

This is for both async & sync response sending from the SJS. Also different chunk-sizes in the plugin don't make any difference.
Comment 26 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-07-18 10:37:58 PDT
bent, what is next here? Do you need any more data from me?
Comment 27 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-07-18 13:50:45 PDT
Created attachment 778039 [details] [diff] [review]
Part 1: add IPC message tracing, v5

Rebased again and BenWas request incorporated.

See comments 17-20 for the previous addressing of review comments.
Comment 28 Ben Turner (not reading bugmail, use the needinfo flag!) 2013-08-07 11:23:00 PDT
Comment on attachment 778039 [details] [diff] [review]
Part 1: add IPC message tracing, v5

Review of attachment 778039 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for running those perf tests. I am still super nervous about having a mutex around all this though. I expect that the slowdown in the async case is actually due to contention... The sync/rpc tests are necessarily serial (one message at a time, one reply at a time) but the async tests have lots of messages being processed at once. I would be interested to see what happens if the child process is set to continually send an async no-op message to the parent while the sync and rpc tests are running with and without your patch. That would hopefully test my contention hypothesis.

So on to the patch itself... I think it's pretty close! Some changes below are small, some are larger. Sorry for the delay here, I got slammed by a bunch of big patches all at once :(

::: dom/ipc/ContentChild.cpp
@@ +19,5 @@
>  
>  #include "mozilla/Attributes.h"
>  #include "mozilla/dom/ExternalHelperAppChild.h"
>  #include "mozilla/dom/PCrashReporterChild.h"
> +#include "mozilla/dom/PIPCMessageTracerChild.h"

Nit: This shouldn't be necessary since it's included in IPCMessageTracerChild.h

@@ +755,5 @@
>  
> +PIPCMessageTracerChild*
> +ContentChild::AllocPIPCMessageTracerChild(const uint32_t& aBufferSize)
> +{
> +    GetIPCChannel()->StartMessageTracing(aBufferSize);

This sort of thing belongs in RecvPIPCMessageTracerConstructor (an optional override), not here. Alloc/Dealloc are really only for memory management.

::: dom/ipc/ContentChild.h
@@ +105,5 @@
>      DeallocPCrashReporterChild(PCrashReporterChild*);
>  
> +    virtual PIPCMessageTracerChild*
> +    AllocPIPCMessageTracerChild(const uint32_t& aBufferSize) MOZ_OVERRIDE;
> +    virtual bool DeallocPIPCMessageTracerChild(PIPCMessageTracerChild*) MOZ_OVERRIDE;

Nit: return type on its own line, though this file is remarkably inconsistent already :-/

::: dom/ipc/ContentParent.cpp
@@ +1861,5 @@
>  
> +PIPCMessageTracerParent*
> +ContentParent::AllocPIPCMessageTracerParent(const uint32_t& aBufferSize)
> +{
> +    return new IPCMessageTracerParent(aBufferSize);

You should assert ManagedPIPCMessageTracerParent().IsEmpty() here.

@@ +2621,5 @@
>  
> +IPCMessageTracerParent*
> +ContentParent::GetMessageTracer()
> +{
> +    if (ManagedPIPCMessageTracerParent().Length() != 1) {

I think this should be an IsEmpty() check, and you should first assert that Length() == 0 or Length() == 1.

::: dom/ipc/IPCMessageTracerChild.cpp
@@ +11,5 @@
> +{
> +  MOZ_COUNT_CTOR(IPCMessageTracerChild);
> +
> +  //AsyncChannel* channel = GetIPCChannel();
> +  //channel->StartMessageTracing(aBufferSize);

Why is this commented out? I like this much better than calling StartMessageTracing() in ContentChild::AllocPIPCMessageTracerChild().

@@ +23,5 @@
> +}
> +
> +bool IPCMessageTracerChild::RecvCollectData()
> +{
> +  AsyncChannel* channel = GetIPCChannel();

Nit: No real need for the temporary variable

@@ +26,5 @@
> +{
> +  AsyncChannel* channel = GetIPCChannel();
> +  nsTArray<MessageInfo> data;
> +
> +  channel->ForgetMessageTracingData(data);

Nit: I'd declare data on the line right before you use it instead of splitting the two with a newline.

::: dom/ipc/IPCMessageTracerChild.h
@@ +8,5 @@
> +#include "mozilla/dom/PIPCMessageTracerChild.h"
> +
> +namespace mozilla {
> +namespace plugins {
> +  class PluginModuleChild;

Nit: forward declarations usually stick to column 0

@@ +25,5 @@
> +  virtual bool RecvCollectData() MOZ_OVERRIDE;
> +
> +protected:
> +  IPCMessageTracerChild(uint32_t aBufferSize);
> +  ~IPCMessageTracerChild();

Nit: Mark this as virtual

::: dom/ipc/IPCMessageTracerParent.cpp
@@ +30,5 @@
> +  MOZ_COUNT_DTOR(IPCMessageTracerParent);
> +}
> +
> +static const char* const
> +MessageTypeToString(MessageType type)

This should be unified with GetMessageTypeString in AsyncChannel

@@ +43,5 @@
> +  return "?";
> +}
> +
> +static void
> +AppendMessageInfos(nsString& aJson, const InfallibleTArray<MessageInfo>& aData)

Nit: In general arguments that get modified are the arguments that come at the end. For such a small internal function I don't care a ton but I figured you should know and might want to change it :)

@@ +45,5 @@
> +
> +static void
> +AppendMessageInfos(nsString& aJson, const InfallibleTArray<MessageInfo>& aData)
> +{
> +  for (uint32_t i=0; i<aData.Length(); ++i) {

Nit: Spaces before and after the = and < here.

@@ +48,5 @@
> +{
> +  for (uint32_t i=0; i<aData.Length(); ++i) {
> +    const MessageInfo& msg = aData[i];
> +    aJson.AppendPrintf(
> +      "{\"msg_type\": \"%s\", \"probe_type\": %ld, \"is_reply\": %s, \"msg_id\": %lu, \"timestamp\": %llu}",

Manual JSON is scary, but necessary (in our current world) I guess. Are there tests somewhere that ensure you got this string right?

@@ +55,5 @@
> +        msg.is_reply ? "true" : "false",
> +        msg.msg_id,
> +        msg.timestamp);
> +
> +    if (i < (aData.Length() - 1)) {

Nit: != is easier

@@ +66,5 @@
> +IPCMessageTracerParent::RecvDataCollected(const InfallibleTArray<MessageInfo>& aChildData)
> +{
> +  nsCOMPtr<nsIObserverService> obsService = mozilla::services::GetObserverService();
> +  if (!obsService) {
> +    NS_ERROR("no observer service");

NS_ERROR is a non-fatal assertion (which is kind of a contradiction of terms, right?). In general we shouldn't add it any longer. So, should this be a fatal assertion (MOZ_ASSERT) that cannot actually happen in the real world? Or should this be a strange-but-handled situation that is either a) worth warning about (NS_WARNING), or b) not worth warning about?

@@ +85,5 @@
> +
> +  AppendMessageInfos(json, aChildData);
> +
> +  json.AppendLiteral("]}");
> +  obsService->NotifyObservers(nullptr, kMessageTracingCaptureCompletedObserverTopic, json.get());

Same concern about JSON here. What guarantees do we have that this formatting code is correct?

@@ +91,5 @@
> +  return true;
> +}
> +
> +/* static */ void
> +IPCMessageTracerParent::StopTracing(IPCMessageTracerParent* aActor)

I think this is safe because mActor is only used within Observe(), but ideally this should null out the mActor pointer since it's about to die.

@@ +100,5 @@
> +    NS_WARNING("shouldn't get called when message tracing isn't enabled");
> +    return;
> +  }
> +
> +  sMessageTracingEnabled = false;

You can't do this, there could be more than one ContentParent and PluginModuleParent alive at one time, and then you'd at least trigger several warnings above. What does sMessageTracingEnabled actually help you with? Can you just remove it?

@@ +103,5 @@
> +
> +  sMessageTracingEnabled = false;
> +  aActor->GetIPCChannel()->StopMessageTracing();
> +
> +  unused << IPCMessageTracerParent::Send__delete__(aActor);

Bah, sorry, this won't work. You have the potential to race with a child->parent message (DataCollected) here. Normal messages can race without consequence but the destructor cannot be allowed to race (it aborts the process). You need to either:

  a) guarantee that the race can't happen at a higher level (somehow delay this call while waiting for outstanding collections to finish, maybe with a counter), or
  b) add a new parent->child message that says "please die" and then change the direction of the destructor message to be from child->parent in response.

@@ +111,5 @@
> +IPCMessageTracerParent::CollectTracingData()
> +{
> +  MOZ_ASSERT(NS_IsMainThread());
> +  MOZ_ASSERT(sMessageTracingEnabled);
> +  if (!sMessageTracingEnabled) {

Asserting and then doing a runtime check is not useful. See my previous comment on the various types of error handling :)

@@ +125,5 @@
> +MessageTracingObserverBase::MessageTracingObserverBase()
> +{
> +  nsCOMPtr<nsIObserverService> observerService = mozilla::services::GetObserverService();
> +  if (observerService) {
> +    observerService->AddObserver(this, kMessageTracingStartObserverTopic, false);

It's not safe to call XPCOM methods with interface objects that have a refcount of 0 (e.g. 'this' here)... This code cannot be called in a constructor and must move to a method that gets called once a reference has been established. I suggest you make a static Create method that combines new and AddObserver to make sure that this works as you intend.

::: dom/ipc/IPCMessageTracerParent.h
@@ +9,5 @@
> +#include "nsIObserver.h"
> +
> +namespace mozilla {
> +namespace plugins {
> +  class PluginModuleParent;

Nit: no indent here

@@ +24,5 @@
> +
> +class IPCMessageTracerParent : public PIPCMessageTracerParent
> +{
> +public:
> +  virtual bool RecvDataCollected(const InfallibleTArray<MessageInfo>& info) MOZ_OVERRIDE;

This should be protected (only called by PIPCMessageTracerParent, right?)

Nit: Please wrap to 80 chars (I suggest making the return type go on its own line).

@@ +26,5 @@
> +{
> +public:
> +  virtual bool RecvDataCollected(const InfallibleTArray<MessageInfo>& info) MOZ_OVERRIDE;
> +
> +  static void StopTracing(IPCMessageTracerParent* aActor);

I don't understand why this is static when you always pass an instance pointer...

@@ +36,5 @@
> +  friend class mozilla::plugins::PluginModuleParent;
> +  friend class ContentParent;
> +
> +  IPCMessageTracerParent(uint32_t aBufferSize);
> +  ~IPCMessageTracerParent();

Nit: mark as virtual

@@ +48,5 @@
> +public:
> +  NS_DECL_ISUPPORTS
> +
> +  // nsIObserver
> +  NS_IMETHOD Observe(nsISupports*, const char*, const PRUnichar*) MOZ_OVERRIDE = 0;

Please just leave this without an override in this class since you override it in every instantiation.

@@ +53,5 @@
> +
> +  void StopObserving();
> +
> +protected:
> +  template<class TopLevel> friend class MessageTracingObserver;

Template things like this generally cause bustage on one platform or another... Has this gone through the try server yet? (I thought MSVC wanted all friend classes to be declared, or forward-declared, before being marked as a friend...)

@@ +68,5 @@
> +   : mActor(aActor)
> +  {
> +    MOZ_ASSERT(mActor);
> +  }
> +  virtual ~MessageTracingObserver() {}

Reference-counted things should have (at least) protected destructors.

@@ +70,5 @@
> +    MOZ_ASSERT(mActor);
> +  }
> +  virtual ~MessageTracingObserver() {}
> +
> +  NS_IMETHOD Observe(nsISupports* aSubject, const char* aTopic,

Virtual methods won't be inlined, just move this to the cpp and explicitly instantiate the template there. Then you can lose all the extern strings and such.

Though, maybe you just want to stick with virtual methods here and lose the templates altogether? The template isn't really buying you anything here.

@@ +102,5 @@
> +    if (strcmp(aTopic, kMessageTracingStopObserverTopic) == 0) {
> +      IPCMessageTracerParent::StopTracing(tracer);
> +    } else if (strcmp(aTopic, kMessageTracingCaptureObserverTopic) == 0) {
> +      tracer->CollectTracingData();
> +    }

Maybe warn if aTopic is not something you handle?

::: dom/ipc/MessageInfo.h
@@ +10,5 @@
> +
> +enum MessageType {
> +  MessageType_Async,
> +  MessageType_Sync,
> +  MessageType_Rpc,

A new type (urgent) was recently added in bug 867013.

@@ +18,5 @@
> +enum ProbeType {
> +  ProbeType_DispatchToIOThread,
> +  ProbeType_SendOnIOThread,
> +  ProbeType_ReceiveOnIOThread,
> +  ProbeType_OnMessageReceived,

What about:

  SendOnMainThread
  SendOnIOThread
  ReceiveOnIOThread
  ReceiveOnMainThread

That feels more symmetrical and should be easier to reason about.

@@ +26,5 @@
> +
> +struct MessageInfo {
> +  MessageType msg_type;
> +  ProbeType probe_type;
> +  int64_t timestamp;

Shouldn't this be uint64_t?

@@ +28,5 @@
> +  MessageType msg_type;
> +  ProbeType probe_type;
> +  int64_t timestamp;
> +  bool is_reply;
> +  uint32_t msg_id;

Can you try to optimize the packing here? It's possible that the enums would pack better with the bool or the uint32_t.

@@ +58,5 @@
> +namespace IPC {
> +
> +template <>
> +struct ParamTraits<mozilla::dom::MessageType> :
> +  public EnumSerializer<mozilla::dom::MessageType,

You should #include IPCMessageUtils.h so that this file will compile on its own.

::: dom/ipc/PContent.ipdl
@@ +274,5 @@
>      NotifyProcessPriorityChanged(ProcessPriority priority);
>      MinimizeMemoryUsage();
>      CancelMinimizeMemoryUsage();
>  
> +    async PIPCMessageTracer(uint32_t aBufferSize);

Nit: async is the default so you don't need to specify it.

::: dom/ipc/PIPCMessageTracer.ipdl
@@ +9,5 @@
> +
> +using mozilla::dom::MessageInfo;
> +
> +namespace mozilla {
> +namespace dom {

Oops, sorry that I didn't notice this before. It's a bit weird that this lives in mozilla::dom since it's not in any way related to the DOM. I think this all of this tracing code should be in mozilla::ipc instead. The MessageTracer files should live in ipc/glue.

@@ +14,5 @@
> +
> +protocol PIPCMessageTracer
> +{
> +  manager PContent or PPluginModule;
> +child:

Nit: newline before "child:"

::: dom/plugins/ipc/PluginModuleChild.cpp
@@ +713,5 @@
> +    return true;
> +}
> +
> +bool
> +PluginModuleChild::RecvPIPCMessageTracerConstructor(

No need to override this if you're not using it... But I suggest you do, see comments in ContentChild.

::: dom/plugins/ipc/PluginModuleParent.cpp
@@ +1803,5 @@
> +
> +#endif // MOZ_ENABLE_PROFILER_SPS
> +
> +IPCMessageTracerParent*
> +PluginModuleParent::GetMessageTracer()

Same comments as in ContentParent

::: ipc/glue/AsyncChannel.cpp
@@ +75,5 @@
>  };
>  
> +PRLogModuleInfo* gMessageTraceLog = nullptr;
> +
> +static const char* const

Nit: Functions in anonymous namespaces automatically get static-ified.

@@ +76,5 @@
>  
> +PRLogModuleInfo* gMessageTraceLog = nullptr;
> +
> +static const char* const
> +GetMessageTypeString(const IPC::Message& msg)

Needs "urgent" support.

@@ +86,5 @@
> +    return "async";
> +}
> +
> +static const char* const
> +GetProbeTypeString(mozilla::dom::ProbeType type)

Nit: Parameters are named "aFoo"

@@ +239,3 @@
>      mIOLoop->PostTask(
>          FROM_HERE,
> +        NewRunnableMethod(this, &ProcessLink::DispatchSend, msg));

Hm, what guarantees that 'this' stays alive while we're waiting for DispatchSend to run on the IO thread? Before we were careful to keep mTransport alive but I don't know offhand if the same care has been taken for 'this'.

@@ +335,5 @@
>  {
>      MOZ_COUNT_CTOR(AsyncChannel);
> +
> +    if (!gMessageTraceLog) {
> +        gMessageTraceLog = PR_NewLogModule("IPCMessage");

Can you assert that you're on the "worker thread" here to make sure that this is safe?

@@ +944,5 @@
> +{
> +    MOZ_ASSERT(!mChild);
> +    MOZ_ASSERT(mTraceChannelId == 0);
> +
> +    mTraceChannelId = aChannelId;

If you think this should be only touched while holding the lock then you need to fix this here.

@@ +968,5 @@
> +{
> +    AssertWorkerThread();
> +    MutexAutoLock lock(mMessageTraceMutex);
> +
> +    mMessageTraceEnabled = false;

Want to assert that this is true before you set it?

@@ +974,5 @@
> +    mMessageTraceData.Compact();
> +}
> +
> +void
> +AsyncChannel::TraceMessage(const IPC::Message& aMsg, mozilla::dom::ProbeType aProbeType, const char* const aFunc)

Nit: Wrap to 80 chars

@@ +976,5 @@
> +
> +void
> +AsyncChannel::TraceMessage(const IPC::Message& aMsg, mozilla::dom::ProbeType aProbeType, const char* const aFunc)
> +{
> +    // NOTE: we may or may not hold the monitor - we can't rely on this!

What's this all about? You're always entering the mutex below...

@@ +978,5 @@
> +AsyncChannel::TraceMessage(const IPC::Message& aMsg, mozilla::dom::ProbeType aProbeType, const char* const aFunc)
> +{
> +    // NOTE: we may or may not hold the monitor - we can't rely on this!
> +
> +    const int64_t now = JS_Now();

Let's go for PR_Now. There's no real need for IPC to depend on JS.

@@ +980,5 @@
> +    // NOTE: we may or may not hold the monitor - we can't rely on this!
> +
> +    const int64_t now = JS_Now();
> +
> +    PR_LOG(gMessageTraceLog, PR_LOG_ALWAYS,

You're accessing mTraceChannelId without holding the lock... Assuming that this is fine (see my comment in the header) then you might as well go ahead and use GetTraceChannelId().

@@ +985,5 @@
> +           ("(msgtrace) channel=%ld, child=%s, type=%s, probeno=%ld, probe=%s, reply=%s, stamp=%llu, type=%lu, name=%s, func=%s\n",
> +             mTraceChannelId,
> +             mChild ? "true" : "false",
> +             GetMessageTypeString(aMsg),
> +             static_cast<int32_t>(aProbeType),

Is this useful when you already have the string representation below?

@@ +988,5 @@
> +             GetMessageTypeString(aMsg),
> +             static_cast<int32_t>(aProbeType),
> +             GetProbeTypeString(aProbeType),
> +             aMsg.is_reply() ? "true" : "false",
> +             now,

As you have it here 'now' is a 64bit signed int so you should use %lld.

@@ +999,5 @@
> +        return;
> +    }
> +
> +    const MessageInfo msgInfo =
> +        MessageInfo(GetMessageType(aMsg), aProbeType, now, aMsg.is_reply(), aMsg.type());

It would be better to avoid the temporary if you can... I don't know if the compiler can tell optimize this.

@@ +1007,5 @@
> +    } else {
> +        mMessageTraceData[mMessageTraceIndex] = msgInfo;
> +    }
> +
> +    mMessageTraceIndex = (mMessageTraceIndex + 1) % mMessageTraceData.Capacity();

Wouldn't it make more sense to only do this in the 'else' block above?

@@ +1015,5 @@
> +AsyncChannel::ForgetMessageTracingData(nsTArray<MessageInfo>& aData)
> +{
> +    MutexAutoLock lock(mMessageTraceMutex);
> +    MOZ_ASSERT(mMessageTraceEnabled);
> +    if (!mMessageTraceEnabled) {

Don't leave a runtime check here if this can actually be asserted.

@@ +1019,5 @@
> +    if (!mMessageTraceEnabled) {
> +        return;
> +    }
> +
> +    aData.SwapElements(mMessageTraceData);

You should assert that aData is empty before locking.

::: ipc/glue/AsyncChannel.h
@@ +20,5 @@
>  //-----------------------------------------------------------------------------
>  
> +#if defined(_MSC_VER)
> +#define MOZ_TRACE_IPC_MESSAGE(channel, message, probetype) \
> +   channel->TraceMessage(message, probetype, __FUNCSIG__)

I don't think these macros really do much but make the code harder to understand. Why not just use a better function name at the call sites (since there are only like 6) and avoid the macros altogether?

@@ +213,5 @@
>          virtual void SendClose() MOZ_OVERRIDE;
>      };
>  
> +    void SetTraceChannelId(int32_t aChannelId);
> +    int32_t GetTraceChannelId() const { return mTraceChannelId; }

This isn't protected by the mutex... But why should it be? I think maybe your comment below about what the mutex protects is wrong.

@@ +218,5 @@
> +
> +    void StartMessageTracing(uint32_t aBufferSize);
> +    void StopMessageTracing();
> +
> +    void TraceMessage(const IPC::Message& aMsg, mozilla::dom::ProbeType aType, const char* const aFunc);

Nit: please wrap to 80 chars.

@@ +298,5 @@
>      // id() of mWorkerLoop.  This persists even after mWorkerLoop is cleared
>      // during channel shutdown.
>      int mWorkerLoopID;
> +
> +    // the following message trace members are all protected by this mutex

Nit: capitalize "the" and end with period.

@@ +300,5 @@
>      int mWorkerLoopID;
> +
> +    // the following message trace members are all protected by this mutex
> +    Mutex mMessageTraceMutex;
> +    bool mMessageTraceEnabled; // do we trace IPC message routing?

Nit: This comment doesn't really add much beyond what's conveyed by the name of the member.
Comment 29 Jason Duell [:jduell] (needinfo? me) 2013-08-07 17:52:12 PDT
> given that networking implementations are likely to be the heaviest usage of IPC/IPDL.

necko will probably be heaviest IPDL user in terms of total amount of bytes, but there may be other parts of the stack that actually send more messages, FWIW.

> a performance testing scenario that mirrors real-world usage of the IPC/IPDL infrastructure

Sadly we're short on automated testing scenarios that are actually realistic for networking. We've got Stone Ridge, but it doesn't work with B2G:

  https://wiki.mozilla.org/Necko/Performance/AutomatedTesting
  https://todesschaf.org/posts/2013/04/15/introducing-stone-ridge.html

I'd say you're best off manually running apps with "interesting" network activity: browsing complex pages (try gmail?), email, video, etc.

To get a sense of the perf wins that are possible see bug 561694.  I'm not aware of any other messaging patterns that are such low-hanging fruit--but that's why we need profiling :)
Comment 30 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-08-21 14:37:57 PDT
It took me a little to follow-up here - following are a few comments/questions, i will have to look into the rest later after i re-read & remembered the patch.

(In reply to ben turner [:bent] (needinfo? encouraged) from comment #28)
> Comment on attachment 778039 [details] [diff] [review]
> Part 1: add IPC message tracing, v5
> 
> Review of attachment 778039 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I would be interested to see what
> happens if the child process is set to continually send an async no-op
> message to the parent while the sync and rpc tests are running with and
> without your patch. That would hopefully test my contention hypothesis.

You just mean something simple like sending a no-op message, then post a task to repeat this?

> Manual JSON is scary, but necessary (in our current world) I guess. Are
> there tests somewhere that ensure you got this string right?

I'll add one.

> Template things like this generally cause bustage on one platform or
> another... 

Will re-check.

> @@ +70,5 @@
> > +    MOZ_ASSERT(mActor);
> > +  }
> > +  virtual ~MessageTracingObserver() {}
> > +
> > +  NS_IMETHOD Observe(nsISupports* aSubject, const char* aTopic,
> 
> Virtual methods won't be inlined, just move this to the cpp and explicitly
> instantiate the template there. Then you can lose all the extern strings and
> such.
> 
> Though, maybe you just want to stick with virtual methods here and lose the
> templates altogether? The template isn't really buying you anything here.

It avoids writing the observer code twice per actor type. The alternative, hanging the required actor calls of a common interface, doesn't seem to work out without major headaches here.

> @@ +976,5 @@
> > +
> > +void
> > +AsyncChannel::TraceMessage(const IPC::Message& aMsg, mozilla::dom::ProbeType aProbeType, const char* const aFunc)
> > +{
> > +    // NOTE: we may or may not hold the monitor - we can't rely on this!
> 
> What's this all about? You're always entering the mutex below...

You wanted a scary comment before we enter it, no?

> @@ +985,5 @@
> > +           ("(msgtrace) channel=%ld, child=%s, type=%s, probeno=%ld, probe=%s, reply=%s, stamp=%llu, type=%lu, name=%s, func=%s\n",
> > +             mTraceChannelId,
> > +             mChild ? "true" : "false",
> > +             GetMessageTypeString(aMsg),
> > +             static_cast<int32_t>(aProbeType),
> 
> Is this useful when you already have the string representation below?

It's easier to process/sort/group the log output with the with it being numeric.

> >  
> > +#if defined(_MSC_VER)
> > +#define MOZ_TRACE_IPC_MESSAGE(channel, message, probetype) \
> > +   channel->TraceMessage(message, probetype, __FUNCSIG__)
> 
> I don't think these macros really do much but make the code harder to
> understand. Why not just use a better function name at the call sites (since
> there are only like 6) and avoid the macros altogether?

Can do - i mainly wanted to avoid having to sprinkle things like __FUNCSIG__ repeatedly and thought only having to changes those in one place is nicer and less error-prone.

> 
> @@ +213,5 @@
> >          virtual void SendClose() MOZ_OVERRIDE;
> >      };
> >  
> > +    void SetTraceChannelId(int32_t aChannelId);
> > +    int32_t GetTraceChannelId() const { return mTraceChannelId; }
> 
> This isn't protected by the mutex... But why should it be? I think maybe
> your comment below about what the mutex protects is wrong.

Yes, i missed moving it out of that section.
Comment 31 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-08-26 08:03:23 PDT
(In reply to Georg Fritzsche [:gfritzsche] from comment #30)
> > Manual JSON is scary, but necessary (in our current world) I guess. Are
> > there tests somewhere that ensure you got this string right?
> 
> I'll add one.

Actually part 2 above, containing the test, does sanity-test the behavior and the JSON.
Comment 32 Ben Turner (not reading bugmail, use the needinfo flag!) 2013-08-26 09:23:54 PDT
(In reply to Georg Fritzsche [:gfritzsche] from comment #30)
> You just mean something simple like sending a no-op message, then post a
> task to repeat this?

Yep, in the child->parent direction, just to make sure that the IPC machinery is being used as much as possible.

> It avoids writing the observer code twice per actor type. The alternative,
> hanging the required actor calls of a common interface, doesn't seem to work
> out without major headaches here.

Ok.

> You wanted a scary comment before we enter it, no?

Well, the issue is that mMessageTraceMutex is grabbed in AsyncChannel::TraceMessage, and we don't always know if mChan->mMonitor is already held (AsyncChannel::ProcessLink::SendMessage, for instance, grabs the monitor while most other callers don't). That should be spelled out a little more clearly here. (I interpreted your comment to mean that we may or may not be grabbing mMessageTraceMutex recursively, which would deadlock.)

> It's easier to process/sort/group the log output with the with it being
> numeric.

Sold!

> Can do - i mainly wanted to avoid having to sprinkle things like __FUNCSIG__
> repeatedly and thought only having to changes those in one place is nicer
> and less error-prone.

Having a #define for just __FUNCSIG__/__PRETTY_FUNCTION__/__FUNCTION__ would be a good solution I think. Then the call to TraceMessage wouldn't be hidden and you could still get future-proof function names.

We actually already have this, http://mxr.mozilla.org/mozilla-central/source/dom/plugins/ipc/PluginMessageUtils.h#69 (FULLFUNCTION). Can you just move that somewhere to make it usable here as well?
Comment 33 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-08-26 09:50:59 PDT
> Having a #define for just __FUNCSIG__/__PRETTY_FUNCTION__/__FUNCTION__ would
> be a good solution I think. Then the call to TraceMessage wouldn't be hidden
> and you could still get future-proof function names.
> 
> We actually already have this,
> http://mxr.mozilla.org/mozilla-central/source/dom/plugins/ipc/
> PluginMessageUtils.h#69 (FULLFUNCTION). Can you just move that somewhere to
> make it usable here as well?

Waldo, could we possibly move this as, say, MOZ_FULLFUNCTION to MFBT?
Comment 34 Jeff Walden [:Waldo] (remove +bmo to email) 2013-08-30 13:41:00 PDT
Seems a reasonable addition.  mfbt/FunctionName.h or PrettyFunction.h or something seems fine by me (don't want it in Util.h, we're trying to remove that).
Comment 35 Jeff Walden [:Waldo] (remove +bmo to email) 2013-08-30 13:50:05 PDT
As far as name goes, maybe MOZ_PRETTY_FUNCTION?  It vaguely feels to me like "pretty function" is a phrase people are most likely to have seen before, where "full" isn't part of any of the existing macro names as provided by compilers.  But I may well be projecting here.  ;-)

Too bad __func__ isn't compatible yet.
Comment 36 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-09-03 09:06:36 PDT
Created attachment 798896 [details] [diff] [review]
Add mfbt/FunctionName.h

MOZ_PRETTY_FUNCTION sounds better & more recognizable to me as well.
I went with FunctionName.h as maybe something other than current single macro could be added.

> Too bad __func__ isn't compatible yet.

Per C++11 lax specification ("where function-name is an implementation-defined string") i wouldn't rely on it becoming useful, it doesn't even require or imply having the signature :(
Comment 37 Jeff Walden [:Waldo] (remove +bmo to email) 2013-09-03 14:28:24 PDT
Comment on attachment 798896 [details] [diff] [review]
Add mfbt/FunctionName.h

Review of attachment 798896 [details] [diff] [review]:
-----------------------------------------------------------------

::: mfbt/FunctionName.h
@@ +1,5 @@
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +/* Provides macros for function names. */

Maybe a little more precision?

"Provides a macro expanding to a human-readable name for the enclosing function, as a string."

@@ +6,5 @@
> +
> +#ifndef mozilla_FunctionName_h
> +#define mozilla_FunctionName_h
> +
> +

Remove one of thes blank lines.
Comment 38 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-09-10 11:20:37 PDT
I'm still chasing down a race in the new patch, but i did run the modified test. I added the numbers to tab 2:
https://docs.google.com/spreadsheet/ccc?key=0AhKmHVpmAzfcdDNCTUhxZTRmb3ppMjBiUHEzRXduOVE&usp=sharing

This is with continuously spamming async messages. The increase for the async test part roughly doubles in comparison to the previous run, so that looks about right.
Not putting weight on the slightly weird change for nested RPC, non-nested RPC and sync increased but not majorly.
I don't see major problems with those numbers - bent, what do you think?
Comment 39 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-09-10 11:24:22 PDT
Created attachment 802494 [details] [diff] [review]
Perf test v2, child spams async messages
Comment 40 Ben Turner (not reading bugmail, use the needinfo flag!) 2013-09-11 01:00:10 PDT
Hi Georg, I've talked with three people now who all agree that these numbers are too high to accept in all builds. So far three ideas have surfaced:

1. Use thread-local message ring buffers. Since we're storing timestamps already we could sort everything from all the buffers in a post-processing step or when we call CollectData(). This should be significantly faster than the single buffer with mutex that we have here.

2. Hide all this code behind the --enable-profiling build flag or a runtime preference (requiring a restart) so that default builds are unaffected. I guess the choice for pref or build flag depends on what kind of builds we're hoping to collect information for. Are you hoping to be able to tell regular users to collect these reports and send them in? Or are you just wanting to let nightly users or ff devs access this?
Comment 41 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-09-11 07:42:26 PDT
Benjamin, are there any scenarios where (In reply to ben turner [:bent] (needinfo? encouraged) from comment #40)
> 2. Hide all this code behind the --enable-profiling build flag or a runtime
> preference (requiring a restart) so that default builds are unaffected. I
> guess the choice for pref or build flag depends on what kind of builds we're
> hoping to collect information for. Are you hoping to be able to tell regular
> users to collect these reports and send them in? Or are you just wanting to
> let nightly users or ff devs access this?

Benjamin, do any of the intended usage scenarios actually require this to be always built or enabled?
Comment 42 Benoit Girard (:BenWa) 2013-09-11 07:52:11 PDT
(In reply to Georg Fritzsche [:gfritzsche] from comment #38)
> I'm still chasing down a race in the new patch, but i did run the modified
> test. I added the numbers to tab 2:
> https://docs.google.com/spreadsheet/
> ccc?key=0AhKmHVpmAzfcdDNCTUhxZTRmb3ppMjBiUHEzRXduOVE&usp=sharing
> 

Are these numbers with the feature enabled or disabled? We should worry about the performance/codesize regression of always building this while it's turned off. For the performance regression this should just be a few branch steps. Why is the async case so heavily affected?
Comment 43 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-09-11 08:16:04 PDT
(In reply to Benoit Girard (:BenWa) from comment #42)
> Are these numbers with the feature enabled or disabled? We should worry
> about the performance/codesize regression of always building this while it's
> turned off. 
The numbers are with the feature disabled.

> For the performance regression this should just be a few branch
> steps. Why is the async case so heavily affected?

The async code paths are simpler/shorter and the async messages can be sent off non-blockingly. So this *should* just be the same overhead but magnified - hitting the mutex from two threads in AsyncChannel::TraceMessage().
Comment 44 Benjamin Smedberg [:bsmedberg] 2013-09-11 08:32:32 PDT
It is perfectly sufficient to have this only in --enable-profiling builds, which includes nightly. If we want these sorts of profiles from end users, we can just ask them to download nightly and collect the profile.

I don't understand why we'd need to pay locking overhead if profiling is disabled.
Comment 45 Benoit Girard (:BenWa) 2013-09-12 17:20:31 PDT
In general we want to avoid enable-profiling leading to #ifdef. If we can get this down to a few branches we can afford that and can build it everywhere.
Comment 46 Ben Turner (not reading bugmail, use the needinfo flag!) 2013-10-15 10:53:12 PDT
I also just learned about bug 853864. Maybe there is some work that can be shared?
Comment 47 Ben Turner (not reading bugmail, use the needinfo flag!) 2013-10-15 11:00:24 PDT
Oops. Bug 908995.
Comment 48 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-12-04 08:18:20 PST
Updated numbers using suggested cheaper branching on Atomic<bool> for checking whether profiling is enabled on the tabs "3rd, atomics" and "3rd, atomics, no log" (without the non-critical PR_LOG):
https://docs.google.com/spreadsheet/ccc?key=0AhKmHVpmAzfcdDNCTUhxZTRmb3ppMjBiUHEzRXduOVE&usp=sharing

At least to me, the numbers for both look acceptable (considering that this is just measuring pure delivery overhead and no actual work is being done). Bent, what do you think?

Note that due to ipdl changes the test needed to change (e.g. no more RPC nested), putting it up in a bit.
Comment 49 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-12-04 08:26:09 PST
Created attachment 8342439 [details] [diff] [review]
Perf test v3, updated to ipdl changes
Comment 50 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-12-04 08:33:01 PST
Created attachment 8342443 [details] [diff] [review]
Add IPC message tracing, v6
Comment 51 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-12-04 08:39:01 PST
(In reply to ben turner [:bent] (use the needinfo? flag!) (gone until dec. 6!) from comment #46)
> I also just learned about bug 908995. Maybe there is some work that can be
> shared?

Unless i'm missing something, i don't see a real overlap here (tracing tasks vs. tracing stages of IPC message delivery).
Comment 52 Ben Turner (not reading bugmail, use the needinfo flag!) 2013-12-16 09:31:19 PST
(In reply to Georg Fritzsche [:gfritzsche] from comment #48)
> Updated numbers

Hm, I see two sets of numbers, but I was expecting three. Are the "with patch applied" numbers from builds where the logging is enabled or disabled?
Comment 53 Benoit Girard (:BenWa) 2013-12-16 09:47:29 PST
We have an intern starting in January to work on the profiler front-end (and maybe some back-end pieces too). It would be nice if this could get wrap up around that time since we have the front-end piece for this has a possible project (However I can't promise that).
Comment 54 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-12-16 09:50:40 PST
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #52)
> Hm, I see two sets of numbers, but I was expecting three. Are the "with
> patch applied" numbers from builds where the logging is enabled or disabled?

Those numbers are with tracing disabled. To be sure it doesn't matter much, i have numbers with the PR_LOG() removed separately in the last tab.
(Different days and apparently base performance, hence new numbers for both with and without the patch for that)
Comment 55 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2013-12-20 09:12:52 PST
We decided that the numbers with atomics are too high - now checking controlling it via a bool pref & requiring restart.
I now have mostly stable numbers for that, they are in sheet 4 here:
https://docs.google.com/spreadsheet/ccc?key=0AhKmHVpmAzfcdDNCTUhxZTRmb3ppMjBiUHEzRXduOVE&usp=sharing

Ben, is this acceptable now?
(This is still with the async message flooding on and all message handlers being effectively no-ops)
Comment 56 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2014-01-10 14:30:45 PST
Created attachment 8358636 [details] [diff] [review]
WIP: enable tracing via pref, requiring restart

Just a quick WIP i used to get the numbers above, not completely updated.
Comment 57 Ben Turner (not reading bugmail, use the needinfo flag!) 2014-01-13 12:41:08 PST
I'm still surprised that this is taking so much longer... The patch doesn't show anything obviously wrong, except maybe it isn't inlining correctly... Can you try a before/after profile to see what is taking longer here?
Comment 58 Benoit Girard (:BenWa) 2014-01-13 12:44:44 PST
I discussed this with gfritzche. I think if we restructure the code we can allow the compiler to inline only the |if (enabled) { call(...); }| in the IPC code.
Comment 59 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2014-01-14 10:56:09 PST
Yeah, i've shared this with bent on friday and ran it this week. Updated with sheet 5 [1] - i see an improvement but still a relatively consistent impact.
I can try to see if this is something that would show up in sampling profilers.

[1] https://docs.google.com/spreadsheet/ccc?key=0AhKmHVpmAzfcdDNCTUhxZTRmb3ppMjBiUHEzRXduOVE&usp=sharing
Comment 60 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2014-01-14 10:58:16 PST
(In reply to Georg Fritzsche [:gfritzsche] from comment #59)
> I can try to see if this is something that would show up in sampling
> profilers.

... although with the timing fluctuations this tends have it might be hard to gain insights - i'd happy for alternative suggestions.
Comment 61 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2014-01-14 11:06:25 PST
Created attachment 8359924 [details] [diff] [review]
WIP: enable tracing via pref, requiring restart, v2

Updated WIP patch.
Comment 62 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2014-02-19 08:48:00 PST
bent, what do you think of the numbers in the last two sheets:
https://docs.google.com/spreadsheet/ccc?key=0AhKmHVpmAzfcdDNCTUhxZTRmb3ppMjBiUHEzRXduOVE&usp=sharing

Two pretty different machines, not much of a measurable impact.
Comment 63 Georg Fritzsche [:gfritzsche] [away Jun 24 - Jul 3] 2014-02-19 09:23:26 PST
Per jeff & BenWa this is still not just a well predicted branch, re-checking.

Note You need to log in before you can comment on or make changes to this bug.