Closed Bug 1333489 Opened 3 years ago Closed 3 years ago

Record sync IPC message timing in telemetry

Categories

(Core :: IPC, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox53 --- fixed
firefox54 --- fixed

People

(Reporter: billm, Assigned: cpearce)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

We should have a histogram keyed on the message name for how long we wait for a sync IPC message.
Andrew, can we please find an owner for this?  Thanks!
Flags: needinfo?(overholt)
I volunteer!
Assignee: nobody → cpearce
Flags: needinfo?(overholt)
Just FYI, you should be able to insert instrumentation in the MessageChannel::Send overload that has two arguments. It'll probably look something like this:
http://searchfox.org/mozilla-central/source/ipc/glue/MessageChannel.cpp#1076
Comment on attachment 8831959 [details]
Bug 1333489 - Record time spent waiting on sync IPC.

https://reviewboard.mozilla.org/r/108426/#review109672

It seems like you had a copy/paste accident when you wrote the commit message :)
Attachment #8831959 - Flags: review?(francois)
What are the possible values for the message string (i.e. the key)?
Flags: needinfo?(cpearce)
The names of each IPC message.  See the IPC_MESSAGE_SIZE probe for an example.
(In reply to François Marier [:francois] from comment #6)
> What are the possible values for the message string (i.e. the key)?

Stringification of function names corresponding to IPC messages. Some examples I just pulled out of the MOZ_LOG logging this patch adds:

PAPZCTreeManager::Msg_ReceiveMouseInputEvent
PAPZCTreeManager::Msg_TransformEventRefPoint
PBrowser::Msg_NotifyIMEFocus
PBrowser::Msg_SyncMessage
PCompositorBridge::Msg_FlushRendering
PCompositorBridge::Msg_PLayerTransactionConstructor
PCompositorBridge::Msg_WillClose
PContent::Msg_ClipboardHasType
PContent::Msg_GetGfxInfoFeatureStatus
PContent::Msg_GetGfxVars
PContent::Msg_GetGraphicsDeviceInitData
PContent::Msg_GetProcessAttributes
PContent::Msg_GetXPCOMProcessAttributes
PContent::Msg_PCrashReporterConstructor
PContent::Msg_PScreenManagerConstructor
PContent::Msg_ReadPermissions
PContent::Msg_ReadPrefsArray
PContent::Msg_SyncMessage
PGMPService::Msg_GetGMPNodeId
PGMPService::Msg_LaunchGMP
PGPU::Msg_AddLayerTreeIdMapping
PImageBridge::Msg_WillClose
PLayerTransaction::Msg_Update
PLayerTransaction::Msg_Update
PLayerTransaction::Msg_Update
PLayerTransaction::Msg_Update
PLayerTransaction::Msg_Update
PScreenManager::Msg_GetPrimaryScreen
Flags: needinfo?(cpearce)
Comment on attachment 8831959 [details]
Bug 1333489 - Record time spent waiting on sync IPC.

https://reviewboard.mozilla.org/r/108426/#review109702

::: ipc/glue/MessageChannel.cpp:133
(Diff revision 2)
>  namespace mozilla {
>  namespace ipc {
>  
>  static const uint32_t kMinTelemetryMessageSize = 8192;
>  
> +static const uint32_t kMinTelemetrySyncIPCLatency = 1;

I guess the actual cutoff is 500us due to the rounding. Might want to mention that somewhere in a comment.

::: ipc/glue/MessageChannel.cpp:1280
(Diff revision 2)
>                                nsDependentCString(msgName), aReply->size());
>      }
> +
> +    uint32_t ms = round((TimeStamp::Now() - start).ToMilliseconds());
> +    if (ms >= kMinTelemetrySyncIPCLatency) {
> +      IPC_LOG("Sent Sync IPC '%s', seqno=%d, xid=%d, in %ums", msgName,

Did you add this for a reason? Aside from the timing, it's redundant with other logging we do. If you want to keep it, I'd rather consolidate with this logging:
http://searchfox.org/mozilla-central/rev/4e0c5c460318fb9ef7d92b129ac095ce04bc4795/ipc/glue/MessageChannel.cpp#1256

::: toolkit/components/telemetry/Histograms.json:10247
(Diff revision 2)
> +    "alert_emails": ["cpearce@mozilla.com"],
> +    "bug_numbers": [1333489],
> +    "expires_in_version": "60",
> +    "kind": "exponential",
> +    "high": 60000,
> +    "n_buckets": 50,

It might be nice to have fewer buckets (more like 10) to keep the ping size lower. This one measurement is going to trigger a lot of histograms (one per message). You could compensate for the loss of resolution by lowering the "high" end to something like 2 seconds. We'll still have an overflow bucket that counts the excess.

::: toolkit/components/telemetry/Histograms.json:10249
(Diff revision 2)
> +    "expires_in_version": "60",
> +    "kind": "exponential",
> +    "high": 60000,
> +    "n_buckets": 50,
> +    "keyed": true,
> +    "description": "Measures the number of milliseconds we spend waiting for sync IPC messages to finish sending, keyed by message name."

Please mention that we omit messages where the response was received in < 500us. This is important because it means we can't use this histogram to count the total number of sync messages sent.
Attachment #8831959 - Flags: review?(wmccloskey) → review+
Comment on attachment 8831959 [details]
Bug 1333489 - Record time spent waiting on sync IPC.

https://reviewboard.mozilla.org/r/108426/#review109702

> Did you add this for a reason? Aside from the timing, it's redundant with other logging we do. If you want to keep it, I'd rather consolidate with this logging:
> http://searchfox.org/mozilla-central/rev/4e0c5c460318fb9ef7d92b129ac095ce04bc4795/ipc/glue/MessageChannel.cpp#1256

I'll consolidate it with the existing logging.
Comment on attachment 8831959 [details]
Bug 1333489 - Record time spent waiting on sync IPC.

https://reviewboard.mozilla.org/r/108426/#review109778

(In reply to Chris Pearce (:cpearce) from comment #8)
> (In reply to François Marier [:francois] from comment #6)
> > What are the possible values for the message string (i.e. the key)?
> 
> Stringification of function names corresponding to IPC messages. Some
> examples I just pulled out of the MOZ_LOG logging this patch adds:

Ok good. I wanted to make sure it would not include any parameters since they could be derived from user input.

datareview+
Attachment #8831959 - Flags: review?(francois) → review+
Pushed by cpearce@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0ec2488aebbd
Record time spent waiting on sync IPC. r=billm,francois
https://hg.mozilla.org/mozilla-central/rev/0ec2488aebbd
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Depends on: 1337073
Duplicate of this bug: 1296160
Chris, I think we should backport this to beta to start getting telemetry data on sync IPCs more quickly.  Do you have time to prepare for an uplift?  (Michael is also in the process of changing this.... Maybe we should do both at the same time?)
Flags: needinfo?(michael)
Flags: needinfo?(cpearce)
Doing both at the same time makes sense to me, given that we're renaming the probe. Can Michael do the uplift here and in bug 1337073?
Flags: needinfo?(cpearce)
Comment on attachment 8831959 [details]
Bug 1333489 - Record time spent waiting on sync IPC.

Approval Request Comment
[Feature/Bug causing the regression]: N/a
[User impact if declined]: We would collect less important telemetry on sync IPC delays.
[Is this code covered by automated tests?]: No
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]: No 
[List of other uplifts needed for the feature/fix]: We would uplift this alongside bug 1337073 and bug 1343729 to get better data from the probe in beta. 
[Is the change risky?]: No
[Why is the change risky/not risky?]: Only adds a non-invasive telemetry probe to sync IPC code.
[String changes made/needed]: None
Flags: needinfo?(michael)
Attachment #8831959 - Flags: approval-mozilla-beta?
Comment on attachment 8831959 [details]
Bug 1333489 - Record time spent waiting on sync IPC.

This patch should land first, then the one from bug 1343729, then bug 1337073. 
Tweaking sync telemetry for better diagnostic info.
Attachment #8831959 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Setting qe-verify- based on Michael's assessment on manual testing needs (see Comment 19).
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.