Closed Bug 1344216 Opened 7 years ago Closed 7 years ago

[meta] Stop doing sync IPC for WebVR

Categories

(Core :: WebVR, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
Performance Impact ?

People

(Reporter: ehsan.akhgari, Assigned: kip)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: meta, Whiteboard: [gfx-noted])

Our WebVR implementation relies on a number of sync IPC messages on PVRManager.  At least three of these (GetSensorState, SetHaveEventListener and GetDisplays) can be pretty slow, and as far as I can tell, these sync IPC messages are directly called from APIs that we expose to the web page, so the resulting experience is that the page calls the APIs, and we block the entire content process before the parent process gets back to us.  If the page ends up calling one of these APIs several times per frame, the result can be disastrous.

Here is the telemetry data we have so far for these three messages that are showing up as being particularly slow:

<https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2017-02-28&keys=PVRManager%253A%253AMsg_GetSensorState!PVRManager%253A%253AMsg_SetHaveEventListener!PVRManager%253A%253AMsg_GetDisplays!PAPZCTreeManager%253A%253AMsg_ReceiveMouseInputEvent&max_channel_version=nightly%252F54&measure=IPC_SYNC_LATENCY_MS&min_channel_version=null&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2017-02-02&table=0&trim=1&use_submission_date=0>

GetSensorState:
Start	End	IPC_SYNC_LATENCY_MS Count
0	1	0 (0%)
1	3	69.82M (20.41%)
3	8	76.63M (22.4%)
8	20	180.95M (52.89%)
20	50	13.93M (4.07%)
50	126	778.13k (0.23%)
126	317	31.32k (0.01%)
317	796	1.41k (0%)
796	2k	250 (0%)
2k	Infinity	48 (0%)

SetHaveEventListener:
Start	End	IPC_SYNC_LATENCY_MS Count
0	1	0 (0%)
1	3	118.09k (55.09%)
3	8	43.04k (20.08%)
8	20	29.82k (13.91%)
20	50	15.93k (7.43%)
50	126	5.62k (2.62%)
126	317	1.36k (0.63%)
317	796	325 (0.15%)
796	2k	101 (0.05%)
2k	Infinity	68 (0.03%)

GetDisplays:
Start	End	IPC_SYNC_LATENCY_MS Count
0	1	0 (0%)
1	3	7.68k (53.52%)
3	8	2.64k (18.41%)
8	20	2.57k (17.95%)
20	50	1.01k (7.03%)
50	126	330 (2.3%)
126	317	78 (0.54%)
317	796	23 (0.16%)
796	2k	4 (0.03%)
2k	Infinity	8 (0.06%)

Note that due to the fact that this API isn't very popular yet, I'm extremely worried that we may not have a good amount of data here.  But it seems like we should probably study our performance story very carefully before considering shipping.
I am working on a patch now that will ensure that all WebVR IPC that happens outside of WebVR presentation will be async..

I'd like to explore alternative non-blocking methods of sharing sensor state from the GPU process to all of the content processes to replace the current broadcast of IPC messages.  One technique could be inspired by the non-locking memory mapped data structures that are used by Chromium's gamepad api.
Whiteboard: [gfx-noted]
(In reply to :Ehsan Akhgari from comment #0)
> 
> GetSensorState:
> Start	End	IPC_SYNC_LATENCY_MS Count
> 0	1	0 (0%)
> 1	3	69.82M (20.41%)
> 3	8	76.63M (22.4%)
> 8	20	180.95M (52.89%)
> 20	50	13.93M (4.07%)
> 50	126	778.13k (0.23%)
> 126	317	31.32k (0.01%)
> 317	796	1.41k (0%)
> 796	2k	250 (0%)
> 2k	Infinity	48 (0%)
> 

These metrics do not match up with the hardware supported latency measurements that I am seeing in tests here.  The vast majority of the calls to GetSensorState should be returning in a fraction of a millisecond.  I noticed that our telemetry code is not reporting anything under 1ms due to kMinTelemetrySyncIPCLatencyMs:

https://dxr.mozilla.org/mozilla-central/source/ipc/glue/MessageChannel.cpp#1282

I suspect that these figures are representative of the IPC queue length for frames that endured latency due to other non-WebVR IPC messages, rather than following the impact of the GetSensorState call.
Please feel free to remove "sync GetControllers()" at PVRManager.ipdl. I have checked there is no one using it.
(In reply to :kip (Kearwood Gilbert) from comment #2)
> (In reply to :Ehsan Akhgari from comment #0)
> > 
> > GetSensorState:
> > Start	End	IPC_SYNC_LATENCY_MS Count
> > 0	1	0 (0%)
> > 1	3	69.82M (20.41%)
> > 3	8	76.63M (22.4%)
> > 8	20	180.95M (52.89%)
> > 20	50	13.93M (4.07%)
> > 50	126	778.13k (0.23%)
> > 126	317	31.32k (0.01%)
> > 317	796	1.41k (0%)
> > 796	2k	250 (0%)
> > 2k	Infinity	48 (0%)
> > 
> 
> These metrics do not match up with the hardware supported latency
> measurements that I am seeing in tests here.  The vast majority of the calls
> to GetSensorState should be returning in a fraction of a millisecond.  I
> noticed that our telemetry code is not reporting anything under 1ms due to
> kMinTelemetrySyncIPCLatencyMs:
> 
> https://dxr.mozilla.org/mozilla-central/source/ipc/glue/MessageChannel.
> cpp#1282
> 
> I suspect that these figures are representative of the IPC queue length for
> frames that endured latency due to other non-WebVR IPC messages, rather than
> following the impact of the GetSensorState call.

Yes, that is what you sign up for to deal with when you use sync IPC, and precisely why you shouldn't be doing that.  ;-)
(To be more clear, the reason we are ignoring sync IPCs less than 1ms is that like all performance issues, when debugging perf problems we don't measure the good performance scenarios, but the bad ones.  Very similarly to sync IO, for example, even though under tight testing conditions sync IPC may finish very fast, on users machines in the wild that may not end up being the case, which is why local measurements and best case performance numbers for things like this shouldn't be trusted.)
Depends on: 1345497
Depends on: 1345564
Depends on: 1346923
Depends on: 1346926
Once the dependencies tracked in this bug have landed, WebVR will no longer be doing any Sync IPC.
Summary: Stop doing sync IPC for WebVR → [meta] Stop doing sync IPC for WebVR
Whiteboard: [gfx-noted] → [gfx-noted][qf:meta]
Depends on: 1346927
The final sync IPC message, which only occurs during VR presentation, will be replaced with the lock free shared memory implementation in Bug 1346927.

I'll be doing this work, so have taken this bug.
Assignee: nobody → kgilbert
No longer blocks: webvr_1.1
Component: Graphics → WebVR
This has been fixed with the refactoring in Bug 1362213
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Performance Impact: --- → ?
Keywords: meta
Whiteboard: [gfx-noted][qf:meta] → [gfx-noted]
You need to log in before you can comment on or make changes to this bug.