Closed Bug 1968122 Opened 3 months ago Closed 2 months ago

WebGPU makes too many IPDL calls

Categories

(Core :: Graphics: WebGPU, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
142 Branch
Tracking Status
firefox142 --- fixed

People

(Reporter: jimb, Assigned: teoxoy)

References

(Blocks 4 open bugs, Regressed 1 open bug)

Details

Attachments

(29 files, 1 obsolete file)

48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review

Firefox's WebGPU bindings spend a quarter of their time in the kernel, simply sending IPDL messages.

This profile view shows a 283ms requestAnimationFrame call, with about a quarter of that time spent in IPDL-generated code. In the content process, if we focus on the frame JS function, 27% of samples have IPC::Channel::ChannelImpl::ProcessOutgoingMessages near the top of their stack.

Our WebGPU implementation should buffer messages and send them in large blocks, flushing the buffer at calls to GPUQueue methods.

As a less ambitious optimization, we could make command buffer recording entirely local to the content process, rather than sending a separate IPDL CommandEncoderAction for each call to a GPUCommandEncoder method. Among the above call to frame's 4,387 samples, 1083 are in GPUCommandEncoder methods, and 691 of those are in PWebGPU methods. I think this means that local command buffer recording would save about 16% of frame's time.

(I first heard this idea from Nical. But WebGL also does its own serialization, buffering, and flushing.)

See Also: → 1968102
No longer blocks: 1937100

Bug 1969364 shows that it is not sufficient to flush buffered calls on operations like submit and mapAsync. We must place a bound on how long calls stay buffered in the content process --- ideally, a deterministic bound. It seems like putting flushing jobs on the microtask queue would work.

See Also: → 1969364

When I was thinking about the buffering/flushing strategy I was concerned that if we flush only on submit + the other async methods, we would increase the latency of some future work in situations where the application tries to create resources in advance.

Using a microtask to flush buffered messages sounds like the best way to ensure that we don't wait too long to flush since as far as I understand the microtask would run after let's say the requestAnimationFrame callback right?

Jamie mentioned that external texture expiry will also use a microtask, we should ensure that the expiry microtask runs after our flushing microtask.

I've looked at this a bit from the IPC side, and some of this overhead is potentially avoidable.

A good chunk of the time in the profile (though not the bulk of it - 5.1% in the sample shown above) is spent in NtQueryVolumeInformationFile which is being run from the WinIOAutoObservation constructor to determine the file type (https://searchfox.org/mozilla-central/rev/3b58bde321bf705a91c32c7a6574a3427ce81d3f/xpcom/build/PoisonIOInterposerWin.cpp#133). If we were able to avoid interposing these particular calls to WriteFile withthe IOInterposer, it would be possible to avoid this part of the overhead. In addition, if the interposer avoided calling GetFileType when it is suppressed (as we do for ipc_channel_win.cc here: https://searchfox.org/mozilla-central/rev/3b58bde321bf705a91c32c7a6574a3427ce81d3f/ipc/chromium/src/chrome/common/ipc_channel_win.cc#463,466), we could also avoid the unnecessary call.

The actual call to WriteFile though is somewhat tricky to reduce the overhead of. Before bug 1792474, we actually would always do the WriteFile call on the IPC I/O thread, but we moved the overlapped IO call to the main thread to reduce latency for IPC messages. The assumption was that due to an OVERLAPPED being specified, the write would largely be happening async, and would therefore have minimal overhead on the sending thread, instead just avoiding some thread hops.

Not knowing whether this operation is slow because we're doing a lot of individual system calls vs. because the actual OVERLAPPED WriteFile calls are expensive does make this a bit tricky unfortunately. Would be nice if we could see what Windows is doing under the hood which is taking time within NtWriteFile.

Assignee: nobody → ttanasoaia
Status: NEW → ASSIGNED
Attachment #9495135 - Attachment is obsolete: true

I've run all the demos/apps we are tracking for release (Bug 1956044) with the patch stack applied and I didn't find any new unexpected behavior.

Jobs part of recent CTS runs have been timing out, this is probably due to recent passes being promoted out of backlog.

I had to demote 2 tests because they are testing promise ordering that
is not guaranteed by the specification. I will open an issue on the
spec repo to clarify the situation.

The new timeouts are actually existing timeouts that were not surfaced
by the harness. I've looked at a recent CTS run without the patch stack
and the tests time out but the harness reports TEST-OK. Locally,
the tests also run at the same speed with and without the patch stack.

I'm not sure what change in the patch stack caused the harness to now
report the timeout but I think this is now working as intended.

We were previously always creating and returning async pipelines even
though there was a validation error, the tests were running but when
reading back the results all they got were 0s, we now reject the
promise which changes the test outcome from FAIL to ERROR.

Pushed by ttanasoaia@mozilla.com: https://github.com/mozilla-firefox/firefox/commit/34fda53f5396 https://hg.mozilla.org/integration/autoland/rev/7bde5c633884 Replace individual `Drop`/`Destroy` IPDL messages with our own `Message` enum. r=webgpu-reviewers,nical https://github.com/mozilla-firefox/firefox/commit/e2ae6e549f90 https://hg.mozilla.org/integration/autoland/rev/6d6355a3b148 Wrap device actions in `Message::Device`. r=webgpu-reviewers,nical https://github.com/mozilla-firefox/firefox/commit/0f70fabca04e https://hg.mozilla.org/integration/autoland/rev/e44206c0b7cc Wrap texture actions in `Message::Texture`. r=webgpu-reviewers,nical https://github.com/mozilla-firefox/firefox/commit/6bc38dde0224 https://hg.mozilla.org/integration/autoland/rev/9532eabe579b Wrap command encoder actions in `Message::CommandEncoder`. r=webgpu-reviewers,nical https://github.com/mozilla-firefox/firefox/commit/1a3d6ceabd22 https://hg.mozilla.org/integration/autoland/rev/34c823ba8400 Move `CommandEncoderFinish` into our `Message` enum. r=webgpu-reviewers,nical https://github.com/mozilla-firefox/firefox/commit/9d7951a61b4d https://hg.mozilla.org/integration/autoland/rev/b7a17e072855 Store the device ID inside `ErrorBuffer`. r=webgpu-reviewers,nical https://github.com/mozilla-firefox/firefox/commit/a1ae8babfa2d https://hg.mozilla.org/integration/autoland/rev/ae534cb1b5b3 Move pass messages into our `Message` enum. r=webgpu-reviewers,nical https://github.com/mozilla-firefox/firefox/commit/85589a371903 https://hg.mozilla.org/integration/autoland/rev/eda9efca62e6 Move `QueueWriteAction` into our `Message` enum. r=webgpu-reviewers,nical https://github.com/mozilla-firefox/firefox/commit/9bf4977179b4 https://hg.mozilla.org/integration/autoland/rev/6dad2b345750 Add inline support to `Message::QueueWrite`. r=webgpu-reviewers,nical https://github.com/mozilla-firefox/firefox/commit/dd09c0579656 https://hg.mozilla.org/integration/autoland/rev/83a72bfebf3e Move `DeviceCreateBuffer` into our `Message` enum. r=webgpu-reviewers,jimb https://github.com/mozilla-firefox/firefox/commit/65d17c86e640 https://hg.mozilla.org/integration/autoland/rev/510a7b4f6203 Move `DevicePushErrorScope` into our `Message` enum. r=webgpu-reviewers,nical https://github.com/mozilla-firefox/firefox/commit/35f4a41df916 https://hg.mozilla.org/integration/autoland/rev/84f895015c01 Move `BufferUnmap` into our `Message` enum. r=webgpu-reviewers,jimb https://github.com/mozilla-firefox/firefox/commit/018ce15ce0a0 https://hg.mozilla.org/integration/autoland/rev/b7cd4914c875 Move `QueueSubmit` into our `Message` enum. r=webgpu-reviewers,nical https://github.com/mozilla-firefox/firefox/commit/6ed419f6ac47 https://hg.mozilla.org/integration/autoland/rev/c9e106a02c59 Move swap chain messages into our `Message` enum. r=webgpu-reviewers,jimb https://github.com/mozilla-firefox/firefox/commit/94879837a111 https://hg.mozilla.org/integration/autoland/rev/ef3ca838d33c Move `InstanceRequestAdapter` into our `Message` enum. r=webgpu-reviewers,nical https://github.com/mozilla-firefox/firefox/commit/e000a13390c1 https://hg.mozilla.org/integration/autoland/rev/a0f2c6fe8d03 Move `AdapterRequestDevice` into our `Message` enum. r=webgpu-reviewers,nical https://github.com/mozilla-firefox/firefox/commit/69d6d8094ea4 https://hg.mozilla.org/integration/autoland/rev/a5f7b9eaf0c8 Move `DevicePopErrorScope` into our `Message` enum. r=webgpu-reviewers,nical https://github.com/mozilla-firefox/firefox/commit/8ae809066c1a https://hg.mozilla.org/integration/autoland/rev/64a6525d29f1 Move `DeviceActionWithAck` into our `Message` enum. r=webgpu-reviewers,nical https://github.com/mozilla-firefox/firefox/commit/c99341ba09ae https://hg.mozilla.org/integration/autoland/rev/283fa57cd389 Move `DeviceCreateShaderModule` into our `Message` enum. r=webgpu-reviewers,nical https://github.com/mozilla-firefox/firefox/commit/3776a47ff7bb https://hg.mozilla.org/integration/autoland/rev/9305dede4d1c Move `BufferMap` into our `Message` enum. r=webgpu-reviewers,nical https://github.com/mozilla-firefox/firefox/commit/6197d6b665f1 https://hg.mozilla.org/integration/autoland/rev/17df0a2dba00 Move `QueueOnSubmittedWorkDone` into our `Message` enum. r=webgpu-reviewers,nical https://github.com/mozilla-firefox/firefox/commit/f7394013851c https://hg.mozilla.org/integration/autoland/rev/98c197ab32a8 Add owner pointer to `Client`. r=webgpu-reviewers,nical https://github.com/mozilla-firefox/firefox/commit/d72937450bf7 https://hg.mozilla.org/integration/autoland/rev/ac748e201657 Allow `wgpu_server_message` to surface errors. r=webgpu-reviewers,nical https://github.com/mozilla-firefox/firefox/commit/fffdb292be0b https://hg.mozilla.org/integration/autoland/rev/38d2580426fa Remove `ReportError` message. r=webgpu-reviewers,nical https://github.com/mozilla-firefox/firefox/commit/045d762e133f https://hg.mozilla.org/integration/autoland/rev/c47b7c9db6b0 Queue messages in `WebGPUChild`. r=webgpu-reviewers,profiler-reviewers,canaltinova,nical https://github.com/mozilla-firefox/firefox/commit/1d828ac8f51f https://hg.mozilla.org/integration/autoland/rev/85f42f42bb8d Avoid bincode serialization/deserialization overhead for inline buffer data. r=webgpu-reviewers,nical https://github.com/mozilla-firefox/firefox/commit/64815e20bca0 https://hg.mozilla.org/integration/autoland/rev/6a034a351c96 Increase WebGPU CTS chunks by 4 to avoid job timeouts. r=webgpu-reviewers,nical https://github.com/mozilla-firefox/firefox/commit/b509fda5c3ad https://hg.mozilla.org/integration/autoland/rev/12619b90dd5e Update tier 2 CTS expectations. r=webgpu-reviewers,nical https://github.com/mozilla-firefox/firefox/commit/8e3947bf4dce https://hg.mozilla.org/integration/autoland/rev/29605bf21ee0 Update tier 3 CTS expectations. r=webgpu-reviewers,nical

https://hg.mozilla.org/mozilla-central/rev/7bde5c633884
https://hg.mozilla.org/mozilla-central/rev/6d6355a3b148
https://hg.mozilla.org/mozilla-central/rev/e44206c0b7cc
https://hg.mozilla.org/mozilla-central/rev/9532eabe579b
https://hg.mozilla.org/mozilla-central/rev/34c823ba8400
https://hg.mozilla.org/mozilla-central/rev/b7a17e072855
https://hg.mozilla.org/mozilla-central/rev/ae534cb1b5b3
https://hg.mozilla.org/mozilla-central/rev/eda9efca62e6
https://hg.mozilla.org/mozilla-central/rev/6dad2b345750
https://hg.mozilla.org/mozilla-central/rev/83a72bfebf3e
https://hg.mozilla.org/mozilla-central/rev/510a7b4f6203
https://hg.mozilla.org/mozilla-central/rev/84f895015c01
https://hg.mozilla.org/mozilla-central/rev/b7cd4914c875
https://hg.mozilla.org/mozilla-central/rev/c9e106a02c59
https://hg.mozilla.org/mozilla-central/rev/ef3ca838d33c
https://hg.mozilla.org/mozilla-central/rev/a0f2c6fe8d03
https://hg.mozilla.org/mozilla-central/rev/a5f7b9eaf0c8
https://hg.mozilla.org/mozilla-central/rev/64a6525d29f1
https://hg.mozilla.org/mozilla-central/rev/283fa57cd389
https://hg.mozilla.org/mozilla-central/rev/9305dede4d1c
https://hg.mozilla.org/mozilla-central/rev/17df0a2dba00
https://hg.mozilla.org/mozilla-central/rev/98c197ab32a8
https://hg.mozilla.org/mozilla-central/rev/ac748e201657
https://hg.mozilla.org/mozilla-central/rev/38d2580426fa
https://hg.mozilla.org/mozilla-central/rev/c47b7c9db6b0
https://hg.mozilla.org/mozilla-central/rev/85f42f42bb8d
https://hg.mozilla.org/mozilla-central/rev/6a034a351c96
https://hg.mozilla.org/mozilla-central/rev/12619b90dd5e
https://hg.mozilla.org/mozilla-central/rev/29605bf21ee0

Status: ASSIGNED → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
Target Milestone: --- → 142 Branch
Blocks: 1975714
Regressions: 1975465
QA Whiteboard: [qa-triage-done-c143/b142]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: