Don't poll WebGPU from a timer
Categories
(Core :: Graphics: WebGPU, defect, P3)
Tracking
()
People
(Reporter: jimb, Unassigned)
References
(Blocks 3 open bugs)
Details
Right now, when Firefox creates a PWebGPU IPDL parent, the WebGPUParent constructor starts a timer thread calling wgpu_core::global::device::Global::poll_all_devices, to respond to the completion of work on the GPU. This polling runs ten times a second, and probably has an unacceptable impact on battery life.
Instead, creating a WebGPUParent should start a separate thread that polls the wgpu devices only as necessary. Submitting a command buffer to a queue should dispatch a runnable to that thread (in classic Gecko style, using nsIEventTarget::Dispatch or something like it) that calls Global::device_poll, passing Maintain::WaitForSubmissionIndex with the submission index returned by the submission. This will cause the thread to block until the GPU has completed the submitted work.
Various other WebGPU operations require calling device_poll as well, and these can be handled by sending the thread appropriate runnables.
This approach should ensure that WebGPU consumes the CPU's attention only when we are actually waiting for work.
| Reporter | ||
Updated•1 year ago
|
| Reporter | ||
Updated•1 year ago
|
Comment 1•10 months ago
|
||
Linking this other report with an attached test case and some discussion: https://bugzilla.mozilla.org/show_bug.cgi?id=1900273 and the issue on wgpu github with a link to a solution to this problem on the servo browser: https://github.com/gfx-rs/wgpu/issues/6660 https://github.com/servo/servo/pull/32266
In terms of severity, this makes compute shaders so slow as to be unusuable on firefox.
Comment 2•7 months ago
|
||
Just pinging here to see if there is any chance of fixing this sometime soon? :) Thanks!
Comment 3•7 months ago
|
||
This is definitely a priority for before the release we're hoping for by June. However, we haven't gotten more fine-grained with our planning than that right now, sorry!
Comment 4•4 months ago
|
||
Here's your periodic ping to see if there is any update on when this might be fixed :)
Comment 5•4 months ago
|
||
Unfortunately, I think this task is going to wait until closer to EOY. It's not clear, given our current backlog and decision to ship (which will, in turn, impose more backlog that we haven't found yet).
NI'ing :jimb for his thoughts as the team's tech lead.
| Reporter | ||
Comment 6•4 months ago
|
||
Actually, I think we may want to raise the priority here.
I just tried lowering POLL_MS_TIME in WebGPUParent.cpp from 100ms to 5ms, and it lowers the time needed to run the WebGPU CTS test webgpu:shader,execution,expression,call,builtin,abs from 12s to 1.2s. So it seems plausible that our current polling approach is behind some of our CTS timeouts.
Polling at 100ms is ridiculous: that's an average 50s of latency, unless you're submitting continuously and taking advantage of the implicit poll that submission does.
We did this in servo last year and let me tell you it was worth it. We slashed CTS run times in half.
Polling at 100ms is ridiculous: that's an average 50s of latency, unless you're submitting continuously and taking advantage of the implicit poll that submission does.
That is probably true for most rendering scenarios where you have requestAnimationFrame doing submits for every frame.
Comment 8•3 months ago
|
||
So is the plan to just change this POLL_MS_TIME constant or do the solution outlined above? Any timeframe for this happening? :)
Comment 9•3 months ago
|
||
Randall: I know that :jimb has been doing some implementation design research, and has converged on a design for wait-polling in a separate thread when Firefox knows that some GPU work is in flight. I don't know when we'll be getting to implementing it (minus what I said in comment 5 for before EOY). The changes seem straightforward—we just need to figure out when it will happen compared to the other priorities we have.
Comment 10•1 month ago
|
||
I was looking at this a bit today. Things are a bit hairy around WebGPUParent::GetFrontBufferSnapshot because it calls ffi::wgpu_server_poll_all_devices itself.
We talked about the best way to fit this into the solution proposed above. JimB proposed having the callback set a condition variable and have GetFrontBufferSnapshot wait on that condition variable and then do the readback. This will keep all the polling (expect maybe for the shutdown poll) on the polling thread.
Description
•