Open Bug 1969364 Opened 11 months ago Updated 9 months ago

Buffering calls in content process indefinitely breaks error reporting

Categories

(Core :: Graphics: WebGPU, defect, P2)

People

(Reporter: jimb, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

In the WebGPU CTS, Firefox [edited] fails webgpu:api,validation,encoding,encoder_state:call_after_successful_finish because it buffers render and compute passes in the content process, transmitting them to the GPU process only when end is called on the pass encoder.

Assuming no error scope is pushed, validation errors should eventually be reported to content as uncapturederror events. This is what the above CTS test checks. However, if a validation error is detected in the midst of a pass, but content never calls end on the pass encoder, Firefox will never transmit the incorrect pass contents, and the uncapturederror event will never be sent.
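To make the failure mode concrete, here is a toy model of the buffering described above. It is only a sketch: `GpuProcessModel`, `BufferedPassModel`, and `Command` are hypothetical names invented for illustration, not Firefox or WebGPU APIs, and validation is reduced to a boolean flag on each command.

```typescript
// Toy model only: these types are hypothetical, not Firefox or WebGPU APIs.
type Command = { name: string; valid: boolean };

// Stands in for the GPU process, which is where validation happens.
class GpuProcessModel {
  readonly uncapturedErrors: string[] = [];

  // Validate a transmitted pass and record one error per invalid command.
  receivePass(commands: Command[]): void {
    for (const c of commands) {
      if (!c.valid) {
        this.uncapturedErrors.push(`validation error in ${c.name}`);
      }
    }
  }
}

// Stands in for a pass encoder that buffers in the content process.
class BufferedPassModel {
  private readonly pending: Command[] = [];
  private readonly gpu: GpuProcessModel;

  constructor(gpu: GpuProcessModel) {
    this.gpu = gpu;
  }

  // Recorded locally; nothing is transmitted yet.
  record(command: Command): void {
    this.pending.push(command);
  }

  // Only here do the buffered commands reach the GPU process model.
  end(): void {
    this.gpu.receivePass(this.pending.splice(0));
  }
}
```

In this model, recording an invalid command leaves `uncapturedErrors` empty until `end()` runs, so content that never calls `end()` never observes the error.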

Technically, if content provokes no other device timeline activity, then Firefox's behavior is strictly compliant with the spec, as the spec doesn't say how quickly the uncapturederror must be delivered. However, indefinitely undelivered errors are a poor developer experience.

Problems of this sort will arise any time the content process buffers any calls capable of generating a validation error, instead of transmitting them immediately to the GPU process.

In the past, we've discussed reducing WebGPU's IPDL overhead by buffering all WebGPU calls, flushing the buffer on calls to submit, mapAsync, popErrorScope, and so on. This issue shows that such an optimization must ensure that calls do not remain buffered indefinitely. For example, it might be possible for Firefox to ensure that a microtask to flush the call buffer is enqueued whenever the call buffer becomes non-empty. This would probably allow sufficient buffering to make the optimization effective, but still place a deterministic bound on how long calls linger in the content process.
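A minimal sketch of the microtask idea, assuming a generic call buffer. `CallBuffer`, `Transmit`, and `Scheduler` are hypothetical names, not Firefox internals, and the default scheduler uses a resolved promise as one way to enqueue a microtask. The scheduler is injectable only so that the behavior can be exercised deterministically.

```typescript
// Hypothetical sketch: CallBuffer, Transmit, and Scheduler are illustrative
// names, not Firefox internals.
type Transmit<T> = (batch: T[]) => void;
type Scheduler = (callback: () => void) => void;

// Default scheduler: a resolved promise's reaction runs as a microtask.
const microtaskScheduler: Scheduler = (callback) => {
  void Promise.resolve().then(callback);
};

class CallBuffer<T> {
  private pending: T[] = [];
  private flushScheduled = false;
  private readonly transmit: Transmit<T>;
  private readonly schedule: Scheduler;

  constructor(transmit: Transmit<T>, schedule: Scheduler = microtaskScheduler) {
    this.transmit = transmit;
    this.schedule = schedule;
  }

  // Buffer a call; schedule one flush when the buffer becomes non-empty.
  push(call: T): void {
    this.pending.push(call);
    if (!this.flushScheduled) {
      this.flushScheduled = true;
      this.schedule(() => this.flush());
    }
  }

  // Also callable directly at explicit flush points (submit, mapAsync, ...).
  // A scheduled flush that later finds the buffer empty does nothing.
  flush(): void {
    this.flushScheduled = false;
    if (this.pending.length === 0) {
      return;
    }
    const batch = this.pending;
    this.pending = [];
    this.transmit(batch);
  }
}
```

With this shape, every call pushed during one task is still batched into a single transmission, but nothing stays buffered past the next microtask checkpoint.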

See Also: → 1968122

The issue with webgpu:api,validation,encoding,encoder_state:pass_end_invalid_order:* is that we report validation errors too soon (on pass.end() rather than encoder.finish()).

The test uses expectValidationError, whose implementation first calls pushErrorScope, then calls the function given to it (in this case encoder.finish()), and then calls popErrorScope.
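The sequence just described can be sketched roughly as follows. This is not the CTS source: `expectValidationErrorSketch` and `ErrorScopeDevice` are hypothetical names, and the interface covers only the two device methods the sketch needs.

```typescript
// Minimal structural type covering only the two calls this sketch needs.
interface ErrorScopeDevice {
  pushErrorScope(filter: "validation"): void;
  popErrorScope(): Promise<object | null>;
}

// Hypothetical helper modeled on the description above; not the CTS source.
// Resolves to true if the popped scope captured a validation error.
async function expectValidationErrorSketch(
  device: ErrorScopeDevice,
  fn: () => void,
): Promise<boolean> {
  device.pushErrorScope("validation"); // 1. open the scope
  fn();                                // 2. e.g. () => encoder.finish()
  const error = await device.popErrorScope(); // 3. close it and read the result
  return error !== null;
}
```

Because the scope is opened only just before `fn` runs, an error that was already reported during an earlier pass.end() call falls outside the scope and is not captured by it.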

We have an issue upstream tracking this: https://github.com/gfx-rs/wgpu/issues/7391.

Following up from our discussion in the meeting, it's actually not the pass_end_invalid_order test that's relevant here. The test that exposes this issue is webgpu:api,validation,encoding,encoder_state:call_after_successful_finish (test source, Web CTS Runner). I believe no fixes for wgpu#7391 are needed to see the relevant failure in call_after_successful_finish.

I see. Our current render/compute pass encoding is done on the content side, and only on pass.end() do we issue the commands to the GPU process. This does violate the specification as it's currently written, since the "Validate the encoder state" algorithm can generate a validation error if the encoder is in the ended state. I don't think this behavior was intended: in all other cases where validation errors arise, the encoder is invalidated, the errors accumulate in the command encoder, and only when encoder.finish() is called do we actually "generate a validation error" as per the spec. The cases I see that "generate a validation error" when they shouldn't are:

I will bring up this case in https://github.com/gpuweb/gpuweb/issues/5207 since it's the same issue in nature.
