Closed Bug 1915572 Opened 5 months ago Closed 3 months ago

[meta] webgpus errors on onnx inference

Categories: Core :: Machine Learning, defect
Tracking: RESOLVED FIXED
People: Reporter: tarek, Unassigned
References: Depends on 1 open bug
Details: Keywords: meta, Whiteboard: [genai]

When running the NER model in about:inference on the GPU:

2024-08-29 11:31:58.972436 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-08-29 11:31:58.973840 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
An uncaught WebGPU validation error was raised: Shader module creation failed: Parsing error ort.webgpu.mjs:2562:3419
Encountered one or more errors while creating shader module "Softmax"
WebGPU compilation info for shader module "Softmax" (1 error(s), 0 warning(s), 0 info)

Shader 'Softmax' parsing error: expected assignment or increment/decrement, found 'wg'
   ┌─ wgsl:30:15
   │
30 │         const wg = 64;
   │               ^^ expected assignment or increment/decrement


An uncaught WebGPU validation error was raised: Error matching shader requirements against the pipeline, caused by: Shader module is invalid ort.webgpu.mjs:2562:3419
An uncaught WebGPU validation error was raised: Pipeline is invalid ort.webgpu.mjs:2562:3419
An uncaught WebGPU validation error was raised: Bind group layout is invalid ort.webgpu.mjs:2562:3419
An uncaught WebGPU validation error was raised: In a set_pipeline command, caused by: ComputePipelineId Id(19,1,mtl) is invalid ort.webgpu.mjs:2562:3419
An uncaught WebGPU validation error was raised: Command encoder is locked by a previously created render/compute pass. Before recording any new commands, the pass must be ended. ort.webgpu.mjs:2562:3419
2024-08-29 11:33:24.548662 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-08-29 11:33:24.550023 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.

With image-to-text:

2024-08-29 11:41:37.822226 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-08-29 11:41:37.823627 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-08-29 11:42:07.639508 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-08-29 11:42:07.641232 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
An uncaught WebGPU validation error was raised: Shader module creation failed: Shader validation error ort.webgpu.mjs:2562:3419
Encountered one or more errors while creating shader module "Concat"
An uncaught WebGPU validation error was raised: Error matching shader requirements against the pipeline, caused by: Shader module is invalid ort.webgpu.mjs:2562:3419
An uncaught WebGPU validation error was raised: Pipeline is invalid ort.webgpu.mjs:2562:3419
An uncaught WebGPU validation error was raised: Bind group layout is invalid ort.webgpu.mjs:2562:3419
An uncaught WebGPU validation error was raised: In a set_pipeline command, caused by: ComputePipelineId Id(6,1,mtl) is invalid ort.webgpu.mjs:2562:3419
An uncaught WebGPU validation error was raised: Command encoder is locked by a previously created render/compute pass. Before recording any new commands, the pass must be ended. ort.webgpu.mjs:2562:3419
An uncaught WebGPU validation error was raised: Shader module creation failed: Parsing error ort.webgpu.mjs:2562:3419
Encountered one or more errors while creating shader module "Softmax"
An uncaught WebGPU validation error was raised: In a set_pipeline command, caused by: ComputePipelineId Id(21,1,mtl) is invalid ort.webgpu.mjs:2562:3419
Encountered one or more errors while creating shader module "Concat"
An uncaught WebGPU validation error was raised: In a set_pipeline command, caused by: ComputePipelineId Id(42,1,mtl) is invalid ort.webgpu.mjs:2562:3419
An uncaught WebGPU validation error was raised: Command encoder is invalid ort.webgpu.mjs:2562:3419
Encountered one or more errors while creating shader module "Softmax"
An uncaught WebGPU validation error was raised: In a set_pipeline command, caused by: ComputePipelineId Id(44,1,mtl) is invalid ort.webgpu.mjs:2562:3419
Encountered one or more errors while creating shader module "Softmax"
An uncaught WebGPU validation error was raised: In a set_pipeline command, caused by: ComputePipelineId Id(47,1,mtl) is invalid ort.webgpu.mjs:2562:3419
[ERROR wgpu_core::device::global] Device::create_shader_module error:
    Shader validation error:
       ┌─ Concat:44:3
       │
    44 │ ╭   fn calculateInputIndex(index: u32) -> u32 {
    45 │ │     let sizeInConcatAxis = array<u32, 2u>(uniforms.sizeInConcatAxis0,uniforms.sizeInConcatAxis1);
    46 │ │     for (var i: u32 = 0u; i < 2; i += 1u ) {
    47 │ │       if (index < sizeInConcatAxis[i]) {
       │ │                   ^^^^^^^^^^^^^^^^^^^ naga::Expression [14]
       · │
    50 │ │     }
    51 │ │     return 2u;
       │ ╰──────────────^ naga::Function [5]


[ERROR wgpu_core::device::global] Device::create_shader_module error:
    Shader 'Softmax' parsing error: expected assignment or increment/decrement, found 'wg'
       ┌─ wgsl:30:15
       │
    30 │         const wg = 64;
       │               ^^ expected assignment or increment/decrement


[ERROR wgpu_core::device::global] Device::create_shader_module error:
    Shader validation error:
       ┌─ Concat:49:3
       │
    49 │ ╭   fn calculateInputIndex(index: u32) -> u32 {
    50 │ │     let sizeInConcatAxis = array<u32, 2u>(uniforms.sizeInConcatAxis0,uniforms.sizeInConcatAxis1);
    51 │ │     for (var i: u32 = 0u; i < 2; i += 1u ) {
    52 │ │       if (index < sizeInConcatAxis[i]) {
       │ │                   ^^^^^^^^^^^^^^^^^^^ naga::Expression [14]
       · │
    55 │ │     }
    56 │ │     return 2u;
       │ ╰──────────────^ naga::Function [5]


[ERROR wgpu_core::device::global] Device::create_shader_module error:
    Shader 'Softmax' parsing error: expected assignment or increment/decrement, found 'wg'
       ┌─ wgsl:30:15
       │
    30 │         const wg = 64;
       │               ^^ expected assignment or increment/decrement


[ERROR wgpu_core::device::global] Device::create_shader_module error:
    Shader 'Softmax' parsing error: expected assignment or increment/decrement, found 'wg'
       ┌─ wgsl:30:15
       │
    30 │         const wg = 64;
       │               ^^ expected assignment or increment/decrement

Summary: webgpus errors for the NER model → webgpus errors on onnx inference

With GPU + fp32:

Error: Failed to load resource://webcompat/AboutCompat.sys.mjs AboutPagesUtils.sys.mjs:19:26
2024-09-02 16:10:56.475338 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-09-02 16:10:56.476891 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
An uncaught WebGPU validation error was raised: Shader module creation failed: Parsing error backend-webgpu.ts:271:16
Encountered one or more errors while creating shader module "Softmax"
WebGPU compilation info for shader module "Softmax" (1 error(s), 0 warning(s), 0 info)
An uncaught WebGPU validation error was raised: Error matching shader requirements against the pipeline, caused by: Shader module is invalid backend-webgpu.ts:271:16
An uncaught WebGPU validation error was raised: Pipeline is invalid backend-webgpu.ts:271:16
An uncaught WebGPU validation error was raised: Bind group layout is invalid backend-webgpu.ts:271:16
An uncaught WebGPU validation error was raised: In a set_pipeline command, caused by: ComputePipelineId Id(19,1,mtl) is invalid backend-webgpu.ts:271:16
An uncaught WebGPU validation error was raised: Command encoder is locked by a previously created render/compute pass. Before recording any new commands, the pass must be ended. backend-webgpu.ts:271:16

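Based on the errors in bug 1915572, comment 2, the shader compilation errors appear to be due to bug 1878320 and bug 1913424. I'm pretty sure the remaining errors about invalid objects (pipeline, bind group layout, etc.) are fallout from the invalid shader module: in WebGPU, an object created from an invalid object is itself invalid, so a single failed shader module produces a whole cascade of validation errors.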
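To make that triage repeatable on long console dumps like the ones above, here is a minimal, hypothetical sketch: it treats "Shader module creation failed" messages as root causes, collects the names of the shader modules that failed, and counts every other uncaught validation error as fallout. The function name and the matched substrings are assumptions taken from the messages quoted in this bug; this is not an existing Firefox or ORT tool.

```typescript
// Hypothetical triage helper for WebGPU console output (not an existing tool).
// Assumes, as in the comment above, that every uncaught validation error other
// than a shader-module creation failure is downstream fallout.

interface Triage {
  failedShaders: string[]; // names from 'creating shader module "..."' lines
  rootCauses: number;      // "Shader module creation failed" errors
  fallout: number;         // all other uncaught validation errors
}

const VALIDATION = "An uncaught WebGPU validation error was raised";
const ROOT_CAUSE = "Shader module creation failed";
const SHADER_NAME = /creating shader module "([^"]+)"/;

function triageWebGpuErrors(lines: string[]): Triage {
  const failed = new Set<string>();
  let rootCauses = 0;
  let fallout = 0;
  for (const line of lines) {
    const name = SHADER_NAME.exec(line)?.[1];
    if (name !== undefined) {
      failed.add(name); // remember which generated shader broke
    } else if (line.includes(VALIDATION)) {
      if (line.includes(ROOT_CAUSE)) {
        rootCauses++;
      } else {
        fallout++; // invalid pipeline, bind group layout, encoder, ...
      }
    }
    // Anything else (e.g. ORT execution-provider warnings) is ignored.
  }
  return { failedShaders: Array.from(failed), rootCauses, fallout };
}
```

Fed the NER log above, this should report one root cause in the "Softmax" module and five fallout errors, which matches the reading that only the shader compilation failure needs fixing.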

Severity: -- → S3
Depends on: 1878320, 1913424

Thanks! I have a patch for about:inference (bug 1913071) that will make this easier to reproduce. Once it lands, we can run any model at any quantization level on the GPU.

Depends on: 1913071
Whiteboard: [genai]
Summary: webgpus errors on onnx inference → [meta] webgpus errors on onnx inference
Shader 'Softmax' parsing error: expected assignment or increment/decrement, found 'wg'
   ┌─ wgsl:30:15
   │
30 │         const wg = 64;
   │               ^^ expected assignment or increment/decrement

I believe this problem, at least, has been fixed upstream by this PR: https://github.com/gfx-rs/wgpu/pull/6156

Firefox's next wgpu update should bring this into Mozilla Central, and hence Nightly.
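The WGSL specification allows `const` declarations inside function bodies, so `const wg = 64;` in the Softmax shader is valid WGSL; the parse error suggests that the naga version in Firefox at the time did not yet accept function-scope `const`. The hypothetical TypeScript sketch below shows the two placements a shader generator could choose between. The function name, entry point, and body are invented for illustration and are not ORT's Softmax generator; treating module scope as a workaround assumes the older naga accepted module-scope `const`, which has not been verified against that exact version here.

```typescript
// Hypothetical WGSL emitter (not ORT code) showing where a workgroup-size
// constant can be declared. "function" mirrors the failing Softmax shader;
// "module" hoists the declaration out of the function body.
type ConstPlacement = "function" | "module";

function emitShader(workgroupSize: number, placement: ConstPlacement): string {
  const decl = `const wg = ${workgroupSize};`;
  const moduleScope = placement === "module" ? `${decl}\n` : "";
  const functionScope = placement === "function" ? `  ${decl}\n` : "";
  return (
    moduleScope +
    `@compute @workgroup_size(${workgroupSize})\n` +
    `fn main(@builtin(local_invocation_id) lid: vec3<u32>) {\n` +
    functionScope +
    // Placeholder use of the constant; a real kernel would do the reduction here.
    `  let lane = lid.x % u32(wg);\n` +
    `}\n`
  );
}
```

Both outputs are valid WGSL; the difference only matters to a parser that, like the one in this bug, rejects `const` as a statement inside a function.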

The fix upstream in #6156 should be brought into Mozilla Central by bug 1917102.

See Also: → 1917102

I believe the error below will be addressed by wgpu#6188, which is under review upstream. It will not be included in bug 1917102.

       ┌─ Concat:44:3
       │
    44 │ ╭   fn calculateInputIndex(index: u32) -> u32 {
    45 │ │     let sizeInConcatAxis = array<u32, 2u>(uniforms.sizeInConcatAxis0,uniforms.sizeInConcatAxis1);
    46 │ │     for (var i: u32 = 0u; i < 2; i += 1u ) {
    47 │ │       if (index < sizeInConcatAxis[i]) {
       │ │                   ^^^^^^^^^^^^^^^^^^^ naga::Expression [14]
       · │
    50 │ │     }
    51 │ │     return 2u;
       │ ╰──────────────^ naga::Function [5]

FWIW, I think this is the TypeScript code generating the above WGSL.
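As a hypothetical illustration only (not the ORT source), a generator that produces the Concat helper above for N inputs might look like the sketch below. The `return i;` body is an assumption, since those lines are elided in the log. The flagged expression is a dynamic index, `sizeInConcatAxis[i]`, into an array bound with `let`, i.e. held by value rather than in memory. The assumption here, not confirmed in this bug, is that naga at the time only allowed constant indices into by-value arrays, so the optional `var` binding shows one way a generator could sidestep the error until the upstream fix lands.

```typescript
// Hypothetical generator for the Concat helper quoted above (not ORT's code).
// The `return i;` body is assumed; those lines are elided in the log.
function emitCalculateInputIndex(
  numInputs: number,
  binding: "let" | "var" = "let",
): string {
  const sizes = Array.from(
    { length: numInputs },
    (_, k) => `uniforms.sizeInConcatAxis${k}`,
  ).join(",");
  return [
    `  fn calculateInputIndex(index: u32) -> u32 {`,
    // With "let" the array is a value; with "var" it lives in function-scope
    // memory, so sizeInConcatAxis[i] indexes through a memory reference.
    `    ${binding} sizeInConcatAxis = array<u32, ${numInputs}u>(${sizes});`,
    `    for (var i: u32 = 0u; i < ${numInputs}; i += 1u ) {`,
    `      if (index < sizeInConcatAxis[i]) {`,
    `        return i;`,
    `      }`,
    `    }`,
    `    return ${numInputs}u;`,
    `  }`,
  ].join("\n");
}
```

With two inputs and the default `let`, the declaration, loop header, and final `return 2u;` lines match the WGSL in the error output.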

(In reply to Jim Blandy :jimb from comment #8)

The fix upstream in #6156 should be brought into Mozilla Central by bug 1917102.

Tested, and it works like a charm now! Thanks.

I will update the bug with the latest status, but most things (besides fp16 support) seem to be working very well now on the latest mozilla-central.

Blocks: 1913071
No longer depends on: 1913071

Now focusing on:

https://huggingface.co/spaces/Xenova/webgpu-embedding-benchmark

WebGPU fp32 works on Xenova/all-MiniLM-L6-v2, but I get this warning:

2024-10-09 12:44:58.563000 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.

I don't think it's an issue, though: I get the same performance as with Chrome.

Thanks to everyone involved for the help. Closing this; I will open specific bugs when we find new issues.

Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED