Closed Bug 1636992 Opened 4 years ago Closed 1 year ago

WebGPU Driver Performance Issue on Machine Learning Compute Shader

Categories

(Core :: Graphics: WebGPU, enhancement, P3)

RESOLVED DUPLICATE of bug 1841346

People

(Reporter: tqchen, Unassigned)

User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.92 Safari/537.36

Steps to reproduce:

We recently created a project to compile machine learning models to WebGPU. Note that Firefox lacks the createFence feature, which we still need in order to get accurate timing for compute. Because we were curious to get Firefox working, we created a temporary workaround that skips the fence sync for compute.

This gives us inaccurate timing for the compute measurement, but the timing for the end-to-end pipeline (sync after copy) is still accurate.
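To illustrate the end-to-end measurement described above, here is a minimal sketch in TypeScript. It is not the code used by the example page: `timeEndToEnd`, `runAndReadBack`, and the injectable `now` clock are hypothetical names introduced for illustration. The assumption is that `runAndReadBack` submits the compute work, copies the result into a mappable buffer, and resolves only once that buffer has been mapped, so the elapsed wall-clock time covers the whole pipeline even without a fence.

```typescript
// Hypothetical helper; the names below are assumptions, not part of the bug report.
type Clock = () => number;

// Returns the average wall-clock time (in the clock's units) of one
// end-to-end run. `runAndReadBack` must resolve only after the result
// has been copied back and is readable on the CPU.
async function timeEndToEnd(
  runAndReadBack: () => Promise<void>,
  iterations: number,
  now: Clock = () => Date.now(),
): Promise<number> {
  if (iterations <= 0) {
    throw new Error("iterations must be positive");
  }
  // One warm-up run so one-time setup cost is excluded from the average.
  await runAndReadBack();
  const start = now();
  for (let i = 0; i < iterations; i++) {
    await runAndReadBack();
  }
  return (now() - start) / iterations;
}
```

In a browser, passing `() => performance.now()` as the clock would likely give finer resolution than the `Date.now()` default.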

https://tqchen.com/tvm-webgpu-example/

Actual results:

What we find is that Firefox's WebGPU implementation seems to have a performance issue. In particular:

  • On a 13-inch MacBook, Chrome gives an end-to-end runtime overhead of about 20ms - 30ms, while Firefox takes 40ms or more (the runtime overhead of Chrome is close to native).
  • I also tried a Linux machine with a GTX Titan X, where the native end-to-end overhead should be about 1ms-2ms; Firefox still takes about 40ms to complete end to end.

This evidence seems to indicate a performance issue in Firefox's WebGPU driver. It would be great to reduce it.

Of course, there are other details that might be related to the issue, on which we would love feedback:

  • Right now we call submit for each compute shader as soon as we have it, to send it to the queue. We understand that on certain platforms we may want to dispatch these shaders lazily.
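One way to picture the lazy alternative is the sketch below, written in TypeScript under stated assumptions. `BatchedSubmitter`, `QueueLike`, and `maxPending` are hypothetical names, not part of WebGPU or of the project in this report; the only real API relied on is that a WebGPU queue's `submit` accepts an array of command buffers. Whether batching actually reduces the overhead reported here is untested.

```typescript
// Minimal structural stand-in for the part of a WebGPU queue used here.
// A real GPUQueue would satisfy this shape with C = GPUCommandBuffer.
interface QueueLike<C> {
  submit(commandBuffers: C[]): void;
}

// Hypothetical helper: collects command buffers and submits them in one
// call instead of calling submit once per compute shader.
class BatchedSubmitter<C> {
  private queue: QueueLike<C>;
  private maxPending: number;
  private pending: C[] = [];

  constructor(queue: QueueLike<C>, maxPending: number = 16) {
    this.queue = queue;
    this.maxPending = maxPending;
  }

  // Defer submission until maxPending buffers have accumulated.
  enqueue(commandBuffer: C): void {
    this.pending.push(commandBuffer);
    if (this.pending.length >= this.maxPending) {
      this.flush();
    }
  }

  // Submit everything collected so far; call this before any sync or readback.
  flush(): void {
    if (this.pending.length === 0) {
      return;
    }
    this.queue.submit(this.pending);
    this.pending = [];
  }
}
```

The caller has to remember to `flush()` before waiting on results, otherwise the last few dispatches never reach the queue.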

I haven't profiled the test case yet. Our priority is to get the API updated to latest.
However, I know one thing that could have caused this. Our WebGPU implementation chooses CPU-shared memory for createBufferMapped buffers, so using them extensively on the GPU incurs a performance hit.
This should be fixed once https://phabricator.services.mozilla.com/D92636 lands.

Severity: normal → S3

:tqchen: Is this still an issue for you?

Blocks: webgpu-v1
Priority: -- → P3

I'd just like to update this a bit more. I am not sure what the state of the original issue is.

But recently we have had a good experience with Chrome WebGPU and built some fun ML applications.

Please check out https://mlc.ai/web-llm/

I think it could be a great stress test case for Firefox as well.

ATM, I believe we expect reasonably good performance from WebGPU. I've filed bug 1841346 as follow-up for testing against the link you suggested.

Status: UNCONFIRMED → RESOLVED
Closed: 1 year ago
Duplicate of bug: webgpu-webllm
Resolution: --- → DUPLICATE