Closed Bug 1837027 Opened 1 year ago Closed 1 year ago

Need Mesa Lavapipe 22.1.2 or later to run WebGPU CTS on Linux

Categories

(Testing :: General, task, P2)

Default
Desktop
Linux
task

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jimb, Unassigned)

References

Details

We need to run the WebGPU conformance test suite in CI on Linux for Mozilla Central. This requires a conformant Vulkan implementation. Linux Mesa lavapipe 22.1.2 was certified as a conformant implementation in 2022-07, so that should be good enough. Lavapipe is a CPU-based renderer, so it does not require a machine with a GPU. It seems that Ubuntu 22.04 LTS only has Mesa 22.0.1, which is not good enough, but Jammy Updates has Mesa 22.2.5.
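The version requirement above can be sketched as a small check. This is only an illustration, not part of any Mozilla tooling: the helper names `upstream_version` and `is_new_enough` are hypothetical, and it assumes the Mesa version starts with a plain `X.Y.Z` triple, optionally preceded by a Debian epoch.

```python
import re

# Minimum lavapipe version named in this bug.
MIN_MESA = (22, 1, 2)


def upstream_version(version):
    """Return the leading X.Y.Z of a version string as a tuple of ints.

    Accepts plain versions ("22.2.5") and Debian-style package versions
    ("22.2.5-0ubuntu0.1~22.04.1"); an optional epoch ("1:") is skipped.
    """
    match = re.match(r"(?:\d+:)?(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_new_enough(version, minimum=MIN_MESA):
    """True if the upstream version is at least `minimum`."""
    return upstream_version(version) >= minimum


# The two versions mentioned in the comment above.
for candidate in ("22.0.1", "22.2.5"):
    print(candidate, is_new_enough(candidate))
```

Tuple comparison is lexicographic, so 22.0.1 falls below the 22.1.2 minimum while 22.2.5 clears it.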

Currently we are building an Ubuntu 22.04 VM running Wayland (not X11). Will WebGPU/Mesa work on that, or does it need X11?

From a bit of searching, it seems not to matter whether Ubuntu is running X11 or Wayland.

:jimb, can you confirm?

Flags: needinfo?(jimb)

:jmaher: Answering for Jim here, we believe that this should work. The best way to figure this out is going to be with testing, though. Is there a way to access an interactive environment with the current WIP of the Ubuntu 22.04 environment?

Flags: needinfo?(jimb) → needinfo?(jmaher)

I am looking into options with the current Ubuntu 22.04 Wayland environment. I just got back from a long PTO, so I am not sure who else is out. I expect to have an answer later this week or first thing next week.

Depends on: 1841867

Mostly for :jimb: I found this upstream issue in wgpu noting that lavapipe seems to stall on certain hardware profiles: https://github.com/gfx-rs/wgpu/issues/1974. It's hard to tell if this might affect us until we know more about the hardware profile of CI runners, though.

The current Wayland 22.04 image has version 22.2.5-0ubuntu0.1~22.04.1 of mesa-vulkan-drivers. There is a minor update available (details in Bug 1841867).
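For anyone wanting to confirm the installed version on a worker, a minimal sketch follows. It assumes a Debian/Ubuntu system where `dpkg-query` is available; the `installed_version` helper and its injectable `run` parameter are hypothetical, added only to make the sketch easy to exercise without a real package database.

```python
import subprocess


def installed_version(package, run=subprocess.run):
    """Return the installed version of a Debian package via dpkg-query.

    `run` defaults to subprocess.run and can be swapped for a stub.
    Raises subprocess.CalledProcessError if the package is not installed.
    """
    result = run(
        ["dpkg-query", "-W", "-f=${Version}", package],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Only meaningful on a machine that has the package installed.
    print(installed_version("mesa-vulkan-drivers"))
```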

Have we tried running the tests on the workers that use the image (https://firefox-ci-tc.services.mozilla.com/worker-manager/gecko-t%2Ft-linux-vm-2204-wayland)?

Flags: needinfo?(jimb)
Flags: needinfo?(jmaher)

:aerickson: I can answer that. Short answer: nope! Do you have some instructions to help us test consuming this image? I'm unfamiliar with plumbing specific images to Taskcluster jobs, but I'd likely be the one to take it on. 😅

Flags: needinfo?(jimb)
Priority: -- → P2

./mach try fuzzy -q 'test linux webgpu' --worker-override="t-linux-large-gcp=gecko-t/t-linux-vm-2204-wayland"

Just tried before and after Try builds. Both are broken. Looks like the worker type is different (t-linux-large-gcp vs t-linux-xlarge-gcp) and the image didn't get picked up in these test runs, so I'll try pushing again with an additional --worker-override="t-linux-xlarge-gcp=gecko-t/t-linux-vm-2204-wayland".

EDIT: That didn't work either. 😩 Gonna reach out to Andrew and Joel directly, see if I can get some instruction on how to get this consumed properly for a test run.

The override isn't working. Is there going to be a new push with the new --worker-override... ?

:jmaher: I'm not sure who you're addressing (does support for --worker-override=… need to be added still?), but attempting to override xlarge workers on Linux didn't work either.

I think you want just t-linux-large and t-linux-xlarge like:

--worker-override="t-linux-xlarge=gecko-t/t-linux-vm-2204-wayland"

See https://mozilla-hub.atlassian.net/wiki/spaces/ROPS/pages/318996714/Using+mach+try+fuzzy+--worker-override - it's confusing. I couldn't find an alias t-linux-xlarge-gcp in taskcluster/ci/config.yml.
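A small sketch of assembling that command with both aliases overridden. It uses only the `./mach try fuzzy` flags already shown in this bug and assumes, as the comments above suggest, that `--worker-override` may be passed more than once; the `try_command` helper is hypothetical.

```python
import shlex

# Worker pool for the Ubuntu 22.04 Wayland image, as given in this bug.
WAYLAND_POOL = "gecko-t/t-linux-vm-2204-wayland"


def try_command(query, overrides):
    """Build the argv list for `./mach try fuzzy` with worker overrides.

    `overrides` maps a worker alias (e.g. "t-linux-xlarge") to a pool.
    """
    args = ["./mach", "try", "fuzzy", "-q", query]
    for alias, pool in overrides.items():
        args.append(f"--worker-override={alias}={pool}")
    return args


cmd = try_command(
    "test linux webgpu",
    {"t-linux-large": WAYLAND_POOL, "t-linux-xlarge": WAYLAND_POOL},
)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Building the argument list first and quoting only at the end avoids the nested-quote mistakes that are easy to make when editing the override string by hand.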

:aerickson: Aha, that seems to change something, but now all non-build jobs are ending in exception with no logs. 😩

Flags: needinfo?(aerickson)

The original workers are docker-workers (which use a different payload). The new workers are generic-workers, and the worker-override option doesn't seem to change the payload to match what the worker expects. I think some taskgraph hacking is required to get the jobs to run on the instances.

Flags: needinfo?(aerickson)

:aerickson and :jmaher have been extremely helpful in actually getting a working Try build that exercises this environment, and it seems that we are successfully running most WebGPU tests. I believe that the spirit of this ticket has been resolved, and that we just need to adjust test expectation metadata with the WIP patch stack I have against bug 1836805.

Thanks for your help, everyone! 👍🏻🙏🏻

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED