Open Bug 1677963 (linux-arm64-ci) Opened 4 years ago Updated 1 month ago

add regular Linux ARM64 in the CI

Categories

(Release Engineering :: General, task, P3)

Tracking

(Not tracked)

People

(Reporter: mtabara, Unassigned)

References

(Blocks 8 open bugs)

Details

(Whiteboard: [sp3])

Now that we have this working for ARM64, we need regular CI builds of Linux ARM64 plus test suites. It is unclear to me whether we need to make adjustments on the Taskcluster side for this.

This bug is tracking this work.

Blocks: 1646462

Hi Mihai, do you happen to know if anybody is actively working on this or if there are any plans? There appears to be an increasing demand for official ARM64 builds, especially flatpak, and I'd also like to help to push things forward.

Flags: needinfo?(mtabara)

Arm64 builds for Linux have been available on CI since bug 1532952. They're not tested, and not shippable builds, but they exist.

Thanks for that hint!

Depends on: 1532952
Flags: needinfo?(mtabara)

What can I do to help? The lack of a Flatpak (see bug 1646462) for aarch64 is really saddening me, and the default browser shipped by Raspbian (64-bit) is either Chrome or an old ESR.

Do we know who would make a decision on us shipping a new build?

With Raspberry Pi offering official 64-bit O/S builds and other distros supporting arm64 architecture, it would be great to make aarch64 builds of Firefox available on Flatpak.

Is there any plan to move this issue forward and unblock the Flatpak issue?

So, uh, I didn't know this existed, so I opened a discussion about this on Mozilla Connect some time ago. If I overlooked the place where I can download the arm64 binaries, feel free to point me in the right direction.

To provide some additional motivation for this FR - Apple is no longer shipping Intel machines, they are only shipping Arm-based machines. While there's a universal build for M1 desktop users, there's still no way for M1 Mac users to run Firefox inside of Docker for Mac, since it runs a Linux arm64 virtual machine. That means users of test automation tools such as Cypress cannot locally use Docker to run integration tests against Firefox on M1, which is a fairly popular use case.

Lack of aarch64 nightly builds and nightly ASAN builds is inhibiting fixing Firefox on Qualcomm Chromebooks such as https://acerrecertified.com/acer-spin-513-13-3-chromebook-qualcomm-7c-2-1ghz-4gb-ram-64gb-flash-chromeos-cp513-1h-s60f/. For example https://bugzilla.mozilla.org/show_bug.cgi?id=1783053 on the Firefox side and memory corruption issues in the combination of Firefox and Mesa https://gitlab.freedesktop.org/mesa/mesa/-/issues/7004.

Blocks: raspi

(In reply to Mike Kaply [:mkaply] from comment #5)

Do we know who would make a decision on us shipping a new build?

Mike, happy to chat more about this. I am also quite interested in this.

Depends on: 1784493
Depends on: 1787082
Alias: linux-arm64-ci
Depends on: 1795014
Depends on: 1612995
No longer depends on: 1612995
No longer depends on: 1795014
See Also: → 1795014

Hi,
any update on this? I was able to install Firefox via snap and LibreWolf via flatpak on an arm64 Ubuntu system. Unfortunately there are no builds for Firefox dev or nightly or torbrowser.
Thank you

Bug 1784493 probably should be fixed first.

No update to share yet. We are still considering doing it at some point.

Duplicate of this bug: 1675561
Severity: -- → S4
Priority: -- → P3
QA Contact: mtabara → jlorenzo

Any update? I am really looking forward to this. Graviton instances are a lot cheaper and more common. And I would love to install it on a Raspberry Pi as soon as it is released and not wait for the distro.

Sorry for the noise, just wanted to ask for a quick update: with the issue from Comment 16 resolved, is there anything left to do here? :)

Flags: needinfo?(pmoore)

Comment 16 doesn't address firefox CI.

Flags: needinfo?(pmoore)

Hey Julien,
do you know what the next steps are to progress this bug? Does a request for a new worker pool need to be raised with RelOps? Will taskgraph need to be updated, and who would do that?

Flags: needinfo?(jcristau)

This was raised with RelOps in https://mozilla-hub.atlassian.net/browse/RELOPS-265. I don't know that taskgraph would need any change.

Flags: needinfo?(jcristau)

I guess one question is docker-worker vs generic-worker, and how to set up new linux workers sanely.
There's some recent prior art for g-w with the wayland image, but AIUI that image creation is not yet automated, and relies on virtualbox which AFAIK doesn't exist for arm64.

(In reply to Julien Cristau [:jcristau] from comment #21)

I guess one question is docker-worker vs generic-worker, and how to set up new linux workers sanely.

I think Generic Worker is the preferred option here. We don't intend to continue supporting Docker Worker, and I'm not sure if it would be straightforward to deploy it in arm64 environments.

I guess one question is docker-worker vs generic-worker, and how to set up new linux workers sanely.
There's some recent prior art for g-w with the wayland image, but AIUI that image creation is not yet automated, and relies on virtualbox which AFAIK doesn't exist for arm64.

This is a good point. The prior art indeed relates to Generic Worker running under Linux, specifically in GCP, when Wayland is enabled, and X11 over Wayland is not sufficient. There is an open support request with Google Cloud ("Google Cloud Support 46049484: Unable to autologin to Gnome Desktop Manager 3"). Google in turn have opened a support case with Canonical about this issue, since it is a device driver issue for a driver which Google does not maintain. We hope when this is resolved, the prior art will no longer be needed.

One of the following options may help here:

  1. running in a different cloud, e.g. in Azure or AWS, or on hardware workers, or
  2. using X11 over Wayland with e.g. a dummy video device driver
  3. waiting for resolution of Google Cloud Support ticket 46049484

I'm not too sure what the requirements are, so not sure which of the above options might be most suitable. Who can help decide that?

It is long overdue, but I would like something which scales and that we don't have to patch every other month (I am afraid that 2. will be this way).
Let's try 3). We have contacts on both sides. Let's discuss this in private?

Blocks: 1855463
See Also: → 1866025

(In reply to Sylvestre Ledru [:Sylvestre] from comment #23)

It is long overdue, but I would like something which scales and that we don't have to patch every other month (I am afraid that 2. will be this way).
Let's try 3). We have contacts on both sides. Let's discuss this in private?

The resolution of Google Cloud Support ticket 46049484 is still in progress. In addition, two launchpad bugs are in progress:

I tested the latest linux generic kernel today (6.2.0-37-generic #38~22.04.1-Ubuntu from ubuntu package linux-generic-hwe-22.04). The issue with device "VGA compatible controller: Google, Inc. Device a002 (rev 01)" not binding to a kernel driver persists. However, a 24 bit efifb framebuffer at /dev/fb0 is now successfully initialised.

The Google Cloud support ticket updates can be seen by the privileged at https://console.cloud.google.com/support/cases/detail/v2/46049484?project=90111867433

Please see https://github.com/taskcluster/taskcluster/issues/6412#issuecomment-1825683081 for a detailed update on the Google Cloud support ticket.

Note, these issues are currently blocking Wayland on x86_64 - perhaps they might not present under aarch64.

@aerickson, have we tested to see if these issues are also present on arm64 workers in gcp?

Flags: needinfo?(aerickson)
Whiteboard: [sp3]
Blocks: 1784493
No longer depends on: 1784493

(In reply to Pete Moore [:pmoore][:pete] from comment #25)

@aerickson, have we tested to see if these issues are also present on arm64 workers in gcp?

I haven't tested with arm64 yet.

Flags: needinfo?(aerickson)

For clarity: arm64/linux workers are already possible, and exist on the community deployment. An existing arm64/linux worker pool is https://community-tc.services.mozilla.com/worker-manager/proj-taskcluster%2Fgw-ubuntu-22-04-arm64

It is currently unknown (comment 25, comment 26) whether Wayland/gnome-shell/mutter compositor plays nicely, but it also isn't clear if that is a requirement or not. If it isn't a requirement (and e.g. X11 is ok, or XWayland) then there should be nothing blocking this issue. If gnome-shell/mutter is required, then the next step is to test this setup. Perhaps it works. It doesn't work on x86_64, and that is currently being addressed in https://github.com/taskcluster/taskcluster/issues/6412.

@rmader Do you have insight into whether Wayland/gnome-shell/mutter support is also a requirement for the arm64/Linux workers?

Flags: needinfo?(robert.mader)

(In reply to Pete Moore [:pmoore][:pete] from comment #27)

@rmader Do you have insight into whether Wayland/gnome-shell/mutter support is also a requirement for the arm64/Linux workers?

IIUC gnome-shell/mutter support is about running tests, correct?

While running tests would of course be desirable, I don't think it should block aarch64 builds - especially now that Wayland is at least tested on x86_64. In fact I think it's much better to have a nightly population on linux/aarch64 that runs untested builds (and thus might run into issues a bit more often) than having neither tests nor nightly users. Because then people only find out once distros ship the next release, as happened in bug 1866025.

But that's just my opinion :)

Flags: needinfo?(robert.mader)

While running tests would of course be desirable, I don't think it should block aarch64 builds - especially now that Wayland is at least tested on x86_64. In fact I think it's much better to have a nightly population on linux/aarch64 that runs untested builds (and thus might run into issues a bit more often) than having neither tests nor nightly users.

Agreed, I have been working on a plan to get that.

  • Linux arm64 cross built with pgo/lto
  • No test until the issues mentioned above are fixed
  • Nightly only

It sounds like Wayland isn't required. Andrew, are you happy to pick this up from here?

Should this bug be moved to RelOps? Or do you want a separate bug/issue?

Flags: needinfo?(aerickson)

(In reply to Pete Moore [:pmoore][:pete] from comment #30)

It sounds like Wayland isn't required.

Just to be super clear (sorry for the noise): Wayland tests are not required, but the build should definitely include the Wayland backend (i.e. the default, both Wayland+X11).

Wayland is required if we want to ship in release (which is what we want in 2024)

(In reply to Sylvestre Ledru [:Sylvestre] from comment #32)

Wayland is required if we want to ship in release (which is what we want in 2024)

It's already shipping in 121, see bug 1752398.

Note that this was done in part because various distro builds have been enabling the Wayland backend for years already (notably Fedora). That includes aarch64 builds - some mobile oriented distros don't even ship X11/Xwayland support any more because of its poor touch support.

It's already shipping in 121, see bug 1752398.

Yeah, I am the one who pushed for this :)

I was talking about Firefox on Wayland on arm64. I think we have issues with this that block us from running tests on this arch.

Oh err, right :) Thanks and sorry for the noise.

See Also: → 1867367

(In reply to Pete Moore [:pmoore][:pete] from comment #30)

It sounds like Wayland isn't required. Andrew, are you happy to pick this up from here?

I've lost track of what we want in this bug... do we want a headless arm64 worker or an X11/Xwayland arm64 worker?

Should this bug be moved to RelOps? Or do you want a separate bug/issue?

Let's keep this existing bug and I'll make blockers and link our Jiras (mentioned below).

We have an existing jira for a headless arm64 worker (https://mozilla-hub.atlassian.net/browse/RELOPS-686).

I will create a new bug if we need an X11/Xwayland arm64 worker.

I've got a Jira epic that tracks the rollout of all varieties of Ubuntu 22.04 workers (including these) at https://mozilla-hub.atlassian.net/browse/RELOPS-705.

Flags: needinfo?(aerickson)

(In reply to Andrew Erickson [:aerickson] from comment #36)

(In reply to Pete Moore [:pmoore][:pete] from comment #30)

It sounds like Wayland isn't required. Andrew, are you happy to pick this up from here?

I've lost track of what we want in this bug... do we want a headless arm64 worker or an X11/Xwayland arm64 worker?

AIUI, both. Firefox wants an X11 and/or Wayland arm64 test worker (this bug). NSS wants a headless arm64 builder (bug 1814051).

I've created an l1 wayland arm64 image (name below).

Suggested ci-config worker-images.yaml addition:

monopacker-ubuntu-2204-wayland-arm64:
  fxci-level1-gcp: projects/taskcluster-imaging/global/images/gw-fxci-gcp-l1-arm64-gui-googlecompute-2024-02-14t21-16-11z
See Also: → 1880549

I'd like to run the generate-profile/PGO task on an ARM64 worker. Can we have a level-3 image too?

I started on a patch to spin up some ARM64 workers, but these can only be used from try and other level-1 projects.

aiui we need level-3 workers to run tasks on mozilla-central and the other release branches.

I've created an l3 arm64 gui config and generated an image.

monopacker-ubuntu-2204-wayland-arm64:
  fxci-level1-gcp: projects/taskcluster-imaging/global/images/gw-fxci-gcp-l1-arm64-gui-googlecompute-2024-02-14t21-16-11z
  fxci-level3-gcp: projects/fxci-production-level3-workers/global/images/gw-fxci-gcp-l3-arm64-gui-googlecompute-2024-02-15t21-29-13z
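
For anyone reviewing entries like the two above, here is a minimal sketch of a sanity check that the level in each key agrees with the level embedded in the image name. The naming pattern is an assumption inferred only from the two image names in this bug, the trailing "z" is assumed to mean UTC, and `parse_image` and `check_entry` are hypothetical helpers, not part of ci-config.

```python
import re
from datetime import datetime, timezone

# Assumed pattern, inferred from the image names quoted in this bug only:
#   gw-fxci-gcp-l<level>-<arch>-gui-googlecompute-<YYYY-MM-DD>t<HH-MM-SS>z
IMAGE_RE = re.compile(
    r"gw-fxci-gcp-l(?P<level>\d)-(?P<arch>[a-z0-9]+)-gui-googlecompute-"
    r"(?P<stamp>\d{4}-\d{2}-\d{2}t\d{2}-\d{2}-\d{2})z$"
)


def parse_image(path):
    """Split an image path into (level, architecture, build time)."""
    match = IMAGE_RE.search(path)
    if match is None:
        raise ValueError(f"unrecognised image name: {path}")
    built = datetime.strptime(match["stamp"], "%Y-%m-%dt%H-%M-%S")
    # Assumption: the trailing "z" marks the timestamp as UTC.
    return int(match["level"]), match["arch"], built.replace(tzinfo=timezone.utc)


def check_entry(key, path):
    """Return True when a key such as fxci-level3-gcp matches the image level."""
    level, _arch, _built = parse_image(path)
    return key == f"fxci-level{level}-gcp"
```

Running `check_entry` over each key/value pair would flag, for example, an l1 image pasted under the level-3 key.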

Thanks Andrew!

I am trying to spin up some gecko-1-b-linux-gcp-aarch64 workers in Firefox-CI using the images.
I have a task pending for gecko-1/b-linux-gcp-aarch64 that is not getting picked up by the worker.
I couldn't find logs in papertrail.
Worker manager says the gecko-1/b-linux-gcp-aarch64 worker pool has running capacity and I can see the workers.
I was digging through the logs in gcp, and I don't see much. I compared the closest thing which is one of translation's generic workers.

I see this line in the non-functional aarch64 worker's logs:

startup-script: configPath: /etc/generic-worker/config

vs. this line in the functional translation's worker logs:

startup-script: configPath: /home/ubuntu/generic_worker/generic-worker.config

Is there something busted related to the worker process/daemon on the image? I am not sure why these workers are not picking up tasks.

Flags: needinfo?(aerickson)

(In reply to Gabriel Bustamante [:gabriel] from comment #42)

I am trying to spin up some gecko-1-b-linux-gcp-aarch64 workers in Firefox-CI using the images.
I have a task pending for gecko-1/b-linux-gcp-aarch64 that is not getting picked up by the worker.
I couldn't find logs in papertrail.

We can't send logs to papertrail for every worker due to the costs associated.

Worker manager says the gecko-1/b-linux-gcp-aarch64 worker pool has running capacity and I can see the workers.
I was digging through the logs in gcp, and I don't see much. I compared the closest thing which is one of translation's generic workers.

I see this line in the non-functional aarch64 worker's logs:

startup-script: configPath: /etc/generic-worker/config

vs. this line in the functional translation's worker logs:

startup-script: configPath: /home/ubuntu/generic_worker/generic-worker.config

That's due to the translations images predating the newer wayland images. They're just set up a bit differently.

Is there something busted related to the worker process/daemon on the image? I am not sure why these workers are not picking up tasks.

The arm64 images are very similar to the amd64 images that are working.

In this case, the location in ci-config for the CoT key is incorrect and start-worker is complaining (I ssh'ed to the host and checked it out).

Please use the generic-worker/worker-runner-linux-multi default block on line 95 in worker-pools.yml.

An example working pool is pool_id: 'gecko-t/t-linux-vm-2204-wayland'

Flags: needinfo?(aerickson)

I am having issues with the source checkout on the gecko-1 image.

https://firefox-ci-tc.services.mozilla.com/tasks/AU1li8ZeSrSHyKMH38jfyg/runs/0/logs/public/logs/live.log#L59

[vcs ...] hg: unknown command 'robustcheckout'

I think the image is missing a hgrc file to define robustcheckout? Like this file taskcluster/docker/recipes/hgrc (mounted on docker-workers)
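
A rough way to check this on an image is to look for a robustcheckout entry under [extensions] in whatever hgrc the worker uses. The sketch below is an approximation under stated assumptions: it reads the hgrc text with Python's configparser, which handles simple files but not every Mercurial feature (a top-level %include line, for instance, would fail to parse), and the path in SAMPLE is a placeholder, not the real install location. `has_extension` is a hypothetical helper.

```python
import configparser


def has_extension(hgrc_text, name):
    """Report whether an hgrc snippet enables `name` under [extensions].

    Mercurial treats a value starting with "!" as "extension disabled".
    """
    parser = configparser.ConfigParser(
        allow_no_value=True,   # tolerate bare keys
        interpolation=None,    # do not treat "%" in values specially
        strict=False,          # tolerate repeated sections or keys
    )
    parser.read_string(hgrc_text)
    if not parser.has_option("extensions", name):
        return False
    value = parser.get("extensions", name) or ""
    return not value.startswith("!")


# Placeholder hgrc contents; the robustcheckout path here is illustrative only.
SAMPLE = """
[extensions]
share =
robustcheckout = /path/to/robustcheckout.py
"""

print(has_extension(SAMPLE, "robustcheckout"))
```

On a live worker, `hg config extensions.robustcheckout` gives the same answer from Mercurial itself, which is the more reliable check.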

Robust checkout added. New images are:

gw-fxci-gcp-l1-arm64-gui-googlecompute-2024-03-01t17-45-17z
gw-fxci-gcp-l3-arm64-gui-googlecompute-2024-03-01t18-01-03z

See Also: → 1885669