add regular Linux ARM64 in the CI
Categories
(Release Engineering :: General, task, P3)
Tracking
(Not tracked)
People
(Reporter: mtabara, Unassigned)
References
(Depends on 2 open bugs, Blocks 8 open bugs)
Details
(Whiteboard: [sp3])
Now that we have this working for ARM64, we need regular CI builds of Linux ARM64 plus test suites. It is unclear to me whether we need to make adjustments on the Taskcluster side for this.
This bug is tracking this work.
Comment 1•3 years ago
Hi Mihai, do you happen to know if anybody is actively working on this or if there are any plans? There appears to be an increasing demand for official ARM64 builds, especially flatpak, and I'd also like to help to push things forward.
Comment 2•3 years ago
Arm64 builds for Linux have been available on CI since bug 1532952. They're not tested, and not shippable builds, but they exist.
Comment 4•3 years ago
What can I do to help? The lack of Flatpak (see bug 1646462) for aarch64 is really saddening me, and the default shipped by Raspbian (64-bit) is either Chrome or an old ESR.
Comment 5•3 years ago
Do we know who would make a decision on us shipping a new build?
With Raspberry Pi offering official 64-bit O/S builds and other distros supporting arm64 architecture, it would be great to make aarch64 builds of Firefox available on Flatpak.
Is there any plan to move this issue forward and unblock the Flatpak issue?
I didn't know this bug existed, so I opened a discussion about this on Mozilla Connect some time ago. If I overlooked the place where I can download the arm64 binaries, feel free to point me in the right direction.
Comment 8•2 years ago
To provide some additional motivation for this FR - Apple is no longer shipping Intel machines, they are only shipping Arm-based machines. While there's a universal build for M1 desktop users, there's still no way for M1 Mac users to run Firefox inside of Docker for Mac, since it runs a Linux arm64 virtual machine. That means users of test automation tools such as Cypress cannot locally use Docker to run integration tests against Firefox on M1, which is a fairly popular use case.
Comment 9•2 years ago
Lack of aarch64 nightly builds and nightly ASAN builds is inhibiting fixing Firefox on Qualcomm Chromebooks such as https://acerrecertified.com/acer-spin-513-13-3-chromebook-qualcomm-7c-2-1ghz-4gb-ram-64gb-flash-chromeos-cp513-1h-s60f/. For example https://bugzilla.mozilla.org/show_bug.cgi?id=1783053 on the Firefox side and memory corruption issues in the combination of Firefox and Mesa https://gitlab.freedesktop.org/mesa/mesa/-/issues/7004.
Comment 10•2 years ago
(In reply to Mike Kaply [:mkaply] from comment #5)
Do we know who would make a decision on us shipping a new build?
Mike, happy to chat more about this. I am also quite interested in this.
Comment 11•2 years ago
Hi,
Any update on this? I was able to install Firefox via snap and LibreWolf via Flatpak on an arm64 Ubuntu system. Unfortunately, there are no builds for Firefox dev, Nightly, or Tor Browser.
Thank you
Comment 12•2 years ago
Bug 1784493 probably should be fixed first.
Comment 13•2 years ago
No update to share yet. We are still considering doing it at some point.
Comment 15•1 year ago
Any update? I am really looking forward to this. Graviton instances are a lot cheaper and more common. I would love to install it on a Raspberry Pi as soon as it is released and not wait for the distro.
Comment 16•1 year ago
Comment 17•1 year ago
Sorry for the noise, just wanted to ask for a quick update: with the issue from Comment 16 resolved, is there anything left to do here? :)
Comment 19•1 year ago
Hey Julien,
do you know what the next steps are to progress this bug? Does a request for a new worker pool need to be raised with RelOps? Will taskgraph need to be updated, and who would do that?
Comment 20•1 year ago
This was raised with RelOps in https://mozilla-hub.atlassian.net/browse/RELOPS-265. I don't know that taskgraph would need any change.
Comment 21•1 year ago
I guess one question is docker-worker vs generic-worker, and how to set up new linux workers sanely.
There's some recent prior art for g-w with the wayland image, but AIUI that image creation is not yet automated, and relies on virtualbox which AFAIK doesn't exist for arm64.
Comment 22•1 year ago
(In reply to Julien Cristau [:jcristau] from comment #21)
I guess one question is docker-worker vs generic-worker, and how to set up new linux workers sanely.
I think Generic Worker is the preferred option here. We don't intend to continue supporting Docker Worker, and I'm not sure if it would be straightforward to deploy it in arm64 environments.
There's some recent prior art for g-w with the wayland image, but AIUI that image creation is not yet automated, and relies on virtualbox which AFAIK doesn't exist for arm64.
This is a good point. The prior art indeed relates to Generic Worker running under Linux, specifically in GCP, when Wayland is enabled, and X11 over Wayland is not sufficient. There is an open support request with Google Cloud ("Google Cloud Support 46049484: Unable to autologin to Gnome Desktop Manager 3"). Google in turn have opened a support case with Canonical about this issue, since it is a device driver issue for a driver which Google does not maintain. We hope when this is resolved, the prior art will no longer be needed.
One of the following options may help here:
- running in a different cloud, e.g. in Azure or AWS, or on hardware workers, or
- using X11 over Wayland with e.g. a dummy video device driver
- waiting for resolution of Google Cloud Support ticket 46049484
I'm not too sure what the requirements are, so not sure which of the above options might be most suitable. Who can help decide that?
Comment 23•1 year ago
It is long overdue, but I would like something which scales and that we don't have to patch every other month (I am afraid that option 2 will be this way).
Let's try option 3. We have contacts on both sides. Let's discuss this in private?
Comment 24•1 year ago
(In reply to Sylvestre Ledru [:Sylvestre] from comment #23)
It is long overdue, but I would like something which scales and that we don't have to patch every other month (I am afraid that option 2 will be this way).
Let's try option 3. We have contacts on both sides. Let's discuss this in private?
The resolution of Google Cloud Support ticket 46049484 is still in progress. In addition, two launchpad bugs are in progress:
- https://bugs.launchpad.net/ubuntu/+source/linux-gcp/+bug/2036273
- https://bugs.launchpad.net/ubuntu/+source/linux-gcp/+bug/2039732
I tested the latest linux generic kernel today (6.2.0-37-generic #38~22.04.1-Ubuntu from ubuntu package linux-generic-hwe-22.04). The issue with device "VGA compatible controller: Google, Inc. Device a002 (rev 01)" not binding to a kernel driver persists. However, a 24 bit efifb framebuffer at /dev/fb0 is now successfully initialised.
The Google Cloud support ticket updates can be seen by the privileged at https://console.cloud.google.com/support/cases/detail/v2/46049484?project=90111867433
Comment 25•1 year ago
Please see https://github.com/taskcluster/taskcluster/issues/6412#issuecomment-1825683081 for a detailed update on the Google Cloud support ticket.
Note, these issues are currently blocking Wayland on x86_64 - perhaps they might not present under aarch64.
@aerickson, have we tested to see if these issues are also present on arm64 workers in gcp?
Comment 26•1 year ago
(In reply to Pete Moore [:pmoore][:pete] from comment #25)
@aerickson, have we tested to see if these issues are also present on arm64 workers in gcp?
I haven't tested with arm64 yet.
Comment 27•1 year ago
For clarity: arm64/linux workers are already possible, and exist on the community deployment. An existing arm64/linux worker pool is https://community-tc.services.mozilla.com/worker-manager/proj-taskcluster%2Fgw-ubuntu-22-04-arm64
It is currently unknown (comment 25, comment 26) whether Wayland/gnome-shell/mutter compositor plays nicely, but it also isn't clear if that is a requirement or not. If it isn't a requirement (and e.g. X11 is ok, or XWayland) then there should be nothing blocking this issue. If gnome-shell/mutter is required, then the next step is to test this setup. Perhaps it works. It doesn't work on x86_64, and that is currently being addressed in https://github.com/taskcluster/taskcluster/issues/6412.
@rmader Do you have insight into whether Wayland/gnome-shell/mutter support is also a requirement for the arm64/Linux workers?
Comment 28•1 year ago
(In reply to Pete Moore [:pmoore][:pete] from comment #27)
@rmader Do you have insight into whether Wayland/gnome-shell/mutter support is also a requirement for the arm64/Linux workers?
IIUC gnome-shell/mutter support is about running tests, correct?
While running tests would of course be desirable, I don't think it should block aarch64 builds - especially now that Wayland is at least tested on x86_64. In fact I think it's much better to have a nightly population on linux/aarch64 that runs untested builds (and thus might run into issues a bit more often) than having neither tests nor nightly users. Because then people only find out once distros ship the next release, as happened in bug 1866025.
But that's just my opinion :)
Comment 29•1 year ago
While running tests would of course be desirable, I don't think it should block aarch64 builds - especially now that Wayland is at least tested on x86_64. In fact I think it's much better to have a nightly population on linux/aarch64 that runs untested builds (and thus might run into issues a bit more often) than having neither tests nor nightly users.
Agreed, I have been working on a plan to get that.
- Linux arm64 cross built with pgo/lto
- No tests until the issues mentioned above are fixed
- Nightly only
Comment 30•1 year ago
It sounds like Wayland isn't required. Andrew, are you happy to pick this up from here?
Should this bug be moved to RelOps? Or do you want a separate bug/issue?
Comment 31•1 year ago
(In reply to Pete Moore [:pmoore][:pete] from comment #30)
It sounds like Wayland isn't required.
Just to be super clear (sorry for the noise): Wayland tests are not required - but the build should definitely include the Wayland backend (i.e. the default, both Wayland+X11).
Comment 32•1 year ago
Wayland is required if we want to ship in release (which is what we want in 2024)
Comment 33•1 year ago
(In reply to Sylvestre Ledru [:Sylvestre] from comment #32)
Wayland is required if we want to ship in release (which is what we want in 2024)
It's already shipping in 121, see bug 1752398.
Note that this was done in part because various distro builds have been enabling the Wayland backend for years already (notably Fedora). That includes aarch64 builds - some mobile oriented distros don't even ship X11/Xwayland support any more because of its poor touch support.
Comment 34•1 year ago
It's already shipping in 121, see bug 1752398.
Yeah, I am the one who pushed for this :)
I was talking about Firefox on Wayland on arm64 - I think we have issues with this that block us from running tests on this arch.
Comment 35•1 year ago
Oh err, right :) Thanks and sorry for the noise.
Comment 36•1 year ago
(In reply to Pete Moore [:pmoore][:pete] from comment #30)
It sounds like Wayland isn't required. Andrew, are you happy to pick this up from here?
I've lost track of what we want in this bug... do we want a headless arm64 worker or an X11/Xwayland arm64 worker?
Should this bug be moved to RelOps? Or do you want a separate bug/issue?
Let's keep this existing bug and I'll make blockers and link our Jiras (mentioned below).
We have an existing Jira for a headless arm64 worker (https://mozilla-hub.atlassian.net/browse/RELOPS-686).
I will create a new bug if we need an X11/Xwayland arm64 worker.
I've got a Jira epic that tracks the rollout of all varieties of Ubuntu 22.04 workers (including these) at https://mozilla-hub.atlassian.net/browse/RELOPS-705.
Comment 37•1 year ago
(In reply to Andrew Erickson [:aerickson] from comment #36)
(In reply to Pete Moore [:pmoore][:pete] from comment #30)
It sounds like Wayland isn't required. Andrew, are you happy to pick this up from here?
I've lost track of what we want in this bug... do we want a headless arm64 worker or an X11/Xwayland arm64 worker?
AIUI, both. Firefox wants an X11 and/or Wayland arm64 test worker (this bug). NSS wants a headless arm64 builder (bug 1814051).
Comment 38•10 months ago
I've created an l1 wayland arm64 image (name below).
Suggested ci-config worker-images.yaml addition:
monopacker-ubuntu-2204-wayland-arm64:
  fxci-level1-gcp: projects/taskcluster-imaging/global/images/gw-fxci-gcp-l1-arm64-gui-googlecompute-2024-02-14t21-16-11z
Comment 39•10 months ago
I'd like to run the generate-profile/PGO task on an ARM64 worker. Can we have a level-3 image too?
AIUI, we need level-3 workers to run tasks on mozilla-central and the other release branches.
Comment 40•10 months ago
I've created an l3 arm64 gui config and generated an image.
monopacker-ubuntu-2204-wayland-arm64:
  fxci-level1-gcp: projects/taskcluster-imaging/global/images/gw-fxci-gcp-l1-arm64-gui-googlecompute-2024-02-14t21-16-11z
  fxci-level3-gcp: projects/fxci-production-level3-workers/global/images/gw-fxci-gcp-l3-arm64-gui-googlecompute-2024-02-15t21-29-13z
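The image paths above appear to follow a regular naming pattern (trust level, architecture, build timestamp). A minimal Python sketch for pulling those fields out, assuming that pattern holds and that the trailing `z` means UTC; `parse_image` and the regular expression are illustrative helpers, not part of ci-config or monopacker:

```python
import re
from datetime import datetime, timezone

# Assumed pattern, inferred only from the two image paths quoted in this bug.
IMAGE_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/global/images/"
    r"gw-fxci-gcp-l(?P<level>\d)-(?P<arch>[a-z0-9]+)-gui-googlecompute-"
    r"(?P<stamp>\d{4}-\d{2}-\d{2}t\d{2}-\d{2}-\d{2}z)$"
)


def parse_image(path):
    """Split a GCP image path into project, trust level, arch and build time."""
    match = IMAGE_RE.match(path)
    if match is None:
        raise ValueError(f"unrecognised image path: {path}")
    # The timestamp is assumed to be UTC because of the trailing "z".
    built = datetime.strptime(match["stamp"], "%Y-%m-%dt%H-%M-%Sz")
    return {
        "project": match["project"],
        "level": int(match["level"]),
        "arch": match["arch"],
        "built": built.replace(tzinfo=timezone.utc),
    }


info = parse_image(
    "projects/fxci-production-level3-workers/global/images/"
    "gw-fxci-gcp-l3-arm64-gui-googlecompute-2024-02-15t21-29-13z"
)
print(info["level"], info["arch"], info["built"].isoformat())
```

A check like this could catch a level-1 image being listed under a level-3 key, but the naming convention itself is an inference from this thread, so treat it as a sanity check only.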
Comment 41•10 months ago
Thanks Andrew!
Comment 42•9 months ago
I am trying to spin up some gecko-1-b-linux-gcp-aarch64 workers in Firefox-CI using the images.
I have a task pending for gecko-1/b-linux-gcp-aarch64 that is not getting picked up by the worker.
I couldn't find logs in papertrail.
Worker manager says the gecko-1/b-linux-gcp-aarch64 worker pool has running capacity and I can see the workers.
I was digging through the logs in gcp, and I don't see much. I compared the closest thing which is one of translation's generic workers.
I see this line in the non-functional aarch64 worker's logs:
startup-script: configPath: /etc/generic-worker/config
vs. this line in the functional translation's worker logs:
startup-script: configPath: /home/ubuntu/generic_worker/generic-worker.config
Is there something busted related to the worker process/daemon on the image? I am not sure why these workers are not picking up tasks.
Comment 43•9 months ago
(In reply to Gabriel Bustamante [:gabriel] from comment #42)
I am trying to spin up some gecko-1-b-linux-gcp-aarch64 workers in Firefox-CI using the images.
I have a task pending for gecko-1/b-linux-gcp-aarch64 that is not getting picked up by the worker.
I couldn't find logs in papertrail.
We can't send logs to papertrail for every worker due to the costs associated.
Worker manager says the gecko-1/b-linux-gcp-aarch64 worker pool has running capacity and I can see the workers.
I was digging through the logs in gcp, and I don't see much. I compared the closest thing which is one of translation's generic workers.
I see this line in the non-functional aarch64 worker's logs:
startup-script: configPath: /etc/generic-worker/config
vs. this line in the functional translation's worker logs:
startup-script: configPath: /home/ubuntu/generic_worker/generic-worker.config
That's due to the translations images predating the newer wayland images. They're just set up a bit differently.
Is there something busted related to the worker process/daemon on the image? I am not sure why these workers are not picking up tasks.
The arm64 images are very similar to the amd64 images that are working.
In this case, the location in ci-config for the CoT key is incorrect and start-worker is complaining (I ssh'ed to the host and checked it out).
Please use the generic-worker/worker-runner-linux-multi default block on line 95 in worker-pools.yml.
An example working pool is pool_id: 'gecko-t/t-linux-vm-2204-wayland'
Comment 44•9 months ago
Thanks Andrew, trying it out with generic-worker/worker-runner-linux-multi
Comment 45•9 months ago
I am having issues with the source checkout on the gecko-1 image.
[vcs ...] hg: unknown command 'robustcheckout'
I think the image is missing an hgrc file to define robustcheckout? Like this file: taskcluster/docker/recipes/hgrc (mounted on docker-workers).
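The "unknown command" error above is what Mercurial reports when the extension that provides a command is not enabled in any hgrc it reads. A minimal Python sketch of a pre-flight check, assuming the hgrc is plain INI text without `%include` directives (which `configparser` does not understand); `enables_extension` and the sample path are hypothetical and not taken from the image build:

```python
import configparser


def enables_extension(hgrc_text, name):
    """Return True if hgrc_text lists `name` under [extensions] and it is not disabled."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(hgrc_text)
    if not parser.has_section("extensions"):
        return False
    value = parser.get("extensions", name, fallback=None)
    # Mercurial treats a value starting with "!" as "explicitly disabled".
    return value is not None and not value.startswith("!")


# Hypothetical hgrc content; the real path on the worker image may differ.
sample = """
[extensions]
robustcheckout = /usr/local/mercurial/robustcheckout.py
"""
print(enables_extension(sample, "robustcheckout"))
```

Running a check like this during image creation might surface a missing extension before a task hits it, though the actual fix here was made in the image itself.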
Comment 46•9 months ago
Robust checkout added. New images are:
gw-fxci-gcp-l1-arm64-gui-googlecompute-2024-03-01t17-45-17z
gw-fxci-gcp-l3-arm64-gui-googlecompute-2024-03-01t18-01-03z
Comment 47•9 months ago
Thanks! Trying them out.
Comment 48•3 months ago
Is this still open?
I am asking because if you go to the verified Mozilla snap at https://snapcraft.io/firefox, then use the drop-down for versions, then show architecture "arm64", then you will see an option line for "esr/stable 115.15.0esr-1 3 September 2024".
Based on the April 2024 announcement about nightly arm64 builds (Firefox Nightly Now Available for Linux on ARM64), it seems like an ESR/stable option was to be enabled by work in the CI and release pipeline. So, is the availability of that option an indicator that that work in the CI and release pipeline is complete and therefore this item is no longer open?
Comment 49•3 months ago
The snap is built on Launchpad. This is open because we don't have regular arm64 deb/tarball releases from the CI yet, if I understand correctly.