aarch64 workers of the NSS continuous integration are failing (since August 2021)
Categories
(Infrastructure & Operations :: RelOps: General, defect, P2)
Tracking
(Not tracked)
People
(Reporter: beurdouche, Unassigned)
References
(Depends on 1 open bug, Blocks 1 open bug)
Details
The aarch64 workers of the NSS continuous integration stopped running altogether in August 2021:
https://treeherder.mozilla.org/jobs?repo=nss-try&selectedTaskRun=furyowVbTMCxpj10wP02qg.0
Comment 1•2 years ago
Hey Ben, do you know why these tasks were initially set up to run with a static worker pool? It looks like we run our Firefox Linux aarch64 tasks in GCP. Do you think transitioning these tasks over there would be feasible?
Generally, using a cloud-based pool will be a lot less painful than a statically managed one. But maybe there's a reason it was set up this way?
Comment 2•2 years ago
Is there a list of workers that are supposed to be reachable that are not?
Do we know if these are hardware?
Reporter
Comment 3•2 years ago
(In reply to Andrew Halberstadt [:ahal] from comment #1)
> Hey Ben, do you know why these tasks were initially set up to run with a static worker pool? It looks like we run our Firefox Linux aarch64 tasks in GCP. Do you think transitioning these tasks over there would be feasible?
> Generally, using a cloud-based pool will be a lot less painful than a statically managed one. But maybe there's a reason it was set up this way?
Hi Andrew, since some other workers already run in GCP, it would definitely be good to have the aarch64 workers set up the same way.
I don't think those were static, as they were based on Docker images: https://hg.mozilla.org/projects/nss/file/tip/automation/taskcluster/docker-aarch64
Reporter
Comment 4•2 years ago
(In reply to Michelle Goossens [:masterwayz] from comment #2)
> Is there a list of workers that are supposed to be reachable that are not?
> Do we know if these are hardware?
Unfortunately, I don't have much more information on this. I was pretty sure this was deployed on AWS hardware instances until I found https://hg.mozilla.org/projects/nss/file/tip/automation/taskcluster/graph/src/extend.js#l287, which seems to indicate these were run on dedicated machines like our macOS instances. I might be wrong, though.
Comment 5•2 years ago
I have spotted one EC2 instance in an AWS account that is labelled nss-static-aarch64, so I assume that is the instance. It seems to be a regular EC2 instance.
Comment 6•2 years ago
For posterity, I filed https://mozilla-hub.atlassian.net/browse/RELENG-1023.
Initially I thought we could use the same pool of workers that the Gecko Linux aarch64 tasks use, but it turns out that pool is actually x86_64 and the builds are cross-compiled.
This means we'd either need to block on https://mozilla-hub.atlassian.net/browse/RELOPS-265, or get the NSS builds to also cross-compile.
Reporter
Comment 7•1 year ago
Actually, that last RELOPS ticket seems to be resolved now... :)
Comment 8•1 year ago
Looks like RELOPS-265 was repurposed, and https://mozilla-hub.atlassian.net/browse/RELOPS-686 is required here instead.
Comment hidden (Intermittent Failures Robot)
Comment 10•10 months ago
Cross-posting from https://bugzilla.mozilla.org/show_bug.cgi?id=1677963.
Headless multiuser generic-worker (g-w) images are not currently possible, but here are arm64 images that should work.
monopacker-ubuntu-2204-wayland-arm64:
fxci-level1-gcp: projects/taskcluster-imaging/global/images/gw-fxci-gcp-l1-arm64-gui-googlecompute-2024-02-14t21-16-11z
fxci-level3-gcp: projects/fxci-production-level3-workers/global/images/gw-fxci-gcp-l3-arm64-gui-googlecompute-2024-02-15t21-29-13z