Closed Bug 1858740 Opened 2 years ago Closed 1 year ago

Look into adding Apple Silicon workers and using them for building custom-car

Categories

(Testing :: Performance, task, P3)

task

Tracking

(firefox131 fixed)

RESOLVED FIXED
131 Branch
Tracking Status
firefox131 --- fixed

People

(Reporter: kshampur, Assigned: kshampur)

References

(Blocks 1 open bug)

Details

(Whiteboard: [fxp] [relops-mac])

Attachments

(1 file)

while spec-wise the m2 workers are not as good as the linux/win workers (16gb ram, 4-8 cores if i recall?), the silicon chip should prove to be a decent speed boost for building CaR on osx (intel and arm)

we might be able to make an alias in here https://searchfox.org/mozilla-central/rev/e9b338c2d597067f99e96d5f20769f41f312fa8f/taskcluster/ci/config.yml#621

some more info here https://chromium.googlesource.com/chromium/src.git/+/main/docs/mac_arm64.md

similar to how intel mac can compile to arm64, we should be able to compile from arm64 to x64. (but if not, it should at least hopefully speed up arm64 mac builds, since they take about 11-12 hours on our intel machines)

Assignee: nobody → kshampur
Status: NEW → ASSIGNED

:aerickson/:masterwayz (not sure who is best to ask for this, please redirect as needed, thanks)

What would be the best way to make a m2 worker available to do builds, similar to the b-osx-1015 workers? Am I on the right track to make that change in this file https://searchfox.org/mozilla-central/rev/e9b338c2d597067f99e96d5f20769f41f312fa8f/taskcluster/ci/config.yml#621?
Does anything need to be changed on https://github.com/mozilla-platform-ops/ronin_puppet?

some context: currently (when passing) these mac x64 CaR builds take about 7-8 hours using the b-osx-1015 worker. In Bug 1846849 we want to also build for Mac Arm64, and I've seen that take about 11 hours on the intel mac minis

I am curious how much of a speed gain we'd have if we used the m2 mac minis to do the builds instead (is it possible without too much work?)

Flags: needinfo?(mgoossens)
Flags: needinfo?(aerickson)
Flags: needinfo?(aerickson)
Whiteboard: [fxp] → [fxp] [relops-mac]

I saw this error in the logs - features: Additional property chainOfTrust is not allowed.

I think the issue is that you're normally running on a level 3 builder with chain of trust secrets and these workers are just testers (level 1 - no signing secrets). If this won't be released as a product, I think you can build on L1 workers (if so, it would just be a tweak to the taskgraph for this job to remove the chainOfTrust property in the payload).
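The tweak described above amounts to dropping one key from the task payload. A minimal sketch, assuming the payload is a plain dictionary with a `features` mapping; the function name `strip_chain_of_trust` and the sample task are hypothetical and are not part of taskgraph:

```python
import copy


def strip_chain_of_trust(task):
    """Return a copy of `task` without payload.features.chainOfTrust."""
    cleaned = copy.deepcopy(task)  # leave the caller's task untouched
    features = cleaned.get("payload", {}).get("features", {})
    features.pop("chainOfTrust", None)  # no-op if the key is absent
    return cleaned


# Hypothetical task definition, for illustration only.
example_task = {
    "workerType": "t-osx-1400-m2",
    "payload": {"features": {"chainOfTrust": True, "someOtherFeature": True}},
}
cleaned_task = strip_chain_of_trust(example_task)
```

Any other features are left in place, so only the property the L1 worker rejects is removed.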

We usually have dedicated pools for building, so if this is going to be frequent we should get a dedicated pool and workers.

Flags: needinfo?(mgoossens)

thanks :aerickson! I added a temporary workaround for the chainoftrust thing and that seems to help

:masterwayz I am encountering this error in this Try

[task 2023-10-19T22:40:55.601Z] ________ running 'python3 src/tools/clang/scripts/update.py' in '/opt/worker/tasks/task_169775284558467/custom_car/chromium'
[task 2023-10-19T22:40:55.601Z] Downloading https://commondatastorage.googleapis.com/chromium-browser-clang/Mac/clang-llvmorg-18-init-8676-g11d07d9e-2.tar.xz 
[task 2023-10-19T22:40:55.601Z] <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1131)>
[task 2023-10-19T22:40:55.601Z] Retrying in 5 s ...
[task 2023-10-19T22:40:55.601Z] Downloading https://commondatastorage.googleapis.com/chromium-browser-clang/Mac/clang-llvmorg-18-init-8676-g11d07d9e-2.tar.xz 
[task 2023-10-19T22:40:55.601Z] <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1131)>
[task 2023-10-19T22:40:55.601Z] Retrying in 10 s ...
[task 2023-10-19T22:40:55.601Z] Downloading https://commondatastorage.googleapis.com/chromium-browser-clang/Mac/clang-llvmorg-18-init-8676-g11d07d9e-2.tar.xz 
[task 2023-10-19T22:40:55.601Z] <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1131)>
[task 2023-10-19T22:40:55.601Z] Retrying in 20 s ...
[task 2023-10-19T22:40:55.601Z] Downloading https://commondatastorage.googleapis.com/chromium-browser-clang/Mac/clang-llvmorg-18-init-8676-g11d07d9e-2.tar.xz 
[task 2023-10-19T22:40:55.601Z] <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1131)>
[task 2023-10-19T22:40:55.601Z] Failed to download prebuilt clang package clang-llvmorg-18-init-8676-g11d07d9e-2.tar.xz
[task 2023-10-19T22:40:55.601Z] Use build.py if you want to build locally.
[task 2023-10-19T22:40:55.601Z] Exiting.
[task 2023-10-19T22:40:55.602Z] Error: Command 'python3 src/tools/clang/scripts/update.py' returned non-zero exit status 1 in /opt/worker/tasks/task_169775284558467/custom_car/chromium
[task 2023-10-19T22:40:55.602Z] Downloading https://commondatastorage.googleapis.com/chromium-browser-clang/Mac/clang-llvmorg-18-init-8676-g11d07d9e-2.tar.xz 
[task 2023-10-19T22:40:55.602Z] <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1131)>
[task 2023-10-19T22:40:55.602Z] Retrying in 5 s ...
[task 2023-10-19T22:40:55.602Z] Downloading https://commondatastorage.googleapis.com/chromium-browser-clang/Mac/clang-llvmorg-18-init-8676-g11d07d9e-2.tar.xz 
[task 2023-10-19T22:40:55.602Z] <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1131)>
[task 2023-10-19T22:40:55.602Z] Retrying in 10 s ...
[task 2023-10-19T22:40:55.602Z] Downloading https://commondatastorage.googleapis.com/chromium-browser-clang/Mac/clang-llvmorg-18-init-8676-g11d07d9e-2.tar.xz 
[task 2023-10-19T22:40:55.602Z] <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1131)>
[task 2023-10-19T22:40:55.602Z] Retrying in 20 s ...
[task 2023-10-19T22:40:55.602Z] Downloading https://commondatastorage.googleapis.com/chromium-browser-clang/Mac/clang-llvmorg-18-init-8676-g11d07d9e-2.tar.xz 
[task 2023-10-19T22:40:55.602Z] <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1131)>
[task 2023-10-19T22:40:55.602Z] Failed to download prebuilt clang package clang-llvmorg-18-init-8676-g11d07d9e-2.tar.xz
[task 2023-10-19T22:40:55.602Z] Use build.py if you want to build locally.
[task 2023-10-19T22:40:55.602Z] Exiting.

would you happen to know if these SSL certificate errors are at all related to any permission/network configurations on the silicon workers?

Flags: needinfo?(mgoossens)

was able to curl that file from an m1/m2 staging node. do we know which node specifically this test failed on?

Ryan, sorry - what does node mean in this context?

Ryan's test verifies the host is reachable and the SSL certs for that host are fine.

I think your issue is in your python environment. Is the certifi pip present in your virtualenv?

Thanks, adding certifi seems to have helped a bit!
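For reference, one way certifi can address `CERTIFICATE_VERIFY_FAILED` is by supplying its CA bundle to Python's `ssl` module. A minimal sketch, assuming certifi is installed in the virtualenv; the helper name `make_ssl_context` is hypothetical, and this is not necessarily how the CaR build script applies the fix:

```python
import ssl


def make_ssl_context():
    """Build a verifying SSL context, preferring certifi's CA bundle."""
    try:
        import certifi  # third-party; may be missing from the virtualenv
    except ImportError:
        # Fall back to whatever CA store the interpreter finds by default.
        return ssl.create_default_context()
    return ssl.create_default_context(cafile=certifi.where())


context = make_ssl_context()
# The context can then be passed along, e.g.:
#   urllib.request.urlopen(url, context=context)
```

For scripts that cannot be modified, pointing the `SSL_CERT_FILE` environment variable at the path returned by `certifi.where()` is another common approach.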

I now have an issue with the xcode

[task 2023-10-24T16:40:21.397Z] xcode-select: error: tool 'xcodebuild' requires Xcode, but active developer directory '/Library/Developer/CommandLineTools' is a command line tools instance
[task 2023-10-24T16:40:21.397Z] Traceback (most recent call last):
[task 2023-10-24T16:40:21.397Z]   File "/opt/worker/tasks/task_169816378248189/custom_car/chromium/src/build/config/apple/sdk_info.py", line 160, in <module>
[task 2023-10-24T16:40:21.397Z]     sys.exit(main())
[task 2023-10-24T16:40:21.397Z]   File "/opt/worker/tasks/task_169816378248189/custom_car/chromium/src/build/config/apple/sdk_info.py", line 145, in main
[task 2023-10-24T16:40:21.397Z]     FillXcodeVersion(settings, args.developer_dir)
[task 2023-10-24T16:40:21.397Z]   File "/opt/worker/tasks/task_169816378248189/custom_car/chromium/src/build/config/apple/sdk_info.py", line 62, in FillXcodeVersion
[task 2023-10-24T16:40:21.397Z]     lines = subprocess.check_output(['xcodebuild',
[task 2023-10-24T16:40:21.397Z]   File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 424, in check_output
[task 2023-10-24T16:40:21.397Z]     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
[task 2023-10-24T16:40:21.397Z]   File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 528, in run
[task 2023-10-24T16:40:21.397Z]     raise CalledProcessError(retcode, process.args,
[task 2023-10-24T16:40:21.397Z] subprocess.CalledProcessError: Command '['xcodebuild', '-version']' returned non-zero exit status 1.

in particular, I'm not sure what to make of this line (looking into it): xcode-select: error: tool 'xcodebuild' requires Xcode, but active developer directory '/Library/Developer/CommandLineTools' is a command line tools instance

we currently do not deploy the full xcode ide but have an open issue regarding this: https://mozilla-hub.atlassian.net/browse/RELOPS-680
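The failure above comes down to which developer directory is active on the worker. A small sketch of a pre-flight check, assuming a full Xcode install lives inside an `.app` bundle (such as `/Applications/Xcode.app/Contents/Developer`); the helper names are hypothetical:

```python
import subprocess

COMMAND_LINE_TOOLS_DIR = "/Library/Developer/CommandLineTools"


def is_full_xcode(developer_dir):
    """Guess whether a developer directory belongs to a full Xcode install."""
    path = developer_dir.strip().rstrip("/")
    if path == COMMAND_LINE_TOOLS_DIR:
        return False
    # Assumption: full Xcode installs sit inside an .app bundle.
    return ".app/" in path + "/"


def active_developer_dir():
    """Return the active developer directory (macOS only)."""
    return subprocess.check_output(["xcode-select", "-p"], text=True).strip()
```

On a worker, `is_full_xcode(active_developer_dir())` returning `False` would indicate that `xcodebuild` is likely to fail the same way as in the log.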

Flags: needinfo?(mgoossens)

Ah that is great to know- thank you!

See Also: → 1868946
Blocks: 1869058

Hi :rcurran, I noticed the ticket in comment 9 was done.

I tried playing around with this again and have encountered this xcode issue here

[task 2024-07-23T23:04:54.353Z] 
[task 2024-07-23T23:04:54.353Z] xcode-select: error: tool 'xcodebuild' requires Xcode, but active developer directory '/Library/Developer/CommandLineTools' is a command line tools instance
[task 2024-07-23T23:04:54.353Z] Traceback (most recent call last):
[task 2024-07-23T23:04:54.353Z]   File "/opt/worker/tasks/task_172176741922985/custom_car/chromium/src/build/config/apple/sdk_info.py", line 160, in <module>
[task 2024-07-23T23:04:54.353Z]     sys.exit(main())
[task 2024-07-23T23:04:54.353Z]   File "/opt/worker/tasks/task_172176741922985/custom_car/chromium/src/build/config/apple/sdk_info.py", line 145, in main
[task 2024-07-23T23:04:54.353Z]     FillXcodeVersion(settings, args.developer_dir)
[task 2024-07-23T23:04:54.353Z]   File "/opt/worker/tasks/task_172176741922985/custom_car/chromium/src/build/config/apple/sdk_info.py", line 62, in FillXcodeVersion
[task 2024-07-23T23:04:54.353Z]     lines = subprocess.check_output(['xcodebuild',
[task 2024-07-23T23:04:54.353Z]   File "/opt/worker/tasks/task_172176741922985/fetches/python/lib/python3.8/subprocess.py", line 415, in check_output
[task 2024-07-23T23:04:54.353Z]     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
[task 2024-07-23T23:04:54.353Z]   File "/opt/worker/tasks/task_172176741922985/fetches/python/lib/python3.8/subprocess.py", line 516, in run
[task 2024-07-23T23:04:54.353Z]     raise CalledProcessError(retcode, process.args,
[task 2024-07-23T23:04:54.353Z] subprocess.CalledProcessError: Command '['xcodebuild', '-version']' returned non-zero exit status 1.

It seems to be the same/similar to the issue I had in comment 8. Would you happen to have any insight into this?

Flags: needinfo?(rcurran)

:kshampur to date we have only deployed Xcode to builder systems.

Do you need Xcode deployed to the performance workers on gecko-t-osx-1400-m2?

If yes, is there a specific version you need or will the latest work?

Thank you

Flags: needinfo?(rcurran)

Thanks for the context. Is there a way I can modify the worker-type: t-osx-1400-m2 attribute to point to the builder systems instead?

I just wanted to see if the CaR builds were 1) possible and 2) how much speedup we get over the intel machines, before putting in the effort to add xcode to the perf workers

If there is no easy way to point to a machine with xcode already present (or if I shouldn't use them), then perhaps installing it on a perf worker would be helpful.

as for the version, unfortunately a version is not specified here, so potentially any version could work, or the latest. At the moment we use macosx64 sdk version 14.4 toolchain, so whatever xcode version is associated with that sdk version might be safe? in which case, it seems xcode 15.3 is the associated version

Flags: needinfo?(rcurran)

:kshampur

One way you may be able to accomplish this is by using the --worker-override flag with mach

Example:

./mach try fuzzy --worker-override t-osx-1015-r8=releng-hardware/gecko-t-osx-1015-r8-staging --no-push

In this example, gecko-t-osx-1015-r8-staging is the pool we're targeting

Let me know if that helps

Flags: needinfo?(rcurran)
See Also: → 1910074

Build on silicon

https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&revision=ea90f1e2b6298c37f666379051494746b2c78a7f&selectedTaskRun=M-kKREqATi6KcsSgn-LO5A.1
276 minutes for the Arm build (for reference, on the intel machine it took about 630 minutes, so more than a 2x speedup!)
x64 build will take a bit longer due to cross compilation, but still looks like it will take much less time than the previous osx 10.15 intel builders were taking
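The speedup claim can be checked with the two Arm build times quoted above (the Intel figure is approximate); the variable names are only for illustration:

```python
# Arm64 CaR build times reported in this bug, in minutes.
INTEL_ARM_BUILD_MINUTES = 630    # approximate, on the intel osx 10.15 builder
SILICON_ARM_BUILD_MINUTES = 276  # on the Apple Silicon worker

speedup = INTEL_ARM_BUILD_MINUTES / SILICON_ARM_BUILD_MINUTES
minutes_saved = INTEL_ARM_BUILD_MINUTES - SILICON_ARM_BUILD_MINUTES
percent_saved = 100 * minutes_saved / INTEL_ARM_BUILD_MINUTES

print(f"{speedup:.2f}x faster, {minutes_saved} minutes saved ({percent_saved:.0f}%)")
# prints "2.28x faster, 354 minutes saved (56%)"
```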

The silicon chips are very good!
thanks for your help so far Ryan!

This patch switches the builder from the intel osx 10.15 builders to the
Apple Silicon builders. 10.15 is no longer a supported build target for
CaR (Chromium-as-Release) so let's use this opportunity to just go to the latest version
builders we have.

Attachment #9417373 - Attachment description: WIP: Bug 1858740 - Use ARM builder for Mac CaR builds. r?#perftest → Bug 1858740 - Use ARM builder for Mac CaR builds. r?#perftest
Pushed by kshampur@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/6582643381c9 Use ARM builder for Mac CaR builds. r=perftest-reviewers,taskgraph-reviewers,ahal,sparky
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 131 Branch
See Also: → 1869058
See Also: → 1914816
See Also: → 1976130