Run performance tests on Apple Silicon
Categories
(Testing :: Performance, defect, P1)
Tracking
(Not tracked)
People
(Reporter: aglavic, Assigned: aglavic)
References
Details
(Whiteboard: [fxp])
Attachments
(2 obsolete files)
This bug is for running performance tests on the osx11 platform.
We’ll need to run a subset of the tests we run on OSX 10.15 on this platform because we only have 1/3 to 1/2 of the capacity. However, we’ll need to enable all of them for try.
Associated Jira Task: https://mozilla-hub.atlassian.net/jira/software/c/projects/FXP/boards/393?modal=detail&selectedIssue=FXP-2454&quickFilter=1866
Comment 1•2 years ago
We are pretty much at 100% machine utilization. Can you determine how much capacity you will need (num_tests * avg_runtime * num_pushes)? That should give a daily CPU need; if it is more than 2 machines, then we need to increase the pool size. Honestly, I think we should increase the pool size if we are adding any new load (i.e. even 1 perf job). Keep in mind that at current capacity try runs may never complete within 24 hours (that is already the case).
Assignee
Comment 2•2 years ago
(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #1)
> We are pretty much at 100% machine utilization. Can you determine how much capacity you will need (num_tests * avg_runtime * num_pushes)? That should give a daily CPU need; if it is more than 2 machines, then we need to increase the pool size. Honestly, I think we should increase the pool size if we are adding any new load (i.e. even 1 perf job). Keep in mind that at current capacity try runs may never complete within 24 hours (that is already the case).
Comment 3•2 years ago
Autoland pushes: 4-5x daily
Average runtime: 25 mins (based on the OSX 10.15 tests)
Num tests: at a minimum we'd like to add 6 essential tests that are specific to Mac.
Additional capacity required: [4, 5] pushes * 25 mins * 6 tests = [600, 750] machine-minutes per day (a quick sketch of this arithmetic is at the end of this comment).
:jmaher, I'm not sure how many machines this translates to (it seems like 1 extra machine), but the try runs never completing is concerning for us.
Let me know if you need any additional information here!
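For reference, here is a back-of-the-envelope sketch (in Python) of the capacity formula from comment #1, num_tests * avg_runtime * num_pushes; the inputs are the estimates quoted above, not measured values:

# Rough daily capacity estimate for the proposed osx11 perf jobs.
NUM_TESTS = 6            # essential Mac-specific tests
AVG_RUNTIME_MIN = 25     # based on the OSX 10.15 jobs
PUSHES_PER_DAY = (4, 5)  # autoland push range

for pushes in PUSHES_PER_DAY:
    minutes = NUM_TESTS * AVG_RUNTIME_MIN * pushes
    print(f"{pushes} pushes/day -> {minutes} machine-minutes (~{minutes / 60:.1f} machine-hours)")
# 4 pushes/day -> 600 machine-minutes, 5 pushes/day -> 750 machine-minutes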
Comment 4•2 years ago
The data I was using about osx11 machines not being available was from try pushes I had done last month. Looking at try for this week, jobs are completing within a few hours, so try is not a concern.
Rounding up, we need 900 minutes of CPU time per day for the 6 tests; double that for try pushes and retriggers, and we get 1800 minutes, which is 30 hours of CPU time/day.
Is there a desire to add more than 6 tests? What would be ideal? I think development could start now, and the work to increase the pool size can happen in parallel. I just want to know whether we need 1 or X machines :)
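Spelling out that arithmetic (the 900-minute figure is the rounded-up daily estimate for the 6 tests, and the doubling covers try pushes and retriggers):

# Daily CPU need for the 6 tests, doubled for try pushes and retriggers.
daily_minutes = 900 * 2
daily_hours = daily_minutes / 60
print(daily_hours, "hours of CPU time per day")   # 30.0
print(daily_hours / 24, "machines running 24x7")  # 1.25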
Comment 5•2 years ago
I think it's safe to say that we'll want to add more tests to that platform. I don't know how many more, but if this OSX 11 platform is the successor to OSX 10.15, then we'll eventually want to run all of our tests on it, though not within the next year AFAICT.
:haik and others might want to add some specific tests to that platform, so we could set a limit of 6 additional tests running in the same configuration (autoland, 25 mins) on top of the original 6 over the next year. I think that would result in about 3 additional machines; what do you think?
Comment 6•2 years ago
Currently we run OSX 10.15 tests on central as well. Is there a need to compare against another browser? I think we can use some rough math and say every test needs 4 hours of CPU/day.
:jgibbs, what is the process for adding machines to MacStadium? I don't know whether we need to add them as part of a contract renewal or whether we can add them piecemeal; please advise. If we can add them as needed, then we need 1 by the end of January. If we cannot, then I would request 2 during the next renewal period.
Assignee
Updated•2 years ago
Comment 7•2 years ago
(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #6)
> Currently we run OSX 10.15 tests on central as well. Is there a need to compare against another browser? I think we can use some rough math and say every test needs 4 hours of CPU/day.
That's a good point! Yes, we should be prepared to compare against other browsers (Safari, Chrome, and Chromium) on m-c, and 4 hours/day is reasonable given that those jobs run 3 times a week (a frequency we could also decrease): 3 pushes/week * 25 mins * (6 tests * 3 browsers) = 1350 minutes/week => ~3.2 hours/day.
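As a quick sanity check of that estimate (the inputs are the assumptions stated above: 3 m-c pushes/week, 25 minutes per job, 6 tests on each of 3 extra browsers):

# Weekly cross-browser load on mozilla-central, converted to hours per day.
pushes_per_week = 3
minutes_per_job = 25
jobs_per_push = 6 * 3  # 6 tests x (Safari, Chrome, Chromium)

minutes_per_week = pushes_per_week * minutes_per_job * jobs_per_push
print(minutes_per_week, "minutes/week")                  # 1350
print(round(minutes_per_week / 60 / 7, 1), "hours/day")  # 3.2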
Comment 8•2 years ago
(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #6)
> Currently we run OSX 10.15 tests on central as well. Is there a need to compare against another browser? I think we can use some rough math and say every test needs 4 hours of CPU/day.
> :jgibbs, what is the process for adding machines to MacStadium? I don't know whether we need to add them as part of a contract renewal or whether we can add them piecemeal; please advise. If we can add them as needed, then we need 1 by the end of January. If we cannot, then I would request 2 during the next renewal period.
We can add additional machines at any time. Would you like me to move forward with 1 additional device request?
Comment 9•2 years ago
Yes, 1 extra Mac M1 @ MacStadium for now. We can fine-tune once the tests are running regularly.
Assignee
Comment 10•2 years ago
Comment 11•2 years ago
For reference, this is an m-c push with all the tests run:
https://treeherder.mozilla.org/jobs?repo=mozilla-central&tier=1%2C2%2C3&searchStr=test-macosx1100&revision=db88fa190f63506c1da204a5ff73202d679611e9
It is 35 hours of runtime. We run all of these per m-c commit (4/day) and per m-b commit, as little as possible on autoland, and on try (you need --full to access them).
So there will be more runs here due to beta, but given that we don't test other browsers periodically, I would say about the same. In general we are at the equivalent of 6 machines running full time 24x7 just to keep up with the load, and that doesn't account for reality when there are merges and large load spikes (we often have 3+ hour backlogs). I see 33 machines available on earthangel: https://earthangel-b40313e5.influxcloud.net/d/avHECHgMk/dc-usage-workertypes?orgId=1&refresh=15m&var-provisioner=releng-hardware&var-workerType=gecko-t-osx-1100-m1&from=now-2d&to=now.
This indicates that the ideal is 1 machine for each hour of runtime (on a full m-c push), but 2 hours of runtime per machine is probably the minimum.
The original goal of 6 jobs at roughly 3 hours of runtime would be more like 1.5 machines, so 1 is OK for now and 2 would allow room for some growth.
The current patch adds 34 jobs; I assume 14-18 hours of CPU time, so if that is desirable we would want 7-9 additional machines.
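For reference, a rough sketch of the sizing rule of thumb above (about two hours of daily runtime per machine as the minimum); the job-hour figures are the estimates from this comment, not measurements:

# Machines needed at ~2 hours of daily runtime per machine (the stated minimum).
def machines(runtime_hours, hours_per_machine=2):
    return runtime_hours / hours_per_machine

print(machines(3))   # 6 essential jobs            -> 1.5 machines
print(machines(14))  # 34-job patch, low estimate  -> 7.0 machines
print(machines(18))  # 34-job patch, high estimate -> 9.0 machines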
Assignee
Updated•2 years ago
Assignee
Comment 12•2 years ago
Hi Sparky, I wanted to get your thoughts on next steps:
The try jobs on OSX 11 are failing because they are looking for scipy==1.7.3. We have it available ( https://pypi.pub.build.mozilla.org/pub/ ) for Python 3.9 on macosx 10.9 and macosx 12.0, but we need macosx 11.0, and I don't see it available on PyPI: https://pypi.org/project/scipy/1.7.3/#files. I have checked other versions and none of them appear to have an OSX 11 wheel; I will double-check tomorrow morning and confirm this is the case.
:jmaher discussed it with me in the perftest channel: https://matrix.to/#/!ECdrESRnYfSJXBzaMX:mozilla.org/$byowTx5-Zqkk0IsPB4zYK_RvO0uPrnP9tel4i-XCPi8?via=mozilla.org&via=matrix.org&via=yatrix.org
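For reference, a quick way to double-check which platforms PyPI publishes scipy wheels for is the PyPI JSON API; a minimal Python sketch (pinned to the version our jobs request):

import json
import urllib.request

# List the wheel filenames PyPI publishes for scipy 1.7.3 and look for a
# macOS 11 (Big Sur) build; the platform tag is embedded in each filename.
url = "https://pypi.org/pypi/scipy/1.7.3/json"
with urllib.request.urlopen(url) as response:
    release = json.load(response)

wheels = [f["filename"] for f in release["urls"] if f["filename"].endswith(".whl")]
macos11_wheels = [w for w in wheels if "macosx_11" in w]
print("total wheels:", len(wheels))
print("macOS 11 wheels:", macos11_wheels or "none")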
Comment 13•2 years ago
Unfortunately, it looks like Big Sur (OSX 11) is/was a known issue, and there is no 'official' osx11 wheel.
E.g., from some recent-ish discussion, M1-supported wheels only exist for osx 12:
https://github.com/scipy/scipy/issues/15861
https://github.com/scipy/scipy/issues/16192
So it's not just 1.7.3; this will be the case for all versions.
I haven't looked further, but it could be possible to build our own osx11 wheel from source and host it on the moz pypi? (Not sure how much effort is involved there or whether it would be reliable.)
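If we go that route, a minimal sketch of forcing a source build of the wheel on the OSX 11 worker itself (this assumes a working Fortran/BLAS toolchain is already installed, which is the hard part); the resulting .whl could then be hosted on the internal pypi:

import subprocess
import sys

# Build a scipy wheel from source rather than downloading a pre-built one.
subprocess.run(
    [sys.executable, "-m", "pip", "wheel", "scipy==1.7.3",
     "--no-binary", "scipy",      # refuse pre-built wheels, compile from the sdist
     "--wheel-dir", "wheels/"],   # where the built .whl is written
    check=True,
)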
Comment 14•2 years ago
I sent a message to andrej with suggested next steps. It looks like there are people that have been able to get it running on this machine so I think it's a matter of figuring out what will work for us.
Assignee
Comment 15•2 years ago
Thank you all for the help and advice!
Updated•2 years ago
Updated•2 years ago
Updated•2 years ago
Comment 16•2 years ago
@andrej could you provide an update on the status of this bug?
Updated•2 years ago
Assignee
Comment 17•2 years ago
I have compiled an osx1100 version of scipy locally and have gotten a test to work on OSX 11 with a Mac mini from the Toronto office. I have created a try job and am awaiting results.
Assignee
Comment 18•2 years ago
The results on try show an error not previously encountered locally; I have lowered the priority of this work to get the Chrome tests working on Android.
We are currently discussing whether it makes sense to test on osx1100 with the M1 when none of these issues are a problem on osx1200 with the M1.
Comment 19•2 years ago
OSX 12.x is significantly more popular (more representative of our users) than 11.x; 13.x is still pretty low, but it could be the next cool thing. Maybe a small pool of 12.x machines? Not ideal, but it could make the unittest migration faster, followed by a single large pool.
Comment 20•2 years ago
+1 to a small pool of OSX 12 machines. I brought this up with Julia in a thread on Slack yesterday too (I've pinged you in it). I think the reason OSX 12 is so popular might be how bad support for OSX 11 is. There don't seem to be any plans for scipy etc. to get pre-built wheels on PyPI for OSX 11, so we'll be stuck with manual builds until we switch to OSX 12.
Comment 21•2 years ago
+1 as well to 12.x fwiw.
I can't think of a reason why any user would stay on OS 11 if they own an M1 (if these were Intel machines, I would understand),
so having OS 12 would be more "realistic" in that sense?
Agreed with Sparky that we'd be stuck with a manual/hacky build for scipy (and potentially other packages), which we'd have to revert anyway when moving to OS 12...
Also, there is the added benefit of a more up-to-date Safari for testing on OS 12, but that's of lower importance :-)
Comment 22•2 years ago
The motivation here is running the performance tests on Apple Silicon rather than Intel. The version of macOS is less important, though it sounds like we'll want at least macOS 12 to avoid some challenges with earlier versions.
Assignee
Updated•2 years ago
Updated•2 years ago
Assignee
Comment 23•2 years ago
April 17 update: in talks with releng to get Mac minis with OSX 12.
Assignee
Comment 24•2 years ago
Got a tracking bug for formatting the bugs; will follow up in the next week or so.
Assignee
Comment 25•2 years ago
Assignee
Comment 26•1 year ago
So we have decided to move to osx1300-M2, and we have a try run.
Our tests are almost running! But it looks like a directory is set as read-only, which is preventing our tests from downloading mitmproxy into the /builds folder.
Log of a test: https://firefoxci.taskcluster-artifacts.net/LusATbEZRnCP7D_FiGN8wQ/3/public/logs/live_backing.log (specific error: b"OSError: [Errno 30] Read-only file system: '/builds'")
mgoossens says this is because on osx1300 the /builds folder cannot be set to read-write, so we will have to put all of those files in a new folder; that is, we will need to move the download location for the tests on osx1300.
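A hedged sketch of the kind of workaround the harness would need (the paths and helper name here are hypothetical, not the real mozharness config): prefer the old /builds location, but fall back to a writable per-task directory when the filesystem is read-only, as on the osx1300 M2 workers:

import os
import tempfile

def pick_download_dir(preferred="/builds/mitmproxy"):
    # Use the historical location when its parent is writable; otherwise fall
    # back to a temporary directory the task is guaranteed to be able to write.
    parent = os.path.dirname(preferred)
    if os.path.isdir(parent) and os.access(parent, os.W_OK):
        os.makedirs(preferred, exist_ok=True)
        return preferred
    return tempfile.mkdtemp(prefix="mitmproxy-")

print(pick_download_dir())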
Assignee
Updated•1 year ago
Comment 27•1 year ago
(In reply to Andrej Glavic (:aglavic) from comment #26)
> So we have decided to move to osx1300-M2, and we have a try run.
> Our tests are almost running! But it looks like a directory is set as read-only, which is preventing our tests from downloading mitmproxy into the /builds folder.
> Log of a test: https://firefoxci.taskcluster-artifacts.net/LusATbEZRnCP7D_FiGN8wQ/3/public/logs/live_backing.log (specific error: b"OSError: [Errno 30] Read-only file system: '/builds'")
> mgoossens says this is because on osx1300 the /builds folder cannot be set to read-write, so we will have to put all of those files in a new folder; that is, we will need to move the download location for the tests on osx1300.
:aglavic, we actually have that download error message in existing, passing tests, e.g.: https://treeherder.mozilla.org/logviewer?job_id=417772174&repo=mozilla-central&lineNumber=1216
Now I am wondering what is even running if mitmproxy is not being downloaded? It is not immediately clear; we should look into this.
Comment 28•1 year ago
(In reply to Kash Shampur [:kshampur] ⌚EST from comment #27)
> :aglavic, we actually have that download error message in existing, passing tests, e.g.: https://treeherder.mozilla.org/logviewer?job_id=417772174&repo=mozilla-central&lineNumber=1216
> Now I am wondering what is even running if mitmproxy is not being downloaded? It is not immediately clear; we should look into this.
Actually, if I am understanding it right, playback is still successfully being started with mitmproxy:
https://treeherder.mozilla.org/logviewer?job_id=417772174&repo=mozilla-central&lineNumber=1248
It just fails to add the files to the tooltool-cache folder, so ignore my previous comment. (I think it's fine; the more concerning failure is the test failing after 1 iteration after the window recorder, which is probably unrelated to mitmproxy.)
Assignee
Comment 30•1 year ago
Currently we are experiencing a permafail when starting browsertime. I have an M2 loaner, and (assuming I get the password today, since the person who was out sick is back today) I can start debugging straight away.
Updated•1 year ago
Assignee
Comment 31•1 year ago
Our releng contact got back to me yesterday morning to let me know that the m2-1300 hosting service has applied a fix for an issue on their end, and today we got a successful run of the browsertime essential pageload tests and both Speedometers:
https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&revision=85b08602c228128d5719fae04467674c08f6994e
It looks like the jobs are mislabelled as 10.15, but that is because the command we were given to run on these machines uses a worker override to target the M2s; this is evident in the first few lines of the logs, where the worker type settings are displayed.
I will be putting in a patch ASAP to get this running on a nightly cron.
Assignee
Comment 32•1 year ago
Update: I am resolving an issue with the Chrome Speedometer and Speedometer 3 tests (it may be a chromedriver issue); all other tests (other than Safari) are passing. I hope to put in a patch soon to merge the tests into mozilla-central.
Assignee
Comment 33•1 year ago
The Chrome issue is resolved, and I am now working to resolve the Safari permafails on the M2s.
Assignee
Updated•1 year ago
Assignee
Updated•1 year ago
Assignee
Comment 34•1 year ago
Closing this bug, as the changes to add the performance tests were done in bug 1844638.