Run performance tests on Apple Silicon
Categories
(Testing :: Performance, defect, P1)
Tracking
(Not tracked)
People
(Reporter: aglavic, Assigned: aglavic)
References
Details
(Whiteboard: [fxp])
Attachments
(2 obsolete files)
This bug is for running performance tests on the osx11 platform.
We’ll need to run a subset of the tests we run on OSX 10.15 on this platform because we only have 1/3 to 1/2 of the capacity. However, we’ll need to enable all of them for try.
Associated Jira Task: https://mozilla-hub.atlassian.net/jira/software/c/projects/FXP/boards/393?modal=detail&selectedIssue=FXP-2454&quickFilter=1866
Comment 1•2 years ago
We are pretty much at 100% machine utilization. Can you determine how much capacity you will need (num_tests * avg_runtime * num_pushes)? That should give a daily CPU need; if it is more than 2 machines, then we need to increase the pool size. Honestly, I think we should increase the pool size if we are adding any new load (i.e. even 1 perf job). Keep in mind that at current capacity try runs may never complete within 24 hours (that is already the case).
Assignee
Comment 2•2 years ago
(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #1)
> We are pretty much at 100% machine utilization. Can you determine how much capacity you will need (num_tests * avg_runtime * num_pushes)? That should give a daily CPU need; if it is more than 2 machines, then we need to increase the pool size. Honestly, I think we should increase the pool size if we are adding any new load (i.e. even 1 perf job). Keep in mind that at current capacity try runs may never complete within 24 hours (that is already the case).
Comment 3•2 years ago
Autoland pushes: 4-5x daily
Average runtime: 25 mins (based on the OSX 10.15 tests)
Num tests: at a minimum we'd like to add 6 essential tests that are specific to Mac.
Additional capacity required: [4, 5] pushes * 25 mins * 6 tests = [600, 750] machine-minutes per day (a quick sketch of this arithmetic is at the end of this comment).
:jmaher, I'm not sure how many machines this translates to (it seems like 1 extra machine), but the try runs never completing is concerning for us.
Let me know if you need any additional information here!
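For reference, here is a back-of-the-envelope sketch (in Python) of the capacity formula from comment #1, num_tests * avg_runtime * num_pushes; the inputs are the estimates quoted above, not measured values:

# Rough daily capacity estimate for the proposed osx11 perf jobs.
NUM_TESTS = 6            # essential Mac-specific tests
AVG_RUNTIME_MIN = 25     # based on the OSX 10.15 jobs
PUSHES_PER_DAY = (4, 5)  # autoland push range

for pushes in PUSHES_PER_DAY:
    minutes = NUM_TESTS * AVG_RUNTIME_MIN * pushes
    print(f"{pushes} pushes/day -> {minutes} machine-minutes (~{minutes / 60:.1f} machine-hours)")
# 4 pushes/day -> 600 machine-minutes, 5 pushes/day -> 750 machine-minutes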
Comment 4•2 years ago
The data I was using about osx11 machines not being available was from try pushes I had done last month. Looking at try for this week, jobs are completing within a few hours, so try is not a concern.
Rounding up, we need 900 minutes of CPU time per day for the 6 tests; double that for try pushes and retriggers, and we get 1800 minutes, which is 30 hours of CPU time/day.
Is there a desire to add more than 6 tests? What would be ideal? I think development could start now, and the work to increase the pool size can happen in parallel. I just want to know whether we need 1 or X machines :)
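Spelling out that arithmetic (the 900-minute figure is the rounded-up daily estimate for the 6 tests, and the doubling covers try pushes and retriggers):

# Daily CPU need for the 6 tests, doubled for try pushes and retriggers.
daily_minutes = 900 * 2
daily_hours = daily_minutes / 60
print(daily_hours, "hours of CPU time per day")   # 30.0
print(daily_hours / 24, "machines running 24x7")  # 1.25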
Comment 5•2 years ago
I think it's safe to say that we'll want to add more tests to that platform. I don't know how many more, but if this OSX 11 platform is the successor to OSX 10.15, then we'll eventually want to run all of our tests on it, though not within the next year AFAICT.
:haik and others might want to add some specific tests to that platform, so we could set a limit of 6 additional tests running in the same configuration (autoland, 25 mins) on top of the original 6 over the next year. I think that would result in about 3 additional machines; what do you think?
Comment 6•2 years ago
Currently we run OSX 10.15 tests on central as well. Is there a need to compare against another browser? I think we can use some rough math and say every test needs 4 hours of CPU/day.
:jgibbs, what is the process for adding machines to MacStadium? I don't know whether we need to add them as part of a contract renewal or whether we can add them piecemeal; please advise. If we can add them as needed, then we need 1 by the end of January. If we cannot, then I would request 2 during the next renewal period.
Assignee
Updated•2 years ago
Comment 7•2 years ago
(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #6)
> Currently we run OSX 10.15 tests on central as well. Is there a need to compare against another browser? I think we can use some rough math and say every test needs 4 hours of CPU/day.
That's a good point! Yes, we should be prepared to compare against other browsers (Safari, Chrome, and Chromium) on m-c, and 4 hours/day is reasonable given that those jobs run 3 times a week (a frequency we could also decrease): 3 pushes/week * 25 mins * (6 tests * 3 browsers) = 1350 minutes/week => ~3.2 hours/day.
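As a quick sanity check of that estimate (the inputs are the assumptions stated above: 3 m-c pushes/week, 25 minutes per job, 6 tests on each of 3 extra browsers):

# Weekly cross-browser load on mozilla-central, converted to hours per day.
pushes_per_week = 3
minutes_per_job = 25
jobs_per_push = 6 * 3  # 6 tests x (Safari, Chrome, Chromium)

minutes_per_week = pushes_per_week * minutes_per_job * jobs_per_push
print(minutes_per_week, "minutes/week")                  # 1350
print(round(minutes_per_week / 60 / 7, 1), "hours/day")  # 3.2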
Comment 8•2 years ago
(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #6)
> Currently we run OSX 10.15 tests on central as well. Is there a need to compare against another browser? I think we can use some rough math and say every test needs 4 hours of CPU/day.
> :jgibbs, what is the process for adding machines to MacStadium? I don't know whether we need to add them as part of a contract renewal or whether we can add them piecemeal; please advise. If we can add them as needed, then we need 1 by the end of January. If we cannot, then I would request 2 during the next renewal period.
We can add additional machines at any time. Would you like me to move forward with 1 additional device request?
Comment 9•2 years ago
Yes, 1 extra Mac M1 @ MacStadium for now. We can fine-tune once the tests are running regularly.
Assignee
Comment 10•2 years ago
Comment 11•2 years ago
For reference, this is an m-c push with all the tests run:
https://treeherder.mozilla.org/jobs?repo=mozilla-central&tier=1%2C2%2C3&searchStr=test-macosx1100&revision=db88fa190f63506c1da204a5ff73202d679611e9
It is 35 hours of runtime. We run all of these per m-c commit (4/day) and per m-b commit, as little as possible on autoland, and on try (you need --full to access them).
So there will be more runs here due to beta, but given that we don't test other browsers periodically, I would say about the same. In general we are at the equivalent of 6 machines running full time 24x7 just to keep up with the load, and that doesn't account for reality when there are merges and large load spikes (we often have 3+ hour backlogs). I see 33 machines available on earthangel: https://earthangel-b40313e5.influxcloud.net/d/avHECHgMk/dc-usage-workertypes?orgId=1&refresh=15m&var-provisioner=releng-hardware&var-workerType=gecko-t-osx-1100-m1&from=now-2d&to=now.
This indicates that the ideal is 1 machine for each hour of runtime (on a full m-c push), but 2 hours of runtime per machine is probably the minimum.
The original goal of 6 jobs at roughly 3 hours of runtime would be more like 1.5 machines, so 1 is OK for now and 2 would allow room for some growth.
The current patch adds 34 jobs; I assume 14-18 hours of CPU time, so if that is desirable we would want 7-9 additional machines.
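For reference, a rough sketch of the sizing rule of thumb above (about two hours of daily runtime per machine as the minimum); the job-hour figures are the estimates from this comment, not measurements:

# Machines needed at ~2 hours of daily runtime per machine (the stated minimum).
def machines(runtime_hours, hours_per_machine=2):
    return runtime_hours / hours_per_machine

print(machines(3))   # 6 essential jobs            -> 1.5 machines
print(machines(14))  # 34-job patch, low estimate  -> 7.0 machines
print(machines(18))  # 34-job patch, high estimate -> 9.0 machines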
Assignee
Updated•2 years ago
Assignee
Comment 12•2 years ago
Hi Sparky, I wanted to get your thoughts on next steps:
The try jobs on OSX 11 are failing because they are looking for scipy==1.7.3. We have it available ( https://pypi.pub.build.mozilla.org/pub/ ) for Python 3.9 on macosx 10.9 and macosx 12.0, but we need macosx 11.0, and I don't see it available on PyPI: https://pypi.org/project/scipy/1.7.3/#files. I have checked other versions and none of them appear to have an OSX 11 wheel; I will double-check tomorrow morning and confirm this is the case.
:jmaher discussed it with me in the perftest channel: https://matrix.to/#/!ECdrESRnYfSJXBzaMX:mozilla.org/$byowTx5-Zqkk0IsPB4zYK_RvO0uPrnP9tel4i-XCPi8?via=mozilla.org&via=matrix.org&via=yatrix.org
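For reference, a quick way to double-check which platforms PyPI publishes scipy wheels for is the PyPI JSON API; a minimal Python sketch (pinned to the version our jobs request):

import json
import urllib.request

# List the wheel filenames PyPI publishes for scipy 1.7.3 and look for a
# macOS 11 (Big Sur) build; the platform tag is embedded in each filename.
url = "https://pypi.org/pypi/scipy/1.7.3/json"
with urllib.request.urlopen(url) as response:
    release = json.load(response)

wheels = [f["filename"] for f in release["urls"] if f["filename"].endswith(".whl")]
macos11_wheels = [w for w in wheels if "macosx_11" in w]
print("total wheels:", len(wheels))
print("macOS 11 wheels:", macos11_wheels or "none")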
Comment 13•2 years ago
Unfortunately, it looks like Big Sur (OSX 11) is/was a known issue, and there is no 'official' osx11 wheel.
E.g., from some recent-ish discussion, M1-supported wheels only exist for osx 12:
https://github.com/scipy/scipy/issues/15861
https://github.com/scipy/scipy/issues/16192
So it's not just 1.7.3; this will be the case for all versions.
I haven't looked further, but it could be possible to build our own osx11 wheel from source and host it on the moz pypi? (Not sure how much effort is involved there or whether it would be reliable.)
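If we go that route, a minimal sketch of forcing a source build of the wheel on the OSX 11 worker itself (this assumes a working Fortran/BLAS toolchain is already installed, which is the hard part); the resulting .whl could then be hosted on the internal pypi:

import subprocess
import sys

# Build a scipy wheel from source rather than downloading a pre-built one.
subprocess.run(
    [sys.executable, "-m", "pip", "wheel", "scipy==1.7.3",
     "--no-binary", "scipy",      # refuse pre-built wheels, compile from the sdist
     "--wheel-dir", "wheels/"],   # where the built .whl is written
    check=True,
)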
Comment 14•2 years ago
I sent a message to andrej with suggested next steps. It looks like there are people that have been able to get it running on this machine so I think it's a matter of figuring out what will work for us.
Assignee
Comment 15•2 years ago
Thank you all for the help and advice!
Updated•2 years ago
Updated•2 years ago
Updated•2 years ago
Comment 16•2 years ago
@andrej could you provide an update on the status of this bug?
Updated•2 years ago
Assignee
Comment 17•2 years ago
I have compiled an osx1100 version of scipy locally and have gotten a test to work on OSX 11 with a Mac mini from the Toronto office. I have created a try job and am awaiting results.
Assignee
Comment 18•2 years ago
The results on try show an error not previously encountered locally; I have lowered the priority of this work to get the Chrome tests working on Android.
We are currently discussing whether it makes sense to test on osx1100 with the M1 when none of these issues are a problem on osx1200 with the M1.
Comment 19•2 years ago
OSX 12.x is significantly more popular (more representative of our users) than 11.x; 13.x is still pretty low, but it could be the next cool thing. Maybe a small pool of 12.x machines? Not ideal, but it could make the unittest migration faster, followed by a single large pool.
Comment 20•2 years ago
+1 to a small pool of OSX 12 machines. I brought this up with Julia in a thread on Slack yesterday too (I've pinged you in it). I think the reason OSX 12 is so popular might be how bad support for OSX 11 is. There don't seem to be any plans for scipy etc. to get pre-built wheels on PyPI for OSX 11, so we'll be stuck with manual builds until we switch to OSX 12.
Comment 21•2 years ago
+1 as well to 12.x fwiw.
I can't think of a reason why any user would stay on OS 11 if they own an M1 (if these were Intel machines, I would understand),
so having OS 12 would be more "realistic" in that sense?
Agreed with Sparky that we'd be stuck with a manual/hacky build for scipy (and potentially other packages), which we'd have to revert anyway when moving to OS 12...
Also, there is the added benefit of a more up-to-date Safari for testing on OS 12, but that's of lower importance :-)
Comment 22•2 years ago
The motivation here is running the performance tests on Apple Silicon rather than Intel. The version of macOS is less important, though it sounds like we'll want at least macOS 12 to avoid some challenges with earlier versions.
Assignee
Updated•2 years ago
Updated•2 years ago
Assignee
Comment 23•2 years ago
April 17 update: in talks with releng to get Mac minis with OSX 12.
Assignee
Comment 24•2 years ago
Got a tracking bug for formatting the bugs; will follow up in the next week or so.
Assignee
Comment 25•2 years ago
Assignee
Comment 26•1 year ago
So we have decided to move to osx1300-M2, and we have a try run.
Our tests are almost running! But it looks like a directory is set as read-only, which is preventing our tests from downloading mitmproxy into the /builds folder.
Log of a test: https://firefoxci.taskcluster-artifacts.net/LusATbEZRnCP7D_FiGN8wQ/3/public/logs/live_backing.log (specific error: b"OSError: [Errno 30] Read-only file system: '/builds'")
mgoossens says this is because on osx1300 the /builds folder cannot be set to read-write, so we will have to put all of those files in a new folder; that is, we will need to move the download location for the tests on osx1300.
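A hedged sketch of the kind of workaround the harness would need (the paths and helper name here are hypothetical, not the real mozharness config): prefer the old /builds location, but fall back to a writable per-task directory when the filesystem is read-only, as on the osx1300 M2 workers:

import os
import tempfile

def pick_download_dir(preferred="/builds/mitmproxy"):
    # Use the historical location when its parent is writable; otherwise fall
    # back to a temporary directory the task is guaranteed to be able to write.
    parent = os.path.dirname(preferred)
    if os.path.isdir(parent) and os.access(parent, os.W_OK):
        os.makedirs(preferred, exist_ok=True)
        return preferred
    return tempfile.mkdtemp(prefix="mitmproxy-")

print(pick_download_dir())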
Assignee
Updated•1 year ago
Comment 27•1 year ago
(In reply to Andrej Glavic (:aglavic) from comment #26)
> So we have decided to move to osx1300-M2, and we have a try run.
> Our tests are almost running! But it looks like a directory is set as read-only, which is preventing our tests from downloading mitmproxy into the /builds folder.
> Log of a test: https://firefoxci.taskcluster-artifacts.net/LusATbEZRnCP7D_FiGN8wQ/3/public/logs/live_backing.log (specific error: b"OSError: [Errno 30] Read-only file system: '/builds'")
> mgoossens says this is because on osx1300 the /builds folder cannot be set to read-write, so we will have to put all of those files in a new folder; that is, we will need to move the download location for the tests on osx1300.
:aglavic, we actually have that download error message in existing, passing tests, e.g.: https://treeherder.mozilla.org/logviewer?job_id=417772174&repo=mozilla-central&lineNumber=1216
Now I am wondering what is even running if mitmproxy is not being downloaded? It is not immediately clear; we should look into this.
Comment 28•1 year ago
(In reply to Kash Shampur [:kshampur] ⌚EST from comment #27)
> :aglavic, we actually have that download error message in existing, passing tests, e.g.: https://treeherder.mozilla.org/logviewer?job_id=417772174&repo=mozilla-central&lineNumber=1216
> Now I am wondering what is even running if mitmproxy is not being downloaded? It is not immediately clear; we should look into this.
Actually, if I am understanding it right, playback is still successfully being started with mitmproxy:
https://treeherder.mozilla.org/logviewer?job_id=417772174&repo=mozilla-central&lineNumber=1248
It just fails to add the files to the tooltool-cache folder, so ignore my previous comment. (I think it's fine; the more concerning failure is the test failing after 1 iteration after the window recorder, which is probably unrelated to mitmproxy.)
Assignee
Comment 30•1 year ago
Currently we are experiencing a permafail when starting browsertime. I have an M2 loaner, and (assuming I get the password today, since the person who was out sick is back today) I can start debugging straight away.
Updated•1 year ago
Assignee
Comment 31•1 year ago
Our releng contact got back to me yesterday morning to let me know that the m2-1300 hosting service has applied a fix for an issue on their end, and today we got a successful run of the browsertime essential pageload tests and both Speedometers:
https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&revision=85b08602c228128d5719fae04467674c08f6994e
It looks like the jobs are mislabelled as 10.15, but that is because the command we were given to run on these machines uses a worker override to target the M2s; this is evident in the first few lines of the logs, where the worker type settings are displayed.
I will be putting in a patch ASAP to get this running on a nightly cron.
Assignee
Comment 32•1 year ago
Update: I am resolving an issue with the Chrome Speedometer and Speedometer 3 tests (it may be a chromedriver issue); all other tests (other than Safari) are passing. I hope to put in a patch soon to merge the tests into mozilla-central.
Assignee
Comment 33•1 year ago
The Chrome issue is resolved, and I am now working to resolve the Safari permafails on the M2s.
Assignee
Updated•1 year ago
Assignee
Updated•1 year ago
Assignee
Comment 34•1 year ago
Closing this bug, as the changes to add the performance tests were done in bug 1844638.