run aarch64 spidermonkey jobs on pixel 2 hardware
Categories
(Testing :: General, defect, P2)
Tracking
(firefox75 affected)
Tracking | Status
---|---
firefox75 | affected
People
(Reporter: jmaher, Assigned: bc)
References
(Blocks 1 open bug)
Details
Attachments
(4 files)
We have windows/aarch64 running spidermonkey jobs in order to get arm coverage for jittest/jsreftest; the key point is that this substitutes for android/arm64 coverage as well (we do run android/arm7).
In making this change almost a year ago, it was an oversight that this runs on a windows builder and not on a windows/aarch64 machine.
I see two things to fix:
- get spidermonkey tests decoupled from the build, or get everything running on aarch64 hardware (probably not ideal)
- get the tests running on aarch64 hardware
I imagine this will take quite a while to get running as it isn't just changing a few config files.
I also noticed we spend ~15 minutes at the end of the job uploading llvm-symbolizer.gz. That is unrelated to this bug, but worth revisiting, as we probably don't use llvm-symbolizer.gz at the frequency it is generated/uploaded.
Reporter
Comment 1•5 years ago
The original bug 1552051 had :sstangl as a contact; I don't think :sstangl is at Mozilla anymore.
Reporter
Comment 2•5 years ago
I filed bug 1618923 about the llvm-symbolizer.gz
Reporter
Comment 3•5 years ago
I filed bug 1618924 to disable the current test as it doesn't provide any value.
Assignee
Comment 4•5 years ago
Comment 6•5 years ago
bugherder
Reporter
Comment 7•5 years ago
I had intended this bug to be about getting the tests running on aarch64.
Assignee
Comment 8•5 years ago
dmajor, bbouvier, sfink, callek: I could use your guidance here.
It appears to me that the SM jobs are designed as build jobs using autospider.py and require checkouts and toolchains on the builder. We are using Yoga C630 laptops for our win64-aarch64 coverage. To run these SM jobs on them we would require aarch64 toolchains as well, correct? We only cross-compile our win64 aarch64 builds on normal Windows 10 x86_64 builders and don't have aarch64 versions available?
What approach would you suggest for getting the spidermonkey tests actually running on win64 aarch64 ?
Comment 9•5 years ago
(In reply to Bob Clary [:bc:] from comment #8)
dmajor, bbouvier, sfink, callek: I could use your guidance here.
What approach would you suggest for getting the spidermonkey tests actually running on win64 aarch64 ?
My preference would be for the first choice in c#0 (decouple build from tests) - That may not be easy/ideal.
The alternative in my mind would be for some way to run these jobs in the cloud, with an aarch64 win builder provided by relops. - :fubar, is that possible (and if so, I'll leave it to you to chat with :bc about timeline/priority)
Comment 10•5 years ago
Given the available choices, I highly recommend splitting builds from tests. Not only is that the architecturally nice approach, but in practical terms, putting compilation on arm64-windows machines would be extremely slow and leave us with an ongoing corner-case maintenance burden. In the long term we'd probably have wanted to split the tasks anyway, as we move towards cross-compilation from Linux.
The good news is, Spidermonkey tests are a lot more self-contained than Gecko ones, and to my knowledge there isn't any inherent technical requirement for them to be tested on the build machine. In the past I've downloaded js.exe's from CI and run them through jit-test.py on my own machine locally, and it wasn't painful. This work may require some moderate Taskcluster-slinging but I don't expect to have to fight with the Spidermonkey testing architecture itself.
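The local workflow described here (download a CI-built shell, run it through the jit-test harness) can be sketched as command-line assembly. The harness path and artifact location below are illustrative assumptions based on this bug's comments, not the exact autospider.py behaviour:

```python
# Sketch: assemble a local jit-test invocation for a js shell downloaded
# from CI. The harness path and artifact name are assumptions for
# illustration, not the exact paths used in automation.
import shlex

def jit_test_command(js_shell, harness="js/src/jit-test/jit_test.py",
                     jobs=4, extra_args=()):
    """Build the argv for running jit-tests against a prebuilt shell."""
    cmd = ["python3", harness, js_shell, "-j", str(jobs)]
    cmd.extend(extra_args)
    return cmd

cmd = jit_test_command("artifacts/js.exe", jobs=8)
print(shlex.join(cmd))
```

The point is that the harness only needs a shell binary and the test files, not a full build tree, which is what makes the build/test split feasible.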
Comment 11•5 years ago
I agree that splitting builds from tests is the better approach, and should not be all that difficult to do today. There is no technical requirement to run on the build machines. The reasons it works this way are (1) the JS shell test suite used to be relatively minuscule, and once you'd done a build it was faster and less resource-intensive to just run the tiny test suite rather than packaging things up + upload + download; (2) setting up nonstandard build/test dependent jobs used to be a PITA during BuildBot days; but most of all (3) laziness and inertia. #1 and #2 are no longer the case, especially now that we have the shell running the full test262 and wpt suites.
#3 is still very much in force, but can be overcome with a sufficiently motivating justification for change like we have here. ;-)
Related to #1, the JS shell used to take very little time to compile, so the overall SM jobs were very fast and adding upload/download overhead would have dramatically increased their latency. We've managed to slow the build down to the point where I'd guess sharing a single build is most likely now a decent resource win as well.
Comment 12•5 years ago
Makes sense to me to split builds from tests too. I managed to cross-compile Spidermonkey for ARMv7 in the past. Unfortunately, I didn't keep any notes when doing so, but I vaguely recall that setting the CC/CXX/LD environment variables to use a gcc cross-compiler was sufficient. The build system may have changed enough since then that this isn't valid anymore, and maybe it would just work to set the proper --target when running configure.
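The two approaches mentioned here can be sketched side by side. The toolchain prefix and target triple below are illustrative assumptions, not values known to work for this build:

```python
# Sketch of the two cross-compile setups mentioned above: pointing
# CC/CXX/LD at a cross toolchain, or passing --target to configure.
# The toolchain prefix and target triple are illustrative assumptions.
import os

def cross_env(base_env, prefix="aarch64-linux-gnu"):
    """Approach 1: override CC/CXX/LD with a cross toolchain."""
    env = dict(base_env)
    env.update(CC=f"{prefix}-gcc", CXX=f"{prefix}-g++", LD=f"{prefix}-ld")
    return env

def configure_args(target="aarch64-unknown-linux-gnu"):
    """Approach 2: let configure derive the toolchain from --target."""
    return ["./configure", f"--target={target}"]

env = cross_env(os.environ)
args = configure_args()
print(env["CC"], args[1])
```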
Assignee
Comment 13•5 years ago
Ok, I'll try to reproduce the tests run by autospider using a target.jsshell.zip fetched from a build task.
Comment 14•5 years ago
A potential subtlety to watch out for, that we wouldn't notice when tests are passing, is debug info / stack traces. I recall that some time ago we didn't have all the PDB files available to download from SM build tasks (bug 1428574). It's an old bug that might have changed shape over the years but it might still be present in some form. Would be good to double check.
Assignee
Comment 15•5 years ago
Bug 1621295 inspired me to think maybe we could do these on arm64-simulators the same way it is done on arm32-simulators.
tcampbell: Thoughts?
Assignee
Comment 16•5 years ago
We already run SM(arm64). Unless there is a need to actually run these on Windows on physical hardware, in particular the Yoga C630 laptops, I would vote to WONTFIX this.
Assignee
Comment 17•5 years ago
lth says that SM(aarch64/arm64) would be better / more relevant on phones than the laptops. If I must, can we just drop these laptops and go with SM(aarch64) on the Pixel2 devices?
Comment 18•5 years ago
From the point of view of the JS/wasm engine, we probably don't care much about which arm64 chip / device we're testing on, so long as we're testing on arm64 hardware. A current-ish phone will in any case be better than an obsolete-ish laptop / tablet.
It's concerning to me that it sounds like we've had no CI testing of arm64 hardware lately. (And I'm guessing ARM.) The simulators just are not that good. WONTFIX'ing this bug in the hope that we'll get to adding tests to the Pixel "later" is not appealing, to say the least.
Assignee
Comment 19•5 years ago
We'll get this resolved much sooner than "later".
Comment 20•5 years ago
The priority flag is not set for this bug.
:gbrown, could you have a look please?
For more information, please visit auto_nag documentation.
Assignee
Comment 21•5 years ago
Lars: is running this on pgo (not opt or debug), and requiring --full on try, ok with you?
Comment 22•5 years ago
(In reply to Bob Clary [:bc:] from comment #21)
Lars: is running this on pgo (not opt or debug), and requiring --full on try, ok with you?
Anything's better than nothing, and I guess even if most people won't catch problems on try they will be caught eventually. I presume those restrictions are there to reduce the load on the machines?
Assignee
Comment 23•5 years ago
Yes. We see a fair amount of try -p all -u all, which easily overwhelms the limited number of devices. Thanks.
Assignee
Comment 24•5 years ago
Assignee
Comment 25•5 years ago
Re-summarizing to match my plan, which is to restore the previous coverage on aarch64 pixel 2 devices immediately while developing the future SM pixel 2 plan. If we go with an SM-type job, I would expect we could replace the current jittests with SM on arm7 and aarch64.
Reporter
Comment 26•5 years ago
is jittest critical for both arm7 and aarch64? there is a high cost to pay for running this and I want to make sure it makes sense. Questions might be:
- market share of arm7 vs arm64
- regressions found on arm7 in the last year - would these be the same for arm64?
- why not jsreftests? spidermonkey?
- runtime/cost of running these?
- arm7 is on autoland, could we make it just central only?
Reporter
Comment 27•5 years ago
for the last point, what is the risk of having a regression found on arm7/64 on mozilla-central and finding the root cause 2 days later because it wasn't run on autoland?
Comment 28•5 years ago
More data points than opinions:
32-bit ARM appears to be at approximately 30% of our mobile population (query courtesy Mike Conca; refinements to that query are welcome): https://sql.telemetry.mozilla.org/queries/68948/source#174071. It is falling steadily but slowly, and with the delay of Fenix on the one hand and the collaboration with KaiOS on the other it is plausible to me that 32-bit ARM will remain relevant for at least the next couple of years. Obviously it is being phased out on Android, but we may be overrepresented on older phones / Android versions.
32-bit and 64-bit ARM are entirely different architectures and share very little code in our JS/Wasm engines. They are in truth about as related as 32-bit ARM and 64-bit x86. A failure in one back-end has no predictive value for the other back-end.
Our 32-bit ARM support is relatively mature and not expected to change much. Our 64-bit ARM support is relatively immature and we'll probably continue to improve it. Moving 32-bit ARM to central but keeping 64-bit ARM on autoland makes sense to me.
jit-tests and js shell reftests (JS and wasm) are both relevant for this testing. JS unit tests (C++, testing engine internals) possibly less so. I'm not the best qualified person to judge what is and is not most relevant, but without jit-tests we have zero wasm coverage, and that would be sad.
Reporter
Comment 29•5 years ago
Thanks for the good summary. I think this is a reasonable approach to get things running now without risking overloading devices.
Let's approve the current patch and add a second one to reduce the frequency of jittest/jsreftest on arm7 to m-c/try.
Assignee
Comment 30•5 years ago
Assignee
Comment 31•5 years ago
This shows the differences in tasks from the reduce patch. Note that I explicitly added release, which I believe was implied by built-projects.
Comment 32•5 years ago
Comment 33•5 years ago
Comment 34•5 years ago
bugherder
Assignee
Comment 35•5 years ago
While getting jstests.py to run tests on an android device in bug 1626351 I learned that using a Pixel 2 locally:
- It takes approximately 2 minutes to set up the device by pushing the binary, libraries and tests. Found by running 1 chunk of 100000.
- I could reliably predict the run time for the jit-tests from an estimate based on 1 chunk out of 100, which took approximately 3 minutes 42 seconds. Removing the 2 minute setup time left ~100 seconds to run 1 out of 100 chunks, which scales to 10000 seconds to run them all ignoring setup. Splitting into 10 chunks with a 2 minute setup time for each chunk gives 1000+120 seconds, or 18-19 minutes per chunk, which agrees with the values we see in production.
- Using the same approach for the jstests to run 1 chunk of 100 with --jitflags=all, it took ~14 minutes 10 seconds, or 850 seconds, including setup. Removing setup time leaves 730 seconds to run 1 out of 100 chunks, which scales to 73000 seconds to run them all ignoring setup. Running this in 10 chunks with a 2 minute setup time for each chunk gives 7300+120 seconds, or ~124 minutes per chunk.
- Eliminating the --jitflags argument reduced the time to run the jstests to 3 m 38 s, i.e. 1 m 38 s (~100 seconds) ignoring setup. This would result in the same 18-19 minutes to run 1 chunk out of 10 as we saw for the jit-tests.
We do not have sufficient devices to support running the jstests with --jitflags=all. I believe running the jstests.py with normal flags is already covered by the jsreftests which run on mozilla-central for pgo and debug.
I'll leave this open to add taskcluster/mozharness support to enable running the jstests against android at a future date.
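The back-of-the-envelope scaling above can be reproduced mechanically; this sketch just re-runs the arithmetic using the figures quoted in the comment:

```python
# Reproduce the chunk-time extrapolation from the measurements above:
# each chunk pays a fixed ~2 minute device-setup cost, and run time
# scales linearly with the fraction of the suite in the chunk.
SETUP_S = 120  # ~2 minutes to push the binary, libraries and tests

def per_chunk_seconds(measured_s, measured_chunks, target_chunks):
    """Scale one measured 1/N chunk to a 1/M chunking, re-adding setup."""
    run_only = measured_s - SETUP_S             # strip setup from the sample
    total_run = run_only * measured_chunks      # whole-suite run time
    return total_run / target_chunks + SETUP_S  # per-chunk time with setup

# jit-tests: one 1/100 chunk took ~3m42s (222 s) -> ~19 min per 1/10 chunk
jit_min = per_chunk_seconds(222, 100, 10) / 60
# jstests with --jitflags=all: one 1/100 chunk took ~850 s -> ~124 min
jst_min = per_chunk_seconds(850, 100, 10) / 60
print(round(jit_min), round(jst_min))  # -> 19 124
```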
Comment 36•5 years ago
Interesting data! We could probably find some way of pruning the JIT flags for jit-test. And for a number of tests the JIT flags probably don't matter much (for example, most of the wasm tests are probably unaffected by them). And as far as testing efficiency is concerned, perhaps it would be interesting for the team to discuss whether jit-tests could be optimized in some way; after all we've been spoiled with fast computers, and the tests are written for ease of understanding rather than performance of testing.
Comment 37•5 years ago
(In reply to Lars T Hansen [:lth] from comment #36)
Interesting data! We could probably find some way of pruning the JIT flags for jit-test. And for a number of tests the JIT flags probably don't matter much (for example, most of the wasm tests are probably unaffected by them).
The jit-test harness supports multiple JIT flags "profiles", we could just use a less heavy one for pixel 2. I don't think it needs --jitflags=all.
And yeah I think the wasm tests run x times, for each jitflag configuration, that does seem excessive.
(In reply to Bob Clary [:bc:] from comment #35)
We do not have sufficient devices to support running the jstests with --jitflags=all. I believe running the jstests.py with normal flags is already covered by the jsreftests which run on mozilla-central for pgo and debug.
Yeah it makes sense to focus on jit-tests instead of jstests. jstests are valuable but less interesting to run with multiple JIT flags on all platforms.
Comment 38•5 years ago
For the JS engine, knowing that we already run 64-bit on Linux, I think our goal is to check platform differences:
- Assembler
- Allocatable range of pointers.
- Cache flushing.
- Boxing / Unboxing
Unfortunately, we have multiple backends which have to be exercised. We have to test our BaselineInterpreter, Baseline, IC stubs, and IonMonkey, and these backends do not get equal coverage depending on the options which are used. For example, the eager compilation mode for Ion is good at finding bugs in IonMonkey but would skip too quickly over the BaselineInterpreter and Baseline, and without it there is a large chunk of untested LIR in IonMonkey, as our test cases were designed to reproduce issues quickly or as transient behavior which is rarely exhibited otherwise.
One thing to note is that we are currently not working heavily on the Aarch64 backend[1]. Thus, even though it might be impacted by other JIT changes, it might be sound to spread the testing work across multiple commits, while limiting it to the most important JIT configurations.
I would suggest running --jitflags=ion [2], while adding a ['--blinterp-eager'] configuration and the --ion-check-range-analysis option to one of the configurations.
[1] https://github.com/mozilla/gecko-dev/commits/master/js/src/jit/arm64
[2] https://searchfox.org/mozilla-central/source/js/src/tests/lib/tests.py#33-36
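The "profiles" idea referenced in [2] can be sketched as a table keyed by profile name, each entry expanding to a list of shell-flag configurations. The flag sets below are illustrative assumptions for the example, not the actual contents of js/src/tests/lib/tests.py:

```python
# Illustrative sketch of per-profile JIT flag configurations, in the
# spirit of the JITFLAGS table in js/src/tests/lib/tests.py. These flag
# lists are assumptions for the example, not the real table contents.
JITFLAGS = {
    "all": [[], ["--baseline-eager"], ["--ion-eager"], ["--no-ion"]],
    "ion": [["--ion-eager", "--ion-offthread-compile=off"]],
    "none": [[]],
}

def configurations(profile, extra=None):
    """Expand a profile name into shell-flag configurations, optionally
    appending one extra configuration (e.g. ['--blinterp-eager'])."""
    configs = [list(c) for c in JITFLAGS[profile]]
    if extra is not None:
        configs.append(list(extra))
    return configs

# The suggestion above: the "ion" profile plus a --blinterp-eager config.
configs = configurations("ion", extra=["--blinterp-eager"])
print(configs)
```

Each configuration is one full run of the test suite with those shell flags, which is why a lighter profile directly reduces device load.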
Assignee
Comment 39•4 years ago
I haven't been working on this. Removing myself to free it up for anyone else who wishes to carry it forward.
Comment 40•4 years ago
The leave-open keyword is there and there is no activity for 6 months.
:ahal, maybe it's time to close this bug?
Comment 41•4 years ago
Forwarding to Joel, since he did some work in this area recently.
On the same topic with newer data points, see Bug 1692096.
Reporter
Comment 42•4 years ago
we optimized the jittest jobs on android p2, and plan to run the full set of jittests on apple aarch64 (security review of the new provider/pool is tomorrow; ETA is this month if not next week). While this isn't spidermonkey proper, I believe this is a reasonable enough substitute?
Comment 43•4 years ago
(In reply to Joel Maher ( :jmaher ) (UTC -0800) from comment #42)
we optimized the jittest jobs on android p2, and plan to run the full set of jittests on apple aarch64 (security review of the new provider/pool tomorrow- eta is this month if not next week). While this isn't spidermonkey proper, I believe this is a reasonable enough substitute?
This sounds reasonable to me. While the system has some impact on the Aarch64 backend, I do think this is a good substitute.
Lars, what do you think?
Comment 45•3 years ago
The leave-open keyword is there and there is no activity for 6 months.
:ahal, maybe it's time to close this bug?
Comment 46•3 years ago
My read from the last time this nag came up was that this was good to close. Please re-open if my read was wrong.