Closed Bug 1618919 Opened 5 years ago Closed 3 years ago

run aarch64 spidermonkey jobs on pixel 2 hardware

Categories

(Testing :: General, defect, P2)

Version 3
defect

Tracking

(firefox75 affected)

RESOLVED FIXED
mozilla75
Tracking Status
firefox75 --- affected

People

(Reporter: jmaher, Assigned: bc)

References

(Blocks 1 open bug)

Details

Attachments

(4 files)

We have windows/aarch64 running spidermonkey jobs in order to get ARM coverage for jittest/jsreftest. The key point is that this substitutes for android/arm64 coverage as well (we do run android/arm7).

When this change was made almost a year ago, it was an oversight that these jobs run on a windows builder and not on a windows/aarch64 machine.

I see two things to fix:

  1. get the spidermonkey tests decoupled from the build, or get everything running on aarch64 hardware (probably not ideal)
  2. get the tests running on aarch64 hardware

I imagine this will take quite a while to get running as it isn't just changing a few config files.

I also noticed we spend ~15 minutes at the end of the job uploading llvm-symbolizer.gz. That is unrelated to this, but worth revisiting as we probably don't use llvm-symbolizer.gz at the frequency it is generated/uploaded.

The original bug 1552051 had :sstangl as a contact; I don't think :sstangl is at Mozilla anymore.

I filed bug 1618923 about the llvm-symbolizer.gz upload.

I filed bug 1618924 to disable the current test as it doesn't provide any value.

Assignee: nobody → bob
Status: NEW → ASSIGNED
Pushed by bclary@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/33cffdf747c8 Remove sm-plain-win64-aarch64/opt, r=jmaher.
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla75

I had intended this bug to be about getting the tests running on aarch64 hardware.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

dmajor, bbouvier, sfink, callek: I could use your guidance here.

It appears to me that the SM jobs are designed as build jobs using autospider.py and require checkouts and toolchains on the builder. We are using Yoga C630 laptops for our win64-aarch64 coverage. To run these SM jobs on them we would require aarch64 toolchains as well, correct? We only cross-compile our win64-aarch64 builds on normal Windows 10 x86_64 builders and don't have aarch64-native toolchains available?

What approach would you suggest for getting the spidermonkey tests actually running on win64-aarch64?

Flags: needinfo?(sphink)
Flags: needinfo?(dmajor)
Flags: needinfo?(bugspam.Callek)
Flags: needinfo?(bbouvier)

(In reply to Bob Clary [:bc:] from comment #8)

dmajor, bbouvier, sfink, callek: I could use your guidance here.

What approach would you suggest for getting the spidermonkey tests actually running on win64-aarch64?

My preference would be the first choice in c#0 (decouple the build from the tests); that may not be easy/ideal.

The alternative in my mind would be to find some way to run these jobs in the cloud, with an aarch64 Windows builder provided by relops. :fubar, is that possible? (If so, I'll leave it to you to chat with :bc about timeline/priority.)

Flags: needinfo?(klibby)

Given the available choices, I highly recommend splitting builds from tests. Not only is that the architecturally nicer approach, but in practical terms, putting compilation on arm64-windows machines would be extremely slow and would leave us with an ongoing corner-case maintenance burden. In the long term we'd probably have wanted to split the tasks anyway, as we move towards cross-compilation from Linux.

The good news is that Spidermonkey tests are a lot more self-contained than Gecko ones, and to my knowledge there isn't any inherent technical requirement for them to run on the build machine. In the past I've downloaded js.exe binaries from CI and run them through jit-test.py locally on my own machine, and it wasn't painful. This work may require some moderate Taskcluster-slinging, but I don't expect to have to fight with the Spidermonkey testing architecture itself.
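As a rough illustration of that workflow, here is a minimal local sketch: fetch the packaged shell from a build task and point the jit-test harness at it. The task ID is a placeholder, and the artifact path, queue URL, and invocation details are assumptions for illustration, not details taken from this bug.

    # Minimal sketch: download the packaged JS shell produced by a CI build task
    # and run the jit-test harness against it. Task ID is hypothetical; the
    # artifact path and queue URL are assumptions.
    import subprocess
    import urllib.request
    import zipfile

    TASK_ID = "REPLACE_WITH_BUILD_TASK_ID"  # hypothetical build task
    ARTIFACT_URL = (
        "https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/"
        + TASK_ID + "/artifacts/public/build/target.jsshell.zip"
    )

    # Fetch and unpack the shell package produced by the build task.
    urllib.request.urlretrieve(ARTIFACT_URL, "target.jsshell.zip")
    with zipfile.ZipFile("target.jsshell.zip") as zf:
        zf.extractall("jsshell")

    # Run the jit-test suite from a mozilla-central checkout against that shell.
    subprocess.check_call(
        ["python3", "js/src/jit-test/jit_test.py", "jsshell/js.exe"]
    )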

Flags: needinfo?(dmajor)
Flags: needinfo?(bugspam.Callek)

I agree that splitting builds from tests is the better approach, and it should not be all that difficult to do today. There is no technical requirement to run on the build machines. The reasons it works this way are (1) the JS shell test suite used to be relatively minuscule, and once you'd done a build it was faster and less resource-intensive to just run the tiny test suite rather than packaging things up + upload + download; (2) setting up nonstandard build/test-dependent jobs used to be a PITA during the BuildBot days; but most of all (3) laziness and inertia. #1 and #2 are no longer the case, especially now that we have the shell running the full test262 and wpt suites.

#3 is still very much in force, but can be overcome with a sufficiently motivating justification for change like we have here. ;-)

Related to #1, the JS shell used to take very little time to compile, so the overall SM jobs were very fast and adding upload/download overhead would have dramatically increased their latency. We've managed to slow the build down to the point where I'd guess sharing a single build is most likely now a decent resource win as well.

Flags: needinfo?(sphink)

Makes sense to me to split builds from tests too. I managed to cross-compile Spidermonkey for ARMv7 in the past. Unfortunately, I didn't keep any notes when doing so, but I vaguely recall that setting the CC/CXX/LD environment variables to a gcc cross-toolchain was sufficient. The build system may have changed enough that this isn't valid anymore, and maybe it would just work to set the proper --target when running configure.
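For concreteness, a rough sketch of that approach under stated assumptions: the toolchain names, target triple, and build directory below are illustrative, not the configuration actually used in CI.

    # Rough sketch of cross-compiling the JS shell as described above: point
    # configure at a cross toolchain via CC/CXX/LD and/or --target. Toolchain
    # names, the target triple, and the build directory are illustrative
    # assumptions, not the real CI configuration.
    import os
    import subprocess

    build_dir = "js/src/build-aarch64"
    os.makedirs(build_dir, exist_ok=True)

    env = dict(os.environ)
    env.update({
        "CC": "aarch64-linux-gnu-gcc",   # hypothetical cross-compilers
        "CXX": "aarch64-linux-gnu-g++",
        "LD": "aarch64-linux-gnu-ld",
    })

    # Either the environment variables above or an explicit --target (or both)
    # may be enough, per the recollection in the comment.
    subprocess.check_call(
        ["../configure", "--target=aarch64-unknown-linux-gnu", "--enable-optimize"],
        cwd=build_dir, env=env,
    )
    subprocess.check_call(["make", "-j8"], cwd=build_dir, env=env)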

Flags: needinfo?(bbouvier)
Blocks: 1620207

OK, I'll try to reproduce the tests run by autospider using a target.jsshell.zip fetched from a build task.

Flags: needinfo?(klibby)

A potential subtlety to watch out for, which we wouldn't notice while tests are passing, is debug info / stack traces. I recall that some time ago we didn't have all the PDB files available to download from SM build tasks (bug 1428574). It's an old bug that might have changed shape over the years, but it might still be present in some form. It would be good to double-check.

Bug 1621295 inspired me to think that maybe we could run these on arm64 simulators the same way it is done on arm32 simulators.

tcampbell: Thoughts?

Flags: needinfo?(tcampbell)

We already run SM(arm64). Unless there is a need to actually run these on Windows on physical hardware, in particular the Yoga C630 laptops, I would vote to WONTFIX this.

lth says that SM(aarch64/arm64) would be better / more relevant on phones than on the laptops. If I must, can we just drop these laptops and go with SM(aarch64) on the Pixel 2 devices?

Flags: needinfo?(lhansen)

From the point of view of the JS/wasm engine, we probably don't care much about which arm64 chip / device we're testing on, so long as we're testing on arm64 hardware. A current-ish phone will in any case be better than an obsolete-ish laptop / tablet.

It's concerning to me that it sounds like we've had no CI testing on arm64 hardware lately (and, I'm guessing, ARM generally). The simulators are just not that good. WONTFIX'ing this bug in the hope that we'll get around to adding tests on the Pixel "later" is not appealing, to say the least.

Flags: needinfo?(lhansen)

We'll get this resolved much sooner than "later".

Flags: needinfo?(tcampbell)

The priority flag is not set for this bug.
:gbrown, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(gbrown)
Flags: needinfo?(gbrown)
Priority: -- → P2

Lars: is running this as pgo only (not opt or debug), and requiring --full on try, OK with you?

Flags: needinfo?(lhansen)

(In reply to Bob Clary [:bc:] from comment #21)

Lars: is running this as pgo only (not opt or debug), and requiring --full on try, OK with you?

Anything's better than nothing, and I guess even if most people won't catch problems on try they will be caught eventually. I presume those restrictions are there to reduce the load on the machines?

Flags: needinfo?(lhansen)

Yes. We see a fair number of try -p all -u all pushes, which easily overwhelm the limited number of devices. Thanks.

See Also: → 1623327

Re-summarizing to match my plan, which is to restore the previous coverage on aarch64 Pixel 2 devices immediately while developing the future SM Pixel 2 plan. If we go with an SM-type job, I would expect we could replace the current jittests with SM on arm7 and aarch64.

Keywords: leave-open
Summary: win/aarch64 spidermonkey jobs are running on normal win10 builders → run aarch64 spidermonkey jobs on pixel 2 hardware

Is jittest critical for both arm7 and aarch64? There is a high cost to pay for running this and I want to make sure it makes sense. Questions might be:

  • market share of arm7 vs arm64?
  • regressions found on arm7 in the last year - would these be the same for arm64?
  • why not jsreftests? spidermonkey?
  • runtime/cost of running these?
  • arm7 is on autoland - could we make it mozilla-central only?

For the last point, what is the risk of a regression being found on arm7/64 on mozilla-central, with the root cause only identified 2 days later because the tests weren't run on autoland?

More data points than opinions:

32-bit ARM appears to be at approximately 30% of our mobile population (query courtesy Mike Conca; refinements to that query are welcome): https://sql.telemetry.mozilla.org/queries/68948/source#174071. It is falling steadily but slowly, and with the delay of Fenix on the one hand and the collaboration with KaiOS on the other it is plausible to me that 32-bit ARM will remain relevant for at least the next couple of years. Obviously it is being phased out on Android, but we may be overrepresented on older phones / Android versions.

32-bit and 64-bit ARM are entirely different architectures and share very little code in our JS/Wasm engines. They are in truth about as related as 32-bit ARM and 64-bit x86. A failure in one back-end has no predictive value for the other back-end.

Our 32-bit ARM support is relatively mature and not expected to change much. Our 64-bit ARM support is relatively immature and we'll probably continue to improve it. Moving 32-bit ARM to central but keeping 64-bit ARM on autoland makes sense to me.

jit-tests and js shell reftests (JS and wasm) are both relevant for this testing. JS unit tests (C++, testing engine internals) possibly less so. I'm not the best qualified person to judge what is and is not most relevant, but without jit-tests we have zero wasm coverage, and that would be sad.

Thanks for the good summary. I think this is a reasonable approach to get things running now without risking overloading the devices.

Let's approve the current patch and add a second one to reduce the frequency of jittest/jsreftest on arm7 to mozilla-central/try.

This shows the differences in tasks from the reduce patch. Note that I explicitly added release, which I believe was previously implied by built-projects.

Pushed by bclary@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/baeef2812684 re-enable jit-tests on android-hw-aarch64 to match android-hw-arm7 r=jmaher,aerickson
Pushed by bclary@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/694aa1f078c3 reduce frequency of jittest/jsreftest on arm7 to mozilla-central/try, add testing on release branches, r=jmaher.
See Also: → 1626351

While getting jstests.py to run tests on an Android device in bug 1626351, I learned the following using a Pixel 2 locally (a worked version of these estimates follows the list):

  • It takes approximately 2 minutes to set up the device by pushing the binary, libraries, and tests. (Found by running 1 chunk out of 100000, which leaves essentially only setup time.)
  • I could reliably predict the run time for the jit-tests from an estimate based on 1 chunk out of 100, which took approximately 3 minutes 42 seconds. Removing the 2-minute setup time left ~100 seconds to run 1 out of 100 chunks, which scales to 10000 seconds to run them all, ignoring setup. Splitting into 10 chunks with a 2-minute setup time for each chunk gives 1000+120 seconds, or 18-19 minutes per chunk, which agrees with the values we see in production.
  • Using the same approach for the jstests, running 1 chunk out of 100 with --jitflags=all took ~14 minutes 10 seconds, or 850 seconds including setup. Removing setup time leaves 730 seconds to run 1 out of 100 chunks, which scales to 73000 seconds to run them all, ignoring setup. Running this in 10 chunks with a 2-minute setup time for each chunk gives 7300+120 seconds, or ~124 minutes per chunk.
  • Eliminating the --jitflags argument reduced the time to run 1 chunk out of 100 of the jstests to 3 minutes 38 seconds, i.e. 1 minute 38 seconds (~100 seconds) ignoring setup. This would result in the same 18-19 minutes per chunk across 10 chunks as we saw for the jit-tests.
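For reference, the arithmetic behind these estimates, using only the numbers quoted above (a worked example, not a measurement tool):

    # Worked version of the estimates above; all inputs come from the comment.
    SETUP_S = 120  # ~2 minutes to push the binary, libraries and tests

    def per_chunk_minutes(sample_seconds, sampled_chunks=100, target_chunks=10):
        """Scale one sampled chunk (out of sampled_chunks, including setup) to the
        expected per-chunk time when the suite is split into target_chunks."""
        run_only = sample_seconds - SETUP_S       # strip setup from the sample
        total_run = run_only * sampled_chunks     # whole suite, ignoring setup
        return (total_run / target_chunks + SETUP_S) / 60

    print(per_chunk_minutes(222))  # jit-tests, 3m42s sample  -> ~19 minutes/chunk
    print(per_chunk_minutes(850))  # jstests --jitflags=all   -> ~124 minutes/chunk
    print(per_chunk_minutes(218))  # jstests, default flags   -> ~18 minutes/chunk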

We do not have sufficient devices to support running the jstests with --jitflags=all. I believe running the jstests.py with normal flags is already covered by the jsreftests which run on mozilla-central for pgo and debug.

I'll leave this open to add taskcluster/mozharness support to enable running the jstests against android at a future date.

Interesting data! We could probably find some way of pruning the JIT flags for jit-test. And for a number of tests the JIT flags probably don't matter much (for example, most of the wasm tests are probably unaffected by them). And as far as testing efficiency is concerned, perhaps it would be interesting for the team to discuss whether jit-tests could be optimized in some way; after all we've been spoiled with fast computers, and the tests are written for ease of understanding rather than performance of testing.

(In reply to Lars T Hansen [:lth] from comment #36)

Interesting data! We could probably find some way of pruning the JIT flags for jit-test. And for a number of tests the JIT flags probably don't matter much (for example, most of the wasm tests are probably unaffected by them).

The jit-test harness supports multiple JIT flag "profiles"; we could just use a less heavy one for the Pixel 2. I don't think it needs --jitflags=all.

And yeah, I think the wasm tests run once per jitflag configuration; that does seem excessive.

(In reply to Bob Clary [:bc:] from comment #35)

We do not have sufficient devices to support running the jstests with --jitflags=all. I believe running the jstests.py with normal flags is already covered by the jsreftests which run on mozilla-central for pgo and debug.

Yeah it makes sense to focus on jit-tests instead of jstests. jstests are valuable but less interesting to run with multiple JIT flags on all platforms.

For the JS engine, given that we are already running 64-bit testing on Linux, I think our goal is to check for platform differences:

  • Assembler
  • Allocatable range of pointers.
  • Cache flushing.
  • Boxing / Unboxing

Unfortunately, we have multiple backends which have to be exercised. We have to test our Baseline Interpreter, Baseline, IC stubs, and IonMonkey, and these backends do not get equal coverage depending on the options used. For example, the eager compilation mode for Ion is good at finding bugs in IonMonkey but would skip too quickly over the Baseline Interpreter and Baseline, while without it a large chunk of IonMonkey's LIR goes untested, as our test cases were designed to reproduce issues quickly or to exhibit transient behavior that is rarely seen otherwise.

One thing to note is that we are currently not doing much active work on the AArch64 backend [1]. Thus, even if it might be impacted by other JIT changes, it might be sound to spread the testing work across multiple commits, while limiting it to the most important JIT configurations.

I would suggest running --jitflags=ion [2], while adding a ['--blinterp-eager'] configuration and the --ion-check-range-analysis option to one of the configurations (see the sketch after the footnotes).

[1] https://github.com/mozilla/gecko-dev/commits/master/js/src/jit/arm64
[2] https://searchfox.org/mozilla-central/source/js/src/tests/lib/tests.py#33-36
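To make that suggestion concrete, here is a rough sketch of such a reduced profile. The flag sets below are assembled only from the flags named in this comment and combined by assumption; this is not the actual JITFLAGS table from js/src/tests/lib/tests.py, and the expansion of --jitflags=ion shown is illustrative.

    # Illustrative reduced jit-flags profile for the Pixel 2, assembled from the
    # flags named in this comment; not the real JITFLAGS table in lib/tests.py.
    PIXEL2_JITFLAGS = [
        ["--baseline-eager"],                            # compile with Baseline eagerly
        ["--ion-eager", "--ion-offthread-compile=off"],  # compile with Ion eagerly, on-thread
        ["--blinterp-eager"],                            # enter the Baseline Interpreter eagerly
        ["--ion-eager", "--ion-check-range-analysis"],   # Ion with extra range-analysis checking
    ]

    def shell_commands(js_binary, test_path):
        """Yield one illustrative shell invocation per flag configuration."""
        for flags in PIXEL2_JITFLAGS:
            yield [js_binary, *flags, test_path]

    for cmd in shell_commands("js", "jit-test/tests/example/example.js"):
        print(" ".join(cmd))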

Severity: normal → S3

I haven't been working on this. Removing myself to free it up for anyone else who wishes to carry it forward.

Assignee: bob → nobody
Status: REOPENED → NEW

The leave-open keyword is there and there is no activity for 6 months.
:ahal, maybe it's time to close this bug?

Flags: needinfo?(ahal)

Forwarding to Joel, since he did some work in this area recently.
On the same topic with newer data points, see Bug 1692096.

Flags: needinfo?(ahal) → needinfo?(jmaher)

We optimized the jittest jobs on the Android Pixel 2, and plan to run the full set of jittests on Apple aarch64 (the security review of the new provider/pool is tomorrow; ETA is this month if not next week).

While this isn't spidermonkey proper, I believe this is a reasonable enough substitute?

Flags: needinfo?(jmaher)

(In reply to Joel Maher ( :jmaher ) (UTC -0800) from comment #42)

We optimized the jittest jobs on the Android Pixel 2, and plan to run the full set of jittests on Apple aarch64 (the security review of the new provider/pool is tomorrow; ETA is this month if not next week).

While this isn't spidermonkey proper, I believe this is a reasonable enough substitute?

This sounds reasonable to me. While the operating system has some impact on the AArch64 backend, I do think this is a good substitute.

Lars, what do you think?

Flags: needinfo?(lhansen)

I agree, this is a good outcome.

Flags: needinfo?(lhansen)

The leave-open keyword is there and there is no activity for 6 months.
:ahal, maybe it's time to close this bug?

Flags: needinfo?(ahal)

My read from the last time this nag came up was that this was good to close. Please re-open if my read was wrong.

Status: NEW → RESOLVED
Closed: 5 years ago → 3 years ago
Flags: needinfo?(ahal)
Resolution: --- → FIXED
Assignee: nobody → bob