Closed Bug 1530737 Opened 6 months ago Closed 6 months ago

unable to run talos/raptor on win/aarch64 builds in CI

Categories

(Testing :: Performance, enhancement)

Version 3

Tracking

(firefox67 fixed)

RESOLVED FIXED
mozilla67

People

(Reporter: jmaher, Assigned: gbrown)

as seen in CI:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c0abb56001522773bc87748f6042efc97dc45797

we get an error like:
15:38:01 INFO - mozrunner.errors.RunnerNotStartedError: Failed to start the process: [Error 216] <no description>

I can still run raptor-speedometer, for instance, locally with the script. I don't see any difference in the mozharness script command, nor the raptor script command. Strange...

I did verify we were downloading target.zip from the aarch64 build. I do see that xpcshell runs use the signing build instead of the regular build; possibly that is the problem.

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #0)

15:38:01 INFO - mozrunner.errors.RunnerNotStartedError: Failed to start the process: [Error 216] <no description>

Windows error 216 is ERROR_EXE_MACHINE_TYPE_MISMATCH
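
Since error 216 means the executable's machine type does not match the host, one way to confirm which architecture a given firefox.exe was built for is to read the Machine field from its PE header. A minimal sketch, assuming the PE headers sit within the first 4 KiB of the file; the function names are hypothetical and this is not part of mozrunner or raptor:

```python
import struct

# Machine constants from the PE/COFF specification.
PE_MACHINE_NAMES = {
    0x014C: "x86",
    0x8664: "x86_64",
    0xAA64: "aarch64",
}


def pe_machine(data):
    """Return the architecture recorded in a PE image's COFF header."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ signature")
    # e_lfanew, at offset 0x3C, holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")
    # The 16-bit Machine field immediately follows the 4-byte signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    # Unknown machine values are returned as a hex string.
    return PE_MACHINE_NAMES.get(machine, hex(machine))


def read_pe_machine(path):
    """Read the start of the file at `path` and report its PE machine type."""
    with open(path, "rb") as f:
        # Assumption: the headers fit in the first 4 KiB.
        return pe_machine(f.read(4096))
```

Running read_pe_machine on the firefox.exe from the task's build directory would show whether the binary is aarch64 while the worker is x86_64.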

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #0)

as seen in CI:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c0abb56001522773bc87748f6042efc97dc45797

15:38:01 INFO - raptor-main raptor config: {'binary': 'C:\Users\task_1551179659\build\application\firefox\firefox.exe', 'local_profile_dir': 'c:\users\task_1551179659\appdata\local\temp\tmphjstel.mozrunner', 'symbols_path': 'https://queue.taskcluster.net/v1/task/dxcmBDpjSte0v6bfe2cTTg/artifacts/public/build/target.crashreporter-symbols.zip', 'app': 'firefox', 'gecko_profile_entries': None, 'power_test': False, 'run_local': False, 'platform': 'win', 'host': '127.0.0.1', 'is_release_build': False, 'gecko_profile_interval': None, 'processor': 'x86_64', 'gecko_profile': False, 'obj_path': None}

NB ==> 'processor': 'x86_64'

My log (comment 3) has

18:18:04 INFO - raptor-main main raptor init, config is: {'binary': 'c:\Users\gbrown\moztest\build\application\firefox\firefox.exe', 'symbols_path': 'https://queue.taskcluster.net/v1/task/dxcmBDpjSte0v6bfe2cTTg/artifacts/public/build/target.crashreporter-symbols.zip', 'app': 'firefox', 'gecko_profile_entries': None, 'power_test': False, 'run_local': False, 'platform': 'win', 'host': '127.0.0.1', 'is_release_build': False, 'gecko_profile_interval': None, 'processor': 'ARM64', 'gecko_profile': False, 'obj_path': None}

NB ==> 'processor': 'ARM64'

But I still don't understand how we are getting different builds from the same build task, dxcmBDpjSte0v6bfe2cTTg.

could it be we are getting the same builds, but setting some env variable in our harness code?

raptor gets processor from:
https://searchfox.org/mozilla-central/source/testing/raptor/raptor/raptor.py#71

which from what I can tell is:
https://searchfox.org/mozilla-central/source/testing/mozbase/mozinfo/mozinfo/mozinfo.py#80

and what we have is:
15:37:58 INFO - 'PROCESSOR_ARCHITECTURE': 'AMD64',

which is odd, because going by the failing log's environment we should get AMD64 rather than x86_64. Also odd: why are you getting ARM64 locally? This code:
https://searchfox.org/mozilla-central/source/testing/mozbase/mozinfo/mozinfo/mozinfo.py#139

it seems to force us to x86 or x86_64.
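
To illustrate how that kind of normalization would produce both values seen in the logs, here is a simplified sketch. It is not the actual mozinfo code; the function name and exact branches are assumptions made for illustration:

```python
def normalize_processor(machine, bits):
    """Collapse platform-specific machine names to a small canonical set.

    Hypothetical helper: names that are not recognized pass through unchanged.
    """
    if machine in ("i386", "i686"):
        # 32-bit x86 names become x86 or x86_64 depending on bitness.
        return "x86" if bits == "32bit" else "x86_64"
    if machine.upper() == "AMD64":
        # The Windows name for 64-bit x86 maps to the canonical name.
        return "x86_64"
    return machine
```

Under these assumptions, AMD64 from the worker environment becomes x86_64, while ARM64 is not recognized and is reported as-is, which would match the two raptor configs quoted earlier.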

the downloaded mozinfo.json from the build is:
https://queue.taskcluster.net/v1/task/dxcmBDpjSte0v6bfe2cTTg/runs/0/artifacts/public/build/target.mozinfo.json

which has:
processor: aarch64

I wonder if this is some side effect of taskcluster generic-worker.

Interesting! There are differences in the original environments. The taskcluster worker environment has

15:37:31 INFO - 'PROCESSOR_ARCHITECTURE': 'AMD64',
15:37:31 INFO - 'PROCESSOR_IDENTIFIER': 'Intel64 Family 6 Model 94 Stepping 3, GenuineIntel',

while my local laptop has

18:18:03 INFO - 'PROCESSOR_ARCHITECTURE': 'x86',
18:18:03 INFO - 'PROCESSOR_ARCHITEW6432': 'ARM64',
18:18:03 INFO - 'PROCESSOR_IDENTIFIER': 'ARMv8 (64-bit) Family 8 Model 803 Revision 70C, Qualcomm Technologies Inc',

Oh, and the reftest logs from that same try push have PROCESSOR_ARCHITEW6432, like my local laptop.
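
For reference, the two environments above can be told apart in code. On Windows, PROCESSOR_ARCHITEW6432 is only set for a process running under WOW64 emulation, where it names the host's native architecture, while PROCESSOR_ARCHITECTURE describes the process itself. A minimal sketch, with a hypothetical function name, not taken from the harness:

```python
def native_architecture(env):
    """Return the host's native architecture from a Windows environment mapping.

    Prefers PROCESSOR_ARCHITEW6432 (present only under WOW64 emulation) and
    falls back to PROCESSOR_ARCHITECTURE. Returns None if neither is set.
    """
    return env.get("PROCESSOR_ARCHITEW6432") or env.get("PROCESSOR_ARCHITECTURE")
```

Called as native_architecture(os.environ), this would give AMD64 for the taskcluster worker environment and ARM64 for the laptop environment shown above.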

Ha!

Talos and Raptor have their own worker types! Still using x86!

[taskcluster 2019-02-26T15:36:55.557Z] Worker Type (gecko-t-win10-64-hw) settings:

Is that a consequence of virtualization: hardware in the tc yml? Should it be virtual-with-gpu for aarch64?

Assignee: nobody → gbrown
Flags: needinfo?(jmaher)

I would rather change the transform as there is work to evaluate if we can run in a virtualized environment. Nice find.

Flags: needinfo?(jmaher)
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/cdd251b5b7f1
Run windows10-aarch64 talos/raptor tasks on bitbar laptops; r=jmaher
Status: NEW → RESOLVED
Closed: 6 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67
Blocks: 1531876