Closed Bug 1557614 Opened 5 years ago Closed 2 years ago

Some gecko-t-win64-aarch64-laptop don't have a complete python3 install

Categories

(Infrastructure & Operations :: RelOps: OpenCloudConfig, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: glandium, Assigned: grenade)

Attachments

(1 file, 1 obsolete file)

+++ This bug was initially created as a clone of Bug #1545339 +++

When bug 1525373 landed, it caused problems similar to those that made me file bug 1545339... but on the gecko-t-win64-aarch64-laptop workers.

No longer blocks: 1525373
Assignee: nobody → rthijssen
Status: NEW → ASSIGNED
Attached file GitHub Pull Request

this patch updates arm64 laptops to use the python 3 msi installers in the same way as the windows 7 32 bit workers do (windows aarch64 systems are compatible with x86 (32 bit) software binaries).

a slight complication is that we need to add a few arguments to the msi install command:

  • InstallAllUsers=1
  • TargetDir=C:\mozilla-build\python3

windows 10 hardware workers (both x86_64 and aarch64) bypass dsc for their software installations and use a custom install mechanism from the occ powershell module. that module lives in the occ gamma branch, which is a few commits ahead of occ master.

i have already updated the module to support adding the additional msi install arguments.
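
As an illustration of the extra installer arguments listed above, here is a minimal sketch that assembles a command line carrying both properties. It assumes the installer is launched through `msiexec /i` with `/quiet`, which the comment does not state; the function name and the installer path in the usage line are hypothetical and are not taken from the occ module.

```python
from pathlib import PureWindowsPath


def build_msi_install_command(msi_path, target_dir, all_users=True):
    """Return an msiexec argument list for a quiet install with extra properties.

    Assumption: the install goes through `msiexec /i ... /quiet`.
    Only InstallAllUsers and TargetDir come from the bug comment.
    """
    properties = {
        "InstallAllUsers": "1" if all_users else "0",
        "TargetDir": str(PureWindowsPath(target_dir)),
    }
    command = ["msiexec", "/i", str(PureWindowsPath(msi_path)), "/quiet"]
    # MSI properties are passed as NAME=value pairs after the switches.
    command += [f"{name}={value}" for name, value in properties.items()]
    return command


if __name__ == "__main__":
    # Hypothetical installer location, for illustration only.
    print(build_msi_install_command(r"C:\Windows\Temp\python3-x86.msi",
                                    r"C:\mozilla-build\python3"))
```

Building the arguments as a list rather than one string avoids quoting problems if a path ever contains spaces.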

Attachment #9070511 - Flags: review?(mcornmesser)
Attachment #9070511 - Flags: review?(mcornmesser) → review+
Depends on: 1558192

What is the status, here?

Flags: needinfo?(rthijssen)

apologies for the slow response here. the problem is that the aarch64 systems are not running occ at boot, so they don't pick up the new configuration that uses the python 3 installer scripts.
either someone at bitbar manually runs occ on these systems or we wait for their migration to ronin (bug 1530414).

Flags: needinfo?(rthijssen)

Any update here?

Flags: needinfo?(rthijssen)

(In reply to Mike Hommey [:glandium] from comment #5)

> Any update here?

no.
the aarch64 occ implementation is broken. we don't have a way to update them without someone at bitbar manually running commands on each instance.
this is compounded by the fact that we don't intend to fix occ on these and are migrating these workers to ronin puppet.
we don't yet have puppet manifests for aarch64. this platform uses x86 rather than x86_64 installers, so the manifests still need to be written.

so there are still a few yaks to shave before this will be fixed.

Flags: needinfo?(rthijssen)

Apparently, we have been able to fix the x86-64 hardware at bitbar in bug 1569091. Can we do the same somehow here?

Flags: needinfo?(rthijssen)

it's in progress

Flags: needinfo?(rthijssen)

i have been patching yoga systems today to make them run occ between tasks (which also causes them to pick up the python 3 install).

the following aarch64 systems have picked up the patch:

  • yoga-001
  • yoga-002
  • yoga-003
  • yoga-004
  • yoga-005
  • yoga-008
  • yoga-009
  • yoga-010
  • yoga-011
  • yoga-012
  • yoga-014
  • yoga-015
  • yoga-016
  • yoga-017
  • yoga-018
  • yoga-019
  • yoga-020
  • yoga-021
  • yoga-022
  • yoga-023
  • yoga-027

this represents 21 out of 35 systems. i will check back in the morning to see if all systems have picked up the patch. if not we might need manual intervention at bitbar on the remaining systems.

none of the remaining 14 systems picked up the patch overnight so i am going to attempt to force them to do so. if that fails, i will ask bitbar to intervene when the sun comes up over santa monica boulevard...

the remaining working systems have been patched.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Blocks: 1578963

Comment on attachment 9080162 [details]
Bug 1557614 - Enable run-task on aarch64 laptop workers.

Revision D39100 was moved to bug 1578963. Setting attachment 9080162 [details] to obsolete.

Attachment #9080162 - Attachment is obsolete: true

The gecko-t-win64-aarch64-laptop workers still have a busted python 3. Example from today: https://tools.taskcluster.net/groups/ROHjH_hWT8a2ihaxnrsPUw/tasks/SDImf8bNTxGKjXojr1Y3PQ/runs/0/logs/public%2Flogs%2Flive.log

Status: RESOLVED → REOPENED
Flags: needinfo?(rthijssen)
Resolution: FIXED → ---

investigating

Flags: needinfo?(rthijssen)

i believe i have found the issues.

  • on at least 1 system (yoga-026), occ has not run and has not installed python3. generic worker is continuously rebooting the system before it has had a chance to run occ. i believe this is due to an outdated gw-wrapper script which i will attempt to patch using the elevated task mechanism.
  • on many of the rest of these systems, python 3.7 was installed by occ, however it was installed over the top of a previous python 3.6 installation. the 3.6 install includes C:\mozilla-build\python3\python3.exe. the 3.7 install does not. it includes only C:\mozilla-build\python3\python.exe. this explains the broken python behaviours, since the task linked in comment 13 above unwittingly calls the broken python 3.6 install. i have patched this by using elevated tasks to first remove C:\mozilla-build\python3\python3.exe and then replace it with a symlink to C:\mozilla-build\python3\python.exe.
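
The python3.exe repair described in the second bullet can be sketched roughly as follows. This is not the elevated task that was actually run; the function name is hypothetical, and on Windows creating a symlink generally requires elevated rights or developer mode.

```python
import os
from pathlib import Path


def relink_python3(install_dir):
    """Replace a stale python3.exe with a symlink to python.exe in the same directory."""
    install_dir = Path(install_dir).resolve()
    current = install_dir / "python.exe"   # binary from the newer install
    stale = install_dir / "python3.exe"    # leftover from the older install
    if not current.exists():
        raise FileNotFoundError(f"{current} is missing; nothing to link to")
    # Remove the leftover binary (or an earlier link) before creating the new link.
    if stale.exists() or stale.is_symlink():
        stale.unlink()
    os.symlink(current, stale)
    return stale


if __name__ == "__main__":
    print(relink_python3(r"C:\mozilla-build\python3"))
```

After the relink, calls to python3.exe resolve to the same interpreter as python.exe, so tasks that hard-code either name get the same install.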

on a final note, i see that the task in comment 13 appears to invoke python 2.7. i don't know if this is intentional or not but the command reads:
C:/mozilla-build/python3/python3.exe run-task -- c:\mozilla-build\python\python.exe -u mozharness\scripts\desktop_unittest.py ...
which to my reasoning looks like a python 3 call wrapping a python 2 call. this might be what is intended or it might be something else.
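
To make the wrapping visible, here is a small sketch that splits the quoted command at the `--` separator into the wrapper part and the wrapped part. The helper name is hypothetical, and the arguments elided with "..." in the comment are left out rather than guessed.

```python
def split_wrapper_command(argv):
    """Split argv at the first '--' into (wrapper, wrapped) argument lists."""
    if "--" not in argv:
        return list(argv), []
    index = argv.index("--")
    return list(argv[:index]), list(argv[index + 1:])


if __name__ == "__main__":
    # Arguments as quoted in the comment; the elided tail is omitted.
    argv = [
        "C:/mozilla-build/python3/python3.exe", "run-task", "--",
        r"c:\mozilla-build\python\python.exe", "-u",
        r"mozharness\scripts\desktop_unittest.py",
    ]
    wrapper, wrapped = split_wrapper_command(argv)
    print("wrapper interpreter:", wrapper[0])
    print("wrapped interpreter:", wrapped[0])
```

The first element of each half is the interpreter that runs it, which shows the python3 directory on the outside and the python directory on the inside.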

the following yoga systems are awol from the worker explorer and not taking tasks:

  • yoga-002
  • yoga-031
  • yoga-032

if someone at bitbar has a moment, please run the following command from an elevated powershell prompt on the above machines (they should reboot themselves when it completes):

iex (New-Object Net.WebClient).DownloadString('https://raw.githubusercontent.com/mozilla-releng/OpenCloudConfig/master/userdata/rundsc.ps1')

backlog cleanup

Status: REOPENED → RESOLVED
Closed: 5 years ago → 2 years ago
Resolution: --- → INVALID