Closed Bug 1545339 Opened 6 years ago Closed 6 years ago

Some gecko-t-win7-32 workers don't have a complete python3 install

Categories

(Infrastructure & Operations :: RelOps: OpenCloudConfig, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: glandium, Assigned: grenade)

Attachments

(1 file)

While working on bug 1525373 I got weird failures that I went on to debug, and while doing so, I got even weirder behavior.

After further investigation, it turns out C:\mozilla-build\python3, on some workers, isn't complete.

On one worker, where I reproduced the problem with a small script that printed the directory listing, the contents of that directory were:
['DLLs', 'Lib', 'libs', 'python3.exe', 'Scripts', 'tcl']

while on a run where everything went fine, the contents were:
['DLLs', 'Doc', 'include', 'Lib', 'libs', 'LICENSE.txt', 'NEWS.txt', 'python3.dll', 'python3.exe', 'python36.dll', 'pythonw.exe', 'Scripts', 'tcl', 'Tools', 'vcruntime140.dll']
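
The two listings can be compared mechanically. The following is a minimal sketch, not the script actually used here: the helper name `missing_entries` is hypothetical, and the expected set is simply copied from the healthy listing above.

```python
# Entries seen on a healthy worker, copied from the listing above.
EXPECTED = {
    'DLLs', 'Doc', 'include', 'Lib', 'libs', 'LICENSE.txt', 'NEWS.txt',
    'python3.dll', 'python3.exe', 'python36.dll', 'pythonw.exe',
    'Scripts', 'tcl', 'Tools', 'vcruntime140.dll',
}


def missing_entries(actual, expected=EXPECTED):
    """Return expected names absent from `actual` (case-insensitive match)."""
    present = {name.lower() for name in actual}
    absent = (name for name in expected if name.lower() not in present)
    return sorted(absent, key=str.lower)


# Listing from the broken worker described above.
broken = ['DLLs', 'Lib', 'libs', 'python3.exe', 'Scripts', 'tcl']
print(missing_entries(broken))
```

On a worker, the same check would be `missing_entries(os.listdir(r'C:\mozilla-build\python3'))`. For the broken listing, both `python36.dll` and `vcruntime140.dll` are among the missing names.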

Running python3.exe in the former just outputs "Exit Code: -1073741515"
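
That exit code is easier to read as an unsigned 32-bit NTSTATUS value. A small sketch of the conversion (the helper name `to_ntstatus` is made up for illustration):

```python
def to_ntstatus(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned NTSTATUS hex string."""
    return "0x{:08X}".format(exit_code & 0xFFFFFFFF)


print(to_ntstatus(-1073741515))  # 0xC0000135
```

0xC0000135 is STATUS_DLL_NOT_FOUND, which is consistent with `python36.dll` and `vcruntime140.dll` being absent from the first listing.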

FWIW, the worker I got with the incomplete python3 was i-08c0fce6e303ca682.

Another one: i-0b4d5d0ab0112d3da

Summary: Some windows workers don't have a complete python3 install → Some gecko-t-win7-32 workers don't have a complete python3 install

There's another kind of broken worker, with a different set of files, that fails differently:
['DLLs', 'Lib', 'LICENSE.txt', 'NEWS.txt', 'python3.dll', 'python3.exe', 'python36.dll', 'pythonw.exe', 'Scripts', 'vcruntime140.dll']

Those fail with:

Fatal Python error: Py_Initialize: unable to load the file system codec
ModuleNotFoundError: No module named 'encodings'

Example of a worker I got in that state: i-0c3fc6af7e360d426
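
The interpreter imports the `encodings` package from its standard library at startup. Since `Lib` does appear in that listing, the directory is presumably only partially populated (an assumption; the contents of `Lib` were not captured here). A quick probe for this case could look like the sketch below, where `has_encodings` is a hypothetical helper:

```python
import os


def has_encodings(python_root):
    """True if the standard-library `encodings` package exists under Lib."""
    init_py = os.path.join(python_root, "Lib", "encodings", "__init__.py")
    return os.path.isfile(init_py)
```

On a worker this would be called as `has_encodings(r'C:\mozilla-build\python3')`.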

Assignee: nobody → rthijssen
Status: NEW → ASSIGNED
Component: Workers → RelOps: OpenCloudConfig
Product: Taskcluster → Infrastructure & Operations
QA Contact: rthijssen

debugging today, i can see that the python install is very frequently failing on windows 7 with log messages like this:

May 07 14:33:04 i-0fbe0390616d5fac9.gecko-t-win7-32.euc1.mozilla.com occ-dsc: Invoke-LoggedCommandRun (Python3) :: command (C:\windows\Temp\0aecc2a136909051f4099015b9cc0ac52155160203e1cea2c82f397178818388c43d152b8678b98cd8a8d871626204909b21ba91b0d7aae566642f2f81570ebd.exe /quiet InstallAllUsers=1 TargetDir=C:\mozilla-build\python3) exited with code: 1603 after a processing time of: 00:00:00.0312002

1603 is a generic "Fatal Error During Installation", which is not helpful as to what the hell is going on :(
MS site has a few suggestions of things to try:
https://support.microsoft.com/en-us/help/834484/you-receive-an-error-1603-a-fatal-error-occurred-during-installation
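
To see how often this happens across workers, the exit code and processing time can be pulled out of such log lines. This is a sketch that assumes every relevant line ends like the message quoted above; `parse_exit` and the regex are illustrative, not part of occ.

```python
import re

# Matches the tail of the Invoke-LoggedCommandRun message quoted above.
LOG_RE = re.compile(
    r"exited with code: (?P<code>-?\d+) "
    r"after a processing time of: (?P<elapsed>\d+:\d+:\d+(?:\.\d+)?)"
)


def parse_exit(line):
    """Return (exit_code, seconds) for a matching log line, else None."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    hours, minutes, seconds = match.group("elapsed").split(":")
    elapsed = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return int(match.group("code")), elapsed
```

For the line above this yields exit code 1603 and roughly 0.03 seconds, so the installer command returned almost immediately.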

this is a fairly ugly rabbit hole.

  • on 64 bit windows, we install a newer version of mozilla-build which contains python 3
  • on 32 bit windows, we install mozilla-build 2.2 which was the last version to support 32 bit systems and does not contain python 3
  • we install python 3 using the exe installer which creates (among other things) c:\mozilla-build\python3\python.exe
  • since we want c:\mozilla-build\python3 directory in the system path (see bug 1505057) but we also want python.exe calls to default to python 2, we rename c:\mozilla-build\python3\python.exe to c:\mozilla-build\python3\python3.exe so that calls to python are deterministic about which python version gets called
  • the python 3 installer in use (https://www.python.org/ftp/python/3.6.3/python-3.6.3.exe) has an interesting quirk: it returns the session without waiting for the install to complete. this means that our bootstrap process thinks the python 3 install has completed when it is in fact still in progress.
  • because of this, our current incomplete installation errors are caused by the rename of python3\python.exe to python3\python3.exe while the python 3 installer is still running. the installer actually calls and runs python3\python.exe as part of the install process in order to build components of the install. that process is interrupted by the rename that is triggered before the python 3 install has completed.

i am testing several mechanisms to prevent the rename from occurring before the install process has completed. the most promising is also the ugliest, e.g.:

python-3.6.3.exe /quiet /repair InstallAllUsers=1 TargetDir=C:\mozilla-build\python3 && sleep 120
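
For comparison, a fixed sleep can be swapped for polling with a timeout. The sketch below is illustrative only and is not what occ does: `wait_for_paths` is a hypothetical helper, and the presence of files does not prove that the installer's background processes have finished, so it would not by itself make the rename safe.

```python
import os
import time


def wait_for_paths(paths, timeout=120.0, interval=1.0,
                   exists=os.path.exists, sleep=time.sleep,
                   clock=time.monotonic):
    """Poll until every path exists; return False if `timeout` seconds pass first."""
    deadline = clock() + timeout
    while True:
        if all(exists(path) for path in paths):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The `exists`, `sleep` and `clock` parameters exist only so the loop can be exercised without touching the file system or actually waiting.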

Attached file GitHub Pull Request

i tried a number of approaches to making occ wait for the python exe install to complete and none were reliable. the problem is that the installer spawns a number of background processes to install the various python components and there isn't a good mechanism for tracking completion on all of them.

what does work is to use the individual msi component installers and install each one separately. no single msi installer that includes all components is provided so, as far as i can see, this is the only reliable mechanism available to us.
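
The idea can be sketched as a sequence of blocking `msiexec` calls, one per component. This is not the code from the pull request: the component names, the `TARGETDIR` property and both helper functions are assumptions made for illustration.

```python
import ntpath
import subprocess

# Hypothetical component list; the real set and order depend on the
# MSI files shipped for the python version being installed.
COMPONENTS = ["core", "exe", "lib"]


def msi_commands(components, msi_dir, target_dir):
    """Build one msiexec command line per component MSI."""
    return [
        ["msiexec", "/i", ntpath.join(msi_dir, name + ".msi"),
         "/quiet", "TARGETDIR=" + target_dir]
        for name in components
    ]


def install_all(commands, run=subprocess.run):
    """Run the installs one at a time; stop at the first non-zero exit code."""
    for command in commands:
        result = run(command)  # subprocess.run blocks until the process exits
        if result.returncode != 0:
            return result.returncode
    return 0
```

Because each call blocks until its `msiexec` process exits, a later step such as the python.exe rename only runs after every component has reported completion.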

i have tested this on the windows 7 beta workers and consistently arrived at complete installations of python 3. all other mechanisms i tried resulted in roughly a 50/50 chance of a complete installation existing at the time of first task execution.

since i would like the time invested in fixing the windows 7 python 3 install for our infra to last a while, i also updated python 3 from 3.6.3 to 3.7.3, which is the current recommended stable version. the same mechanism would work for 3.6.3, but staying on it would mean doing this all over again soon.

Attachment #9064700 - Flags: review?(mcornmesser)
Attachment #9064700 - Flags: feedback?(mh+mozilla)
Attachment #9064700 - Flags: review?(mcornmesser) → review+

deployment is complete

Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Comment on attachment 9064700 (GitHub Pull Request):
Seems to have worked.
Attachment #9064700 - Flags: feedback?(mh+mozilla) → feedback+
Blocks: 1557614