Intermittent-infra mozmake.EXE[5]: *** [win32.obj] Error 127 OR [win64.obj] Error 127

RESOLVED FIXED in mozilla56

Status

defect
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: aryx, Assigned: pmoore)

Tracking

({intermittent-failure})

Trunk
mozilla56

Details

(Whiteboard: [stockwell infra])

Attachments

(1 attachment)

https://treeherder.mozilla.org/logviewer.html#?job_id=98866334&repo=autoland

08:01:59     INFO -  mozmake.EXE[5]: Leaving directory 'z:/build/build/src/obj-firefox/dom/plugins/test/testplugin/secondplugin'
08:01:59     INFO -  dtlscon.c
08:01:59     INFO -  z:/build/build/src/sccache2/sccache.exe z:/build/build/src/vs2015u3/VC/bin/amd64_x86/cl.exe -Foprfile.obj -c  -DNDEBUG=1 -DTRIMMED=1 -D_NSPR_BUILD_ -DWIN32 -DXP_PC -D_PR_GLOBAL_THREADS_ONLY -DWIN95 -UWINNT -D_X86_ -Iz:/build/build/src/config/external/nspr/pr -Iz:/build/build/src/obj-firefox/config/external/nspr/pr -Iz:/build/build/src/config/external/nspr -Iz:/build/build/src/nsprpub/pr/include -Iz:/build/build/src/nsprpub/pr/include/private -Iz:/build/build/src/obj-firefox/dist/include  -Iz:/build/build/src/obj-firefox/dist/include/nspr -Iz:/build/build/src/obj-firefox/dist/include/nss        -MD -FI z:/build/build/src/obj-firefox/mozilla-config.h -DMOZILLA_CLIENT -deps.deps/prfile.obj.pp  -TC -nologo -wd4091 -D_HAS_EXCEPTIONS=0 -W3 -Gy -Zc:inline -utf-8 -arch:SSE2 -Gw -wd4244 -wd4267 -we4553  -Z7 -O1 -Oi -Oy-    z:/build/build/src/nsprpub/pr/src/io/prfile.c
08:01:59     INFO -  z:/build/build/src/sccache2/sccache.exe z:/build/build/src/vs2015u3/VC/bin/amd64_x86/cl.exe -Fos_asinh.obj -c -Iz:/build/build/src/obj-firefox/dist/stl_wrappers  -DNDEBUG=1 -DTRIMMED=1 -DMOZ_HAS_MOZGLUE -Iz:/build/build/src/modules/fdlibm/src -Iz:/build/build/src/obj-firefox/modules/fdlibm/src  -Iz:/build/build/src/obj-firefox/dist/include  -Iz:/build/build/src/obj-firefox/dist/include/nspr -Iz:/build/build/src/obj-firefox/dist/include/nss        -MD -FI z:/build/build/src/obj-firefox/mozilla-config.h -DMOZILLA_CLIENT -deps.deps/s_asinh.obj.pp  -TP -nologo -wd5026 -wd5027 -Zc:sizedDealloc- -wd4091 -wd4577 -D_HAS_EXCEPTIONS=0 -W3 -Gy -Zc:inline -utf-8 -arch:SSE2 -Gw -wd4251 -wd4244 -wd4267 -wd4800 -wd4595 -we4553 -GR-  -Z7 -O1 -Oi -Oy- -WX -wd4018 -wd4146 -wd4305 -wd4723 -wd4756   z:/build/build/src/modules/fdlibm/src/s_asinh.cpp
08:01:59     INFO -  z:\build\build\src\js\src\ctypes\libffi\msvcc.sh: line 235: cl: command not found
08:01:59     INFO -  z:/build/build/src/config/rules.mk:1055: recipe for target 'win32.obj' failed
08:01:59     INFO -  mozmake.EXE[5]: *** [win32.obj] Error 127
08:01:59     INFO -  mozmake.EXE[5]: Leaving directory 'z:/build/build/src/obj-firefox/config/external/ffi'
08:01:59     INFO -  z:/build/build/src/config/recurse.mk:73: recipe for target 'config/external/ffi/target' failed
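
A note on the exit status in the log above: 127 is the status a POSIX shell returns when it cannot find the command it was asked to run, which matches the "cl: command not found" line from msvcc.sh. The sketch below reproduces that status with a deliberately bogus command name (made up for illustration, not taken from the build). It assumes a POSIX `sh` is available on PATH.

```python
import subprocess


def shell_exit_status(command_line):
    """Run command_line through sh and return the shell's exit status."""
    result = subprocess.run(
        ["sh", "-c", command_line],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


# Hypothetical command name, chosen so that it will not exist on PATH.
status = shell_exit_status("no-such-compiler-for-bug-1364651 -c win32.c")
print(status)  # a POSIX shell reports 127 for "command not found"
```

make then surfaces the recipe's exit status as "Error 127", so the number points at a missing executable rather than at a compiler diagnostic.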
Summary: Intermittent-infra → Intermittent-infra mozmake.EXE[5]: *** [win32.obj] Error 127 OR [win64.obj] Error 127
Since this bug is about taskcluster jobs running on taskcluster instances where (insert some handwaving here) something starts deleting things like entire directories or essential programs like cl.exe in the middle of a build, let's let taskcluster have it.
Component: Mozharness → General
Product: Release Engineering → Taskcluster
Version: unspecified → Trunk
Pete, do you have some background on this?
Assignee: nobody → pmoore
Whiteboard: [stockwell infra]
Blocks: 1367404
Blocks: 1365918
Blocks: 1367329
just a thought:

very early in the build-on-taskcluster-windows experiment, we had weird problems with builds failing with strange race conditions that we didn't properly understand.

we could make the builds succeed by adding -j1 to the mach command. as i understand it, this forced mach to do everything sequentially rather than in parallel. the downside was that builds using -j1 would take upwards of 4 hours to complete.

one day we discovered, through much trial and error, that wrapping calls to mach with bash.exe (instead of python.exe) magically made all the race conditions go away without the use of -j1, so building in parallel just worked. we didn't try to understand why; we were just very happy with our good fortune.

i notice that on may 3rd, changes landed that got rid of the magic bash hack (see https://hg.mozilla.org/mozilla-central/rev/843439b1f0d5#l1.22). i also note that this bug was opened 10 days later. i don't know if this is a coincidence.
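
For anyone who hasn't seen the hack in question, here is a rough sketch of the idea: on a MozillaBuild machine, launch the mach script through bash.exe rather than through python.exe. The function name, its parameters, and the msys/bin/bash.exe location under MOZILLABUILD are assumptions for illustration, not a copy of the mozharness code.

```python
import os
import sys


def mach_command(env, src_dir, python_exe=sys.executable):
    """Return the argv prefix used to launch mach.

    Hypothetical helper: the name and signature are made up for this sketch.
    """
    mach_script = os.path.join(src_dir, 'mach')
    if 'MOZILLABUILD' in env:
        # Windows build machine: run mach via bash instead of python.
        # The bash.exe location below is an assumed MozillaBuild layout.
        bash = os.path.join(env['MOZILLABUILD'], 'msys', 'bin', 'bash.exe')
        return [bash, mach_script]
    # Everywhere else: run mach with the Python interpreter directly.
    return [python_exe, mach_script]


# Example: the command a build step might run, with build arguments appended.
print(mach_command(os.environ, 'src') + ['build'])
```

The only thing that changes between the two branches is the program that launches mach; the arguments passed to mach stay the same, and no -j1 is involved.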
Assignee

Updated

2 years ago
See Also: → 1361912
I'm going to push a backout of that patch to see what happens.
Keywords: leave-open

Comment 13

2 years ago
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/54163bd59f7b
Add back the hack invoking mach via bash to see if it makes the TC build machines happy again. r=pmoore
And so we're clear, here's a rough list of the problems that may have been caused by reverting that hack back in early May:
https://docs.google.com/spreadsheets/d/1T5SL6jflnRByIVfNt4-MNiXQje4Ov6qMWiqs42L0hcg/edit#gid=0
I did 10 runs of every TC Windows build job on the push in comment 13 and saw not a single instance of the failures covered by lines 2-7 in the spreadsheet. I think we have a winner!

Greg, do you want to investigate this more for a root cause or should we close the bug out when it merges around?
Flags: needinfo?(gps)
I'd love to investigate the root cause. But my understanding is that grenade and others burned hours on this esoteric workaround as part of early TC work. I'm not inclined to spend several hours to reach the same head-scratching conclusion. The workaround - hacky and mysterious as it is - works for me.
Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(gps)
Resolution: --- → FIXED
arr has asked me to annotate the code base with a pointer to this bug, to warn authors of future refactors.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
reopen is only to get review-board to accept the commit. can be closed again on merge.
Duplicate of this bug: 1365918
Duplicate of this bug: 1367404
No longer blocks: 1367329

Comment 24

2 years ago
mozreview-review
Comment on attachment 8876650 [details]
Bug 1364651 - annotate mach bash hack;

https://reviewboard.mozilla.org/r/147998/#review152514

::: testing/mozharness/mozharness/mozilla/building/buildbase.py:1626
(Diff revision 1)
>              self.copyfile(
>                  buildprops,
>                  os.path.join(dirs['abs_work_dir'], 'buildprops.json'))
>  
>          if 'MOZILLABUILD' in os.environ:
> +            # here be dragons. see bug 1364651

NIT: let's put more actual info into the comment than a scary pointer at a bug:

"We found many issues with intermittent build failures when not invoking mach via bash. See bug 1364651 before considering changing"

or some such.

All in all though, +1 to commenting and I won't block on a second round of review.
Attachment #8876650 - Flags: review?(bugspam.Callek) → review+
Assignee

Comment 26

2 years ago
(In reply to Justin Wood (:Callek) from comment #24)
> NIT: let's put more actual info into the comment than a scary pointer at a
> bug:
> 
> "We found many issues with intermittent build failures when not invoking
> mach via bash. See bug 1364651 before considering changing"
> 
> or some such.

FWIW I (personally) prefer the succinctness and immediate danger of "here be dragons". A wordy comment can be more readily overlooked.

Comment 27

2 years ago
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/71a661bc483c
annotate mach bash hack; r=Callek
Keywords: checkin-needed

Comment 28

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/71a661bc483c
Status: REOPENED → RESOLVED
Closed: 2 years ago → 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56