Closed Bug 1364651 Opened 7 years ago Closed 7 years ago

Intermittent-infra mozmake.EXE[5]: *** [win32.obj] Error 127 OR [win64.obj] Error 127

Categories

(Taskcluster :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla56

People

(Reporter: aryx, Assigned: pmoore)

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell infra])

Attachments

(1 file)

https://treeherder.mozilla.org/logviewer.html#?job_id=98866334&repo=autoland

08:01:59     INFO -  mozmake.EXE[5]: Leaving directory 'z:/build/build/src/obj-firefox/dom/plugins/test/testplugin/secondplugin'
08:01:59     INFO -  dtlscon.c
08:01:59     INFO -  z:/build/build/src/sccache2/sccache.exe z:/build/build/src/vs2015u3/VC/bin/amd64_x86/cl.exe -Foprfile.obj -c  -DNDEBUG=1 -DTRIMMED=1 -D_NSPR_BUILD_ -DWIN32 -DXP_PC -D_PR_GLOBAL_THREADS_ONLY -DWIN95 -UWINNT -D_X86_ -Iz:/build/build/src/config/external/nspr/pr -Iz:/build/build/src/obj-firefox/config/external/nspr/pr -Iz:/build/build/src/config/external/nspr -Iz:/build/build/src/nsprpub/pr/include -Iz:/build/build/src/nsprpub/pr/include/private -Iz:/build/build/src/obj-firefox/dist/include  -Iz:/build/build/src/obj-firefox/dist/include/nspr -Iz:/build/build/src/obj-firefox/dist/include/nss        -MD -FI z:/build/build/src/obj-firefox/mozilla-config.h -DMOZILLA_CLIENT -deps.deps/prfile.obj.pp  -TC -nologo -wd4091 -D_HAS_EXCEPTIONS=0 -W3 -Gy -Zc:inline -utf-8 -arch:SSE2 -Gw -wd4244 -wd4267 -we4553  -Z7 -O1 -Oi -Oy-    z:/build/build/src/nsprpub/pr/src/io/prfile.c
08:01:59     INFO -  z:/build/build/src/sccache2/sccache.exe z:/build/build/src/vs2015u3/VC/bin/amd64_x86/cl.exe -Fos_asinh.obj -c -Iz:/build/build/src/obj-firefox/dist/stl_wrappers  -DNDEBUG=1 -DTRIMMED=1 -DMOZ_HAS_MOZGLUE -Iz:/build/build/src/modules/fdlibm/src -Iz:/build/build/src/obj-firefox/modules/fdlibm/src  -Iz:/build/build/src/obj-firefox/dist/include  -Iz:/build/build/src/obj-firefox/dist/include/nspr -Iz:/build/build/src/obj-firefox/dist/include/nss        -MD -FI z:/build/build/src/obj-firefox/mozilla-config.h -DMOZILLA_CLIENT -deps.deps/s_asinh.obj.pp  -TP -nologo -wd5026 -wd5027 -Zc:sizedDealloc- -wd4091 -wd4577 -D_HAS_EXCEPTIONS=0 -W3 -Gy -Zc:inline -utf-8 -arch:SSE2 -Gw -wd4251 -wd4244 -wd4267 -wd4800 -wd4595 -we4553 -GR-  -Z7 -O1 -Oi -Oy- -WX -wd4018 -wd4146 -wd4305 -wd4723 -wd4756   z:/build/build/src/modules/fdlibm/src/s_asinh.cpp
08:01:59     INFO -  z:\build\build\src\js\src\ctypes\libffi\msvcc.sh: line 235: cl: command not found
08:01:59     INFO -  z:/build/build/src/config/rules.mk:1055: recipe for target 'win32.obj' failed
08:01:59     INFO -  mozmake.EXE[5]: *** [win32.obj] Error 127
08:01:59     INFO -  mozmake.EXE[5]: Leaving directory 'z:/build/build/src/obj-firefox/config/external/ffi'
08:01:59     INFO -  z:/build/build/src/config/recurse.mk:73: recipe for target 'config/external/ffi/target' failed
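For readers unfamiliar with the exit status: 127 is the code a POSIX shell reports when it cannot find the command it was asked to run, which matches the `cl: command not found` line from msvcc.sh in the log above. A minimal Python sketch of that behaviour, assuming a POSIX `sh` is available on PATH; the helper `shell_exit_status` and the command name are hypothetical and chosen only for illustration:

```python
import subprocess


def shell_exit_status(command):
    """Run `command` through sh and return the shell's exit status."""
    result = subprocess.run(
        ["sh", "-c", command],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


# Hypothetical command name, picked so that it does not exist on PATH.
missing = shell_exit_status("no_such_compiler_for_bug_1364651 -c foo.c")
# `true` exists everywhere and always succeeds.
found = shell_exit_status("true")
print(missing, found)
```

make then surfaces the shell's status as `Error 127` for the failing target, so the error says the compiler could not be located at that moment rather than that compilation itself failed.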
Summary: Intermittent-infra → Intermittent-infra mozmake.EXE[5]: *** [win32.obj] Error 127 OR [win64.obj] Error 127
Since this bug is about taskcluster jobs running on taskcluster instances where (insert some handwaving here) something starts deleting things like entire directories or essential programs like cl.exe in the middle of a build, let's let taskcluster have it.
Component: Mozharness → General
Product: Release Engineering → Taskcluster
Version: unspecified → Trunk
Pete, do you have some background on this?
Assignee: nobody → pmoore
Whiteboard: [stockwell infra]
Blocks: 1367404
Blocks: 1365918
Blocks: 1367329
just a thought:

very early in the build-on-taskcluster-windows experiment, we had weird problems with builds failing with strange race conditions that we didn't properly understand.

we could make the builds succeed by adding -j1 to the mach command. as i understand it, this forced mach to do everything sequentially rather than in parallel. the downside was that builds using -j1 would take upwards of 4 hours to complete.

one day we discovered, through much trial and error, that wrapping calls to mach with bash.exe (instead of python.exe) magically made all the race conditions go away without the use of -j1, so building in parallel just worked. we didn't try to understand why; we were just very happy with our good fortune.

i notice that on may 3rd, changes landed that got rid of the magic bash hack (see https://hg.mozilla.org/mozilla-central/rev/843439b1f0d5#l1.22). i also note that this bug was opened 10 days later. i don't know if this is a coincidence.
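The workaround described above can be sketched roughly as follows. This is a minimal illustration and not the actual mozharness code: the function name `mach_command`, its parameters, and the `msys/bin/bash.exe` layout under `MOZILLABUILD` are assumptions made for the example.

```python
import os
import sys


def mach_command(src_dir, mach_args, environ=None):
    """Build the argv used to launch mach.

    In a MozillaBuild (Windows) environment, launch mach through bash.exe
    rather than python.exe, which is the workaround discussed in this bug.
    Elsewhere, launch it with the current Python interpreter.
    """
    environ = os.environ if environ is None else environ
    mach = os.path.join(src_dir, "mach")
    if "MOZILLABUILD" in environ:
        # Assumed location of bash inside a MozillaBuild install.
        bash = os.path.join(environ["MOZILLABUILD"], "msys", "bin", "bash.exe")
        launcher = [bash, mach]
    else:
        launcher = [sys.executable, mach]
    return launcher + list(mach_args)


# Example: the same build request with and without MOZILLABUILD set.
with_bash = mach_command("src", ["build"], environ={"MOZILLABUILD": "mozilla-build"})
without_bash = mach_command("src", ["build"], environ={})
print(with_bash)
print(without_bash)
```

The only difference between the two launch paths is the first element of the argument list, which is why the hack is easy to remove by accident during a refactor.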
See Also: → 1361912
I'm going to push a backout of that patch to see what happens.
Keywords: leave-open
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/54163bd59f7b
Add back the hack invoking mach via bash to see if it makes the TC build machines happy again. r=pmoore
And so we're clear, here's a rough list of the problems that may have been caused by reverting that hack back in early May:
https://docs.google.com/spreadsheets/d/1T5SL6jflnRByIVfNt4-MNiXQje4Ov6qMWiqs42L0hcg/edit#gid=0
I did 10 runs of every TC Windows build job on the push in comment 13 and saw not a single instance of the failures covered by lines 2-7 in the spreadsheet. I think we have a winner!

Greg, do you want to investigate this more for a root cause or should we close the bug out when it merges around?
Flags: needinfo?(gps)
I'd love to investigate root cause. But my understanding is grenade and others burned hours on this esoteric workaround as part of early TC work. I'm not inclined to spend several hours to reach the same head-scratching conclusion. The workaround - hacky and mysterious as it is - works for me.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(gps)
Resolution: --- → FIXED
arr has asked me to annotate the code-base pointing to this bug, to warn future refactor authors
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
reopen is only to get review-board to accept the commit. can be closed again on merge.
No longer blocks: 1367329
Comment on attachment 8876650 [details]
Bug 1364651 - annotate mach bash hack;

https://reviewboard.mozilla.org/r/147998/#review152514

::: testing/mozharness/mozharness/mozilla/building/buildbase.py:1626
(Diff revision 1)
>              self.copyfile(
>                  buildprops,
>                  os.path.join(dirs['abs_work_dir'], 'buildprops.json'))
>  
>          if 'MOZILLABUILD' in os.environ:
> +            # here be dragons. see bug 1364651

NIT: let's put more actual info into the comment than a scary pointer at a bug:

"We found many issues with intermittent build failures when not invoking mach via bash. See bug 1364651 before considering changing"

or some such.

All in all though, +1 to commenting and I won't block on a second round of review.
Attachment #8876650 - Flags: review?(bugspam.Callek) → review+
(In reply to Justin Wood (:Callek) from comment #24)
> NIT: let's put more actual info into the comment than a scary pointer at a
> bug:
> 
> "We found many issues with intermittent build failures when not invoking
> mach via bash. See bug 1364651 before considering changing"
> 
> or some such.

FWIW I (personally) prefer the succinctness and immediate danger of "here be dragons". A wordy comment can be more readily overlooked.
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/71a661bc483c
annotate mach bash hack; r=Callek
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/71a661bc483c
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56