Closed Bug 691467 Opened 13 years ago Closed 13 years ago

hg_tool.py should clobber busted repo on non-share code path too (Win64 builds frequently fail in make buildsymbols with "Failed to get HG Repo for e:\builds\moz2_slave\(tree)\build")

Categories

(Release Engineering :: General, defect, P5)

x86_64
Windows Server 2008
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Assigned: catlee)

References

Details

(Whiteboard: [win64][hg])

The follow-on exception, "UnboundLocalError: local variable 'cleanroot' referenced before assignment", is bug 534992, but before that comes the failure this bug is about:

https://tbpl.mozilla.org/php/getParsedLog.php?id=6656226&tree=Fx-Team
WINNT 6.1 x86-64 fx-team build on 2011-10-03 10:44:16 PDT for push cc34d8625f6b

echo building symbol store
building symbol store
rm -f -r ./dist/crashreporter-symbols
rm -f "./dist/firefox-10.0a1.en-US.win64-x86_64.crashreporter-symbols.zip"
e:/builds/moz2_slave/fx-team-w64/build/obj-firefox/config/nsinstall.exe -D ./dist/crashreporter-symbols
c:/mozilla-build/python/python2.6.exe /e/builds/moz2_slave/fx-team-w64/build/toolkit/crashreporter/tools/symbolstore.py \
	  -c --vcs-info -i                                          \
	  -s /e/builds/moz2_slave/fx-team-w64/build               \
	  /e/builds/moz2_slave/fx-team-w64/build/toolkit/crashreporter/tools/win32/dump_syms_vc1500.exe                                                \
	  ./dist/crashreporter-symbols                                   \
	  . >                                        \
	  ./dist/crashreporter-symbols/firefox-10.0a1-WINNT-20111003104333-win64-fx-team-symbols.txt
Processing file: .\accessible\public\ia2\IA2Marshal.pdb
Failed to get HG Repo for e:\builds\moz2_slave\fx-team-w64\build
Whiteboard: [win64]
The one mozilla-central failure and the several ionmonkey failures are all on w64-ix-slave17, the ux failures are on w64-ix-slave12, and the fx-team failure is on w64-ix-slave19.

Looking more closely at slave12, and assuming the rest are the same, the recent history is:
Build number	Master	Result	Start time
25	bm13-build1	2	2011-10-09 16:49:55
24	bm13-build1	2	2011-10-09 13:44:03
23	bm13-build1	4	2011-10-07 09:45:56
22	bm13-build1	0	2011-10-01 16:59:55
21	bm13-build1	0	2011-09-29 21:26:55

Build 23 was a clone timeout:
command: START
command: hg clone -r 70e30fe7d2ab198263e535ecb6810149d5b1ecd8 http://hg.mozilla.org/projects/ux e:\\\\builds\\\\moz2_slave\\\\ux-w64\\\\build
command: cwd: e:\builds\moz2_slave\ux-w64
command: output:

command timed out: 3600 seconds without output, attempting to kill
SIGKILL failed to kill process
using fake rc=-1
program finished with exit code -1
----
which is neatly covered by bug 693202.

Then build 24 didn't mention clobbering and did an hg *pull*, which reported
 added 78317 changesets with 373134 changes to 80355 files
so it effectively cloned all of the history into a malformed existing repo. It failed in 'make buildsymbols' because there is no ux-w64\.hg\hgrc file. Build 25 also failed in buildsymbols.
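The symptom described above (a checkout with history but no hgrc) can be detected with a simple file check. This is only a minimal sketch: the helper name has_hgrc is hypothetical, and it assumes that a missing .hg/hgrc is a good enough signal of a busted checkout. It is not taken from hg_tool.py or symbolstore.py.

```python
import os


def has_hgrc(checkout_dir):
    """Return True if checkout_dir contains a .hg/hgrc file.

    Hypothetical helper: a checkout left behind by an interrupted clone
    may have a .hg directory but no hgrc, which is the state described
    in this bug.
    """
    return os.path.isfile(os.path.join(checkout_dir, ".hg", "hgrc"))
```

A caller would pass the build directory of the slave-branch combination being checked and treat a False result as a reason to clobber.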

The other machines/branches have a similar pattern - a purple build followed by red - so I suspect they really are the same.
The short term fix is to clobber the appropriate slave-branch combinations, namely:
* w64-ix-slave17 ionmonkey
* w64-ix-slave12 ux
which I've done. The other combinations have had green builds since the failures: either someone set a clobber, or we went a week without using the build dir and it got auto-clobbered.

The longer term fixes are
* make hg.m.o not take too long to clone (bug 693202)
* hg_tool.py should recover better when buildbot kills it, on the non-share code path (this bug)

Note that we don't use hg share on win64, but do on other platforms. Bug 685124 fixed the same problem on the code path that does use 'hg share'.
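As a rough illustration of what "recover better" could mean on the non-share code path, the sketch below decides between a fresh clone and a pull, and clobbers a directory that looks like the debris of a killed clone. The function name plan_checkout and the hgrc-based health check are assumptions made for this example. This is not the actual fix, which is tracked in bug 687064.

```python
import os
import shutil


def plan_checkout(dest):
    """Decide how to populate dest; returns "clone" or "pull".

    Assumption for this sketch: a directory that exists but has no
    .hg/hgrc is the leftover of an interrupted clone. It is removed
    (clobbered) so that the next step is a clean clone, not a pull
    into a malformed repo.
    """
    hgrc = os.path.join(dest, ".hg", "hgrc")
    if os.path.isfile(hgrc):
        # Looks like a complete checkout: updating in place is fine.
        return "pull"
    if os.path.exists(dest):
        # Busted or partial checkout: remove it before cloning again.
        shutil.rmtree(dest)
    return "clone"
```

The check runs at the start of the next build, not in an exception handler, because a process that buildbot kills never gets the chance to clean up after itself.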
Blocks: 685124
Depends on: 693202
Summary: Win64 builds frequently fail in make buildsymbols with "Failed to get HG Repo for e:\builds\moz2_slave\(tree)\build" → hg_tool.py should clobber busted repo on non-share code path too (Win64 builds frequently fail in make buildsymbols with "Failed to get HG Repo for e:\builds\moz2_slave\(tree)\build")
I've added the hgrc for the win64 jaegermonkey build on w64-ix-slave11, since it failed to clone and is currently compiling a second build.
OS: Windows Server 2003 → Windows Server 2008
Priority: -- → P5
Whiteboard: [win64] → [win64][hg]
Maybe a broadening, maybe actually a separate bug:

https://tbpl.mozilla.org/php/getParsedLog.php?id=6946490&tree=Mozilla-Inbound is w32-ix-slave41 doing a Win32 build, "Disabling sharing since share extension doesn't seem to work," then timing out doing a full clone, then bug 690232 SIGKILL failed to kill process.

https://tbpl.mozilla.org/php/getParsedLog.php?id=6948378&tree=Mozilla-Inbound is its next mozilla-inbound build, when it must have hit this, given itself a busted repo, and died in buildsymbols.
https://tbpl.mozilla.org/php/getParsedLog.php?id=6951179&tree=Firefox (retrigger must have beaten the clobberer)
catlee has a fix for this in bug 687064.
Depends on: 687064
Assignee: nobody → catlee
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering