At least 4 instances following the pattern:
 - hg pull into shared from hg.m.o succeeds
 - hg clone shared into builder's space succeeds
 - hg update -C times out (40 min)

Occurred on:
 - b-2008-ix-0127
   http://buildbot-master82.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-beta-win32_repack_4%2F10/builds/21
 - b-2008-ix-0117
   http://buildbot-master85.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-beta-win32_repack_2%2F10/builds/17
 - b-2008-ix-0104
   http://buildbot-master85.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-beta-win32_repack_5%2F10/builds/18
 - b-2008-ix-0172
   http://buildbot-master82.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-beta-win32_repack_5%2F10/builds/19
One thing I noted to Hal was that I disabled all the b-2008-sm slaves today, so we might be using b-2008-ix machines of questionable pedigree. At the very least, these slaves may never have cloned mozilla-beta.

I pulled b-2008-ix-0172, and ran an |hg update -C| by hand on c:\builds\moz2_slave\rel-m-beta-w32_rpk_5-000000000\mozilla-beta. It took 13m to complete, but I don't know whether that would have been affected by the attempt in the failed job:
http://buildbot-master82.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-beta-win32_repack_5%2F10/builds/19/steps/run_script/logs/stdio

A subsequent |hg update -C| on the same dir completed in just a few seconds.
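The by-hand comparison above (a first |hg update -C| versus a repeat run in the same dir) could be scripted roughly as follows. This is only a sketch: it assumes Python 3 and `hg` on the PATH, and the names `time_command` and `compare_runs` are made up for illustration and are not part of any existing releng tooling. The 40 minute default simply mirrors the timeout seen in the failed jobs.

```python
import subprocess
import sys
import time


def time_command(cmd, cwd=None, timeout_s=40 * 60):
    """Run cmd and return (elapsed_seconds, returncode).

    returncode is None if the command was killed after timeout_s seconds.
    """
    start = time.monotonic()
    try:
        returncode = subprocess.run(cmd, cwd=cwd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        returncode = None
    return time.monotonic() - start, returncode


def compare_runs(repo, runs=2):
    """Time several consecutive `hg update -C` runs in the working copy `repo`."""
    results = []
    for _ in range(runs):
        results.append(time_command(["hg", "update", "-C"], cwd=repo))
    return results
```

Calling `compare_runs` with the path of the working copy returns one `(elapsed, returncode)` pair per run, so a slow first run followed by a fast second run would show up directly in the numbers. It does not by itself separate slow local disk I/O from leftovers of an earlier failed job.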
Note that those boxes are using hg client version 1.9.1. While old, there are no mentions of share-related bugs being fixed in subsequent releases (we don't use the unshare feature, which did receive bug fixes). The client will be updated in bug 1056981.
FTR, the same happened for 31.1.0esr:
 - b-2008-ix-0126
   http://buildbot-master84.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-esr31-win32_build/builds/1
 - b-2008-ix-0164
   http://buildbot-master84.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-esr31-win32_build/builds/2
 - b-2008-ix-0164
   http://buildbot-master84.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-esr31-win32_build/builds/3
 - b-2008-ix-0103
   http://buildbot-master84.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-esr31-win32_build/builds/4
Anyone have any thoughts on how to "prime" these builders for all the various builds we have coming up over the next 2 weeks? Or just take it as a possible issue on first build from idle branch?
FTR, also same for 24.8.0esr:
 - b-2008-ix-0071
   http://buildbot-master86.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-esr24-win32_repack_9%2F10/builds/2
 - b-2008-ix-0109
   http://buildbot-master84.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-esr24-win32_repack_4%2F10/builds/3
   NOTE: this was timeout on operation after failed hg clone from hg.m.o into shared area
 - b-2008-ix-0004
   http://buildbot-master84.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-esr24-win32_repack_4%2F10/builds/4
   NOTE: this was timeout on clobber
Per postmortem meeting, extending to include anything that looks like hung/slow local disk I/O. Changed summary to reflect that.

Also seems to be part of the general unhappiness of win32 builds, so blocks bug 1026870.

On b-2008-ix-0168 for TB 31.1.0 build, this occurred during a purge operation:
http://buildbot-master86.srv.releng.scl3.mozilla.com:8001/builders/release-comm-esr31-win32_repack_10%2F10/builds/0
On b-2008-ix-0109 for TB 31.1.0 build, repack (local disk intensive operation) failed at 40m timeout:
http://buildbot-master84.srv.releng.scl3.mozilla.com:8001/builders/release-comm-esr31-win32_repack_8%2F10/builds/1
The root cause was believed to be machines that were not upgraded to VS2013 correctly (a half-finished upgrade), which then proceeded to finish the upgrade during a build. See bug 1062877.