series of local disk operations timeouts on win32 builders during release builds

RESOLVED FIXED

Status

Release Engineering
General Automation
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: hwine, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [release-impacting])

(Reporter)

Description

3 years ago
At least 4 instances following the pattern:
 - hg pull into shared from hg.m.o succeeds
 - hg clone shared into builder's space succeeds
 - hg update -C times out (40 min)
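The failing sequence above can be replayed with a per-step timeout. This shows which step hangs and how long the earlier steps took. It is a minimal Python sketch, not the actual buildbot/mozharness logic. The 40-minute default mirrors the timeout described here, and the helper names `run_step` and `replay` are hypothetical.

```python
import subprocess
import time

# Default per-step limit, mirroring the 40-minute timeout seen in these jobs.
STEP_TIMEOUT_S = 40 * 60


def run_step(cmd, timeout_s=STEP_TIMEOUT_S):
    """Run one command and return (status, elapsed_seconds).

    status is "ok" (exit code 0), "failed" (non-zero exit) or "timeout".
    On timeout, subprocess.run kills the child before re-raising.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, timeout=timeout_s,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
        status = "ok" if proc.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        status = "timeout"
    return status, time.monotonic() - start


def replay(steps, timeout_s=STEP_TIMEOUT_S):
    """Run steps in order, stopping after the first one that is not "ok".

    Returns a list of (cmd, status, elapsed_seconds) tuples.
    """
    results = []
    for cmd in steps:
        status, elapsed = run_step(cmd, timeout_s)
        results.append((cmd, status, elapsed))
        if status != "ok":
            break
    return results
```

Here is a hypothetical step list that mirrors the pattern, using placeholder names SHARED, WORKDIR and UPSTREAM: `[["hg", "pull", "-R", SHARED, UPSTREAM], ["hg", "clone", "-U", SHARED, WORKDIR], ["hg", "update", "-C", "-R", WORKDIR]]`. This report does not say whether the real jobs used a plain `hg clone` or the share extension's `hg share` for the second step.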

Occurred on:
 - b-2008-ix-0127 http://buildbot-master82.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-beta-win32_repack_4%2F10/builds/21
 - b-2008-ix-0117 http://buildbot-master85.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-beta-win32_repack_2%2F10/builds/17
 - b-2008-ix-0104 http://buildbot-master85.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-beta-win32_repack_5%2F10/builds/18
 - b-2008-ix-0172 http://buildbot-master82.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-beta-win32_repack_5%2F10/builds/19

Comment 1

3 years ago
One thing I noted to Hal was that I disabled all the b-2008-sm slaves today, so we might be using b-2008-ix machines of questionable pedigree. At the very least, these slaves may never have cloned mozilla-beta.

I pulled b-2008-ix-0172 and ran an |hg update -C| by hand on c:\builds\moz2_slave\rel-m-beta-w32_rpk_5-000000000\mozilla-beta. It took 13m to complete, but I don't know whether that time was affected by the earlier attempt in the failed job:

http://buildbot-master82.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-beta-win32_repack_5%2F10/builds/19/steps/run_script/logs/stdio

A subsequent |hg update -C| on the same dir completed in just a few seconds.
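The cold-versus-warm difference above (13m for the first |hg update -C|, a few seconds for the second) can be measured by timing back-to-back runs of the same command. This is a minimal Python sketch, and `time_runs` is a hypothetical helper. On its own it cannot tell filesystem caching apart from leftover work from the earlier failed attempt.

```python
import subprocess
import time


def time_runs(cmd, runs=2, cwd=None):
    """Run cmd `runs` times back to back; return elapsed seconds per run.

    Raises subprocess.CalledProcessError if any run exits non-zero.
    """
    elapsed = []
    for _ in range(runs):
        start = time.monotonic()
        subprocess.run(cmd, cwd=cwd, check=True,
                       stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
        elapsed.append(time.monotonic() - start)
    return elapsed
```

For example, set `checkout` to the builder's mozilla-beta directory and call `time_runs(["hg", "update", "-C"], cwd=checkout)`. It returns the first and second elapsed times in seconds.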
(Reporter)

Comment 2

3 years ago
Note that those boxes are using hg client version 1.9.1. While old, there are no mentions of share-related bugs being fixed in subsequent releases (we don't use the unshare feature, which did receive bug fixes). The client will be updated in bug 1056981.
FTR, the same happened for 31.1.0esr:

b-2008-ix-0126: http://buildbot-master84.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-esr31-win32_build/builds/1
b-2008-ix-0164: http://buildbot-master84.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-esr31-win32_build/builds/2
b-2008-ix-0164: http://buildbot-master84.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-esr31-win32_build/builds/3
b-2008-ix-0103: http://buildbot-master84.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-esr31-win32_build/builds/4
Summary: series of local hg update -C timeouts on win32 builders during ff32.0b9 build1 → series of local hg update -C timeouts on win32 builders during release builds
(Reporter)

Comment 4

3 years ago
Does anyone have thoughts on how to "prime" these builders for all the various builds we have coming up over the next 2 weeks? Or do we just accept it as a possible issue on the first build from an idle branch?
(Reporter)

Comment 5

3 years ago
FTR, the same also happened for 24.8.0esr:
 - b-2008-ix-0071 http://buildbot-master86.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-esr24-win32_repack_9%2F10/builds/2
 - b-2008-ix-0109 http://buildbot-master84.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-esr24-win32_repack_4%2F10/builds/3
   NOTE: this was a timeout on an operation after a failed hg clone from hg.m.o into the shared area

 - b-2008-ix-0004 http://buildbot-master84.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-esr24-win32_repack_4%2F10/builds/4
   NOTE: this was a timeout on clobber
(Reporter)

Comment 6

3 years ago
Per the postmortem meeting, extending this bug to include anything that looks like hung or slow local disk I/O, and changed the summary to reflect that. This also seems to be part of the general unhappiness of win32 builds, so it blocks bug 1026870.

On b-2008-ix-0168 for TB 31.1.0 build, this occurred during a purge operation:
 http://buildbot-master86.srv.releng.scl3.mozilla.com:8001/builders/release-comm-esr31-win32_repack_10%2F10/builds/0
Blocks: 1026870
Summary: series of local hg update -C timeouts on win32 builders during release builds → series of local disk operations timeouts on win32 builders during release builds
Whiteboard: [release-impacting]
(Reporter)

Comment 7

3 years ago
On b-2008-ix-0109 for the TB 31.1.0 build, the repack (a local-disk-intensive operation) failed at the 40m timeout:
 http://buildbot-master84.srv.releng.scl3.mozilla.com:8001/builders/release-comm-esr31-win32_repack_8%2F10/builds/1
Blocks: 1062877
The root cause was believed to be machines that were not upgraded to VS2013 correctly (a half upgrade) and that then proceeded to finish the upgrade during a build. See bug 1062877.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED