Closed Bug 488345 Opened 15 years ago Closed 15 years ago

Windows 1.9.0.x unittest boxes are frequently red (often with "command timed out: 1200 seconds without output" during CVS checkout)

Categories

(Release Engineering :: General, defect, P2)

x86
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dholbert, Assigned: nthomas)

References

Details

(Keywords: intermittent-failure)

Over the past 24 hours, there have been an exceptionally large number of red cycles in the Firefox 3.0 unittest boxes.

Here are the logs & errors (with duplicates grouped together):

"LINK : fatal error LNK1104: cannot open file 'npbasic.lib'"
============================================================
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.0/1239650398.1239653571.25639.gz
WINNT 5.2 fx-win32-1.9-slave09 (pgo01) dep unit test on 2009/04/13 12:19:58

During mochitest: "command timed out: 1200 seconds without output"
==================================================================
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.0/1239650398.1239654282.29093.gz
WINNT 5.2 fx-win32-1.9-slave07 dep unit test on 2009/04/13 12:19:58
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.0/1239696000.1239701650.17426.gz
WINNT 5.2 fx-win32-1.9-slave08 dep unit test on 2009/04/14 01:00:00

During CVS checkout: "command timed out: 1200 seconds without output"
=====================================================================
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.0/1239682798.1239684195.21662.gz
WINNT 5.2 fx-win32-1.9-slave09 (pgo01) dep unit test on 2009/04/13 21:19:58
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.0/1239737340.1239738787.30551.gz
WINNT 5.2 fx-win32-1.9-slave08 dep unit test on 2009/04/14 12:29:00
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.0/1239737340.1239738846.30644.gz
WINNT 5.2 fx-win32-1.9-slave09 (pgo01) dep unit test on 2009/04/14 12:29:00

The CVS checkout one is the most annoying, because (a) it's been the most frequent during this time period, (b) it means no testing whatsoever gets done, and (c) it's definitely not code-related.
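
For anyone cross-checking the logs above: the first number in each showlog URL appears to be the build start time as a Unix timestamp (1239650398 is 2009/04/13 19:19:58 UTC, which matches the 12:19:58 shown for that run if Tinderbox is displaying Pacific Daylight Time). A rough Python sketch for pulling that out is below. The UTC-7 offset and the helper name `parse_log_url` are my assumptions, and the other two numeric fields are deliberately not interpreted.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: Tinderbox displays times in US Pacific Daylight Time (UTC-7).
PDT = timezone(timedelta(hours=-7))

# Matches the "log=<tree>/<start>.<n>.<n>.gz" tail of a showlog URL.
# Only the first numeric field is interpreted (as a Unix timestamp).
LOG_URL_RE = re.compile(r"log=(?P<tree>[^/]+)/(?P<start>\d+)\.\d+\.\d+\.gz$")


def parse_log_url(url):
    """Return (tree, build start time as 'YYYY/MM/DD HH:MM:SS' in PDT)."""
    match = LOG_URL_RE.search(url)
    if match is None:
        raise ValueError("not a recognised showlog URL: " + url)
    start = datetime.fromtimestamp(int(match.group("start")), tz=PDT)
    return match.group("tree"), start.strftime("%Y/%m/%d %H:%M:%S")


if __name__ == "__main__":
    url = ("http://tinderbox.mozilla.org/showlog.cgi"
           "?log=Firefox3.0/1239650398.1239653571.25639.gz")
    print(parse_log_url(url))
```

Sorting the URLs by that first field is a quick way to line the failures up against whatever else was happening on the build network at the time.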
Blocks: 438871
Whiteboard: [orange]
Summary: Windows 1.9.0.x unittest boxes are frequently orange/red (often with "command timed out: 1200 seconds without output" during CVS checkout) → Windows 1.9.0.x unittest boxes are frequently red (often with "command timed out: 1200 seconds without output" during CVS checkout)
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → DUPLICATE
Sorry, Ben tells me they are not related, so re-opening.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
RE - can you help investigate?
Assignee: server-ops → nobody
Component: Server Operations: Tinderbox Maintenance → Release Engineering
QA Contact: mrz → release
I'll take a look at this.
Assignee: nobody → nthomas
Priority: -- → P2
Machines are:
  fx-win32-1.9-slave07 - non-PGO - c-fcal-build-002
  fx-win32-1.9-slave08 - non-PGO - eql01-bm03
  fx-win32-1.9-slave09 - PGO     - eql01-bm03

All are running on the (busier) INTEL01 ESX cluster. 07 and 09 have out-of-date VMware tools.

I've moved 07 to INTEL02 and updated VMware tools.
07 completed checkout and compile about a minute quicker than the previous run (both sub-10-minute tasks), so that's a promising sign given the mochitest hangs that we've been mostly having.
I moved 08 over to INTEL02 as well, but it failed the first build in mochitest:
 *** 23534 INFO PASS | at min font size 18, 28px should compute to 28px
 *** 23536 INFO Running /tests/layout/style/test/test_bug405818.html...

 command timed out: 1200 seconds without output
 SIGKILL failed to kill process
Clobbered this slave, but I suspect there is a mochitest crasher hanging around.

I also moved 09 over to INTEL02; it's doing a build now.
Haven't had any CVS timeouts since the VM moves. IT are also monitoring the backend storage much more closely and should catch slowness there before it has much impact.
Status: REOPENED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Whiteboard: [orange]
Product: mozilla.org → Release Engineering