Closed Bug 624210 Opened 14 years ago Closed 14 years ago

attach linux-ix-slave06 to buildbot-master3.b.m.o when it's back

Categories

(Release Engineering :: General, defect, P3)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Unassigned)

References

Details

(Whiteboard: [buildslaves][hardware][slaveduty])

The 'rm_configs' step of builder "Android R7 tryserver build" has been failing repeatedly on slave linux-ix-slave06.

rm -rf configs
 in dir /builds/slave/try-mb-br-andrd-r7-bld/build (timeout 1200 secs)
 watching logfiles {}
 argv: ['rm', '-rf', 'configs']
 environment: ...
 closing stdin
 using PTY: True

command timed out: 1200 seconds without output, killing pid 3638
process killed by signal 9
program finished with exit code -1
elapsedTime=1200.682441

I checked earlier, passing builds, and this step was taking steadily longer with each build:

<several failures here>
Sat Jan 8 06:11:05 2011: 19m 13s
<failed here>
Sat Jan 8 00:41:16 2011: 18m 23s
Fri Jan 7 21:53:04 2011: 17m 24s
Fri Jan 7 17:36:04 2011: 13m 36s

From my quick look through the preceding steps, I don't know exactly how this directory is checked out, but it's a checkout of build/buildbot-configs, so it's not huge by any means. The steadily increasing time to rm is an interesting data point!

I've disabled buildbot on this slave and added it to the slave tracking spreadsheet.
[cltbld@linux-ix-slave06 slave]$ mv buildbot.tac buildbot.tac.bug624210
[cltbld@linux-ix-slave06 slave]$ touch DO_NOT_START
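The rm_configs timings reported above can be compared against the 1200-second step timeout to see how little headroom was left before the failures began. Since `rm -rf` prints nothing, the "1200 seconds without output" limit is effectively a limit on the whole step. This is a minimal sketch, assuming the durations are transcribed by hand from the comment; `to_seconds`, `headroom` and `TIMEOUT_SECS` are illustrative names, not part of buildbot.

```python
import re

TIMEOUT_SECS = 1200  # "timeout 1200 secs" from the step log

# Durations of the rm_configs step, oldest first, copied from the comment.
DURATIONS = ["13m 36s", "17m 24s", "18m 23s", "19m 13s"]


def to_seconds(text):
    """Convert a duration such as '19m 13s' to whole seconds."""
    match = re.fullmatch(r"(\d+)m\s+(\d+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds


def headroom(durations, timeout=TIMEOUT_SECS):
    """Return (elapsed seconds, seconds left before the timeout) per run."""
    return [(to_seconds(d), timeout - to_seconds(d)) for d in durations]


if __name__ == "__main__":
    for secs, left in headroom(DURATIONS):
        print(f"{secs:5d} s elapsed, {left:4d} s before timeout")
```

The last passing run took 1153 seconds, only 47 seconds under the limit, so even a small further slowdown would push the step over the timeout.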
I stopped the build during the hg_update step. I don't know if configs would have been filled with more cruft before being removed, but it certainly didn't take long to remove by hand:

[cltbld@linux-ix-slave06 build]$ time rm -rf configs/

real 0m0.040s
user 0m0.002s
sys 0m0.010s

dmesg ends with the usual

eth0: no IPv6 routers present

so I don't see anything indicating hardware failures there. A mystery for the ages?
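For anyone wanting to repeat the manual timing check in a scripted way, the following is a rough sketch rather than anything the slave actually runs. It builds a throwaway tree of small files and times its removal. The helper names `make_tree` and `time_rmtree` are made up for illustration, and the default of 954 files of 4 KiB each is only a stand-in loosely based on the file count in the bug summary.

```python
import shutil
import tempfile
import time
from pathlib import Path


def make_tree(root, n_files=954, size=4096):
    """Create n_files small files under root (counts are illustrative)."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    payload = b"x" * size
    for i in range(n_files):
        (root / f"file{i:04d}").write_bytes(payload)
    return root


def time_rmtree(path):
    """Remove path recursively and return the wall-clock seconds taken."""
    start = time.perf_counter()
    shutil.rmtree(path)
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        tree = make_tree(Path(scratch) / "configs")
        print(f"removed in {time_rmtree(tree):.3f} s")
```

A freshly written tree is likely to be served from the page cache, so a fast result here says little about a disk that is slow under build load; it only gives a baseline to compare against.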
Assignee: dustin → nobody
I also scheduled 4 days of downtime for this host in nagios.
Summary: linux-ix-slave06 taking ~20M to remove 954 files (4.0M) → linux-ix-slave06 taking ~20mins to remove 954 files (4.0M)
Sounds like another ix with a slow/sad disk. zandr, you're already investigating a batch of those in other bugs. Is this machine already on your list (in which case we'll close as DUP), or is this problem a new report (in which case I guess we push it to you)?
Sad, but not very sad. Here's the usual test data for iX:

[root@linux-ix-slave06 ~]# hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
        Model Number:       ST3250318AS
        Serial Number:      9VY95DR1
        Firmware Revision:  CC38
        Transport:          Serial
Standards:
        Supported: 8 7 6 5
        Likely used: 8
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:   16514064
        LBA    user addressable sectors:  268435455
        LBA48  user addressable sectors:  488397168
        device size with M = 1024*1024:      238475 MBytes
        device size with M = 1000*1000:      250059 MBytes (250 GB)
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 32
        Standby timer values: spec'd by Standard, no device specific minimum
        R/W multiple sector transfer: Max = 16  Current = ?
        Recommended acoustic management value: 254, current value: 0
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 *udma3 udma4 udma5 udma6
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    DOWNLOAD_MICROCODE
                SET_MAX security extension
           *    Automatic Acoustic Management feature set
           *    48-bit Address feature set
           *    Device Configuration Overlay feature set
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    SMART error logging
           *    SMART self-test
           *    General Purpose Logging feature set
           *    WRITE_{DMA|MULTIPLE}_FUA_EXT
           *    64-bit World wide name
                Write-Read-Verify feature set
           *    WRITE_UNCORRECTABLE command
           *    {READ,WRITE}_DMA_EXT_GPL commands
           *    Segmented DOWNLOAD_MICROCODE
           *    SATA-I signaling speed (1.5Gb/s)
           *    SATA-II signaling speed (3.0Gb/s)
           *    Native Command Queueing (NCQ)
           *    Phy event counters
                Device-initiated interface power management
           *    Software settings preservation
Security:
        Master password revision code = 65534
                supported
        not     enabled
        not     locked
        not     frozen
        not     expired: security count
                supported: enhanced erase
        42min for SECURITY ERASE UNIT. 42min for ENHANCED SECURITY ERASE UNIT.
Checksum: correct

[root@linux-ix-slave06 ~]# hdparm -tT /dev/sda

/dev/sda:
 Timing cached reads:   29336 MB in  1.99 seconds = 14738.11 MB/sec
 Timing buffered disk reads:  280 MB in  3.01 seconds = 93.00 MB/sec
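As a quick sanity check on the figures in the hdparm output above, the reported device sizes and buffered-read rate can be recomputed from the raw numbers. This is only a sketch: it assumes 512-byte logical sectors, which the output does not state explicitly, and `device_size` and `throughput` are illustrative helper names.

```python
SECTOR_BYTES = 512  # assumption: 512-byte logical sectors
LBA48_SECTORS = 488397168  # "LBA48 user addressable sectors" from hdparm -I


def device_size(sectors, sector_bytes=SECTOR_BYTES):
    """Return (size with M = 1024*1024, size with M = 1000*1000), integer-divided."""
    total = sectors * sector_bytes
    return total // (1024 * 1024), total // (1000 * 1000)


def throughput(megabytes, seconds):
    """Return MB/sec for a timed read."""
    return megabytes / seconds


if __name__ == "__main__":
    binary_m, decimal_m = device_size(LBA48_SECTORS)
    print(f"{binary_m} MBytes (M = 1024*1024), {decimal_m} MBytes (M = 1000*1000)")
    print(f"buffered reads: about {throughput(280, 3.01):.0f} MB/sec")
```

Under that assumption the sizes come out to 238475 and 250059 MBytes, matching the report, and 280 MB in 3.01 seconds works out to roughly 93 MB/sec, so the numbers are at least internally consistent.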
See Also: → 606716
Are we still following up on this? What's the current state?
Priority: -- → P3
When this slave is ready to come online again, it should point to the new try master in MV as per bug 617321 (either to test-master02.b.m.o or buildbot-master3.b.m.o depending on whether bug 627803 has been fixed yet)
(In reply to comment #6)
> Sad, but not very sad. Here's the usual test data for iX:
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE
reopening and morphing for comment 8
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Summary: linux-ix-slave06 taking ~20mins to remove 954 files (4.0M) → attach linux-ix-slave06 to buildbot-master3.b.m.o when it's back
Whiteboard: [buildslaves][hardware][buildduty] → [buildslaves][hardware][slaveduty]
Shouldn't this bug depend on bug 596366, rather than the other way around?
No longer blocks: ix-drive-issues
Depends on: ix-drive-issues
The slave is in scl1, and should point to one of the scl1 try masters, not the master in mtv1. That can happen with slavealloc.
Status: REOPENED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering