Closed
Bug 624210
Opened 14 years ago
Closed 13 years ago
attach linux-ix-slave06 to buildbot-master3.b.m.o when it's back
Categories
(Release Engineering :: General, defect, P3)
Release Engineering
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dustin, Unassigned)
References
Details
(Whiteboard: [buildslaves][hardware][slaveduty])
The 'rm_configs' step of builder "Android R7 tryserver build" has been failing repeatedly on slave linux-ix-slave06.

rm -rf configs
 in dir /builds/slave/try-mb-br-andrd-r7-bld/build (timeout 1200 secs)
 watching logfiles {}
 argv: ['rm', '-rf', 'configs']
 environment: ...
 closing stdin
 using PTY: True
command timed out: 1200 seconds without output, killing pid 3638
process killed by signal 9
program finished with exit code -1
elapsedTime=1200.682441

I checked earlier, passing builds, and this was taking steadily longer for each build:

<several failures here>
Sat Jan 8 06:11:05 2011: 19m 13s
<failed here>
Sat Jan 8 00:41:16 2011: 18m 23s
Fri Jan 7 21:53:04 2011: 17m 24s
Fri Jan 7 17:36:04 2011: 13m 36s

From my quick look through the preceding steps, I don't know exactly how this directory is checked out, but it's a checkout of build/buildbot-configs, so it's not huge by any means. The steadily increasing time to rm is an interesting data point!

I've disabled buildbot on this slave and added it to the slave tracking spreadsheet.
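To put the durations above next to the 1200-second timeout from the log, they can be converted to seconds. This is only a sketch: `to_secs` is a hypothetical helper, and it assumes durations are always written as "Nm Ns" without leading zeros. Since `rm -rf` prints nothing, the step duration is treated here as roughly equal to the time without output.

```shell
# Hypothetical helper: convert a duration like "19m 13s" to seconds.
to_secs() {
    m=${1%%m*}      # everything before the first "m"
    s=${1##* }      # last space-separated word, e.g. "13s"
    s=${s%s}        # drop the trailing "s"
    echo $((m * 60 + s))
}

timeout=1200        # from "timeout 1200 secs" in the step log

# Durations of the passing builds quoted in the report, oldest first.
for d in "13m 36s" "17m 24s" "18m 23s" "19m 13s"; do
    secs=$(to_secs "$d")
    echo "$d = ${secs}s, $((timeout - secs))s under the timeout"
done
```

The last passing build is within a minute of the timeout, which would be consistent with the builds after it failing.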
Reporter
Comment 1•14 years ago
[cltbld@linux-ix-slave06 slave]$ mv buildbot.tac buildbot.tac.bug624210
[cltbld@linux-ix-slave06 slave]$ touch DO_NOT_START
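The two commands above could be wrapped in a small function so the bug number ends up in the renamed file consistently. This is a sketch under assumptions: `disable_slave` is a hypothetical name, and it assumes DO_NOT_START is simply a marker file that the slave's startup scripts check for. The demo runs against a throwaway directory rather than a real slave directory.

```shell
# Hypothetical helper: take a slave out of service for a given bug.
# $1 = slave directory containing buildbot.tac, $2 = bug number
disable_slave() {
    slave_dir=$1
    bug=$2
    if [ ! -f "$slave_dir/buildbot.tac" ]; then
        echo "no buildbot.tac in $slave_dir" >&2
        return 1
    fi
    # Rename the config so buildbot cannot start from it, then leave a marker.
    mv "$slave_dir/buildbot.tac" "$slave_dir/buildbot.tac.bug$bug" &&
        touch "$slave_dir/DO_NOT_START"
}

# Demo against a temporary directory standing in for the slave directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/buildbot.tac"
disable_slave "$demo_dir" 624210
ls "$demo_dir"
```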
Reporter
Comment 2•14 years ago
I stopped the build during the hg_update step. I don't know if configs would have been filled with more cruft before being removed, but it certainly didn't take long to remove by hand:

[cltbld@linux-ix-slave06 build]$ time rm -rf configs/

real 0m0.040s
user 0m0.002s
sys 0m0.010s

dmesg ends with the usual

eth0: no IPv6 routers present

so I don't see anything indicating hardware failures there. A mystery for the ages?
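To compare against a directory closer in size to the one in the summary (954 files, about 4.0M), a scratch tree can be built and its removal timed. This is a rough sketch, not the procedure used above: the file count comes from the bug summary, the 4096-byte file size is an assumption chosen to approximate the total, and `date +%s` only gives whole-second resolution. Freshly written files are likely still in the page cache, so this mostly exercises metadata operations.

```shell
# Build a scratch directory of 954 small files, then time its removal.
scratch=$(mktemp -d)
count=954

i=0
while [ "$i" -lt "$count" ]; do
    # One 4 KiB file per iteration (assumed size, roughly 4.0M in total).
    dd if=/dev/zero of="$scratch/file$i" bs=4096 count=1 2>/dev/null
    i=$((i + 1))
done

start=$(date +%s)
rm -rf "$scratch"
end=$(date +%s)

elapsed=$((end - start))
echo "removed $count files in ${elapsed}s"
```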
Reporter
Updated•14 years ago
Assignee: dustin → nobody
Reporter
Comment 3•14 years ago
I also scheduled 4 days of downtime for this host in nagios.
Updated•14 years ago
Summary: linux-ix-slave06 taking ~20M to remove 954 files (4.0M) → linux-ix-slave06 taking ~20mins to remove 954 files (4.0M)
Comment 5•14 years ago
Sounds like another ix with slow/sad disk. zandr, you're already investigating a batch of those in other bugs. Is this machine already on your list (in which case, we'll close as DUP), or is this problem a new report (in which case, I guess we push it to you)?
Comment 6•14 years ago
Sad, but not very sad. Here's the usual test data for iX:

[root@linux-ix-slave06 ~]# hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
    Model Number:       ST3250318AS
    Serial Number:      9VY95DR1
    Firmware Revision:  CC38
    Transport:          Serial
Standards:
    Supported: 8 7 6 5
    Likely used: 8
Configuration:
    Logical         max     current
    cylinders       16383   16383
    heads           16      16
    sectors/track   63      63
    --
    CHS current addressable sectors:   16514064
    LBA    user addressable sectors:  268435455
    LBA48  user addressable sectors:  488397168
    device size with M = 1024*1024:      238475 MBytes
    device size with M = 1000*1000:      250059 MBytes (250 GB)
Capabilities:
    LBA, IORDY(can be disabled)
    Queue depth: 32
    Standby timer values: spec'd by Standard, no device specific minimum
    R/W multiple sector transfer: Max = 16  Current = ?
    Recommended acoustic management value: 254, current value: 0
    DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 *udma3 udma4 udma5 udma6
         Cycle time: min=120ns recommended=120ns
    PIO: pio0 pio1 pio2 pio3 pio4
         Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
    Enabled Supported:
       *    SMART feature set
            Security Mode feature set
       *    Power Management feature set
       *    Write cache
       *    Look-ahead
       *    Host Protected Area feature set
       *    WRITE_BUFFER command
       *    READ_BUFFER command
       *    DOWNLOAD_MICROCODE
            SET_MAX security extension
       *    Automatic Acoustic Management feature set
       *    48-bit Address feature set
       *    Device Configuration Overlay feature set
       *    Mandatory FLUSH_CACHE
       *    FLUSH_CACHE_EXT
       *    SMART error logging
       *    SMART self-test
       *    General Purpose Logging feature set
       *    WRITE_{DMA|MULTIPLE}_FUA_EXT
       *    64-bit World wide name
            Write-Read-Verify feature set
       *    WRITE_UNCORRECTABLE command
       *    {READ,WRITE}_DMA_EXT_GPL commands
       *    Segmented DOWNLOAD_MICROCODE
       *    SATA-I signaling speed (1.5Gb/s)
       *    SATA-II signaling speed (3.0Gb/s)
       *    Native Command Queueing (NCQ)
       *    Phy event counters
            Device-initiated interface power management
       *    Software settings preservation
Security:
    Master password revision code = 65534
            supported
    not     enabled
    not     locked
    not     frozen
    not     expired: security count
            supported: enhanced erase
    42min for SECURITY ERASE UNIT. 42min for ENHANCED SECURITY ERASE UNIT.
Checksum: correct

[root@linux-ix-slave06 ~]# hdparm -tT /dev/sda

/dev/sda:
 Timing cached reads:   29336 MB in 1.99 seconds = 14738.11 MB/sec
 Timing buffered disk reads:  280 MB in 3.01 seconds = 93.00 MB/sec
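When collecting the same numbers from several slaves, the throughput figures can be pulled out of saved `hdparm -tT` output with awk. This is a minimal sketch: `hdparm_rate` is a hypothetical helper, the sample text is copied from the output above rather than produced by running hdparm (which needs root), and it assumes the MB/sec figure is always the second-to-last field on the line.

```shell
# Hypothetical helper: print the MB/sec figure from the hdparm -tT line
# matching the given pattern. Reads hdparm output on stdin.
hdparm_rate() {
    awk -v pat="$1" '$0 ~ pat { print $(NF - 1) }'
}

# Sample copied from the hdparm -tT output in this comment.
sample='/dev/sda:
 Timing cached reads: 29336 MB in 1.99 seconds = 14738.11 MB/sec
 Timing buffered disk reads: 280 MB in 3.01 seconds = 93.00 MB/sec'

cached=$(printf '%s\n' "$sample" | hdparm_rate 'cached reads')
buffered=$(printf '%s\n' "$sample" | hdparm_rate 'buffered disk reads')

echo "cached: $cached MB/sec, buffered: $buffered MB/sec"
```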
Reporter
Updated•14 years ago
Blocks: ix-drive-issues
See Also: → 606716
Comment 7•13 years ago
Are we still following up on this? What's the current state?
Updated•13 years ago
Priority: -- → P3
Comment 8•13 years ago
When this slave is ready to come online again, it should point to the new try master in MV as per bug 617321 (either to test-master02.b.m.o or buildbot-master3.b.m.o depending on whether bug 627803 has been fixed yet)
Comment 9•13 years ago
(In reply to comment #6)
> Sad, but not very sad. Here's the usual test data for iX:
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE
Reporter
Comment 10•13 years ago
reopening and morphing for comment 8
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Summary: linux-ix-slave06 taking ~20mins to remove 954 files (4.0M) → attach linux-ix-slave06 to buildbot-master3.b.m.o when it's back
Updated•13 years ago
Whiteboard: [buildslaves][hardware][buildduty] → [buildslaves][hardware][slaveduty]
Comment 11•13 years ago
Shouldn't this bug depend on bug 596366, rather than the other way around?
No longer blocks: ix-drive-issues
Depends on: ix-drive-issues
Reporter
Comment 12•13 years ago
The slave is in scl1, and should point to one of the scl1 try masters, not the master in mtv1. That can happen with slavealloc.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 13 years ago
Resolution: --- → FIXED
Assignee
Updated•11 years ago
Product: mozilla.org → Release Engineering