Closed
Bug 624210
Opened 14 years ago
Closed 14 years ago
attach linux-ix-slave06 to buildbot-master3.b.m.o when it's back
Categories
(Release Engineering :: General, defect, P3)
Release Engineering
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dustin, Unassigned)
References
Details
(Whiteboard: [buildslaves][hardware][slaveduty])
The 'rm_configs' step of builder "Android R7 tryserver build" has been failing repeatedly on slave linux-ix-slave06.
rm -rf configs
in dir /builds/slave/try-mb-br-andrd-r7-bld/build (timeout 1200 secs)
watching logfiles {}
argv: ['rm', '-rf', 'configs']
environment:
...
closing stdin
using PTY: True
command timed out: 1200 seconds without output, killing pid 3638
process killed by signal 9
program finished with exit code -1
elapsedTime=1200.682441
I checked earlier passing builds, and this step was taking steadily longer with each build:
<several failures here>
Sat Jan 8 06:11:05 2011: 19m 13s
<failed here>
Sat Jan 8 00:41:16 2011: 18m 23s
Fri Jan 7 21:53:04 2011: 17m 24s
Fri Jan 7 17:36:04 2011: 13m 36s
From a quick look through the preceding steps, I can't tell exactly how this directory is checked out, but it's a checkout of build/buildbot-configs, so it's not huge by any means. The steadily increasing time to rm is an interesting data point!
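A minimal bash sketch of the arithmetic behind that trend: it converts the passing-build durations above into seconds and subtracts each from the step's 1200-second timeout. The helper name `to_seconds` is made up for illustration; only the durations and the timeout come from this bug.

```shell
#!/usr/bin/env bash
# Convert a duration such as "19m 13s" into whole seconds.
to_seconds() {
  local m s
  read -r m s <<< "$1"
  m=${m%m}   # strip the trailing "m"
  s=${s%s}   # strip the trailing "s"
  # 10# forces base 10 so values like "07" are not read as octal
  echo $(( 10#$m * 60 + 10#$s ))
}

timeout=1200  # rm_configs timeout from the log above
for d in "13m 36s" "17m 24s" "18m 23s" "19m 13s"; do
  secs=$(to_seconds "$d")
  echo "$d = ${secs}s, headroom $(( timeout - secs ))s"
done
```

Run as-is, the loop prints 816 s, 1044 s, 1103 s and 1153 s, so the headroom under the timeout shrank from 384 s to 47 s across those four builds.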
I've disabled buildbot on this slave and added it to the slave tracking spreadsheet.
Comment 1•14 years ago (Reporter)
[cltbld@linux-ix-slave06 slave]$ mv buildbot.tac buildbot.tac.bug624210
[cltbld@linux-ix-slave06 slave]$ touch DO_NOT_START
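The two commands above can be wrapped as a pair of shell functions. This is a sketch under assumptions: `disable_slave` and `enable_slave` are hypothetical names, the re-enable step is simply the inverse and is not something this bug shows, and the `DO_NOT_START` marker is presumably honored by the slave's startup tooling rather than by buildbot itself.

```shell
#!/usr/bin/env bash
# Take a buildbot slave out of service as in comment 1:
# rename buildbot.tac (buildbot cannot start without it) and leave a marker file.
disable_slave() {
  local dir=$1 bug=$2
  mv "$dir/buildbot.tac" "$dir/buildbot.tac.bug$bug"
  touch "$dir/DO_NOT_START"
}

# Hypothetical inverse, for when the slave comes back.
enable_slave() {
  local dir=$1 bug=$2
  mv "$dir/buildbot.tac.bug$bug" "$dir/buildbot.tac"
  rm -f "$dir/DO_NOT_START"
}
```

For example, `disable_slave /builds/slave 624210`, assuming the slave's base directory is `/builds/slave` (the prompt above only shows a directory named `slave`). Tagging the renamed file with the bug number makes it obvious later why the slave was taken down.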
Comment 2•14 years ago (Reporter)
I stopped the build during the hg_update step. I don't know if configs would have been filled with more cruft before being removed, but it certainly didn't take long to remove by hand:
[cltbld@linux-ix-slave06 build]$ time rm -rf configs/
real 0m0.040s
user 0m0.002s
sys 0m0.010s
dmesg ends with the usual
eth0: no IPv6 routers present
so I don't see anything indicating hardware failures there.
A mystery for the ages?
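To tie a slow removal to the size of the tree (the summary cites 954 files, 4.0M), a sketch like the following records the file count and disk usage before timing the `rm`. `measure_and_remove` is a hypothetical helper; it uses `date +%s`, so it has only one-second resolution, which is plenty next to a 20-minute failure.

```shell
#!/usr/bin/env bash
# Report file count and disk usage of a directory, then time its removal.
measure_and_remove() {
  local dir=$1
  local count size start end
  count=$(find "$dir" -type f | wc -l | tr -d ' ')  # regular files only
  size=$(du -sh "$dir" | cut -f1)                   # human-readable total
  start=$(date +%s)
  rm -rf "$dir"
  end=$(date +%s)
  echo "files=$count size=$size rm_seconds=$(( end - start ))"
}
```

For example, `measure_and_remove configs` run from the build directory would print the count and size alongside the elapsed time, so a slow run can be compared directly with a fast one on the same tree.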
Updated•14 years ago (Reporter)
Assignee: dustin → nobody
Comment 3•14 years ago (Reporter)
I also scheduled 4 days of downtime for this host in nagios.
Updated•14 years ago
Summary: linux-ix-slave06 taking ~20M to remove 954 files (4.0M) → linux-ix-slave06 taking ~20mins to remove 954 files (4.0M)
Comment 5•14 years ago
Sounds like another iX machine with a slow/sad disk.
zandr, you're already investigating a batch of those in other bugs. Is this machine already on your list (in which case we'll close this as a DUP), or is this a new report (in which case I guess we push it to you)?
Comment 6•14 years ago
Sad, but not very sad. Here's the usual test data for iX:
[root@linux-ix-slave06 ~]# hdparm -I /dev/sda
/dev/sda:
ATA device, with non-removable media
Model Number: ST3250318AS
Serial Number: 9VY95DR1
Firmware Revision: CC38
Transport: Serial
Standards:
Supported: 8 7 6 5
Likely used: 8
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 488397168
device size with M = 1024*1024: 238475 MBytes
device size with M = 1000*1000: 250059 MBytes (250 GB)
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = ?
Recommended acoustic management value: 254, current value: 0
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 *udma3 udma4 udma5 udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_BUFFER command
* READ_BUFFER command
* DOWNLOAD_MICROCODE
SET_MAX security extension
* Automatic Acoustic Management feature set
* 48-bit Address feature set
* Device Configuration Overlay feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* SMART error logging
* SMART self-test
* General Purpose Logging feature set
* WRITE_{DMA|MULTIPLE}_FUA_EXT
* 64-bit World wide name
Write-Read-Verify feature set
* WRITE_UNCORRECTABLE command
* {READ,WRITE}_DMA_EXT_GPL commands
* Segmented DOWNLOAD_MICROCODE
* SATA-I signaling speed (1.5Gb/s)
* SATA-II signaling speed (3.0Gb/s)
* Native Command Queueing (NCQ)
* Phy event counters
Device-initiated interface power management
* Software settings preservation
Security:
Master password revision code = 65534
supported
not enabled
not locked
not frozen
not expired: security count
supported: enhanced erase
42min for SECURITY ERASE UNIT. 42min for ENHANCED SECURITY ERASE UNIT.
Checksum: correct
[root@linux-ix-slave06 ~]# hdparm -tT /dev/sda
/dev/sda:
Timing cached reads: 29336 MB in 1.99 seconds = 14738.11 MB/sec
Timing buffered disk reads: 280 MB in 3.01 seconds = 93.00 MB/sec
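When comparing the usual test data across several iX slaves, it can help to pull just the buffered-read figure out of `hdparm -tT`. A minimal sketch, assuming the output format shown above (a line ending in `<number> MB/sec`); `buffered_mb_per_sec` is a hypothetical name, and any pass/fail threshold would be a judgment call this bug does not define.

```shell
#!/usr/bin/env bash
# Read `hdparm -tT` output on stdin and print the buffered-read MB/sec value.
buffered_mb_per_sec() {
  awk '/Timing buffered disk reads/ {
    # the number is the field just before the "MB/sec" unit
    for (i = 1; i <= NF; i++) if ($i == "MB/sec") print $(i - 1)
  }'
}
```

For example, `hdparm -tT /dev/sda | buffered_mb_per_sec` (hdparm needs root) prints `93.00` for output like the run above.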
Updated•14 years ago (Reporter)
Blocks: ix-drive-issues
See Also: → 606716
Comment 7•14 years ago
Are we still following up on this? What's the current state?
Updated•14 years ago
Priority: -- → P3
Comment 8•14 years ago
When this slave is ready to come online again, it should point to the new try master in MV as per bug 617321 (either to test-master02.b.m.o or buildbot-master3.b.m.o, depending on whether bug 627803 has been fixed yet).
Comment 9•14 years ago
(In reply to comment #6)
> Sad, but not very sad. Here's the usual test data for iX:
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE
Comment 10•14 years ago (Reporter)
reopening and morphing for comment 8
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Summary: linux-ix-slave06 taking ~20mins to remove 954 files (4.0M) → attach linux-ix-slave06 to buildbot-master3.b.m.o when it's back
Updated•14 years ago
Whiteboard: [buildslaves][hardware][buildduty] → [buildslaves][hardware][slaveduty]
Comment 11•14 years ago
Shouldn't this bug depend on bug 596366, rather than the other way around?
No longer blocks: ix-drive-issues
Depends on: ix-drive-issues
Comment 12•14 years ago (Reporter)
The slave is in scl1, and should point to one of the scl1 try masters, not the master in mtv1. That can happen with slavealloc.
Status: REOPENED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
Updated•12 years ago (Assignee)
Product: mozilla.org → Release Engineering