Closed Bug 720167 (w32-ix-slave16) Opened 12 years ago Closed 12 years ago

w32-ix-slave16 problem tracking

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

x86
All
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Unassigned)

Details

(Whiteboard: [buildduty][capacity][buildslaves])

From bug 719284:
(In reply to Jake Watkins [:dividehex] from comment #6)
> (In reply to Jake Watkins [:dividehex] from comment #2)
> > Also missed this one in that last reboot bug
> > w32-ix-slave16 (w32-ix-slave16-mgmt isn't responding, this is one of the
> > machines that didn't like the extra RAM in bug 672969)
> 
> This slave couldn't keep a solid net link going.  Cable tested ok.  A reboot
> seems to have cleared this up but if it happens again, we should pull it for
> repairs.

Nagios thinks it's down again.
Assignee: server-ops-releng → jwatkins
colo-trip: --- → scl1
QA Contact: mrz → zandr
Net link was going up and down.  I've pulled this for repairs.
IX Ticket ID: IGS-317706

IX will pick this up when they drop off the repaired systems.
IX will repair/remove for repairs tomorrow (1/31) when they visit SCL1 for the memory upgrades that didn't take.
Chris from IX has taken this for repairs.
Assignee: jwatkins → mlarrain
Talked to iX today; they will be at SCL1 either Wednesday or Thursday to drop off this system.
Status: NEW → ASSIGNED
This machine was still broken when iX brought it back, so they took it again for service.
Digipengi has been emailing with Matt Finney at IX and apparently they lost track of the repair ticket and closed it before it was returned to us.  It has been repaired and we are arranging for a time/date for them to return it.
We just received this slave back from IX.  Net link issue is fixed but it refuses to detect a HDD attached to it now.  I have emailed IX about it.
Paul from IX came back out and found the HDD power cable was not actually attached to the cable coming from the PSU.  He reattached it and it is now detecting the drive.

It is currently being re-imaged.
Machine has been imaged and is ready to go back into the pool.
Assignee: mlarrain → nobody
Component: Server Operations: RelEng → Release Engineering
QA Contact: zandr → release
Component: Release Engineering → Release Engineering: Machine Management
QA Contact: release → armenzg
Whiteboard: [buildduty][capacity]
Updated hostname, deleted from opsi; it appears to now be installing opsi packages.
Also disabled in slavealloc atm.
Alias: w32-ix-slave16
Summary: Pull w32-ix-slave16 for repairs → w32-ix-slave16 problem tracking
Cleared opsi log (per alert dialog) and added that to https://wiki.mozilla.org/ReleaseEngineering/How_To/Set_Up_a_Freshly_Imaged_Slave#Reimaged .

Reenabled in slavealloc and rebooted.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
It didn't manage to sync with opsi past that first time, and has the wrong ssh keys. I've fixed the latter, and am taking a quick look at opsi.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
According to c:\tmp\logonlog.txt, production-opsi is returning HTTP 401 Unauthorized errors to this slave. What's the fix for that, armenzg?
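For anyone repeating this check, here is a minimal sketch of scanning the logon log for 401 responses. The log line format is not shown in this bug, so the regular expression and the function names `find_unauthorized` and `scan_log` are assumptions for illustration, not part of opsi.

```python
import re

# Assumed pattern: the real format of logonlog.txt is not shown in this bug,
# so this only looks for "401" followed by "Unauthorized" on the same line.
UNAUTHORIZED = re.compile(r"\b401\b.*Unauthorized", re.IGNORECASE)


def find_unauthorized(lines):
    """Return (line_number, text) pairs for lines mentioning a 401 Unauthorized."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if UNAUTHORIZED.search(line)
    ]


def scan_log(path=r"c:\tmp\logonlog.txt"):
    """Scan the logon log at the path mentioned in the comment above."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_unauthorized(handle)
```

Calling `scan_log()` on the slave would list the matching lines with their line numbers; an empty list means no line matched the assumed pattern.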
Whiteboard: [buildduty][capacity] → [buildduty][capacity][buildslaves]
Deleted C:\Program Files\opsi.org\preloginloader\cfg\locked.cfg and rebooted. Looks to be back in business.
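The lock-file removal above can be sketched as a small script. This is only an illustration of the manual step described in this comment: the helper name `remove_lock` is hypothetical, and the reboot that followed is deliberately left out.

```python
from pathlib import Path

# Path taken from the comment above.
LOCK_FILE = Path(r"C:\Program Files\opsi.org\preloginloader\cfg\locked.cfg")


def remove_lock(path=LOCK_FILE):
    """Delete the lock file if it exists; return True if a file was removed."""
    path = Path(path)
    if path.is_file():
        path.unlink()
        return True
    return False
```

Returning a boolean makes it easy to tell whether the slave was actually locked before deciding to reboot it.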
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard