Closed Bug 680457 Opened 14 years ago Closed 12 years ago

Network hiccups affected OPSI XP slaves

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P5)

x86
Windows XP

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: armenzg, Unassigned)

Details

(Whiteboard: [opsi])

Attachments

(1 file)

With all the network issues between SCL and SJC, we hit this problem. The XP slaves started hitting it gradually over the last 1-2 days. It took all of these slaves out of the pool:
04, 05, 08, 11, 13, 14, 15, 16, 17, 18, 23, 26, 27, 28, 29, 31, 32, 33, 36, 37, 41, 43, 52, 54, 56, 57, 59, 60, 62

The OPSI prompt is the following:
zugriffsverletzung bei Adresse 005C435E in Modul 'winst32.exe'. Lesen von Adresse 00000000
which means:
Access violation at address 005C435E in module 'winst32.exe'. Read of address 00000000

If we had had the buildbot-started check, we would have caught this earlier. I am not sure whether OPSI has a way of rebooting the slave upon failure, or of not blocking it. Having a second OPSI master in SCL would probably have helped.

I am not sure how much we want to get dragged into this, since these are not normal conditions and we are going to attempt to get rid of OPSI ASAP.
Attached image screenshot
talos-r3-xp-038 too. Clicking OK didn't unstick it, and using ssh to reboot only resulted in the ssh service stopping. I've put it on the reboots list in bug 678883.

talos-r3-xp-09 & talos-r3-xp-039 were running the screensaver, and continued as soon as I connected with VNC and waggled the mouse. This is another OPSI + network glitch type of issue.

talos-r3-xp-025 & talos-r3-xp-053 were sitting with runslave.py still running despite a couple of ^C's. Rebooted them.

talos-r3-xp-045 is a long-term issue (bug 661377).

Otherwise http://build.mozilla.org/builds/last-job-per-slave.txt looks clean.
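A check along these lines could flag stuck slaves without someone reading the report by hand. This is only a rough sketch: the line format assumed for last-job-per-slave.txt (`<slave-name> <date> <time>`), the function names `parse_report` and `stale_slaves`, the six-hour threshold, and the sample timestamps are all assumptions made for illustration, not the real report format or any existing tool.

```python
from datetime import datetime, timedelta


def parse_report(text):
    """Parse lines of the ASSUMED form '<slave-name> <YYYY-MM-DD> <HH:MM:SS>'.

    The real last-job-per-slave.txt layout may differ; lines that do not
    match this shape are skipped.
    """
    jobs = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        name, day, clock = parts
        jobs[name] = datetime.strptime(f"{day} {clock}", "%Y-%m-%d %H:%M:%S")
    return jobs


def stale_slaves(last_jobs, now, max_idle=timedelta(hours=6)):
    """Return the sorted names of slaves idle for longer than max_idle.

    The 6-hour default is an arbitrary illustrative threshold.
    """
    return sorted(
        name for name, ended in last_jobs.items() if now - ended > max_idle
    )


# Illustrative sample data only (hypothetical timestamps).
sample = """talos-r3-xp-022 2012-01-12 16:58:41
talos-r3-xp-038 2012-01-13 11:30:00
"""
now = datetime(2012, 1, 13, 12, 0, 0)
print(stale_slaves(parse_report(sample), now))
```

A slave that shows up as stale would still need a human (or a reboot tool) to look at it, since a blocked OPSI dialog and a slave that is merely idle look the same from this data.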
talos-r3-xp-022 hit this same issue after finishing a reftest job ending at 2012-01-12 16:58:41. Clicking "OK" did not do anything. RDP fails to help. VNC fails to help. ssh fails to help.
I don't know how much we can do here. I imagine this will remain broken until we move to GPO or invest some effort into Puppet on Windows.
Severity: normal → enhancement
Component: Release Engineering → Release Engineering: Platform Support
OS: All → Windows XP
Priority: P4 → P5
QA Contact: release → coop
Whiteboard: [opsi]
Kittenherder and GPO will save us here.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
Product: mozilla.org → Release Engineering
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard