Closed
Bug 437257
Opened 16 years ago
Closed 16 years ago
talos slaves lose connectivity with buildbot master (qm-rhel02)
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: anodelman, Unassigned)
An outage occurred at 10:30:05am, causing multiple talos machines to lose connectivity. The machines reconnected at 10:32:52am.

Affected machines:
qm-mini-xp01/02/03/04/05
qm-pxp-fast01/02
qm-pxp-jss01/02/03
qm-mini-vista01/02/03/04/05
qm-mini-ubuntu02/03/04/05
qm-plinux-fast02
qm-pmac02/03/04
qm-pmac-trunk04/05
qm-pmac-fast02
qm-pxp-trunk01/02/03/04/05/06
qm-plinux-trunk01/02/04/05/06
qm-pvista-trunk01/02/03

Unaffected machines:
qm-pmac-trunk01/02/03/07/08/09
qm-pleopard-trunk01/02/03
qm-plinux-trunk03
qm-pmac-fast01
qm-pmac01/05
qm-plinux-fast01
qm-mini-ubuntu01

Machines on the try perfmaster appeared unaffected, as did those on stage (qm-buildbot01).
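The machine lists above use a compact shorthand (for example, qm-pmac02/03/04 stands for qm-pmac02, qm-pmac03 and qm-pmac04). A minimal sketch of expanding that shorthand into full host names, for instance to compare against a log, might look like the following. The function name expand_hosts is hypothetical, and the sketch assumes every suffix after a slash replaces the trailing digits of the first name in the token.

```python
import re
from typing import List


def expand_hosts(shorthand: str) -> List[str]:
    """Expand tokens such as 'qm-pmac02/03/04' into full host names.

    Assumption: each '/NN' suffix replaces the trailing digits of the
    first host name in the same whitespace-separated token.
    """
    hosts = []
    for token in shorthand.split():
        first, *rest = token.split("/")
        match = re.match(r"^(.*?)(\d+)$", first)
        if match is None:
            # No trailing number to substitute; keep the token unchanged.
            hosts.append(token)
            continue
        prefix = match.group(1)
        hosts.append(first)
        hosts.extend(prefix + suffix for suffix in rest)
    return hosts
```

Calling expand_hosts("qm-pxp-fast01/02 qm-plinux-fast02") would yield the three individual host names in order.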
Comment 1•16 years ago
What did they lose connectivity to? Any error logs?
Comment 2•16 years ago (Reporter)
The master itself stayed up and the logs are pretty full of messages about slaves disconnecting:

<snip>
2008/06/04 10:32 PDT [Broker,21] <Builder 'WINNT 6.0 talos trunk' at -1215589556>.detached qm-mini-vista03
2008/06/04 10:32 PDT [Broker,21] Buildslave qm-mini-vista03 detached from WINNT 6.0 talos trunk
2008/06/04 10:32 PDT [Broker,21] BotPerspective.detached(qm-mini-vista03)
2008/06/04 10:32 PDT [Broker,21] <Build WINNT 6.0 talos trunk>.lostRemote
2008/06/04 10:32 PDT [Broker,21] stopping currentStep <perfrunner.MozillaRunPerfTests instance at 0xb016eb8c>
2008/06/04 10:32 PDT [Broker,21] addCompleteLog(interrupt)
2008/06/04 10:32 PDT [Broker,21] RemoteCommand.interrupt <RemoteShellCommand '['python', 'run_tests.py', '--noisy', '20080604_0911_config.yml']'> [Failure instance: Traceback (failure with no frames): twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion. ]
</snip>

The affected slaves all have the same error:

remoteFailed: [Failure instance: Traceback (failure with no frames): twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion. ]
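To confirm which slaves dropped and when, the "Buildslave ... detached from ..." lines in the master log can be grouped by timestamp. The following is a rough sketch only: the regular expression is inferred from the single excerpt above, assumes one log entry per line, and the function name detached_by_minute is hypothetical. Other buildbot versions may format these lines differently.

```python
import re
from collections import defaultdict

# Pattern inferred from the excerpt in this bug, e.g.:
# 2008/06/04 10:32 PDT [Broker,21] Buildslave qm-mini-vista03 detached from WINNT 6.0 talos trunk
DETACHED = re.compile(
    r"^(?P<when>\d{4}/\d{2}/\d{2} \d{2}:\d{2}) \S+ \[[^\]]*\] "
    r"Buildslave (?P<slave>\S+) detached from (?P<builder>.+)$"
)


def detached_by_minute(lines):
    """Map each 'YYYY/MM/DD HH:MM' timestamp to the set of slaves that detached."""
    result = defaultdict(set)
    for line in lines:
        match = DETACHED.match(line.strip())
        if match:
            result[match.group("when")].add(match.group("slave"))
    return dict(result)
```

Passing an open log file object to detached_by_minute would give a per-minute view of the disconnects, which could then be compared against the affected and unaffected machine lists in the description.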
Comment 3•16 years ago
I'm seeing similar things occasionally with the moz2 buildbot. This morning, it was between moz2-win32-slave1 and production-master. The slave just dropped, causing the build to go red.
Comment 4•16 years ago
Just saw the same thing between bm-xserve16 and production-master
Comment 5•16 years ago
I would like to move qm-rhel02 off netapp-d-fcal1 as a stab at fixing this. I'm hoping this is related to the netapp perf issues and the misconfigured LUNs. Most moves are failing midway with read errors and taking the VM offline. We've had better luck moving powered-off VMs. bhearsum says this needs to be scheduled, though.
Comment 6•16 years ago
Moved; tossing back to RE. This might very well be fixed along with the netapp issues.
Assignee: mrz → nobody
Component: Server Operations → Release Engineering: Talos
QA Contact: justin → release
Comment 7•16 years ago
I believe this is fixed as a result of the netapp fixes. Please reopen if it happens again.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Updated•11 years ago (Assignee)
Product: mozilla.org → Release Engineering