Closed
Bug 1238035
Opened 8 years ago
Closed 8 years ago
possible network issues with buildbot-master115
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: arich, Unassigned)
References
Details
Attachments
(1 file)
43.71 KB,
image/png
We've been seeing puppet timeout emails from buildbot-master115, and manual runs connecting to releng-puppet1.srv.releng.usw2.mozilla.com sometimes work and sometimes don't. This might indicate general network problems on the master, or resource problems on the host it currently resides on. It might be worth shutting down buildbot-master115, destroying it, and recreating it so that it lands on a different physical host.
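One way to put numbers on "sometimes work and sometimes don't" is to probe the puppet master repeatedly from bm115 and record which attempts fail. The following is a minimal sketch, not something that was run for this bug: `probe` is a hypothetical helper, and port 8140 (the usual puppet master port) is an assumption about this deployment.

```python
import socket
import time


def probe(host, port, attempts=10, timeout=5.0, pause=1.0,
          connect=socket.create_connection, sleep=time.sleep):
    """Open a TCP connection `attempts` times.

    Returns a list of (attempt_number, error_text) for every failed attempt;
    an empty list means every connection succeeded.
    """
    failures = []
    for attempt in range(1, attempts + 1):
        try:
            conn = connect((host, port), timeout)
            conn.close()
        except OSError as exc:  # covers timeouts, refused connections, DNS errors
            failures.append((attempt, str(exc)))
        if attempt < attempts:
            sleep(pause)  # space the probes out a little
    return failures


# Example call (assumed port, see above):
# probe("releng-puppet1.srv.releng.usw2.mozilla.com", 8140)
```

A mix of successes and timeouts from the same master, while other masters probe cleanly, would point at the instance or its host rather than at the puppet server.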
Comment 1•8 years ago
Hm, during the tree closure window bm115 never came back to life. I wonder if this bug is related. At any rate, it's down now, so it may be best to recreate it on Monday.
Comment 3•8 years ago
Attached is the AWS event for bm115.
Comment 4•8 years ago
To resolve the AWS event I stopped and started the instance. Everything started OK except buildbot, which has problems connecting to the MySQL server "buildbot-rw-vip.db.scl3.mozilla.com".

The exception from the logs: _mysql_exceptions.OperationalError: (2005, "Unknown MySQL server host 'buildbot-rw-vip.db.scl3.mozilla.com' (2)")
Comment 5•8 years ago
(In reply to Vlad Ciobancai [:vladC] from comment #4)
> The exception from logs : _mysql_exceptions.OperationalError: (2005,
> "Unknown MySQL server host 'buildbot-rw-vip.db.scl3.mozilla.com' (2)")

We think the above error was raised when puppet ran on boot. I started buildbot manually, and from what we can see everything is running as expected.
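For context, MySQL client error 2005 means the server's hostname could not be resolved, which would fit buildbot being started before name resolution was working after the reboot. A minimal sketch of a guard against that, assuming a wrapper script runs before buildbot starts; `wait_for_dns` is a hypothetical helper, not part of buildbot or puppet.

```python
import socket
import time


def wait_for_dns(host, retries=12, delay=5.0,
                 resolve=socket.gethostbyname, sleep=time.sleep):
    """Return the resolved address of `host`, or None if it never resolves.

    Retries the lookup up to `retries` times, sleeping `delay` seconds
    between failed attempts.
    """
    for attempt in range(retries):
        try:
            return resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            if attempt < retries - 1:
                sleep(delay)
    return None


# Example call before starting the master:
# wait_for_dns("buildbot-rw-vip.db.scl3.mozilla.com")
```

If the function returns None, the start script could log the failure and exit instead of letting buildbot die with the OperationalError above.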
Comment 6•8 years ago
I monitored the buildbot master and haven't seen any network issues.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Comment 7•8 years ago
Thanks! We should keep an eye out for https://bugzilla.mozilla.org/show_bug.cgi?id=1238035#c0 happening again, and on the general performance of this master. I wouldn't be surprised if we need to recreate it.
Comment 8•8 years ago
Master tanked my restart script today and is currently inaccessible. I've disabled it in slavealloc. Next step is to terminate and recreate.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 9•8 years ago
(In reply to Chris Cooper [:coop] from comment #8)
> Next step is to terminate and recreate.

Master has been terminated. I'm recreating it now.
Comment 10•8 years ago
Master is back up.
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED
Updated•6 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•4 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard