We've been seeing Puppet timeout emails from buildbot-master115, and manual runs connecting to releng-puppet1.srv.releng.usw2.mozilla.com sometimes work and sometimes don't. This might indicate general network problems, or resource problems on the physical host the instance currently resides on. It might be worth shutting down buildbot-master115, destroying it, and recreating it so it lands on a different physical host.
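One way to put a number on "sometimes work and sometimes don't" is to attempt a series of TCP connections from the master to the Puppet server and count how many succeed. The following is only a rough sketch: the function name is hypothetical, and port 8140 (the usual Puppet master port) is an assumption about this setup rather than something confirmed in this bug.

```python
import socket


def count_successful_connects(host, port, tries=10, timeout=5.0,
                              connect=socket.create_connection):
    """Try to open a TCP connection `tries` times; return how many succeeded.

    `connect` is injectable so the logic can be exercised without a network.
    """
    ok = 0
    for _ in range(tries):
        try:
            conn = connect((host, port), timeout)
        except OSError:
            # Covers timeouts, refused connections, and DNS failures.
            continue
        conn.close()
        ok += 1
    return ok


if __name__ == "__main__":
    # Assumed port: 8140 is the Puppet default, not verified for this host.
    host = "releng-puppet1.srv.releng.usw2.mozilla.com"
    good = count_successful_connects(host, 8140)
    print(f"{good}/10 connections to {host} succeeded")
```

A count well below the number of tries, run from the affected master but not from its siblings, would support the theory that the problem is local to this instance or its physical host.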
Hm, during the tree closure window bm115 never came back to life. I wonder if this bug is related. At any rate, it's down now, so it may be best to recreate it on Monday.
Created attachment 8706246: buildbot-master115-aws-event.png
Attached is the AWS event for bm115.
To resolve the AWS event, I stopped and started the instance. Everything started OK except buildbot, which has problems connecting to the MySQL server "buildbot-rw-vip.db.scl3.mozilla.com". The exception from the logs:
_mysql_exceptions.OperationalError: (2005, "Unknown MySQL server host 'buildbot-rw-vip.db.scl3.mozilla.com' (2)")
(In reply to Vlad Ciobancai [:vladC] from comment #4)
> The exception from logs : _mysql_exceptions.OperationalError: (2005,
> "Unknown MySQL server host 'buildbot-rw-vip.db.scl3.mozilla.com' (2)")
We think the above error occurred when Puppet ran on boot. I started buildbot manually, and from what we can see everything is running as expected.
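MySQL error 2005 means the client could not resolve the server hostname. If the cause really was that name resolution was not yet available when services started on boot, a startup wrapper could wait for the database host to resolve before launching buildbot. The sketch below is hypothetical: `wait_for_dns` is not part of buildbot or Puppet, and the retry count and delay are arbitrary.

```python
import socket
import time


def wait_for_dns(host, attempts=5, delay=2.0,
                 resolve=socket.getaddrinfo, sleep=time.sleep):
    """Return True once `host` resolves, or False after `attempts` failures.

    `resolve` and `sleep` are injectable so the retry logic can be tested
    without real DNS lookups or real waiting.
    """
    for attempt in range(1, attempts + 1):
        try:
            resolve(host, None)
            return True
        except socket.gaierror:
            # Name did not resolve; pause before the next attempt,
            # but do not sleep after the final one.
            if attempt < attempts:
                sleep(delay)
    return False


if __name__ == "__main__":
    db_host = "buildbot-rw-vip.db.scl3.mozilla.com"
    if wait_for_dns(db_host):
        print(f"{db_host} resolves; safe to start buildbot")
    else:
        print(f"{db_host} still does not resolve; not starting buildbot")
```

If the name never resolves even after the retries, the problem is more likely persistent DNS or network trouble on the instance than a boot-time race.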
I monitored the buildbot master and haven't seen any network issues.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Thanks! We should keep an eye out for https://bugzilla.mozilla.org/show_bug.cgi?id=1238035#c0 happening again, and on the general performance of this master. I wouldn't be surprised if we need to recreate it.
Master tanked my restart script today and is currently inaccessible. I've disabled it in slavealloc. Next step is to terminate and recreate.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Chris Cooper [:coop] from comment #8)
> Next step is to terminate and recreate.
Master has been terminated. I'm recreating it now.
Master is back up.
Status: REOPENED → RESOLVED
Last Resolved: 3 years ago → 3 years ago
Resolution: --- → FIXED