Closed Bug 869132 Opened 7 years ago Closed 7 years ago
Finish implementing kittenherder's PDU reboot support
Kittenherder has most of the PDU reboot code in place. Need to finish it and test.
Patch deployment notes: - need to "pip install pytz" into the kittenherder virtualenv - add inventory details to secrets.cfg file
Comment on attachment 747030 [details] [diff] [review] [briar-patch] fetch PDU details from inventory, fetch lastSeen from buildapi, rewrite rebootIfNeeded logic, improve logging Review of attachment 747030 [details] [diff] [review]: ----------------------------------------------------------------- A few nits, but no showstoppers. r+ with nits fixed. We should have a conversation soon about fixing the mail output. Now that we're running more often (and you're fixing the actions we actually take), we should decide what the important issues are that we want to know about for each pass, and how we want to maintain state from one pass to the next. I'd like to get to the point where the emails are not going only to me, but are actually useful to buildduty. This will probably require the bugzilla integration bit first, but I don't necessarily want to gate on that. ::: releng/buildapi.py @@ +38,5 @@ > +if __name__ == '__main__': > + # test > + rb = last_build_endtime('talos-r3-w7-029') > + import pprint > + pprint.pprint(rb) Do we want to check rb before printing it here? Nice to see at least basic tests being added for libraries, btw. ::: releng/remote.py @@ +341,5 @@ > + (pdu, deviceID) = key_value['value'].split(':') > + if not pdu.endswith('.mozilla.com'): > + pdu = pdu + '.mozilla.com' > + break > + WS @@ +353,2 @@ > return False > + WS @@ +357,5 @@ > + result = False > + remoteEnv = self.remoteEnv > + > + if None in [self.pdu['pdu'], self.pdu['deviceID']]: > + return result Is this a log-worthy error, or should we have already caught that error by this point? @@ +849,4 @@ > return result > > def rebootIfNeeded(self, host, lastSeen=None, indent='', dryrun=True, verbose=False): > + """ Reboot a host if needed. if lastSeen is None we will WS @@ +869,5 @@ > data = host.tail_twistd_log(10) > if not data or "Main loop terminated" in data or "ProcessExitedAlready" in data: > break > else: > + # failed gracefull shutdown of buildbot client process sp. graceful @@ +929,5 @@ > + if dryrun: > + log.debug("would have hard-rebooted but dryrun is True") > + else: > + if not (host.hasPDU or host.hasIPMI): > + log.info("unreachable host does not have PDU or IPMI support") We should exit out of the conditional at this point to avoid the subsequent checks that won't succeed anyway. @@ +975,5 @@ > + tzinfo=timezone('UTC')).astimezone( \ > + timezone('US/Pacific')).replace(tzinfo=None) > + log.debug('defaulting lastseen to %s' % status['lastseen']) > + except requests.exceptions.HTTPError: > + pass Should we still set status['lastseen'] here?
Attachment #747030 - Flags: review?(coop) → review+
Comment on attachment 747030 [details] [diff] [review] [briar-patch] fetch PDU details from inventory, fetch lastSeen from buildapi, rewrite rebootIfNeeded logic, improve logging Addressed nits and landed in https://github.com/mozilla/briar-patch/commit/b4ad58a1496b607c8c8d64ae69adb74687cae215. Deployed to cruncher.
Attachment #747030 - Flags: checked-in+
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard
You need to log in before you can comment on or make changes to this bug.