Closed
Bug 869132
Opened 11 years ago
Closed 11 years ago
Finish implementing kittenherder's PDU reboot support
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jhopkins, Assigned: jhopkins)
References
Details
Attachments
(1 file)
Kittenherder has most of the PDU reboot code in place. Need to finish it and test.
Assignee | ||
Comment 1•11 years ago
|
||
Attachment #747030 -
Flags: review?(coop)
Assignee | ||
Comment 2•11 years ago
|
||
Patch deployment notes: - need to "pip install pytz" into the kittenherder virtualenv - add inventory details to secrets.cfg file
Comment 3•11 years ago
|
||
Comment on attachment 747030 [details] [diff] [review] [briar-patch] fetch PDU details from inventory, fetch lastSeen from buildapi, rewrite rebootIfNeeded logic, improve logging Review of attachment 747030 [details] [diff] [review]: ----------------------------------------------------------------- A few nits, but no showstoppers. r+ with nits fixed. We should have a conversation soon about fixing the mail output. Now that we're running more often (and you're fixing the actions we actually take), we should decide what the important issues are that we want to know about for each pass, and how we want to maintain state from one pass to the next. I'd like to get to the point where the emails are not going only to me, but are actually useful to buildduty. This will probably require the bugzilla integration bit first, but I don't necessarily want to gate on that. ::: releng/buildapi.py @@ +38,5 @@ > +if __name__ == '__main__': > + # test > + rb = last_build_endtime('talos-r3-w7-029') > + import pprint > + pprint.pprint(rb) Do we want to check rb before printing it here? Nice to see at least basic tests being added for libraries, btw. ::: releng/remote.py @@ +341,5 @@ > + (pdu, deviceID) = key_value['value'].split(':') > + if not pdu.endswith('.mozilla.com'): > + pdu = pdu + '.mozilla.com' > + break > + WS @@ +353,2 @@ > return False > + WS @@ +357,5 @@ > + result = False > + remoteEnv = self.remoteEnv > + > + if None in [self.pdu['pdu'], self.pdu['deviceID']]: > + return result Is this a log-worthy error, or should we have already caught that error by this point? @@ +849,4 @@ > return result > > def rebootIfNeeded(self, host, lastSeen=None, indent='', dryrun=True, verbose=False): > + """ Reboot a host if needed. if lastSeen is None we will WS @@ +869,5 @@ > data = host.tail_twistd_log(10) > if not data or "Main loop terminated" in data or "ProcessExitedAlready" in data: > break > else: > + # failed gracefull shutdown of buildbot client process sp. graceful @@ +929,5 @@ > + if dryrun: > + log.debug("would have hard-rebooted but dryrun is True") > + else: > + if not (host.hasPDU or host.hasIPMI): > + log.info("unreachable host does not have PDU or IPMI support") We should exit out of the conditional at this point to avoid the subsequent checks that won't succeed anyway. @@ +975,5 @@ > + tzinfo=timezone('UTC')).astimezone( \ > + timezone('US/Pacific')).replace(tzinfo=None) > + log.debug('defaulting lastseen to %s' % status['lastseen']) > + except requests.exceptions.HTTPError: > + pass Should we still set status['lastseen'] here?
Attachment #747030 -
Flags: review?(coop) → review+
Assignee | ||
Comment 4•11 years ago
|
||
Comment on attachment 747030 [details] [diff] [review] [briar-patch] fetch PDU details from inventory, fetch lastSeen from buildapi, rewrite rebootIfNeeded logic, improve logging Addressed nits and landed in https://github.com/mozilla/briar-patch/commit/b4ad58a1496b607c8c8d64ae69adb74687cae215. Deployed to cruncher.
Attachment #747030 -
Flags: checked-in+
Assignee | ||
Updated•11 years ago
|
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•11 years ago
|
Product: mozilla.org → Release Engineering
Updated•6 years ago
|
Product: Release Engineering → Infrastructure & Operations
Updated•4 years ago
|
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•