Bug 921067 (Closed): Opened 7 years ago, Closed 6 years ago

Slaveapi should support mobile devices for reboots

Categories

(Release Engineering :: General, defect)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Callek, Assigned: Callek)

References

Details

Attachments

(3 files, 3 obsolete files)

Slaveapi reboots currently escalate (ssh -> ipmi, etc.) with verification.

Our mobile devices would use an entirely different logic path for reboots: tegras would prefer SUTAgent -> PDU, while for pandas we should just go through mozpool (and/or the relay board, depending on timing for the reference implementation of mozpool's reboot magic).

This bug is to add reboot support for these devices.
Duplicate of this bug: 915231
From bug 915231:
(In reply to Ben Hearsum [:bhearsum] from comment #0)
> Right now, slaveapi reboots assume that SSH is always available, which isn't
> true for Pandas and Tegras. Need to fix this up and triple check that the
> PDU reboot logic works for them.
(In reply to Justin Wood (:Callek) from comment #0)
> While our mobile devices would use a different logic path for reboots
> entirely.

Actually, I don't think there's any reason to use a parallel code path for this. Other than SUTAgent there's nothing here that's mobile device specific. More on this below.

> Tegras would prefer SUTAgent->PDU, while pandas we should just go
> through mozpool (and/or relay board depending on timing for reference
> implementation of mozpool's reboot magic)

Dustin tells me that talking to relays directly will break them. Assuming that's true, we shouldn't do that.


Here are some high-level thoughts on how I think this could integrate with the existing code. Note that I'm still of the mind that we shouldn't have logic like "is a tegra/panda", but rather should look for the actual things we care about.
* Create a Mozpool client (in slaveapi/clients/mozpool.py) that knows how to ask for a reboot for a slave. Note that this shouldn't be Panda-specific - we've talked a fair amount about moving other/all slaves to Mozpool down the road.
* Create a SUTAgent client that knows how to reboot through it. This means that we need the SUTAgent code + devices.json. Ideally, the SUTAgent code will be pip installable. If it's not, and it's difficult to make it installable, we'll figure something else out. As for devices.json, we can probably request that from hg as required. The link to it should go in slaveapi.ini.
* Both of the above will need load_xxx_info methods like the other clients that a Slave uses. You probably know better than me how to figure out whether or not a slave has a sutagent or mozpool interface. I'm happy to help figure out how to do this if not though.
* Need to adjust the current logic to not assume that an SSH interface is available. This is a bit tricky without special casing Tegras/Pandas, but I _think_ that changing get_console (http://git.mozilla.org/?p=build/slaveapi.git;a=blob;f=slaveapi/slave.py;h=bfdbf8c6f2aef4d4bfa051bc325ffd1e7945247e;hb=HEAD#l130) to catch Paramiko exceptions (probably the root SSHException) and return None (instead of raising them) would be enough. Because Tegras/Pandas don't listen on 22 that should fail quite quickly. This has the added benefit of speeding up reboots of other slaves that are down because we won't cycle through the reboot commands even though we can't connect.
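As a sketch of the devices.json lookup described above (the key layout, device name, and the idea of passing the payload in as text are illustrative assumptions; the real client would fetch the URL from slaveapi.ini with requests):

```python
import json

def get_device(name, devices_json_text):
    """Return one device's record from a devices.json payload, or None.

    devices_json_text would normally come from something like
    requests.get(config["devices_json_url"]).text; a plain string is
    taken here so the sketch stays network-free.
    """
    all_devices = json.loads(devices_json_text)
    # Index by the *variable* name, not the string literal "name".
    return all_devices.get(name)

# Hypothetical payload shaped like a devices.json entry.
payload = json.dumps({"tegra-001": {"pdu": "pdu1.example.com", "pduid": ".AA1"}})
print(get_device("tegra-001", payload))
```

Returning None for an unknown device lets load_devices_info treat "no entry" the same way the other load_xxx_info methods treat a missing interface.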

Once we have the above, it should be simple to change the reboot escalation order to:
* SSH
* SUTAgent (calls slave.sutagent.reboot)
* Mozpool (calls slave.mozpool.reboot)
* IPMI
* PDU
* Bug
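The escalation order above could be driven by a simple loop. The method names, the (name, callable) pairs, and the return conventions here are illustrative, not slaveapi's actual API:

```python
# Sketch: try each reboot method in escalation order, skipping
# interfaces the slave doesn't have, stopping at the first success.
def escalating_reboot(slave, methods):
    """methods is an ordered list of (name, callable-or-None)."""
    for name, attempt in methods:
        if attempt is None:
            continue  # slave has no such interface (e.g. no SUTAgent)
        try:
            if attempt():
                return name
        except Exception:
            pass  # fall through to the next, more forceful method
    return None  # nothing worked -> file a bug

# Hypothetical slave where ssh fails, there is no SUTAgent,
# and mozpool succeeds before we ever reach ipmi/pdu.
order = [
    ("ssh", lambda: False),
    ("sutagent", None),
    ("mozpool", lambda: True),
    ("ipmi", lambda: True),
]
print(escalating_reboot(None, order))  # -> mozpool
```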
Attached patch WIP (obsolete) — Splinter Review
Feel free to be as bikesheddy as you think is worthy here.

This is strictly a WIP, but it should implement the devices.json support and allow PDU rebooting to work for tegras! It also lets the ssh connections fail gracefully.
Assignee: nobody → bugspam.Callek
Status: NEW → ASSIGNED
Attachment #820586 - Flags: feedback?(bhearsum)
Comment on attachment 820586 [details] [diff] [review]
WIP

Review of attachment 820586 [details] [diff] [review]:
-----------------------------------------------------------------

Did you forget to add files here? I don't see a slaveapi/clients/devices.py.
Attached patch WIP.2 (obsolete) — Splinter Review
This time with more code (and more files)
Attachment #820586 - Attachment is obsolete: true
Attachment #820586 - Flags: feedback?(bhearsum)
Attachment #821034 - Flags: feedback?(bhearsum)
Comment on attachment 821034 [details] [diff] [review]
WIP.2

Review of attachment 821034 [details] [diff] [review]:
-----------------------------------------------------------------

::: slaveapi/clients/devices.py
@@ +6,5 @@
> +def get_device(name, url):
> +    log.debug("%s - Requesting %s", name, url)
> +    all_devices = requests.get(url).json()
> +    if name in all_devices:
> +        return all_devices['name']

Style nit: use " not '. We've kept that consistent in this code base so far, I'm trying to keep it that way :).

::: slaveapi/slave.py
@@ +58,5 @@
> +        if oldPDU and oldPDU != self.pdu:
> +            # Debug logging if already set and not matching
> +            log.debug("Overriding old pdu ('%s')"
> +                      "with new value ('%s') from inventory." %
> +                      (oldPDU, self.pdu))

The fact that load_inventory_info() and load_devices_info() both set self.pdu is really bad. I hadn't realized this complication when we talked about this the other day. Rather than have both of these methods handle PDUs specially, how about:
- Strip load_devices_info down to just the part that loads the entire device info
- Replace load_inventory_info and the rest of the existing load_devices_info with a new load_pdu_info (load_inventory_info only deals with the pdu anyways, so we're not losing anything by doing that).
- load_pdu_info can centralize the pdu logic, and deal with overriding from devices_info if it exists. I'm not too fussy about whether load_pdu_info calls load_devices_info itself or not. That could be forced onto the caller if you want.

I'm open to any alternative approach that doesn't duplicate the PDU logic.
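The suggested refactor can be sketched like this; method names, keys, and the hard-coded stand-in values are all illustrative, not slaveapi's real code:

```python
# Sketch: a single load_pdu_info() owns all PDU logic, letting the
# devices.json record override the inventory value when present.
class Slave:
    def __init__(self):
        self.pdu = None

    def load_devices_info(self):
        # Stripped down: just return the raw device record (or None).
        # Stand-in for the HTTP fetch of devices.json.
        return {"pdu": "pdu2.example.com"}

    def load_pdu_info(self):
        self.pdu = "pdu1.example.com"  # stand-in for the inventory value
        device_info = self.load_devices_info()
        if device_info and device_info.get("pdu"):
            # devices.json wins over inventory; one place, no duplication
            self.pdu = device_info["pdu"]

s = Slave()
s.load_pdu_info()
print(s.pdu)  # -> pdu2.example.com
```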

@@ +154,5 @@
> +    console = SSHConsole(slave.ip, config["ssh_credentials"])
> +    try:
> +        console.connect()  # Make sure we can connect properly
> +        return console
> +    except:

Is there a more specific exception you can catch here? Maybe the root paramiko one? As it stands now, this could eat ValueError or other things if you pass in bad data to SSHConsole.

@@ +158,5 @@
> +    except:
> +        log.exception()
> +        return None  # No valid console
> +    finally:
> +        console.disconnect()  # Don't hold a ssh connection

Does disconnect() function correctly even when connect() raises?
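Both review points can be satisfied together: catch a narrow exception, and make disconnect() a no-op when no connection was established. FakeConsole is a made-up stand-in for SSHConsole so the sketch runs without paramiko:

```python
import socket

class FakeConsole:
    """Stand-in for SSHConsole; connect() may raise before anything opens."""
    def __init__(self, should_fail):
        self.should_fail = should_fail
        self.connected = False

    def connect(self):
        if self.should_fail:
            raise socket.error("connection refused")
        self.connected = True

    def disconnect(self):
        # Guarded, so it is safe to call even if connect() raised first.
        if self.connected:
            self.connected = False

def get_console(console):
    try:
        console.connect()  # just verify we *can* connect
        return console
    except socket.error:   # narrow catch; won't eat ValueError etc.
        return None        # no valid console
    finally:
        console.disconnect()  # don't hold an ssh connection open

print(get_console(FakeConsole(should_fail=True)))  # -> None
```

The real code would catch paramiko's root SSHException alongside socket.error rather than a bare except.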

@@ +167,5 @@
> +        # connects on instantiation with TCP connection
> +        dm = DeviceManagerSUT(slave.ip, retryLimit=1)
> +        dm.retryLimit = 5
> +        return dm
> +    except:

Same thing here - catching a more specific exception is better, if possible.
Attachment #821034 - Flags: feedback?(bhearsum) → feedback+
Duplicate of this bug: 956898
Status:

* I'm working on a refreshed patch.
** (Got slightly delayed due to a HD crash a few days into work, and TRIBE/other fires)
* Tegras are rebootable *today* but the checks and balances slaveapi does with other machines are not available for tegras [e.g. no verification that no job is running]

I hope to have a new patch up shortly after Bug 971780 is resolved (within the next week or two)
Attached patch WIP.3 (obsolete) — Splinter Review
I'm having some slight trouble getting this tested (local OS==windows issue)

but I wanted to get this up for feedback anyway.

Of note, I didn't do the devicemanager/SUTAgent stuff since it's proven not as reliable on tegras [for recovery], and plain bad for pandas.

This takes into account what look to be the relevant comments from the prior patch.
Attachment #8383450 - Flags: feedback?(bhearsum)
Comment on attachment 8383450 [details] [diff] [review]
WIP.3

Review of attachment 8383450 [details] [diff] [review]:
-----------------------------------------------------------------

I'm really happy with the progress on this, good job.

::: slaveapi/actions/shutdown_buildslave.py
@@ +48,5 @@
>      start = time.time()
> +    # Sometimes buildbot is run on different host than the slave
> +    # slave.basedir is still accurate, though.
> +    buildbotslave = slave.buildbotslave or slave
> +    console = get_console(buildbotslave)

I think this is a pretty decent way to handle this. I'd like to see the getting-the-proper-console moved to get_console though. Otherwise this will end up duplicated all over the place in the future.

::: slaveapi/clients/ssh.py
@@ +73,5 @@
> +                except socket.error, e:
> +                    from errno import errorcode
> +                    log.debug("%s - Socket Error (%s) - %s", self.fqdn, errorcode[e[0]], e[1])
> +                    last_exc = e
> +                    break

This looks like an improvement that will make us succeed more often, can you add a comment explaining it though?

::: slaveapi/slave.py
@@ +44,5 @@
>          self.master_url = None
> +        # used for hosts that have a different machine running buildbot than themselves
> +        # Valid buildbotslave value is a dict with keys of name,domain,ip
> +        # same as the self.{key} of same names above
> +        self.buildbotslave = None

This comment appears to be outdated. A valid buildbotslave needs to be a Slave (or subclass of it). Style nit: buildbot_slave instead of buildbotslave.

@@ +85,5 @@
>  
> +    def load_devices_info(self):
> +        log.info("%s - Getting devices.json info", self.name)
> +        device_info = devices.get_device(
> +            self.name, config["devices_json_url"]

Please update the config template to reflect this new parameter.

@@ +168,5 @@
> +    # e.g. a foopy
> +    def load_all_info(self):
> +        self.load_inventory_info()
> +        self.load_ipmi_info()
> +        self.load_bug_info()

I'm iffy on whether or not this class needs to exist. You could remove it and replace load_all_info with these three calls. I guess I'm fine either way.

@@ +212,5 @@
> +        log.exception()
> +        return None  # No valid console
> +    finally:
> +        console.disconnect() # Don't hold a ssh connection
> +    return None  # How did we get here?

Nice improvement.
Attachment #8383450 - Flags: feedback?(bhearsum) → feedback+
Depends on: 966371
Attached patch [slaveapi] v1Splinter Review
This has been thoroughly tested in -dev, including testing against other slave types than tegras.

I opted for doing the Buildslave/Slave() -> BaseSlave refactoring for the next patch which is adding panda support (using mozpool)
Attachment #8384860 - Flags: review?(bhearsum)
Attachment #8383450 - Attachment is obsolete: true
Attachment #821034 - Attachment is obsolete: true
Comment on attachment 8384860 [details] [diff] [review]
[slaveapi] v1

Review of attachment 8384860 [details] [diff] [review]:
-----------------------------------------------------------------

::: slaveapi/actions/shutdown_buildslave.py
@@ +51,4 @@
>          try:
>              rc, output = console.run_cmd("tail -n1 %s" % twistd_log)
>              if "Server Shut Down" in output:
> +                log.debug("%s - Shutdown succeeded on %s." % (slave.name, slave.buildbotslave.name))

Actually, this log message change was one of the last ones tested, and it won't work on things without a separate buildbotslave, so let's make it instead:

log.debug("%s - Shutdown succeeded." % slave.name)
Comment on attachment 8384860 [details] [diff] [review]
[slaveapi] v1

Review of attachment 8384860 [details] [diff] [review]:
-----------------------------------------------------------------

::: slaveapi/clients/ssh.py
@@ +88,5 @@
>              self.client.close()
>          self.connected = False
> +    
> +    def __del__(self):
> +        self.disconnect()

Is this important, or just an effort to be friendly to the remote system? If it's important, you need to do it somewhere other than __del__, because it's not guaranteed to be called.
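One guaranteed alternative to __del__ is a context manager, which CPython runs deterministically. This Console class is a minimal stand-in, not slaveapi's SSHConsole:

```python
class Console:
    """Sketch: deterministic cleanup via a context manager instead of
    __del__, which CPython does not guarantee will ever be called."""
    def __init__(self):
        self.connected = False

    def connect(self):
        self.connected = True  # stand-in for opening the ssh session

    def disconnect(self):
        self.connected = False

    def __enter__(self):
        self.connect()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.disconnect()  # runs even when the body raised
        return False       # never swallow the exception

with Console() as c:
    print(c.connected)  # -> True
print(c.connected)      # -> False
```

If cleanup is merely friendly rather than important, __del__ as a best-effort fallback is harmless; the context manager is what makes it reliable.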
Attachment #8384860 - Flags: review?(bhearsum) → review+
$ git push
Counting objects: 22, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (12/12), done.
Writing objects: 100% (12/12), 2.45 KiB | 0 bytes/s, done.
Total 12 (delta 10), reused 0 (delta 0)
To ssh://git.mozilla.org/build/slaveapi.git
   d43a9a5..7971eb6  master -> master

...

http://git.mozilla.org/?p=build/slaveapi.git;a=commit;h=7971eb653f94dd6476b556bff8e79b148de2b4c1

Plus a followup fix I missed in my local copy (e.g. c#13) :/, from testing:

http://git.mozilla.org/?p=build/slaveapi.git;a=commitdiff;h=b43187b58d7f18e9945b68a110c0011eb69e6e65
Depends on: 979553
Depends on: 984721
Notes:

This hasn't yet been tested, but I presume it works fine. The Mozpool client will also need to be noted in puppet; its only dependency is requests >= 1.0, which we have.

Its power-cycle code also takes a "duration" param, which doesn't seem to do anything in mozpool, afaict by code inspection.

We're not currently planning on turning panda reboots on in slaverebooter so if this tests basic-ok I'd like to land it rather than spend too much time thinking on the following improvements:
* Improved checking (e.g. checking if a device is "out on request" in mozpool, or busy for other mozpool operations)
* Improved error-recovery (e.g. using MozpoolException and/or socket exception classes instead of an all-encompassing except)
* Improving the alive check to use mozpoolhandler rather than its current checks
** MozpoolHandler has a method "device_ping" which says: """ ping this device. Returns a JSON object with a `success` key, and value true or false. The ping happens synchronously, and takes around a half-second."""

Assuming you're ok with avoiding all that just to get the human-panda-reboots working via slave health, we're good!
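A thin mozpool client along these lines would cover the ping and power-cycle pieces. The URL paths follow the device_ping endpoint quoted above but should be checked against the running mozpool's API docs; the injected http_get/http_post callables (instead of calling requests directly) and the server name are assumptions of this sketch:

```python
import json

class MozpoolClient:
    """Minimal sketch of a mozpool client for slaveapi/clients/mozpool.py."""
    def __init__(self, server, http_get, http_post):
        self.base = "http://%s/api" % server
        # In real code these would be requests.get / requests.post.
        self.http_get = http_get
        self.http_post = http_post

    def ping(self, device):
        # device_ping returns a JSON object with a `success` key.
        body = self.http_get("%s/device/%s/ping/" % (self.base, device))
        return json.loads(body).get("success", False)

    def power_cycle(self, device):
        # mozpool's power-cycle accepts an optional PDU "duration";
        # per the note above it appears unused, so it is omitted here.
        self.http_post("%s/device/%s/power-cycle/" % (self.base, device), "{}")

calls = []
client = MozpoolClient("mozpool.example.com",
                       http_get=lambda url: '{"success": true}',
                       http_post=lambda url, data: calls.append(url))
print(client.ping("panda-0819"))  # -> True
client.power_cycle("panda-0819")
```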
Attachment #8394528 - Flags: review?(bhearsum)
Comment on attachment 8394528 [details] [diff] [review]
[slaveapi] panda v1

Review of attachment 8394528 [details] [diff] [review]:
-----------------------------------------------------------------

I'm happy to see Mozpool being used, rather than direct PDU reboots. I recall there was some disagreement on what to do for Panda reboots in the past.

Please test this before you land it.
Attachment #8394528 - Flags: review?(bhearsum) → review+
Comment on attachment 8394528 [details] [diff] [review]
[slaveapi] panda v1

$ git push
Counting objects: 42, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (29/29), done.
Writing objects: 100% (29/29), 4.22 KiB | 0 bytes/s, done.
Total 29 (delta 22), reused 0 (delta 0)
To ssh://git.mozilla.org/build/slaveapi.git
   1173310..ccce02a  master -> master

$ git log 1173310..ccce02a
commit ccce02ac8aa2bd7a000733a8d1211881d84fd927
Author: Justin Wood <Callek@gmail.com>
Date:   Tue Mar 25 22:46:10 2014 -0400

    Bump slaveapi ver for Bug 921067

commit ce5eb8a162100f825b910a92c43cc7f6cb10d8e5
Author: Justin Wood <Callek@gmail.com>
Date:   Tue Mar 25 22:44:17 2014 -0400

    Bug 921067 - Slaveapi should support panda devices for reboots. r=bhearsum

commit 8d13f0e58ce69436973a25ee2209cc09c08cf24a
Author: Justin Wood <Callek@gmail.com>
Date:   Tue Mar 25 22:41:45 2014 -0400

    Bug 984721 - Slaveapi should abstract out Machine from Slave. r=bhearsum

commit 7b3966fd3b2904a9b51edac38239bba432ac16cc
Author: Justin Wood <Callek@gmail.com>
Date:   Tue Mar 25 22:35:30 2014 -0400

    Bug 981039 - slaveapi should have a seperate knob for concurrency on buildapi calls alone. r=bhearsum
Attachment #8394528 - Flags: checked-in+
The mozpoolclient is needed for the package now; it's also in the package's setup.py, but this gets the venv updated.
Attachment #8396872 - Flags: review?(nthomas)
Comment on attachment 8396872 [details] [diff] [review]
[puppet] slaveapi version bump

The stamp is .... *spins wheel of colour* ... aquamarine today.
Attachment #8396872 - Flags: review?(nthomas) → review+
Comment on attachment 8396872 [details] [diff] [review]
[puppet] slaveapi version bump

Review of attachment 8396872 [details] [diff] [review]:
-----------------------------------------------------------------

bumped ver again to 1.0.18 assuming r=stamp is perfect!

(Due to https://bugzilla.mozilla.org/show_bug.cgi?id=984721#c6 )
deployed for real as part of current Slaveapi 1.0.18.1
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
doing a few panda reboots today: 

2014-04-14 14:07:40,000 - ERROR - panda-0819 - Caught exception during mozpool reboot.
Traceback (most recent call last):
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/slaveapi/actions/reboot.py", line 64, in reboot
    mozpoolhandler = MozpoolHandler(self.mozpool_server)
NameError: global name 'self' is not defined
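The traceback points at a module-level reboot function using `self`, which only exists on methods. A sketch of the class of fix this calls for; the config key, MozpoolHandler stub, and names here are assumptions, not the actual followup:

```python
# slaveapi actions are module-level functions, so there is no `self`;
# the mozpool server has to come from configuration instead.
config = {"mozpool_server": "mozpool.example.com"}  # hypothetical key

class MozpoolHandler:
    """Stub standing in for the real mozpoolclient handler."""
    def __init__(self, server):
        self.server = server

def reboot(slave_name):
    # was: MozpoolHandler(self.mozpool_server)  -> NameError
    handler = MozpoolHandler(config["mozpool_server"])
    return handler.server

print(reboot("panda-0819"))  # -> mozpool.example.com
```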
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This should be all set as of the next slaveapi deploy. Got r=jhopkins over IRC

$ git push -v
Pushing to ssh://gitolite3@git.mozilla.org/build/slaveapi.git
Counting objects: 9, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (5/5), done.
Writing objects: 100% (5/5), 470 bytes | 0 bytes/s, done.
Total 5 (delta 4), reused 0 (delta 0)
To ssh://gitolite3@git.mozilla.org/build/slaveapi.git
   4bb12b6..77d7470  master -> master
updating local tracking ref 'refs/remotes/origin/master'
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Component: Tools → General