Closed Bug 821380 Opened 12 years ago Closed 11 years ago

Add tool to help deal with B2G panda boards

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P3)

x86
macOS

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: armenzg, Unassigned)

References

Details

Attachments

(2 files)

This tool allows you to get info about the panda and allows you to manage the boards (like power cycling and re-imaging).

Here's an early output:
Armens-MacBook-Air:scripts armenzg$ python mobile_buildduty.py -d panda-0165 -d panda-0166 -q
name:         panda-0165
foopy:        foopy45
state:        free
log:          http://mobile-imaging-001.p1.releng.scl1.mozilla.com/ui/log.html?device=panda-0165
last status:  2012-12-13T09:40:10       bmm     initiating power cycle

name:         panda-0166
foopy:        foopy45
state:        free
log:          http://mobile-imaging-001.p1.releng.scl1.mozilla.com/ui/log.html?device=panda-0166
last status:  2012-12-13T09:22:08       bmm     ping of panda-0166.p1.releng.scl1.mozilla.com complete: failed
This does some of the basic scenarios like collecting data for a panda and allowing for rebooting.

I have put the slow requests behind a --verbose flag.

I have to find a way to not need a credentials file with the password for my encrypted personal key.
I will have to research paramiko to determine what I can do instead.

I have to remove the hard coded values.
I don't know if to checkout the tools and buildbot-configs if they don't exist and then maintain them and/or allow the users to specify their own local repos and/or just pull the two files I need from hgweb/raw-file.

I have some code paths for tegras and panda-androids but I have not really developed them.

I'm not extremely happy on the prettiness of this but it does the job.

I will continue developing this but I will consider this my base diff to work from.
Assignee: nobody → armenzg
Status: NEW → ASSIGNED
Attachment #692344 - Flags: feedback?(bugspam.Callek)
Attachment #692344 - Flags: feedback?(aki)
This log shows how panda-0169 was down. I rebooted it. I stop its' buildbot instance and started it again.
Comment on attachment 692344 [details] [diff] [review]
mobile buildduty script - take 1

>+# Requirments:

sp: requirements

>+# - Python 2.7
>+# - Paramiko (pip install paramiko) for SSH connections
>+import getpass
>+import paramiko

1. We're already creating a virtualenv for requests, so we could add paramiko to the list.  Mozharness tries to be runnable out of the box, which is why it creates a virtualenv for you.

To change the set of python modules we install in the virtualenv, set

    'virtualenv_modules': ['requests', 'paramiko']

and when you run the create-virtualenv action it will install both for you.

2. I think we actually don't need paramiko at all.  I'll comment further on this below.

>+const_devices_path="/Users/armenzg/repos/tools/buildfarm/mobile/devices.json"
>+const_production_config_path="/Users/armenzg/repos/buildbot-configs/mozilla-tests/production_config.py"
>+const_gettac_url="http://slavealloc.build.mozilla.org/gettac"
>+keyfile="/Users/armenzg/.ssh/id_dsa"

These all seem like candidates to go in self.config via either __init__ or a config file.

>+class MobileBuilddutyHelper(BaseScript, VirtualenvMixin, MozpoolMixin):

You were asking why you need to specify VirtualenvMixin.  The reason is MozpoolMixin could inherit it for you, but doesn't.  I'd be ok with making MozpoolMixin inherit VirtualenvMixin by default, or leaving it and allowing for a script that requires 'requests' be installed before running.

>+    config_options = [
>+        [["-d", "--device"], {
>+            "dest":   "devices",
>+            "action": "append",
>+            "help":   "List of devices to manage",
>+        }],

Try "extend" instead of "append" here.  It lets you add devices via either

    -d device1 -d device2 -d device3

or

    -d device1,device2,device3

or

    -d device1,device2 -d device3

etc.

>+        [["--info"], {
>+            "dest":   "info_action",
>+            "action": "append",
>+            "help":   "Informational actions to be taken for a device",
>+        }],
>+        [["--buildbot"], {
>+            "dest":   "buildbot_action",
>+            "help":   "Manage a device's buildbot instance",
>+        }],
>+        [["--do"], {
>+            "dest":   "do_action",
>+            "help":   "Manage power and imaging of the device",
>+        }],

I'm not sure these are intuitive names, but I've seen worse :)

I'm also not quite sure what you're trying to do here.
It seems like you're trying to avoid mozharness actions and invent your own.

Instead of trying to invent your own way of doing actions, why don't you do this?

    'all_actions': [
        'clobber',
        'create-virtualenv',
        'info',
        'buildbot',
        'reboot',
        'usage',
    ],
    'default_actions': [
        'usage',
    ],

This would mean if you run mobile_buildduty.py without any action commandline options, you'll just get the output from usage() (which you can define yourself).

If you want to get info on a set of devices, you can

    mobile_buildduty.py -d device1,device2,device3 --info

etc.

If you want to keep this layout, I'd suggest you use 'choice' rather than 'append' since that allows you to limit the list to a valid set of choices, rather than letting the user type anything as a --buildbot value, for example.

e.g. http://hg.mozilla.org/build/mozharness/file/65c0b5be50ef/scripts/sign_android.py#l73

>+    mozpool_api_url="http://mobile-imaging-001.p1.releng.scl1.mozilla.com"

This seems like something that might belong in a config file, if this could foreseeably change or be different for certain machines compared to others.

>+    devices_json=None

I don't think you need this line, but it doesn't hurt.

>+        assert self.config.has_key("devices"), "You have to specify at least one --device"

self.fatal() ?

>+        self.devices_json = json.load(open((const_devices_path)))

>+        for device in self.config["devices"]:
>+            assert device in self.devices_json, "%s does not appear on devices.json" % device

self.fatal() ?

>+        # XXX we should have a way of not needing a password around to decrypt my key

ssh-agent.

>+        self.options = {
>+            "buildbot": ["start", "stop", "restart"],
>+            "do": ["reboot"],
>+            "info": ["ping"],
>+        }

These would go in the 'choices' list above.

>+        self.log_obj = None
>+        shutil.copyfile(const_production_config_path, os.path.join(os.getcwd(), "production_config.py"))

self.copyfile() ?

>+        from production_config import SLAVES

This seems like you'd have to have a specific directory layout or PYTHONPATH.

If you want to be more userfriendly, you can clone/update buildbot-configs in a pull action (so you always have an up-to-date copy) and execfile production_config.

http://docs.python.org/2/library/functions.html#execfile
http://hg.mozilla.org/build/mozharness/file/65c0b5be50ef/mozharness/base/config.py#l125

>+    def main(self):
>+        mph = self.query_mozpool_handler()
>+        for device in self.config["devices"]:
>+            # BUILDBOT ACTIONS
>+            if self.config.has_key("buildbot_action"):
>+                action = self.config["buildbot_action"]
>+                assert action in self.options["buildbot"], "You have to use the right acctions"

self.fatal()

also, sp: actions

>+                results = self._run_remotely("%s.build.mozilla.org" % self._query_foopy(device), \
>+                    "/builds/manage_buildslave.sh %s %s" % (action, device))

See below; I think you can use get_output_from_command().

>+                pprint.pprint(results)

You're trying to get around using mozharness logging, so ok.

>+            # INFO ACTIONS
>+            elif self.config.has_key("info_action"):
>+                action = self.config["info_action"]
>+                assert action in self.options["info"], "You have to use the right actions."

self.fatal()
You would get this check for free with 'choice'.

>+                self._info_for_device(device, verbose=self.config.get("verbose", False))

Why are you passing self.config('verbose') here, rather than checking self.config.get('verbose') inside of _info_for_device() ?  Both methods have full access to all of self.config.

>+            # DO ACTIONS
>+            elif self.config.has_key("do_action"):
>+                action = self.config["do_action"]
>+                assert action in self.options["do"], "You have to use the right actions."

See above.

>+                self._info_for_device(device, self.config.get("verbose", False))

See above.

>+    def _query_pool(self, device):
>+        fqdn = socket.getfqdn(device)
>+        panda_in_rack = re.search(".*p1.releng.scl1.mozilla.com", fqdn)
>+        tegra = re.search(".*build.mtv1.mozilla.com", fqdn)

These hardcodes might be better either at the top of the file, or in config, so we can change easily when these assumptions change.

>+        if panda_in_rack:
>+            # The device has to exist in one of the two production pools
>+            # XXX: Perhaps we need to add support for the staging pool as well
>+            if device not in self.pools["android_panda"] + self.pools["b2g_panda"]:
>+                return "unable_to_determine_running_os"

Trying to figure out if this is a special string ; if you even have to have this be a single word ; or if you could write this as a more readable spaced phrase.  (Or even a self.warning() or self.error())

>+            # XXX: It is one of the pandas that has not yet been racked
>+            return "android_panda_no_mozpool"

Here too.

>+            if verbose:
>+                # querying the device details takes time
>+                details = mph.query_device_details(device)
>+                # pinging takes 1/2 second
>+                ping_state = mph.device_ping(device)["success"]
>+                # finding the last reboot requires loading the log which takes time
>+                last_reboot = self._query_last_reboot(device)
>+                ping = mph.device_ping(device)["success"]
>+                print "last reboot:  %s" % last_reboot 
>+                print "environment:  %s" % details["environment"]
>+                print "comment:      %s" % details["comments"]
>+                print "ping:         %s" % ping 
>+                print "long status:  " 
>+                for i in log:
>+                    print "%s %s" % (i["timestamp"], i["message"])
>+                if ping_state == False and status["state"] == "free":
>+                    print "\nBUILDDUTY, you're recommended to reboot the device.\n"

I think you're trying to make sure you can use stdout to print brief information, but if something in the script breaks it'll be hard to debug what's wrong.  If I were writing this I'd consider leaving the mozharness output alone and creating a report file.

>+    def _query_buildbot_master(self, device):
>+        # XXX perhaps is best to get the contents directly through SSH but it adds extra time
>+        # since the pandas and tegras don't retrieve their tac on every job
>+        content = urllib2.urlopen("%s/%s" % (const_gettac_url, device)).readlines()

self.download_file()?

>+    def _run_remotely(self, host, command):
>+        ssh = paramiko.SSHClient()
>+        ssh.load_system_host_keys()
>+        ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
>+        ssh.connect(host, username='cltbld', key_filename=keyfile, password=self.key_password)
>+        results = ssh.exec_command(command)
>+        stdout = results[1].readlines()
>+        ssh.close()
>+        return stdout

What exactly are you getting out of this?
It seems like you're gaining

* accepting the host key, which we could either work around by providing a jumphost, or providing a valid set of host keys in a text file.

* passing a key password, which ssh-agent can do for you
http://www.openbsd.org/cgi-bin/man.cgi?query=ssh-agent&sektion=1

Other than that it seems like you're basically

    self.get_output_from_command(['ssh', '-i', self.config['ssh_key'], '-l',
                                  'cltbld', host, command])


I can really tell you're fighting mozharness every step of the way here.  It's up to you how you approach it, but I can say that when I've seen people use mozharness properly it seems to go well, and when people fight it it seems to go poorly.  I'm not entirely sure if the problem is that mozharness doesn't support what you're trying to do, or if there's a lack of understanding of what can be done inside mozharness, or what.
Attachment #692344 - Flags: feedback?(aki) → feedback+
Priority: -- → P2
Depends on: 827545
Priority: P2 → P3
Attachment #692344 - Flags: feedback?(bugspam.Callek)
Later down the road. There's not much managing though because of mozpool.
Querying info is useful though.
Assignee: armenzg → nobody
Status: ASSIGNED → NEW
Comment on attachment 692344 [details] [diff] [review] [diff] [review]
mobile buildduty script - take 1

---------AUTOMATIC COMMENT---------
--  filter pep8-callek-june-2013 --
---------AUTOMATIC COMMENT---------

$pep8 <dir> --diff --max-line-length=159 --show-source < attachment.diff
/tmp/tmp.8jrTe0a2sb/scripts/mobile_buildduty.py:45:19: E225 missing whitespace around operator
const_devices_path="/Users/armenzg/repos/tools/buildfarm/mobile/devices.json"
                  ^
/tmp/tmp.8jrTe0a2sb/scripts/mobile_buildduty.py:46:29: E225 missing whitespace around operator
const_production_config_path="/Users/armenzg/repos/buildbot-configs/mozilla-tests/production_config.py"
                            ^
/tmp/tmp.8jrTe0a2sb/scripts/mobile_buildduty.py:47:17: E225 missing whitespace around operator
const_gettac_url="http://slavealloc.build.mozilla.org/gettac"
                ^
/tmp/tmp.8jrTe0a2sb/scripts/mobile_buildduty.py:48:8: E225 missing whitespace around operator
keyfile="/Users/armenzg/.ssh/id_dsa"
       ^
/tmp/tmp.8jrTe0a2sb/scripts/mobile_buildduty.py:50:1: E302 expected 2 blank lines, found 1
class MobileBuilddutyHelper(BaseScript, VirtualenvMixin, MozpoolMixin):
^
/tmp/tmp.8jrTe0a2sb/scripts/mobile_buildduty.py:76:20: E225 missing whitespace around operator
    mozpool_api_url="http://mobile-imaging-001.p1.releng.scl1.mozilla.com"
                   ^
/tmp/tmp.8jrTe0a2sb/scripts/mobile_buildduty.py:80:17: E225 missing whitespace around operator
    devices_json=None
                ^
/tmp/tmp.8jrTe0a2sb/scripts/mobile_buildduty.py:100:27: W601 .has_key() is deprecated, use 'in'
        assert self.config.has_key("devices"), "You have to specify at least one --device"
                          ^
/tmp/tmp.8jrTe0a2sb/scripts/mobile_buildduty.py:127:27: W601 .has_key() is deprecated, use 'in'
            if self.config.has_key("buildbot_action"):
                          ^
/tmp/tmp.8jrTe0a2sb/scripts/mobile_buildduty.py:130:98: E502 the backslash is redundant between brackets
                results = self._run_remotely("%s.build.mozilla.org" % self._query_foopy(device), \
                                                                                                 ^
/tmp/tmp.8jrTe0a2sb/scripts/mobile_buildduty.py:131:21: E128 continuation line under-indented for visual indent
                    "/builds/manage_buildslave.sh %s %s" % (action, device))
                    ^
/tmp/tmp.8jrTe0a2sb/scripts/mobile_buildduty.py:134:29: W601 .has_key() is deprecated, use 'in'
            elif self.config.has_key("info_action"):
                            ^
/tmp/tmp.8jrTe0a2sb/scripts/mobile_buildduty.py:139:29: W601 .has_key() is deprecated, use 'in'
            elif self.config.has_key("do_action"):
                            ^
/tmp/tmp.8jrTe0a2sb/scripts/mobile_buildduty.py:142:26: E225 missing whitespace around operator
                if action=="reboot":
                         ^
/tmp/tmp.8jrTe0a2sb/scripts/mobile_buildduty.py:181:32: W291 trailing whitespace
            log = status["log"]
                               ^
/tmp/tmp.8jrTe0a2sb/scripts/mobile_buildduty.py:185:44: E222 multiple spaces after operator
            print "log url:      %s/%s%s" %  (self._query_mobile_imaging_server(device), "ui/log.html?device=", device)
                                           ^
/tmp/tmp.8jrTe0a2sb/scripts/mobile_buildduty.py:195:55: W291 trailing whitespace
                print "last reboot:  %s" % last_reboot
                                                      ^
/tmp/tmp.8jrTe0a2sb/scripts/mobile_buildduty.py:198:48: W291 trailing whitespace
                print "ping:         %s" % ping
                                               ^
/tmp/tmp.8jrTe0a2sb/scripts/mobile_buildduty.py:199:39: W291 trailing whitespace
                print "long status:  "
                                      ^
/tmp/tmp.8jrTe0a2sb/scripts/mobile_buildduty.py:202:31: E712 comparison to False should be 'if cond is False:' or 'if not cond:'
                if ping_state == False and status["state"] == "free":
                              ^
/tmp/tmp.8jrTe0a2sb/scripts/mobile_buildduty.py:207:44: E222 multiple spaces after operator
            print "log url:      %s/%s%s" %  (self._query_mobile_imaging_server(device), "ui/log.html?device=", device)
                                           ^
/tmp/tmp.8jrTe0a2sb/scripts/mobile_buildduty.py:246:68: E711 comparison to None should be 'if cond is not None:'
        if not self._query_pool(device).startswith("b2g") and text != None:
                                                                   ^
Product: mozilla.org → Release Engineering
These have all been migrated over to mozpool.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: