Closed Bug 650890 Opened 13 years ago Closed 11 years ago

port remote talos to mozharness


(Release Engineering :: General, defect, P1)



(Not tracked)



(Reporter: mozilla, Unassigned)



(Whiteboard: [mozharness][talos][android][tegras][mozharness+talos])


(3 files, 1 obsolete file)

This should get complexity out of buildbotcustom and allow the a-team,
developers, and releng to all use the same scripts.

The dependency on bug 650887 isn't a hard dependency.
Assignee: nobody → aki
In progress: .
Blocks: 651974
Removing 'sut' from the summary.
Standalone users will most likely be using adb instead of sut.
We're currently considering altering our Tegra setup to only one apk, that starts IP-based adbd on boot.

The benefits here include:

* adds more android-specific functionality
* removes the need for logcat in a separate process for stdout

The drawbacks include:

* if we add another device platform, SUT would be the lowest common denominator.  all the work we do with adb will be Android-specific and cannot carry over.

I'm going to concentrate on adb for now, while keeping in mind that I may need to shoehorn sut functionality in later.
OS: Android → MeeGo
Summary: port remote sut talos to mozharness → port remote talos to mozharness
OS: MeeGo → Android
Depends on: 698957
Depends on: 698959
Depends on: 698961
Blocks: 713003
Blocks: 713055
Priority: P4 → P1
Whiteboard: [mozharness][talos][android][tegras] → [mozharness][talos][android][tegras][mozharness+talos]
Depends on: 738824
Larger steps to do here:

* Merge latest mozharness changes into the old talosrunner branch (DONE)
* Get working again locally with current talos.
** This was working on tegras-over-sut and tablets-over-adb previously

Once I get re-familiarized with the script I may find other things that need doing; it's been a while since I touched this code.  I think I was waiting for --develop to work properly before allowing for using that local python webserver; I think that's been resolved.

I think I also found a bunch of tests that weren't running properly, but that may have been tablet-vs-tegra.  Enough has changed in Talos and Fennec land that I'll have to get a new baseline.

I could then run these directly on the Foopies through buildbot, but that doesn't solve a lot of the existing problems.  To solve those problems, I need to modify how we use the tegras:

* bug 738824 - tegra library webapp
* Add logic to check out + check in tegras to mozharness
* Take a small number of tegras out of production and put them into this pool.
* Run for a while in staging to see what kind of stability + numbers we're seeing
* Determine where we're running these: on test slaves? On Foopies but without sut_tools scripts?  The former would be better but would require more optimization to prevent Android tests from taking over our desktop test slave pools.
** To optimize for test slaves: instead of running tests directly in the script body, create objects to run the tests on the tegras, and run multiple tests in parallel.
** Those objects would have separate log files and OutputParser's and would be independent of each other.
** If we do this, we need to rethink how we get jobs from buildbot to the script, and how we get status from the script to TBPL, since both buildbot and TBPL assume one test suite == one job.

The closest I have to this in mozharness are OutputParser and MercurialVCS currently.  But splitting out discrete tasks to multiple parallel objects would help speed up a number of things, like downloads, repacks, or multiple parallel test suites.
Here's what I have in OmniFocus for device_talosrunner.
It's been a while, so I'm not entirely what some of them mean right now.

* talosrunner on test slaves no foopies blog post
* figure out how to determine own/device's IP addresses
* check in / check out tegra solution
  (get rid of flag files?)
* if --develop
* webserver pidfile
  (__main__ try/except?
  looks like wlach/jmaher might have it solved )
* which pool to run on?
* add FQDN to tegras.json
* multiple device support in talos?
* device flags, or lack thereof
* get working with new talos (just added this one)
* changing mozhttpd ports
In my github talosrunner branch, I've created a mozharness/test/ that I use to talk to devices over adb or sut.  I think this belongs in a directory with and

I've moved mozharness.mozilla.talos to mozharness.mozilla.testing.talos and mozharness.mozilla.testing to mozharness.mozilla.testing.testbase (really not sold on this name; open to suggestions).

If this is cool, I'll then merge back into my talosrunner branch, then move mozharness.test.device to mozharness.mozilla.testing.device .
Attachment #614221 - Flags: review?(jhammel)
While trying to get to inherit Talos, I found that I couldn't override the various items sent to BaseScript.__init__().

This patch includes the above move to mozharness.mozilla.testing.talos, and uses kwargs to allow for a more overrideable Talos object.
Attachment #614221 - Attachment is obsolete: true
Attachment #614221 - Flags: review?(jhammel)
Attachment #614238 - Flags: review?(jhammel)
Comment on attachment 614238 [details] [diff] [review]
same, but with override-friendly __init__()

+    def __init__(self, **kwargs):
+        if 'config_options' not in kwargs:
+            kwargs['config_options'] = self.config_options
+        if 'all_actions' not in kwargs:
+            kwargs['all_actions'] = self.actions
+        if 'default_actions' not in kwargs:
+            kwargs['default_actions'] = self.actions
+        if 'config' not in kwargs:
+            kwargs['config'] = {}
+        if 'virtualenv_modules' not in kwargs['config']:
+            kwargs['config']['virtualenv_modules'] = ["talos", "mozinstall"]

dict.setdefault is probably much cleaner than all of these if checks . Other than that, looks fine
Comment on attachment 614238 [details] [diff] [review]
same, but with override-friendly __init__()

r+ if you change this to use setdefault
Attachment #614238 - Flags: review?(jhammel) → review+
Comment on attachment 614238 [details] [diff] [review]
same, but with override-friendly __init__()

Attachment #614238 - Flags: checked-in+
Depends on: 748197
Depends on: 749042
Assignee: aki → bugspam.Callek
No longer blocks: 713055
Can this be closed, and if not, what's left to do? (And can we help)
No this cannot be closed, we still need to get remote talos working on mozharness, I *hope* to dive back into this work by EOW, but it is an AT RISK Q2 goal at this point.
Attached patch v 0.5Splinter Review
So, this has been tested from a foopy, most of the mozharness specific stuff works on windows as well, but Talos itself is broken on windows for remote.

The items I tested on the foopy were tSVG, and tdhtml -- so far.

This does not yet install robocop, nor does it do all the cleanup/verify stuff we wrote into sut_tools, because that is *specific* to the directory layout of the foopies so far, so will need additional work to get all of it into here, hopefully rxdroid makes that sane for us here.

And this will need some additional cleanup, of course. But all in all, works, though is not ready for production.

I have included all changes against base mozharness repo, including the changes aki made on the github-fork talosrunner branch.
Attachment #640119 - Flags: review?(aki)
Attachment #640119 - Flags: feedback?(jmaher)
Attachment #640119 - Flags: feedback?(jlund)
Comment on attachment 640119 [details] [diff] [review]
v 0.5

Review of attachment 640119 [details] [diff] [review]:
----------------------------------------------------------------- looks like a good place for my work in to help fill in.

Overall, this is looking pretty good.

::: configs/users/aki/
@@ +17,5 @@
> +    "server_port": None,
> +    "tracer_threshold": 50,
> +    "tracer_interval": 10,
> +    "symbols_path": None,
> +}

why is this file included in here?

::: configs/users/aki/
@@ +16,5 @@
> +    "talos_config_file": "remote.config",
> +
> +    # this needs to be set to either your_IP:8000, or an existing webserver
> +    # that serves talos.
> +    "talos_webserver": "",

I really don't like this hardcoded ip

@@ +22,5 @@
> +    # Set this to start a webserver automatically
> +    "start_python_webserver": True,
> +
> +    # adb or sut
> +    "device_protocol": "adb",

we have all been leaning towards SUT, is there a reason for defaulting to adb?

::: configs/users/aki/
@@ +18,5 @@
> +    "talos_config_file": "remote.config",
> +
> +    # this needs to be set to either your_IP:8000, or an existing webserver
> +    # that serves talos.
> +#    "talos_webserver": "",

would be nice to remove this commented out line.

::: configs/users/callek/
@@ +10,5 @@
> +    "device_package_name": "org.mozilla.fennec",
> +    "talos_device_name": "tegra-224",
> +    "virtualenv_modules": ["pywin32", "talos"],
> +    "exes": { "easy_install": ['d:\\Sources\\mozharness\\build\\venv\\Scripts\\python.exe',
> +                               'd:\\Sources\\mozharness\\build\\venv\\scripts\\'], },

hard coded paths?

::: mozharness/base/
@@ +618,5 @@
>              self.copy_to_upload_dir(os.path.join(dirs['abs_log_dir'], log_file),
>                                      dest=os.path.join('logs', log_file),
>                                      short_desc='%s log' % log_name,
> +                                    long_desc='%s log' % log_name,
> +                                    rotate=True)

what is rotate used for in copy to upload dir?

::: scripts/
@@ +138,5 @@
> +            additional_options.extend(['--remotePort', '-1'])
> +        if c.get('start_python_webserver'):
> +            additional_options.append('--develop')
> +#        if c.get('repository'):
> +#            additional_options.append('repository', c['repository'])

nice!  put a comment in here that we could delete unpack()
Attachment #640119 - Flags: feedback?(jmaher) → feedback+
Blocks: 772959
Comment on attachment 640119 [details] [diff] [review]
v 0.5

> new file mode 100644
> --- /dev/null
> +++ b/configs/users/callek/
> @@ -0,0 +1,53 @@
> +config = {^M
> +    "log_name": "talos",^M

Nit: I'd prefer that your were in unix format rather than dos, though it is explicitly noted as your test config.

Same for your

>+                      'generate_config',

this should be generate-config.

>+'copying %s to %s' % (inifile, remoteappini))
>+        self.run_command(['cp', inifile, remoteappini])

self.copyfile() ?

>         except ImportError, e:
-            self.log("Can't import DeviceManagerSUT! %s\nDid you check out talos?" % str(e), level=error_level)
-            raise
+            self.fatal("Can't import DeviceManagerSUT! %s\nDid you check out talos?" % str(e), level=error_level)

self.fatal() doesn't take a level argument.  This is a fine change, but we can get rid of error_level references in this method with this change.

>-            self.fatal("dev_root %s not correct!" % str(dev_root))
>+            self.fatal("dev_root %s not correct!" % str(dev_root))        

Nit: kill the trailing whitespace, please?

>+            dm.getInfo('process')
>+            dm.getInfo('memory')
>+            dm.getInfo('uptime')

Are these useful for us or developers?
Aiui, this causes more noise than we usually want.
If these are useful, let's keep them with a comment that we want to get these in the log proper at some point.  If not, let's remove them.

>+            try:
>+      'process')))
>+      'memory')))
>+      'uptime')))
>+      ['exec su -c "logcat -d -v time *:W"'])))

Here too.  Does this work?  (I'm under the impression that dm prints these to screen, and doesn't return the strings, but I might be wrong.)

>+            self.fatal("Remote Device Error: updateApp() call failed - exiting")

Nit: Trailing whitespace.

>-        if c['device_type'] not in ("tegra250",):
>+        if c['device_type'] not in ("tegra250",) or True:

What's going on here?

I think the main point here is getting this working and merged in for future fixes, so for the most part I think this can land with the above fixed.

This carries some cruft from my talosrunner branch circa mid-late 2011.
I think I never really fleshed out device flags here and thought about ditching them.
I also added default log rotation that doesn't actually work past a single backup in base/  I think we can land these without issue, as long as we target these for cleanup later.

Jordan: if this causes conflicts with your code, I can help unbitrot.
Attachment #640119 - Flags: review?(aki) → review+
This goes on top of v 0.5.
I also snuck in a multilocale fix that pylint was complaining about.
This gets the call down to 4 pyflakes warnings:

mozharness/mozilla/testing/ local variable 'p' is assigned to but never used
mozharness/mozilla/testing/ 'DMError' imported but unused
mozharness/mozilla/testing/ undefined name 'DMError'
mozharness/mozilla/testing/ undefined name 'DMError'

I'm not entirely sure how to deal with those, so I'm leaving them for the moment.
Attachment #643642 - Flags: review?(bugspam.Callek)
Comment on attachment 643642 [details] [diff] [review]
dealing with review comments

Review of attachment 643642 [details] [diff] [review]:

interdiff looks good
Attachment #643642 - Flags: review?(bugspam.Callek) → review+
Comment on attachment 640119 [details] [diff] [review]
v 0.5

sorry this will seem random. Callek, I emailed you about my lack of feedback a  while ago but never heard back. Anyway adding a plus now(even though its not relevant, helpful) to stop email spam. :)
Attachment #640119 - Flags: feedback?(jlund) → feedback+
I'm not actively working on this right now.
Assignee: bugspam.Callek → nobody
Product: → Release Engineering
I'm going to call this a dup of bug 829211 for the panda portion, and a WONTFIX for the tegra portion.
Closed: 11 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.