Closed Bug 713017 Opened 13 years ago Closed 12 years ago

mozharness talos on cedar

Categories

(Release Engineering :: Applications: MozharnessCore, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: k0scist, Assigned: mozilla)

References

Details

(Whiteboard: [mozharness][mozharness+talos])

Attachments

(4 files, 10 obsolete files)

1.00 KB, patch
Callek
: review+
mozilla
: checked-in+
Details | Diff | Splinter Review
51.45 KB, patch
k0scist
: review+
mozilla
: checked-in+
Details | Diff | Splinter Review
1.97 KB, patch
Callek
: review+
mozilla
: checked-in+
Details | Diff | Splinter Review
6.29 KB, patch
Callek
: review+
mozilla
: checked-in+
Details | Diff | Splinter Review
Once desktop talos (bug 650887) and remote talos (Bug 650890 - port remote talos to mozharness) are done, production talos may be switched over to use mozharness.  This will involve substantive alterations in buildbotcustom.  This will have to be heavily staged
we will need to run this side by side for a while with production to look for number fluctuations.
Blocks: 713055
Priority: -- → P1
Whiteboard: [mozharness] → [mozharness][mozharness+talos]
Depends on: 738478
Depends on: 741622
Assignee: nobody → aki
Status: NEW → ASSIGNED
Attached patch [custom] wip 1 (obsolete) — Splinter Review
This only increases the default script maxtime in generateTestBuilder().

There are two easy ways to get ScriptFactories for talos:

1) follow the peptest model, and go through the existing code path for generateTestBuilder() in misc.py.

Pros: This is pretty easy to do in mozilla-tests/config.py, and completely bypasses the mess in generateTalosBranchObjects() while having buildbot Talos and mozharness Talos running in parallel.

Cons: generateTestBuilder() calls assumes these are unit tests, not talos, and will be treated as such in sendchanges and trychooser.  This will also require graphserver db tweaks.

2) replace the TalosFactory creation code in generateTalosBranchObjects with a call to generateTestBuilder(), which will replace all talos builders with their mozharness equivalents.

Pros: This cleans up a lot of ugliness in generateTalosBranchObjects() and allows us to stop using the ugly TalosFactory.  We can run this on a separate master while we look at timing shifts.  Those can go to graphs-stage.

Cons: This is harder to run in parallel with TalosFactory at scale.

3) Not as easy, but possibly the direction I need to go:
We create talos_builders and talos_pgo_builders dicts.  I can create two more dicts for the mozharness equivalents.  This will balloon the number of test builders and will require graphserver db tweaks.
Attached patch [configs] wip1 (obsolete) — Splinter Review
This cleans up the peptest code to allow talos to use the same info.
It also switches peptest to py27 on windows, since it's installed on all our windows test boxes.

There's some misc cleanup in here, as well as a stub per-branch 'talos_codebase' item that I have set to buildbot, both, or mozharness.  Whether this sticks around depends on the buildbotcustom patch strategy.
Attached patch [mozharness] peptest -> py27 (obsolete) — Splinter Review
This'll need to land after the reconfig, if/when we do switch to py27 in the buildbot-configs patch.
Depends on: 774973
Comment on attachment 633322 [details] [diff] [review]
[mozharness] peptest -> py27

If we switch mozharness_python to 2.7 on windows, we'll need to land this.
Attachment #633322 - Flags: review?(jlund)
Comment on attachment 633322 [details] [diff] [review]
[mozharness] peptest -> py27

This patch has been tested for unittests with mozharness_python in buildbot-configs pointing to python 2.7 for windows platforms.
Attachment #633322 - Flags: review?(jlund) → review+
In my testing, I'm finding that

* talos hits getInfo.html and doesn't do anything with the browser (fails to shut down)
* the test slave isn't rebooting (I need to add a hg.m.o/build/tools clone or otherwise download count_and_reboot.py)
* I seem unable to keep pip from spamming the log with "Downloading X.tar.gz (Y Mb): Z% NKb" lines, despite trying a number of times.  I don't want to call pip install with -q since the rest of the install might be informative.  This should be a trivial fix and is merely cosmetic, but it bugs me.
if getInfo.html fails to shut down, then the profile is incorrectly setup.  I am sure there could be another cause, but that is the most common.
I switched from using talos-249452fec498.tar.gz to using the tip of hg.m.o/build/talos (among other changes) and it's now running tests, which is good news.
Are there reasons to use |--develop| over |--webServer localhost| ?

The former would be easier all around, but the Apache server is already running if that would make numbers more stable.
--develop uses a python webserver, probably not as optimized as --webServer localhost.  

We could benchmark the numbers for 50 runs or so using --develop and see if they produce the same results :)
If you're running from the command line (i.e. not buildbot) you want to use --develop because you don't want to assume that the local developer has apache installed. From buildbot, you want to assume apache is there if you want to continue running the way we currently do.  Othewise, we'd need to benchmark these things like Joel mentions in comment 11.
I think my above issues (comment 7 on) are resolved.
I still need to download and extract tp5n.zip, but preferably only do so on tp5n jobs.
I think this would be easier to do if I had the suites defined in talos.json; I'll work on that next.
Attached patch [configs] wip 2 (obsolete) — Splinter Review
Attachment #633314 - Attachment is obsolete: true
Attachment #633322 - Attachment is obsolete: true
Attached patch [custom] wip 2 (obsolete) — Splinter Review
Attachment #633310 - Attachment is obsolete: true
Attached patch [mozharness] wip 2 (obsolete) — Splinter Review
Attached patch [custom] wip 2, for real (obsolete) — Splinter Review
Attachment #659061 - Attachment is obsolete: true
(In reply to Aki Sasaki [:aki] from comment #13)

> I think this would be easier to do if I had the suites defined in
> talos.json; I'll work on that next.

So I'm not sure what you mean by this but I'm very intrigued
(In reply to Jeff Hammel [:jhammel] from comment #18)
> (In reply to Aki Sasaki [:aki] from comment #13)
> 
> > I think this would be easier to do if I had the suites defined in
> > talos.json; I'll work on that next.
> 
> So I'm not sure what you mean by this but I'm very intrigued

I've been talking about this off and on at mozharness meetings.
Basically, if the various talos suite configs are moved out of buildbotcustom, into the tree as part of talos.json, then they can merge with the tree and people can test changes via Try etc.

That would also allow us to do more suite-specific configuration like 'tp5n needs tp5n.zip but no other suite does' and '"other" needs a longer timeout'.
Attached patch [wip] cedar talos.json (obsolete) — Splinter Review
Here I'm ditching the enable_by_default, merge, and platform data.
The first isn't relevant when we have per-tree suite info; the latter two can be implemented later if we start scheduling jobs with this information.
(In reply to Aki Sasaki [:aki] from comment #20)
> Created attachment 662365 [details] [diff] [review]
> [wip] cedar talos.json
> 
> Here I'm ditching the enable_by_default, merge, and platform data.
> The first isn't relevant when we have per-tree suite info; the latter two
> can be implemented later if we start scheduling jobs with this information.

Where is BZ hiding the "like" button from me? :D woohoo!
/me also likes;  from the point of view of trying to explain what buildbot suite corresponds to which talos jobs are run, a la http://k0s.org:8080/ , twould be much easier and more reliable to have this in a JSON file vs my complicated solution for reading our buildbot-configs
Depends on: 792518
Attached patch [mozharness] wip 3 (obsolete) — Splinter Review
Using http://people.mozilla.org/~asasaki/talos.json currently, which was updated to include the old talos.zip url (amongst other things) so it should be backwards compatible.  In theory that means we could land this and let it merge out to other branches before flipping the switch on the buildbot side.

Using the talos json means there are now many ways config can get in: buildprops.json, talos.json, the config file, command line options, and the script defaults.  Hence the added complexity of the code.

I'm now downloading and extracting the addons, plugins, symbols, and pagesets as required by each suite.  Since the talos.json defines the basic PerfConfigurator options, I was able to add per-suite test timeouts.

I still need to deal with some OSX screen resolution changes (which is why I split the linux config file apart from the osx one).  I still need to download the talos.json from the tree (from the revision the build is from, in fact).  I plan to do this by looking at the build .txt file (e.g. http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-macosx64/1347909661/firefox-18.0a1.en-US.mac.txt ) and send that through a query_talos_json_url() method.

I have new unbitrotted custom+configs patches as well; if you want to see those I can attach them as well.
Attachment #659064 - Attachment is obsolete: true
Attachment #662365 - Attachment is obsolete: true
Verified that landing the talos.json in Cedar (with non-mozharness talos) doesn't break buildbot-talos.
https://tbpl.mozilla.org/?tree=Cedar&rev=2e3c1ce191f5
http://hg.mozilla.org/projects/cedar/file/2e3c1ce191f5/testing/talos/talos.json
Attachment #663184 - Flags: review?(bugspam.Callek)
Attachment #663184 - Flags: review?(bugspam.Callek) → review+
Comment on attachment 663184 [details] [diff] [review]
lock cedar platforms to opt desktop only for now

http://hg.mozilla.org/build/buildbot-configs/rev/ba5e06245aac
Attachment #663184 - Flags: checked-in+
I'm still seeing red on tpn and other (other may just be a test_timeout bump).
I'm considering rolling this out to Cedar anyway, so others can help debug this.
Depends on: 794587
Attached patch mozharness mhtalos branch (obsolete) — Splinter Review
Sorry for the 52k patch.
If you need me to split this up I can.  History is viewable here:
http://hg.mozilla.org/users/asasaki_mozilla.com/mozharness (mhtalos branch, since about June)
I got this diff via

    hg diff -r default

when on mhtalos rev bca55d374f60.

I still need to add some Fedora-specific commands, at least, but I think I can do that in a smaller add-on patch later.

Feel free to add others to the review if you feel appropriate.

This patch:

* adds linux, mac, and windows config files
* moves the unittest pre-test checks to TestingMixin so we can take advantage of it in Talos
* makes virtualenv downloads less noisy
* allows Talos to populate the test slaves' webroot by untarring the talos/ subdirectory from the talos zip
* allows for a different set of virtualenv modules than self.config['virtualenv_modules'], since we're getting talos_url from the talos.json
* adds a generic pull() to VCSScript
* adds talos json support to Talos, with the added complexity that brings
* adds symbol download+extract to TestingMixin

Known issues:

* bug 794587 (tpaint disabled)
* I still have to add some more platform-specific stuff to the config files,
e.g. Fedora reboot fixes.  I should be able to do this like the SCREEN_RESOLUTION_CHECK in the mac config.

I figured it would be best to get this review started earlier, and deal with those things later.
Attachment #662769 - Attachment is obsolete: true
Attachment #665266 - Flags: review?(jhammel)
Oops, wrong patch
Attachment #665266 - Attachment is obsolete: true
Attachment #665266 - Flags: review?(jhammel)
Attachment #665269 - Flags: review?(jhammel)
Also a bit of whitespace cleanup for linux64.
Attachment #665551 - Flags: review?(bugspam.Callek)
This patch also cleans up some android-xul cruft.
Attachment #659060 - Attachment is obsolete: true
Attachment #659065 - Attachment is obsolete: true
Attachment #665556 - Flags: review?(bugspam.Callek)
Summary: get buildbotcustom to use mozharness + production talos → mozharness talos on cedar
Comment on attachment 665269 [details] [diff] [review]
mozharness mhtalos branch

diff --git a/configs/talos/linux_config.py b/configs/talos/linux_config.py
new file mode 100644
--- /dev/null
+++ b/configs/talos/linux_config.py
@@ -0,0 +1,5 @@
+import os
+import socket
+
+PYTHON = '/tools/buildbot/bin/python'
+VENV_PATH = '%s/talos-slave/test/build/venv' % os.environ['HOME']

Is there a reason to us os.environ['HOME'] vs verbosely specifying the
path, since this config is machine-specific anyway?

+config = {
+    "log_name": "talos",
+    "buildbot_json_path": "buildprops.json",
+    "installer_path": "installer.exe",
+    "virtualenv_path": VENV_PATH,
+    "pypi_url": "http://puppetagain.pub.build.mozilla.org/data/python/packages/",

While this works, since of course this isn't a pypi page, there's no
real advantage of pointing here vs, say, a blank web page.  I'd be
more inclined to explicitly disable fetching from pypi e.g. if this is
None or what not, which is at least doable from pip (not sure about
easy_install OTTOMH).

diff --git a/configs/talos/mac_config.py b/configs/talos/mac_config.py
new file mode 100644
--- /dev/null
+++ b/configs/talos/mac_config.py
<snip/>
+PYTHON = '/tools/buildbot/bin/python'
+VENV_PATH = '%s/talos-slave/test/build/venv' % os.environ['HOME']

It would be really nice to have a way such that all of this
information wasn't repeated multiple places.

diff --git a/mozharness/base/python.py b/mozharness/base/python.py
--- a/mozharness/base/python.py
+++ b/mozharness/base/python.py
@@ -235,21 +235,24 @@ class VirtualenvMixin(object):
 
         virtualenv_options = c.get('virtualenv_options',
ct                                           ['--no-site-packages', '--distribute'])
 
         self.run_command(virtualenv + virtualenv_options + [venv_path],
                          cwd=dirs['abs_work_dir'],
                          error_list=VirtualenvErrorList,
                          halt_on_failure=True)
-        for module in c.get('virtualenv_modules', []):
+        if not modules:
+            modules = c.get('virtualenv_modules', [])
+        for module in modules:
             module_url = module
             if isinstance(module, dict):
                 (module, module_url) = module.items()[0]
-            module_url = self.config.get('%s_url' % module, module_url)
+            else:
+                module_url = self.config.get('%s_url' % module, module_url)

IMHO, I would like this reworked a bit.  See also comments on the only
consumer, that being mozharness.mozmill.testing.talos . To me it is a
bit mysterious that if modules are passed that we completely ignore
the configuration.  That said, it *seems* that the method expects that
if modules is passed, that they be equivalent, at least in spirit, to
virtualenv_modules.

diff --git a/mozharness/mozilla/testing/talos.py b/mozharness/mozilla/testing/talos.py
--- a/mozharness/mozilla/testing/talos.py
+++ b/mozharness/mozilla/testing/talos.py
@@ -26,9 +28,15 @@ TalosErrorList = PythonErrorList + [
  {'regex': re.compile(r'''No machine_name called '.*' can be found'''), 'level': CRITICAL},
  {'substr': r"""No such file or directory: 'browser_output.txt'""",
   'level': CRITICAL,
   'explanation': r"""Most likely the browser failed to launch, or the test was otherwise unsuccessful in even starting."""},
 ]
 
 # TODO: check for running processes on script invocation
 
-class Talos(TestingMixin, BaseScript):
+class TalosOutputParser(OutputParser):
+    def parse_single_line(self, line):
+        """ In Talos land, every line that starts with RETURN: needs to be
+        printed with a TinderboxPrint:"""
+        if line.startswith("RETURN:"):
+            line.replace("RETURN:", "TinderboxPrint:")
+        super(TalosOutputParser, self).parse_single_line(line)

Is this something that should be done in talos core?

@@ -54,52 +64,98 @@ class Talos(TestingMixin, BaseScript):
<snip/> 
+    def query_talos_json_url(self):
+        """Hacky, but I haven't figured out a better way to get the
+        talos json url before we install the build.
+
+        We can't get this information after we install the build, because
+        we have to create the virtualenv to use mozinstall, and talos_url
+        is specified in the talos json.
+        """
+        if self.talos_json_url:
+            return self.talos_json_url
+        self.info("Guessing talos json url...")
+        if not self.installer_url:
+            self.read_buildbot_config()
+            self.postflight_read_buildbot_config()
+            if not self.installer_url:
+                self.fatal("Can't figure out talos_json_url without an installer_url!")
+        installer_re = re.compile(r'''(\.tar\.bz2|\.zip|\.dmg|\.exe|\.apk)''')
+        m = installer_re.search(self.installer_url)

Assuming that these are suffixes, I rather see a list here and loop
(or list-comprehend) through it.

+        if not m:
+            self.fatal("Can't figure out talos_json_url from installer_url %s!" % self.installer_url)
+        build_txt_url = self.installer_url.replace(m.group(1), '.txt')
+        build_txt_file = self.download_file(build_txt_url, parent_dir=self.workdir)
+        if not build_txt_file:
+            self.fatal("Can't download %s to guess talos_json_url!" % build_txt_url)
+        # HG hardcode?
+        revision_re = re.compile(r'''([a-zA-Z]+://.+)/rev/([0-9a-fA-F]{10})''')
+        contents = self.read_from_file(build_txt_file, error_level=FATAL).splitlines()
+        for line in contents:
+            m = revision_re.match(line)
+            if m:
+                break
+        if not m:

Looks like a for-else case to me

+            self.fatal("Can't figure out talos_json_url from %s!" % build_txt_file)
+        self.talos_json_url = "%s/raw-file/%s/testing/talos/talos.json" % (m.group(1), m.group(2))
+        return self.talos_json_url

This is all a bit magical but fine for now.

+    def download_talos_json(self):
+        talos_json_url = self.query_talos_json_url()
+        self.talos_json = self.download_file(talos_json_url,
+                                             parent_dir=self.workdir,
+                                             error_level=FATAL)
+
+    def query_talos_json_config(self):
+        if self.talos_json_config:
+            return self.talos_json_config
+        c = self.config
+        if not c['use_talos_json']:
+            return
+        if not c['suite']:
+            self.fatal("To use talos_json, you must define use_talos_json, suite.")
+            return
+        if not self.talos_json:
+            talos_json_url = self.query_talos_json_url()
+            if not talos_json_url:
+                self.fatal("Can't download talos_json without a talos_json_url!")
+            self.download_talos_json()
+        self.talos_json_config = parse_config_file(self.talos_json)
+        self.info(pprint.pformat(self.talos_json_config))
+        return self.talos_json_config
+
+    def query_tests(self):
+        """Determine if we have tests to run
+        """
+        if self.tests is not None:
+            return self.tests
+        c = self.config
+        if c['use_talos_json']:
+            if not c['suite']:
+                self.fatal("Can't use_talos_json without a --suite!")

Not sure if this is possible and guessing you'll oppose the idea, but
I'd prefer checking this ahead of time versus dying here.

+            talos_config = self.query_talos_json_config()
+            try:
+                self.tests = talos_config['suites'][c['suite']]['tests']
+            except KeyError, e:
+                self.error("Badly formed talos_json for suite %s: %s"
% (c['suite'], str(e)))

It'd be nice if the error more pointed to what was wrong and how to
fix it.

+        elif c['tests']:
+            self.tests = c['tests']

So if you specify config['tests'] and ['use_talos_json'] the latter
will win silently?

+        # Ignore these tests, specifically so we can not run a11yr on osx
+        if c.get('ignore_tests'):
+            for test in c['ignore_tests']:
+                if test in self.tests:
+                    del self.tests[self.tests.index(test)]

I don't really like this approach.

+        return self.tests
+
+    def query_talos_options(self):
+        options = []
+        c = self.config
+        if self.query_talos_json_config():
+            options += self.talos_json_config['suites'][c['suite']].get('talos_options', [])
+        if c['talos_options']:
+            options += c['talos_options']
+        return options
+
+    def query_talos_url(self):
+        if self.query_talos_json_config():
+            return self.talos_json_config['global']['talos_url']
+        else:
+            return self.config.get('talos_url')
+
+    def query_pagesets_url(self):
+        if self.pagesets_url:
+            return self.pagesets_url
+        if self.query_talos_json_config():
+            self.pagesets_url = self.talos_json_config['suites'][self.config['suite']].get('pagesets_url')
+            return self.pagesets_url

I find it a bit ambiguous why sometimes the talos.json wins and other
times the script config values win.

+    def query_pagesets_parent_dir_path(self):
+        if self.pagesets_parent_dir_path:
+            return self.pagesets_parent_dir_path
+        if self.query_talos_json_config():
+            self.pagesets_parent_dir_path = self.talos_json_config['suites'][self.config['suite']].get('pagesets_parent_dir_path')
+            return self.pagesets_parent_dir_path

Can you explain what this is supposed to do?  A docstring might help
(that goes for other functions too).

+    def query_pagesets_manifest_path(self):
+        if self.pagesets_manifest_path:
+            return self.pagesets_manifest_path
+        if self.query_talos_json_config():
+            self.pagesets_manifest_path = self.talos_json_config['suites'][self.config['suite']].get('pagesets_manifest_path')
+            return self.pagesets_manifest_path

Again....

     def PerfConfigurator_options(self, args=None, **kw):
         """return options to PerfConfigurator"""
-
         # binary path
         binary_path = self.binary_path or self.config.get('binary_path')
         if not binary_path:
-            self.fatal("Talos requires a path to the binary")
+            self.fatal("Talos requires a path to the binary.  You can specify binary_path or add download-and-extract to your action list.")
 
         # talos options
-        options = ['-v', '--develop'] # hardcoded options (for now)
+        options = ['-v',] # hardcoded options (for now)
+        if self.config.get('python_webserver', True):
+            options.append('--develop')
         kw_options = {'output': 'talos.yml', # options overwritten from **kw
                       'executablePath': binary_path,
-                      'activeTests': self.tests,
                       'results_url': self.results_url}
+        kw_options['activeTests'] = self.query_tests()
         if self.config.get('title'):
             kw_options['title'] = self.config['title']
+        if self.config.get('branch'):
+            kw_options['branchName'] = self.config['branch']
+        if self.symbols_path:
+            kw_options['symbolsPath'] = self.symbols_path
         kw_options.update(kw)
-
         # talos expects tests to be in the format (e.g.) 'ts:tp5:tsvg'
-        tests = kw_options['activeTests']
-        if not isinstance(tests, basestring):
+        tests = kw_options.get('activeTests')
+        if tests and not isinstance(tests, basestring):
             tests = ':'.join(tests) # Talos expects this format
             kw_options['activeTests'] = tests

Why would you want to specify this as a basestring?  At some point the
talos handling of this will change (probably, in fact, right after
mozharness becomes the production canon).

-
         for key, value in kw_options.items():
             options.extend(['--%s' % key, value])
-
         # add datazilla results urls
         for url in self.config.get('datazilla_urls', []):
             options.extend(['--datazilla-url', url])
-
         # extra arguments
         if args is None:
-            args = self.config.get('talos_options', [])
+            args = self.query_talos_options()
         options += args
 
         return options
 
     def talos_conf_path(self, conf):
         """return the full path for a talos .yml configuration file"""
         if os.path.isabs(conf):
             return conf
         return os.path.join(self.workdir, conf)
 
+    def _download_unzip(self, url, parent_dir):
+        zipfile = self.download_file(url, parent_dir=self.workdir,
+                                     error_level=FATAL)
+        command = self.query_exe('unzip', return_type='list')
+        command.extend(['-q', zipfile])
+        self.run_command(command, cwd=parent_dir, halt_on_failure=True)

This seems like something that should go somewhere elsewhere in
mozharness more generic than the talos script.

<snip/>

+    def create_virtualenv(self, **kwargs):
+        """VirtualenvMixin.create_virtualenv() assuemes we're using
+        self.config['virtualenv_modules'].  Since we're overriding talos_url
+        when using the talos json, we have to wrap that method here."""
+        if self.query_talos_json_config():
+            talos_url = self.query_talos_url()
+            virtualenv_modules = self.config.get('virtualenv_modules', [])
+            if 'talos' in virtualenv_modules:
+                i = virtualenv_modules.index('talos')
+                virtualenv_modules[i] = {'talos': talos_url}
+                self.info(pprint.pformat(virtualenv_modules))
+            return super(Talos,
self).create_virtualenv(modules=virtualenv_modules)

This also seems a bit magic to me.  I realize the reason you want to
lock config, but I hate to see code that works around this.  I would
much rather see this done, I don't know, earlier vs having a specialty
override for it.

+        else:
+            return super(Talos, self).create_virtualenv(**kwargs)
+
+    def postflight_create_virtualenv(self):
+        """ This belongs in download_and_install() but requires the
+        virtualenv to be set up :(
+
+        The real fix here may be a --tpmanifest option for PerfConfigurator.
+        """

Could you ticket, please?  We're likely to completely rearrange how
talos does all of this soon anyway.  We would like to get away from
e.g. 'tp5' being some magical thing with magical defaults, etc, and
having what a test is being defined in a more rigorous way. (Note that
it is mostly already not true that tp5 == tp5, but we pretend that it
is and we severely pay for it in development cost and frustration.)

diff --git a/mozharness/mozilla/testing/testbase.py b/mozharness/mozilla/testing/testbase.py
--- a/mozharness/mozilla/testing/testbase.py
+++ b/mozharness/mozilla/testing/testbase.py
<snip/> 
 # TestingMixin {{{1
 class TestingMixin(VirtualenvMixin, BuildbotMixin):
     """
     The steps to identify + download the proper bits for [browser] unit
     tests and Talos.
     """
 
     installer_url = None
     installer_path = None
     binary_path = None
     test_url = None
     test_zip_path = None
+    symbols_url = None
+    symbols_path = None
+
+    def query_symbols_url(self):
+        if self.symbols_url:
+            return self.symbols_url
+        if not self.installer_url:
+            self.fatal("Can't figure out symbols_url without an installer_url!")
+        installer_re = re.compile(r'''(\.tar\.bz2|\.zip|\.dmg|\.exe|\.apk)''')
+        m = installer_re.search(self.installer_url)
+        if not m:
+            self.fatal("Can't figure out symbols_url from
     installer_url %s!" % self.installer_url)

Since .fatal (AIUI) kills the run anyway, is there any point in having
an else clause?

+        else:
+            self.symbols_url = self.installer_url.replace(m.group(1), '.crashreporter-symbols.zip')
+            return self.symbols_url
 
diff --git a/scripts/talos_script.py b/scripts/talos_script.py
--- a/scripts/talos_script.py
+++ b/scripts/talos_script.py
@@ -8,8 +8,9 @@
 
 """
 
 import os
 import sys
 
 # load modules from parent dir
 sys.path.insert(1, os.path.dirname(sys.path[0]))
+#os.environ['PYTHONUNBUFFERED'] = "1"

Ummm ?  I'd prefer this not be checked in.

 from mozharness.mozilla.testing.talos import Talos
 
 if __name__ == '__main__':
     talos = Talos()
     talos.run()


I don't really like all of the whitespace reduction, as it makes it hard to read for me, but I guess we'll have to agree to disagree on that (I'm also not a huge fan of the vi folding markers).
Attachment #665269 - Flags: review?(jhammel) → review+
Thanks for the review!

(In reply to Jeff Hammel [:jhammel] from comment #33)
> +VENV_PATH = '%s/talos-slave/test/build/venv' % os.environ['HOME']
>
> Is there a reason to us os.environ['HOME'] vs verbosely specifying the
> path, since this config is machine-specific anyway?

This is a remnant from when linux and osx shared configs; fixed.
http://hg.mozilla.org/users/asasaki_mozilla.com/mozharness/rev/bf3bc1d52a1f

> +    "pypi_url":
> "http://puppetagain.pub.build.mozilla.org/data/python/packages/",
>
> While this works, since of course this isn't a pypi page, there's no
> real advantage of pointing here vs, say, a blank web page.  I'd be
> more inclined to explicitly disable fetching from pypi e.g. if this is
> None or what not, which is at least doable from pip (not sure about
> easy_install OTTOMH).

Yeah, I think we had this conversation in another bug.
This is currently the [not ideal] way to disable hitting pypi.  Setting it to None means use the default pypi server, which isn't what we want here.

Filed bug 795153 for this.

> It would be really nice to have a way such that all of this
> information wasn't repeated multiple places.

I tend to want verbose over overly normalized or programmatic; readability, annotations/blame, diffs, ease of debugging all matter more to me than having to write less.  But agreed, there should be a way to normalize if it makes sense.

I have a number of answers regarding this in the in-progress FAQ.
bug 779294 will probably help here.

> -        for module in c.get('virtualenv_modules', []):
> +        if not modules:
> +            modules = c.get('virtualenv_modules', [])
> +        for module in modules:
>              module_url = module
>              if isinstance(module, dict):
>                  (module, module_url) = module.items()[0]
> -            module_url = self.config.get('%s_url' % module, module_url)
> +            else:
> +                module_url = self.config.get('%s_url' % module, module_url)
>
> IMHO, I would like this reworked a bit.  See also comments on the only
> consumer, that being mozharness.mozmill.testing.talos . To me it is a
> bit mysterious that if modules are passed that we completely ignore
> the configuration.  That said, it *seems* that the method expects that
> if modules is passed, that they be equivalent, at least in spirit, to
> virtualenv_modules.

By default we don't pass any modules, and we use c['virtualenv_modules'], and you have to explicitly override this method in your class to override the modules=None parameter.

These changes were necessary to allow for talos_url to be specifiable in talos.json.  I imagine we'll have future consumers that want to specify different virtualenv modules, that we won't know about until midway through the script run.

Given those two points, do you have a better approach in mind?

> +        """ In Talos land, every line that starts with RETURN: needs to be
> +        printed with a TinderboxPrint:"""
> +        if line.startswith("RETURN:"):
> +            line.replace("RETURN:", "TinderboxPrint:")
> +        super(TalosOutputParser, self).parse_single_line(line)
>
> Is this something that should be done in talos core?

I'd love for that to happen.  bug 794895 too.
I'm not entirely sure what that will break atm, if anything, but we have buildbot, tbpl, and mozharness all translating talos messages into standardized messages, so doing it once at the source would be best.

> +        installer_re =
> re.compile(r'''(\.tar\.bz2|\.zip|\.dmg|\.exe|\.apk)''')
> +        m = installer_re.search(self.installer_url)
>
> Assuming that these are suffixes, I rather see a list here and loop
> (or list-comprehend) through it.

I don't have strong feelings between a list/tuple vs. regex, but I did want the list of suffixes to not be hardcoded inside these methods.

http://hg.mozilla.org/users/asasaki_mozilla.com/mozharness/rev/7bcf21de7454 to deal with both things.

> +        for line in contents:
> +            m = revision_re.match(line)
> +            if m:
> +                break
> +        if not m:
>
> Looks like a for-else case to me

Fixed

> +            self.fatal("Can't figure out talos_json_url from %s!" %
> build_txt_file)
> +        self.talos_json_url = "%s/raw-file/%s/testing/talos/talos.json" %
> (m.group(1), m.group(2))
> +        return self.talos_json_url
>
> This is all a bit magical but fine for now.

This is so I can download talos.json from hg.mozilla.org/projects/cedar/raw-file/b81b3c685d90/testing/talos/talos.json , f/e, instead of cloning the whole repo.

I don't love it, especially the hgweb behavior hardcode, but it should get us running.

> +    def query_tests(self):
> +        """Determine if we have tests to run
> +        """
> +        if self.tests is not None:
> +            return self.tests
> +        c = self.config
> +        if c['use_talos_json']:
> +            if not c['suite']:
> +                self.fatal("Can't use_talos_json without a --suite!")
>
> Not sure if this is possible and guessing you'll oppose the idea, but
> I'd prefer checking this ahead of time versus dying here.

We actually do.
You call preflight_generate_config() in __init__() if 'generate-config' is in the action list.
preflight_generate_config() calls query_tests().

So we hit this fatal at the end of __init__().

> +            try:
> +                self.tests = talos_config['suites'][c['suite']]['tests']
> +            except KeyError, e:
> +                self.error("Badly formed talos_json for suite %s: %s"
> % (c['suite'], str(e)))
>
> It'd be nice if the error more pointed to what was wrong and how to
> fix it.

Fixed.
http://hg.mozilla.org/users/asasaki_mozilla.com/mozharness/rev/b086dc0db14d

> +        elif c['tests']:
> +            self.tests = c['tests']
>
> So if you specify config['tests'] and ['use_talos_json'] the latter
> will win silently?

Yup.
We can reverse this if it makes sense, or document the rules about how the tests are determined given various settings, or log something.

I had to decide on a default to get this working, though, and this is it.
My logic: it doesn't really make sense to run a suite and then change the test list to a set of tests that don't match that suite's settings at all.
If this is a bad default, though, we can change it.

> +        # Ignore these tests, specifically so we can not run a11yr on osx
> +        if c.get('ignore_tests'):
> +            for test in c['ignore_tests']:
> +                if test in self.tests:
> +                    del self.tests[self.tests.index(test)]
>
> I don't really like this approach.

I was trying to skip a11yr on osx without affecting windows or linux.

My alternative was having a lot more platform-specific config in talos.json, or explode the number of talos mozharness config files, or some other solution, and this was the least ugly.

Since it turns out we don't have to skip a11yr for osx, we could cut this code, but we can keep it if we think we might have different sets of tests per platform in a suite again.

> +    def query_pagesets_url(self):
> +        if self.pagesets_url:
> +            return self.pagesets_url
> +        if self.query_talos_json_config():
> +            self.pagesets_url =
> self.talos_json_config['suites'][self.config['suite']].get('pagesets_url')
> +            return self.pagesets_url
>
> I find it a bit ambiguous why sometimes the talos.json wins and other
> times the script config values win.

Yeah, I think it's just different days of writing this, and not seeing the need to override some things via config file and needing to override others.

As above, I wrote something that works that we can fix or document or log or w/e.

> +    def query_pagesets_parent_dir_path(self):
> +        if self.pagesets_parent_dir_path:
> +            return self.pagesets_parent_dir_path
> +        if self.query_talos_json_config():
> +            self.pagesets_parent_dir_path =
> self.talos_json_config['suites'][self.config['suite']].
> get('pagesets_parent_dir_path')
> +            return self.pagesets_parent_dir_path
>
> Can you explain what this is supposed to do?  A docstring might help
> (that goes for other functions too).

This, and the query_pagesets_manifest_path, were due to this:

> +        The real fix here may be a --tpmanifest option for PerfConfigurator.

Joel said I'd have to copy the tp5.manifest from the webroot into the talos root.  To do so, I'd either have to hardcode some paths, or get it from, say, talos.json.  I chose to do the latter.

Added some docstrings.
http://hg.mozilla.org/users/asasaki_mozilla.com/mozharness/rev/72d0201077a1

>          # talos expects tests to be in the format (e.g.) 'ts:tp5:tsvg'
> -        tests = kw_options['activeTests']
> -        if not isinstance(tests, basestring):
> +        tests = kw_options.get('activeTests')
> +        if tests and not isinstance(tests, basestring):
>              tests = ':'.join(tests) # Talos expects this format
>              kw_options['activeTests'] = tests
>
> Why would you want to specify this as a basestring?  At some point the
> talos handling of this will change (probably, in fact, right after
> mozharness becomes the production canon).

I think that's your code that originally checked for basestring?
http://hg.mozilla.org/build/mozharness/rev/c5249a886c6d#l2.196

My change here was left over from when I was allowing for tests to be specified from --addOption instead of c['tests'] or talos.json (this was pre-talos.json)

> +    def _download_unzip(self, url, parent_dir):
> +        zipfile = self.download_file(url, parent_dir=self.workdir,
> +                                     error_level=FATAL)
> +        command = self.query_exe('unzip', return_type='list')
> +        command.extend(['-q', zipfile])
> +        self.run_command(command, cwd=parent_dir, halt_on_failure=True)
>
> This seems like something that should go somewhere elsewhere in
> mozharness more generic than the talos script.

Yeah, if this is useful to other scripts in this form, I'm happy to move at that point.
Same with what we did with pull(), and clobber(), and various other methods.

> +    def create_virtualenv(self, **kwargs):
> +        """VirtualenvMixin.create_virtualenv() assuemes we're using
> +        self.config['virtualenv_modules'].  Since we're overriding talos_url
> +        when using the talos json, we have to wrap that method here."""
> +        if self.query_talos_json_config():
> +            talos_url = self.query_talos_url()
> +            virtualenv_modules = self.config.get('virtualenv_modules', [])
> +            if 'talos' in virtualenv_modules:
> +                i = virtualenv_modules.index('talos')
> +                virtualenv_modules[i] = {'talos': talos_url}
> +                self.info(pprint.pformat(virtualenv_modules))
> +            return super(Talos,
> self).create_virtualenv(modules=virtualenv_modules)
>
> This also seems a bit magic to me.  I realize the reason you want to
> lock config, but I hate to see code that works around this.  I would
> much rather see this done, I don't know, earlier vs having a specialty
> override for it.

You can't always do this earlier.
I'm getting the json early on here, but sometimes you can't get the information you need until you, say, download and extract a zip, or clone a large repo, or something else like that.  Forcing large or time-consuming steps like that into __init__() is bad as well.

I think allowing create_virtualenv() to not be hardcoded to use self.config['virtualenv_modules'], but allow those modules to be overridden depending on need (but use that as a default), is fine.  If it's an action, it'll be called without any kwargs.  If it's called as a non-action method, like we're doing here in the super(), you can override the modules.

I'm definitely weighing the config lock's usefulness as we add to mozharness.  Currently I still think it's more useful than not.  Maybe reviews would help lower the need, and maybe at some point we'll change ReadOnlyDict to LoudlyComplainingDict.  I'm not fully convinced yet, though.

A lot of the complexity here is due to the fact that we're getting our config from three places.  The mozharness config (which is itself script config + config file + command line options), the buildprops.json file which we need to be able to specify which branch/build to test, and talos.json which allows us to keep the config for these suites in-tree.

buildprops.json may go away in distant futureland, when we move away from buildbot entirely.  I'm not sure how to get talos.json's advantages without its added complexity.

> +    def postflight_create_virtualenv(self):
> +        """ This belongs in download_and_install() but requires the
> +        virtualenv to be set up :(
> +
> +        The real fix here may be a --tpmanifest option for PerfConfigurator.
> +        """
>
> Could you ticket, please?  We're likely to completely rearrange how
> talos does all of this soon anyway.  We would like to get away from
> e.g. 'tp5' being some magical thing with magical defaults, etc, and
> having what a test is being defined in a more rigorous way. (Note that
> it is mostly already not true that tp5 == tp5, but we pretend that it
> is and we severely pay for it in development cost and frustration.)

I'd love to see a --tpmanifest option.  bug 795172

> +            self.fatal("Can't figure out symbols_url from
>      installer_url %s!" % self.installer_url)
>
> Since .fatal (AIUI) kills the run anyway, is there any point in having
> an else clause?

Altered in http://hg.mozilla.org/users/asasaki_mozilla.com/mozharness/rev/7bcf21de7454

> +#os.environ['PYTHONUNBUFFERED'] = "1"
>
> Ummm ?  I'd prefer this not be checked in.

Testing cruft, removed.

> I don't really like all of the whitespace reduction, as it makes it hard to
> read for me, but I guess we'll have to agree to disagree on that (I'm also
> not a huge fan of the vi folding markers).

Yeah, I have the opposite opinion.  Whitespace lines in a long method make seeing method breaks harder, unless spacing between methods gets larger.  Maybe it's our editor settings or maybe it's just personal preference.

I generally try to leave this type of stuff alone, but since I spent a significant amount of time in here and rewrote large swaths of it, I made it easier for me to read and navigate while doing so.
Comment on attachment 665556 [details] [diff] [review]
[custom] generate the mozharness talos factory if mozharness_talos is set

Review of attachment 665556 [details] [diff] [review]:
-----------------------------------------------------------------

::: misc.py
@@ +1853,5 @@
>  
>                      builddir = "%s_%s_test-%s" % (branch, slave_platform, suite)
>                      slavebuilddir= 'test'
> +                    if branch_config.get('mozharness_talos') and not platform_config.get('is_mobile'):
> +                        extra_args = ['--suite', suite,

I'm assuming this lines up with talos.json as you use it and the mozharness code properly.

Note that with this patch we still have to identify the talos suites we want to run jobs for on a given branch.

@@ +1866,5 @@
> +                            extra_args.extend(['--cfg', 'talos/windows_config.py'])
> +                        elif 'mac' in platform:
> +                            extra_args.extend(['--cfg', 'talos/mac_config.py'])
> +                        else:
> +                            extra_args.extend(['--cfg', 'talos/linux_config.py'])

nit, I'd rather test for linux and throw if we have an unknown (so we know we have a problem when we're enabling any new platform types)
Attachment #665556 - Flags: review?(bugspam.Callek) → review+
Attachment #665551 - Flags: review?(bugspam.Callek) → review+
Comment on attachment 665551 [details] [diff] [review]
[configs] enable mozharness talos on cedar

http://hg.mozilla.org/build/buildbot-configs/rev/b8986f07f6f2
Attachment #665551 - Flags: checked-in+
Comment on attachment 665556 [details] [diff] [review]
[custom] generate the mozharness talos factory if mozharness_talos is set

Yeah, I've got this working in staging with talos.json.

Added an assert for ya. http://hg.mozilla.org/build/buildbotcustom/rev/9966755da4cb
Attachment #665556 - Flags: checked-in+
No longer depends on: 794587
This is now live.
Depends on: 795526
Live on Cedar and green on all platforms.

Bug 713055 tracks issues that block rolling out to m-c and all project branches.

Resolving.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Component: Other → Mozharness
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: