Closed Bug 895451 Opened 11 years ago Closed 11 years ago

Automate Talos setup & running Talos on the tree

Categories

(Testing :: Talos, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla25

People

(Reporter: BenWa, Assigned: jyeo)

References

Details

Attachments

(3 files, 4 obsolete files)

I've had a few discussions on IRC and here's what I've learned:
(1) We can add a make talos target to testsuite-targets.mk. Similar to Bug 688604. This is very easy to do (I have a patch in fact) but it means patching Makefile which we're deprecating.
(2) The 'right thing' to do is setup mozharness but there several limitations that make this a long term project (which I may be wrong):
** I believe the plan is having mozharness in the tree. This means either moving the repo or solving logistics to do a merges been m-c mozharness and the standalone mozharness repo.
** mozharness currently requires a package firefox build which is copies. It need to support a dist/ path for lower turn around times and debugging.

I'd like for us to automate Talos setup, running and ideally expose this functionality via Mach. How long will integrating mozharness? If this wont be done this quarter I'd like for us to continue forward with (1) even if it's a temporary patch. I imagine at some point we will port all of 'testsuite-targets.mk' which should be easy to add this additional support for Talos. Since we'd be exposing this via Mach then we wouldn't disrupt how people run it in the mean time and could get this done immediately.
I am fine with option #1, I don't know the timeline for mozharness, but I have cc'd some folks who might know.

aki?
Flags: needinfo?(aki)
I would prefer we not add anything new to testsuite-targets.mk, especially if the make target will just behave like a shell script (which I suspect is the case here).

If integrating mozharness is truly too much work (would it be too difficult to clone the mozharness repo and set up a custom config file?), my vote is to provide a new mach command(s) to set up and run Talos.
I guess the shell script ish command could clone mozharness, then reference a config.  I suspect that will take a while longer to get sorted out, but it might be a good approach until mozharness is in the tree for running tests locally.

Alternatively I am fine with a mach command, I suspect that is the safest route to go.
(In reply to Gregory Szorc [:gps] from comment #2)
> If integrating mozharness is truly too much work (would it be too difficult
> to clone the mozharness repo and set up a custom config file?)

The problem is that mozharness needs more work to be usable by developers even once the repo is cloned. I think if we just throw in a command that clones mozharness and only use it for talos we're not gaining any value from mozharness and inheriting its limitations. What we really want is integrate mozharness so that it can used to replace 'testsuite-targets.mk' which is a much bigger project.

In the meantime the GFX team has pressure in improving our performance tests so I want to solve the pain points ASAP. I'd be happy to rebind a mach command to use mozharness once it's properly integrated.
(In reply to Joel Maher (:jmaher) from comment #1)
> I am fine with option #1, I don't know the timeline for mozharness, but I
> have cc'd some folks who might know.
> 
> aki?

I think we don't know the status of using mozharness talos locally until we try it and see what issues we hit.  I don't think that's been a focus for jyeo and armenzg, but it's more likely turning certain things off (datazilla/graphs publishing, plus pointing to a local built objdir rather than downloading the build, f/e) than anything else.
Flags: needinfo?(aki)
I see some changes needed in mozharness:
1) create developer versions of configs/talos/*py
  1.a) pretty much copy/paste plus some modifications OR read the original ones and overwrite the necessary one
2) add option for path to binary
3) reduce "all_actions" to "clone-talos", "create-virtualenv" and "run-tests"
  3.a) if the talos repo exists do not clone
  3.b) if the venv exists do not recreate

NOTE: I believe we should get the cloning talos step out of "download-and-extract" into its own step
Attachment #778261 - Flags: review?(aki)
Attachment #778261 - Flags: review?(aki) → review+
Assignee: nobody → yshun
Attached patch Added Talos Mozharness to mach (obsolete) — Splinter Review
Attachment #778591 - Flags: feedback?(bgirard)
Attachment #778591 - Flags: feedback?(armenzg)
Attachment #778591 - Flags: feedback?(aki)
(In reply to Jason Yeo [:jyeo] from comment #7)
> Created attachment 778261 [details] [diff] [review]
> add_step.diff

Landed on mozharness default and merged to production.
Attachment #778261 - Attachment description: add_step.diff → [checked-in] add_step.diff
To use the mach command, you need to do:

    mach build

and then

    mach talos-test {chromez | dromaeojs | other | dirtypaint | svgr | tp5o | xperf}.
Comment on attachment 778591 [details] [diff] [review]
Added Talos Mozharness to mach

It run for me. I don't know mach to know if it is the right way of doing it.

NOTE: I had to close Nightly to make it run
Attachment #778591 - Flags: feedback?(armenzg) → feedback+
Comment on attachment 778591 [details] [diff] [review]
Added Talos Mozharness to mach

I'm not familiar with mach, and I only eyeballed this patch.  But it looks good from here.  The only thing might be changing the mozharness_rev to 'production' ... and --download-symbols might not work if it's a local build.
jyeo++
Attachment #778591 - Flags: feedback?(aki) → feedback+
Attachment #778591 - Attachment is obsolete: true
Attachment #778591 - Flags: feedback?(bgirard)
Attachment #779168 - Flags: feedback?(bgirard)
Attachment #779168 - Flags: review?(gps)
Comment on attachment 779168 [details] [diff] [review]
Added Talos Mozharness command to mach

Review of attachment 779168 [details] [diff] [review]:
-----------------------------------------------------------------

This patch makes me so happy.

I have a number of low-level nits. Nothing too major aside from 'build' shouldn't be in the global sys.path list.

I envision that we'll want to eventually split out the "run a mozharness script" functionality into its own reusable Python module. But, I think that's the territory of a followup bug. It's certainly not something that should delay this awesomeness from landing \o/.

::: build/mach_bootstrap.py
@@ +30,5 @@
>      'python/mozbuild',
>      'python/blessings',
>      'python/psutil',
>      'python/which',
> +    'build',

I /think/ this will break things. Specifically, there are multiple automationutils.py files in the objdir and adding build to sys.path will cause some test harnesses to pick up build's automationutils.py instead of the one in the test harness. This is silly, I know.

This is one reason why I want as much Python as possible to exist in actual Python packages and not in random .py files.

The module you are importing (build/util/hg.py) IMO should not have been imported into build/util. But, I don't want fixing this to slow you down. Simply perform sys.path munging inside the @Command method and import hg there.

::: testing/talos/mach_commands.py
@@ +1,5 @@
> +# This Source Code Form is subject to the terms of the Mozilla Public
> +# License, v. 2.0. If a copy of the MPL was not distributed with this
> +# file, You can obtain one at http://mozilla.org/MPL/2.0/.
> +
> +# Integrates Talos mozharness with mach

Please add a "from __future__ import unicode_literals" as the first executed statement in the file (unless it breaks Talos). This helps Python 3 compatibility because it forces you to think about bytes vs unicode for strings.

@@ +30,5 @@
> +        2. Make config for Talos Mozharness
> +        3. Run mozharness
> +        """
> +
> +        print "Running Talos test suite %s" % params['suite']

Nit: Use print() function, not statement (Python 3 compatibility).

@@ +39,5 @@
> +        self.make_args()
> +        return_code = self.run_mozharness()
> +        return return_code
> +
> +    def init_variables(self, params):

It seems like this is something better put in a constructor. But, overloading __init__ for a MozbuildObject is kinda annoying, so I'll let this pass.

@@ +49,5 @@
> +        self.mozharness_dir = os.path.join(self.talos_dir, 'mozharness')
> +        self.config_dir = os.path.join(self.mozharness_dir, 'configs', 'talos')
> +        self.talos_json = os.path.join(self.talos_dir, 'talos.json')
> +        self.config_filename = 'in_tree_conf.json'
> +        self.binary_path = os.path.join(self.bindir, 'firefox')

This assumes the built application is browser, which may not always be true.

Also, MozbuildObject exposes a get_binary_path() API!

@@ +59,5 @@
> +    def clone_mozharness(self):
> +        """Clones mozharness into topsrcdir/testing/talos/mozharness
> +           using mercurial."""
> +        mercurial(self.mozharness_repo, self.mozharness_dir,
> +            self.mozharness_rev)

Must we clone into topsrcdir? IMO topsrcdir should be read-only if possible.

@@ +104,5 @@
> +    def run_mozharness(self):
> +        command = [self.python_interp] + self.args
> +        print command
> +        return_code = subprocess.call(command)
> +        return return_code

return subprocess.call(command)

Note that we also have a self.run_process() helper function. This comes from python/mach/mach/mixin/process.py. It uses mozprocess and can do things such as clean up child processes.

That being said, I'm actually tempted to say we should just invoke the mozharness script via native Python! Add the path to sys.path, import the script as a module, and go. That opens the door for cool behavior such as receiving callbacks when things happen inside mozharness (e.g. progress bars). But, this could always be done later.

@@ +108,5 @@
> +        return return_code
> +
> +@CommandProvider
> +class MachCommands(MachCommandBase):
> +    mozharness_repo = 'http://hg.mozilla.org/build/mozharness'

https, please.

@@ +119,5 @@
> +        help='The mozharness repository to clone from. Defaults to '
> +             'http://hg.mozilla.org/build/mozharness')
> +    @CommandArgument('--rev', default=mozharness_rev,
> +        help='The mozharness repository to clone from. Defaults to '
> +             'default')

Help message doesn't match mozharness_rev value.

@@ +122,5 @@
> +        help='The mozharness repository to clone from. Defaults to '
> +             'default')
> +    def run_talos_test(self, **params):
> +        talos = self._spawn(TalosRunner)
> +        return talos.run_test(**params)

It's a Python best practice to avoid ** if the arguments are known. This ensures APIs are more well-defined. This prevents things like accidentally passing arguments that don't actually do anything.

They are in this case, so you should:

def run_talos_test(self, repo=None, rev=None):
  talos = self._spawn(TalosRunner)
  return talos.run_test(repo=repo, rev=rev)

There are a few other places in this file where you pass a dict where I think you really want named arguments.
Attachment #779168 - Flags: review?(gps) → feedback+
- Removed 'build' from global sys.path. I decided to call mercurial directly. I don't want to depend on Bug 763903 as it's not fixed and it's a moving target. Let's get this landed asap!
- Added from __future__ import unicode_literals, print_function
- Changed print statements to print functions
- Changed 'firebox' to self.get_binary_path()
- Changed subprocess to self.run_process
- Removed **params. Using named arguments instead.
Attachment #779168 - Attachment is obsolete: true
Attachment #779168 - Flags: feedback?(bgirard)
Attachment #779961 - Flags: review?(gps)
Using import Talos instead of run_process.
Attachment #779961 - Attachment is obsolete: true
Attachment #779961 - Flags: review?(gps)
Attachment #780416 - Flags: review?(gps)
Comment on attachment 780416 [details] [diff] [review]
Added Talos Mozharness command to mach v3

Review of attachment 780416 [details] [diff] [review]:
-----------------------------------------------------------------

I trust you will address the nits before r+.

::: testing/talos/mach_commands.py
@@ +6,5 @@
> +
> +from __future__ import print_function, unicode_literals
> +
> +import os
> +import subprocess

subprocess is unused.

If you install my Mercurial extension at https://hg.mozilla.org/users/gszorc_mozilla.com/hgext-gecko-dev, it will automatically notify you of Python style violations when you run |hg qref|.

@@ +121,5 @@
> +        sys.path.append(self.mozharness_dir)
> +        from mozharness.mozilla.testing.talos import Talos
> +        talos_mh = Talos(config=self.args['config'],
> +                         initial_config_file=self.args['initial_config_file'])
> +        talos_mh.run_and_exit()

return talos_mh.run()

Just in case the caller wants to do something after this (like chain multiple invocations together).

@@ +130,5 @@
> +    mozharness_rev = 'production'
> +
> +    @Command('talos-test', category='testing',
> +             description='Run talos tests.')
> +    @CommandArgument('suite', help='Talos test suite to run.')

We should document what the valid suites are. A goal of |mach help| is for it to be self-supporting: you shouldn't need information outside of |mach help| to know how to invoke a command.

Note that you can pass choices=['foo', 'foo'] to limit the options to a subset of values. However, a side-effect of this is that new test suites will need to be added here in order for us to invoke them. You may just want to stick to enumerating the strings in the help string.

@@ +143,5 @@
> +
> +        try:
> +            talos.run_test(suite, repo, rev)
> +        except Exception as e:
> +            print(str(e))

return 1 if there is an error so exit code is proper.
Attachment #780416 - Flags: review?(gps) → review+
- Added return statements.
- Removed unused import.
- Added valid list of suites to help message.

Propagating r+.
Attachment #781701 - Flags: review+
Attachment #780416 - Attachment is obsolete: true
Fixed "return talos_mh.run()"
and landed it:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d08133d025b3

Do we have to add any documentation anywhere about this feature?

I guess a post to dev.platform is also due.
https://hg.mozilla.org/mozilla-central/rev/d08133d025b3
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
Sorry I was away and didn't get a chance to provide feedback. I just pulled and got an error on the following:
~/ssd-mozilla/mozilla-central/tree> mach.sh talos-test svgr
Running Talos test suite svgr
 0:00.14 /usr/bin/hg pull -r production -u
Invalid action clone-talos not in ['clobber', 'read-buildbot-config', 'download-and-extract', 'create-virtualenv', 'install', 'run-tests']!

Maybe I'm missing something?
I did './mach talos-test other' and it ran the other tests.

One thing we really should do is allow for mach to run any talos test, not just the suites (collection of tests) that we run on buildbot.
I got the same error with './mach talos-test other'.
are you using hg for source tree management?
(In reply to Joel Maher (:jmaher) from comment #24)
> are you using hg for source tree management?

Yes, I have git installed but mozilla-central is cloned using hg:

~/ssd-mozilla/mozilla-central/tree> hg summary
parent: 140387:3d40d270c031 tip
 Merge m-c to fx-team.
branch: default
commit: 453 unknown (clean)
update: (current)
mq:     170 unapplied
(In reply to Benoit Girard (:BenWa) from comment #21)
> Sorry I was away and didn't get a chance to provide feedback. I just pulled
> and got an error on the following:
> ~/ssd-mozilla/mozilla-central/tree> mach.sh talos-test svgr
> Running Talos test suite svgr
>  0:00.14 /usr/bin/hg pull -r production -u
> Invalid action clone-talos not in ['clobber', 'read-buildbot-config',
> 'download-and-extract', 'create-virtualenv', 'install', 'run-tests']!
> 
> Maybe I'm missing something?

I looked at BenWa's sys.path and noticed that he has an old version of mozharness installed on his system as an egg. mach was importing the old mozharness all along. He might need to uninstall the old mozharness so that mach would pick up the one that is cloned into his objdir.

BenWa, is everything working for you now?
Flags: needinfo?(bgirard)
Attached file talos_raw.log
Running ./mach talos-test svgr I get this log; of note:

18:48:07    ERROR -  Traceback (most recent call last):
18:48:07     INFO -    File "/home/jhammel/mozilla/src/obj-browser/mozharness/venv/bin/talos", line 9, in <module>
18:48:07     INFO -      load_entry_point('talos==0.0', 'console_scripts', 'talos')()
18:48:07     INFO -    File "/home/jhammel/mozilla/src/obj-browser/mozharness/venv/local/lib/python2.7/site-packages/talos/run_tests.py", line 337, in main
18:48:07     INFO -      sys.exit(run_tests(parser))
18:48:07     INFO -    File "/home/jhammel/mozilla/src/obj-browser/mozharness/venv/local/lib/python2.7/site-packages/talos/run_tests.py", line 291, in run_tests
18:48:07     INFO -      utils.stamped_msg("Failed %" % testname, "Stopped")
18:48:07     INFO -  ValueError: incomplete format
(In reply to Jeff Hammel [:jhammel] from comment #27)
> Created attachment 783825 [details]
> talos_raw.log
> 
> Running ./mach talos-test svgr I get this log; of note:
> 
> 18:48:07    ERROR -  Traceback (most recent call last):
> 18:48:07     INFO -    File
> "/home/jhammel/mozilla/src/obj-browser/mozharness/venv/bin/talos", line 9,
> in <module>
> 18:48:07     INFO -      load_entry_point('talos==0.0', 'console_scripts',
> 'talos')()
> 18:48:07     INFO -    File
> "/home/jhammel/mozilla/src/obj-browser/mozharness/venv/local/lib/python2.7/
> site-packages/talos/run_tests.py", line 337, in main
> 18:48:07     INFO -      sys.exit(run_tests(parser))
> 18:48:07     INFO -    File
> "/home/jhammel/mozilla/src/obj-browser/mozharness/venv/local/lib/python2.7/
> site-packages/talos/run_tests.py", line 291, in run_tests
> 18:48:07     INFO -      utils.stamped_msg("Failed %" % testname, "Stopped")
> 18:48:07     INFO -  ValueError: incomplete format

My mistake; this only happens when Firefox is already running.  I ran the command again with Firefox closed and get successful results.  We might want to figure out what to do here (explicit instructions, checking earlier, etc) since this will likely be a common complaint.
Blocks: 900116
(In reply to Jason Yeo [:jyeo] from comment #26)
> 
> BenWa, is everything working for you now?

Everything works with the patch from bug 900189 applied.
Flags: needinfo?(bgirard)
Depends on: 908051
No longer blocks: 900116
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: