reduce the usage of configuration.py in talos

RESOLVED FIXED

Status

defect
RESOLVED FIXED
4 years ago
Last year

People

(Reporter: jmaher, Assigned: parkouss)

Tracking

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox41 affected)

Details

Attachments

(3 attachments, 5 obsolete attachments)

configuration.py served a great purpose for us, but in reality it is overkill.  Some good work was done a couple of months ago to convert from optparse to argparse.

I believe we can reduce more of the code, if not remove it all.  Even if we have to move some functions into perfconfigurator.py.

My goal here is to make the code function like other harnesses without custom bits which are not as easy to hack on.

Current usage:
* perfconfigurator.py cli arguments, then output to .yml file with all the options (defaults + user specified)
* input a .config file that overloads the default values, but the cli values will overload the .config file values
* run_tests.py will read from .yml file and run

Ideal usage:
* run_tests.py does everything- no need to read/write .yml or .config files
* we could remove talos/*.config as well (xperf.config is references in talos.json: https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos.json?from=talos.json&case=true#153)

Given the ideal usage, we would then have simplified talos greatly- if we don't care about reading/writing the config, then we can remove a lot of the code and logic in configuration.py and perfconfigurator.py !!
I think we can achieve this goal!

It will be a lot easier to refactor the configuration code once android support will be dropped, because then the PerfConfigurator.py file will only be used to write configuration, and no more interact with devices or webservers.

At least this is what we should do while rewriting this part of the code, just make it generate configuration from the command line and nothing more. :)
Depends on: 1182429
This is the first step - remove the Perfconfigurator.py call from harness.

This patch just do that, and I believe it works well since run_tests.py reuse PerfConfigurator options machinery (and this is what we use when running locally). We'll see on try.

Once we have that, we can work out to remove configuration.py/PerfConfigurator.py easily, since the only entry point will be run_tests.py (ie the "talos" command line when installed in virtual env).
Assignee: nobody → j.parkouss
Status: NEW → ASSIGNED
Attachment #8639713 - Flags: review?(jmaher)
Depends on: 1187684
No longer depends on: 1182429
Comment on attachment 8639713 [details] [diff] [review]
remove_perfconfigurator_call_from_harness.patch

Review of attachment 8639713 [details] [diff] [review]:
-----------------------------------------------------------------

this is simple, please test this well and lets get it in!
Attachment #8639713 - Flags: review?(jmaher) → review+
Blocks: 1188756
Blocks: 797624
Blocks: 747187
Posted patch remove_PerfConfigurator.patch (obsolete) — Splinter Review
(nice, previous run is green, will land that later today!)

So, this is the talos side patch. Removing perfconfigurator.py!

Well this is not perfect (I suppose we could rewrite that in a better way) but given the constraints of refactoring, I find it nice. This remove around 1000 lines of code. :)

So this works well locally. The patch HAVE TO be applied on top of the one in bug 1188763 (because I changed here bcontroller yaml to json) - and of course on top of the removing android stuff.
Attachment #8640352 - Flags: review?(jmaher)
Posted patch remove_PerfConfigurator.patch (obsolete) — Splinter Review
I forgot to remove PyYAML from requirements.txt. :) Now the patch is complete.
Attachment #8640352 - Attachment is obsolete: true
Attachment #8640352 - Flags: review?(jmaher)
Attachment #8640408 - Flags: review?(jmaher)
Comment on attachment 8640408 [details] [diff] [review]
remove_PerfConfigurator.patch

Review of attachment 8640408 [details] [diff] [review]:
-----------------------------------------------------------------

overall I like this.  A huge win.  As for my comments, I am not sure if you simplified things by ensuring all options are on the cli.  If that is the case, then ignore my comments except for the ones that we can get rid of.  Most likely we can file followup bugs to remove options throughout talos which are not used.

::: talos/config.py
@@ +54,5 @@
> +    add_arg('--noChrome', action='store_true',
> +            help="do not run tests as chrome")
> +    add_arg('--rss', action='store_true',
> +            help="Collect RSS counters from pageloader instead of the"
> +                 " operating system")

we might be able to get rid of --noChrome and --rss as cli options, possibly noChrome altogether

@@ +59,5 @@
> +    add_arg('--mainthread', action='store_true',
> +            help="Collect mainthread IO data from the browser by setting"
> +                 " an environment variable")
> +    add_arg("--mozAfterPaint", action='store_true', dest="tpmozafterpaint",
> +            help="wait for MozAfterPaint event before recording the time")

both mainthread and mozAfterPaint can be internal variables only

@@ +68,5 @@
> +    add_arg('--spsProfileEntries', dest="sps_profile_entries", type=int,
> +            help="How many samples to take with the profiler")
> +    add_arg('--extension', dest='extensions', action='append',
> +            default=['${talos}/talos-powers', '${talos}/pageloader'],
> +            help="Extension to install while running")

we don't ever specify an extension on the command line.

@@ +70,5 @@
> +    add_arg('--extension', dest='extensions', action='append',
> +            default=['${talos}/talos-powers', '${talos}/pageloader'],
> +            help="Extension to install while running")
> +    add_arg('--fast', action='store_true',
> +            help="Run tp tests as tp_fast")

I think fast can go away for good

@@ +76,5 @@
> +            help="Path to the symbols for the build we are testing")
> +    add_arg('--xperf_path',
> +            help="Path to windows performance tool xperf.exe")
> +    add_arg('--test_timeout', type=int, default=1200,
> +            help="Time to wait for the browser to output to the log file")

we never use this as a cli argument, and I think with android going away we can default to a const in the code.

@@ +85,5 @@
> +            default=os.path.abspath('browser_failures.txt'),
> +            help="Filename to store the errors found during the test."
> +                 " Currently used for xperf only.")
> +    add_arg('--noShutdown', dest='shutdown', action='store_true',
> +            help="Record time browser takes to shutdown after testing")

we don't use logFile errorFile or noShutdown from the command line, and noShutdown could probably go away altogether.

@@ +90,5 @@
> +    add_arg('--setPref', action='append', default=[], dest="extraPrefs",
> +            metavar="PREF=VALUE",
> +            help="defines an extra user preference")
> +    add_arg('--webServer', dest='webserver',
> +            help="address of the webserver hosting the talos files")

setPref is not used on the cli, but internally.
webServer won't be needed on the cli, --develop can handle it all

@@ +99,5 @@
> +    add_arg('--responsiveness', action='store_true',
> +            help="turn on responsiveness collection")
> +    add_arg('--filter', dest='filters', action='append',
> +            help="filters to apply to the data from talos.filters [DEFAULT:"
> +                 " ignore_max, median]")

both responsiveness and filter are not needed on the cli.

@@ +113,5 @@
> +    add_arg('--tppagecycles', type=int,
> +            help='number of pageloader cycles to run for each page in'
> +                 ' the manifest')
> +    add_arg('--tpdelay', type=int,
> +            help="length of the pageloader delay")

I don't think we use tpdelay at all
Attachment #8640408 - Flags: review?(jmaher) → review+
(In reply to Joel Maher (:jmaher) from comment #7)
> Comment on attachment 8640408 [details] [diff] [review]
> remove_PerfConfigurator.patch
> 
> Review of attachment 8640408 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> overall I like this.  A huge win.  As for my comments, I am not sure if you
> simplified things by ensuring all options are on the cli.  If that is the
> case, then ignore my comments except for the ones that we can get rid of. 
> Most likely we can file followup bugs to remove options throughout talos
> which are not used.

Yes, I would prefer to create a followup bug for that, I think it will be easier. There might be some tricky issues when we will remove useless options, so let's keep things simple and do one thing at a time!

Nice list though, we should link to comment 7 from the new bug.
Posted patch remove_PerfConfigurator.patch (obsolete) — Splinter Review
updating the patch with unit tests fixes.
Attachment #8640408 - Attachment is obsolete: true
Attachment #8640443 - Flags: review+
Carrying r+ from previous review by jmaher (comment 3).

Try push is green, done in comment4.
Attachment #8639713 - Attachment is obsolete: true
Attachment #8640450 - Flags: review+
Pleas land only  remove_perfconfigurator_call_from_harness.patch, the other patch is for the external talos repo.
for reference we have a few issues, the obvious ones are "from talos[.config] import <module>", and with the latest changes in mozharness we don't have a talos scope.
So we have (at least) two issues:

- bug 1189021 - this causes the import errors.
- mozharness run_tests.py call still use the --output option (that was used to write the talos.yml file) and so the script fails with unknown option error.

I propose to temporary use import statements that work for mozharnes - we will fix the complete issue in bug 1189021.
Not yet, this time a -v option that I removed also...

This time I am more confident, I tried with local mach talos-test command (grr I should have done that before - plus I wrote the the patch to make it works again!)

New and hopefully last try:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=37746934a730
Ok, this is better! Still xperf builds are failing with:

run_tests.py: error: unrecognized arguments: C:/slave/talos-data/talos/xperf.config

it looks like we have a talos/xperf.config file in the talos repo. it is empty. I suspect this is the one referenced in testing/talos/talos.json.

So this mozharness patch should fix that.
Attachment #8640931 - Flags: review?(jmaher)
Posted patch remove_PerfConfigurator.patch (obsolete) — Splinter Review
So this is the patch for talos, with import "fixes" and removing the xperf.config empty file.

This patch worked great on the try below. Except for:

x tests: probably only a harness issue. see previous comment
g2 tests: I don't understand why. so I backed out this patch, and run again one of the g2 tests, to be sure that this patch is the cause - or if it is something else. We'll see soon.
Attachment #8640443 - Attachment is obsolete: true
Attachment #8640956 - Flags: review+
Ok, then the new job is in state:

44 mins overdue, typically takes ~ 17 mins

Typically the patches here are probably not the cause. I suspect a regression from either:

2beaa8352bb4
574da36e8723
da09edf6d984

I will wait for confirmation the complete fail of the job, then will try to backout these changes one after the other to find the issue.
Oh, I can confirm that da09edf6d984 seems to be the regression cause. I successively backed out:

2beaa8352bb4 - busted
574da36e8723 - busted
da09edf6d984 - completed OK.

You can see that in https://treeherder.mozilla.org/#/jobs?repo=try&revision=37746934a730 - LInux opt, g2 builds.

The first one was up 2 date talos + talos patch from this bug
the second is a user cancel
Then I successivelly backed out this patch, 2beaa8352bb4, 574da36e8723, da09edf6d984 - and only the last build is working.
Flags: needinfo?(jmaher)
(note that in treeherder the order is based on the build end - so look at the Started date of the job for to be in sync with comment 21)
Comment on attachment 8640931 [details] [diff] [review]
remove_obsolete_mozharness_talos_options.patch

Review of attachment 8640931 [details] [diff] [review]:
-----------------------------------------------------------------

great patch!
Attachment #8640931 - Flags: review?(jmaher) → review+
I am aware that :BenWa had landed these changes and it needs a fresh copy of the tree.  He said there might be some issues which I am confirming and will backout if needed and confirm the fix.  Thanks for bringing it up!
Flags: needinfo?(jmaher)
So, I tried the patch for mozharness (comment 23):

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b934f20346f1

All green.

Now I just started a push with the same change, but this time on talos up to date (I saw you backouted the changesets), with the patch above applied (comment 19):

https://treeherder.mozilla.org/#/jobs?repo=try&revision=3815f18f8938
This is it, fixed!

So the fix was to replace 'bcontroller.json' to '${talos}/bcontroller.json'. Now everything seems to works fine, including xperf tests.

Note that I ran the test on top of your backout (:BenWa wrote a new patch, but I'm afraid to test that!)

Anyway this means that this patch is all good on try - so land it at your convenience, or I can do that later today.
Attachment #8640956 - Attachment is obsolete: true
Attachment #8641586 - Flags: review+
talos bits landed:
https://hg.mozilla.org/build/talos/rev/544cb93509a3

all that is left is to deploy the talos bits on trunk.
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Depends on: 1190053
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
You need to log in before you can comment on or make changes to this bug.