Closed Bug 1272113 Opened 8 years ago Closed 7 years ago

Move AWSY to TaskCluster

Categories

(Testing :: AWSY, defect)

defect
Not set
normal

Tracking

(firefox54 fixed, firefox55 fixed)

RESOLVED FIXED
mozilla55
Tracking Status
firefox54 --- fixed
firefox55 --- fixed

People

(Reporter: jgriffin, Assigned: pyang)

References

Details

Attachments

(4 files)

Currently AWSY runs on some custom infrastructure; we'd like to migrate it to TaskCluster. Before Syd left, he did some investigation of what this would involve, his notes are here:

https://public.etherpad-mozilla.org/p/awsy_notes
I would like to see this moved forward.  We have Ubuntu 16.04 docker images on AWS- given the fact that we are measuring memory and not time, this is realistic to do on an virtualized environment.

I see a few steps required:
1) create a taskcluster test job for this
2) create a mozharness script to run this (maybe this will shoe horn into an existing test type?!?)
3) bundle up the tools/scripts to run AWSY and make them accessible (maybe tooltool- we already have tp5o.zip up there)
4) test on try, possibly split up in multiple chunks if needed
5) deploy on linux64, consider different platforms such as Android and Windows since we have AWS scalability there.
A few notes from a meeting I had with erahm and mrbkap about this last week:

To quick summarize my understanding of the current state of things:
* AWSY currently runs with e10s disabled on a Linux machine under dvander's desk in MTV
* It runs through the tp5 pageset using the Marionette harness with a mostly-vanilla profile
* The current setup is a huge bottleneck and also becoming more and more non-representative of our users
* Raw data is stored in an SQLite database on the host machine

Where we'd like to go:
* We would like to be able to better-scale AWSY and cover more platforms/configurations (e10s, e10s-multi, Win32, Win64, etc)
* We would also like to have a better data storage model for the result data instead of putting everything in a big SQLite database

My team can probably help with the data storage question and it sounds like Joel has some ideas for how to proceed with the automation side of things.
Alison, do you have availability to work on this project?
Flags: needinfo?(ashiue)
One more point: AWSY currently has data going back for more than 5 years. Preserving that data is *not* a priority, because AWSY is most useful as a regression detector rather than as a historical record.
(In reply to Ryan VanderMeulen [:RyanVM] from comment #2)

> My team can probably help with the data storage question and it sounds like
> Joel has some ideas for how to proceed with the automation side of things.

AWSY is already submitting / storing data in Perfherder, so I think we're good there -- if there's some important feature that perfherder is missing to track AWSY data, I'm not aware of it.

Of course that data only goes back so far (to the beginning of 2016), but we can always keep a mirror of the old site around as an historical reference.
(In reply to William Lachance (:wlach) from comment #5)
> (In reply to Ryan VanderMeulen [:RyanVM] from comment #2)
> 
> > My team can probably help with the data storage question and it sounds like
> > Joel has some ideas for how to proceed with the automation side of things.
> 
> AWSY is already submitting / storing data in Perfherder, so I think we're
> good there -- if there's some important feature that perfherder is missing
> to track AWSY data, I'm not aware of it.
> 
> Of course that data only goes back so far (to the beginning of 2016), but we
> can always keep a mirror of the old site around as an historical reference.

AWSY generates millions of data points per run, only a very small fraction are submitted to PerfHerder (enough to roughly duplicate areweslimyet.com's graphs, ~20). The full data set can be used to perform about:memory diffs between runs and can be used to zoom in on a data point on areweslimyet.com. 

I'm happy to change *how* we store the full dataset (and how long it's archived). To simplify things, it seems like we could just store the actual memory report json files somewhere and call it good. Arguably the detailed reports aren't going to be terribly useful past a year.
(In reply to Eric Rahm [:erahm] from comment #6)
> (In reply to William Lachance (:wlach) from comment #5)
> > (In reply to Ryan VanderMeulen [:RyanVM] from comment #2)
> > 
> > > My team can probably help with the data storage question and it sounds like
> > > Joel has some ideas for how to proceed with the automation side of things.
> > 
> > AWSY is already submitting / storing data in Perfherder, so I think we're
> > good there -- if there's some important feature that perfherder is missing
> > to track AWSY data, I'm not aware of it.
> > 
> > Of course that data only goes back so far (to the beginning of 2016), but we
> > can always keep a mirror of the old site around as an historical reference.
> 
> AWSY generates millions of data points per run, only a very small fraction
> are submitted to PerfHerder (enough to roughly duplicate areweslimyet.com's
> graphs, ~20). The full data set can be used to perform about:memory diffs
> between runs and can be used to zoom in on a data point on areweslimyet.com. 
> 
> I'm happy to change *how* we store the full dataset (and how long it's
> archived). To simplify things, it seems like we could just store the actual
> memory report json files somewhere and call it good. Arguably the detailed
> reports aren't going to be terribly useful past a year.

Sounds good to me, :rwood recently did something exactly like this for Talos (store the raw replicates used to generate the summarized data we store in Perfherder) in bug 1316692. We're thinking about exposing some kind of user interface for exploring this data (bug 1164891) so it would be good if AWSY used the same schema if that makes sense.
Flags: needinfo?(ashiue)
Paul is going to look at this.
Assignee: nobody → pyang
Hi Paul, do you have any updates here?
Flags: needinfo?(pyang)
Not much.  I talked to erahm and start to experiment on try. By our network policy we might need to move awsy into m-c but it's not blocking so far.
Flags: needinfo?(pyang)
This is blocking e10s-multi (since it's the best automated measure of all content processes and the parent). We're hoping to ride some trains in the 55 timeframe and the project is high priority. Can you prioritize this work to be higher?
Flags: needinfo?(pyang)
We're putting more resource in this and trying to catch up 55.
Leave ni? for reminder.
We have a taskcluster friendly version of awsy up at https://github.com/EricRahm/awsy-lite. This reduces the test down to a few files, mainly: a marionette test that saves memory reports and a script that parses the memory reports and spits out a perfherder friendly log entry. Ideally these will get merged in-tree.

This should work cross platform and with non-e10s and e10s(-multi).

I think at this point we just need to handle the taskcluster setup/integration portion which Paul is actively working on.
Depends on: 1341232
Could we get an update in here of what remaining piece are yet to be solved?

I see the bug for migrating code from github -> mozilla-central, from reading that bug there is work there to finish other work prior to doing that.

I believe we have:
* serve tp5 pages from a python webserver
* get tp5n.zip from tooltool
* match up the page lists used by AWSY with tp5n (possibly shared with talos)
* write a taskcluster task (definition + runner) to run this
* write other automation? (mozharness, using ./mach, etc.?)
* move code into mozilla-central

Knowing the remaining steps helps folks know the status and ask meaningful questions- as usual there are steps we do not realize until further, but this bug has been pretty silent.
Most of your list remains to be completed.

* We can serve tp5 pages from a python webserver in a standalone environemnt.
* Paul has been working on the taskcluster task but appears to have more work to do.
* I believe Paul has also been working on the mozharness script as part of the taskcluster task work.
* I believe getting tp5n.zip from tooltool will be a minor part of the taskcluster task work.
* Implementing an awsy mach command is a logical requirement I hadn't thought of before now.

* I will match up the page lists used by AWSY with tp5n.
* I will get Paul's patches to implement the taskcluster task working.
Those wip patches covers most work mentioned excepting mach command implementations and proper tp5 page match up.

Bob - please help to check if modification in awsy lib corresponds your idea, thanks!
Flags: needinfo?(pyang) → needinfo?(bob)
Attached patch modifications-v1.patch — — Splinter Review
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2a5373223522d12670a787c652a1a1fdfc3efe14&selectedJob=80625865

There was a slight conflict with more recent changes to taskcluster/ci/test/tests.yml.

Unzipping the tp5n.zip file creates a directory named tp5n. It is simpler to just use that rather than attempt to move tp5n to tp5 in my opinion. Therefore I edited the urls to use tp5n which I don't think will affect the results. The original try run using the reviewboard patches, didn't actually load any pages from what I understand since they used the tp5 subdirectory in the urls but unzipped the files into tp5n. It is problematic that we don't see that in the logs I think. Not sure how to fix.

The marionette_raw.log was over-ridden by the raw log to stdout. Trying to use the marionette_raw log instead caused errors that the output wasn't valid structured log output.

We need to pass the preferences and testvars to the marionette script. I used the files in example directory. We need to determine the proper location for these files and create the e10s and non-e10s versions.

We need to better handle the setup. I introduced a download_tp5n variable to the test vars we can use to determine how to handle the tp5n.zip installation outside of mozharness. I haven't tested this as a stand alone test.

The example testvars ran quickly since the max tabs, per tab pause and settle wait time were much smaller than the defaults. Running the full set with the default test vars set in awsy/__init__.py takes almost two hours even on my workstation. I don't believe this is a reasonable choice. We need to determine the minimum useful values in testvars.

This doesn't yet upload the results to perfherder.

We haven't yet addressed the mach command.

Let's meet up when you get into the office in the morning your time.
Flags: needinfo?(bob)
Thanks bc. Nice patch, I'll do e10s and config later.
erahm, how long do you think to run in tc is ok?  currently longest duration is about 30 minutes I think.  But awsy is pretty special.
Flags: needinfo?(erahm)
Great news on the progress!

(In reply to Bob Clary [:bc:] from comment #23)
> The example testvars ran quickly since the max tabs, per tab pause and
> settle wait time were much smaller than the defaults. Running the full set
> with the default test vars set in awsy/__init__.py takes almost two hours
> even on my workstation. I don't believe this is a reasonable choice. We need
> to determine the minimum useful values in testvars.

Unfortunately one major point of the test is to simulate a long lived session (and thus detect small, but repeated leaks). Decreasing the time reduces our ability to detect these things. On the other hand getting better coverage is a big win, so maybe we can split the difference and set iterations to 3 instead of 5 and see how that goes.
Flags: needinfo?(erahm)
Try by small url set.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=62a6a8ea5e8a46edd49bf63e7e5ccf561f920c64
Result can be triggered to build perfherder tab.
(In reply to Paul Yang [:pyang] from comment #26)
> Try by small url set.
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=62a6a8ea5e8a46edd49bf63e7e5ccf561f920c64
> Result can be triggered to build perfherder tab.

This is great, it looks like the perfherder data was ingested properly. e10s vs non-e10s seems to be working, but I'd need to check the memory reports to verify. Unfortunately the artifact upload step seems to be broken (trying to run |ll| which doesn't exist on the server), but we're super close.
Thanks, I did little cleanup and push again.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=36ed56f0cf1cf5bbc7c7361ce5c35542f25d2404

Also storing perfherder data in artifact perfherder_data.json
Comment on attachment 8841238 [details]
Bug 1272113: Adding awsy package into m-c

Code based on awsy-lite and modifying mostly for config and packaging.
Eric, can you help to feedback and also recommend config? Thanks :)
Attachment #8841238 - Flags: feedback?(erahm)
Comment on attachment 8841238 [details]
Bug 1272113: Adding awsy package into m-c

https://reviewboard.mozilla.org/r/115514/#review118758

This looks good, there are a few minor points below. Also we might want to consider just removing the `example` directory entirely and just add details on how to run locally to the `README`.

::: testing/awsy/awsy/test_memory_usage.py:76
(Diff revision 3)
> +        self._pages_to_load = self.testvars.get("entities", len(self._urls))
> +        self._iterations = self.testvars.get("iterations", ITERATIONS)
> +        self._perTabPause = self.testvars.get("perTabPause", PER_TAB_PAUSE)
> +        self._settleWaitTime = self.testvars.get("settleWaitTime", SETTLE_WAIT_TIME)
> +        self._maxTabs = self.testvars.get("maxTabs", MAX_TABS)
> +        self._resultsDir = os.path.join(os.getcwd(), "tests", "results")

I think this should just default to the upload dir, so like the perf data below in `tearDown`. Something like:

'''python
self._resultsDir = os.environ.get(
    "MOZ_UPLOAD_DIR",
    os.path.join(os.getcwd(), "tests", "results")
'''

It's possible I'm missing something from other patch  sets that take care of this.

::: testing/awsy/awsy/test_memory_usage.py:89
(Diff revision 3)
> +        self.logger.info("tearing down webservers!")
> +        self._webservers.stop()
> +        self.logger.info("processing data in %s!" % self._resultsDir)
> +        perf_blob = process_perf_data.create_perf_data(self._resultsDir)
> +        self.logger.info("PERFHERDER_DATA: %s" % json.dumps(perf_blob))
> +        dest_file = os.path.join(os.environ["MOZ_UPLOAD_DIR"], "perfherder_data.json")

We should add a fallback option if this isn't being run in automation (for local testing), something similar to what I suggested for `_resultsDir`.

::: testing/awsy/conf/testvars.json:4
(Diff revision 3)
> +{
> +  "entities": 100,
> +  "iterations": 5,
> +  "perTabPause": 1,

`perTabPause` and `settleWaitTime` need to be 30. The other option is to just comment these out (but leave them as documentation of which options can be set) and use the defaults.

::: testing/awsy/conf/testvars.json:6
(Diff revision 3)
> +{
> +  "entities": 100,
> +  "iterations": 5,
> +  "perTabPause": 1,
> +  "settleWaitTime": 1,
> +  "download_tp5n": false

I'm not clear on what this does, is this just for testing? If so we can remove it.
Attachment #8841238 - Flags: review-
Comment on attachment 8841240 [details]
Bug 1272113: Creating awsy task and mozharness script

https://reviewboard.mozilla.org/r/115518/#review118766

::: taskcluster/ci/test/tests.yml:1205
(Diff revision 3)
>  
> +awsy:
> +    description: "Are we slim yet"
> +    suite: awsy
> +    treeherder-symbol: tc-SY(sy)
> +    run-on-projects: ['mozilla-beta', 'mozilla-aurora', 'mozilla-central', 'mozilla-inbound', 'try']

We might want to disable this by default for debug builds, I'm not sure the right way to do that though. Currently AWSY is only run on opt builds.

::: testing/mozharness/scripts/awsy_script.py:133
(Diff revision 3)
> +        env = {}
> +        error_summary_file = os.path.join(dirs['abs_blob_upload_dir'],
> +                                          'marionette_errorsummary.log')
> +        output_timeout = 6500
> +
> +        cmd = ['marionette']

I imagine there's already a marionette task we can reuse, hopefully someone more familiar with task cluster can speak up.
Attachment #8841239 - Flags: review?(jmaher)
Attachment #8841240 - Flags: review?(jmaher)
Here is the result using iteration=3, perTabPause=30
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3f46e34e0156f05f3506737de9f190f2f854f021&selectedJob=81529331
Still over the boundary of 90 minutes.
I'm not sure if that is a hard limit but we certainly want to reduce overall running time on taskcluster.
Eric, do you have any suggestion?
Flags: needinfo?(erahm)
Comment on attachment 8841238 [details]
Bug 1272113: Adding awsy package into m-c

https://reviewboard.mozilla.org/r/115514/#review119034

::: testing/awsy/awsy/test_memory_usage.py:76
(Diff revision 3)
> +        self._pages_to_load = self.testvars.get("entities", len(self._urls))
> +        self._iterations = self.testvars.get("iterations", ITERATIONS)
> +        self._perTabPause = self.testvars.get("perTabPause", PER_TAB_PAUSE)
> +        self._settleWaitTime = self.testvars.get("settleWaitTime", SETTLE_WAIT_TIME)
> +        self._maxTabs = self.testvars.get("maxTabs", MAX_TABS)
> +        self._resultsDir = os.path.join(os.getcwd(), "tests", "results")

A default directory is absolutely considerable.
However I'm not sure if we use MOZ_UPLOAD_DIR.

I tried in https://treeherder.mozilla.org/#/jobs?repo=try&author=pyang@mozilla.com&selectedJob=81329413

Memory report artifacts push log files to the bottom.
Maybe we can tar them and make it cleaner?

::: testing/awsy/awsy/test_memory_usage.py:89
(Diff revision 3)
> +        self.logger.info("tearing down webservers!")
> +        self._webservers.stop()
> +        self.logger.info("processing data in %s!" % self._resultsDir)
> +        perf_blob = process_perf_data.create_perf_data(self._resultsDir)
> +        self.logger.info("PERFHERDER_DATA: %s" % json.dumps(perf_blob))
> +        dest_file = os.path.join(os.environ["MOZ_UPLOAD_DIR"], "perfherder_data.json")

Agree. I'll work on it. Thanks.

::: testing/awsy/conf/testvars.json:4
(Diff revision 3)
> +{
> +  "entities": 100,
> +  "iterations": 5,
> +  "perTabPause": 1,

Agree. I have another revision for this and will push later. thanks.

::: testing/awsy/conf/testvars.json:6
(Diff revision 3)
> +{
> +  "entities": 100,
> +  "iterations": 5,
> +  "perTabPause": 1,
> +  "settleWaitTime": 1,
> +  "download_tp5n": false

bc added this for local environment. In mozharness script we actually pull tp5 package so default is False.
Comment on attachment 8841240 [details]
Bug 1272113: Creating awsy task and mozharness script

https://reviewboard.mozilla.org/r/115518/#review119042

::: testing/mozharness/scripts/awsy_script.py:133
(Diff revision 3)
> +        env = {}
> +        error_summary_file = os.path.join(dirs['abs_blob_upload_dir'],
> +                                          'marionette_errorsummary.log')
> +        output_timeout = 6500
> +
> +        cmd = ['marionette']

This is from other mozharness script which I think it took advantage of passing output_parser.

I'll double check if any usage of marionette can do it.
Comment on attachment 8841239 [details]
Bug 1272113: Packaging awsy in build job

https://reviewboard.mozilla.org/r/115516/#review119260

thanks, this looks good
Attachment #8841239 - Flags: review?(jmaher) → review+
Comment on attachment 8841240 [details]
Bug 1272113: Creating awsy task and mozharness script

https://reviewboard.mozilla.org/r/115518/#review119264

I see a lot of references to buildbot, also I am not clear that we have tested the mac/windows runner.

::: taskcluster/ci/test/test-platforms.yml:109
(Diff revision 4)
>  windows7-32-vm/opt:
>      build-platform: win32/opt
>      test-sets:
>          - windows-vm-tests
>          - external-media-tests
> +        - awsy

does this work in win7?  if not, add it here with a # to make it a comment

::: taskcluster/ci/test/test-platforms.yml:132
(Diff revision 4)
>  windows10-64-vm/opt:
>      build-platform: win64/opt
>      test-sets:
>          - windows-vm-tests
>          - external-media-tests
> +        - awsy

does this work in win10?  if not, add it here with a # to make it a comment

::: taskcluster/ci/test/tests.yml:1209
(Diff revision 4)
>  
> +awsy:
> +    description: "Are we slim yet"
> +    suite: awsy
> +    treeherder-symbol: tc-SY(sy)
> +    run-on-projects: ['mozilla-beta', 'mozilla-aurora', 'mozilla-central', 'mozilla-inbound', 'try']

how about autoland?

::: taskcluster/ci/test/tests.yml:1220
(Diff revision 4)
> +        script: awsy_script.py
> +        no-read-buildbot-config: true
> +        config:
> +            by-test-platform:
> +                default:
> +                    - awsy/linux_config.py

we added options for win7/10, but this says to always use linux_config.py?  are we missing other configs?

::: taskcluster/taskgraph/transforms/task.py:366
(Diff revision 4)
>      'tc-M-V': 'Mochitests on Valgrind executed by TaskCluster',
>      'tc-R': 'Reftests executed by TaskCluster',
>      'tc-R-e10s': 'Reftests executed by TaskCluster with e10s',
>      'tc-T': 'Talos performance tests executed by TaskCluster',
>      'tc-T-e10s': 'Talos performance tests executed by TaskCluster with e10s',
> +    'tc-SY': 'Are we slim yet tests by TaskCluster',

do we need the non-e10s version of this?  With the long runtime and the fact that we mostly care about e10s, I would like to justify why we are running non-e10s.

::: testing/mozharness/configs/awsy/linux_config.py:4
(Diff revision 4)
> +import os
> +import platform
> +
> +PYTHON = "/tools/buildbot/bin/python"

why is python in the buildbot path?  I would think this is a buildbot patch, not a fresh taskcluster patch.

::: testing/mozharness/configs/awsy/linux_config.py:18
(Diff revision 4)
> +BINARY_PATH = os.path.join(ABS_WORK_DIR, "application", "firefox", "firefox-bin")
> +INSTALLER_PATH = os.path.join(ABS_WORK_DIR, "installer.tar.bz2")
> +
> +config = {
> +    "log_name": "awsy",
> +    "buildbot_json_path": "buildprops.json",

do we need buildprops.json?

::: testing/mozharness/configs/awsy/linux_config.py:36
(Diff revision 4)
> +        'tooltool.py': "/tools/tooltool.py",
> +    },
> +    "title": os.uname()[1].lower().split('.')[0],
> +    "default_actions": [
> +        "clobber",
> +        "read-buildbot-config",

why do we have read-buildbot-config?  I thought these ran in taskcluster?

::: testing/mozharness/configs/awsy/mac_config.py:13
(Diff revision 4)
> +    "enabled": ENABLE_SCREEN_RESOLUTION_CHECK
> +}
> +
> +import os
> +
> +PYTHON = '/tools/buildbot/bin/python'

more buildbot references

::: testing/mozharness/configs/awsy/windows_config.py:44
(Diff revision 4)
> +    ],
> +    "default_blob_upload_servers": [
> +        "https://blobupload.elasticbeanstalk.com",
> +    ],
> +    "blob_uploader_auth_file": os.path.join(os.getcwd(), "oauth.txt"),
> +    "metro_harness_path_frmt": "%(metro_base_path)s/metro/metrotestharness.exe",

I thought we got rid of metro related code?

::: testing/mozharness/scripts/awsy_script.py:32
(Diff revision 4)
> +class AWSY(TestingMixin, MercurialScript, BlobUploadMixin,TooltoolMixin):
> +    config_options = [
> +        [["--branch-name"],
> +         {"action": "store",
> +          "dest": "branch",
> +          "help": "Graphserver branch to report to"

graphserver is long dead, can we make a proper comment here?  is branch-name used?

::: testing/mozharness/scripts/awsy_script.py:34
(Diff revision 4)
> +        [["--branch-name"],
> +         {"action": "store",
> +          "dest": "branch",
> +          "help": "Graphserver branch to report to"
> +          }],
> +        [["--e10s"],

our other harnesses have switched where posible to be e10s by default and using --disable-e10s as the cli flag- as this is a new harness I would prefer we do that here.
Attachment #8841240 - Flags: review?(jmaher) → review-
(In reply to Paul Yang [:pyang] from comment #35)
> Here is the result using iteration=3, perTabPause=30
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=3f46e34e0156f05f3506737de9f190f2f854f021&selectedJob=8
> 1529331
> Still over the boundary of 90 minutes.
> I'm not sure if that is a hard limit but we certainly want to reduce overall
> running time on taskcluster.
> Eric, do you have any suggestion?

Ahah! Sorry `perTabPause` should be 10 seconds, `settleWaitTime` should be 30 seconds. That'll shave 6000 seconds or so of the run time.
Flags: needinfo?(erahm)
Comment on attachment 8841240 [details]
Bug 1272113: Creating awsy task and mozharness script

https://reviewboard.mozilla.org/r/115518/#review119264

Thanks! Will remove windows/mac and non-e10s trials.
Buildbot references are strange so need to do more investigation.

> does this work in win7?  if not, add it here with a # to make it a comment

I tried to test on windows/osx platform since areweslimyet depends only on marionette.
Will remove those and keep linux, linux64

> how about autoland?

I followed talos's setting since it's a long running job. Autoland
we can change to 
talos, ['mozilla-beta', 'mozilla-aurora', 'mozilla-central', 'mozilla-inbound', 'autoland', 'try'] 
or simply remove it like marionette does.

> do we need the non-e10s version of this?  With the long runtime and the fact that we mostly care about e10s, I would like to justify why we are running non-e10s.

Will remove this in next revision.

> why is python in the buildbot path?  I would think this is a buildbot patch, not a fresh taskcluster patch.

All those buildbot references are inherited from mochitest/marionette and others.
see also https://dxr.mozilla.org/mozilla-central/search?q=buildbot%2Fbin%2Fpython&redirect=false
I'll try to remove them if possible.

> why do we have read-buildbot-config?  I thought these ran in taskcluster?

This one was weird.
I removed this task but failed because it's required.  I guess it's because https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/transforms/tests.py#213

So I kept task "read-buildbot-config" and add "no-read-buildbot-config" in tests.yml just like https://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/test/tests.yml#412

> I thought we got rid of metro related code?

Will take it out.

> graphserver is long dead, can we make a proper comment here?  is branch-name used?

will remove it. thanks.

> our other harnesses have switched where posible to be e10s by default and using --disable-e10s as the cli flag- as this is a new harness I would prefer we do that here.

Agree.  Will fix it.
See Also: → 1344813
(In reply to Joel Maher ( :jmaher) from comment #42)
> ::: testing/mozharness/scripts/awsy_script.py:34
> (Diff revision 4)
> > +        [["--branch-name"],
> > +         {"action": "store",
> > +          "dest": "branch",
> > +          "help": "Graphserver branch to report to"
> > +          }],
> > +        [["--e10s"],
> 
> our other harnesses have switched where posible to be e10s by default and
> using --disable-e10s as the cli flag- as this is a new harness I would
> prefer we do that here.

--e10s option was passed by https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/transforms/tests.py?q=path%3Ataskcluster+%22--e10s%22&redirect_type=single#510
to the harness, and was required in https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/transforms/tests.py#124

We probably can
1. Patch transforms/tests.py to use --disable-e10s, will have a bunch of followup modification.
2. Move e10s option into a general config of mozharness, such like https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/testbase.py#51 or https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/try_tools.py#16

Joel, what do you think?
Flags: needinfo?(jmaher)
thanks for looking into this Paul- Given your data and for me looking into it more, I think that going with what we do for taskcluster->mozharness seems reasonable.  So lets leave this --e10s flag as it is for now:)
Flags: needinfo?(jmaher)
Another revision pushed, comments inlined.

(In reply to Eric Rahm [:erahm] from comment #33)
> Comment on attachment 8841238 [details]
> Bug 1272113: Adding awsy package into m-c

> ::: testing/awsy/awsy/test_memory_usage.py:76
> (Diff revision 3)
> > +        self._pages_to_load = self.testvars.get("entities", len(self._urls))
> > +        self._iterations = self.testvars.get("iterations", ITERATIONS)
> > +        self._perTabPause = self.testvars.get("perTabPause", PER_TAB_PAUSE)
> > +        self._settleWaitTime = self.testvars.get("settleWaitTime", SETTLE_WAIT_TIME)
> > +        self._maxTabs = self.testvars.get("maxTabs", MAX_TABS)
> > +        self._resultsDir = os.path.join(os.getcwd(), "tests", "results")
> 
> I think this should just default to the upload dir, so like the perf data
> below in `tearDown`. Something like:
> 
> '''python
> self._resultsDir = os.environ.get(
>     "MOZ_UPLOAD_DIR",
>     os.path.join(os.getcwd(), "tests", "results")
> '''
> 
> It's possible I'm missing something from other patch  sets that take care of
> this.
> 
> ::: testing/awsy/awsy/test_memory_usage.py:89
> (Diff revision 3)
> > +        self.logger.info("tearing down webservers!")
> > +        self._webservers.stop()
> > +        self.logger.info("processing data in %s!" % self._resultsDir)
> > +        perf_blob = process_perf_data.create_perf_data(self._resultsDir)
> > +        self.logger.info("PERFHERDER_DATA: %s" % json.dumps(perf_blob))
> > +        dest_file = os.path.join(os.environ["MOZ_UPLOAD_DIR"], "perfherder_data.json")
> 
> We should add a fallback option if this isn't being run in automation (for
> local testing), something similar to what I suggested for `_resultsDir`.
> 

I tend to leave the tests/results, so it is now like
1. memory tgz goes to self._resultsDir(tests/results)
2. archive all memory tgz into another tar file.
3. perfherder json is dumped into self._resultsDir as well.
4. perf json file and aggregated memory report tar file are moved to MOZ_UPLOAD_DIR


> ::: testing/awsy/conf/testvars.json:4
> (Diff revision 3)
> > +{
> > +  "entities": 100,
> > +  "iterations": 5,
> > +  "perTabPause": 1,
> 
> `perTabPause` and `settleWaitTime` need to be 30. The other option is to
> just comment these out (but leave them as documentation of which options can
> be set) and use the defaults.
> 

I tried some combinations and looks like we have sort of magic numbers here.
For instance, entities should be greater than 5, and we should have proper number of perTabPause and settleWaitTime to make 30 sec interval.
We can take out testvars.json in case of mis-used; or leave it for now since it's triggered only from treeherder.


> ::: testing/awsy/conf/testvars.json:6
> (Diff revision 3)
> > +{
> > +  "entities": 100,
> > +  "iterations": 5,
> > +  "perTabPause": 1,
> > +  "settleWaitTime": 1,
> > +  "download_tp5n": false
> 
> I'm not clear on what this does, is this just for testing? If so we can
> remove it.

Removed in current revision.
Flags: needinfo?(erahm)
(In reply to Paul Yang [:pyang] from comment #47)
> I tend to leave the tests/results, so it is now like
> 1. memory tgz goes to self._resultsDir(tests/results)
> 2. archive all memory tgz into another tar file.
> 3. perfherder json is dumped into self._resultsDir as well.
> 4. perf json file and aggregated memory report tar file are moved to
> MOZ_UPLOAD_DIR
> 

From an https://areweslimyet.com perspective we probably want to leave the memory reports separate (no tar file). Putting them in their own directory that gets uploaded seems fine (similar to how logs have there own directory in task cluster builds), it doesn't really matter where they are as long as they're uploaded.

> I tried some combinations and looks like we have sort of magic numbers here.
> For instance, entities should be greater than 5, and we should have proper
> number of perTabPause and settleWaitTime to make 30 sec interval.

These values could definitely use some documentation:

*entities*
The number of pages to load. We have 100 unique web pages, the default is to load all 100.

*iterations*
The number of times to repeat the web page loading portion of the test. So load the web pages and take all memory measurements, repeat |iterations| number of times.

I'm suggesting we change the default for this from 5 iterations to 3 iterations in order to cut down the run time.
 
*perTabPause*
The amount of time to wait before loading the next page. The default is 10 seconds, this gives the page a change to fully load dynamic content and in general settle down.

If you want to do a quick test run, reducing this to 1 seconds is helpful. Note: this won't give reliable data, but will allow you to iterate on code changes quickly.

*settleWaitTime*
The amount of time to wait in between memory measurements. So for example we measure memory once all pages have been loaded, then wait |settleWaitTime| seconds before measuring again (this makes sure garbage collection, cache eviction, etc) is working. The default is 30 seconds.

> We can take out testvars.json in case of mis-used; or leave it for now since
> it's triggered only from treeherder.

This file is really only useful for performing quick tests, instead we could just modify the values in __init__.py. It's fine to remove it, just make sure you update the default iterations to 3 as we've discussed (in order to reduce overall run time, but still catch memory leaks).
Flags: needinfo?(erahm)
Eric,
I saw iterations was used to create dump file
https://github.com/EricRahm/awsy-lite/blob/master/awsylite/test_memory_usage.py#L222

However it's been assigned iteration=4 in process_per_data
https://github.com/EricRahm/awsy-lite/blob/master/awsylite/process_perf_data.py#L16

I think we can use glob to parse directory with prefix memory-report and rebuild checkpoint list.
Flags: needinfo?(erahm)
Comment on attachment 8841238 [details]
Bug 1272113: Adding awsy package into m-c

Fixes including
- Usage of _resultsDir and MOZ_UPLOAD_DIR
- Get memory report file info and update CHECKPOINTS in process_perf_data.py
Attachment #8841238 - Flags: review?(erahm)
Comment on attachment 8841240 [details]
Bug 1272113: Creating awsy task and mozharness script

https://reviewboard.mozilla.org/r/115518/#review120572

::: taskcluster/ci/test/tests.yml:1230
(Diff revision 5)
> +    treeherder-symbol: tc-SY(sy)
> +    run-on-projects: ['mozilla-beta', 'mozilla-aurora', 'mozilla-central', 'mozilla-inbound', 'try']
> +    docker-image: {"in-tree": "desktop1604-test"}
> +    max-run-time: 7200
> +    instance-size: xlarge
> +    allow-software-gl-layers: false

where is the |e10s: true| line item here?

::: taskcluster/taskgraph/transforms/task.py:366
(Diff revision 5)
>      'tc-M-V': 'Mochitests on Valgrind executed by TaskCluster',
>      'tc-R': 'Reftests executed by TaskCluster',
>      'tc-R-e10s': 'Reftests executed by TaskCluster with e10s',
>      'tc-T': 'Talos performance tests executed by TaskCluster',
>      'tc-T-e10s': 'Talos performance tests executed by TaskCluster with e10s',
> +    'tc-SY': 'Are we slim yet tests by TaskCluster',

if we are only running in e10s, do we need tc-SY defined?  or are we running in both modes for a short while?

::: testing/mozharness/scripts/awsy_script.py:124
(Diff revision 5)
> +        '''
> +        dirs = self.abs_dirs
> +        env = {}
> +        error_summary_file = os.path.join(dirs['abs_blob_upload_dir'],
> +                                          'marionette_errorsummary.log')
> +        output_timeout = 6500

we have 7200 defined in the taskcluster definition, is this needed in this harness?  if so, please document it better- magic numbers == no good!
Attachment #8841240 - Flags: review?(jmaher) → review-
Comment on attachment 8841238 [details]
Bug 1272113: Adding awsy package into m-c

https://reviewboard.mozilla.org/r/115514/#review120650

::: testing/awsy/awsy/process_perf_data.py:34
(Diff revisions 3 - 5)
>      { 'name': "Heap Unclassified", 'node': "explicit/heap-unclassified" },
>      { 'name': "JS", 'node': "js-main-runtime" },
>      { 'name': "Images", 'node': "explicit/images" }
>  ]
>  
> +def get_checkpoints(checkpoint_files):

Nice, good flexible solution. Minor nit: maybe change the name to something like `update_checkpoint_paths`.

::: testing/awsy/awsy/test_memory_usage.py:86
(Diff revisions 3 - 5)
>          self.logger.info("PERFHERDER_DATA: %s" % json.dumps(perf_blob))
> -        dest_file = os.path.join(os.environ["MOZ_UPLOAD_DIR"], "perfherder_data.json")
> -        with open(dest_file, 'w') as fp:
> +
> +        report_file = shutil.make_archive(base_name="memory-report",
> +                                          format="gztar",
> +                                          root_dir=self._resultsDir,
> +                                          base_dir=".")

I'd still prefer if we uploaded the files individually -- they're already compressed and users will really only want to download one file at a time. This also makes integration into areweslimyet.com simpler.

::: testing/awsy/awsy/test_memory_usage.py:183
(Diff revisions 3 - 5)
>          except ScriptTimeoutException:
>              self.logger.error("Memory report timed out")
>          except:
>              self.logger.error("Unexpected error: %s" % sys.exc_info()[0])
>          else:
> -            self.logger.info("checkpoint created!")
> +            self.logger.info("checkpoint created, sotred in %s" %checkpoint_path)

Minor nit: 'sotred' -> 'sorted', space between '%checkpoint_path'.
Attachment #8841238 - Flags: review?(erahm) → review-
Comment on attachment 8841238 [details]
Bug 1272113: Adding awsy package into m-c

https://reviewboard.mozilla.org/r/115514/#review120650

Meant to say, this looks good. Just some really minor points. The only real issue is I'd like to have the memory reports uploaded individually rather than in an archive.
[Updated]
AWSY-lib
- Memory-report file now uploaded by original format
- Fix typo

AWSY-task
- Add e10s:true in tc's tests.yml
- Remove non-e10s row of awsy
- Move output_timeout=6500 to awsy_script's config - cmd_timeout
  - remain global task max-run-time: 7200 for possibly hang in setup/teardown
Comment on attachment 8841240 [details]
Bug 1272113: Creating awsy task and mozharness script

https://reviewboard.mozilla.org/r/115518/#review121268

now I am getting into the small details of nits, the scheduling on linux64-nightly instead of opt or pgo seems to be the most problematic.  We only build pgo for beta/aurora, so we need it scheduled on there (assuming we uplift this patch).  The other comments about e10s and remove_executables.py can be handled in a followup bug if you wish.

::: taskcluster/ci/test/test-platforms.yml:64
(Diff revisions 4 - 6)
> +        - external-media-tests
>          - web-platform-tests
>          - opt-only-tests
>          - desktop-screenshot-capture
>          - talos
>          - awsy

will awsy run on nightly or per push?  right now it is scheduled for nightly

::: taskcluster/ci/test/tests.yml:1255
(Diff revisions 4 - 6)
> -        no-read-buildbot-config: true
>          config:
>              by-test-platform:
>                  default:
>                      - awsy/linux_config.py
>                      - remove_executables.py

I am not sure if we need remove_executables.py here, I thought we added that as a way to work around buildbot scripts that still needed to run.

::: testing/mozharness/scripts/awsy_script.py:34
(Diff revisions 4 - 6)
>          [["--e10s"],
>          {"action": "store_true",
>           "dest": "e10s",
>           "default": False,
>           "help": "Run tests with multiple processes. (Desktop builds only)",
>           }]

as a note, if we are only running in e10s, do we need this cli option?  Possibly we do as this is needed to work smoothly with taskcluster.
Attachment #8841240 - Flags: review?(jmaher) → review-
In bug 1344813, I added some changes in order to support the mach command which we will have to incorporate into the patches in this bug.
Since we are downloading the tp5n.zip file from tooltool, I was thinking we could remove the TEST_SITES_TEMPLATES from the awsy module completely.

With the changes to how the webRootDir are specified, we might want to allow the user to point to an existing non-tp5n web page manifest and webRootDir. This would be especially convenient for testing different page sets locally.

pyang, jmaher: What do you think?
Flags: needinfo?(pyang)
Flags: needinfo?(jmaher)
I am all for simplifying things- and via mach/mozharness/etc. we will be using tooltool as much as we can to get tp5n.zip.
Flags: needinfo?(jmaher)
(In reply to Bob Clary [:bc:] from comment #64)
> Since we are downloading the tp5n.zip file from tooltool, I was thinking we
> could remove the TEST_SITES_TEMPLATES from the awsy module completely.
> 
> With the changes to how the webRootDir are specified, we might want to allow
> the user to point to an existing non-tp5n web page manifest and webRootDir.
> This would be especially convenient for testing different page sets locally.
> 
> pyang, jmaher: What do you think?

Agreed. We don't really need a local setting in our code.
Only thing I concerned to prepare manifest might not intuitive for newcomers; perhaps putting a default one is good but it's up to your design.
Flags: needinfo?(pyang)
Comment on attachment 8841238 [details]
Bug 1272113: Adding awsy package into m-c

Eric, I did small fixes according to comments.  Can you help to see this?
Attachment #8841238 - Flags: review?(erahm)
Comment on attachment 8841238 [details]
Bug 1272113: Adding awsy package into m-c

https://reviewboard.mozilla.org/r/115514/#review121792

Looks good! Super close to landing :)
Attachment #8841238 - Flags: review?(erahm) → review+
Flags: needinfo?(erahm)
(In reply to Joel Maher ( :jmaher) from comment #62)
> Comment on attachment 8841240 [details]
> Bug 1272113: Creating awsy task and mozharness script
> 
> https://reviewboard.mozilla.org/r/115518/#review121268
> 
> now I am getting into the small details of nits, the scheduling on
> linux64-nightly instead of opt or pgo seems to be the most problematic.  We
> only build pgo for beta/aurora, so we need it scheduled on there (assuming
> we uplift this patch).  The other comments about e10s and
> remove_executables.py can be handled in a followup bug if you wish.
> 
I added into task-set conservatively, so glad to know it's ok to add awsy in other platform once confirmed.

> ::: taskcluster/ci/test/test-platforms.yml:64
> (Diff revisions 4 - 6)
> > +        - external-media-tests
> >          - web-platform-tests
> >          - opt-only-tests
> >          - desktop-screenshot-capture
> >          - talos
> >          - awsy
> 
> will awsy run on nightly or per push?  right now it is scheduled for nightly
> 
> ::: taskcluster/ci/test/tests.yml:1255
> (Diff revisions 4 - 6)
> > -        no-read-buildbot-config: true
> >          config:
> >              by-test-platform:
> >                  default:
> >                      - awsy/linux_config.py
> >                      - remove_executables.py
> 
> I am not sure if we need remove_executables.py here, I thought we added that
> as a way to work around buildbot scripts that still needed to run.
>

https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/configs/remove_executables.py
I doubt it's relative for buildbot; it seems to avoid download of minidump_stackwalk and nodejs.
But it looks like a workaround since most of test applied this config.  Perhaps we can take it out but definitely need someone to provide more information.
 
> ::: testing/mozharness/scripts/awsy_script.py:34
> (Diff revisions 4 - 6)
> >          [["--e10s"],
> >          {"action": "store_true",
> >           "dest": "e10s",
> >           "default": False,
> >           "help": "Run tests with multiple processes. (Desktop builds only)",
> >           }]
> 
> as a note, if we are only running in e10s, do we need this cli option? 
> Possibly we do as this is needed to work smoothly with taskcluster.

Agreed this option confuses people.  We might need to work on all those options and make sure e10s is default.
(In reply to Joel Maher ( :jmaher) from comment #62) 
> ::: taskcluster/ci/test/test-platforms.yml:64
> (Diff revisions 4 - 6)
> > +        - external-media-tests
> >          - web-platform-tests
> >          - opt-only-tests
> >          - desktop-screenshot-capture
> >          - talos
> >          - awsy
> 
> will awsy run on nightly or per push?  right now it is scheduled for nightly

I just realize mercurial auto-merge my patch with https://hg.mozilla.org/mozilla-central/annotate/1bce56d3f45d/taskcluster/ci/test/test-platforms.yml#l30

hg didn't complain so I missed this issue.  Thanks for eagle eyes.
> > 
> > ::: taskcluster/ci/test/tests.yml:1255
> > (Diff revisions 4 - 6)
> > > -        no-read-buildbot-config: true
> > >          config:
> > >              by-test-platform:
> > >                  default:
> > >                      - awsy/linux_config.py
> > >                      - remove_executables.py
> > 
> > I am not sure if we need remove_executables.py here, I thought we added that
> > as a way to work around buildbot scripts that still needed to run.

We added this remove_executables.py so jobs that originally ran in buildbot could run in buildbot and taskcluster.  Since this doesn't run in buildbot, I wonder why we need this?  I don't want to just copy everything else.

> https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/configs/
> remove_executables.py
> I doubt it's relative for buildbot; it seems to avoid download of
> minidump_stackwalk and nodejs.
> But it looks like a workaround since most of test applied this config. 
> Perhaps we can take it out but definitely need someone to provide more
> information.
>  
> > ::: testing/mozharness/scripts/awsy_script.py:34
> > (Diff revisions 4 - 6)
> > >          [["--e10s"],
> > >          {"action": "store_true",
> > >           "dest": "e10s",
> > >           "default": False,
> > >           "help": "Run tests with multiple processes. (Desktop builds only)",
> > >           }]
> > 
> > as a note, if we are only running in e10s, do we need this cli option? 
> > Possibly we do as this is needed to work smoothly with taskcluster.
> 
> Agreed this option confuses people.  We might need to work on all those
> options and make sure e10s is default.

this doesn't answer my question- do we need this option or not?  Since we are running in e10s only I would believe we do not need this option- do let me know if it is needed.
Any chance we can move some of these nits to a follow up? I'd like to start getting numbers in perfherder -- we're going to need a fair amount of data to validate running in AWS is going to give us reliable numbers.

Also stand-alone awsy has been dead since January so it would be nice to have data again :)
Flags: needinfo?(jmaher)
sure, lets get this landed and see where we get... :)
Flags: needinfo?(jmaher)
Comment on attachment 8841240 [details]
Bug 1272113: Creating awsy task and mozharness script

https://reviewboard.mozilla.org/r/115518/#review122630

previous r- had nits, filed a new bug to track those nits.
Attachment #8841240 - Flags: review- → review+
Pushed by erahm@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/01772e6e073a
Adding awsy package into m-c r=erahm
https://hg.mozilla.org/integration/autoland/rev/99d61037afe8
Packaging awsy in build job r=jmaher
https://hg.mozilla.org/integration/autoland/rev/d378f8f5eab1
Creating awsy task and mozharness script r=jmaher
See Also: → 1382725
Component: General → AWSY
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: