Closed Bug 1078619 Opened 10 years ago Closed 9 years ago

Allow to run talos jobs as a developer

Categories

(Release Engineering :: Applications: MozharnessCore, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: armenzg, Assigned: s244sing)

References

Details

(Whiteboard: [good next bug][easier-mozharness])

Attachments

(2 files, 2 obsolete files)

Hello!
Thanks for looking into helping the Mozilla project. Your help is very much appreciated and essential to the sustainability of our support to the Mozilla mission. [1]

To contribute to this issue, follow the "setup" instructions here:
https://wiki.mozilla.org/Auto-tools/Projects/Mozharness#Setup

WARNING: This bug is a bit more involved than a first mozharness bug.

= The issue =
For Firefox, we run performance tests to ensure that we perform properly. This job is called "talos". [2]

We use mozharness to run it; however, we cannot currently run it on our local machines.

See [3] on how we should be able to run talos locally.

The current problem is that the configs contain hard-coded paths specific to the release engineering machines, so local runs fail with errors like:
 OSError: [Errno 13] Permission denied: '/home/cltbld'

Some of these hard-coded values live in the talos config [4], for instance:
 PYTHON = '/tools/buildbot/bin/python'
 VENV_PATH = '/home/cltbld/talos-slave/test/build/venv'
 "webroot": '/builds/slave/talos-slave/talos-data',

I assume that we need to make the configs a bit less releng-specific, and perhaps add something to developer_config.py.

We can see in the configs for normal desktop unit tests [5] how we define the virtualenv path:
'exes': {
    'virtualenv': ['/tools/buildbot/bin/python', '/tools/misc-python/virtualenv.py'],
    ...
}

This value then gets overwritten by the value in developer_config.py [6]:
'exes': {},
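
The override described above can be sketched as follows. This is only an illustration, assuming that config files passed with --cfg are applied in order as a plain top-level dictionary update; `merge_configs` is a hypothetical helper and not mozharness code.

```python
def merge_configs(*configs):
    """Apply config dicts left to right; later top-level keys replace earlier ones."""
    merged = {}
    for cfg in configs:
        # Shallow update (an assumption): a later 'exes' replaces the earlier one entirely
        merged.update(cfg)
    return merged


# Values shaped like the ones quoted in this bug
linux_unittest = {
    "exes": {
        "virtualenv": ["/tools/buildbot/bin/python", "/tools/misc-python/virtualenv.py"],
    },
}
developer = {"exes": {}}

final = merge_configs(linux_unittest, developer)
print(final["exes"])
```

Under that assumption, the empty 'exes' from developer_config.py wins, so none of the releng-specific executables survive.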

= Get help =
If you need help or guidance feel free to write a comment in this bug.
You can also chat with us by visiting our IRC channel https://chat.mibbit.com/?url=irc%3A%2F%2Firc.mozilla.org%2F%23ateam

To know more about mozharness read:
https://wiki.mozilla.org/Auto-tools/Projects/Mozharness

To know more about IRC you can watch:
http://codefirefox.com/video/irc

[1] https://www.mozilla.org/en-US/about/manifesto/
[2] https://tbpl.mozilla.org/?jobname=talos
[3] python scripts/talos_script.py --suite chromez --add-option --webServer,localhost --branch-name Firefox-Non-PGO --system-bits 64 --cfg talos/linux_config.py --download-symbols ondemand --use-talos-json --blob-upload-branch Firefox-Non-PGO --cfg developer_config.py --installer-url http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/firefox-35.0a1.en-US.linux-x86_64.tar.bz2
[4] http://hg.mozilla.org/build/mozharness/file/default/configs/talos/linux_config.py
[5] http://hg.mozilla.org/build/mozharness/file/default/configs/unittests/linux_unittest.py#l22
[6] http://hg.mozilla.org/build/mozharness/file/default/configs/developer_config.py#l12
Blocks: 1078638
This patch comes from a personal attempt; however, I had to give it up due to the urgency of my current work.
Hmm... just ran a test with the following line:

python scripts/talos_script.py --suite chromez --add-option --webServer,localhost --branch-name Firefox-Non-PGO --system-bits 64 --cfg talos/linux_config.py --download-symbols ondemand --use-talos-json --blob-upload-branch Firefox-Non-PGO --cfg developer_config.py --installer-url http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/firefox-36.0a1.en-US.linux-x86_64.tar.bz2

And observed this log: http://pastebin.mozilla.org/7419821

So I'm guessing it was correctly reproduced, correct? If so, is there a standard that I should follow before I go on to write the overrides in the developer_config.py file?
Flags: needinfo?(armenzg)
Assignee: nobody → s244sing
You're on the right track.
Feel free to start adding overrides to developer_config.py. We can then investigate further once you have a working patch.

For that issue I suggest adding a "webroot" override; see:
configs/talos/mac_config.py:    "webroot": '%s/../talos-data' % os.getcwd(),
configs/talos/windows_config.py:    "webroot": 'c:/slave/talos-data',
configs/talos/linux_config.py:    "webroot": '/builds/slave/talos-slave/talos-data',
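
A "webroot" override for developers could look something like the sketch below. It borrows the os.getcwd() pattern from mac_config.py above; the helper name `developer_webroot` is hypothetical, and whether a directory next to the current working directory is the right place is an assumption.

```python
import os


def developer_webroot(base_dir=None):
    """Return a talos-data path next to base_dir (defaults to the current directory)."""
    base_dir = base_dir or os.getcwd()
    # Same idea as '%s/../talos-data' % os.getcwd() in mac_config.py, normalised
    return os.path.normpath(os.path.join(base_dir, "..", "talos-data"))


# Hypothetical override entry for developer_config.py
config = {
    "webroot": developer_webroot(),
}
print(config["webroot"])
```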
Flags: needinfo?(armenzg)
So I've been trying to run a modified developer_config.py that I'm working on. So far I've been able to get the right python and virtualenv paths from the host system. But I'm running into this issue [1]

I'm not making a review as the script still isn't correct. But here is the current script that I have [2]

Not too sure what's causing the permission denied messages to pop up. Any hints?

[1]: http://pastebin.mozilla.org/7495947
[2]: http://pastebin.mozilla.org/7496230
Flags: needinfo?(armenzg)
Well, that does make sense now that I have checked the permission bits on those virtualenv.py files. Here is the output for the two instances of virtualenv.py that I have on my machine. [1]

[1]: http://pastebin.mozilla.org/7496516
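
For anyone hitting similar "Permission denied" messages, a small check like the one below shows the permission bits of a file and whether the current user can read or execute it. It is a generic sketch using only the standard library (POSIX-style permissions assumed); `describe_permissions` is a hypothetical helper, and the demo runs on a throwaway file rather than on a real virtualenv.py.

```python
import os
import stat
import tempfile


def describe_permissions(path):
    """Report the mode string and effective access rights for the current user."""
    st = os.stat(path)
    return {
        "mode": stat.filemode(st.st_mode),
        "readable": os.access(path, os.R_OK),
        "executable": os.access(path, os.X_OK),
    }


# Demo on a temporary file with rw-r--r-- permissions
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name
os.chmod(demo_path, 0o644)
info = describe_permissions(demo_path)
print(info)
os.remove(demo_path)
```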
Attached patch talos.diff
Feel free to attach your diff (hg diff > your_patch.diff) to the bug and use the "feedback" flag on the attachment instead of the "review" flag.

How come you set the exes? I believe your system should discover them.
     "exes": {
         'python': PYTHON,
         'virtualenv': [VENV_SCRIPT_PATH],
     },

I've run it locally and it seems we get quite far.
For the record, you need to close Firefox to be able to run the script.
I think we just need to tweak a path or two to make it work as expected.

After you run it once and it fails, you can then look for the following command and try to run it by itself adjusting the parameters until it does what you would expect it to do.
16:49:56     INFO - Calling ['/home/armenzg/repos/mozharness/venv/bin/talos', '--noisy', '--debug', '-v', '--executablePath', '/home/armenzg/repos/mozharness/build/application/firefox/firefox', '--title', 'armenzg-thinkpad', '--symbolsPath', 'https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/firefox-36.0a1.en-US.linux-x86_64.crashreporter-symbols.zip', '--activeTests', 'tresize:tcanvasmark', '--results_url', 'http://graphs.mozilla.org/server/collect.cgi', '--output', 'talos.yml', '--branchName', 'Firefox-Non-PGO', '--datazilla-url', 'https://datazilla.mozilla.org/talos', '--authfile', '/home/armenzg/repos/mozharness/oauth.txt', '--webServer', 'localhost'] with output_timeout 3600

Before running that command you should activate the venv first with:
"source build/venv/bin/activate"
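
To rerun that command by hand, the logged argument list first has to be turned into a shell command line. A rough sketch of that step is below, assuming the log line has the "Calling [...] with output_timeout" shape shown above; `command_from_log_line` is a hypothetical helper, and the sample path is a placeholder.

```python
import ast
import re
import shlex

# Matches the Python list printed between "Calling" and "with output_timeout"
CALLING_RE = re.compile(r"Calling (\[.*\]) with output_timeout")


def command_from_log_line(line):
    """Return a shell-quoted command string, or None if the line does not match."""
    match = CALLING_RE.search(line)
    if match is None:
        return None
    args = ast.literal_eval(match.group(1))  # safely parse the printed list
    return " ".join(shlex.quote(str(arg)) for arg in args)


sample = (
    "16:49:56     INFO - Calling ['/path/to/venv/bin/talos', '--noisy', "
    "'--activeTests', 'tresize:tcanvasmark', '--webServer', 'localhost'] "
    "with output_timeout 3600"
)
print(command_from_log_line(sample))
```

The resulting string can be pasted into a shell and its parameters adjusted one at a time.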
Attachment #8504999 - Attachment is obsolete: true
Flags: needinfo?(armenzg)
(In reply to Armen Zambrano - Automation & Tools Engineer (:armenzg) from comment #6)
> Created attachment 8527935 [details] [diff] [review]
> talos.diff
> 
> Feel free to attach your diff (hg diff > your_patch.diff) to the bug and use
> the "feedback" flag on the attachment instead of the "review" flag.
> 
> How come do you set the exes? I believe your system should discover them.
>      "exes": {
>          'python': PYTHON,
>          'virtualenv': [VENV_SCRIPT_PATH],
>      },
> 

Yes, there could be several reasons why I had to set them manually: 1) my virtualenv wasn't set up correctly, or 2) virtualenv has a bug in the Ubuntu package, which I found out about from this link: [1]. I will take care of this.


> I've run it locally and it seems we get far.
> For the note, you need to close Firefox to be able to run the script.
> I think we just need to tweak a path or two to make it work as expected.
> 

Ok, with the new script I have locally (will attach for feedback soon), is the test classified as a "pass" if upon running the command I am able to launch firefox?


> After you run it once and it fails, you can then look for the following
> command and try to run it by itself adjusting the parameters until it does
> what you would expect it to do.
> 16:49:56     INFO - Calling
> ['/home/armenzg/repos/mozharness/venv/bin/talos', '--noisy', '--debug',
> '-v', '--executablePath',
> '/home/armenzg/repos/mozharness/build/application/firefox/firefox',
> '--title', 'armenzg-thinkpad', '--symbolsPath',
> 'https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/
> firefox-36.0a1.en-US.linux-x86_64.crashreporter-symbols.zip',
> '--activeTests', 'tresize:tcanvasmark', '--results_url',
> 'http://graphs.mozilla.org/server/collect.cgi', '--output', 'talos.yml',
> '--branchName', 'Firefox-Non-PGO', '--datazilla-url',
> 'https://datazilla.mozilla.org/talos', '--authfile',
> '/home/armenzg/repos/mozharness/oauth.txt', '--webServer', 'localhost'] with
> output_timeout 3600
> 
> Before running that command you should activate the venv first with:
> "source build/venv/bin/activate"

Oops, forgot to activate it before testing. Will take care of this in the next test run.


[1]: https://bugs.launchpad.net/ubuntu/+source/python2.7/+bug/1115466
Flags: needinfo?(armenzg)
This is after fixing my virtualenv installation and also considering your comments from last time.

I'm able to successfully launch firefox after this test. What else should we be looking into adding in terms of overrides into developer_config.py?
Attachment #8530680 - Flags: feedback?(armenzg)
Hi Simarpreet,
I will try the code again this week.
From your comment it seems that it worked for you; I wonder why in my local run I didn't declare victory.
I will try again this week.

Note that I will be flying and attending a week-long conference, so I won't be able to get back to you promptly.

(In reply to Simarpreet Singh from comment #7)
> (In reply to Armen Zambrano - Automation & Tools Engineer (:armenzg) from
> comment #6)
> 
> > I've run it locally and it seems we get far.
> > For the note, you need to close Firefox to be able to run the script.
> > I think we just need to tweak a path or two to make it work as expected.
> > 
> 
> Ok, with the new script I have locally (will attach for feedback soon), is
> the test classified as a "pass" if upon running the command I am able to
> launch firefox?
> 
You should see a lot of pages being loaded up consecutively.
I can verify this once I run the code.
Flags: needinfo?(armenzg)
I will be looking at this today. My apologies for the delay. It was difficult last week with all the flying and work week.
Simarpreet,
I have been talking with other developers and we believe that spending time on fixing this is _not_ the long-term way of fixing things. However, I believe you can still gain a lot of skills if you're willing to stick with this bug. I'm already learning a bunch just by trying to think it through!

If you're interested in the longer-term fix, let me know and I can file a bug for it. Be aware, however, that it will involve a lot of discovery and mutual learning, so we might have to redo some work at times.

Please let me know if anything in here is not clear.

#######################
## Back to the bug

From my local attempt, I can't seem to be able to make this run as expected.
Yes, I see a browser startup, however, we should see it load and close multiple tabs with various content.
Currently, it only loads a browser up trying to load localhost/getInfo.html.

What I have noticed is that we're not running a webserver.
The production systems [1] have an always-running webserver; our local machines do not.

A way to make this run is like this:
* python scripts/talos_script.py --suite chromez --add-option --webServer,localhost --branch-name Firefox-Non-PGO --system-bits 64 --cfg talos/linux_config.py --download-symbols ondemand --use-talos-json --blob-upload-branch Firefox-Non-PGO --cfg developer_config.py --installer-url http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/firefox-37.0a1.en-US.linux-x86_64.tar.bz2

After it fails, you can do this:
* source build/venv/bin/activate

NOTE: The following command will show up in the output of the talos_script.py run (or check under logs/talos_info.log)
* $PWD/build/venv/bin/talos --noisy --debug -v --executablePath $PWD/build/application/firefox/firefox --title armenzg-thinkpad    --symbolsPath https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/firefox-37.0a1.en-US.linux-x86_64.crashreporter-symbols.zip --activeTests tresize:tcanvasmark --develop

The --develop flag starts up a webserver for us.

Setting "python_webserver" to True [2] in developer_config does not seem to be enough.
It is probably because we pass --webServer and we should not:
http://hg.mozilla.org/build/talos/file/f3179facd945/talos/PerfConfigurator.py#l497
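
The change in options described here (stop passing --webServer, pass --develop instead) can be illustrated with a small transformation on the argument list. This is purely a sketch: `localize_talos_options` is a hypothetical helper, not code from mozharness or talos, and it ignores the other production-only options.

```python
def localize_talos_options(options, python_webserver=True):
    """Return a copy of options without --webServer <value>, optionally adding --develop."""
    result = []
    skip_next = False
    for opt in options:
        if skip_next:
            # Drop the value that followed --webServer
            skip_next = False
            continue
        if opt == "--webServer":
            skip_next = True
            continue
        result.append(opt)
    if python_webserver and "--develop" not in result:
        result.append("--develop")
    return result


current = ["talos", "--noisy", "--activeTests", "tresize:tcanvasmark",
           "--webServer", "localhost"]
print(localize_talos_options(current))
```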

Simarpreet, would you mind tinkering with this until the --develop flag is enabled?

This is what currently runs instead:
* $PWD/build/venv/bin/talos --noisy --debug -v --executablePath $PWD/build/application/firefox/firefox --title armenzg-thinkpad    --symbolsPath https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/firefox-37.0a1.en-US.linux-x86_64.crashreporter-symbols.zip --activeTests tresize:tcanvasmark --results_url http://graphs.mozilla.org/server/collect.cgi --output talos.yml --branchName Firefox-Non-PGO --datazilla-url https://datazilla.mozilla.org/talos --authfile /home/armenzg/repos/mozharness/oauth.txt --webServer localhost


[1] https://tbpl.mozilla.org/?jobname=Ubuntu%20HW%2012.04%20x64%20mozilla-central%20talos%20chromez
[2] http://dxr.mozilla.org/build:mozharness/source/mozharness/mozilla/testing/talos.py#426
Comment on attachment 8530680 [details] [diff] [review]
[PATCH][v1]: Adding paths for webroot and venv on Linux.

Review of attachment 8530680 [details] [diff] [review]:
-----------------------------------------------------------------

Removing the request for feedback as I already replied in my previous comment.
Attachment #8530680 - Flags: feedback?(armenzg)
Hey Armen,

Sorry about the delayed responses, I've been busy with school work and exams lately.

I understand the points you've mentioned with regards to the approach and value of a fix coming out of this bug. I'm fine either way, for me this is just trying to learn a new infrastructure and a code base. I would however be more than happy to work on this if this is something that is really important in the long run and could be worked over a longer period of time (compared to being a good first or a good next bug category).

I will look into the points and hints you've raised in your comment and will try to reproduce the same, unfortunately only after next week. I appreciate your help.
Hello Armen,

I worked on this today for a bit, I've listed my findings (inline) below:

(In reply to Armen Zambrano - Automation & Tools Engineer (:armenzg) from comment #11)
> Simarpreet,
> I have been talking with other developers and we believe that spending time
> on fixing this is _not_ the long term way of fixing things, however, I
> believe you can still gain a lot of skills if you're willing to stick around
> with this bug. I'm already a learning a bunch by just trying to think it
> through it!
> 
> If you're interested on the longer term fix let me know and we I can file a
> bug for it, however, be aware that it will be a lot of discovery and mutual
> learning so we might have to redo some work at times.
> 
> Please let me know if anything in here is not clear.
> 

Okay, I think I have an idea of how the code flow is structured right now. What essentially we want to mimic is the various tests that are run on the production servers, locally, using a locally installed webserver and running the various scripts that are used to test.

The command that we are trying to run handles it this way: it invokes the talos_script.py script with a bunch of options and also specifies the config files to use. We base it on the talos/linux_config.py file but override it with the developer_config.py file, which has manual overrides for various params that are needed on a local test host (rather than a production server).

The script in turn invokes the talos binary (after activating the venv, of course) and does the actual execution of the tests.

Please correct me if I'm incorrect on any of the above.

> #######################
> ## Back to the bug
> 

I've listed my responses below:

> From my local attempt, I can't seem to be able to make this run as expected.
> Yes, I see a browser startup, however, we should see it load and close
> multiple tabs with various content.
> Currently, it only loads a browser up trying to load localhost/getInfo.html.
> 

Exactly. This is what I had when I last worked on it.

> What I have noticed is that we're not running a webserver.
> In the production systems [1], there is an always running webserver,
> however, in our local machines we don't.
> 
> A way to make this run is like this:
> * python scripts/talos_script.py --suite chromez --add-option
> --webServer,localhost --branch-name Firefox-Non-PGO --system-bits 64 --cfg
> talos/linux_config.py --download-symbols ondemand --use-talos-json
> --blob-upload-branch Firefox-Non-PGO --cfg developer_config.py
> --installer-url
> http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/firefox-
> 37.0a1.en-US.linux-x86_64.tar.bz2
> 

Right. This is the command we actually want to run when we want to test out on a local dev machine as an external developer, correct? (In short: This is the test case for this bug, correct?)


> After it fails, you can do this:
> * source build/venv/bin/activate
> 
> NOTE: The following command will show up in the output of the
> talos_script.py run (or check under logs/talos_info.log)
> * $PWD/build/venv/bin/talos --noisy --debug -v --executablePath
> $PWD/build/application/firefox/firefox --title armenzg-thinkpad   
> --symbolsPath
> https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/
> firefox-37.0a1.en-US.linux-x86_64.crashreporter-symbols.zip --activeTests
> tresize:tcanvasmark --develop
> 

This is the manual way to invoke all the tests we want, correct? (In short: This is the desired result)

> The --develop flag starts up a webserver for us.
> 
> Adding "python_webserver" to True [2] to developer_config does not seem to
> be enough.
> It is probably because we pass --webserver and we should not:

Yes, I removed the --webServer,localhost option from my runs to test the new change.

> http://hg.mozilla.org/build/talos/file/f3179facd945/talos/PerfConfigurator.
> py#l497
> 

I see. So as long as the developer specifies in their command to run the option of (--cfg developer_config.py), we should be taking care of everything within the scripts, correct? i.e. setting up the various options like (--add-option --webserver,localhost).

> Simarpreet, would you mind trying to tinker with this until the --developer
> flag is enabled?
> 

Yes, I have a diff that currently is able to mimic (via executing the talos_script.py script) the behaviour of a running a local webserver and running the various tests that are part of the --activeTests option. I will submit it soon.

> This is what it currently run instead:
> * $PWD/build/venv/bin/talos --noisy --debug -v --executablePath
> $PWD/build/application/firefox/firefox --title armenzg-thinkpad   
> --symbolsPath
> https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/
> firefox-37.0a1.en-US.linux-x86_64.crashreporter-symbols.zip --activeTests
> tresize:tcanvasmark --results_url
> http://graphs.mozilla.org/server/collect.cgi --output talos.yml --branchName
> Firefox-Non-PGO --datazilla-url https://datazilla.mozilla.org/talos
> --authfile /home/armenzg/repos/mozharness/oauth.txt --webServer localhost
> 

This is how my output looks from the latest runs (no --webServer flag and python_webserver set to True in developer_config.py):

http://pastebin.mozilla.org/8091769

> 
> [1]
> https://tbpl.mozilla.org/?jobname=Ubuntu%20HW%2012.04%20x64%20mozilla-
> central%20talos%20chromez
> [2]
> http://dxr.mozilla.org/build:mozharness/source/mozharness/mozilla/testing/
> talos.py#426


The output I have seems to be in line with what I get from manually running the talos binary. So do you think the --develop flag is now working?
Flags: needinfo?(armenzg)
For now I'm just adding the python_webserver flag here. To see it in action, run the following test line:

$ python scripts/talos_script.py --suite chromez --branch-name Firefox-Non-PGO --system-bits 64 --cfg talos/linux_config.py --download-symbols ondemand --use-talos-json --blob-upload-branch Firefox-Non-PGO --cfg developer_config.py --installer-url http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/firefox-37.0a1.en-US.linux-x86_64.tar.bz2

No need to add --add-option --webServer,localhost anymore.
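
For readers without access to the attachment, the override discussed in this bug amounts to something like the fragment below. It is a sketch based only on what is said in the comments, assuming the usual mozharness layout of a module-level `config` dictionary; the real developer_config.py contains other keys that are not shown here.

```python
# Sketch of a developer_config.py fragment (not the actual attachment)
config = {
    # The option added in this bug, so that no external webserver is needed locally
    "python_webserver": True,
    # Quoted in the bug description: an empty mapping replaces the releng executables
    "exes": {},
}
```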
Attachment #8530680 - Attachment is obsolete: true
Attachment #8539734 - Flags: feedback?(armenzg)
I will be looking at this in the next hour or so.
Comment on attachment 8539734 [details]
[WIP] Adding python_webserver option to developer_config.py

Thank you for a very thorough comment. Everything you said shows that you understand the business need and are making it easy for me to see if you're getting it. Keep it up.

Yes, the patch does what we need.
I will be landing it for you.
Flags: needinfo?(armenzg)
Attachment #8539734 - Flags: review+
Attachment #8539734 - Flags: feedback?(armenzg)
Attachment #8539734 - Flags: feedback+
Thank you Simarpreet!
http://armenzg.blogspot.ca/2014/12/run-mozharness-talos-as-developer.html

Let me know if there are other projects you would like to look into.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
mozharness has been merged to production. Patches are live :)