Closed Bug 673131 Opened 13 years ago Closed 12 years ago

when minor talos changes land, the a-team should be able to deploy with minimal releng time required

Categories

(Release Engineering :: General, defect, P2)

x86
Linux
defect

Tracking

(firefox10 wontfix, firefox11 wontfix, firefox12 fixed, firefox-esr10 wontfix)

RESOLVED FIXED

People

(Reporter: jmaher, Assigned: armenzg)

Details

(Whiteboard: [talos][qa-])

Attachments

(7 files, 5 obsolete files)

2.83 KB, patch (coop: review+, jmaher: review+, armenzg: checked-in+)
5.63 KB, patch (coop: review+, armenzg: checked-in+)
512 bytes, patch (armenzg: review+)
2.32 KB, patch (coop: review+, armenzg: checked-in+)
4.45 KB, patch (coop: review+, armenzg: checked-in-)
2.19 KB, patch (coop: review+, armenzg: checked-in+)
2.46 KB, patch (coop: review+, armenzg: checked-in+)

Many of the changes we make to talos do not change tests or performance numbers; they are bug fixes and stability changes to the tools.

It would be nice if we could have some method where we can check into m-c a version number (hg revision, link to talos.zip, etc...) so it will be updated.

there are a lot of issues to discuss:
 - how to roll this out to all branches (bootstrapping)
 - updating the buildbot scripts
 - how to define what we are pulling
I will follow up with a blog post explaining all the pieces and the changes to the workflow.
Assignee: nobody → armenzg
Status: NEW → ASSIGNED
Priority: -- → P4
Whiteboard: [talos]
As a note, whatever system that we come up with we should try and do the same for the pageloader bundle - it is going through the same deployment woes that talos is.
To better define a 'minor' talos change:

- something that does not affect currently collected talos numbers
- something that does not add a new test

Basically, just a bug fix or feature addition to the talos code base that has passed ateam staging and is considered to be low risk of causing any alteration to the current testing infrastructure (other than making it better, of course).
can we start putting some details down about how to work around this?  It would be nice to have a game plan figured out even if we are not ready to execute it.
I was thinking about this last night and there are several parts that we would like to control from m-c.

As an example of a similar system we have jetpack which has a file checked-in on m-c:
http://mxr.mozilla.org/mozilla-central/source/testing/jetpack/jetpack-location.txt

I believe the right way to go is to have a json file that indicates the values of the following:
TALOS_TP_OPTS = {'plugins': 'zips/plugins.zip', 'pagesets': ['zips/tp5.zip']}
TALOS_TP4_OPTS = {'plugins': 'zips/plugins.zip', 'pagesets': ['zips/tp4.zip',]}
TALOS_ADDON_OPTS = {'addonTester' : True, 'releaseTester' : True, 'plugins': 'zips/plugins.zip'}
TALOS_BASELINE_ADDON_OPTS = {'releaseTester' : True, 'plugins': 'zips/plugins.zip'}

This file would be read on the fly at the beginning of a talos job and download the required files.
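
As a rough sketch of the idea above (Python 3, not the eventual implementation): the option names and values are the ones listed, but storing them together in one JSON object and the `required_files` helper are assumptions made only for illustration.

```python
import json

# Hypothetical manifest: the option names and values come from the comment
# above; keeping them together in one JSON object is an assumption.
MANIFEST_TEXT = """
{
    "TALOS_TP_OPTS": {"plugins": "zips/plugins.zip", "pagesets": ["zips/tp5.zip"]},
    "TALOS_TP4_OPTS": {"plugins": "zips/plugins.zip", "pagesets": ["zips/tp4.zip"]},
    "TALOS_ADDON_OPTS": {"addonTester": true, "releaseTester": true, "plugins": "zips/plugins.zip"},
    "TALOS_BASELINE_ADDON_OPTS": {"releaseTester": true, "plugins": "zips/plugins.zip"}
}
"""


def required_files(manifest, suite):
    """Return the sorted list of bundle paths that a given suite needs."""
    opts = manifest[suite]
    files = set(opts.get("pagesets", []))
    if "plugins" in opts:
        files.add(opts["plugins"])
    return sorted(files)


manifest = json.loads(MANIFEST_TEXT)
for suite in sorted(manifest):
    print(suite, required_files(manifest, suite))
```

A job would then only need to know its suite name to work out which bundles to fetch.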

This adds value in the following ways:
* a talos landing is visible to every one
* a performance variance can be reported by the regression emails
* it can be backed out without releng involvement
* you can use the try server to test new bundles and even use the compare-talos tool
* every push has a matching set of bundles that do not vary after the push (right now if we switch anything some test jobs for a changeset can grab the newer files)

For minor changes you would probably want to land the change with the comment DONTBUILD.

This system would improve both minor and major changes by not needing downtimes (hopefully) for the major changes (as the push would be blamed for regressions)

For now I would prefer to keep all the bundles on build.m.o by asking releng to pull them in. We should ensure that the json file (or whatever we use) follows integrity rules and does not allow grabbing files from places other than build.m.o (perhaps in the future it would make sense to be more flexible).
So this makes a lot of sense, except things like tp5 and plugins.zip are highly likely to cause regressions or need rebaselining (maybe even a reconfig to absorb config.py changes).  likewise with pageloader.xpi

Most of the changes we make are minor changes to talos.zip (addons, xperf, remote) which do not affect talos proper or the numbers.  

Also when releng does a downtime would the process be the same?  I.e. we can look at a hg revision history and track what version we were using.

So I am envisioning a talos-locations.txt:
http://hg.mozilla.org/projects/talos/archive/31415926535/talos.zip
http://hg.mozilla.org/projects/talos/archive/31415926535/tp5.zip
http://hg.mozilla.org/projects/talos/archive/31415926535/plugins.zip
http://hg.mozilla.org/projects/talos/archive/31415926535/dirtyprofiles.zip
http://hg.mozilla.org/projects/talos/archive/31415926535/pageloader.xpi

So the work would need to be done in buildbot to accommodate these referenced .zip files instead of being hardcoded.  For talos, can we access http://mxr.mozilla.org/mozilla-central/source/testing/talos/talos-locations.txt?  We don't download a tests.zip file or anything, so I want to make sure we don't overlook that small detail.
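
To make the talos-locations.txt idea concrete, here is a minimal Python 3 sketch of how a job could turn such a file into a name-to-URL mapping. The `parse_locations` helper and the handling of `#` comment lines are assumptions; the URLs are the placeholder ones from above.

```python
import posixpath
from urllib.parse import urlsplit

# Example content copied from the comment above; the revision is a placeholder.
LOCATIONS_TEXT = """\
http://hg.mozilla.org/projects/talos/archive/31415926535/talos.zip
http://hg.mozilla.org/projects/talos/archive/31415926535/tp5.zip
http://hg.mozilla.org/projects/talos/archive/31415926535/plugins.zip
http://hg.mozilla.org/projects/talos/archive/31415926535/dirtyprofiles.zip
http://hg.mozilla.org/projects/talos/archive/31415926535/pageloader.xpi
"""


def parse_locations(text):
    """Map each bundle's file name to the URL it should be fetched from."""
    locations = {}
    for line in text.splitlines():
        url = line.strip()
        if not url or url.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        name = posixpath.basename(urlsplit(url).path)
        locations[name] = url
    return locations


locations = parse_locations(LOCATIONS_TEXT)
print(locations["talos.zip"])
```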
(In reply to Joel Maher (:jmaher) from comment #6)
> 
> Also when releng does a downtime would the process be the same?  I.e. we can
> look at a hg revision history and track what version we were using.
I believe we could actually not have downtimes but we will see once we get there.
 
> So the work would need to be done in buildbot to accommodate these
> referenced .zip files instead of being hardcoded.  For talos, can we access
> http://mxr.mozilla.org/mozilla-central/source/testing/talos/talos-locations.
> txt?  We don't download a tests.zip file or anything, so I want to make sure
> we don't overlook that small detail.

We could access through hg and raw:
http://hg.mozilla.org/mozilla-central/raw-file/%(revision)s/testing/talos/talos-locations.
Priority: P4 → P3
Depends on: 701484
I will be tackling this either this week or next.
Priority: P3 → P2
Attached patch [poc] talos_from_code.py (obsolete) — Splinter Review
Feedback wanted:
* is testing/talos/talos.json a good location for the json file?
* is talos_from_code.py a good name for the script?
* in the future we will rewrite talos on mozharness but for now where should I put the script? should I land it on mozharness?

#####
This script downloads a talos.json file [1] which indicates to grab the talos.zip from a given location [2] (it can be from anywhere at this point).

To use this script we will need to:
* retrieve the script to the talos machine (I don't care where)
* ask TalosFactory to give us the tree's revision
* run the script by passing [3]
** this will download the talos.zip specified in the json file

Needed:
* accept calling the script with the URL to the json file
* hack TalosFactory to download and run the script 

Optional:
* add progress for downloading it
* optimize to not download the same talos.zip twice
* extend the script to download all artifacts for a given talos job (pageloader.xpi, plugins, etc)

[1] http://people.mozilla.com/~armenzg/talos/talos.json
[2] http://hg.mozilla.org/build/talos/archive/5ae66b11330f.zip
[3] http://hg.mozilla.org/mozilla-central/raw-file/c120734d20ba/testing/talos/talos.json
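
For illustration, the URL in [3] could be built per job from the repository and revision along these lines. This is a sketch only: the template follows the shape of [3], and passing the repository path and revision as plain strings is an assumption about what TalosFactory would supply.

```python
# Template following the shape of reference [3]. How the factory hands over
# the repository path and revision is an assumption of this sketch.
TALOS_JSON_TEMPLATE = (
    "http://hg.mozilla.org/%(repo_path)s/raw-file/%(revision)s"
    "/testing/talos/talos.json"
)


def talos_json_url(repo_path, revision):
    """Build the URL of talos.json as it existed at a given revision."""
    return TALOS_JSON_TEMPLATE % {"repo_path": repo_path, "revision": revision}


print(talos_json_url("mozilla-central", "c120734d20ba"))
```

Because the revision is part of the URL, every push resolves to exactly one talos.json, which is what makes a talos change attributable to a changeset.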
so if I put a talos.zip on my people account I see there being problems with every talos machine downloading talos.zip from there.

otherwise I like this, it will be much easier to change things around then.

as for mozharness, I know we have a script for remote and desktop versions of talos ready to go.  It needs some more love and deployment when there is bandwidth.  What I don't know is if the mozharness stuff will allow dynamic updating of a talos.zip.
(In reply to Joel Maher (:jmaher) from comment #10)
> as for mozharness, I know we have a script for remote and desktop versions
> of talos ready to go.  It needs some more love and deployment when there is
> bandwidth.  What I don't know is if the mozharness stuff will allow dynamic
> updating of a talos.zip.

Talos on mozharness should be able to grab either a talos zip or a specified revision from source when it's done.  We should be able to specify which talos zip, have that zip be different across branches if desired, and we can edit the config files on the fly to point at a different location if, say, people.m.o goes down.

However, re: the other bit, I do not like the idea of people.m.o being a critical piece of infrastructure.  It's not guaranteed to be up 24/7.  If we serve the talos zip off there for all talos jobs, all talos jobs will go red when people.m.o goes down.
(In reply to Joel Maher (:jmaher) from comment #10)
> so if I put a talos.zip on my people account I see there being problems with
> every talos machine downloading talos.zip from there.
> 
On the try server I assume it would be good to allow it to happen.
I assume people would use reviews to land this on the development repos.

Would it make sense to modify the script to stop the job if something different than http://build.m.o/talos/zips is used for non-try repos? We could add a cgi to upload official talos without releng's involvement.
this is all sounding good.  Try server for anything; otherwise build.m.o via upload script for production.
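
The policy agreed on here could look roughly like the sketch below. The names `OFFICIAL_PREFIX`, `UNRESTRICTED_BRANCHES` and `is_allowed` are hypothetical, the host is written out as build.mozilla.org, and treating "try" as the only unrestricted branch is an assumption.

```python
# Prefix based on the location named in the comments above. Treating "try"
# as the only unrestricted branch is an assumption for this sketch.
OFFICIAL_PREFIX = "http://build.mozilla.org/talos/zips/"
UNRESTRICTED_BRANCHES = {"try"}


def is_allowed(talos_zip_url, branch):
    """Return True if this branch may download talos.zip from this URL."""
    if branch in UNRESTRICTED_BRANCHES:
        return True
    return talos_zip_url.startswith(OFFICIAL_PREFIX)


print(is_allowed("http://localhost/~armenzg/talos.zip", "try"))
print(is_allowed("http://localhost/~armenzg/talos.zip", "mozilla-central"))
print(is_allowed("http://build.mozilla.org/talos/zips/talos.zip", "mozilla-central"))
```

The trailing slash in the prefix matters: without it, a sibling directory whose name merely starts with "zips" would also pass the check.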
Attached patch talos_from_code.py - v0.1 (obsolete) — Splinter Review
What do you think so far?

##############################
Armens-MacBook-Air:talos armenzg$ python talos_from_code.py --talos_json_url http://localhost/~armenzg/talos.json --branch try; echo $?
INFO: talos.json URL:  http://localhost/~armenzg/talos.json
INFO: talos.zip URL: 'http://localhost/~armenzg/talos.zip'
0
Armens-MacBook-Air:talos armenzg$ python talos_from_code.py --talos_json_url http://localhost/~armenzg/talos.json --branch wrong-branch; echo $?
INFO: talos.json URL:  http://localhost/~armenzg/talos.json
INFO: talos.zip URL: 'http://localhost/~armenzg/talos.zip'
ERROR: You have tried to download a talos.zip from a location different thanhttp://build.mozilla.org/talos/zips
1
Armens-MacBook-Air:talos armenzg$ cat ~/Sites/talos.json 
{
    "talos_zip": "http://localhost/~armenzg/talos.zip"
}
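
For reference, reading a talos.json of the shape shown above could be sketched as follows (Python 3; `read_talos_zip_url` is a hypothetical helper, not necessarily how the attached script does it).

```python
import json

# Same shape as the talos.json shown in the session above.
TALOS_JSON_TEXT = '{"talos_zip": "http://localhost/~armenzg/talos.zip"}'


def read_talos_zip_url(text):
    """Return the talos.zip URL from talos.json text, or raise ValueError."""
    data = json.loads(text)  # malformed JSON also raises a ValueError subclass
    url = data.get("talos_zip") if isinstance(data, dict) else None
    if not isinstance(url, str) or not url:
        raise ValueError("talos.json has no usable 'talos_zip' entry")
    return url


print(read_talos_zip_url(TALOS_JSON_TEXT))
```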
Attachment #578400 - Attachment is obsolete: true
Attachment #580561 - Flags: feedback?(jmaher)
Attachment #580561 - Flags: feedback?(coop)
Attached patch [poc] factory/pu changes (obsolete) — Splinter Review
Attachment #580562 - Flags: feedback?(coop)
coop, what do you think of putting the file in:
http://hg.mozilla.org/build/tools/file/default/scripts/talos/talos_from_code.py ?

Does the script name seem good?
Comment on attachment 580561 [details] [diff] [review]
talos_from_code.py - v0.1

>#! /usr/bin/env python
>import argparse
>import json

do we have json available on our boxes?  I know we don't for talos proper in 2.4, that is why we use pyyaml.  Also simplejson seems to be a common fallback:
try:
  import json
except ImportError:
  import simplejson as json


>def get_filename_from_url(url):
>    '''
>    This returns the filename of the file we're trying to download
>    '''
>    parsed = urlparse.urlsplit(url.rstrip('/'))
>    if parsed.path != '':
>        return parsed.path.rsplit('/', 1)[-1]
>    else:
>        file_name = parsed.netloc

we don't return anything in the else case.
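
One possible fix, sketched in Python 3 (so `urllib.parse` stands in for the Python 2 `urlparse` module used in the patch): return the host name in the fallback branch instead of assigning it to an unused variable.

```python
from urllib.parse import urlsplit


def get_filename_from_url(url):
    """Return the file name we are trying to download from a URL."""
    parsed = urlsplit(url.rstrip("/"))
    if parsed.path != "":
        return parsed.path.rsplit("/", 1)[-1]
    # No path component: fall back to the host name, and return it this time.
    return parsed.netloc


print(get_filename_from_url("http://localhost/~armenzg/talos.zip"))
print(get_filename_from_url("http://localhost/"))
```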

>
>def download_file(url, saveAs=None):
>    '''
>    It downloads a file from the URL indicated and can be saved locally with
>    a different name if needed.
>    '''
>    req = urllib2.Request(url)
>    filename = get_filename_from_url(url)
>    f = urllib2.urlopen(req)
>    local_file = open(saveAs if saveAs else filename, 'wb')

These inline if/else expressions do not work in Python 2.4.  It would be nice to put this in a try/except block and return a value.
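
Something along these lines is what I have in mind. It is a Python 3 sketch rather than a drop-in patch (the original uses `urllib2`), and returning `None` on failure is just one possible convention.

```python
import shutil
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def download_file(url, save_as=None):
    """Download url to a local file; return its name, or None on failure."""
    # Plain if/else statement instead of an inline conditional expression.
    if save_as:
        filename = save_as
    else:
        # Default to the last path segment of the URL.
        filename = urlsplit(url.rstrip("/")).path.rsplit("/", 1)[-1]
    try:
        with urllib.request.urlopen(url) as response:
            with open(filename, "wb") as local_file:
                shutil.copyfileobj(response, local_file)
    except (urllib.error.URLError, OSError) as error:
        print("ERROR: could not download %s: %s" % (url, error))
        return None
    return filename
```

The caller can then test the return value and stop the job instead of carrying on with a missing file. Note that a download that fails midway may leave a partial file behind, so the result still deserves a sanity check.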

>if __name__ == '__main__':

normal style at Mozilla is to put this in a main() function and call it as so:
if __name__ == '__main__':
  main()

>    # retrieve the talos.zip as "talos.zip"
>    download_file(talos_zip_url, "talos.zip")

we need to determine if this was successful or not.  Maybe the network resource is not available.  Should we check the integrity of the talos.zip before exiting this function?
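
A cheap integrity check could be sketched with the standard `zipfile` module, as below. `is_valid_zip` is a hypothetical helper; it only verifies that the archive is readable and that its members pass their CRC checks, not that the contents are the expected talos files.

```python
import zipfile


def is_valid_zip(path):
    """Return True if path is a readable zip whose members pass CRC checks."""
    if not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first corrupt member, or None.
            return archive.testzip() is None
    except (zipfile.BadZipFile, OSError):
        return False
```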

Overall, this is great and seems like it will work except for some corner cases.  I put comments in regarding python2.4 since that is what we run for talos, I believe python2.5+ will work just fine.
Attachment #580561 - Flags: feedback?(jmaher) → feedback+
(In reply to Joel Maher (:jmaher) from comment #17)
> Comment on attachment 580561 [details] [diff] [review]
> talos_from_code.py - v0.1
> 
> >#! /usr/bin/env python
> >import argparse
> >import json
> 
> do we have json available on our boxes?  I know we don't for talos proper in
> 2.4, that is why we use pyyaml.  Also simplejson seems to be a common
> fallback:

I believe aki has just explored this for mozharness but I can't recall what bug he documented the results in - you may want to ask him.  I believe the answer was "yes, but..."


> 
> Overall, this is great and seems like it will work except for some corner
> cases.  I put comments in regarding python2.4 since that is what we run for
> talos, I believe python2.5+ will work just fine.

really? 2.4 - gosh I hope not.  I thought the minimum was 2.5 now with 2.6 becoming more common.
yeah, desktop talos runs on python2.4, we are working on a method to install the pywin32 modules automatically so we can upgrade to 2.5 (and then 2.6 afterwards will be a piece of cake since we are doing the work now)
Attached patch talos.json (obsolete) — Splinter Review
Attachment #581770 - Flags: review?(coop)
Attachment #580561 - Attachment is obsolete: true
Attachment #580561 - Flags: feedback?(coop)
Attachment #581773 - Flags: review?(jmaher)
Attachment #581773 - Flags: review?(coop)
Attachment #580562 - Attachment is obsolete: true
Attachment #580562 - Flags: feedback?(coop)
Attachment #581774 - Flags: review?(coop)
Comment on attachment 581773 [details] [diff] [review]
talos_from_code.py - take 1

Review of attachment 581773 [details] [diff] [review]:
-----------------------------------------------------------------

thanks, this looks good.
Attachment #581773 - Flags: review?(jmaher) → review+
Attachment #581770 - Flags: review?(coop) → review+
Comment on attachment 581772 [details] [diff] [review]
buildbot-config changes to run talos_from_code on mozilla-central

Review of attachment 581772 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with nits addressed.

::: scripts/talos/talos_from_code.py
@@ +9,5 @@
> +import urlparse
> +
> +def main():
> +    parser = OptionParser()
> +    parser.add_option("--talos_json_url", dest="talos_json_url", type="string",

I think it might be standard practice to use, e.g., --talos-json-url for the command-line option vs. --talos_json_url, and then use talos_json_url internally.
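
In other words, something like this sketch, where the flag uses dashes and `dest` keeps the underscore name for use in the code. The `--branch` option and the help strings are included only to make the example complete.

```python
from optparse import OptionParser


def parse_args(argv):
    """Parse the command line; dashes in the flag, underscores in the code."""
    parser = OptionParser()
    parser.add_option("--talos-json-url", dest="talos_json_url", type="string",
                      help="URL of the talos.json file to read")
    parser.add_option("--branch", dest="branch", type="string",
                      help="name of the branch the job is running for")
    options, _args = parser.parse_args(argv)
    return options


options = parse_args(["--talos-json-url", "http://localhost/~armenzg/talos.json",
                      "--branch", "try"])
print(options.talos_json_url, options.branch)
```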

@@ +20,5 @@
> +
> +    # json file with info on which talos.zip to use
> +    # the json file URL should look like this:
> +    #  %(repo_path)s/raw-file/%(revision)s/testing/talos/talos.json
> +    jsonFilename = download_file(options.talos_json_url)

Should probably have a try block around this instance of download_file too, just like you do below.
Attachment #581772 - Flags: review?(coop) → review+
Comment on attachment 581773 [details] [diff] [review]
talos_from_code.py - take 1

Review of attachment 581773 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with nits addressed.

::: scripts/talos/talos_from_code.py
@@ +9,5 @@
> +import urlparse
> +
> +def main():
> +    parser = OptionParser()
> +    parser.add_option("--talos_json_url", dest="talos_json_url", type="string",

I think it might be standard practice to use, e.g., --talos-json-url for the command-line option vs. --talos_json_url, and then use talos_json_url internally.

@@ +20,5 @@
> +
> +    # json file with info on which talos.zip to use
> +    # the json file URL should look like this:
> +    #  %(repo_path)s/raw-file/%(revision)s/testing/talos/talos.json
> +    jsonFilename = download_file(options.talos_json_url)

Should probably have a try block around this instance of download_file too, just like you do below.
Attachment #581773 - Flags: review?(coop) → review+
Comment on attachment 581772 [details] [diff] [review]
buildbot-config changes to run talos_from_code on mozilla-central

Looks like you double-posted the python script instead of the config changes.
Attachment #581772 - Attachment is obsolete: true
Attachment #581772 - Flags: review+
Attachment #581774 - Flags: review?(coop) → review+
Attached patch talos.json patch
Now in a patch format.
Attachment #581770 - Attachment is obsolete: true
Attachment #581954 - Flags: review+
Hi,
Could you please land for me attachment 581954 [details] [diff] [review] to mozilla-central or mozilla-inbound?

This is a NPOTB and DONTBUILD.

Thanks!
Keywords: checkin-needed
Whiteboard: [talos] → [talos] please land on inbound or central attachment 581954
https://hg.mozilla.org/integration/mozilla-inbound/rev/53a13d60eaf5
Keywords: checkin-needed
Whiteboard: [talos] please land on inbound or central attachment 581954 → [talos]
Comment on attachment 581954 [details] [diff] [review]
talos.json patch

Hi release-drivers would you please be able to approve this patch to land on 1.9.2, aurora, beta and release?

This is a NPOTB and a DONTBUILD type of patch.
This patch has already landed on mozilla-inbound.

This file is the new way to configure talos.zip for talos/performance jobs and it needs to land on every branch. Nevertheless, this does *not* block me as I have branch granularity.
Attachment #581954 - Flags: approval1.9.2.25?
Attachment #581954 - Flags: approval-mozilla-release?
Attachment #581954 - Flags: approval-mozilla-beta?
Attachment #581954 - Flags: approval-mozilla-aurora?
Comment on attachment 581773 [details] [diff] [review]
talos_from_code.py - take 1

http://hg.mozilla.org/build/tools/rev/667d675c733c

I have only landed the two patches that do not turn this new system live.
Attachment #581773 - Flags: checked-in+
Comment on attachment 581774 [details] [diff] [review]
buildbotcustom changes to run talos_from_code.py

This change can also land as it is a no-op since we default to the old version.
http://hg.mozilla.org/build/buildbotcustom/rev/4aeb797aea9a
Once the patch goes from mozilla-inbound into mozilla-central we can land this.

There is no rush to land this so I'll wait until Wednesday morning.

On Thursday we can land a change that will enable talos_from_source_code if all approvals arrive on time and nothing goes wrong on Wednesday.
Attachment #582083 - Flags: review?(coop)
https://hg.mozilla.org/mozilla-central/rev/53a13d60eaf5
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Thanks Ed for landing it!

There's one more patch to land on the releng side so reopening.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #581774 - Flags: checked-in+
Comment on attachment 581954 [details] [diff] [review]
talos.json patch

[Triage Comment]
Approving for Aurora and Beta.

As for release or 1.9.2 (both going out on the 20th), I understand that this won't cause build regressions. Is there any opportunity for this to cause test regressions or changes in behavior though? If so, let's wait until after we ship FF3.6.25/9.
Attachment #581954 - Flags: approval-mozilla-beta?
Attachment #581954 - Flags: approval-mozilla-beta+
Attachment #581954 - Flags: approval-mozilla-aurora?
Attachment #581954 - Flags: approval-mozilla-aurora+
Comment on attachment 581954 [details] [diff] [review]
talos.json patch

[Triage Comment]
Just spoke with Armen who clarified that this would not affect automated testing. Approving for all branches requested.
Attachment #581954 - Flags: approval1.9.2.25?
Attachment #581954 - Flags: approval1.9.2.25+
Attachment #581954 - Flags: approval-mozilla-release?
Attachment #581954 - Flags: approval-mozilla-release+
Hi checkin-needed hero,
Would you please be able to land attachment 581954 [details] [diff] [review] on aurora, beta, release and 1.9.2?
Thanks and sorry it is so many branches :(
Keywords: checkin-needed
Whiteboard: [talos] → [talos] please land attachment 581954 on aurora, beta, release and 1.9.2
Attachment #582083 - Flags: review?(coop) → review+
Comment on attachment 582083 [details] [diff] [review]
buildbot-configs - enable talos_from_source code only for mozilla-central

checked-in on default:
http://hg.mozilla.org/build/buildbot-configs/rev/27198f5bff61

This will be picked up next time we reconfig our masters (next scheduled time is Thursday).
Attachment #582083 - Flags: checked-in+
This would exempt mozilla-beta, mozilla-release and mozilla-1.9.2 (since talos.json has not been landed there yet).
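
The effect of the patch can be pictured with the sketch below. The layout is hypothetical: the option name mirrors the attachment titles in this bug, but the dictionaries are not the real buildbot-configs structures.

```python
# Hypothetical config layout: the option name mirrors the attachment titles in
# this bug, but the dictionaries below are not the real buildbot-configs.
EXEMPT_BRANCHES = ("mozilla-beta", "mozilla-release", "mozilla-1.9.2")


def apply_talos_defaults(branches):
    """Enable talos_from_source_code everywhere except the exempt branches."""
    for name, config in branches.items():
        config["talos_from_source_code"] = name not in EXEMPT_BRANCHES
    return branches


branches = apply_talos_defaults({
    "mozilla-central": {},
    "mozilla-aurora": {},
    "mozilla-beta": {},
    "mozilla-release": {},
    "mozilla-1.9.2": {},
})
for name in sorted(branches):
    print(name, branches[name]["talos_from_source_code"])
```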
Attachment #583252 - Flags: review?(coop)
Attachment #583252 - Flags: review?(coop) → review+
Comment on attachment 583252 [details] [diff] [review]
enable talos_from_source_code by default and exempt some release branches

Landed on "default":
http://hg.mozilla.org/build/buildbot-configs/rev/073d4cd32699
Attachment #583252 - Flags: checked-in+
Attachments 582083 and 583252 made it to production tonight.
I backed this out as it was causing reds.
I will be looking deeper into it tomorrow, but we were hitting a "simplejson is missing" error on mac.
I thought I tested it properly on staging for all platforms.
I'm still running it on staging but manual testing has proven to be good.
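
A quick way to catch this kind of breakage before enabling the system would be to probe each platform's interpreter for the modules the script needs. The sketch below (Python 3, hypothetical helper names) shows the idea; it is not what was run on staging.

```python
import subprocess
import sys


def has_module(python, module):
    """Return True if the given interpreter can import the named module."""
    result = subprocess.run(
        [python, "-c", "import %s" % module],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def has_json_support(python):
    """True if the interpreter offers json or the simplejson fallback."""
    return has_module(python, "json") or has_module(python, "simplejson")


print(has_json_support(sys.executable))
```

Passing the path of each platform's Python instead of `sys.executable` would show which machines are missing the module. A path that does not exist raises `FileNotFoundError` rather than returning False.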
Comment on attachment 583252 [details] [diff] [review]
enable talos_from_source_code by default and exempt some release branches

This was backed-out yesterday.
Attachment #583252 - Flags: checked-in+ → checked-in-
Attachment #583615 - Flags: review?(coop)
I really have to unify those two sections.

talos_from_source_code has worked on all platforms on staging.

I had not tested it correctly previously.
Attachment #583786 - Flags: review?(coop)
Comment on attachment 583615 [details] [diff] [review]
use a python that has simplejson

Review of attachment 583615 [details] [diff] [review]:
-----------------------------------------------------------------

::: process/factory.py
@@ -7067,5 @@
> -            return "/home/cltbld/bin/python"
> -        elif (platform == "leopard"):
> -            return "/usr/bin/python"
> -        elif (platform in ("snowleopard", "lion")):
> -            return "/Users/cltbld/bin/python"

I have never seen this code before, and now I cannot unsee it. My poor, virgin eyes.
Attachment #583615 - Flags: review?(coop) → review+
Attachment #583786 - Flags: review?(coop) → review+
Comment on attachment 583615 [details] [diff] [review]
use a python that has simplejson

http://hg.mozilla.org/build/buildbotcustom/rev/eee5d445b9b6
Attachment #583615 - Flags: checked-in+
Comment on attachment 583786 [details] [diff] [review]
enable for normal and project branches

http://hg.mozilla.org/build/buildbot-configs/rev/f03744c11c9d
Attachment #583786 - Flags: checked-in+
The patches that landed on "default" are now on production and the masters have been reconfigured.
This is completed.
http://armenzg.blogspot.com/2011/12/taloszip-talosjson-and-you.html

If anyone cares to land the approved patches on the releases branches please go ahead.
Otherwise in 11.5 weeks only 1.9.2 would be different than the other ones.
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Adjusting flags/whiteboard post uplift.

Armen, is this still wanted on beta? Presume 1.9.2 is wontfix at this point, given ESR - in which case presume wanted for ESR 10.1? 

I don't have a beta or 1.9.2 tree checked out (my poor HD is dying, so I've culled as many repos as possible), or I'd land these for you. I suspect most people who look at checkin-needed bugs have fixed bugs filtered out, so asking directly on #developers might be the best way forward.
status1.9.2: --- → ?
Whiteboard: [talos] please land attachment 581954 on aurora, beta, release and 1.9.2 → [talos] please land attachment 581954 on beta and 1.9.2
Thanks edmorley. I found out I can land it myself plus I just kept the system disabled for those branches until we merge there.
Keywords: checkin-needed
Whiteboard: [talos] please land attachment 581954 on beta and 1.9.2 → [talos]
Whiteboard: [talos] → [talos][qa-]
We then moved the file to the source tree in bug 721822.
Blocks: 721822
Product: mozilla.org → Release Engineering