Closed Bug 398192 (trytalos) Opened 12 years ago Closed 12 years ago

set-up talos slaves to test try server builds

Categories

(Release Engineering :: General, defect, P3)

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: bhearsum, Unassigned)

Attachments

(1 file, 4 obsolete files)

See summary.
Priority: -- → P3
Any idea what we want to do with the data collected by a try server perf run?  Should we just be exporting to CSV?  Perhaps graphing all try server runs on the same line?  Do we want to do multiple runs to attempt to get more dependable results?
I think it should be sent to the graph server to make it accessible to everybody (rather than people needing to pass around CSV files). For simplicity, we could put it all on the same line, but graphing just that line will not be very useful. Additionally, some identifier will need to be shown on the graph server so people can find their builds. The final numbers could be shown on the Tinderbox page too, along with the person's name.

Runs == cycles? I think we should be doing at least 5 cycles on try builds. Ideally, we should do the same amount that we do for normal trunk builds to stay consistent.
By cycles, I mean the number of times the talos suite is run in its entirety, since even when going through the web page set itself multiple times there are still high runs and low runs.  In the past, when I've tested a single build, I've run it through talos between 5 and 20 times to get a feel for what the average talos results would be.

If we are going to treat a perf run on a try server as a real method of determining whether a patch is good or bad for performance, we need to be sure that the number reported is meaningful.  That means multiple full talos runs on the same build.  On the downside, this does two things: it increases the time to test a given patch and makes it difficult to graph the output.
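
To make the averaging idea concrete, here is a minimal sketch of how several full talos cycles on one build might be summarized. The function name summarize_cycles and the sample numbers are hypothetical; nothing in this bug says how results would actually be aggregated.

```python
import statistics

def summarize_cycles(results_ms):
    """Summarize per-cycle results from repeated full talos runs of one build.

    `results_ms` is a hypothetical list with one number per full cycle.
    """
    if len(results_ms) < 2:
        raise ValueError("need at least two cycles to estimate variance")
    return {
        "cycles": len(results_ms),
        "mean": statistics.mean(results_ms),
        "stdev": statistics.stdev(results_ms),  # sample standard deviation
        "min": min(results_ms),
        "max": max(results_ms),
    }

# Illustrative numbers only, not real talos output.
summary = summarize_cycles([412.0, 398.0, 405.0, 420.0, 401.0])
print(summary)
```

Reporting the spread (stdev, min, max) next to the mean is one way to show whether a single reported number can be trusted, at the cost of the longer test time mentioned above.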
Priority: P3 → P2
Attached patch talos try buildbot configs (obsolete) — Splinter Review
Working Buildbot configs w/ Try support. Not production grade.
Attachment #284259 - Attachment is obsolete: true
Priority: P2 → P3
Priority: P3 → P2
Priority: P2 → P3
Whiteboard: not working on this right now
Depends on: 401116
Assignee: bhearsum → nobody
Status: ASSIGNED → NEW
Whiteboard: not working on this right now
Status: NEW → ASSIGNED
Alias: trytalos
Attached patch different try perfmaster patch (obsolete) — Splinter Review
This is a modified version of the existing perfmaster that monitors the MozillaTry tinderbox tree then grabs the latest build for each platform. Currently in testing mode on qm-rhel02 (waterfall on port 2007).

Linux and Mac probably need additional cleanup before they can run.
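
For readers unfamiliar with the approach, "grab the latest build for each platform" can be sketched roughly as follows. This is not the code in the patch; the record layout (platform, timestamp, url) and the function name are assumptions made purely for illustration.

```python
def latest_build_per_platform(builds):
    """Return {platform: build}, keeping only the newest build per platform.

    `builds` is a list of dicts with the hypothetical keys 'platform',
    'timestamp' (seconds since epoch) and 'url'.
    """
    latest = {}
    for build in builds:
        current = latest.get(build["platform"])
        if current is None or build["timestamp"] > current["timestamp"]:
            latest[build["platform"]] = build
    return latest

# Made-up records standing in for what a tinderbox poller might report.
builds = [
    {"platform": "win32", "timestamp": 1000, "url": "http://example.invalid/win32-a.zip"},
    {"platform": "linux", "timestamp": 1500, "url": "http://example.invalid/linux-a.tar.bz2"},
    {"platform": "win32", "timestamp": 2000, "url": "http://example.invalid/win32-b.zip"},
]
latest = latest_build_per_platform(builds)
for platform, build in sorted(latest.items()):
    print(platform, build["url"])
```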
Attachment #284638 - Attachment is obsolete: true
Attachment #293191 - Flags: review?(bhearsum)
Comment on attachment 293191 [details] [diff] [review]
different try perfmaster patch

>Index: PerfConfigurator.py
>===================================================================

The only PerfConfigurator.py in use is in testing/performance/talos, right? Let's avoid putting it in here -- I think it will be good to avoid creating another one to maintain.

(Related to this, I filed bug 408684 to get it out of the perfmaster configs.)

>Index: master.cfg
>===================================================================

>+import os.path
>+# from buildbot.changes.freshcvs import FreshCVSSource

Might as well just remove this.

>+from buildbot.process import factory
>+from buildbot.scheduler import Scheduler, Periodic

Can remove Periodic, too.

>+from buildbot.status import html
>+from buildbot import locks
>+from buildbot.steps.transfer import FileDownload
>+from buildbot.steps.shell import ShellCommand
>+
>+# from auth import authlist, debugPassword

And this line.

>+
>+import perfrunner
>+reload(perfrunner)
>+from perfrunner import *
>+
>+###
>+### Tinderbox builder names and build directories
>+###
>+WIN32_TRUNK_BUILDER="Try server win32 builder"
>+LINUX_TRUNK_BUILDER="Linux fx-linux-tbox Depend Nightly

This should be the right builder :)

>+MAC_TRUNK_BUILDER="Try server mac builder"
>+TRUNK_BUILDDIR="https://build.mozilla.org/tryserver-builds/?C=M;O=D"
>+
>+CVSROOT=":pserver:anonymous@cvs.mozilla.org:/cvsroot"
>+
>+
>+# This is the dictionary that the buildmaster pays attention to. We also use
>+# a shorter alias to save typing.
>+c = BuildmasterConfig = {}
>+
>+##
>+## Misc Config
>+##
>+
>+c['debugPassword'] = "mozilla"
>+#c['manhole'] = Manhole(9999, "admin", "password")
>+c['projectName'] = "Talos"
>+c['projectURL'] = "http://quality.mozilla.org/en/projects/automation/talos"
>+c['buildbotURL'] = "http://qm-rhel02.mozilla.org:2007"
>+c['slavePortnum'] = 9985
>+
>+##
>+## Slaves
>+##
>+
>+c['bots'] = [("qm-pxp-try01", "w1nd3rs"),
>+             ("qm-ptiger-try01", "mac1nt0sh"),
>+             ("qm-pubuntu-try01", "l1nux")]
>+

In Build, we've been keeping the passwords out of CVS. Is qm-rhel02 behind the VPN? If it is, I'm not too concerned about passwords being in CVS. If it isn't we definitely need to keep them out.
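
One possible way to keep the passwords out of CVS, sketched under the assumption that the slave credentials live in a local file that is never checked in. The file format ("name:password" per line), the file name, and the helper parse_bots are all hypothetical, not something Build currently uses.

```python
def parse_bots(lines):
    """Turn 'slavename:password' lines into the (name, password) tuples
    that c['bots'] expects. Blank lines and '#' comments are skipped."""
    bots = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so passwords may contain colons.
        name, password = line.split(":", 1)
        bots.append((name, password))
    return bots

# In master.cfg this might be used as (file name is an assumption):
#     with open("slave-passwords.txt") as f:
#         c['bots'] = parse_bots(f)
example = parse_bots(["# local only, not in CVS", "slave-a:secret1", "", "slave-b:pa:ss"])
print(example)
```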


>+# this is the local TinderboxMailNotifier
>+# c['status'].append(TinderboxMailNotifier(
>+#                        fromaddr="rcampbell@mozilla.com",
>+#                        tree="Firefox",
>+#                        extraRecipients=["tinderbox-daemon@tinderbox.mozilla.org"],
>+#                        relayhost="smtp.mozilla.org",
>+#                        builders=["WINNT 5.1 mini talos try trunk"],
>+#                        useChangeTime=True,
>+#                        logCompression="bzip2"))
>+

I think this one can go completely; these builders should never be reporting to Firefox. This means that you should be able to dump the local tinderboxmailnotifier, too.


>Index: perfrunner.py
>===================================================================
>+class LatestFileURL(ApacheDirectory): 

Is it possible to update perfmaster with these two classes for consistency's sake? I think they _should_ get put into buildbot-custom eventually but we don't need to solve that here.


Overall, this is really good. Your approach is much better than mine! This really exemplifies why we need to get buildbot-custom going, though. (And be better about not doing local patches, like for tinderboxmailnotifier here.) That sort of thing probably can't be fixed as part of this though, it's a larger problem.

Going to r- because I think it's important to keep the duplicate PerfConfigurator/tinderboxmailnotifier out. Like I said though, good work!
Attachment #293191 - Flags: review?(bhearsum) → review-
Comment on attachment 293191 [details] [diff] [review]
different try perfmaster patch

>+win32_trunk_steps.addStep(FileDownload,
>+                           mastersrc="configs/config-win32-pxp.py",
>+                           slavedest="sample.config",
>+                           workdir="talos/")

One additional thing: This should be a yaml file, not a python one ;). I don't think we need these python config files at all (iirc, they are only necessary on the blades). I'm going to make this change on the master and see if this fixes RunPerfTests.
Using configs/sample.config this appears to be working fine!
Assignee: nobody → bhearsum
Status: ASSIGNED → NEW
I'm busy with other things right now, but I should start tackling this again in a week or two.
Status: NEW → ASSIGNED
um, no worries, I was going to take a look at it this week. Likely tomorrow.
Attached patch try server talos buildbot master (obsolete) — Splinter Review
This is an update to your patch, Rob. It's working on Windows and Linux. The main update I had to do was adjusting the filenameSearchString on Linux. I updated it for Mac, too, which I believe will work once the mac slave gets re-imaged.

This patch only includes the necessary files for running try server talos. There were a lot of unused scripts, configuration files, etc. in the tryperfmaster directory. I got rid of those.
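
As a rough illustration of why the filenameSearchString matters: if the build directory is an Apache listing sorted newest first (as the ?C=M;O=D in TRUNK_BUILDDIR suggests), the first link containing the search string is the latest build for that platform. The sketch below is not the LatestFileURL class from the patch; the helper names and the sample file names are invented.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values from <a> tags in document order."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def first_matching_file(listing_html, search_string):
    """Return the first href containing search_string, or None."""
    collector = LinkCollector()
    collector.feed(listing_html)
    for href in collector.hrefs:
        if search_string in href:
            return href
    return None

# Invented listing, newest entry first.
SAMPLE_LISTING = """
<a href="?C=N;O=D">Name</a>
<a href="2008-01-21-patchB-win32.zip">2008-01-21-patchB-win32.zip</a>
<a href="2008-01-20-patchA-linux.tar.bz2">2008-01-20-patchA-linux.tar.bz2</a>
<a href="2008-01-19-patchC-linux.tar.bz2">2008-01-19-patchC-linux.tar.bz2</a>
"""
print(first_matching_file(SAMPLE_LISTING, "linux.tar.bz2"))
```

A search string that is too loose or too strict for a platform's file names would pick the wrong file or none at all, which is the kind of problem the Linux adjustment addressed.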
Attachment #293191 - Attachment is obsolete: true
Attachment #298351 - Flags: review?(rcampbell)
Comment on attachment 298351 [details] [diff] [review]
try server talos buildbot master

Removing review for now; there are a couple of issues I just noticed.
Attachment #298351 - Flags: review?(rcampbell)
Depends on: 413382
I'm going to toss this back into the pool. With the Beta 3 release coming up I'm going to be pretty busy the next couple of weeks. I may have time to look at it a bit, not sure.
Assignee: bhearsum → nobody
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
We talked about this bug in the build meeting earlier this week. It's unclear what exactly is left to do here. Note: this is a Q1 goal.
Reassigning component, so it shows up in our triage queue.
Status: ASSIGNED → NEW
Component: Testing → Build & Release
Product: Core → mozilla.org
QA Contact: testing → build
Version: Other Branch → other
We've got to land the Buildbot configs and get more slaves hooked up, but other than that we're good to go (I think). Alice, how do we get more machines allocated for this? Also, can you confirm there's nothing else we need to do?
FWIW if we're running short on slaves, I'd prefer to have tryserver slaves than actionmonkey slaves... we can always get numbers from actionmonkey by submitting the builds to tryserver manually.
Alright. I'm not sure we've purchased minis allocated for Moz2 yet, have we?
We have minis allocated to moz2 - pretty much whatever machines I have left at this point are for moz2.  But there is still some wiggle room in terms of leftover/idle machines that we can use to beef up try talos.

What is left here is final configuration issues (there are configuration settings that don't stick in the images that have to be manually set) - so I'd need to look over the machines.  I also need to know if these were taken from the first or second set of minis.  If we want more machines I have to be sure that I have machines with matching specs, and the second set is slightly more powerful than the first.
Depends on: 424897
There have been complaints made to me that the talos try numbers are untrustworthy.  I think that this stems from them only running test code, so there is no real baseline to compare with.  They _should_ be comparable to other production machines that are currently reporting, but it can be hard to see a real regression amongst test result variance.

There's some suggestion that we should alternate test builds with standard builds.  I'm not sure if that's the best solution as we'd then have a mix of data reporting to a single graph line (though, there are probably ways of hacking around that) along with adding delays to submitting try builds.

I'm mostly just concerned that we won't get much pick up on these machines if people don't think that they collect valuable data.
Alice, I don't think graphing the tryserver talos runs against each other is going to be useful at all. What I think would be helpful is a graph that overlays a single X mark for the current test run against the graph for CVS-trunk, so that you can easily compare your tryserver test result with the equivalent chart.

But really, printing the numbers to the MozillaTry tinderbox is probably the most important.
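
A minimal sketch of the comparison being asked for, in numbers rather than on a graph: take one try result and place it against recent trunk results for the same test. The function name, the choice of the median as baseline, and the sample values are assumptions for illustration only.

```python
import statistics

def compare_to_trunk(try_value, trunk_values):
    """Compare a single try-server result with recent trunk results."""
    baseline = statistics.median(trunk_values)
    return {
        "baseline": baseline,
        # Positive means the try build scored higher than trunk.
        "percent_change": (try_value - baseline) / baseline * 100.0,
        # True if the try result falls inside the spread trunk already shows.
        "within_trunk_range": min(trunk_values) <= try_value <= max(trunk_values),
    }

# Illustrative numbers only, not real talos data.
result = compare_to_trunk(430.0, [400.0, 410.0, 405.0, 395.0, 415.0])
print(result)
```

A result outside the range trunk normally produces is more likely to be a real change than one that sits inside the usual run-to-run variance.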
These are the exact configs used by the talos try server. I updated them to send status to MozillaTry instead of MozillaTest. Let's get these suckers checked in.
Attachment #298351 - Attachment is obsolete: true
Attachment #314382 - Flags: review?(rcampbell)
Attachment #314382 - Flags: review?(rcampbell) → review+
Comment on attachment 314382 [details] [diff] [review]
[checked in] tryperfmaster configs

RCS file: /cvsroot/mozilla/tools/buildbot-configs/testing/talos/tryperfmaster/master.cfg,v
done
Checking in master.cfg;
/cvsroot/mozilla/tools/buildbot-configs/testing/talos/tryperfmaster/master.cfg,v  <--  master.cfg
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/tools/buildbot-configs/testing/talos/tryperfmaster/perfrunner.py,v
done
Checking in perfrunner.py;
/cvsroot/mozilla/tools/buildbot-configs/testing/talos/tryperfmaster/perfrunner.py,v  <--  perfrunner.py
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/tools/buildbot-configs/testing/talos/tryperfmaster/tinderboxpoller.py,v
done
Checking in tinderboxpoller.py;
/cvsroot/mozilla/tools/buildbot-configs/testing/talos/tryperfmaster/tinderboxpoller.py,v  <--  tinderboxpoller.py
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/tools/buildbot-configs/testing/talos/tryperfmaster/waterfall.css,v
done
Checking in waterfall.css;
/cvsroot/mozilla/tools/buildbot-configs/testing/talos/tryperfmaster/waterfall.css,v  <--  waterfall.css
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/tools/buildbot-configs/testing/talos/tryperfmaster/configs/sample.config,v
done
Checking in configs/sample.config;
/cvsroot/mozilla/tools/buildbot-configs/testing/talos/tryperfmaster/configs/sample.config,v  <--  sample.config
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/tools/buildbot-configs/testing/talos/tryperfmaster/scripts/generate-tpcomponent.py,v
done
Checking in scripts/generate-tpcomponent.py;
/cvsroot/mozilla/tools/buildbot-configs/testing/talos/tryperfmaster/scripts/generate-tpcomponent.py,v  <--  generate-tpcomponent.py
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/tools/buildbot-configs/testing/talos/tryperfmaster/scripts/installdmg.sh,v
done
Checking in scripts/installdmg.sh;
/cvsroot/mozilla/tools/buildbot-configs/testing/talos/tryperfmaster/scripts/installdmg.sh,v  <--  installdmg.sh
initial revision: 1.1
done
Attachment #314382 - Attachment description: tryperfmaster configs → [checked in] tryperfmaster configs
The initial infrastructure is up and running. New bugs should be filed for future issues.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Ben, do you have an example URL where I can find a graph of such a Talos test?
Thanks, Ben. It looks great => Verified.
Status: RESOLVED → VERIFIED
Product: mozilla.org → Release Engineering