Bug 476099 - Investigate if Talos can use pool-of-slaves model or not
Status: RESOLVED FIXED
Product: Release Engineering
Classification: Other
Component: Other
Version: other
Platform: All All
Importance: P2 normal
Target Milestone: ---
Assigned To: Chris AtLee [:catlee]
Depends on: 474348 476100 483684 483932
Blocks: 480197
Reported: 2009-01-29 20:58 PST by John O'Duinn [:joduinn] (please use "needinfo?" flag)
Modified: 2013-08-12 21:54 PDT


Attachments
pool of slaves configuration for talos staging (5.92 KB, patch)
2009-02-27 13:25 PST, Chris AtLee [:catlee]
no flags
pool of slaves configuration for talos staging (14.29 KB, patch)
2009-02-27 14:01 PST, Chris AtLee [:catlee]
anodelman: review+
catlee: checked‑in+
Add new pool slaves to graph server database (2.19 KB, patch)
2009-03-17 13:04 PDT, Chris AtLee [:catlee]
anodelman: review+
catlee: checked‑in+
Use build properties for installdmg command (1.95 KB, patch)
2009-03-17 14:09 PDT, Chris AtLee [:catlee]
anodelman: review+
catlee: checked‑in+
Add new pool slaves to graph server database (1.04 KB, patch)
2009-03-20 08:22 PDT, Chris AtLee [:catlee]
anodelman: review+
catlee: checked‑in+

Description John O'Duinn [:joduinn] (please use "needinfo?" flag) 2009-01-29 20:58:32 PST
Talos machines now reboot ready-for-use (bug#447686), and autoreboot frequently (bug#463020). This has significantly reduced the drift in results reported by different Talos machines running the same tests on the same project branch. 

With those changes in place, now the open questions are: 

1) has the drift reduced enough that we could now happily accept the result from any one of the set of Talos machines, not needing to manually review all 3 results and eyeballing for differences? A quick manual eyeball of graphs shows the numbers seem close, but we should confirm this. 

2) can we now equally accept a perf test result from any Talos machine? If yes, then we could use the pool-of-slaves model for Talos, just like we do for build and unittests. It's unclear if there are different criteria for perf test machines, and if so, what those criteria are. At this point, it seems easiest to just try this, hence this bug.


Let's try running a small pool-of-talos-slaves on staging Talos, and compare results with the current staging/production dedicated-to-a-branch Talos slaves.


(If all this works, there are some questions about what, if any, graphserver changes would need to be made to handle data for the one branch and suite coming from different machines. But let's burn that bridge if/when we get to it!)
Comment 1 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2009-01-29 21:29:45 PST
(In reply to comment #0)
> 1) has the drift reduced enough that we could now happily accept the result
> from any one of the set of Talos machines, not needing to manually review all 3
> results and eyeballing for differences? A quick manual eyeball of graphs shows
> the numbers seem close, but we should confirm this. 

Is there any way to do this programmatically? Filed bug#476102 to track.
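
One possible shape for such a programmatic check, as a rough sketch only: compare the per-machine mean results for one test and branch, and flag the set when the relative spread exceeds a chosen threshold. The function names, the 2% threshold and the sample numbers below are all made up for illustration; none of them come from Talos or the graph server.

```python
from statistics import mean


def relative_spread(results_by_machine):
    """Return (max - min) / mean of the per-machine mean results."""
    machine_means = [mean(values) for values in results_by_machine.values()]
    return (max(machine_means) - min(machine_means)) / mean(machine_means)


def drift_is_acceptable(results_by_machine, threshold=0.02):
    """True when the machines agree to within `threshold` (assumed 2% here)."""
    return relative_spread(results_by_machine) <= threshold


# Hypothetical results (arbitrary units) from three machines on one branch.
sample = {
    "talos-a": [100.0, 102.0],
    "talos-b": [101.0, 101.0],
    "talos-c": [99.0, 101.0],
}

if __name__ == "__main__":
    print(relative_spread(sample), drift_is_acceptable(sample))
```

A real check would probably need more than a spread of means (for example, run-to-run noise on a single machine), so treat this as a starting point rather than a proposal.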
Comment 2 alice nodelman [:alice] [:anode] 2009-01-30 11:06:21 PST
I'm a little confused as to how 10 machines is going to re-create a reasonable pool-of-slaves environment.  Won't that just leave 2 machines per-platform?  Doesn't that just end up being one less than the current 3 machines per-platform per-branch?

Might be reasonable to take all ten as being on the same OS, then have them test all branches, and maybe use the multi-build scheduler to generate more testing requests per build.  That way you could generate a lot of tests over a lot of machines over a lot of branches, and it would look more like it would in production.
Comment 3 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2009-02-20 16:16:31 PST
(In reply to comment #2)
> I'm a little confused as to how 10 machines is going to re-create a reasonable
> pool-of-slaves environment.  Won't that just leave 2 machines per-platform? 
> Doesn't that just end up being one less than the current 3 machines
> per-platform per-branch?

Sorry, thought we covered this during goal settings. 

The idea here is to run these 10 pooled-talos-slaves, *at the same time as* the existing dedicated-talos-slaves-per-branch to see how the talos numbers compare.

Obviously, these 10 slaves are not enough to do all talos runs on all 3 branches. What's important here is to see if these 10 pooled-slaves return the same results as the dedicated slaves. This experiment will confirm whether we can safely use pooled-slaves for talos perf testing or whether we must use dedicated machines.

Having two machines per o.s. seemed the smallest number of machines we could use to make this experiment meaningful, hence 10 pooled-talos-slaves.

Hope that clarifies?
Comment 4 Chris AtLee [:catlee] 2009-02-27 13:25:37 PST
Created attachment 364579
pool of slaves configuration for talos staging

This is the same as attachment 364172, but with just the pool of slaves configuration.
Comment 5 Chris AtLee [:catlee] 2009-02-27 14:01:29 PST
Created attachment 364583
pool of slaves configuration for talos staging
Comment 6 Chris AtLee [:catlee] 2009-02-27 15:12:11 PST
Comment on attachment 364583
pool of slaves configuration for talos staging

changeset:   969:32de31330849
Comment 7 Chris AtLee [:catlee] 2009-03-17 13:04:44 PDT
Created attachment 367835
Add new pool slaves to graph server database
Comment 8 Chris AtLee [:catlee] 2009-03-17 14:09:44 PDT
Created attachment 367853
Use build properties for installdmg command

This fixes a bug where the installdmg step would not use the correct filename for the build.
Comment 9 Chris AtLee [:catlee] 2009-03-17 14:11:38 PDT
Comment on attachment 367835
Add new pool slaves to graph server database

changeset:   200:0ea7f315d2fc
Comment 10 Chris AtLee [:catlee] 2009-03-18 06:52:27 PDT
Comment on attachment 367853
Use build properties for installdmg command

changeset:   1023:4331e3429d64
Comment 11 Chris AtLee [:catlee] 2009-03-18 10:10:07 PDT
Some modifications are necessary on the slave to be able to handle multiple branches.

A directory under talos-slave called talos-data must be created.  E.g. on windows we'll have c:\talos-slave\talos-data and on mac and linux we'll have ~/talos-slave/talos-data.

The apache config needs to be updated to point to this directory.  So on windows we'll have DocumentRoot be set to c:\talos-slave\talos-data\talos, and on linux we'll have it set to /home/mozqa/talos-slave/talos-data/talos.
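
As a rough sketch of the layout described above, the two paths can be derived per platform like this. The helper names and the `home` parameter are hypothetical; only the Windows and Linux paths are taken from this comment, and the Mac home directory is not stated here, so it has to be passed in.

```python
from pathlib import PurePosixPath, PureWindowsPath


def talos_data_dir(platform, home="/home/mozqa"):
    """Return the talos-data directory under talos-slave for a platform."""
    if platform == "windows":
        return PureWindowsPath(r"c:\talos-slave") / "talos-data"
    # Mac and Linux use ~/talos-slave/talos-data; `home` stands in for "~".
    return PurePosixPath(home) / "talos-slave" / "talos-data"


def document_root(platform, home="/home/mozqa"):
    """Return the directory Apache's DocumentRoot should point to."""
    return talos_data_dir(platform, home) / "talos"


if __name__ == "__main__":
    for name in ("windows", "linux"):
        print(name, talos_data_dir(name), document_root(name))
```

The pure path classes are used so the sketch only computes paths and can run on any OS; creating the directory and editing the Apache config are still separate steps on each slave.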
Comment 12 Chris AtLee [:catlee] 2009-03-20 08:22:03 PDT
Created attachment 368528
Add new pool slaves to graph server database
Comment 13 Chris AtLee [:catlee] 2009-03-24 14:21:47 PDT
Comment on attachment 368528
Add new pool slaves to graph server database

changeset:   204:0c52bf1b74b1
Comment 14 Chris AtLee [:catlee] 2009-04-02 11:20:46 PDT
After post-Q1 discussions, we've concluded that Talos should be able to use a pool-o-slaves model.
