Investigate if Talos can use pool-of-slaves model or not

RESOLVED FIXED

Status

Release Engineering
General
P2
normal
RESOLVED FIXED
9 years ago
4 years ago

People

(Reporter: joduinn, Assigned: catlee)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(4 attachments, 1 obsolete attachment)

Talos machines now reboot ready-for-use (bug#447686), and autoreboot frequently (bug#463020). This has significantly reduced the drift in results reported by different Talos machines running the same tests on the same project branch. 

With those changes in place, now the open questions are: 

1) has the drift reduced enough that we could now happily accept the result from any one of the set of Talos machines, not needing to manually review all 3 results and eyeballing for differences? A quick manual eyeball of graphs shows the numbers seem close, but we should confirm this. 

2) can we now equally accept a perf test result from any Talos machine? If yes, then we could use the pool-of-slaves model for Talos, just like we do for build and unittests. It's unclear if there are different criteria for perf test machines, and if so, what those criteria are. At this point, it seems easiest to just try this, hence this bug.


Let's try running a small pool-of-talos-slaves on staging Talos, and compare the results with the current staging/production dedicated-to-a-branch Talos slaves.


(If all this works, there are some questions about what graphserver changes, if any, would need to be made to handle data for the one branch and suite coming from different machines. But let's burn that bridge if/when we get to it!)
Depends on: 476100
(In reply to comment #1)
> 1) has the drift reduced enough that we could now happily accept the result
> from any one of the set of Talos machines, not needing to manually review all 3
> results and eyeballing for differences? A quick manual eyeball of graphs shows
> the numbers seem close, but we should confirm this. 

Is there any way to do this programmatically? Filed bug#476102 to track.
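
As a rough illustration of what a programmatic check could look like (a minimal sketch, not what bug#476102 actually implements): compare the per-machine means for one test on one branch and flag the set if the spread exceeds a tolerance. The machine names, sample numbers, and the 2% tolerance below are all made up for illustration.

```python
from statistics import mean


def relative_spread(results_by_machine):
    """Return (max - min) of the per-machine means, divided by their overall mean."""
    machine_means = [mean(values) for values in results_by_machine.values()]
    overall = mean(machine_means)
    return (max(machine_means) - min(machine_means)) / overall


def within_tolerance(results_by_machine, tolerance=0.02):
    """True if the machines agree closely enough to be treated as interchangeable.

    The 2% default is an assumed threshold, not a value from this bug.
    """
    return relative_spread(results_by_machine) <= tolerance


# Hypothetical results for one test on one branch, keyed by machine name.
sample = {
    "talos-a": [100.0, 102.0, 98.0],
    "talos-b": [101.0, 101.0, 101.0],
    "talos-c": [99.0, 100.0, 101.0],
}

if __name__ == "__main__":
    print(f"relative spread: {relative_spread(sample):.4f}")
    print("interchangeable" if within_tolerance(sample) else "needs manual review")
```

A check like this only compares averages; whether that is a sufficient criterion for perf machines is exactly the open question in this bug.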
Assignee: nobody → catlee
Priority: -- → P3
I'm a little confused as to how 10 machines is going to re-create a reasonable pool-of-slaves environment.  Won't that just leave 2 machines per-platform?  Doesn't that just end up being one less than the current 3 machines per-platform per-branch?

Might be reasonable to take all ten as being on the same OS, then have them test all branches, and maybe use the multi-build scheduler to generate more test requests per build.  That way you could generate a lot of tests over a lot of machines over a lot of branches, and it would look more like it would in production.
(In reply to comment #2)
> I'm a little confused as to how 10 machines is going to re-create a reasonable
> pool-of-slaves environment.  Won't that just leave 2 machines per-platform? 
> Doesn't that just end up being one less than the current 3 machines
> per-platform per-branch?

Sorry, thought we covered this during goal setting.

The idea here is to run these 10 pooled-talos-slaves, *at the same time as* the existing dedicated-talos-slaves-per-branch to see how the talos numbers compare.

Obviously, these 10 slaves are not enough to do all talos runs on all 3 branches. What's important here is to see if these 10 pooled-slaves return the same results as the dedicated slaves. This experiment will confirm whether we can safely use pooled-slaves for talos perf testing or whether we must use dedicated machines.

Having two machines per OS seemed the smallest number of machines we could use to make this experiment meaningful, hence 10 pooled-talos-slaves.

Hope that clarifies?
Blocks: 480197
(Assignee)

Comment 4

8 years ago
Created attachment 364579 [details] [diff] [review]
pool of slaves configuration for talos staging

This is the same as attachment 364172 [details] [diff] [review], but with just the pool of slaves configuration.
Attachment #364579 - Flags: review?(anodelman)
(Assignee)

Updated

8 years ago
Attachment #364579 - Attachment is obsolete: true
Attachment #364579 - Flags: review?(anodelman)
(Assignee)

Comment 5

8 years ago
Created attachment 364583 [details] [diff] [review]
pool of slaves configuration for talos staging
Attachment #364583 - Flags: review?(anodelman)
Attachment #364583 - Flags: review?(anodelman) → review+
(Assignee)

Comment 6

8 years ago
Comment on attachment 364583 [details] [diff] [review]
pool of slaves configuration for talos staging

changeset:   969:32de31330849
Attachment #364583 - Flags: checked‑in+
(Assignee)

Updated

8 years ago
Depends on: 483684
(Assignee)

Comment 7

8 years ago
Created attachment 367835 [details] [diff] [review]
Add new pool slaves to graph server database
Attachment #367835 - Flags: review?(anodelman)
Attachment #367835 - Flags: review?(anodelman) → review+
(Assignee)

Comment 8

8 years ago
Created attachment 367853 [details] [diff] [review]
Use build properties for installdmg command

This fixes a bug where the installdmg step would not use the correct filename for the build.
Attachment #367853 - Flags: review?(anodelman)
(Assignee)

Comment 9

8 years ago
Comment on attachment 367835 [details] [diff] [review]
Add new pool slaves to graph server database

changeset:   200:0ea7f315d2fc
Attachment #367835 - Flags: checked‑in+
Attachment #367853 - Flags: review?(anodelman) → review+
(Assignee)

Comment 10

8 years ago
Comment on attachment 367853 [details] [diff] [review]
Use build properties for installdmg command

changeset:   1023:4331e3429d64
Attachment #367853 - Flags: checked‑in+
(Assignee)

Comment 11

8 years ago
Some modifications are necessary on the slave to be able to handle multiple branches.

A directory under talos-slave called talos-data must be created.  E.g. on windows we'll have c:\talos-slave\talos-data and on mac and linux we'll have ~/talos-slave/talos-data.

The apache config needs to be updated to point to this directory.  So on windows we'll have DocumentRoot set to c:\talos-slave\talos-data\talos, and on linux we'll have it set to /home/mozqa/talos-slave/talos-data/talos.
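
The layout described above can be sketched as a small helper that derives the talos-data directory and the matching Apache DocumentRoot line per platform. This is only an illustration: the function names are hypothetical, the paths are the ones given in this comment, and the home directory on mac is not stated here, so it has to be passed in by the caller.

```python
from pathlib import PurePosixPath, PureWindowsPath


def talos_paths(platform, home="/home/mozqa"):
    """Return (talos_data_dir, document_root) for a slave platform.

    `platform` is "windows", "linux", or "mac". `home` defaults to the linux
    home directory named in this comment; the mac home is an assumption the
    caller must supply.
    """
    if platform == "windows":
        base = PureWindowsPath(r"c:\talos-slave")
    else:
        base = PurePosixPath(home) / "talos-slave"
    data_dir = base / "talos-data"
    # Apache serves the talos checkout that lives inside talos-data.
    return str(data_dir), str(data_dir / "talos")


def document_root_directive(platform, home="/home/mozqa"):
    """Build the DocumentRoot line for the slave's apache config."""
    _, doc_root = talos_paths(platform, home)
    return f'DocumentRoot "{doc_root}"'


if __name__ == "__main__":
    for name in ("windows", "linux"):
        print(name, talos_paths(name)[0])
        print(document_root_directive(name))
```

The directory itself still has to be created on each slave before apache is pointed at it.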
(Assignee)

Updated

8 years ago
Depends on: 483932
(Assignee)

Comment 12

8 years ago
Created attachment 368528 [details] [diff] [review]
Add new pool slaves to graph server database
Attachment #368528 - Flags: review?(anodelman)
(Assignee)

Updated

8 years ago
Priority: P3 → P2
Attachment #368528 - Flags: review?(anodelman) → review+
(Assignee)

Comment 13

8 years ago
Comment on attachment 368528 [details] [diff] [review]
Add new pool slaves to graph server database

changeset:   204:0c52bf1b74b1
Attachment #368528 - Flags: checked‑in+
(Assignee)

Comment 14

8 years ago
After post-Q1 discussions, we've concluded that Talos should be able to use a pool-o-slaves model.
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED

Updated

8 years ago
Component: Release Engineering: Talos → Release Engineering
Product: mozilla.org → Release Engineering