Closed
Bug 1192994
Opened 9 years ago
Closed 9 years ago
investigate seta scheduling for talos
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Infrastructure & Operations Graveyard
CIDuty
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: kmoir, Assigned: catlee)
Details
Attachments
(3 files, 2 obsolete files)
2.72 KB, patch | kmoir: review+ | Details | Diff | Splinter Review
6.39 KB, patch | kmoir: review+ | catlee: checked-in+ | Details | Diff | Splinter Review
6.39 KB, patch | kmoir: review+ | catlee: checked-in+ | Details | Diff | Splinter Review
Requested in IRC by RyanVM and jmaher:
jmaher kmoir: are there other things we can do to force coalescing on talos jobs on certain branches, maybe not as dynamic as seta
jmaher kmoir: yes, regular intervals
RyanVM|sheriffduty oh, no wonder why
RyanVM|sheriffduty we have 2x more linux64 iX slaves than linux32
jmaher ah, interesting
RyanVM|sheriffduty currently 185 pending linux32-ix test jobs
RyanVM|sheriffduty jmaher: https://secure.pub.build.mozilla.org/builddata/reports/slave_health/
jmaher I would rather treat things as similar as possible- so all talos e10s forced coalescing and it will help a lot
RyanVM|sheriffduty 45 total slaves, 40 currently in service
RyanVM|sheriffduty vs. 99/86 for linux64
RyanVM|sheriffduty ouch
RyanVM|sheriffduty of course, we run some other jobs on linux64-ix slaves (i.e. Android x86 S4)
RyanVM|sheriffduty either way, I wonder if we could maybe rebalance that a bit
kmoir jmaher: not without changing the scheduler code. Also, scheduler coalescing was not implemented for talos, iirc
jmaher kmoir: ok- is that a hard thing to do? I have been patient for 6+ months to get talos e10s scheduled and we need it now- so it is enabled everywhere; I am not sure if this is something we can get in the releng work queue?
kmoir jmaher: I'll open a bug and see how much work it will be. In theory it shouldn't be that difficult
RyanVM|sheriffduty kmoir: our logic for periodic jobs only applies to builds?
RyanVM|sheriffduty i.e. we can't make talos-e10s a periodic job?
jmaher RyanVM|sheriffduty: I don't know; that would work as long as we can make them tier-1
kmoir hmm, let me look. I was just thinking of it from the test side
RyanVM|sheriffduty jmaher: I really do wonder what a little rebalancing would do for us - maybe even reimaging 10 or so
RyanVM|sheriffduty also, 4 of the 5 disabled slaves have been offline for more than a month now
jmaher that is a lot offline
RyanVM|sheriffduty i see armen has one for bug 1141416
bugbot Bug https://bugzilla.mozilla.org/show_bug.cgi?id=1141416 normal, --, ---, nobody, NEW , Fix the slaves broken by talos's inability to deploy an update
RyanVM|sheriffduty which *may* be the issue with the others as well
jmaher I think it is
RyanVM|sheriffduty looks like he's actively working on that now
RyanVM|sheriffduty so even 5 more would help
RyanVM|sheriffduty jmaher, kmoir: rebalancing seems like the most painless option if it can be done
RyanVM|sheriffduty maybe even 5 to start to see how linux64-ix wait times get impacted
Reporter
Updated•9 years ago
Assignee: nobody → kmoir
Comment 1•9 years ago
jmaher, RyanVM: is this still important?
Comment 2•9 years ago
I think this could help us reduce our load on win* machines if it was implemented right now. Ideally we would ensure all tests run every X pushes where X=1,2,3,etc. Maybe for now X=2 and we could save a lot of resources.
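As a minimal sketch of that idea (a hypothetical helper, not the actual scheduler code), the decision reduces to a modulus check on the push count:

# Hypothetical illustration of "run every X pushes"; not the real scheduler code.
def should_run_full_set(push_number, every_nth=2):
    """Run the full test set only on every Nth push (push_number is 0-based)."""
    return push_number % every_nth == 0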
Comment 3•9 years ago
Do we need any change on the SETA side, or just the Buildbot side?
Comment 4•9 years ago
Right now SETA has talos data integrated, but it is hardcoded, so we would need to edit the buildbot talos scheduler (which is different from the unittest scheduler).
Reporter
Comment 5•9 years ago
Yes, just change the talos scheduler. To be honest, I have not looked at this because my thinking was that we would be moving to TaskCluster, and thus it was not worth the effort to change the buildbot side of things. But this depends on the timelines for migration.
Comment 6•9 years ago
*Very* optimistically, we won't be able to migrate before Q4.
In any case, TaskCluster needs SETA support which we don't yet have.
Hopefully bug 1243123 makes it as a GSoC project, and in the blocked bugs we will add in-tree Buildbot scheduling via TC/BBB.
Comment 7•9 years ago
At the current rate of migration we are looking at a full calendar year at least (end of 2016). We could move that up if we used TaskCluster to schedule and the Buildbot Bridge to run everything.
Assignee
Comment 8•9 years ago
I took another look at this yesterday as a way to help with some of the HW capacity issues we've been having.
The changes I made result in these tests running all the time on mozilla-inbound (for 32-bit Windows):
'Windows 7 32-bit mozilla-inbound talos g2',
'Windows 7 32-bit mozilla-inbound talos g1',
'Windows 7 32-bit mozilla-inbound talos svgr',
'Windows 7 32-bit mozilla-inbound talos dromaeojs',
'Windows 7 32-bit mozilla-inbound talos other',
'Windows 7 32-bit mozilla-inbound talos chromez',
'Windows 7 32-bit mozilla-inbound talos tp5o',
'Windows 7 32-bit mozilla-inbound talos xperf',
'Windows XP 32-bit mozilla-inbound talos g2',
'Windows XP 32-bit mozilla-inbound talos g1',
'Windows XP 32-bit mozilla-inbound talos svgr',
'Windows XP 32-bit mozilla-inbound talos dromaeojs',
'Windows XP 32-bit mozilla-inbound talos other',
'Windows XP 32-bit mozilla-inbound talos chromez',
'Windows XP 32-bit mozilla-inbound talos tp5o',
And these tests running every 14 pushes, or 2 hours:
'Windows XP 32-bit mozilla-inbound talos chromez-e10s',
'Windows XP 32-bit mozilla-inbound talos g2-e10s',
'Windows XP 32-bit mozilla-inbound talos svgr-e10s',
'Windows XP 32-bit mozilla-inbound talos dromaeojs-e10s',
'Windows XP 32-bit mozilla-inbound talos other-e10s',
'Windows XP 32-bit mozilla-inbound talos tp5o-e10s',
'Windows XP 32-bit mozilla-inbound talos g1-e10s'
And these running every 7 pushes, or 1 hour:
'Windows 7 32-bit mozilla-inbound talos chromez-e10s',
'Windows 7 32-bit mozilla-inbound talos g2-e10s',
'Windows 7 32-bit mozilla-inbound talos svgr-e10s',
'Windows 7 32-bit mozilla-inbound talos dromaeojs-e10s',
'Windows 7 32-bit mozilla-inbound talos other-e10s',
'Windows 7 32-bit mozilla-inbound talos tp5o-e10s',
'Windows 7 32-bit mozilla-inbound talos xperf-e10s',
'Windows 7 32-bit mozilla-inbound talos g1-e10s'
Win8 64-bit and OS X 10.10 are also impacted by this change.
Does this look about right?
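For illustration, the skip settings amount to a per-suite pair of a push count and a timeout; below is only a rough sketch of that mapping (the skipcount/skiptimeout key names are assumptions modeled on the unittest SETA config, not copied from the patch):

# Illustrative sketch only; key names are assumed, not taken from the actual patch.
TALOS_SKIP_SETTINGS = {
    'win7-ix': {
        'talos': {'skipcount': 1, 'skiptimeout': 0},           # non-e10s: every push
        'talos-e10s': {'skipcount': 7, 'skiptimeout': 3600},   # every 7 pushes or 1 hour
    },
    'winxp-ix': {
        'talos': {'skipcount': 1, 'skiptimeout': 0},
        'talos-e10s': {'skipcount': 14, 'skiptimeout': 7200},  # every 14 pushes or 2 hours
    },
}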
Flags: needinfo?(kmoir)
Flags: needinfo?(jmaher)
Assignee
Comment 9•9 years ago
Assignee
Comment 10•9 years ago
Reporter
Comment 11•9 years ago
The first stanza looks right.
For
"And these tests running every 14 pushes, or 2 hours:"
and
"And these running every 7 pushes, or 1 hour:"
I don't understand why these are run less often, since they are not listed as tests that should run less frequently here:
http://alertmanager.allizom.org/data/setadetails/?date=2015-03-03&buildbot=1&branch=mozilla-inbound&inactive=1
Flags: needinfo?(kmoir)
Reporter
Comment 12•9 years ago
Actually, ignore my last comment: the SETA data link I had was for 2015, not 2016. The tests you indicate look fine with the correct link :-) moar caffeine
Comment 13•9 years ago
:catlee, thanks for looking into this! I am excited to see this change, could we add win8 in there as well to be scheduled as we are for win7?
One thought is we could go every other push by default, then do e10s on 7th/14th based on OS. That would reduce the load and probably end up not requiring any more work to narrow down regressions.
The biggest concern is that a regression won't show up until much later, especially in the 14-push/2-hour window. That is a full 24 hours.
Will this affect pgo scheduling? I would like to keep pgo the same as it is now.
Flags: needinfo?(jmaher)
Assignee
Comment 14•9 years ago
Win8 has similar changes, I just didn't call them out explicitly.
I can look at making it every other push by default. Would that be only on fx-team and inbound, or all branches?
PGO scheduling isn't impacted - those jobs happen as usual.
Comment 15•9 years ago
I think central should stay the same, but for inbound/fx-team we should have this intentional coalescing.
If you want me to define anything via the SETA API, that would be very doable- there is data in there already, but I could add priority fields, etc.
Assignee
Comment 16•9 years ago
Assignee
Comment 17•9 years ago
Comment on attachment 8726356 [details] [diff] [review]
Handle Linux64 data from SETA r=kmoir
Joel just added some Linux64 talos data to SETA, and it breaks our current configuration. I think it's because we're filtering out talos jobs, so we end up with no tests inside define_configs(), but we don't skip that platform in that case.
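A hypothetical sketch of the guard that would avoid this (platform_test_map and build_skip_config are placeholder names; the real define_configs() internals are not quoted here):

# Hypothetical sketch; not the actual define_configs() code.
for platform, platform_tests in platform_test_map.items():
    tests = [t for t in platform_tests if not test_exclusions.search(t)]
    if not tests:
        # Nothing left for this platform after filtering (e.g. a talos-only platform) -- skip it.
        continue
    build_skip_config(platform, tests)  # hypothetical downstream step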
Attachment #8726356 - Flags: review?(kmoir)
Reporter
Updated•9 years ago
Attachment #8726356 - Flags: review?(kmoir) → review+
Assignee
Comment 18•9 years ago
https://hg.mozilla.org/build/buildbot-configs/rev/8382286ea52cc334eba5c459f3cf19f4339a6cca
Bug 1192994 - Handle Linux64 data from SETA r=kmoir
Assignee
Comment 19•9 years ago
Assignee
Updated•9 years ago
Attachment #8726174 - Attachment is obsolete: true
Assignee
Comment 20•9 years ago
Comment on attachment 8726789 [details] [diff] [review]
adding talos support to seta (buildbotcustom)
This patch changes the generateBranchObjects function to look for talos suites in the platform's skipconfig data. We then collect the talos builder names, grouped by their skipconfig. Finally, we create schedulers for each group of talos builders with the same skipconfig.
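A rough sketch of that grouping step (variable and function names here are assumptions for illustration, not the real buildbotcustom code):

# Rough sketch only; names and structure are assumed, not the real generateBranchObjects code.
from collections import defaultdict

talos_by_skipconfig = defaultdict(list)
for builder_name, suite in talos_builders:              # hypothetical (builder name, talos suite) pairs
    skipconfig = platform_skipconfig.get(suite)         # hypothetical per-suite skip settings
    if skipconfig:
        key = tuple(sorted(skipconfig.items()))         # hashable key: same skipconfig -> same group
        talos_by_skipconfig[key].append(builder_name)

for key, builder_names in talos_by_skipconfig.items():
    make_skipping_scheduler(builder_names, dict(key))   # hypothetical scheduler factory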
Attachment #8726789 - Attachment description: adding talos support to seta → adding talos support to seta (buildbotcustom)
Attachment #8726789 - Flags: review?(kmoir)
Assignee
Updated•9 years ago
Attachment #8726194 - Attachment is obsolete: true
Reporter
Comment 21•9 years ago
Comment on attachment 8726789 [details] [diff] [review]
adding talos support to seta (buildbotcustom)
Looks good.
Will have to revise
test_exclusions = re.compile('\[funsize\]|\[TC\]|talos')
in config_seta.py so that talos is removed and the talos skipconfig definitions are added.
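The revised pattern would then presumably look something like this (an assumption about the final form, not quoted from the landed patch):

# Assumed final form of the exclusion regex; not quoted from the landed patch.
test_exclusions = re.compile(r'\[funsize\]|\[TC\]')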
Attachment #8726789 - Flags: review?(kmoir) → review+
Assignee
Comment 22•9 years ago
Assignee
Comment 23•9 years ago
Comment on attachment 8726844 [details] [diff] [review]
adding talos support to seta (buildbot-configs)
Minor changes required for buildbot-configs. I needed a dummy entry for ubuntu64_hw, since talos is the only thing that runs there.
Most of the changes to config_seta.py are cleanup, with the exception of the regex change where I stop skipping talos jobs.
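A guess at the shape of that dummy entry (the variable name is a placeholder, not the actual config_seta.py contents):

# Placeholder sketch; the name is not from the actual config_seta.py.
SETA_PLATFORM_CONFIGS['ubuntu64_hw'] = {}  # talos-only platform: no unittest skip data to add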
Attachment #8726844 - Attachment description: adding talos support to seta → adding talos support to seta (buildbot-configs)
Attachment #8726844 - Flags: review?(kmoir)
Reporter
Updated•9 years ago
Attachment #8726844 - Flags: review?(kmoir) → review+
Reporter
Updated•9 years ago
Assignee: kmoir → catlee
Assignee
Comment 24•9 years ago
https://hg.mozilla.org/build/buildbot-configs/rev/60bc7b8b426a21181082027ab6f3ef02cf40511c
Bug 1192994 - adding talos support to seta r=kmoir
Assignee
Updated•9 years ago
Attachment #8726789 - Flags: checked-in+
Assignee
Updated•9 years ago
Attachment #8726844 - Flags: checked-in+
Assignee
Comment 25•9 years ago
I think this is done.
bug 1255088 tracks cleaning up some of the talos suite configuration on various branches, and will allow us to have more control over talos with seta.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated•7 years ago
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard