Closed Bug 1126493 Opened 5 years ago Closed 5 years ago

rollout 10.10 tests in a way that doesn't impact wait times

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

x86
macOS
task
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kmoir, Assigned: kmoir)

Attachments

(13 files, 14 obsolete files)

2.97 KB, patch (catlee: review+; kmoir: checked-in+)
3.14 KB, patch (coop: review+)
65.29 KB, text/plain
5.46 KB, patch (kmoir: checked-in+)
2.27 KB, text/plain
5.94 KB, patch (coop: review+; kmoir: checked-in+)
3.00 KB, patch (coop: review+)
54.69 KB, patch
4.37 KB, patch (kmoir: checked-in+)
42.11 KB, text/plain
2.26 KB, patch (coop: review+; kmoir: checked-in+)
6.63 KB, text/plain
890 bytes, patch (coop: review+; kmoir: checked-in+)

We aren't in the state yet where we can enable 10.10 tests on trunk (see bug 1121199 for status on greening up tests) but I thought I'd open this bug to get things started.

We don't currently have enough capacity to run tests on 10.10 and 10.8.  10.10 is the dominant platform compared to 10.8 for our users now.  Jake is going to find out when the ~25 used minis that Amy ordered are due to arrive and how much time dcops needs to get them up and running.  There is also the time needed to image them and enter their records into inventory etc.

So the question is:
Once we have 10.10 tests in a greener state, my thought is to disable the 10.8 tests on trunk and enable the corresponding 10.10 tests so wait times in the 10.8 pool don't get worse. Also, if the new minis are delayed in arriving, we can still stand up the 10.10 tests. If they do arrive on time, this plan still makes sense given the current long wait times for 10.8. Thoughts?
Assignee: nobody → kmoir
Attached patch bug1125998.patch (obsolete) — Splinter Review
Talked to coop about this a bit in our 1x1 today; he suggested ensuring we have green tests on aurora before enabling them on trunk branches etc. In any case, here is a patch for when we get to that state.
Attached file bug1125998builder.diff (obsolete) —
builder diff
Approval at the Cxx level is still pending before we purchase the mac minis.
Blocks: 1118183
Attached patch bug1126493-2.patch (obsolete) — Splinter Review
Patch to enable opt tests on 10.10 on trunk while keeping debug tests running on 10.8 on these branches, as suggested here:

https://bugzilla.mozilla.org/show_bug.cgi?id=1131269#c4
Attachment #8556041 - Attachment is obsolete: true
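A minimal sketch of the split described in this comment, for illustration only: on trunk, opt tests go to 10.10 while debug tests stay on 10.8. The branch list, the `assign_mac_platforms` helper and the returned dict shape are hypothetical and do not mirror the real buildbot-configs structures.

```python
# Hypothetical sketch: which macOS slave platform runs each build type.
# The branch names and the dict shape are illustrative assumptions,
# not the real buildbot-configs data structure.
TRUNK_BRANCHES = ("mozilla-central", "mozilla-inbound", "fx-team")


def assign_mac_platforms(branches, trunk=TRUNK_BRANCHES):
    """Map each branch to {build_type: slave_platform}.

    Trunk branches move opt tests to 10.10 (yosemite) and keep debug
    tests on 10.8 (mountainlion); every other branch stays on 10.8.
    """
    plan = {}
    for branch in branches:
        if branch in trunk:
            plan[branch] = {"opt": "yosemite", "debug": "mountainlion"}
        else:
            plan[branch] = {"opt": "mountainlion", "debug": "mountainlion"}
    return plan


if __name__ == "__main__":
    plan = assign_mac_platforms(["mozilla-central", "mozilla-aurora"])
    for name in sorted(plan):
        print(name, plan[name])
```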
Attached file bug1126493builder.diff (obsolete) —
builder diff 

Still have to figure out a way to disable talos on 10.8 and enable it on 10.10.
Attachment #8556042 - Attachment is obsolete: true
Was talking to coop today about how to reduce our load on mountainlion temporarily, given that it looks like we don't have many new minis for 10.10.

One way is to reduce the frequency at which they run so we can pull some minis and reimage them for 10.10.

This is an initial patch; if it doesn't change the load much, we can also reduce the frequency on other branches.
Attachment #8563050 - Flags: review?(catlee)
Attachment #8563050 - Flags: review?(catlee) → review+
Attachment #8563050 - Flags: checked-in+
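One way to picture the frequency reduction is a hypothetical every-Nth-push rule. The helpers below (`should_schedule`, `pushes_skipped`) and the rule itself are assumptions for illustration; the contents of attachment 8563050 are not shown here and it may throttle scheduling differently.

```python
# Hypothetical sketch of "run 10.8 tests less often": schedule them on
# every Nth push only. The helper names and the every-Nth-push rule are
# assumptions; the actual patch may throttle differently.


def should_schedule(push_number, every_n):
    """Return True if the 10.8 tests should run for this push."""
    if every_n < 1:
        raise ValueError("every_n must be >= 1")
    return push_number % every_n == 0


def pushes_skipped(total_pushes, every_n):
    """Count pushes (numbered 0..total_pushes-1) that get no 10.8 tests."""
    scheduled = sum(1 for p in range(total_pushes) if should_schedule(p, every_n))
    return total_pushes - scheduled


if __name__ == "__main__":
    # With every_n=3, 10 of 30 pushes run the tests and 20 are skipped.
    print(pushes_skipped(30, 3))
```

The trade-off is that a regression on a skipped push is only caught on the next scheduled one, which is why starting with a single branch and measuring the load first seems prudent.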
Attached patch bug1126493-3.patch (obsolete) — Splinter Review
Attachment #8563004 - Attachment is obsolete: true
Attached patch bug1126493builder.diff (obsolete) — Splinter Review
Attachment #8563010 - Attachment is obsolete: true
Comment on attachment 8565977 [details] [diff] [review]
bug1126493-3.patch

This patch is for enabling 10.10 opt tests on trunk once we have the machines reimaged in bug 1134223
Attachment #8565977 - Flags: review?(coop)
Depends on: 1134223
Comment on attachment 8565977 [details] [diff] [review]
bug1126493-3.patch

Review of attachment 8565977 [details] [diff] [review]:
-----------------------------------------------------------------

An overarching comment for this section indicating what our current plan is for 10.10 testing would not be amiss.

r+ with nits fixed.

::: mozilla-tests/config.py
@@ +2177,5 @@
>                  tests[3] = [x for x in tests[3] if x not in platforms_for_os or x in enabled_platforms_for_os]
>                  BRANCHES[branch]['%s_tests' % s] = tuple(tests)
>  
> +# bug 1126493 Enable Yosemite testing on select branches only
> +exclude_yosemite = ['try']

I think this var is poorly named. Based on the comment, it should be either include_yosemite or exclude_mountainlion.

@@ +2192,5 @@
> +if len(exclude_yosemite) > 0:
> +    delete_slave_platform(BRANCHES, PLATFORMS, {'macosx64': 'yosemite'}, branch_exclusions=exclude_yosemite)
> +
> +#opt and talos can be disabled on branches where they are enabled on 10.10
> +for branch in exclude_yosemite:

Could you nest this under the |if len(exclude_yosemite) > 0:| conditional above for clarity?
Attachment #8565977 - Flags: review?(coop) → review+
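A simplified sketch of the config hunk with both review nits applied, assuming a toy per-branch dict: the list is renamed to `include_yosemite` (it holds the branches that keep 10.10), and the 10.8 clean-up loop is nested under the same conditional. `make_branch` and `remove_slave_platform_except` are hypothetical stand-ins; the real `delete_slave_platform` takes different arguments, as the hunk above shows.

```python
# Hypothetical, simplified model of the review feedback. The config shape
# and both helpers are illustrative stand-ins, not buildbot-configs code.


def make_branch():
    """Build a toy per-branch config with both Mac slave platforms."""
    return {
        "slave_platforms": ["mountainlion", "yosemite"],
        "mountainlion": {"opt_suites": ["mochitest-1", "xpcshell"], "talos": True},
        "yosemite": {"opt_suites": ["mochitest-1", "xpcshell"], "talos": True},
    }


def remove_slave_platform_except(branches, slave_platform, keep_on):
    """Drop slave_platform from every branch not listed in keep_on."""
    for name, cfg in branches.items():
        if name not in keep_on and slave_platform in cfg["slave_platforms"]:
            cfg["slave_platforms"].remove(slave_platform)


def apply_yosemite_rollout(branches, include_yosemite):
    """Keep 10.10 only on include_yosemite and trim 10.8 there."""
    if include_yosemite:
        remove_slave_platform_except(branches, "yosemite", include_yosemite)
        # opt and talos can be disabled on 10.8 where they run on 10.10
        for branch in include_yosemite:
            if branch in branches:
                mountainlion = branches[branch]["mountainlion"]
                mountainlion["opt_suites"] = []
                mountainlion["talos"] = False
    return branches


if __name__ == "__main__":
    branches = {n: make_branch() for n in ("try", "mozilla-central", "mozilla-beta")}
    apply_yosemite_rollout(branches, ["try", "mozilla-central"])
    for name in sorted(branches):
        print(name, branches[name]["slave_platforms"])
```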
Attached patch bug1126493-4.patch (obsolete) — Splinter Review
fixed with feedback from review
Attached patch bug1126493-5.patch (obsolete) — Splinter Review
Enable on 39 instead, since the merge happened. Will run more tests to see the current state of aurora.
Attachment #8566176 - Attachment is obsolete: true
Depends on: 1137749
So I talked to the sheriffs today, and they want a 50/50 split between the 10.8 and 10.10 slave pools before 10.10 is enabled on trunk. So I opened bug 1137749 with relops to reimage some more machines. We will do this on Monday so we can get the patches for talos on graph servers enabled etc.

Also, I'm working on a patch to disable the 10.10 talos tests that erroneously get enabled on branches where we don't have other 10.10 tests running.
much better patch - ensures that talos tests that are enabled on 10.10 are removed on 10.8.  Also ensures talos tests are not enabled on m-a and m-b
will attach builder diff
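A rough sketch of the talos rule described in this comment, under assumed names: each suite runs on exactly one Mac platform per branch, and 10.10 talos is never enabled on m-a (mozilla-aurora) or m-b (mozilla-beta). `split_talos` and `NO_YOSEMITE_TALOS` are hypothetical, and the suite names in the usage are placeholders.

```python
# Hypothetical sketch of the talos rule; names are assumptions, not the
# real buildbot-configs code.
NO_YOSEMITE_TALOS = frozenset({"mozilla-aurora", "mozilla-beta"})


def split_talos(branch, suites, yosemite_branches):
    """Return (mountainlion_suites, yosemite_suites) for one branch."""
    if branch in yosemite_branches and branch not in NO_YOSEMITE_TALOS:
        yosemite = list(suites)
    else:
        yosemite = []
    # Anything enabled on 10.10 is removed from 10.8.
    mountainlion = [s for s in suites if s not in yosemite]
    return mountainlion, yosemite


if __name__ == "__main__":
    print(split_talos("mozilla-central", ["tp5o", "dromaeojs"], {"mozilla-central"}))
    print(split_talos("mozilla-aurora", ["tp5o"], {"mozilla-aurora"}))
```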
Attachment #8565977 - Attachment is obsolete: true
Attachment #8569940 - Attachment is obsolete: true
Attachment #8570671 - Flags: review?(coop)
Attached file 10.10builder.diff
Attachment #8565978 - Attachment is obsolete: true
Comment on attachment 8570671 [details] [diff] [review]
bug1126493-6.patch

Review of attachment 8570671 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozilla-tests/config.py
@@ +2192,5 @@
> +            if slave_platform not in ['mountainlion', 'yosemite']:
> +                continue
> +            if name not in include_yosemite:
> +                include_yosemite.append(name)
> +if len(include_yosemite) > 0:

You shouldn't need this condition if you are setting |include_yosemite = ['try']| above.
Attachment #8570671 - Flags: review?(coop) → review+
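The reviewer's point can be illustrated with a hypothetical reconstruction of the loop in the hunk (the `{branch: [slave platforms]}` input shape is an assumption, since the surrounding loop is not shown): because the list is seeded with `'try'`, it is never empty, so an `if len(include_yosemite) > 0:` guard is always true.

```python
# Hypothetical reconstruction of the reviewed loop; the input shape is an
# assumption about code not shown in the hunk.


def collect_yosemite_branches(branch_slave_platforms):
    """Return branches that should get 10.10, always starting with try."""
    include_yosemite = ["try"]  # seeded, so the list is never empty
    for name, slave_platforms in branch_slave_platforms.items():
        for slave_platform in slave_platforms:
            if slave_platform not in ["mountainlion", "yosemite"]:
                continue
            if name not in include_yosemite:
                include_yosemite.append(name)
    return include_yosemite


if __name__ == "__main__":
    print(collect_yosemite_branches({
        "mozilla-central": ["snowleopard", "mountainlion"],
        "mozilla-esr31": ["snowleopard"],
    }))
```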
Attached patch bug1126493-7.patch (obsolete) — Splinter Review
patch with review comment incorporated
Attached patch bug1126493-7.patch (obsolete) — Splinter Review
Attachment #8571361 - Attachment is obsolete: true
Attached patch bug1126493debug.patch (obsolete) — Splinter Review
Patch to enable debug tests; work in bug 1125998 looks promising for reducing the time needed to run debug tests.
Attached file builder1126493debug.diff (obsolete) —
builder diff for patch to enable debug tests on 10.10 and disable on 10.8
We were going to deploy this today, but arr noticed an issue with slaveapi and reboots of yosemite slaves. dividehex is investigating a fix. We are planning to enable 10.10 on trunk for opt tests tomorrow. I'll disable the next batch of slaves starting at noon Eastern (9 am Pacific).
unbitrotten patch
Attachment #8571375 - Attachment is obsolete: true
Attachment #8571960 - Flags: checked-in+
We're seeing 10.10 debug M(JP) and M(gl) jobs being scheduled on trunk. I've hidden M(JP) for now (and probably will hide gl too), but we should really kill those ASAP since we don't have the capacity to spare right now.
Flags: needinfo?(kmoir)
Job names are:
Rev5 MacOSX Yosemite 10.10 mozilla-central debug test mochitest-jetpack
Rev5 MacOSX Yosemite 10.10 mozilla-central debug test mochitest-gl

Presumably happening on other branches as well, but they haven't been enabled in production long enough to confirm that yet.
Thanks RyanVM I'll write a patch to disable them.
Flags: needinfo?(kmoir)
Attached patch bug1126493disablegljp.patch (obsolete) — Splinter Review
will attach builder diff
Attachment #8572310 - Flags: review?(coop)
patch to disable mochitest-gl and mochitest jetpack on debug for 10.10
Comment on attachment 8572312 [details]
bug1126493disablegljpbuilder.diff

previous comment should have read builder diff to disable mochitest-gl and mochitest jetpack on debug for 10.10
There is an issue on the masters where 10.8 talos jobs are still being scheduled. I had addressed this on my dev-master, but when I made a new patch this morning to account for bitrot, the ordering was lost: the part that removes mountainlion as a talos slave platform on trunk moved to after the talos tests were loaded. Anyway, this patch fixes that and also removes m-jp and m-gl on debug for 10.10.
Attachment #8572310 - Attachment is obsolete: true
Attachment #8572310 - Flags: review?(coop)
Attachment #8572408 - Flags: review?(coop)
Attachment #8572408 - Flags: review?(coop) → review+
Attachment #8572408 - Flags: checked-in+
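A toy sketch of why the ordering mattered, with hypothetical names: builders are created from whatever talos slave platforms are listed at load time, so removing mountainlion after loading leaves stale 10.8 talos builders. The same sketch shows dropping M(gl) and M(JP) from 10.10 debug suites. None of these helpers are the real buildbot-configs functions.

```python
# Hypothetical sketch of the ordering bug; names and shapes are assumptions.
YOSEMITE_DEBUG_SKIP = {"mochitest-gl", "mochitest-jetpack"}


def load_talos_builders(talos_slave_platforms):
    """Create one toy builder name per slave platform listed right now."""
    return ["talos-%s" % sp for sp in talos_slave_platforms]


def configure_trunk_talos(remove_first):
    """Return the talos builders produced with the given ordering."""
    platforms = ["mountainlion", "yosemite"]
    if remove_first:
        platforms.remove("mountainlion")
        return load_talos_builders(platforms)
    builders = load_talos_builders(platforms)  # too early: 10.8 still listed
    platforms.remove("mountainlion")
    return builders


def yosemite_debug_suites(suites):
    """Drop M(gl) and M(JP) from the 10.10 debug test suites."""
    return [s for s in suites if s not in YOSEMITE_DEBUG_SKIP]


if __name__ == "__main__":
    print(configure_trunk_talos(remove_first=False))
    print(configure_trunk_talos(remove_first=True))
```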
Patch to enable 10.10 tests on debug and m-a; the only remaining 10.8 tests would be on branches where Gecko < 37. We are not ready to land this patch yet because of bug 1125998 (fix debug tests so they run in a reasonable timeframe) and bug 1139002 (green up tests on aurora).

Will attach a builder diff
Attachment #8572661 - Flags: review?(coop)
Attachment #8571385 - Attachment is obsolete: true
Attachment #8571389 - Attachment is obsolete: true
Attachment #8572661 - Flags: review?(coop) → review+
Once bug 1137963 is resolved (and a few tests are fixed), I should be able to enable 10.10 tests on aurora and debug tests on trunk. At the same time, I'll disable the corresponding 10.8 tests. At that point, I'd like to reimage another 25 10.8 machines as 10.10. This will leave us with 25 10.8 machines for beta and release tests.

Once 10.10 tests are on beta, I'd like to reimage 15 more machines so there are only 10 10.8 test machines left for release builds.  :rail, does this seem reasonable to you?  I thought I'd ask since you are on release duty.
Flags: needinfo?(rail)
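The machine counts in this plan work out as below. The starting pool of 50 10.8 machines is an inference from "reimage another 25 ... leave us with 25" rather than a number stated in the bug, and `reimage` is a hypothetical helper.

```python
# Sketch of the reimaging plan arithmetic. The starting size of 50 10.8
# machines is inferred, not stated directly in the bug.


def reimage(pool_10_8, count):
    """Return the 10.8 pool size after reimaging `count` machines as 10.10."""
    if count > pool_10_8:
        raise ValueError("cannot reimage more machines than the pool holds")
    return pool_10_8 - count


after_aurora = reimage(50, 25)          # 10.10 on aurora and trunk debug
after_beta = reimage(after_aurora, 15)  # 10.10 on beta
print(after_aurora, after_beta)         # 25 10
```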
We talked about this on IRC a bit. Now that all 10.8 jobs have been shut off on the B2G release branches, that'll leave us just needing to support 10.8 on mozilla-release and mozilla-esr31 once Gecko 38 hits beta. We run about 30 10.8 test jobs per push. In general, Windows PGO builds/tests are the long pole for overall run to completion time on the release branches, and neither of those branches generally see significant push volume on a daily basis.

Assuming we have the ability to reimage from 10.10 back to 10.8 if the wait times are too long, could we even try leaving only 5 10.8 slaves as a first shot? We'll still have 10.6 tests for faster turnaround too. If that proves too few, maybe go up to 10. But overall, I think we should err on the side of 10.10 capacity as much as we can get away with.
removing rail needinfo since RyanVM responded to question
Flags: needinfo?(rail)
Blocks: 1140246
Attached patch bug1126493debugonly.patch (obsolete) — Splinter Review
Patch to enable 10.10 on debug only, since the patch to fix debug tests hasn't landed on aurora yet.
Changed the patch so all 10.8 tests remain an option on try.
Attachment #8576063 - Attachment is obsolete: true
Attached file 10.10builder.diff
builder diff for latest patch to enable debug on trunk + keep 10.8 running on try
Attachment #8576239 - Attachment is patch: false
Attachment #8576235 - Flags: checked-in+
10.10 debug tests are now running on trunk, and there are now 25 additional 10.10 machines that we enabled in bug 1140246.
Depends on: 1143165
We can enable 10.10 on aurora now that the patch in bug 1138616 has been uplifted.
Attachment #8578022 - Flags: review?(coop)
Attached file bug1126493aurora.diff
builder diff from previous patch
Attachment #8578022 - Flags: review?(coop) → review+
From IRC:
philor	kmoir: and speaking of disused filing cabinets in the basement, https://treeherder.mozilla.org/#/jobs?repo=addon-sdk is still using 10.8 when it should have switched to 10.10 along with the trunk
kmoir	philor: okay didn't know about that, will look at that
philor	kmoir: it hides in two places, weird conditionals in http://mxr.mozilla.org/build/source/tools/buildfarm/utils/run_jetpack.py?force=1 and... somewhere in configs, can't remember where

I have patches to fix, will attach
switch jetpack to use yosemite instead of mtnlion
builder diff

(test16)[kmoir@dev-master2.bb.releng.use1.mozilla.com test16]$  diff old2 new2
3479,3480c3479,3480
< jetpack-fx-team-mountainlion-opt ScriptFactory
< jetpack-fx-team-mountainlion-debug ScriptFactory
---
> jetpack-fx-team-yosemite-opt ScriptFactory
> jetpack-fx-team-yosemite-debug ScriptFactory
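The builder names in this diff appear to follow a `jetpack-<branch>-<slave platform>-<build type>` pattern. A hypothetical helper that reproduces the four names shown (the pattern is inferred from the diff, not taken from the config code):

```python
# Hypothetical sketch: naming pattern inferred from the four builder names
# in the diff above, not from the actual buildbot-configs code.


def jetpack_builder_names(branch, slave_platform, build_types=("opt", "debug")):
    """Return jetpack builder names for one branch and slave platform."""
    return ["jetpack-%s-%s-%s" % (branch, slave_platform, bt) for bt in build_types]


old = jetpack_builder_names("fx-team", "mountainlion")
new = jetpack_builder_names("fx-team", "yosemite")
for before, after in zip(old, new):
    print(before, "->", after)
```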
Attachment #8578212 - Flags: review?(coop)
Attachment #8578212 - Flags: review?(coop) → review+
Attachment #8578212 - Flags: checked-in+
Attachment #8578022 - Flags: checked-in+
Closing this bug; thanks, everyone, for your help.

On the next merge day, we'll have to enable talos yosemite tests on beta; the patch is in bug 1144102.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard