add windows 10 machines to buildbot-configs so we can run new talos tests on there

RESOLVED FIXED

Status

Release Engineering
Platform Support
7 months ago
5 months ago

People

(Reporter: jmaher, Assigned: jmaher)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(4 attachments, 3 obsolete attachments)

Comment 1

7 months ago
Do we know what the machine naming scheme will be?
t-w1064-ix-NNNN.wintest.releng.scl3.mozilla.com
(Assignee)

Comment 3

7 months ago
I am familiar with using list_builder_differences to verify changes for scheduling new jobs, but I am not familiar with adding new machine names and platforms in buildbot-configs (or maybe even buildbotcustom).  If there is prior art for doing this, I would be happy to look at that as a starting point.

Comment 4

7 months ago
We previously removed win10 support from buildbot in bug 1330999. I would use those as a starting point.
(Assignee)

Comment 5

7 months ago
Created attachment 8869610 [details] [diff] [review]
add win10-ix as a platform- shift win8 talos tests to win10
Assignee: nobody → jmaher
Status: NEW → ASSIGNED
Attachment #8869610 - Flags: feedback?(catlee)
(Assignee)

Comment 6

7 months ago
Created attachment 8869611 [details]
buildbot differences for win10
(Assignee)

Comment 7

7 months ago
assuming my patch looks good, we can go ahead and schedule a time to replace win8 talos with win10; ideally this is something we can line up with reimaging machines.
(Assignee)

Comment 8

7 months ago
:catlee, I would like to know if this is a patch worth pursuing.  If you don't have time, could you redirect this to another buildbot hacker?  Getting this ready to land would help us move forward in finishing the win10 project.

Comment 9

7 months ago
Comment on attachment 8869610 [details] [diff] [review]
add win10-ix as a platform- shift win8 talos tests to win10

Review of attachment 8869610 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozilla-tests/config.py
@@ -152,5 @@
>      'config_file': 'talos/windows_config.py',
>  }
>  
>  PLATFORMS['win64']['slave_platforms'] = ['win8_64']
> -PLATFORMS['win64']['talos_slave_platforms'] = ['win8_64']

Will we want to make a hard transition from win8 to win10 talos testing?
Attachment #8869610 - Flags: feedback?(catlee) → feedback+
(Assignee)

Comment 10

7 months ago
in addition to buildbot-configs, we need support for slavehealth/slavealloc/puppet.

I see a puppet patch when win10 was removed:
https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=1330999&attachment=8827909

there is also a cloudtools patch:
https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=1330999&attachment=8827910

but I am not sure what slavehealth/slaveconfig is, is that cloud-tools?
Flags: needinfo?(catlee)
buildduty can add the entries to slavealloc. I'm not sure about how machines get added to slavehealth. Alin, can you help Joel out?
Flags: needinfo?(catlee) → needinfo?(aselagea)
(Assignee)

Comment 12

7 months ago
Created attachment 8871284 [details] [diff] [review]
add windows 10 ix to buildbot configs

the plan here is to turn off win8 and turn on win10 at the same time.  If there are problems with that plan, let me know and I can do this in 2 stages.
Attachment #8869610 - Attachment is obsolete: true
Attachment #8871284 - Flags: review?(kmoir)
(Assignee)

Comment 13

7 months ago
Created attachment 8871285 [details] [diff] [review]
add windows 10 ix to puppet

support for windows 10 ix hardware inside of puppet.
Attachment #8871285 - Flags: review?(kmoir)
(In reply to Chris AtLee [:catlee] from comment #11)
> buildduty can add the entries to slavealloc. I'm not sure about how machines
> get added to slavehealth. Alin, can you help Joel out?

Yeah, I can take care of those.
Flags: needinfo?(aselagea)
According to https://bugzilla.mozilla.org/show_bug.cgi?id=1367102#c4, we're going to enable 75 Win 10 machines at this point.   
Added those to slavealloc.

mysql> select count(*) from slaves where name like 't-w1064-ix%';
+----------+
| count(*) |
+----------+
|       75 |
+----------+
1 row in set (0.00 sec)

Comment 16

7 months ago
Comment on attachment 8871285 [details] [diff] [review]
add windows 10 ix to puppet

Do we need to include

  $slave_trustlevel = 'try'

here?

Comment 17

7 months ago
Comment on attachment 8871284 [details] [diff] [review]
add windows 10 ix to buildbot configs

I think this is fine except for

PLATFORMS['win64-devedition']['win10_64_devedition'] = {'name': 'Windows 10 64-bit DevEdition',
+                                                       'try_by_default': True}


'try_by_default': True should be False

we only run these tests on beta
Attachment #8871284 - Flags: review?(kmoir) → review+
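
For reference, a minimal sketch of the entry with the requested change applied. The surrounding PLATFORMS dict is stubbed out here so the fragment runs on its own; that stub is an assumption, and the real structure in buildbot-configs carries many more platforms and keys.

```python
# Stub of the PLATFORMS structure (assumption: the real dict in
# buildbot-configs has many more platforms and keys than shown here).
PLATFORMS = {'win64-devedition': {}}

# The entry from the review, with try_by_default flipped to False
# because these tests only run on beta.
PLATFORMS['win64-devedition']['win10_64_devedition'] = {
    'name': 'Windows 10 64-bit DevEdition',
    'try_by_default': False,
}
```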
(Assignee)

Comment 18

7 months ago
thanks!  I think with the two patches attached here, we will be all set.  I assume the puppet patch can land sooner rather than later, then the buildbot-config patch when we start shutting off win8 machines.
For the slave_health part, I simply reverted Coop's patch which actually disabled win10:
https://hg.mozilla.org/build/slave_health/rev/ed1e646be536
manifests/moco-nodes.pp should not have any node definitions for w10, since we are using GPO and AD.
(Assignee)

Comment 21

7 months ago
Created attachment 8871312 [details] [diff] [review]
add windows 10 ix to puppet

removed the moco-nodes.pp changes.
Attachment #8871285 - Attachment is obsolete: true
Attachment #8871285 - Flags: review?(kmoir)
Attachment #8871312 - Flags: review+
(Assignee)

Comment 22

7 months ago
Comment on attachment 8871312 [details] [diff] [review]
add windows 10 ix to puppet

sorry, this was not r+ from :kmoir already; the question about $slave_trustlevel = 'try' seems to be resolved by removing the changes to moco-nodes.pp
Attachment #8871312 - Flags: review+ → review?(kmoir)
(Assignee)

Comment 23

7 months ago
Created attachment 8871313 [details] [diff] [review]
add windows 10 ix to buildbot configs

updated patch to set try_by_default to False for win10 devedition.  thanks for the review
Attachment #8869611 - Attachment is obsolete: true
Attachment #8871313 - Flags: review+
One note here that I made in bug 1367102, the host regex is t-w1064-ix-NNN.wintest.releng.scl3.mozilla.com (3 digits instead of 4).
Blocks: 1367102
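
A quick sketch of what a check against the 3-digit naming scheme could look like. The pattern below is written from the hostname format quoted above; the helper name `is_win10_ix_host` is made up for illustration and is not taken from any of the patches.

```python
import re

# Three digits, per the note above (not four as in the earlier comment).
WIN10_IX_HOST_RE = re.compile(
    r"^t-w1064-ix-\d{3}\.wintest\.releng\.scl3\.mozilla\.com$"
)


def is_win10_ix_host(hostname):
    """Return True if hostname matches the 3-digit win10 ix scheme."""
    return WIN10_IX_HOST_RE.match(hostname) is not None
```

With this pattern, t-w1064-ix-075.wintest.releng.scl3.mozilla.com matches, while a 4-digit name such as t-w1064-ix-0075.wintest.releng.scl3.mozilla.com does not.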

Updated

7 months ago
Attachment #8871312 - Flags: review?(kmoir) → review+
Did a bit of research on what's needed in Treeherder for the new jobs to show up, and I think we have everything in place from our previous setup to run Win 10 tests.

https://github.com/mozilla/treeherder/blob/master/ui/js/values.js#L38
https://github.com/mozilla/treeherder/blob/master/treeherder/etl/buildbot.py#L279

A test is also added:
https://github.com/mozilla/treeherder/blob/master/tests/etl/test_buildbot.py#L1018
Comment on attachment 8871312 [details] [diff] [review]
add windows 10 ix to puppet

https://hg.mozilla.org/build/puppet/rev/66f603ea69af
https://hg.mozilla.org/build/puppet/rev/868762962e40
Attachment #8871312 - Flags: checked-in+

Comment 27

7 months ago
We ran into several problems with this deploy from the releng side of things.  There were also relops issues, but I'll address those in their bug.

There were two main problems:
1) New w10 machines could not connect to buildbot masters
2) Huge windows pending counts were triggered

New w10 machines could not connect to buildbot masters
1) The initial reconfig failed because the win10 devedition key was missing in puppet.  There were also Windows EOL characters in the patch; I'm not sure if they caused an issue, but I removed them as well.
I deployed this fix
https://hg.mozilla.org/build/puppet/rev/3f09b62b7c30
2) The puppet patch landed, but a new reconfig was not triggered because the reconfig script did not see a version change since the last run, which had failed (bug 1369164)
3) I triggered a reconfig and machines could connect

Huge windows pending counts were triggered
When we enabled w10 as a platform there was a huge increase in pending counts for w7 and w10 jobs.  We have seen this happen before when adding a new platform.  I opened bug 1369157 to investigate the root cause.

Alin fixed the db issues as well
Alin, can you include the db queries/updates you used to fix the issue on this bug?  I looked in the mysql console history but you must have attached to the db from a different machine than I did.
Flags: needinfo?(aselagea)

Comment 28

6 months ago
Created attachment 8873465 [details] [diff] [review]
bug1366029range.patch

noticed this alert because the range is not quite right

[sns alert] Jun 01 08:00:02 buildbot-master119.bb.releng.scl3.mozilla.com watch_twistd_log.py: Count: 372 | First instance: 2017-06-01 07:38:09-0700 | Most recent instance: 2017-06-01 08:00:00-0700 | Twistd exception: twisted.cred.error.UnauthorizedLogin - t-w1064-ix-075.wintest.releng.scl3.mozilla.com 10.26.42.97
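
One way a range can end up "not quite right" is an exclusive upper bound that drops the last host, which would fit the alert above for t-w1064-ix-075. That cause is an assumption, not something confirmed by the patch. A sketch of generating the names with an inclusive upper bound (the helper `slave_names` and the numbering starting at 001 are both hypothetical):

```python
def slave_names(prefix, first, last):
    """Return zero-padded names for hosts first..last, both inclusive."""
    # range() excludes its stop value, so add 1 to include `last`.
    return ["%s-%03d" % (prefix, i) for i in range(first, last + 1)]


# Assumption: the 75 machines are numbered 001 through 075.
WIN10_IX_SLAVES = slave_names("t-w1064-ix", 1, 75)
```

Here the list has 75 entries and ends with t-w1064-ix-075; using range(1, 75) instead would stop at 074.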

Updated

6 months ago
Attachment #8873465 - Flags: checked-in+
(In reply to Kim Moir [:kmoir] from comment #27)

> Alin, can you include the db queries/updates you used to fix the issue on
> this bug.  I looked in the mysql console history but you must have attached
> to the db from a different machine than I did.

I first created a temporary table to store the IDs of all build requests that were submitted *after* May 31 07:00 PDT, but corresponding to changes that were done *before* May 31 07:00 PDT.

create temporary table ids
  select buildrequests.id
  from buildrequests, buildsets, sourcestamp_changes, changes
  where changes.changeid = sourcestamp_changes.changeid
    and sourcestamp_changes.sourcestampid = buildsets.sourcestampid
    and buildrequests.buildsetid = buildsets.id
    and buildrequests.complete = 0
    and buildrequests.claimed_at = 0
    and buildername like 'Windows%'
    and buildrequests.submitted_at > 1496214000
    and changes.when_timestamp < 1496214000;

I then simply marked those jobs as completed.

update buildrequests, ids set complete=1, results=2, complete_at=1496223480 where buildrequests.id=ids.id and complete=0 and claimed_at=0;
Flags: needinfo?(aselagea)
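
A small, self-contained illustration of the same two-step approach, using Python's sqlite3 against an in-memory table. Assumptions: the schema is flattened to a single hypothetical buildrequests table (the change_when column stands in for the join against changes.when_timestamp), the sample rows are made up, and SQLite syntax replaces the MySQL multi-table update. This is not the code that was run on the production db.

```python
import sqlite3

CUTOFF = 1496214000       # cutoff value used in the queries above
COMPLETE_AT = 1496223480  # completion timestamp used in the update above

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE buildrequests (
        id INTEGER PRIMARY KEY,
        buildername TEXT,
        submitted_at INTEGER,
        change_when INTEGER,  -- hypothetical stand-in for changes.when_timestamp
        complete INTEGER DEFAULT 0,
        claimed_at INTEGER DEFAULT 0,
        results INTEGER,
        complete_at INTEGER
    );
    CREATE TEMPORARY TABLE ids (id INTEGER PRIMARY KEY);
""")

# Made-up sample rows: only request 1 is a Windows job submitted after the
# cutoff for a change that landed before the cutoff.
conn.executemany(
    "INSERT INTO buildrequests (id, buildername, submitted_at, change_when) "
    "VALUES (?, ?, ?, ?)",
    [
        (1, "Windows 10 64-bit talos", CUTOFF + 60, CUTOFF - 3600),
        (2, "Windows 10 64-bit talos", CUTOFF + 60, CUTOFF + 30),
        (3, "Ubuntu talos", CUTOFF + 60, CUTOFF - 3600),
    ],
)

# Step 1: collect the IDs of the pending requests to clear.
conn.execute(
    "INSERT INTO ids SELECT id FROM buildrequests "
    "WHERE complete = 0 AND claimed_at = 0 "
    "AND buildername LIKE 'Windows%' "
    "AND submitted_at > ? AND change_when < ?",
    (CUTOFF, CUTOFF),
)

# Step 2: mark only those requests as completed.
conn.execute(
    "UPDATE buildrequests SET complete = 1, results = 2, complete_at = ? "
    "WHERE id IN (SELECT id FROM ids) AND complete = 0 AND claimed_at = 0",
    (COMPLETE_AT,),
)

completed_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM buildrequests WHERE complete = 1 ORDER BY id"
    )
]
```

Collecting the IDs first makes it possible to inspect or count the selection before touching any rows, which is the main benefit of doing this in two steps.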
(Assignee)

Updated

5 months ago
Status: ASSIGNED → RESOLVED
Last Resolved: 5 months ago
Resolution: --- → FIXED