install vs2013 across all our win64 machines and add a junction link for it in our GPO

Status: RESOLVED FIXED
Product: Infrastructure & Operations
Component: RelOps
Reported: 3 years ago
Last modified: 3 years ago
People: (Reporter: jlund, Assigned: Q)
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/795] [time=20:00]

(Reporter)

Description

3 years ago
like ix-mn-w0864-001,

we are going to need vs2013 installed across all of our win64 machines so that we can build ff32 and ff64 against that compiler.

also like ix-mn-w0864-001, we will want to keep vs2010 installed so we can ride trains and other support.

Alongside this, our automation mozconfigs will be pointing to /c/tools/vs2013 for the Visual Studio 2013 compiler path, so we will need a junction link added to our GPO:
mklink /j c:\tools\vs2013 "C:\Program Files (x86)\Microsoft Visual Studio 12.0"

q: does the above sound OK? see https://bugzilla.mozilla.org/show_bug.cgi?id=1009807#c12 for more details.
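Once the GPO applies, verifying that the junction resolves to the VS 12.0 install could be scripted. A minimal sketch, assuming a hypothetical helper `link_points_to` (not part of the GPO; on the slaves the actual check was just inspecting `dir C:\Tools`):

```python
import os

def link_points_to(link_path, expected_target):
    """Return True if link_path exists and resolves to expected_target.

    Illustrative only: on the build slaves the real check is simply
    `dir C:\\Tools` showing a <JUNCTION> entry for vs2013.
    """
    if not os.path.exists(link_path):
        return False
    # realpath follows junctions/symlinks to the underlying directory
    return os.path.realpath(link_path) == os.path.realpath(expected_target)
```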
(Reporter)

Comment 1

3 years ago
need info WRT comment 0 above
Flags: needinfo?(q)
To be pedantic, this needs to be MSVC 2013 Update 2, which fixes an important compile error for us: http://support.microsoft.com/kb/2927432
(Reporter)

Comment 3

3 years ago
friendly ping -- we are ready to try vs2013 in automation. Once this bug is resolved, we can move forward.

I'm available if there are any questions/concerns :)
(Assignee)

Comment 4

3 years ago
I am testing this now and we can do a roll out Monday.
Flags: needinfo?(q)
Assignee: relops → q
(Assignee)

Comment 5

3 years ago
Junction is looking good and ready for 2013 install:

C:\Users\cltbld>dir c:\Tools\
 Volume in drive C is OSDisk
 Volume Serial Number is 04C9-BAB6

 Directory of c:\Tools

06/11/2014  07:27 PM    <DIR>          .
06/11/2014  07:27 PM    <DIR>          ..
11/01/2013  02:19 PM    <DIR>          sdks
06/11/2014  07:27 PM    <JUNCTION>     vs2013 [C:\Program Files (x86)\Microsoft Visual Studio 12.0]
               0 File(s)              0 bytes
               4 Dir(s)  184,692,510,720 bytes free

C:\Users\cltbld>
(Reporter)

Comment 6

3 years ago
There was some chatter on irc WRT this today. arr wanted clarification that we are ready for rollout.

We are all good from the releng end, and would like this to be installed in small iterations across our machines, or whatever Q deems 'safe'. Thank you!
(Assignee)

Comment 7

3 years ago
How about groups of 10, starting with the lowest machine numbers first and working our way up? Install 10, wait a day, then start 10 every four hours after we confirm that first 10?

Q
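The batching plan above can be sketched in a few lines. This is purely illustrative (the names `host_range` and `chunked` are invented here; the real rollout was driven by GPO targeting, not a script):

```python
def host_range(prefix, start, end, width=4):
    """Generate slave names like b-2008-ix-0020 .. b-2008-ix-0029."""
    return ["%s-%0*d" % (prefix, width, n) for n in range(start, end + 1)]

def chunked(items, size=10):
    """Split items into successive batches of `size`, lowest numbers first."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

For example, `chunked(host_range("b-2008-ix", 20, 49))` yields three batches of ten, matching the "groups of 10" plan.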
(Reporter)

Comment 8

3 years ago
sounds perfect.
(Assignee)

Comment 9

3 years ago
We should be ready to kick this off as soon as the most recent move train is done.
It looks like we have rolled out the junction everywhere, and C:\Program Files (x86)\Microsoft Visual Studio 12.0 on some slaves. Just b-2008-ix-002x ? If so, I think bug 1026870 is fallout.
Depends on: 1026870
(Reporter)

Comment 11

3 years ago
just for some context: I cannot say which slaves have vs2013 installed on them, but none should actually be using it. The mozconfig patch[1] to switch over to vs2013 hasn't been applied.

We should have vs2010 and vs2013 installed simultaneously on some slaves; my guess is the 10 from comment 10. Testing against ix-mn-w0864-001 was done to verify that vs2010 would still build with vs2013 installed alongside it[2], but I guess something is not right.

[1] - https://bugzilla.mozilla.org/attachment.cgi?id=8434728
[2] - https://bugzilla.mozilla.org/show_bug.cgi?id=1009807#c11
(Reporter)

Comment 12

3 years ago
based off the recent comments in bug 1026870, the best course of action sounds like re-imaging the machines.

I'm sorry you have to revert the work you did. Thanks for doing this in a safe chunked manner.
(Reporter)

Comment 13

3 years ago
I am going to put these hosts back in production now. I'll close this when that is done. We can re-open when we are ready to try again.

see 'Bug 1027745 - please install a vs2013 along side vs2010 on one machine for staging testing', for details on the next POA.
See Also: → bug 1027745
(Reporter)

Comment 14

3 years ago
all keys have been added bar b-2008-ix-0020

for some reason I cannot reach it: https://bugzilla.mozilla.org/show_bug.cgi?id=768933#c13

looks like this host has had booting issues before but it was a re-image that solved it: https://bugzilla.mozilla.org/show_bug.cgi?id=944779

I am going to close this bug for now until things change. we can track b-2008-ix-0020 in its problem tracking bug.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
(Reporter)

Comment 15

3 years ago
we are back in a state where we are ready to try installing vs2013 on a small chunk of windows builders again

reason for being ready: https://bugzilla.mozilla.org/show_bug.cgi?id=1009807#c31 and https://bugzilla.mozilla.org/show_bug.cgi?id=1009807#c36 https://bugzilla.mozilla.org/show_bug.cgi?id=1009807#c38

I *believe* we still have the vs2013 junction across our machines but I am not sure if this[1] stuck after the backout. Either way that still needs to be there and we also need vs2013 installed again like before. The *difference* this time is we also need to modify some files. See here[2] and here[3] for context.

so to be explicit, we need:

1) confirm that the /c/tools/vs2013 junction still exists across our windows pool
2) install 'MSVC 2013 Update 2' like we did before across a small chunk of machines (5-10)
3) post vs2013 install, cp two files from vs2013 paths to the vs2010 equivalents:

    $ cp "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\cvtres.exe" "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\cvtres.exe"

    $ cp "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64\cvtres.exe" "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\amd64\cvtres.exe"
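The two copies above, as a hedged Python sketch. The function name and the parameterized roots are invented here for testability; the actual deployment performed these copies via GPO:

```python
import os
import shutil

def copy_cvtres(vs2013_root, vs2010_root):
    """Overwrite the vs2010 cvtres.exe (x86 and amd64) with the vs2013 copies.

    Roots are parameters so the sketch stays self-contained; on the slaves
    they would be the "Microsoft Visual Studio 12.0" and "... 10.0"
    directories under Program Files (x86).
    """
    for sub in (os.path.join("VC", "bin"),
                os.path.join("VC", "bin", "amd64")):
        src = os.path.join(vs2013_root, sub, "cvtres.exe")
        dst = os.path.join(vs2010_root, sub, "cvtres.exe")
        shutil.copy2(src, dst)  # copy2 preserves timestamps, like cp -p
```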


Thanks in advance :D

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1019165#c5
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=1009807#c30
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=1009807#c35
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Reporter)

Comment 16

3 years ago
hey Q, just trying to set myself a timeline here. How is your queue (get it?) looking? Would we be able to try this out on a select few tomorrow?
Flags: needinfo?(q)
(Assignee)

Comment 17

3 years ago
I have a GPO ready to go. We should be able to push this out to select machines. Same deal of 10 machines at a time, or do you have a list?
Flags: needinfo?(q)
(Assignee)

Updated

3 years ago
Whiteboard: [time=20:00]
(Reporter)

Comment 18

3 years ago
ya, sgtm! tx Q.

1) Just to confirm, does that GPO include copying and replacing the two files I mentioned in comment 15?

2) How complicated would it be to do 5, or even 2, initially, followed by chunks of 10? I ask because of what happened last time and I'd like to save you the effort/rollback. That said, replacing those two files fixed everything in staging for me; I am just being cautious.
(Assignee)

Comment 19

3 years ago
1) Yes, it includes the file copies.

2) Easy, let's start with 2.


Q
Flags: needinfo?(jlund)
(Reporter)

Comment 20

3 years ago
I spoke with Q over irc about this earlier. He will update the bug when we are ready to go.
Flags: needinfo?(jlund)
(Reporter)

Comment 21

3 years ago
hey Q, gentle ping. just trying to get a rough date/time so I can set my timeline. thanks :)
Flags: needinfo?(q)
(Reporter)

Comment 22

3 years ago
also, I proposed trying a smaller chunk of 2: B-2008-IX-0020 and B-2008-IX-0021, but it turns out B-2008-IX-0020 is still disabled, as it hasn't been hit in the buildduty queue. I have commented in bug 768933 on my efforts to enable it again, but I suppose for this bug, only adding 1 host might not be enough.

Furthermore, I am just looking at hosts B-2008-IX-00{20-29} and it looks like a lot of them run once a day or even every couple of days. It might be better to just do all 10 of them again, if that's OK with you, so we can get some faster results. Thanks in advance :)
(Assignee)

Comment 23

3 years ago
I ran into some trouble with 20 and it stopped the rollout. 10 sounds like a great number; I will try to get this rolled out by Monday morning.
Flags: needinfo?(q)
(Assignee)

Comment 24

3 years ago
Roll-out is looking good. As soon as I verify all machines I will re-enable them in slavealloc and hand over.

Q
(Reporter)

Comment 25

3 years ago
\o/
Which 10 machines was this rolled out to? glandium was reporting issues with b-2008-ix-0087 in #releng.
glandium
11:41:32 did someone do something on b-2008-ix-0087 today?

glandium
11:41:42 ... like install MSVC 2013

pmoore
11:41:51 not me :)
11:42:06 is there a tracking bug for the slave?

glandium
11:42:22 795791 but it says nothing
11:42:46 it failed the last 3 builds it did with the same error we were getting when msvc 2013 started being deployed

pmoore
11:43:21 hmmmm curious
11:44:16 glandium: i can request a reimage of that slave, or i can loan it to you if you'd like to take a closer look at it
glandium
11:44:36 pmoore: too late for me to do that

glandium
11:45:40 pmoore: note that if that was a deliberate thing from gpo, it's likely to bust all the same after reimaging, so you'd rather figure out what's been done to it
glandium
11:46:19 iow, if i were you, i'd pull it off the pool, and ask around

pmoore
11:46:25 markco: Q: ^^

11:46:31 glandium: thanks for the heads up - will do
Disabled b-2008-ix-0082, b-2008-ix-0087, b-2008-ix-0092
Blocks: 873139, 795791, 805515
See comment 27: this is causing breakages on tbpl - have so far disabled 3 slaves.

I couldn't see a list of upgraded slaves in the bug - could you provide a list of the hosts that were upgraded so far? I think we'll need to roll them all back, and then restrategise about how to proceed.

Thanks,
Pete
Flags: needinfo?(q)
Flags: needinfo?(jlund)
(Assignee)

Comment 30

3 years ago
ix 20-29 per comment 22
Flags: needinfo?(q)
Thanks Q!

Sounds like an unrelated issue then.

I'll unblock these bugs, and hand over to build duty...
No longer blocks: 795791, 805515, 873139
(Reporter)

Updated

3 years ago
Flags: needinfo?(jlund)
(Assignee)

Comment 32

3 years ago
Three slaves (b-2008-ix-0082, b-2008-ix-0087, b-2008-ix-0092) had a cvtres file locked. This is fixed; I made sure that checks won't lock files when checking versions.
(Assignee)

Comment 33

3 years ago
Jordan, 

 How is the test going ?
Flags: needinfo?(jlund)
(Reporter)

Comment 34

3 years ago
it looks like things are going really well. This is on Try, so there are a lot of fails and it is hard to distinguish whether this rollout is at fault.

I have noticed one situation where things look questionable:

on B-2008-IX-0028

* Mon Jul 28 14:56:49 2014
** fail
** log: http://buildbot-master87.srv.releng.scl3.mozilla.com:8101/builders/WINNT%205.2%20try%20leak%20test%20build/builds/11436/steps/compile/logs/stdio
** snippet: with - LINK : fatal error LNK1123: failure during conversion to COFF: file invalid or corrupt

* Wed Jul 30 10:58:49 2014
** pass
** log: http://buildbot-master87.srv.releng.scl3.mozilla.com:8101/builders/WINNT%205.2%20try%20leak%20test%20build/builds/11436/steps/compile/logs/stdio


this is odd, as the later one (July 30th) passed. Is it possible that the July 28th job was in a bad or incomplete state?
Flags: needinfo?(jlund) → needinfo?(q)
(Reporter)

Comment 35

3 years ago
hmm B-2008-IX-0029 has a similar situation:

WINNT 5.2 try leak test build 7/29/2014 10:01:06 PM ran for 01:09:51 <- success
WINNT 5.2 try build           7/28/2014 2:36:49 PM  ran for 00:31:18 <- FAIL w/ fatal LNK1123 coff err
WINNT 5.2 try leak test build 7/28/2014 10:54:29 AM  ran for 01:09:16 <- success
WINNT 5.2 try build           7/28/2014 9:37:49 AM   ran for 1:09:46 <- success

both those slaves seem to have started with a fail but then corrected themselves?
(Reporter)

Comment 36

3 years ago
maybe 0028 and 0029 were failing for the same locked-file reason slaves were failing here: https://bugzilla.mozilla.org/show_bug.cgi?id=1019165#c32

that might explain why they are green recently
(In reply to Q from comment #32)
> Three slaves ( b-2008-ix-0082, b-2008-ix-0087, b-2008-ix-0092 ) had a cvtres
> file locked. This is fixed I made sure that checks won't lock files when
> checking versions.

Thanks Q!

I'll put them back in the pool...

Pete
(Reporter)

Comment 38

3 years ago
checking on things today, everything seems much better. I am not seeing the:

"LINK : fatal error LNK1123: failure during conversion to COFF" error

there are a few jobs that failed but they don't appear repeated and fit in the norm of what appears on try.

Q, if you can confirm my theory on comment 36, I think we are good to continue the rollout at your convenience :)
(Assignee)

Comment 39

3 years ago
Jordan,

  I do believe they were victims of the same locking issue which is all fixed now.

Q
Flags: needinfo?(q)
(Reporter)

Comment 40

3 years ago
OK, I don't see a reason not to continue.

Can you continue rolling out if you haven't done so already? Also let me know which machines have been updated and I'll keep an eye open on them.

Let me know if I can make your life easier in this process. poke me on irc if you want a fast response :)
Flags: needinfo?(q)
(Assignee)

Comment 41

3 years ago
Shall we roll them out 10 by 10 hoorah hoorah ?
Flags: needinfo?(q) → needinfo?(jlund)
(Reporter)

Comment 42

3 years ago
sounds good. hoorah!
Flags: needinfo?(jlund)
(Assignee)

Comment 43

3 years ago
Looks like my comment got eaten by bad hotel wifi. 30-39 are rolled out.

Q
(Reporter)

Comment 44

3 years ago
(In reply to Q from comment #43)
> Looks like my comment got eaten by bad hotel wifi. 30-39 are rolled out.
> 
> Q

fantastic, all ops normal after I took a peek at those slaves. let's keep on rolling :D
(Assignee)

Comment 45

3 years ago
Great. Quick question: I believe all the machines so far have been try; do we want to do a block of build before we do ALL of try?
(Reporter)

Comment 46

3 years ago
good idea. sgtm.

In addition to it being good to use a different pool, try tends to have many failures, so it has been tough deciphering whether vs2013 is the cause of a failure.
(Assignee)

Comment 47

3 years ago
How about 09* (minus the one on loan to you)?
(Assignee)

Updated

3 years ago
Flags: needinfo?(jlund)
(Reporter)

Comment 48

3 years ago
wfm
Flags: needinfo?(jlund)
(Assignee)

Comment 49

3 years ago
Deploying 
B-2008-IX-0090
B-2008-IX-0091
B-2008-IX-0092
B-2008-IX-0093
B-2008-IX-0094
B-2008-IX-0095
B-2008-IX-0096
B-2008-IX-0098
B-2008-IX-0099
Seeing issues with:

b-2008-ix-0093
https://tbpl.mozilla.org/php/getParsedLog.php?id=45328247&tree=Mozilla-Central
LINK : fatal error LNK1123: failure during conversion to COFF: file invalid or corrupt

b-2008-ix-0098
https://tbpl.mozilla.org/php/getParsedLog.php?id=45327983&tree=Mozilla-Inbound
LINK : fatal error LNK1123: failure during conversion to COFF: file invalid or corrupt

Looks like bug 1026870 again - could we roll back these changes for now?
Flags: needinfo?(q)
Actually, reading the bug it seems like they may self-resolve - happy for us to see how it goes then.
Flags: needinfo?(q)
b-2008-ix-0092
https://tbpl.mozilla.org/php/getParsedLog.php?id=45328842&tree=Mozilla-Inbound
b-2008-ix-0096
https://tbpl.mozilla.org/php/getParsedLog.php?id=45329571&tree=Mozilla-Central
(Assignee)

Comment 54

3 years ago
Re-disabled the above machines in slavealloc so slave rebooter will let the install finish.
(Assignee)

Comment 55

3 years ago
Re-enabling in slavealloc.
(Reporter)

Comment 56

3 years ago
status update:

tl;dr - GPO rollout should improve from last time. Q made some adjustments and is ready to continue installing vs2013. We shouldn't see the issues from comments 50-55.

after above comments last week Q and I touched base on Wed:
09:34:33 <jlund> morning :) just reading the vs2013 bug comments, is there an intermittent failure found or was that just some fallback of the deploy itself?
09:34:59 <Q> It was fallback
09:35:18 <Q> The machines got re-enabled in slave alloc and were rebooted before the install finished
09:35:33 <Q> it rolled back on the next reboot and reapplied
09:35:47 <jlund> yessss \o/ (well, you know what I mean)
09:44:16 <jlund> so. before I lose you to TZ difference, what's our state with rollout? I guess we don't want to do much more than 10 at a time eh or our load will suffer while they are disabled...
09:44:39 <jlund> what have we done in the past?
09:45:08 <Q> Yes I am finding the install takes long enough that rebooter has a good chance of firing before it is done
09:45:33 <Q> So 10 at a time. If the build pool goes well we could do 50+ in a day
09:46:07 <jlund> ok cool. and is there something I can do to make your life easier or is this left as a relops only thing?
09:46:21 — jlund feels guilty to put this work on you
09:46:39 <Q> I scripted it today :)
09:46:54 <jlund> you are a wizard
09:47:02 <Q> I just need you or some in releng to call it green in the new pool
09:49:29 <jlund> 0091 failed with vs2013 error at: b2g_mozilla-b2g30_v1_4_win32_gecko build	8/6/2014, 4:42:24 AM
09:49:50 <Q> Okay let me take a look
09:51:38 <jlund> same for 0092 @ b2g_mozilla-inbound_win32_gecko-debug build	8/6/2014, 5:44:16 AM
09:52:08 <jlund> (those are start times, both failed around an hour into the job)
09:52:36 <Q> Trying to determine if the install finished
09:52:55 <Q> Start times in what tz ?
09:53:04 <jlund> pdt sorry
09:56:03 <jlund> 0094 - WINNT 5.2 mozilla-aurora nightly	8/6/2014, 12:40:08 AM (ran for almost 5 hours before barfing on vs2013 error)
09:56:54 <Q> Crap that may be my fault
09:57:07 <Q> I may not be checking for the right processes pre install
09:57:33 <Q> I think the install over wrote things while running
09:58:38 <jlund> w/e. gotta crack a few eggs ... 
09:58:59 <jlund> as long as we know what is going on and have a fix for going forward :)
09:59:02 <Q> Yep that is what happened
09:59:07 <Q> Dammit
09:59:28 <Q> Okay I will add more stuff to check
09:59:43 <Q> But the install happened, heh
09:59:58 <jlund> should I give you more failures to cross reference?
10:00:10 <Q> If you have them yes please
10:00:37 <jlund> 0096 - WINNT 5.2 mozilla-central build	8/6/2014, 5:35:22 AM (ran for 1h 22m)
10:02:00 <jlund> then there is two from 0093 and 0098 but that was mentioned by ed in bug
10:02:25 <Q> Yeah I know about those
10:02:35 <jlund> that's all I can find
10:15:33 <Q> Okay, I am not too worried. I will check and see what I can do to avoid these
10:18:41 <jlund> cool. let me know when we continue or with what
10:19:36 <Q> Will do
10:21:38 <jlund> awesome


Then today we followed up with:
Monday, August 11th, 2014
11:36:27 <jlund> hey I just got pinged about status for vs2013.
12:20:19 <Q> Are we good with the build pool stuff that has been running
12:25:58 <jlund> yes. looking at 009* slaves, everything seems good. also tbpl bot has not reported issue since slave 0120 https://bugzil.la/1026870
12:26:16 <jlund> is gpo ready as per previous discussion?
12:32:57 <Q> Yep
12:40:30 <jlund> cool. I'll update the bug stating so and leave it to you when you're ready
(Assignee)

Comment 57

3 years ago
Starting on the b-2008-ix-004* slaves now since things look good with build
(Assignee)

Comment 58

3 years ago
b-2008-ix-004* are done; moving on to

b-2008-ix-005*
(Reporter)

Comment 59

3 years ago
coop discovered that recently we have 72 win builders that have not taken a recent job[1]. It looks like our windows capacity is pretty healthy. Talking to arr and coop on irc, we concluded that we could do much larger chunks at a time. Initially I made the call of doing 10 at a time based on previous hiccups and the time it takes for gpo to do its thing.

Let's try chunks of 50 (still under the 72 idle). Even at that we shouldn't see hits in wait times.

Q, that sound good with you?


[1] https://bugzil.la/1053436
Flags: needinfo?(q)
I've asked mark to work on this today while Q is on pto.
Flags: needinfo?(q)
Assignee: q → mcornmesser
ix-50 through 59 are complete and re-enabled. 

Moving onto 60 through 69.
(Reporter)

Comment 62

3 years ago
(In reply to Mark Cornmesser [:markco] from comment #61)
> ix-50 through 59 are complete and re-enabled. 
> 
> Moving onto 60 through 69.

thanks markco! btw - re comment 59, we can do larger chunks than 10. So if it's doable or will make things faster, feel free to try a chunk of 50! :)
(Assignee)

Comment 63

3 years ago
Make sure these are disabled in slavealloc and that all jobs are complete before the push. This has to be done so as not to hose jobs or have slave rebooter hose the install.
(Reporter)

Comment 64

3 years ago
looks like b-2008-ix-0069 hit issues. I guess this happens if you don't do comment 63?
With the exception of 79 and 80, 60 through 88 have been completed and re-enabled.

79 and 80 still had jobs on them at last check. 

Next portion will be 100 through 119.
(Assignee)

Comment 66

3 years ago
Mark, I am getting tbpl errors that jobs are being killed. Are you making sure that there are no jobs running when doing the upgrade?
Flags: needinfo?(mcornmesser)
There were 3 machines (67, 68, 69) in the morning where there was an overlap. After that there have not been any jobs running, verified through a manual check prior to setting the item-level targeting. Additionally, no machines were in the process of the upgrade within 2 to 3 hours of comment 66.
Flags: needinfo?(mcornmesser)
100 through 119 have been completed. All but 108 have been re-enabled; 108 was not enabled initially.

Looking through the GPO, there is a 2013 links GPO. I suspect that some of the failed jobs launched prior to that GPO taking effect, since that GPO is conditioned on the existence of VS 2013. I guess that should be an additional check before a machine is re-enabled.
I have disabled the following machines in slavealloc in preparation for finishing up the iX upgrades today (we'll still need to do the seamicros):

b-2008-ix-0001
b-2008-ix-0002
b-2008-ix-0003
b-2008-ix-0004
b-2008-ix-0005
b-2008-ix-0006
b-2008-ix-0007
b-2008-ix-0008
b-2008-ix-0009
b-2008-ix-0010
b-2008-ix-0011
b-2008-ix-0012
b-2008-ix-0013
b-2008-ix-0014
b-2008-ix-0015
b-2008-ix-0016
b-2008-ix-0017
b-2008-ix-0018
b-2008-ix-0019

b-2008-ix-0120
b-2008-ix-0121
b-2008-ix-0122
b-2008-ix-0123
b-2008-ix-0124
b-2008-ix-0125
b-2008-ix-0126
b-2008-ix-0127
b-2008-ix-0128
b-2008-ix-0129
b-2008-ix-0130
b-2008-ix-0131
b-2008-ix-0132
b-2008-ix-0133
b-2008-ix-0134
b-2008-ix-0135
b-2008-ix-0136
b-2008-ix-0137
b-2008-ix-0138
b-2008-ix-0139
b-2008-ix-0140
b-2008-ix-0141
b-2008-ix-0142
b-2008-ix-0143
b-2008-ix-0144
b-2008-ix-0145
b-2008-ix-0146
b-2008-ix-0147
b-2008-ix-0148
b-2008-ix-0149
b-2008-ix-0150
b-2008-ix-0151
b-2008-ix-0152
b-2008-ix-0153
b-2008-ix-0154
b-2008-ix-0155
b-2008-ix-0156
b-2008-ix-0157
b-2008-ix-0158
The following have been updated and put back into service:

b-2008-ix-0120
b-2008-ix-0121
b-2008-ix-0122
b-2008-ix-0123
b-2008-ix-0124
b-2008-ix-0125

b-2008-ix-0127
b-2008-ix-0128
b-2008-ix-0129
b-2008-ix-0130
b-2008-ix-0131
b-2008-ix-0132
b-2008-ix-0133
b-2008-ix-0134
b-2008-ix-0135
b-2008-ix-0136
b-2008-ix-0137
b-2008-ix-0138
b-2008-ix-0139
b-2008-ix-0140
b-2008-ix-0141
b-2008-ix-0142
b-2008-ix-0143
b-2008-ix-0144
b-2008-ix-0145
b-2008-ix-0146
b-2008-ix-0147
b-2008-ix-0148
b-2008-ix-0149
b-2008-ix-0150
b-2008-ix-0151
b-2008-ix-0152
b-2008-ix-0153
b-2008-ix-0154
b-2008-ix-0155
b-2008-ix-0156
b-2008-ix-0157
b-2008-ix-0158

There are some oddities with b-2008-ix-0001 - b-2008-ix-0019 and b-2008-ix-0126 which I'll consult with Q on when he's online.

I've disabled the following machines in slavealloc in preparation for updates:

b-2008-ix-0161
b-2008-ix-0162
b-2008-ix-0163
b-2008-ix-0164
b-2008-ix-0165
b-2008-ix-0166
b-2008-ix-0167
b-2008-ix-0168
b-2008-ix-0169
b-2008-ix-0170
b-2008-ix-0171
b-2008-ix-0172
b-2008-ix-0173
b-2008-ix-0174
b-2008-ix-0175
b-2008-ix-0176
b-2008-ix-0177
b-2008-ix-0178
b-2008-ix-0179
b-2008-ix-0180
b-2008-ix-0181
b-2008-ix-0182
b-2008-ix-0183
b-2008-ix-0184
The following machines have been updated and put back into the pool:

b-2008-ix-0126

b-2008-ix-0161

b-2008-ix-0163
b-2008-ix-0164
b-2008-ix-0165
b-2008-ix-0166
b-2008-ix-0167
b-2008-ix-0168
b-2008-ix-0169
b-2008-ix-0170
b-2008-ix-0171
b-2008-ix-0172
b-2008-ix-0173
b-2008-ix-0174
b-2008-ix-0175
b-2008-ix-0176
b-2008-ix-0177
b-2008-ix-0178
b-2008-ix-0179
b-2008-ix-0180
b-2008-ix-0181

b-2008-ix-0183
b-2008-ix-0184

b-2008-ix-0162 is still chugging away at a pgo build.
b-2008-ix-0182 was disabled to begin with
b-2008-ix-0001 - b-2008-ix-0019 are still pending investigation from Q.

jlund: what seamicro machine(s) do you want to test on to make sure the update works there (before we push out to all seamicro nodes)?
Flags: needinfo?(jlund)
(Reporter)

Updated

3 years ago
Blocks: 1005426
(Reporter)

Comment 72

3 years ago
b-2008-sm-0001 has been disabled + rebooted, and has been loaned to me. Could we test on this first? Let's keep it disabled and I will run it on my master after. I don't expect issues with using vs2010 after the vs2013 install, as it should be the same as the non-seamicro equivalent. However, I'll also be testing out the vs2013 compiler on it, as it was brought to my attention that a new compiler might behave differently on seamicros.

arr: let me know when I can play with b-2008-sm-0001 :)
Flags: needinfo?(jlund)
b-2008-ix-0162 is now updated and back in the pool as well.

I was unable to get the GPO to apply to b-2008-sm-0001, so I've added that to the list of machines I was having Q take a look at (in addition to b-2008-ix-0001 - b-2008-ix-0019)

I think we have those 19 iX machines left as well as all the seamicro machines, then we'll be done with this bug.
Flags: needinfo?(q)
Q discovered that the low number iX machines were still listed in the SCL1 OU, not the SCL3 OU. Same for b-2008-sm-0001.  They've all been moved to the correct OU now and updated. The iX nodes have also been re-enabled in slavealloc.

jlund: let me know how the sm testing goes, and we can move ahead with the rollout there assuming all goes well.
Flags: needinfo?(q)
(Reporter)

Comment 75

3 years ago
(In reply to Amy Rich [:arich] [:arr] from comment #74)
> Q discovered that the low number iX machines were still listed in the SCL1
> OU, not the SCL3 OU. Same for b-2008-sm-0001.  They've all been moved to the
> correct OU now and updated. The iX nodes have also been re-enabled in
> slavealloc.
> 
> jlund: letme know how the sm testing goes, and we can move ahead with the
> rollout there assuming all goes well.

arr: I ran tests on b-2008-sm-0001 against vs2010 and vs2013. Both passed like the ix counterparts. We should be good to go to roll out the remaining seamicros.

btw - I am re-enabling b-2008-sm-0001 into prod. it will be live with vs2013 on it.
(Reporter)

Comment 76

3 years ago
> arr: I ran tests on b-2008-sm-0001 against vs2010 and vs2013. both passed
> like the ix counterparts. We should be good to go for roll-out remaining
> seamicros.

NI'ing amy
Flags: needinfo?(arich)
All of the following machines have been updated and changed back to their original enable/disable state:

b-2008-sm-0002
b-2008-sm-0003

b-2008-sm-0005
b-2008-sm-0006
b-2008-sm-0007
b-2008-sm-0008
b-2008-sm-0009
b-2008-sm-0010
b-2008-sm-0011
b-2008-sm-0012
b-2008-sm-0013
b-2008-sm-0014
b-2008-sm-0015
b-2008-sm-0016
b-2008-sm-0017
b-2008-sm-0018
b-2008-sm-0019
b-2008-sm-0020
b-2008-sm-0021
b-2008-sm-0022
b-2008-sm-0023
b-2008-sm-0024
b-2008-sm-0025
b-2008-sm-0026
b-2008-sm-0027
b-2008-sm-0028
b-2008-sm-0029
b-2008-sm-0030

b-2008-sm-0032
b-2008-sm-0033
b-2008-sm-0034
b-2008-sm-0035
b-2008-sm-0036
b-2008-sm-0037
b-2008-sm-0038
b-2008-sm-0039
b-2008-sm-0040
b-2008-sm-0041
b-2008-sm-0042
b-2008-sm-0043
b-2008-sm-0044
b-2008-sm-0045
b-2008-sm-0046
b-2008-sm-0047
b-2008-sm-0048
b-2008-sm-0049
b-2008-sm-0050
b-2008-sm-0051
b-2008-sm-0052
b-2008-sm-0053
b-2008-sm-0054
b-2008-sm-0055
b-2008-sm-0056
b-2008-sm-0057
b-2008-sm-0058
b-2008-sm-0059
b-2008-sm-0060
b-2008-sm-0061
b-2008-sm-0062
b-2008-sm-0063
b-2008-sm-0064

The following two hosts haven't been installed yet, so don't require updates:
b-2008-sm-0004
b-2008-sm-0031

I believe all that's left to do here is remove the whitelist so that all 2008r2 hosts have the VS2013 GPO enabled by default, correct, Q?
Assignee: mcornmesser → q
Flags: needinfo?(arich)
(Assignee)

Comment 78

3 years ago
Correct!
Q: great, can you verify and remove the whitelist, please? Then releng should be good to do 2013 testing at wide scale.
(Assignee)

Comment 80

3 years ago
Whitelist removed; machines have installs. Closing bug.
Status: REOPENED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
(Reporter)

Updated

3 years ago
See Also: → bug 1055876
The comments from TBPL Robot in bug 1026870 indicate some of the slaves haven't gotten the install yet.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Reporter)

Comment 82

3 years ago
(In reply to Nick Thomas [:nthomas] from comment #81)
> The comments from TBPL Robot in bug 1026870 indicate some of the slaves
> haven't gotten the install yet.

FTR - the following slaves were hitting bug 1026870:

b-2008-sm-0042
b-2008-sm-0050
b-2008-sm-0038
b-2008-sm-0053
b-2008-sm-0008

all seamicro.

Q: can you confirm that all these were installed correctly before Aug 20th and that it stuck?

If they look good from the gpo end, we might have an intermittent fallback, although 1026870 should only be hit when vs2010 uses the wrong cvtres.exe: https://bugzilla.mozilla.org/show_bug.cgi?id=1019165#c15
The verification method I was using during the installs was to run gpresult /R and look for:

        Install_links_VS2013
        Install_VS2013_builders

The 5 machines you mention above all pass that verification method. I'll let Q investigate to see if something wonky happened there and it's mis-reporting.
Flags: needinfo?(q)
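That verification could also be scripted. A rough sketch (the function name and parsing are assumptions; `gpresult /R` output varies by locale, and the check described above was done by eye):

```python
REQUIRED_GPOS = ("Install_links_VS2013", "Install_VS2013_builders")

def gpos_applied(gpresult_output, required=REQUIRED_GPOS):
    """Return True if every required GPO name appears on its own line.

    `gpresult_output` is the captured text of `gpresult /R`; on a slave it
    would come from subprocess.check_output(["gpresult", "/R"], text=True).
    """
    seen = {line.strip() for line in gpresult_output.splitlines()}
    return all(name in seen for name in required)
```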
(Assignee)

Comment 84

3 years ago
Taking a look.
Flags: needinfo?(q)
(Reporter)

Updated

3 years ago
See Also: → bug 1057022
(Assignee)

Comment 85

3 years ago
The installs are in place. I am still combing through logs to see if there is a locking or other issue.
We have too many burning fires on windows right now, and this is one of two that are seamicro-specific. Until we can get the other issues under control, we're disabling all of the seamicros, since we have enough capacity and they're actually hurting us by being in the build and try pools.
(Reporter)

Updated

3 years ago
See Also: → bug 1068922

Updated

3 years ago
Whiteboard: [time=20:00] → [kanban:engops:https://kanbanize.com/ctrl_board/6/345] [time=20:00]

Updated

3 years ago
Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/345] [time=20:00] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/795] [kanban:engops:https://kanbanize.com/ctrl_board/6/345] [time=20:00]
(Assignee)

Comment 87

3 years ago
This is all set. The links are still set via GPO; however, VS2013 is being installed at the MDT level.
Status: REOPENED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/795] [kanban:engops:https://kanbanize.com/ctrl_board/6/345] [time=20:00] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/795] [time=20:00]