Closed Bug 1019165 Opened 10 years ago Closed 10 years ago

install vs2013 across all our win64 machines and add a junction link for it in our GPO

Categories

(Infrastructure & Operations :: RelOps: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jlund, Assigned: q)

References

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/795] [time=20:00])

Like ix-mn-w0864-001, we are going to need vs2013 installed across all of our win64 machines so that we can build ff32 and ff64 against that compiler. Also like ix-mn-w0864-001, we will want to keep vs2010 installed so we can keep riding the trains and for other support. Alongside this, our automation mozconfigs will point to /c/tools/vs2013 for the Visual Studio 2013 compiler path, so we will need a junction link added to our GPO:

mklink /j c:\tools\vs2013 "C:\Program Files (x86)\Microsoft Visual Studio 12.0"

q: does the above sound OK? See https://bugzilla.mozilla.org/show_bug.cgi?id=1009807#c12 for more details.
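(For reference, a minimal by-hand sketch of creating and sanity-checking that junction from an elevated cmd prompt. The actual GPO that deploys it is not shown in this bug, and the VC\bin subpath is only an illustrative check:)

    REM Create the junction, as requested above.
    mklink /j c:\tools\vs2013 "C:\Program Files (x86)\Microsoft Visual Studio 12.0"

    REM A <JUNCTION> entry named vs2013 should now show up here...
    dir c:\Tools\

    REM ...and the junction should resolve into the VS 2013 install.
    dir c:\tools\vs2013\VC\bin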
need info WRT comment 0 above
Flags: needinfo?(q)
To be pedantic, this needs to be MSVC 2013 Update 2, which fixes an important compile error for us: http://support.microsoft.com/kb/2927432
friendly ping -- we are ready to try vs2013 in automation. Once this bug is resolved, we can move forward. I'm available if there are any questions/concerns :)
I am testing this now and we can do a roll out Monday.
Flags: needinfo?(q)
Assignee: relops → q
Junction is looking good and ready for the 2013 install:

C:\Users\cltbld>dir c:\Tools\
 Volume in drive C is OSDisk
 Volume Serial Number is 04C9-BAB6

 Directory of c:\Tools

06/11/2014  07:27 PM    <DIR>          .
06/11/2014  07:27 PM    <DIR>          ..
11/01/2013  02:19 PM    <DIR>          sdks
06/11/2014  07:27 PM    <JUNCTION>     vs2013 [C:\Program Files (x86)\Microsoft Visual Studio 12.0]
               0 File(s)              0 bytes
               4 Dir(s)  184,692,510,720 bytes free

C:\Users\cltbld>
There was some chatter on IRC WRT this today. arr wanted clarification that we are ready for rollout. We are all good from the releng end, and would like this to be installed in small iterations across our machines, or whatever Q deems 'safe'. Thank you!
How about groups of 10, starting at the beginning with the lowest machine numbers first and working our way up? Install 10, wait a day, then start 10 every four hours after we confirm that first 10? Q
sounds perfect.
We should be ready to kick this off as soon as the most recent move train is done.
It looks like we have rolled out the junction everywhere, and VS 2013 (C:\Program Files (x86)\Microsoft Visual Studio 12.0) on some slaves. Just b-2008-ix-002x? If so, I think bug 1026870 is fallout.
Depends on: 1026870
Just for some context: I cannot say which slaves have vs2013 installed on them, but none should actually be using it. The mozconfig patch[1] to switch over to vs2013 hasn't been applied. We should have vs2010 and vs2013 installed simultaneously on some slaves; my guess is the 10 from comment 10. Testing against ix-mn-w0864-001 was done to verify that vs2010 would still build with vs2013 installed alongside it[2], but I guess something is not right.

[1] - https://bugzilla.mozilla.org/attachment.cgi?id=8434728
[2] - https://bugzilla.mozilla.org/show_bug.cgi?id=1009807#c11
Based on the recent comments in bug 1026870, the best course of action sounds like re-imaging the machines. I'm sorry you have to revert the work you did. Thanks for doing this in a safe, chunked manner.
I am going to put these hosts back in production now. I'll close this when that is done. We can re-open when we are ready to try again. See 'Bug 1027745 - please install a vs2013 along side vs2010 on one machine for staging testing' for details on the next POA.
See Also: → 1027745
All keys have been added bar b-2008-ix-0020; for some reason I cannot reach it: https://bugzilla.mozilla.org/show_bug.cgi?id=768933#c13
It looks like this host has had booting issues before, but it was a re-image that solved it: https://bugzilla.mozilla.org/show_bug.cgi?id=944779
I am going to close this bug for now until things change. We can track b-2008-ix-0020 in its problem tracking bug.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
We are back in a state where we are ready to try installing vs2013 on a small chunk of Windows builders again.

Reason for being ready:
https://bugzilla.mozilla.org/show_bug.cgi?id=1009807#c31
https://bugzilla.mozilla.org/show_bug.cgi?id=1009807#c36
https://bugzilla.mozilla.org/show_bug.cgi?id=1009807#c38

I *believe* we still have the vs2013 junction across our machines, but I am not sure if this[1] stuck after the backout. Either way, that still needs to be there and we also need vs2013 installed again like before. The *difference* this time is we also need to modify some files. See here[2] and here[3] for context.

So to be explicit, we need to:

1) confirm that the /c/tools/vs2013 junction still exists across our windows pool
2) install 'MSVC 2013 Update 2' like we did before across a small chunk of machines (5-10)
3) post vs2013 install, cp two files from the vs2013 paths to the vs2010 equivalents:
$ cp C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\cvtres.exe C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\cvtres.exe
$ cp C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64\cvtres.exe C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\amd64\cvtres.exe

Thanks in advance :D

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1019165#c5
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=1009807#c30
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=1009807#c35
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
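(For reference: the cp commands above are written in Unix style and the paths contain spaces, so run from cmd on a builder they would need quoting. A minimal sketch of steps 1 and 3, assuming the default VS 12.0/10.0 install paths quoted in the comment; the actual GPO that performs these copies isn't shown here:)

    REM 1) Confirm the /c/tools/vs2013 junction still resolves.
    if exist "c:\tools\vs2013\VC\bin\cvtres.exe" (echo junction OK) else (echo junction missing)

    REM 3) After the VS2013 Update 2 install, copy the VS2013 cvtres.exe over the VS2010 ones.
    copy /Y "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\cvtres.exe" "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\cvtres.exe"
    copy /Y "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64\cvtres.exe" "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\amd64\cvtres.exe"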
Hey Q, just trying to set myself a timeline here. How is your queue (get it?) looking? Would we be able to try this out on a select few tomorrow?
Flags: needinfo?(q)
I have a gpo ready to go. We should be able to push this out to select machines. Same deal of 10 machines at a time or do you have a list?
Flags: needinfo?(q)
Whiteboard: [time=20:00]
Ya, sgtm! tx Q.
1) Just to confirm, does that GPO include copying and replacing the two files I mentioned in comment 15?
2) How complicated would it be to do 5, or even 2 initially, followed by chunks of 10? I ask because of what happened last time and I'd like to save you the effort/rollback. That said, replacing those two files fixed everything in staging for me; I am just being cautious.
1) Yes, it includes the file copies.
2) Easy, let's start with 2.

Q
Flags: needinfo?(jlund)
I spoke with Q over irc about this earlier. He will update the bug when we are ready to go.
Flags: needinfo?(jlund)
hey Q, gentle ping. just trying to get a rough date/time so I can set my timeline. thanks :)
Flags: needinfo?(q)
Also, I proposed trying a smaller chunk of 2: B-2008-IX-0020 and B-2008-IX-0021, but it turns out B-2008-IX-0020 is still disabled, as it hasn't been hit in the buildduty queue. I have commented in bug 768933 on my efforts to enable it again, but I suppose for this bug, only adding 1 host might not be enough. Furthermore, I am just looking at hosts B-2008-IX-00{20-29} and it looks like a lot of them run once a day or even every couple of days. It might be better to just do all 10 of them again, if that's OK with you, so we can get some faster results. Thanks in advance :)
I ran into some trouble with 20 and it stopped the rollout. 10 sounds like a great number; I will try to get this rolled out by Monday morning.
Flags: needinfo?(q)
Roll-out is looking good. As soon as I verify all machines I will re-enable them in slavealloc and hand over. Q
\o/
Which 10 machines was this rolled out to? glandium was reporting issues with b-2008-ix-0087 in #releng.
glandium 11:41:32  did someone do something on b-2008-ix-0087 today?
glandium 11:41:42  ... like install MSVC 2013
pmoore   11:41:51  not me :)
         11:42:06  is there a tracking bug for the slave?
glandium 11:42:22  795791 but it says nothing
         11:42:46  it failed the last 3 builds it did with the same error we were getting when msvc 2013 started being deployed
pmoore   11:43:21  hmmmm curious
         11:44:16  glandium: i can request a reimage of that slave, or i can loan it to you if you'd like to take a closer look at it
glandium 11:44:36  pmoore: too late for me to do that
glandium 11:45:40  pmoore: note that if that was a deliberate thing from gpo, it's likely to bust all the same after reimaging, so you'd rather figure out what's been done to it
glandium 11:46:19  iow, if i were you, i'd pull it off the pool, and ask around
pmoore   11:46:25  markco: Q: ^^
         11:46:31  glandium: thanks for the heads up - will do
Disabled b-2008-ix-0082, b-2008-ix-0087, b-2008-ix-0092
See comment 27: this is causing breakages on tbpl - have so far disabled 3 slaves. I couldn't see a list of upgraded slaves in the bug - could you provide a list of the hosts that were upgraded so far? I think we'll need to roll them all back, and then restrategise about how to proceed. Thanks, Pete
Flags: needinfo?(q)
Flags: needinfo?(jlund)
ix-0020 through -0029, per comment 22
Flags: needinfo?(q)
Thanks Q! Sounds like an unrelated issue then. I'll unblock these bugs, and hand over to build duty...
Flags: needinfo?(jlund)
Three slaves (b-2008-ix-0082, b-2008-ix-0087, b-2008-ix-0092) had a cvtres file locked. This is fixed; I made sure that checks won't lock files when checking versions.
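(For reference, one illustrative way to read a file's version from cmd without holding a write handle on the file itself. This is a sketch only; the actual GPO check script is not shown in this bug:)

    REM Query the file version via WMI metadata instead of opening the binary directly.
    wmic datafile where "name='C:\\Program Files (x86)\\Microsoft Visual Studio 10.0\\VC\\bin\\cvtres.exe'" get Version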
Jordan, how is the test going?
Flags: needinfo?(jlund)
It looks like things are going really well. This is on Try, so there are a lot of fails and it is hard to distinguish if this rollout is at fault. I have noticed one situation where things look questionable, on B-2008-IX-0028:

* Mon Jul 28 14:56:49 2014
** fail
** log: http://buildbot-master87.srv.releng.scl3.mozilla.com:8101/builders/WINNT%205.2%20try%20leak%20test%20build/builds/11436/steps/compile/logs/stdio
** snippet: LINK : fatal error LNK1123: failure during conversion to COFF: file invalid or corrupt

* Wed Jul 30 10:58:49 2014
** pass
** log: http://buildbot-master87.srv.releng.scl3.mozilla.com:8101/builders/WINNT%205.2%20try%20leak%20test%20build/builds/11436/steps/compile/logs/stdio

This is odd, as the later one (July 30th) passed. Is it possible that the July 28th job was in a bad or incomplete state?
Flags: needinfo?(jlund) → needinfo?(q)
Hmm, B-2008-IX-0029 has a similar situation:

WINNT 5.2 try leak test build  7/29/2014 10:01:06 PM  ran for 01:09:51  <- success
WINNT 5.2 try build            7/28/2014 2:36:49 PM   ran for 00:31:18  <- FAIL w/ fatal LNK1123 COFF err
WINNT 5.2 try leak test build  7/28/2014 10:54:29 AM  ran for 01:09:16  <- success
WINNT 5.2 try build            7/28/2014 9:37:49 AM   ran for 01:09:46  <- success

Both those slaves seem to have started with a fail but then corrected themselves?
Maybe 0028 and 0029 were failing for the same lock-file reason slaves were failing here: https://bugzilla.mozilla.org/show_bug.cgi?id=1019165#c32
That might explain why they have been green recently.
(In reply to Q from comment #32)
> Three slaves ( b-2008-ix-0082, b-2008-ix-0087, b-2008-ix-0092 ) had a cvtres
> file locked. This is fixed; I made sure that checks won't lock files when
> checking versions.

Thanks Q! I'll put them back in the pool...

Pete
Checking on things today, everything seems much better. I am not seeing the "LINK : fatal error LNK1123: failure during conversion to COFF" error. There are a few jobs that failed, but they don't appear repeatedly and fit in the norm of what appears on try. Q, if you can confirm my theory in comment 36, I think we are good to continue the rollout at your convenience :)
Jordan, I do believe they were victims of the same locking issue, which is all fixed now. Q
Flags: needinfo?(q)
OK, I don't see a reason not to continue. Can you continue rolling out if you haven't done so already? Also, let me know which machines have been updated and I'll keep an eye on them. Let me know if I can make your life easier in this process. Poke me on irc if you want a fast response :)
Flags: needinfo?(q)
Shall we roll them out 10 by 10, hoorah hoorah?
Flags: needinfo?(q) → needinfo?(jlund)
sounds good. hoorah!
Flags: needinfo?(jlund)
Looks like my comment got eaten by bad hotel wifi. 30 - 39 are rolled out. Q
(In reply to Q from comment #43)
> Looks like my comment got eaten by bad hotel wifi. 30 - 39 are rolled out.
>
> Q

Fantastic, all ops normal after I took a peek at those slaves. Let's keep on rolling :D
Great. Quick question: I believe all the machines so far have been try; do we want to do a block of build before we do ALL of try?
Good idea, sgtm. In addition to it being good to use a different pool, try tends to have many failures, so it has been tough deciphering if vs2013 is the cause of a failure.
How about 09* (minus the one on loan to you)?
Flags: needinfo?(jlund)
wfm
Flags: needinfo?(jlund)
Deploying B-2008-IX-0090 B-2008-IX-0091 B-2008-IX-0092 B-2008-IX-0093 B-2008-IX-0094 B-2008-IX-0095 B-2008-IX-0096 B-2008-IX-0098 B-2008-IX-0099
Seeing issues with:

b-2008-ix-0093
https://tbpl.mozilla.org/php/getParsedLog.php?id=45328247&tree=Mozilla-Central
LINK : fatal error LNK1123: failure during conversion to COFF: file invalid or corrupt

b-2008-ix-0098
https://tbpl.mozilla.org/php/getParsedLog.php?id=45327983&tree=Mozilla-Inbound
LINK : fatal error LNK1123: failure during conversion to COFF: file invalid or corrupt

Looks like bug 1026870 again - could we roll back these changes for now?
Flags: needinfo?(q)
Actually, reading the bug it seems like they may self-resolve - happy for us to see how it goes then.
Flags: needinfo?(q)
Re-disabled the above machines in slavealloc so slave rebooter will let the install finish.
Re-enabling in slavealloc.
Status update:

tl;dr - the GPO rollout should improve from last time. Q made some adjustments and is ready to continue installing vs2013. We shouldn't see the issues from comments 50-55 again.

Q and I touched base on Wednesday:

09:34:33 <jlund> morning :) just reading the vs2013 bug comments, is there an intermittent failure found or was that just some fallback of the deploy itself?
09:34:59 <Q> It was falback
09:35:18 <Q> The machines got reneabled in slave alloc and were rebooted before the install finished
09:35:33 <Q> it rolled back on the next reboot and reapplied
09:35:47 <jlund> yessss \o/ (well, you know what I mean)
09:44:16 <jlund> so. before I lose you to TZ difference, what's our state with rollout? I guess we don't want to do much more than 10 at a time eh or our load will suffer while they are disabled...
09:44:39 <jlund> what have we done in the past?
09:45:08 <Q> Yes I am finding the install takes long enough that rebooter has a good chance of firing before it is done
09:45:33 <Q> So 10 at a time. If the build pool goes well we could do 50+ in a day
09:46:07 <jlund> ok cool. and is there something I can do to make your life easier or is this left as a relops only thing?
09:46:21 — jlund feels guilty to put this work on you
09:46:39 <Q> I scripted it today :)
09:46:54 <jlund> you are a wizard
09:47:02 <Q> I just need you or some in releng to call it green in the new pool
09:49:29 <jlund> 0091 failed with vs2013 error at: b2g_mozilla-b2g30_v1_4_win32_gecko build 8/6/2014, 4:42:24 AM
09:49:50 <Q> Okay let me take a look
09:51:38 <jlund> same for 0092 @ b2g_mozilla-inbound_win32_gecko-debug build 8/6/2014, 5:44:16 AM
09:52:08 <jlund> (those are start times, both failed around an hour into the job)
09:52:36 <Q> Trying to determine if the install finished
09:52:55 <Q> Start times in what tz ?
09:53:04 <jlund> pdt sorry
09:56:03 <jlund> 0094 - WINNT 5.2 mozilla-aurora nightly 8/6/2014, 12:40:08 AM (ran for almost 5 hours before barfing on vs2013 error)
09:56:54 <Q> Crap that may be my fault
09:57:07 <Q> I may not be checking for the right processes pre install
09:57:33 <Q> I think the install over wrote things while running
09:58:38 <jlund> w/e. gotta crack a few eggs ...
09:58:59 <jlund> as long as we know what is going on and have a fix for going forward :)
09:59:02 <Q> Yep that is what happened
09:59:07 <Q> Dammit
09:59:28 <Q> Okay I will add mroe stuff to check
09:59:43 <Q> But the install happened, heh
09:59:58 <jlund> should I give you more failures to cross reference?
10:00:10 <Q> If you have them yes please
10:00:37 <jlund> 0096 - WINNT 5.2 mozilla-central build 8/6/2014, 5:35:22 AM (ran for 1h 22m)
10:02:00 <jlund> then there is two from 0093 and 0098 but that was mentioned by ed in bug
10:02:25 <Q> Yeah I know about those
10:02:35 <jlund> that's all I can find
10:15:33 <Q> Okay, I am not too worried. I will check and see what I can do to avoid these
10:18:41 <jlund> cool. let me know when we continue or with what
10:19:36 <Q> Will do
10:21:38 <jlund> awesome

Then today, Monday, August 11th, 2014, we followed up with:

11:36:27 <jlund> hey I just got pinged about status for vs2013.
12:20:19 <Q> Are we good with the stuff build pool that has been running
12:25:58 <jlund> yes. looking at 009* slaves, everything seems good. also tbpl bot has not reported issue since slave 0120 https://bugzil.la/1026870
12:26:16 <jlund> is gpo ready as per previous discussion?
12:32:57 <Q> Yep
12:40:30 <jlund> cool. I'll update the bug stating so and leave it to you when you're ready
Starting on the b-2008-ix-004* slaves now since things look good with build
b-2008-ix-004* are done; moving on to b-2008-ix-005*.
Coop discovered recently that we have 72 win builders that have not taken a recent job[1]. It looks like our Windows capacity is pretty healthy. Talking to arr and coop on irc, we concluded that we could do much larger chunks at a time. Initially I made the call of doing 10 at a time based on previous hiccups and the time it takes for GPO to do its thing. Let's try chunks of 50 (still under the 72 idle). Even at that, we shouldn't see hits in wait times. Q, does that sound good to you?

[1] https://bugzil.la/1053436
Flags: needinfo?(q)
I've asked mark to work on this today while Q is on pto.
Flags: needinfo?(q)
Assignee: q → mcornmesser
ix-50 through 59 are complete and re-enabled. Moving onto 60 through 69.
(In reply to Mark Cornmesser [:markco] from comment #61)
> ix-50 through 59 are complete and re-enabled.
>
> Moving onto 60 through 69.

Thanks markco! btw - re comment 59, we can do larger chunks than 10. So if it's doable or will make things faster, feel free to try a chunk of 50! :)
Make sure these are disabled in slavealloc and that all jobs are complete before the push. This has to be done so as not to hose jobs or have slave rebooter hose the install.
Looks like b-2008-ix-0069 hit issues. I guess this happens if you don't follow comment 63?
With the exception of 79 and 80, 60 through 88 have been completed and re-enabled. 79 and 80 still had jobs on them at last check. The next portion will be 100 through 119.
Mark, I am getting tbpl errors that jobs are being killed. Are you making sure that there are no jobs running when doing the upgrade?
Flags: needinfo?(mcornmesser)
There were 3 machines (67, 68, 69) in the morning where there was an overlap. After that there have not been any jobs running, verified through a manual check prior to setting the item-level targeting. Additionally, no machines were in the process of the upgrade within 2 to 3 hours of comment 66.
Flags: needinfo?(mcornmesser)
100 through 119 have been completed. All but 108 have been re-enabled; 108 was not enabled initially. Looking through the GPO, there is a 2013 links GPO. I suspect that maybe some of the failed jobs launched prior to that GPO taking effect, since that GPO is conditioned on the existence of VS 2013. I guess that should be an additional check prior to a machine being re-enabled.
I have disabled the following machines in slavealloc in preparation for finishing up the iX upgrades today (we'll still need to do the seamicros):

b-2008-ix-0001 b-2008-ix-0002 b-2008-ix-0003 b-2008-ix-0004 b-2008-ix-0005 b-2008-ix-0006 b-2008-ix-0007 b-2008-ix-0008 b-2008-ix-0009 b-2008-ix-0010 b-2008-ix-0011 b-2008-ix-0012 b-2008-ix-0013 b-2008-ix-0014 b-2008-ix-0015 b-2008-ix-0016 b-2008-ix-0017 b-2008-ix-0018 b-2008-ix-0019 b-2008-ix-0120 b-2008-ix-0121 b-2008-ix-0122 b-2008-ix-0123 b-2008-ix-0124 b-2008-ix-0125 b-2008-ix-0126 b-2008-ix-0127 b-2008-ix-0128 b-2008-ix-0129 b-2008-ix-0130 b-2008-ix-0131 b-2008-ix-0132 b-2008-ix-0133 b-2008-ix-0134 b-2008-ix-0135 b-2008-ix-0136 b-2008-ix-0137 b-2008-ix-0138 b-2008-ix-0139 b-2008-ix-0140 b-2008-ix-0141 b-2008-ix-0142 b-2008-ix-0143 b-2008-ix-0144 b-2008-ix-0145 b-2008-ix-0146 b-2008-ix-0147 b-2008-ix-0148 b-2008-ix-0149 b-2008-ix-0150 b-2008-ix-0151 b-2008-ix-0152 b-2008-ix-0153 b-2008-ix-0154 b-2008-ix-0155 b-2008-ix-0156 b-2008-ix-0157 b-2008-ix-0158
The following have been updated and put back into service:

b-2008-ix-0120 b-2008-ix-0121 b-2008-ix-0122 b-2008-ix-0123 b-2008-ix-0124 b-2008-ix-0125 b-2008-ix-0127 b-2008-ix-0128 b-2008-ix-0129 b-2008-ix-0130 b-2008-ix-0131 b-2008-ix-0132 b-2008-ix-0133 b-2008-ix-0134 b-2008-ix-0135 b-2008-ix-0136 b-2008-ix-0137 b-2008-ix-0138 b-2008-ix-0139 b-2008-ix-0140 b-2008-ix-0141 b-2008-ix-0142 b-2008-ix-0143 b-2008-ix-0144 b-2008-ix-0145 b-2008-ix-0146 b-2008-ix-0147 b-2008-ix-0148 b-2008-ix-0149 b-2008-ix-0150 b-2008-ix-0151 b-2008-ix-0152 b-2008-ix-0153 b-2008-ix-0154 b-2008-ix-0155 b-2008-ix-0156 b-2008-ix-0157 b-2008-ix-0158

There are some oddities with b-2008-ix-0001 - b-2008-ix-0019 and b-2008-ix-0126 which I'll consult with Q on when he's online.

I've disabled the following machines in slavealloc in preparation for updates:

b-2008-ix-0161 b-2008-ix-0162 b-2008-ix-0163 b-2008-ix-0164 b-2008-ix-0165 b-2008-ix-0166 b-2008-ix-0167 b-2008-ix-0168 b-2008-ix-0169 b-2008-ix-0170 b-2008-ix-0171 b-2008-ix-0172 b-2008-ix-0173 b-2008-ix-0174 b-2008-ix-0175 b-2008-ix-0176 b-2008-ix-0177 b-2008-ix-0178 b-2008-ix-0179 b-2008-ix-0180 b-2008-ix-0181 b-2008-ix-0182 b-2008-ix-0183 b-2008-ix-0184
The following machines have been updated and put back into the pool:

b-2008-ix-0126 b-2008-ix-0161 b-2008-ix-0163 b-2008-ix-0164 b-2008-ix-0165 b-2008-ix-0166 b-2008-ix-0167 b-2008-ix-0168 b-2008-ix-0169 b-2008-ix-0170 b-2008-ix-0171 b-2008-ix-0172 b-2008-ix-0173 b-2008-ix-0174 b-2008-ix-0175 b-2008-ix-0176 b-2008-ix-0177 b-2008-ix-0178 b-2008-ix-0179 b-2008-ix-0180 b-2008-ix-0181 b-2008-ix-0183 b-2008-ix-0184

b-2008-ix-0162 is still chugging away at a pgo build. b-2008-ix-0182 was disabled to begin with. b-2008-ix-0001 - b-2008-ix-0019 are still pending investigation from Q.

jlund: what seamicro machine(s) do you want to test on to make sure the update works there (before we push out to all seamicro nodes)?
Flags: needinfo?(jlund)
b-2008-sm-0001 has been disabled + rebooted. I've loaned it to me. Could we test on this first? Let's keep it disabled and I will run it on my master after. I don't expect issues with using vs2010 after the vs2013 install, as it should be the same as the non-seamicro equivalent. However, I'll also be testing out the vs2013 compiler on it, as it was brought to my attention that a new compiler might behave differently on seamicros. arr: let me know when I can play with b-2008-sm-0001 :)
Flags: needinfo?(jlund)
b-2008-ix-0162 is now updated and back in the pool as well. I was unable to get the GPO to apply to b-2008-sm-0001, so I've added that to the list of machines I was having Q take a look at (in addition to b-2008-ix-0001 - b-2008-ix-0019). I think we have those 19 iX machines left, as well as all the seamicro machines; then we'll be done with this bug.
Flags: needinfo?(q)
Q discovered that the low-number iX machines were still listed in the SCL1 OU, not the SCL3 OU. Same for b-2008-sm-0001. They've all been moved to the correct OU now and updated. The iX nodes have also been re-enabled in slavealloc. jlund: let me know how the sm testing goes, and we can move ahead with the rollout there assuming all goes well.
Flags: needinfo?(q)
(In reply to Amy Rich [:arich] [:arr] from comment #74)
> Q discovered that the low-number iX machines were still listed in the SCL1
> OU, not the SCL3 OU. Same for b-2008-sm-0001. They've all been moved to the
> correct OU now and updated. The iX nodes have also been re-enabled in
> slavealloc.
>
> jlund: let me know how the sm testing goes, and we can move ahead with the
> rollout there assuming all goes well.

arr: I ran tests on b-2008-sm-0001 against vs2010 and vs2013. Both passed, like the ix counterparts. We should be good to go to roll out the remaining seamicros.

btw - I am re-enabling b-2008-sm-0001 into prod. It will be live with vs2013 on it.
> arr: I ran tests on b-2008-sm-0001 against vs2010 and vs2013. Both passed,
> like the ix counterparts. We should be good to go to roll out the remaining
> seamicros.

NI'ing amy
Flags: needinfo?(arich)
All of the following machines have been updated and changed back to their original enable/disable state:

b-2008-sm-0002 b-2008-sm-0003 b-2008-sm-0005 b-2008-sm-0006 b-2008-sm-0007 b-2008-sm-0008 b-2008-sm-0009 b-2008-sm-0010 b-2008-sm-0011 b-2008-sm-0012 b-2008-sm-0013 b-2008-sm-0014 b-2008-sm-0015 b-2008-sm-0016 b-2008-sm-0017 b-2008-sm-0018 b-2008-sm-0019 b-2008-sm-0020 b-2008-sm-0021 b-2008-sm-0022 b-2008-sm-0023 b-2008-sm-0024 b-2008-sm-0025 b-2008-sm-0026 b-2008-sm-0027 b-2008-sm-0028 b-2008-sm-0029 b-2008-sm-0030 b-2008-sm-0032 b-2008-sm-0033 b-2008-sm-0034 b-2008-sm-0035 b-2008-sm-0036 b-2008-sm-0037 b-2008-sm-0038 b-2008-sm-0039 b-2008-sm-0040 b-2008-sm-0041 b-2008-sm-0042 b-2008-sm-0043 b-2008-sm-0044 b-2008-sm-0045 b-2008-sm-0046 b-2008-sm-0047 b-2008-sm-0048 b-2008-sm-0049 b-2008-sm-0050 b-2008-sm-0051 b-2008-sm-0052 b-2008-sm-0053 b-2008-sm-0054 b-2008-sm-0055 b-2008-sm-0056 b-2008-sm-0057 b-2008-sm-0058 b-2008-sm-0059 b-2008-sm-0060 b-2008-sm-0061 b-2008-sm-0062 b-2008-sm-0063 b-2008-sm-0064

The following two hosts haven't been installed yet, so don't require updates: b-2008-sm-0004 b-2008-sm-0031

I believe all that's left to do here is remove the whitelist so that all 2008r2 hosts have the VS2013 GPO enabled by default, correct, Q?
Assignee: mcornmesser → q
Flags: needinfo?(arich)
Correct!
Q: great, can you verify and remove the whitelist, please? Then releng should be good to do 2013 testing wide scale.
Whitelist removed; machines have installs. Closing bug.
Status: REOPENED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
See Also: → 1055876
The comments from TBPL Robot in bug 1026870 indicate some of the slaves haven't gotten the install yet.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Nick Thomas [:nthomas] from comment #81)
> The comments from TBPL Robot in bug 1026870 indicate some of the slaves
> haven't gotten the install yet.

FTR - the following slaves were hitting bug 1026870:
b-2008-sm-0042
b-2008-sm-0050
b-2008-sm-0038
b-2008-sm-0053
b-2008-sm-0008

All seamicro. Q: can you confirm that all of these were installed correctly before Aug 20th and that it stuck? If they look good from the GPO end, we might have intermittent fallback, although 1026870 should only be hit when vs2010 uses the wrong cvtres.exe: https://bugzilla.mozilla.org/show_bug.cgi?id=1019165#c15
The verification method I was using during the installs was to run gpresult /R and look for:

Install_links_VS2013
Install_VS2013_builders

The 5 machines you mention above all pass that verification method. I'll let Q investigate to see if something wonky happened there and it's mis-reporting.
Flags: needinfo?(q)
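(For reference, a minimal sketch of that check run from an elevated prompt on a builder, filtering the applied-GPO list for the two names above:)

    REM Show applied GPOs and keep only the VS2013 ones; either name matching counts as applied.
    gpresult /R | findstr /i "Install_links_VS2013 Install_VS2013_builders"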
Taking a look.
Flags: needinfo?(q)
See Also: → 1057022
The installs are in place. I am still combing through logs to see if there is a locking or other issue.
We have too many burning fires on Windows right now, and this is one of two that are seamicro-specific. Until we can get the other issues under control, we're disabling all of the seamicros, since we have enough capacity and they're actually hurting us by being in the build and try pools.
See Also: → 1068922
Whiteboard: [time=20:00] → [kanban:engops:https://kanbanize.com/ctrl_board/6/345] [time=20:00]
Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/345] [time=20:00] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/795] [kanban:engops:https://kanbanize.com/ctrl_board/6/345] [time=20:00]
This is all set. The links are still set via GPO; however, VS2013 is being installed at the MDT level.
Status: REOPENED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/795] [kanban:engops:https://kanbanize.com/ctrl_board/6/345] [time=20:00] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/795] [time=20:00]