Closed Bug 1063018 Opened 10 years ago Closed 10 years ago

Install vs2013 Update 3, NSIS 3.0a2, and Mercurial 2.9.2 on 5 machines for testing

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jlund, Assigned: markco)

References

Details

No description provided.
I've grabbed b-2008-ix-000[1-5]. b-2008-ix-0001 and b-2008-ix-0004 were not running a job, so they are fully disabled and ready to be worked with. Once 2, 3, and 5 finish their current jobs and reboot, they will be ready too. I also have a dev staging machine, ix-mn-w0864-001, that we can use to create the image with. It is ready to be worked with at any time as well.
FTR - b-2008-ix-0002, b-2008-ix-0003, and b-2008-ix-0005 are fully disabled and ready to be worked on
Assignee: nobody → mcornmesser
Depends on: 1063372
The 5 machines are now being installed with VS 2013 update 3. The git, nsis, and VS 2013 GPOs will not apply to these machines. I will check back in the morning to make sure the machines have completed the install.
(In reply to Mark Cornmesser [:markco] from comment #3)
> The 5 machines are now being installed with VS 2013 update 3.

\o/
0001 and 0003 did not seem to complete correctly. I am looking into those 2 machines now.
These machines are good to go now. They got into an odd state where the deployment did not complete the install. After the drives were wiped, the installation completed.
sweet. I can't connect to 003; ping shows 100% packet loss. I connected the other 4 to my dev master to test before enabling in prod. Unfortunately I cannot back up the prod keys and copy staging keys to these slaves (sftp and scp do not work[1]), so steps like upload will fail. We can still monitor most of the other steps like hg_update, symbols, compile, etc. for now.

[1] https://wiki.mozilla.org/ReleaseEngineering/How_To/Set_Up_a_Freshly_Imaged_Slave
triggered a dozen builds from m-a, m-b, and m-c. will check back in the morning
For 0003 I have just wiped the drive and am attempting another install, though it is behaving kind of oddly. I can't tell if it is a drive issue or an overall slowness between it and WDS1. I will check back on it in the morning. Perhaps we should pick another machine for this since 0003 is seeming odd at the moment. Sftp should work. What is the message when it fails?
0003 is still out. For some reason the installation will not complete on this machine. I am looking into it this morning.
hmm, ok I'll ignore 003 for the moment and unlock it from my master. Here are the commands I tried for copying keys:

cltbld@B-2008-IX-0001 ~
$ "C:\mozilla-build\msys\bin\scp" -o 'StrictHostKeyChecking no' -o 'BatchMode=no' -r cltbld@b-linux64-hp-0020.build.mozilla.org:~/.ssh .ssh
ssh: connect to host b-linux64-hp-0020.build.mozilla.org port 22: Bad file number

cltbld@B-2008-IX-0001 ~
$ sftp b-2008-ix-0183.build.mozilla.org:.ssh/* .ssh/
Connecting to b-2008-ix-0183.build.mozilla.org...
ssh: connect to host b-2008-ix-0183.build.mozilla.org port 22: Bad file number
Connection closed
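For what it's worth, in my experience "Bad file number" from MSYS/Cygwin OpenSSH usually just means the TCP connection to port 22 was never established (blocked or filtered), rather than anything key-related, which would fit a vlan/flow problem. A minimal reachability check from the slave, reusing the same host from the failing command above (the ConnectTimeout value is arbitrary):

# Verbose connect attempt; if it stalls at "connect to address ... port 22",
# the path between vlans is blocked. If it gets to key exchange, the network is fine
# and the problem really is keys/auth.
"C:\mozilla-build\msys\bin\ssh" -v -o ConnectTimeout=10 cltbld@b-linux64-hp-0020.build.mozilla.org true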
looks like all my builds failed overnight. I think this was my master's fault. They failed at the tooltool step, e.g.:

sh: c:/builds/moz2_slave/m-beta-w32-d-00000000000000000/tools/scripts/tooltool/tooltool_wrapper.sh: No such file or directory

The reason it doesn't exist is that the tools repo being cloned is very old:

'hg' 'clone' 'https://hg.mozilla.org/users/stage-ffxbld/tools' 'tools'

will figure out why and point these builds to the correct repo
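If the stage-ffxbld user repo is simply stale, one manual workaround on a slave would be to re-clone tools from the canonical location. This is only a sketch: I'm assuming https://hg.mozilla.org/build/tools is the intended source, and the real fix is pointing the staging master config at the right repo, not hand-editing builddirs:

# From an MSYS shell on the slave, using the builddir from the failing log above
cd c:/builds/moz2_slave/m-beta-w32-d-00000000000000000
rm -rf tools
hg clone https://hg.mozilla.org/build/tools tools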
There seems to be an issue with ssh from within the try and build vlans to machines in the try vlan, which is causing the sftp failures. Do we know if this is a new behavior?
0003 is back in play. Following removal of its account from the domain, a new image deployment completed successfully.
sftp will work if both machines are in the same vlan; however, there were no win staging keys available in the win staging build pool. Hand-made keys and copied them across. Connected all 5 to my master, fixed the tools error, and triggered a dozen builds.
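For the record, "hand-made keys" amounts to roughly the following, sketched under the assumption that the staging slaves only need to trust each other (the key type, comment, and target hostname are illustrative, not the releng-standard ones):

# On one staging slave, as cltbld: generate a passphrase-less keypair
ssh-keygen -t rsa -b 2048 -N "" -C "cltbld staging" -f ~/.ssh/id_rsa
# Authorize it locally
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# Copy the whole .ssh dir to each of the other staging slaves (same vlan, so scp works)
scp -r ~/.ssh cltbld@b-2008-ix-0002:~/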
I *think* jobs against these slaves using vs2010 are looking good. I've just triggered a bunch of vs2013 jobs against win32 and win64 to see how Update 3 fares. Will post findings later today for both vs2010 and vs2013.
some results:

WINNT 5.2 mozilla-beta build:
'hg_update': ['37 mins, 46 secs', '18 mins, 31 secs', '19 mins, 4 secs', '37 mins, 39 secs', '18 mins, 14 secs']
'compile': ['3 hrs, 52 mins, 36 secs', '3 hrs, 51 mins, 8 secs', '3 hrs, 51 mins, 8 secs', '3 hrs, 52 mins, 6 secs', '3 hrs, 53 mins, 36 secs', '3 hrs, 48 mins, 39 secs']
'buildsymbols': ['6 mins, 54 secs', '7 mins, 1 secs', '6 mins, 55 secs', '6 mins, 57 secs', '6 mins, 56 secs']
'package-tests': ['9 mins, 48 secs', '9 mins, 43 secs', '9 mins, 46 secs', '9 mins, 49 secs', '9 mins, 40 secs']

WINNT 5.2 mozilla-beta leak test build:
'hg_update': ['37 mins, 29 secs']
'compile': ['1 hrs, 11 mins, 25 secs']
'buildsymbols': ['9 mins, 13 secs']
'package-tests': ['9 mins, 28 secs']

WINNT 5.2 mozilla-aurora nightly:
'hg_update': ['19 mins, 57 secs', '18 mins, 31 secs']
'compile': ['4 hrs, 45 secs', '7 mins, 25 secs', '4 hrs, 44 secs']
'buildsymbols': ['7 mins, 25 secs', '7 mins, 25 secs']

WINNT 5.2 mozilla-central nightly:
'hg_update': ['18 mins, 46 secs', '20 mins, 44 secs', '33 mins, 7 secs', '18 mins, 48 secs']
'compile': ['3 hrs, 12 secs', '3 hrs, 34 secs', '3 hrs, 2 secs', '3 hrs, 22 secs']
'buildsymbols': ['6 mins, 48 secs', '7 mins, 26 secs', '6 mins, 55 secs', '6 mins, 55 secs']

I think we are good to put these in production. All the steps bar hg_update were very consistent across the branches I tested. I think with hg and the network things may vary, so that might explain the two jobs that took ~35 min to clone instead of the usual 18. Either way, it doesn't appear that we have bg procs or more than one step taking longer than usual.

markco, when I put staging keys on these machines they seem to have been overwritten with prod keys again. Is there GPO logic that does that?

dmajor, I also ran the following builds with vs2013 Update 3. They all seemed to pass. Do you want logs for any of them? If so, I can upload them somewhere public:

WINNT 5.2 mozilla-central leak test build
WINNT 6.1 x86-64 mozilla-central build
WINNT 6.1 x86-64 mozilla-central nightly (pgo)
WINNT 5.2 mozilla-central nightly (pgo)
Flags: needinfo?(mcornmesser)
Flags: needinfo?(dmajor)
(In reply to Jordan Lund (:jlund) from comment #17)
> dmajor I also ran the following builds with vs2013 update 3. they all seemed
> to pass. do u want logs for any of them? If so, I can upload them somewhere

If you could crack open one of each flavor and make sure it contains "18.00.30723", that's all I need to feel comfortable going ahead.
Flags: needinfo?(dmajor)
I'll look into it. Most likely I will apply item-level targeting based on the machine name to prevent any keys from being copied over.
Flags: needinfo?(mcornmesser)
Item level targeting is in place. Jlund: Could you copy over the keys to a machine and see if they stick this time?
(In reply to David Major [:dmajor] from comment #18)
> (In reply to Jordan Lund (:jlund) from comment #17)
> > dmajor I also ran the following builds with vs2013 update 3. they all seemed
> > to pass. do u want logs for any of them? If so, I can upload them somewhere
> If you could crack open one of each flavor and make sure it contains
> "18.00.30723", that's all I need to feel comfortable going ahead.

I see the following in each log: "Microsoft (R) C/C++ Optimizing Compiler Version 18.00.30723 for x86"
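A quick way to do that check without opening each log by hand, sketched assuming the build logs have been downloaded as plain-text files into the current directory (the *.log filenames are my assumption):

# Print the first compiler banner line from each log; x86 builds should show
# "18.00.30723 for x86" and x86-64 builds "18.00.30723 for x64".
grep -m1 "Optimizing Compiler Version" *.log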
Is x86 in all the logs? The x86-64 ones ought to say "18.00.30723 for x64"
(In reply to David Major [:dmajor] from comment #22)
> Is x86 in all the logs? The x86-64 ones ought to say "18.00.30723 for x64"

my mistake. it says x64 for:

WINNT 6.1 x86-64 mozilla-central build
WINNT 6.1 x86-64 mozilla-central nightly (pgo)

B-2008-IX-000{1-5} machines have been clobbered of vs2013 builds, asserted to have prod keys, and enabled back into production based on the positive results from comment 17. Left notes on slavealloc and keeping this bug open while they are in the wild.
Blocks: 1068922
these 5 machines are also going to test new versions of hg and NSIS as part of the image. markco will update with the times when these changes were applied.
Blocks: 989531, 1056981
Summary: Install vs2013 Update 3 on 5 machines for testing → Install vs2013 Update 3, NSIS 3.0a2, and Mercurial 2.9.2 on 5 machines for testing
The Mercurial 2.9.2 GPO is now going out to these machines. Due to failures, the NSIS GPO is no longer being pushed and is being removed.
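Once the GPO has applied, the quickest sanity check on a slave is just asking Mercurial for its version from a mozilla-build shell (assuming hg is on the PATH there, which is how the existing build steps invoke it):

# Should report "Mercurial Distributed SCM (version 2.9.2)" after the GPO has run
hg --version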
No longer blocks: 1056981
Depends on: 1056981
0118, 0135, 0136, and 0139 have been re-imaged and re-enabled. 0144 through 0158 have been disabled. Note: 0159 and 0160 do not seem to exist.
0144 through 0158 have been re-imaged and re-enabled, with the exception of 0150, which still has a build running. 0160 through 0184 have been disabled.
Correction: not 0160 but 0161.
0144 through 0184 have been completed and re-enabled, except for 0180, which is currently unreachable. 0180 and 0038 are the only two machines left. See bug 1078350.
Depends on: 1078350
No longer depends on: 1056981
Looks like 0103 and 0002 are either still waiting for a re-image or still waiting to be re-enabled.
0103 and 0002 are now re-enabled.
No longer blocks: 1062877
Are we done here? Can this be closed?
This bug is finished. However, NSIS is still an open issue, but that is covered in a different bug.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard