Closed Bug 1416867 Opened 8 years ago Closed 8 years ago

Stand up 30 VMs each of w7 and w10 on moonshots

Categories

(Infrastructure & Operations :: RelOps: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: fubar, Assigned: markco)

Details

I don't think we filed a bug for w7/w10 VMs; dupe if we did! We need 30 of each; w10 first, then w7.
While we are at it, can we also do:
5 machines with a native install (no Xen)
2 machines set up as loaners that rwood and jmaher can access
I want to get ahead of the game!
Mark, let me know which 2 VMs will be the loaners and I'll update VPN access for rwood. If you need a hand with the loaner setup, dhouse can help (:buildduty will likely be offline by then). Please prioritize the 2 loaners and getting w7 up over trying any bare metal installs. We *do* want to see if they reduce noise, but they require network driver changes and possibly VLAN/firewall changes, and I don't want that to delay w7. Q will have more details on what changes need to be made.
t-w1064-xe-226 through t-w1064-xe-255 have been set up. I will set up t-w1064-xe-254 and 255 as loaners. What all is needed for the loaner?
VNC password change (no RDP access, because it may have adverse effects on the graphics card setup)
Root password change
Disable the tasks involved in starting generic worker and picking up tests
Remove releng jumphost firewall restrictions
Other items needed?
(In reply to Mark Cornmesser [:markco] from comment #3)
> t-w1064-xe-226 through t-w1064-xe-255 have been set up.
>
> I will set up t-w1064-xe-254 and 255 as loaners.
Added to vpn_releng_loan for :rwood.
> What all is needed for the loaner?
>
> VNC password change (No RDP access because it may have adverse effects on
> the graphic card setup)
> Root password change
> Disable the tasks involved in starting generic worker and picking up tests
> Remove releng jumphost firewall restrictions.
> Other items needed?
Dave will work with you on these bits.
254 and 255, 10.49.42.100 and 10.49.42.99, are set up for loaners with the releng loaner passwords.
(In reply to Mark Cornmesser [:markco] from comment #5)
> 254 and 255, 10.49.42.100 and 10.49.42.99, are set up for loaners with the
> releng loaner passwords.
Just to note: the username is GenericWorker instead of cltbld.
The win10 machines are not accepting jobs when I push to try. Are these not connected to Taskcluster? https://treeherder.mozilla.org/#/jobs?repo=try&revision=4ba20967f7529d5e08d4869c962a7b8452469721
Flags: needinfo?(mcornmesser)
Looking at the task, there is "provisionerId: releng-hardware", which I was not able to get working in testing. I will jump on a machine, manually change that in the config file, and see what happens.
Flags: needinfo?(mcornmesser)
I manually changed the file on 226 and it did not pick up a test. I am going to reinstall 227 with releng-hardware in the config file before the GenericWorker account is created. If that doesn't work, I will needinfo pmoore.
We end up with this error from the GenericWorker: "Client with clientId 'project/releng/worker/releng-hardware/gecko-t-win10-64-hw' not found\n----\nmethod: claimWork\nerrorCode: AuthenticationFailed\nstatusCode: 401\ntime: 2017-11-15T19:01:06.495Z" I am currently seeking help in #taskcluster. If it is possible to force the scl3-puppet provisioner for tests, the other VMs will pick up and run tests.
A new workerId and token were generated for gecko-t-win10-64-hw to work with the releng-hardware provisioner. I am kicking off another install on 227, and if it works I will go through and reinstall the rest and redo the loaners.
Just an update: with the changes in the generic worker config file, the JSON became malformed. I think I have sorted out the issues and am now attempting another install.
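A minimal sketch of a pre-install sanity check that would catch a malformed config before a reinstall, assuming the GenericWorker config is a plain JSON object with a top-level `provisionerId` key. The key name, the function names, and the BOM handling are illustrative assumptions, not the actual tooling used in this bug; only the `releng-hardware` value comes from the comments above.

```python
import json

# Assumption: the config is a JSON object with a top-level "provisionerId" key.
# The expected value comes from this bug; the key name is illustrative.
EXPECTED_FIELDS = {"provisionerId": "releng-hardware"}


def check_config_text(text, expected=EXPECTED_FIELDS):
    """Return a list of problems found in a config string (empty list if none)."""
    try:
        cfg = json.loads(text)
    except json.JSONDecodeError as exc:
        # Report where the parser gave up, e.g. on a trailing comma or stray quote.
        return [f"malformed JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]
    if not isinstance(cfg, dict):
        return ["top-level JSON value is not an object"]
    problems = []
    for key, want in expected.items():
        got = cfg.get(key)
        if got != want:
            problems.append(f"{key} is {got!r}, expected {want!r}")
    return problems


def check_config_file(path, expected=EXPECTED_FIELDS):
    """Read a config file and check it; 'utf-8-sig' also accepts a leading BOM."""
    with open(path, encoding="utf-8-sig") as f:
        return check_config_text(f.read(), expected)
```

Running `check_config_file` against the edited file returns an empty list only when it parses and points at releng-hardware; the real config path on these VMs is not given in this bug.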
226 through 255 are now reinstalling with a good GenericWorker config file pointed to the releng-hardware provisioner. They should be good to go within an hour or two. I will set up 254 and 255 as loaners tomorrow.
One test is running but fails 2/2 times: https://public-artifacts.taskcluster.net/JqLfWWSMSuyiO5h6kFDDhw/0/public/logs/live_backing.log The failure is a 403, and I suspect we don't have firewall rules set up properly on these machines.
I am able to jump on a machine and pull from the URL that is returning the 403 during the test. I will look through the logs and see if anything in the test environment may cause an issue.
Also reaching out in #releng for suggestions. The more I think about it, the less I think it is a firewall configuration issue: since we get an HTTP response, the machine is able to reach the site and the site is refusing the request.
It is a bit odd. The error comes up here:
HTTP error 403 while getting http://pypi.pvt.build.mozilla.org/pub/mozsystemmonitor-0.3.tar.gz (from http://pypi.pvt.build.mozilla.org/pub/)
But other packages were downloaded successfully:
14:56:46 INFO - Downloading http://pypi.pvt.build.mozilla.org/pub/psutil-3.1.1-cp27-none-win32.whl (87kB)
14:56:47 INFO - Installing collected packages: psutil
14:56:47 INFO - Successfully installed psutil-3.1.1
Outside of the test virtual environment I get the same response using wget:
C:\Users\GenericWorker>wget http://pypi.pvt.build.mozilla.org/pub/mozsystemmonitor-0.3.tar.gz
--2017-11-16 17:27:51-- http://pypi.pvt.build.mozilla.org/pub/mozsystemmonitor-0.3.tar.gz
Resolving pypi.pvt.build.mozilla.org (pypi.pvt.build.mozilla.org)... 10.22.74.160
Connecting to pypi.pvt.build.mozilla.org (pypi.pvt.build.mozilla.org)|10.22.74.160|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2017-11-16 17:27:51 ERROR 403: Forbidden.
C:\Users\GenericWorker>wget http://pypi.pvt.build.mozilla.org/pub/psutil-3.1.1-cp27-none-win32.whl
--2017-11-16 17:30:34-- http://pypi.pvt.build.mozilla.org/pub/psutil-3.1.1-cp27-none-win32.whl
Resolving pypi.pvt.build.mozilla.org (pypi.pvt.build.mozilla.org)... 10.22.74.160
Connecting to pypi.pvt.build.mozilla.org (pypi.pvt.build.mozilla.org)|10.22.74.160|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 87554 (86K) [application/x-troff-man]
Saving to: 'psutil-3.1.1-cp27-none-win32.whl'
:catlee, can you help us figure out why we get 403 for certain packages in pypi.pvt.build.mozilla.org ?
Flags: needinfo?(catlee)
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #19)
> :catlee, can you help us figure out why we get 403 for certain packages in
> pypi.pvt.build.mozilla.org ?
bug 1415703
That is a security-restricted bug; I don't have access.
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #21)
> that is a secure bug, I don't have access
Sorry, fixed! The fix for it is on its way out now, FWIW.
Flags: needinfo?(catlee)
254 is set up as a loaner. I will set up 255 when I can catch it while it is not running a test.
255 is now set up as a Win 10 loaner. t-w732-xe-256 through t-w732-xe-262 are now set up in the Win 7 test pool. I will add to this pool once blades become available.
Just to note, we are tracking which VMs are on which blade here: https://docs.google.com/spreadsheets/d/1ewFVpaFw60ljxCmPaW4Gu8rrmnuJ2nDqZejmer1dw38/edit#gid=0 I am snagging the 226 Windows VM to do some testing, starting with IO.
Installing t-w732-xe-195 through 119 now. After the installations complete I will set up 118 and 119 as loaners. I am also returning 226 to the Win 10 test pool.
I am working on getting the loaners set up this morning. It will end up being 217 and 218. I am hitting an issue with VNC authentication not accepting the loaner password or the old password after a reboot.
I had to go in and directly edit the ini file to update the password and to allow user input. The loaners are now set up.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED