Closed Bug 857064 Opened 12 years ago Closed 12 years ago

Image Win7 on iX systems (t-w732-ix)

Categories

(Infrastructure & Operations :: RelOps: General, task)

x86
Windows 7
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: arich, Assigned: q)

Install the new talos-w7-ix nodes once they are racked and cabled and the imaging process is completed.
Summary: install talos-w7-ix systems → install t-w732-ix systems
Should I be able to ping or vnc the systems that van racked up on https://bugzilla.mozilla.org/show_bug.cgi?id=857042#c11 ? Should I wait for a specific "go" on this bug to put machines on staging?
Please ignore the rack and cable bug. It has no useful information for you. We will identify in this bug when machines have been imaged and are ready for releng to use.
Thanks for the clarification! I will only pay attention to this one :)
Summary: We're waiting for 2 imaging issues on bug 829126. We have 18 Win7 machines racked and the remaining ones could be racked next week (bug 857042).
OS: Mac OS X → Windows 7
Summary: install t-w732-ix systems → Image Win7 on iX systems (t-w732-ix)
Whiteboard: Summary on comment 4 - waiting for 2 imaging issues and more machines to be racked
Due to recent setbacks in the deployment process these machines won't be imaged today. I am working on a resolution to the problem but do not currently have an ETA.
(In reply to Q from comment #5)
> Due to recent setbacks in the deployment process these machines won't be
> imaged today. I am working on a resolution to the problem but do not
> currently have an ETA.
Q:
1) While you figure out the deployment problems, can you at least manually install the changes needed on an ix machine, so we can verify whether these run green in staging as expected, or whether there are other problems to debug as well?
2) Is there anything you need from us to help with the imaging issues?
Flags: needinfo?(q)
John: I believe we already have machines running in staging that Armen has done this qualification work on. Q is staying focused on fixing the issue with the deployment process that cropped up last week, since that impacts rolling out images that will work at all.
Flags: needinfo?(q)
As Amy mentioned, #1 on comment 6 is not helpful for us, as we already did this a couple of weeks ago. #2 on comment 6: there's nothing that Q is waiting on from us.
Per the EngOps meeting, adding an additional update here. The install process is no longer recognizing the Nvidia third party card as a routable graphics card. This is no longer working correctly as the result of an accidental overwrite of the deployment share (though we have recovered the actual files). Q is focused on debugging the issue until it's solved.
Armen, Mark has imaged t-w732-ix-008 - t-w732-ix-018 for you to hook up to cedar this week. Q is going to continue working on the graphics card issue to figure out how we can do this correctly.
(In reply to Amy Rich [:arich] [:arr] from comment #10)
> Armen, Mark has imaged t-w732-ix-008 - t-w732-ix-018 for you to hook up to
> cedar this week. Q is going to continue working on the graphics card issue
> to figure out how we can do this correctly.
Thank you very much! I will get them in.
Whiteboard: Summary on comment 4 - waiting for 2 imaging issues and more machines to be racked → Summary on comment 4 - waiting for dual graphic solution to be fixed and remaining machines to be racked
I issued this to reboot the machines into staging and the host went down:
ipmitool -U releng -P blah -H t-w732-ix-0$i-mgmt chassis power soft
Is this expected? I fixed it by calling "chassis power up".
I think "power soft" is a soft power-down (a short press of the power button). I usually use "power cycle".
It has always worked for me. I will start using cycle for now.
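For reference, a minimal sketch of how the "power cycle" variant could be scripted across a range of hosts. It is a dry run by default and only prints the commands. The helper names (mgmt_host, power_action, power_range) and the DRY_RUN switch are made up for illustration, and it swaps the -P password from the command above for ipmitool's -E option, which reads the password from the IPMI_PASSWORD environment variable. Note that "cycle" is a hard power cycle, not a graceful OS shutdown.

```shell
#!/bin/sh
# Hypothetical helpers; nothing here comes from the releng tooling itself.

# Build the management hostname for a slave number, e.g. 8 -> t-w732-ix-008-mgmt.
# Pass the number without leading zeros (printf would read 008 as bad octal).
mgmt_host() {
    printf 't-w732-ix-%03d-mgmt\n' "$1"
}

# Print (DRY_RUN=1, the default) or run (DRY_RUN=0) one chassis power action.
# $1 = slave number, $2 = action (defaults to "cycle").
power_action() {
    num=$1
    action=${2:-cycle}
    # -E tells ipmitool to take the password from $IPMI_PASSWORD.
    set -- ipmitool -U releng -E -H "$(mgmt_host "$num")" chassis power "$action"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Apply an action to slaves $1..$2 inclusive; $3 = action (defaults to "cycle").
power_range() {
    i=$1
    while [ "$i" -le "$2" ]; do
        power_action "$i" "${3:-cycle}"
        i=$((i + 1))
    done
}
```

With these loaded, "power_range 8 18" prints the eleven commands for t-w732-ix-008 through t-w732-ix-018, and setting DRY_RUN=0 with IPMI_PASSWORD exported would actually send them.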
I've spoken with arr, and Q will try to fix the graphics card issue before DCOps racks the machines (waiting on deliveries of rack mount kits). If the machines get racked and we don't have a solution, we will disable the primary graphics card and figure out the solution later on. We also have to fix the Media package issue, which will hopefully be resolved this week. I will try to have the machines running on Cedar by tomorrow in their current state. I've been delayed by the failed attempt on Friday and by today being a shorter day for me.
Whiteboard: Summary on comment 4 - waiting for dual graphic solution to be fixed and remaining machines to be racked → Summary on comment 4 - waiting for dual graphic solution to be fixed and remaining machines to be racked and last imaging issue
I have put the w7 image on t-w732-ix-004 - t-w732-ix-007 and t-w732-ix-043 - t-w732-ix-100. These are not suitable for use yet since the graphics card issue has not been solved. As soon as it is (and the fix is pushed out via GPO), these machines will be ready for use. t-w732-ix-019 - t-w732-ix-042 are racked but cannot be cabled until the switch rack mounting kit is delivered.
Any news from bug 857042? Is Tuesday/Wednesday the current ETA to get the nodes handed over to us?
Whiteboard: Summary on comment 4 - waiting for dual graphic solution to be fixed and remaining machines to be racked and last imaging issue
t-w732-ix-019 - t-w732-ix-042 have also been imaged (but are not currently recognizing the third-party graphics card). Armen: yes, the plan is to have these over to you by Tuesday/Wednesday.
t-w732-019 through 098 are now available. t-w732-099 and 100 are going to be used for graphics testing. t-w732-01 through 07 will be available in the near future. Various testing data and registration changes need to be removed. 25 machines had to be rebooted after the initial imaging so that the correct resolution would be picked up. We are planning on adding a reboot and gpupdate to a startup or logon script.
Great! I will put them through the system.
We now have 91 machines running in production. 7 machines are to be placed later on, and 2 are being kept for figuring out the graphics card solution. Thank you all for your help!
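As a quick sanity check on those numbers, here is a small tally, assuming the 91 production machines are exactly the two batches handed over earlier in this bug (008-018 and 019-098). That mapping is an inference from the ranges quoted above, not something stated outright, and count_range is a made-up helper.

```shell
#!/bin/sh
# Count the hosts in an inclusive numeric range: $1 = first, $2 = last.
count_range() { echo $(( $2 - $1 + 1 )); }

first_batch=$(count_range 8 18)     # 008-018, imaged for cedar
second_batch=$(count_range 19 98)   # 019-098, handed over later
pending=$(count_range 1 7)          # 001-007, to be placed later on
graphics=$(count_range 99 100)      # 099-100, kept for graphics testing

production=$((first_batch + second_batch))
total=$((production + pending + graphics))
echo "production=$production pending=$pending graphics=$graphics total=$total"
# prints: production=91 pending=7 graphics=2 total=100
```

Under that assumption the counts add up to the full pool of 100 machines.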
Depends on: t-w732-ix-012
Depends on: t-w732-ix-016
Depends on: t-w732-ix-043
Depends on: t-w732-ix-045
It seems that four machines are having graphics issues which have caused a lot of the intermittent oranges since yesterday.
* t-w732-ix-012
* t-w732-ix-016
* t-w732-ix-043
* t-w732-ix-045
I will look at those machines a little later.
We've imaged all of the w732 machines and handed them over (with the exception of a couple kept for debugging) at this point. Please open new bugs for any break/fix or enhancement.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
We forgot about these: t-w732-0[01-07]. Are they ready? Should we re-open?
Flags: needinfo?(arich)
I've asked Q and Mark to coordinate with you on all windows platforms. I don't want to keep this bug open to track systems that we're keeping for debugging.
Flags: needinfo?(arich)
(In reply to Amy Rich [:arich] [:arr] from comment #25)
> I've asked Q and Mark to coordinate with you on all windows platforms. I
> don't want to keep this bug open to track systems that we're keeping for
> debugging.
Those 7 are not for debugging. That is why I asked. I will file other bugs later on to keep track of debugging hosts.
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations