Closed Bug 758624 Opened 12 years ago Closed 11 years ago
evaluate iX 4-node chassis with new graphics card
We need to figure out if http://www.newegg.com/Product/Product.aspx?Item=N82E16814130579 will work in the iX 4-node chassis we have on loan from. Please purchase one and see if it comes with a low-profile bracket. If not, please obtain a low profile bracket and ensure that it will work in the iX machine. The iX machine will need to be racked and cabled and networked to the releng network for actual verification of the card with real builds.
tweaking summary while in RelEng/IT mtg
Summary: procure graphics cards for iX 4-node chassis → evaluate iX 4-node chassis with new graphics card
I discovered today that the loaner chassis we have from IX only had a single node installed. After I talked with Matt Finney, he agreed to install the other 3 nodes so we can continue with the evaluation. He did note that the CPU and RAM will be faster than what is already installed. I cleared this with :Coop. He says this won't particularly matter for the evaluation. Matt also asked if we could provide the HDDs since they didn't have any immediately available. I was able to find 3 x 250GB HDDs around the office. The plan is for me to drop the chassis off tomorrow at IX with the video cards and HDDs so IX can install all the components for us. They expect to have it ready by Thurs or Fri. Hopefully we can get this stood up on Fri. Per IRC with Coop, these are the OSes to be installed: win7(32), winXP, linux64, and linux32.
Status: NEW → ASSIGNED
The system was dropped off at IX systems this morning at 8:30am.
IX had to order memory for the nodes but might be able to get this to us on Thurs 6/14. Matt and I will get this racked in SCL1 and drop the OSes when it arrives.
This system is ready to be picked up from IX.
This system was racked and cabled last night. IPMI and vlan assignments also configured. https://inventory.mozilla.org/en-US/systems/show/6538/ https://inventory.mozilla.org/en-US/systems/show/6539/ https://inventory.mozilla.org/en-US/systems/show/6540/ https://inventory.mozilla.org/en-US/systems/show/6541/
Problem: IX had disabled the onboard video, which made IPMI console redirection unusable. I have enabled onboard video so we may remotely install the needed OSes. If having the onboard video enabled is a hard stopper to using these systems in production as talos slaves, we might want to look into Serial over LAN as a possible alternative to console redirection. At the very least, this would allow us to PXE boot to kickstart for the linux nodes. Launching WDS from the IPMI console might be an issue unless WDS can be automated to start imaging without an interactive session, or unless it can provide VNC or RDP in-band once the system has booted into WDS.
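A minimal sketch of the Serial over LAN idea above, assuming `ipmitool` is installed on the admin host and the node's BMC accepts IPMI v2.0 (`lanplus`) sessions. The BMC hostname and user name are hypothetical placeholders, not values from this bug, and the function only builds the command line; it does not run anything.

```python
import shlex


def sol_command(bmc_host, user):
    """Build an ipmitool argv for a Serial over LAN session.

    -I lanplus selects the IPMI v2.0 interface that SOL requires.
    -E makes ipmitool read the password from the IPMI_PASSWORD
    environment variable instead of the command line.
    """
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host,
            "-U", user, "-E", "sol", "activate"]


if __name__ == "__main__":
    # Hypothetical BMC address and account, for illustration only.
    argv = sol_command("ix-multinode-1-c-mgmt.example.com", "admin")
    print(shlex.join(argv))
```

For a session like this to show anything useful, the BIOS and the kickstart kernel would also need console redirection to the serial port configured, which this bug does not cover.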
(In reply to Jake Watkins [:dividehex] from comment #7) > Problem: IX had disabled the onboard video which made IPMI console > redirection unusable. I have enabled onboard video so we may remotely > install the needed OSes. If having the onboard video enabled is a hard > stopper to using these systems in production as talos slaves, we might want > to look into Serial over Lan as an possible alternative to console > redirection. At the very least, this would allow us to PXE boot to > kickstart for the linux nodes. It might be an issue with launching WDS from > the ipmi console unless WDS can be automated to start imaging without need > of an interactive session or if it has a way of providing VNC or RDP in-band > once the system has WDS booted. We need some form of video enabled on these slaves in order for things like talos to even run. Is the issue that we can't run with both onboard video *and* the graphics card enabled?
coop, this doesn't mean that the graphics cards are disabled; it just means that to do remote management with IPMI we had to enable the onboard graphics as well. Per discussion with Jake, it might just push us to make some small alterations to the code to make sure it is selecting the Nvidia cards for the tests. I will let Jake give you further details on this, though.
Coop: The following nodes are ready for testing:
ix-multinode-1-C.build.scl1.mozilla.com (CentOS 6.2 x86_64)
ix-multinode-1-D.build.scl1.mozilla.com (CentOS 6.2 i386)
Note: I have enabled the onboard video since it looks like the nvidia add-on card is being detected and switched over to being the primary display. These systems are also very bare. Any packages/software or configuration changes will need to be noted so that we can build all changes into puppetAgain when it comes time for production.
All four nodes have been handed to releng for testing at this point.
Any update on this?
Melissa: Armen is working to get Win8 testing stood up on one of these nodes, but we don't have people available to do the setup on the other OSes at present. After Win8 and 10.8 are in production, we *should* be able to shift someone over to work on the rest.
Followup from yesterday's meeting w/hwine, melissa, arr, and others. On this 4 node box, please install:
* WinXP (there was some verbal discussion about XP driver support, need clarification from vendor on official support)
* win7x32 - to qualify for use as tester
* win7x64 - to qualify for use as tester
* please leave the existing win8 node as-is.
(In reply to John O'Duinn [:joduinn] from comment #15) > * win7x64 - to qualify for use as tester Are we pursuing this platform in earnest now? We only have five win7 64-bit testers right now, and I'm happy to keep them on minis unless we really care about the platform.
Today I attempted a deploy to these machines from WDS/MDT. The deploy worked but caused the system to lock up. On investigating onsite I rebooted it, but RDP refused to work. I then disabled RDP and re-enabled it, after which it started working again. (I have seen this issue once before on this system, and only on the multinode iX box.) The other issue we face is that IPMI will not pass video redirection over a third-party video card. This isn't a complete blocker, since video redirection is more a luxury for troubleshooting, but part of the image deployment requires us to connect to the clients. I have set up a workaround by monitoring from the WDS/MDT box; however, this opens up a new issue: anyone imaging will need access to the WDS/MDT box and the images it holds, and changes made there could break the imaging process. I will be online tomorrow to discuss further, as I am still attempting to determine the rest of the potential issues we might face with these systems.
Also per talking with Jake Watkins we have a driver disk that has Windows XP drivers on it I will have to assess that disk and hopefully pull the drivers into MDT so we can deploy XP SP3* (The OS deployment must be SP3 as anything older is no longer supported for deployments).
I am going to help with evaluating Windows 7 32-bit. How far along is the win7 32-bit install? Can I connect to it? MaRu, are comments 17 & 18 related to setting up that Windows machine?
The host seems to be pingable: w732-ix-test1.winbuild.scl1.mozilla.com if you can't RDP to it, then comment 17 may have struck again.
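A small sketch of how "pingable but no RDP" could be told apart from a host that is simply down, assuming RDP is listening on its default TCP port 3389. The helper name `tcp_port_open` is made up for illustration; it only tests whether a TCP connection can be opened and says nothing about whether an RDP login would succeed.

```python
import socket

RDP_PORT = 3389  # default TCP port for Remote Desktop


def tcp_port_open(host, port=RDP_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


if __name__ == "__main__":
    host = "w732-ix-test1.winbuild.scl1.mozilla.com"
    state = "open" if tcp_port_open(host) else "closed or filtered"
    print(f"{host}:{RDP_PORT} is {state}")
```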
(In reply to Matthew Larrain[:MaRu] from comment #17) > The deploy worked > but caused the system to lock up. Can you clarify what you mean by "system"? Do you mean "node" or "all 4 nodes in the chassis" or something else? Is the lock up in the OS or the hardware or we don't know yet?
It was just that one node, but we don't have further diagnostics beyond that.
(In reply to Dustin J. Mitchell [:dustin] from comment #20) > The host seems to be pingable: > w732-ix-test1.winbuild.scl1.mozilla.com > if you can't RDP to it, then comment 17 may have struck again. Thanks Dustin! I managed to log in to it. I will keep posting updates about win7 in bug 786052 in case anyone is interested in following along.
(In reply to Matthew Larrain[:MaRu] from comment #18) > Also per talking with Jake Watkins we have a driver disk that has Windows XP > drivers on it I will have to assess that disk and hopefully pull the drivers > into MDT so we can deploy XP SP3* (The OS deployment must be SP3 as anything > older is no longer supported for deployments). :Maru, how goes the WinXPsp3 install?
He's been at training the past week, and this isn't even on the priority list. It will likely be some time before anything is done with XP unless this bug is slotted above w64 and w8 on the priority list.
Trying to untangle and summarize a bit - evaluating the chassis requires that all FF supported OS be evaluated to determine if they can run on this hardware. That list includes:
Win XP
Win 7x32 bit
Win 7x64 bit
Win 8x32 bit
Win 8x64 bit
Each of these requires work by both IT & RelEng - I'll open bugs for each OS/bit combination (some are already opened, and just need to be made blockers).
And, to be explicit, evaluation needs to be completed before we order the chassis.
Hal, the email from mrz on 8/27/12 indicates that he and joduinn agreed on the following OSes:
> * RelOps will install WinXP, win7x32, win7x64
In comment 10 Jake indicated that the following nodes are ready for testing (this is RelEng testing):
ix-multinode-1-C.build.scl1.mozilla.com (CentOS 6.2 x86_64)
ix-multinode-1-D.build.scl1.mozilla.com (CentOS 6.2 i386)
This hardware has 4 nodes. We have w8 64 installed on two of them because we were testing the install procedure with one (we didn't want to delete the work Armen had done), and w7 32 on another. We are capable of putting a maximum of 4 OSes up at one time. From joduinn's email, he asked for:
w8 64 (we already have)
w7 32 (we already have)
w7 64
xp
Since we can *only* do 4 installs at a time without more hardware (we need two nodes per OS so there's one for releng to test on and one for relops to test on), can we please have a definite list of which 4 we should be deploying, or can you give us the okay to go ahead and order another chassis so that we can install all of the OSes in comment 26?
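The capacity question above is simple arithmetic, sketched here. The five-OS count comes from the list in comment 26 and the 4-node figure from this chassis; the function name and the choice to try both one and two nodes per OS are assumptions for illustration, not something decided in this bug.

```python
import math

NODES_PER_CHASSIS = 4  # the iX chassis in this bug holds 4 nodes


def chassis_needed(os_count, nodes_per_os,
                   nodes_per_chassis=NODES_PER_CHASSIS):
    """Return how many chassis are needed to host every OS at once."""
    return math.ceil(os_count * nodes_per_os / nodes_per_chassis)


if __name__ == "__main__":
    # Five OS/bit combinations are listed in comment 26.
    for nodes_per_os in (1, 2):
        count = chassis_needed(5, nodes_per_os)
        print(f"{nodes_per_os} node(s) per OS: {count} chassis")
```

With five OSes this gives two chassis at one node per OS and three chassis at two nodes per OS.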
Unfortunately XP was accidentally installed over the node that releng had been using to obtain configuration settings for w8. We can work with armen to get the w8 installed from scratch again.
(In reply to Amy Rich [:arich] [:arr] from comment #30) > Unfortunately XP was accidentally installed over the node that releng had > been using to obtain configuration settings for w8. We can work with armen > to get the w8 installed from scratch again. Win8 is more important than XP at this point. Are we getting Win8 re-installed ASAP?
(In reply to Chris Cooper [:coop] from comment #31) > Win8 is more important than XP at this point. Are we getting Win8 > re-installed ASAP? Or perhaps I should ask Armen: did you/jmaher already sign off on Win8-on-iX? Is there any work left to do there or can we proceed with XP?
(In reply to Chris Cooper [:coop] from comment #32) > (In reply to Chris Cooper [:coop] from comment #31) > > Win8 is more important than XP at this point. Are we getting Win8 > > re-installed ASAP? > > Or perhaps I should ask Armen: did you/jmaher alerady sign off on > Win8-on-iX? Is there any work left to do there or can we proceed with XP? I believe this was answered with bug 780024.
(In reply to Armen Zambrano G. [:armenzg] from comment #33) > I believe this was answered with bug 780024. OK, then installing XP over Win8 was the correct thing to do here. I will proceed with XP.
Depends on: 780024
Depends on: 803595
Is there stuff left to do here?
Assignee: jwatkins → server-ops-releng
I don't know. Probably close this bug and file a separate bug to do a run down of imaging the various Windows machines and see if they work on staging. On bug 780050, Matt and I did a lot of polishing for win8 64-bits.
OK, I'm trying to shrink the number of vague bugs on (and soon to be off) Matt's plate, so R/F it is.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Hey Dustin, so is https://inventory.mozilla.org/en-US/systems/show/6541/ already set up with CentOS 6, or is it up to us which linux variant it would support or could have installed?
That's a loaner, so we're trying to evacuate it. That said, comment 28 is the most recent data I have about that machine.
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations