Closed
Bug 691856
Opened 13 years ago
Closed 13 years ago
Select graphic card for HP DL120 infra for Linux/Windows testing (post rev3 era)
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task, P4)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: armenzg, Unassigned)
References
Details
IT has new hardware (HP ProLiant DL120 G7 Server) which we would like to use for testing purposes besides build jobs. This would be to replace the Linux and Windows rev3 minis.
To determine that we can use this hardware we need to be able to see if we can run unit tests and talos[1].
To do so we need graphics processing power, which requires installing a graphics card onto these machines. Joe has volunteered to determine what the appropriate graphics card is.
As I mentioned to Joe, we have to choose a graphics card that works for the following OSes: Linux (32/64-bit), Windows 7 (32/64-bit) and Windows XP. Having different graphics cards from OS to OS is a very expensive maintenance burden for IT/releng and should be avoided at all costs: if we can't simply reimage a Linux testing machine as a Windows testing machine without visiting the colocation to replace the graphics card, pool re-distribution/re-purposing becomes very expensive. I hope this makes sense.
The DL120 G7s have two expansion slots:
1 PCIe x16 Gen2 (x16 speed) (full-length, full-height)
1 PCIe x8 Gen2 (x4 speed) (half-length low-profile)
See below for the datasheet [2].
The operating systems that we want to support will need to have compatible drivers for the card.
Once we have sign-off from Joe, IT and someone from QA, we can proceed with the installation on one or a few of those machines and start the setup process (adjust if other people should be signing off).
Please correct me if I made any incorrect assumptions and/or misunderstood anything and add anyone that you think should be involved.
[1] If anyone knows that we should definitely run talos on slow machines speak up and we can drive that discussion offline.
[2] http://h20195.www2.hp.com/V2/GetPDF.aspx/4AA3-3691ENW.pdf
Reporter
Updated•13 years ago
Assignee: nobody → joe
Comment 1•13 years ago
Here's a list of the things I'd prefer, in order:
1. Multiple GPU vendors per OS. That is, some of the (for example) Windows 7 machines run using AMD GPUs, some NVIDIA, some Intel. Repeat for all OSes.
2. Single GPU vendor per OS, different GPU vendor cross-OS. This still gets us multiple GPU vendor testing, but is obviously not optimal.
3. Single GPU vendor for all machines.
Now, I know that due to performance testing, #1 is a non-starter. However, if we could differentiate performance results per GPU, that'd be awesome; it would in fact make it so pool re-distribution would be possible too.
#2 might give us the best testing-on-different-GPUs bang for our performance-testing buck, but as Armen points out, it makes pool re-distribution impossible. Just how much of a hard stop is that, anyway?
#3 is easiest for everyone involved, but has the least testing efficacy.
Reporter
Comment 2•13 years ago
Let's aim for #3 if there are no blocking issues since it is the one that would make things easier.
Any suggestions?
Comment 3•13 years ago
(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #2)
> Let's aim for #3 if there are no blocking issues since it is the one that
> would make things easier.
In fact, IT/releng will likely insist on #3. Any other option and we can't move machines between pools for capacity as required.
Now, that said, if the graphics team is interested in getting some dedicated hardware setup to run different graphics hardware, I'm sure we could spin off a side project to do that. It just wouldn't be at the same scale as the rest of the build pool. I would also want to make sure we had a better rollover plan for any special hardware we set up, so we don't end up with another geriatric testing setup.
If there is interest in this, please file a bug.
Reporter
Comment 4•13 years ago
After talking with Joe (who is on holidays) and looking around for a while, I have chosen Nvidia's GeForce GT 430 [1] for several reasons:
1) It is Nvidia (Joe said any Nvidia should do)
2) It seems to be mid to low range (between GTS and normal GeForce) ~$90
3) I believe it is good for the x16 slot
4) It is not extremely new
** if we need a newer model we can choose the 440 or 520, provided they meet requirements
5) It has drivers for all 5 OSes we care about
I really don't know if I chose it with good criteria but this is what I have. Suggestions are welcome. Otherwise, let's give it a shot.
IT, does this card work for you?
Does anyone have any objections? Shall we give it a shot?
Would you like me to take this to dev.platform or dev.planning? Or Yammer?
I believe we have the right people in this bug.
[1] http://www.nvidia.com/object/product-geforce-gt-430-us.html
[2] http://store.steampowered.com/hwsurvey (Gamers' survey from Sept. 2011)
Assignee: joe → armenzg
Status: NEW → ASSIGNED
Reporter
Comment 5•13 years ago
Shall we order a few of the proposed cards and try them out?
IMHO we won't know until we try.
Priority: -- → P2
Comment 6•13 years ago
Sure, but bear in mind we don't have a working tester image for the DL120 G7s yet either, so testing may be a bit difficult.
Will you be leading the new-linux-testers effort? If so, let's schedule a time to talk.
Comment 7•13 years ago
From IRC: this is running ahead of the new-refimage project, in the hope of validating that the graphics card is adequate.
We can spare a few systems in the relabs cluster to test these. There are currently three that I used for rabbitmq testing, and two allocated to jhford. I'll order three cards.
We're confident that these hosts will work as builders, using more up-to-date operating systems, but HP does not list any of the testing OSes as officially supported, because they're all desktop/end-user OSes. The latest Fedora and Windows 7 are likely to work, but XP may be a stretch. For the former two we'll try 32- and 64-bit versions. This will be a chance to get an early idea of the suitability of these machines as testers.
Note that this hardware is remotely manageable, so the plan is that once the cards are installed, relops will remotely install the relevant operating system and hand over to armen for testing.
Comment 8•13 years ago
In looking more deeply, I see that
http://www.nvidia.com/object/product-geforce-gt-430-us.html
lists this as a dual-slot-width card, but
http://h18004.www1.hp.com/products/quickspecs/13504_na/13504_na.html
shows that the PCI-e x16 slot is only single-width.
I think you'll need to find a card that is single-width. So, back to the search :(
Comment 9•13 years ago
I have a single-slot GT 430.
Comment 10•13 years ago
Can you post a link to it?
Comment 11•13 years ago
Newegg:
http://www.newegg.com/Product/Product.aspx?Item=N82E16814130579
Evga.com (same card):
http://www.evga.com/products/moreInfo.asp?pn=01G-P3-1335-KR&family=GeForce%20400%20Series%20Family&sw=
is what I have, I think. I know for sure that it's a GT 430 from EVGA and is single-slot. I am not 100% sure if it is passively cooled. Taking it out of my home theatre setup would be a pain, so I'd rather leave it in there unless this is an emergency.
Comment 12•13 years ago
armen, look good to you?
Definitely no need to take it out - we'll buy some :)
Reporter
Comment 13•13 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #12)
> armen, look good to you?
>
> Definitely no need to take it out - we'll buy some :)
This looks good. Feel free to use this bug to track it or file another one.
Thank you guys!
Comment 14•13 years ago
(In reply to John Ford [:jhford] from comment #11)
> Newegg:
> http://www.newegg.com/Product/Product.aspx?Item=N82E16814130579
>
> Evga.com (same card):
> http://www.evga.com/products/moreInfo.asp?pn=01G-P3-1335-
> KR&family=GeForce%20400%20Series%20Family&sw=
These aren't quite the same card - the first is
01G-P3-1335-KR - http://www.evga.com/products/pdf/01G-P3-1335.pdf
while the second is
01G-P3-1430-LR - http://www.evga.com/products/pdf/01G-P3-1430.pdf
From the PDFs, the differences are (1335 vs. 1430):
Memory Clock: 1200 vs. 1400 MHz
Memory Bit Width: 64 vs. 128 bits
Memory Bandwidth: 9.6 vs. 22.4 GB/s
Dual-Link DVI Capable: no vs. yes
These also aren't passive - the images on the sites you linked to all show a fan. They are single-width, though.
The 1335 looks to be discontinued:
http://www.newegg.com/Product/Product.aspx?Item=N82E16814130656
so I suppose that means we should go with the 1430! I just need to check up on the suitability of a passive card before we get these ordered.
Comment 15•13 years ago
I meant suitability of an *active* card :)
It sounds like this should work fine, so I'll order up three 1430's for delivery to mtv1.
Comment 16•13 years ago
Fine folks of desktop: can you order three 01G-P3-1430-LR's from the vendor of your choice, for delivery to mtv1 and eventually 3.MDF?
The newegg link - http://www.newegg.com/Product/Product.aspx?Item=N82E16814130579 - is one possible vendor, but whatever source is best for you is fine.
If you get tracking info, please add it here and I'll keep an eye on it.
Assignee: armenzg → desktop-support
Component: Release Engineering → Server Operations: Desktop Issues
QA Contact: release → tfairfield
Updated•13 years ago
Assignee: desktop-support → aignacio
Comment 17•13 years ago
As a note to self, once these arrive we'll need to get lmsensors installed to monitor GPU temperature, or use something like
nvidia-settings -q [gpu:0]/GPUCoreTemp | grep "Attribute" | sed -e "s/.*: //g" -e "s/\.//g"
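A small wrapper around that query could alert on overheating. The sketch below is purely illustrative: the sample `nvidia-settings` output line and the 85C threshold are assumptions, and parsing a captured sample line means the sed expression can be checked without a GPU present.

```shell
# Hypothetical temperature check built on the one-liner above.
# 'sample' mimics typical nvidia-settings output; 85 is an assumed threshold.
sample='  Attribute "GPUCoreTemp" (relabs01:0[gpu:0]): 61.'
temp=$(printf '%s\n' "$sample" | grep "Attribute" | sed -e "s/.*: //g" -e "s/\.//g")
threshold=85
if [ "$temp" -ge "$threshold" ]; then
  echo "WARNING: GPU at ${temp}C"
else
  echo "GPU OK at ${temp}C"
fi
```

On a real host, the `sample=` line would be replaced with the actual `nvidia-settings -q [gpu:0]/GPUCoreTemp` call.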
Updated•13 years ago
Whiteboard: On order
Updated•13 years ago
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: On order → Received
Comment 18•13 years ago
Sweet, thanks! I'll get hands laid on these shortly, and get them installed.
Assignee: aignacio → server-ops-releng
Component: Server Operations: Desktop Issues → Server Operations: RelEng
QA Contact: tfairfield → zandr
Comment 19•13 years ago
Hi, Ann, can you please drop these at Matt Larrain's desk today (or let us know where you are, and we can come get them from you if you're in MTV)? Thanks!
Assignee: server-ops-releng → aignacio
Status: RESOLVED → REOPENED
Component: Server Operations: RelEng → Server Operations: Desktop Issues
QA Contact: zandr → tfairfield
Resolution: FIXED → ---
Comment 20•13 years ago
Ann, Can you put these on Matt Larrain's desk (where I'm sitting) ASAP? I need to get these installed today, as I'm only around for today :)
Assignee: aignacio → dustin
Component: Server Operations: Desktop Issues → Server Operations: RelEng
QA Contact: tfairfield → zandr
Comment 21•13 years ago
..or tell me where they are :)
Comment 22•13 years ago
Tabatha said she hasn't seen these.
Assignee: dustin → desktop-support
Component: Server Operations: RelEng → Server Operations: Desktop Issues
QA Contact: zandr → tfairfield
Updated•13 years ago
Assignee: desktop-support → tromero
Comment 23•13 years ago
I'll be in tomorrow (Friday) from 9 to noon. This isn't an emergency, but it will be much easier for me to install these myself than to direct others to do so, and it seems silly not to just because of a simple miscommunication over where the cards are.
Reporter
Comment 24•13 years ago
Did it happen?
Comment 25•13 years ago
Nope. Tabitha, we're almost at a week of "Received" but not actually received. What can we do to find these? Or can we just order new ones?
Comment 26•13 years ago
The eVGA GeForce GT 430 graphics card was reordered. ETA: 11/18.
Whiteboard: Received → ETA 11/18
Comment 27•13 years ago
Tabitha - thanks for tracking that down.
Ann - great, and where are they shipping to? Do you have tracking information? Tabitha mentions they are shipping, or did ship, to Armen in Toronto. If that's the case for the previous batch, and the new batch is going to Mountain View, great. Otherwise, can you get a head start on shipping arrangements to send those from Toronto to Mountain View, since that's where they're needed (per comment 16)?
Also, you used the singular in comment 26, but comment 16 specifies three cards. Were three ordered?
Comment 28•13 years ago
3 graphics cards will be shipped to Mountain View. ETA: 11/18
Comment 29•13 years ago
Armen has located the original three graphics cards in Toronto, and will bring them to Mountain View next time he's in town. We don't need six (yet), so there's no rush on that.
Comment 30•13 years ago
I'll see if Matt or Jake can track these down tomorrow (11/18) afternoon.
Assignee: tromero → dustin
Component: Server Operations: Desktop Issues → Server Operations: RelEng
QA Contact: tfairfield → zandr
Comment 31•13 years ago
Sorry for the inconvenience. The vendor is saying that the graphics cards are on backorder. Ship Date 11/25
Status: REOPENED → ASSIGNED
Whiteboard: ETA 11/18 → Backordered-Ship Date 11/25
Reporter
Comment 32•13 years ago
FTR I only got one card in Toronto. I should have been more specific in my email.
Comment 33•13 years ago
The two GT430 video cards came in. Who am I supposed to deploy them to?
-Vinh
Comment 34•13 years ago
Hm, there should be three 01G-P3-1430-LR cards. Please give them to Matt Larrain.
Comment 35•13 years ago
Only two came in. The third one is on order. I will place these two on Matt's desk.
Updated•13 years ago
Whiteboard: Backordered-Ship Date 11/25 → Backordered-Ship Date 11/25 - 2/3 deployed. Waiting for the third video card to arrive.
Comment 36•13 years ago
Please install these cards in relabs01 and relabs02. I will kickstart those hosts with CentOS 6.0, and hand them to Armen.
Assignee: dustin → server-ops-releng
colo-trip: --- → mtv1
Comment 37•13 years ago
Third graphics card deployed.
Comment 38•13 years ago
OK, please install them in relabs01, relabs02, and relabs06 :)
Updated•13 years ago
Assignee: server-ops-releng → jwatkins
Comment 39•13 years ago
Cards have been installed in relabs01, relabs02, and relabs06 :)
Comment 40•13 years ago
Great, thanks!
Hearkening back to comment 7 and yesterday's IRC conversation, it's still not clear what operating system should be installed here.
Armen, how would you like to proceed? I can do a CentOS 6.0 kickstart install for you, or I can add a username/password to the iLO for you, and you can install whatever you'd like.
Assignee: jwatkins → armenzg
Whiteboard: Backordered-Ship Date 11/25 - 2/3 deployed. Waiting for the third video card to arrive.
Reporter
Comment 41•13 years ago
I think iLO works best for me, even if it requires extra work on my part.
Thanks a lot for making this happen.
FTR: I won't be able to jump on this right away.
Comment 42•13 years ago
OK, I've set this up. You can log in to
https://relabs01-mgmt.build.mtv1.mozilla.com
https://relabs02-mgmt.build.mtv1.mozilla.com
https://relabs06-mgmt.build.mtv1.mozilla.com
with username 'armenzg' and the releng root pw. You should have access to the virtual console (use the Java one, not .NET), virtual media, and control of the server power. Things are fairly self-explanatory, but we can help out in #ops with any questions.
Please report your progress here - this will be a nice "first peek" at how well desktop operating systems run on this hardware.
Component: Server Operations: RelEng → Release Engineering
QA Contact: zandr → release
Reporter
Updated•13 years ago
Priority: P2 → P3
Comment 43•13 years ago
There's been discussion lately of AMD-only graphics bugs on R-D. I realize that these are nVidia cards, but this work would let us create a pool with both nVidia and AMD cards.
On the Mac side, r5 minis have AMD GPUs.
Comment 44•13 years ago
I'm confused why this is a P3. We finally have the cards, we finally have machines, let's finish this out and get it done. The old Rev3 minis are getting more and more out of date and when we "need" to stand up new windows/linux testers you know that everyone will want it "last week". I think this is our chance to get ahead of the curve before things get urgent.
Comment 45•13 years ago
It's mostly a resource allocation issue - we're looking at a datacenter move, wrapping up the new rev4 tester hardware, building rev5 mac builders, and creating a new linux builder image first. We could potentially swap testers' priority (and hardware?) with the latter item, but the others already have firm (and close!) deadlines behind them.
And, not to nitpick, but we *don't* have the machines - Armen is using three machines from the relops lab to test the hardware, which he'll need to give back. In fact, at the moment we don't have anywhere to install new machines even if we had them (that will have to be scl3).
So, bottom line, aside from experimenting with graphics hardware and linux images on this platform, there's not much we can do, and given the balls currently in the air I think it's premature to have much conversation about priority.
Comment 46•13 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #45)
> So, bottom line, aside from experimenting with graphics hardware and linux
> images on this platform, there's not much we can do, and given the balls
> currently in the air I think it's premature to have much conversation about
> priority.
Cool, thanks for the update, Dustin. That's exactly the context I was missing. I appreciate you filling me in.
Reporter
Updated•13 years ago
Priority: P3 → P4
Comment 47•13 years ago
(In reply to Clint Talbert ( :ctalbert ) from https://bugzilla.mozilla.org/show_bug.cgi?id=737282#c3)
> I would love for us to move away from mac mini's wholesale for windows and
> linux testers. But that's a larger issue that's been going on for over two
> years now. Current work on that appears stalled in bug 691856.
Nobody likes the minis. Both IT and releng are resource-constrained due to the in-progress colo move, and are currently focusing on replacing the aging 10.5 rev2 Mac mini builder platform with a new 10.7 rev5 Mac mini builder platform. Per IT, those old minis can't live in the new colo, so this *is* the critical path.
Once the colo move is done, how fast *would* we be able to pivot on this? I see a bunch of unknowns:
* once stage 1 of scl3 is done (May), how much space will we have in our various colos to deploy HPs as testing boxes, assuming a 1-for-1 replacement of current test machines with HP machines? Keep in mind we will have space reqs for pandaboards also.
* I know we have more space coming in scl3 in Q3 as part of the releng BU build-out, and we probably don't *want* to put these machines elsewhere. Given that, do/can we wait until Q3 for this? Is there anyone that could even work on it (releng or IT) in the interim?
* because it's taken so long, is there different hardware (or a different HW rev) that we should be considering?
* how quickly can we order/deliver/rack more of these machines & graphics cards?
Comment 48•13 years ago
The only place we'll be putting new hardware is scl3. Right now we have 9 racks we allocated to releng.
* 1.5 of them are currently being used by minis (a rack holds 64 minis). Racks configured to hold minis will not hold other types of hardware based on the PDU setup. So we may want to allocate more racks to minis, based on how many more you think you might need for builders or capacity for 10.8.
* Part of 1 rack is being used for server equipment (3 HP DL 360s at the moment)
* We haven't been told how many pandaboards we're going to need since we can't even get them working yet. I would guesstimate somewhere between 10 and 15 pandaboards per 4U. We'd likely want to allocate at least 2 racks to them, maybe 3?
We will have more space (19 racks) once the expansion is complete, but we'll have a total power budget of 100kW. We must plan around that for *all* releng hardware since scl1 will be closing in a bit over a year and everything will be consolidated in scl3. Knowing how many and what type of servers we're going to need for growth (for every platform) is critical.
Please remember to take into consideration any hardware you're going to want for w8, Mountain Lion, etc. Also please keep in mind that we cannot use modern hardware to run tests on ancient OSes (e.g. XP, CentOS 5.0) since they will not be supported.
Timing for ordering hardware and getting it racked will depend on the hardware and when we want to get this done. There's probably more time constraint now before the sjc1 move than there will be afterwards for racking and cabling because folks are very busy trying to evac sjc1.
There are additional prerequisites for getting this working other than just buying and installing the hardware (buildbot masters, configuration management, how servers are imaged, etc).
We would need to sit down and come up with a comprehensive plan for each platform to give an accurate time estimate, but testing on centos6 is probably the lowest hanging fruit at this point.
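To make the 100kW budget concrete, here is a back-of-the-envelope capacity calculation. The per-server draw and servers-per-rack figures are illustrative assumptions, not measured values for any hardware discussed here.

```shell
# Rough capacity planning against the 100 kW releng power budget.
budget_w=100000        # total budget: 100 kW
watts_per_server=350   # ASSUMED average draw per 1U tester/builder
servers_per_rack=30    # ASSUMED usable 1U slots per rack after PDU overhead
max_servers=$(( budget_w / watts_per_server ))
racks_needed=$(( (max_servers + servers_per_rack - 1) / servers_per_rack ))
echo "${max_servers} servers max, ~${racks_needed} racks"
```

With these assumed numbers, the budget caps the pool at roughly 285 servers across about 10 racks, which is why per-platform growth estimates matter before ordering.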
Comment 49•13 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #42)
> OK, I've set this up. You can login to
> https://relabs01-mgmt.build.mtv1.mozilla.com
> https://relabs02-mgmt.build.mtv1.mozilla.com
> https://relabs06-mgmt.build.mtv1.mozilla.com
Armen, are these still being used? I'd like to reclaim the systems for labs use otherwise.
Reporter
Comment 50•13 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #49)
> (In reply to Dustin J. Mitchell [:dustin] from comment #42)
> > OK, I've set this up. You can login to
> > https://relabs01-mgmt.build.mtv1.mozilla.com
> > https://relabs02-mgmt.build.mtv1.mozilla.com
> > https://relabs06-mgmt.build.mtv1.mozilla.com
>
> Armen, are these still being used? I'd like to reclaim the systems for labs
> use otherwise.
All yours.
Comment 51•13 years ago
Which is to say, Armen's not working on this project, but it's quite a bit more critical now, so we should leave these cards in place, and hopefully another relenger will be assigned to evaluate them.
Assignee: armenzg → nobody
Comment 52•13 years ago
I think this belongs in Platform Support.
Component: Release Engineering → Release Engineering: Platform Support
QA Contact: release → coop
Comment 53•13 years ago
The graphics card selected (eVGA GeForce GT 430, comment 26) will work for the HP machines. We should open another bug to actually buy them in bulk once they're needed (soon). We're already installing the existing cards in bug 755772.
However, Amy tells me that if we're going to try to squeeze as much testing density out of new testing hardware as possible, we'll be going with different hardware, possibly dual iX systems (half-rack). These will require a low-profile graphics card, so we should also file a bug to determine whether a comparable low-profile card exists.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago → 13 years ago
Resolution: --- → FIXED
Reporter
Comment 54•12 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #27)
> Ann - great, and where are they shipping to? Do you have tracking
> information? Tabitha mentions they are shipping, or did ship, to Armen in
> Toronto.
I cleared my desk and found this card in my drawers.
I have asked hilzy to ship it back to MaRu at the MV office.
Comment 55•12 years ago
(In reply to Chris Cooper [:coop] from comment #53)
> We should open another bug to actually buy them in bulk
> once they're needed (soon).
Once bugs are opened for that / the buying of the machines, please can they be marked as dependencies of bug 764713, just to give something to point at :-)
Comment 56•12 years ago
(In reply to Ed Morley [:edmorley] from comment #55)
> Once bugs are opened for that / the buying of the machines, please can they
> be marked as dependencies of bug 764713, just to give something to point at :-)
We'll be using the same cards, but they'll be going into the 4-node iX machines now instead of these HP machines. Yes, we'll file bugs to get the graphics cards ordered when we place our first order for the iX nodes.
Assignee
Updated•11 years ago
Product: mozilla.org → Release Engineering
Assignee
Updated•7 years ago
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard