Closed
Bug 666044
Opened 13 years ago
Closed 11 years ago
tegras rackmount solution
Categories
(Infrastructure & Operations :: RelOps: General, task, P2)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: joduinn, Assigned: dividehex)
per meeting with zandr yesterday: we have 4? 5? racks available for these 200 tegras. However, these racks still need:
* shelves
* PDUs
* network cable
* ...??
(zandr, anything I forgot?)
Comment 1•13 years ago
(In reply to comment #0)
> per meeting with zandr yesterday: we have 4? 5? racks available for these
> 200 tegras. However, these racks still need:

Available is a loose term, and slightly beyond the scope of this bug. There are four total racks available in scl1. Things we've discussed moving here:
iX machines from mtv1 (1.2 racks)
200-300 Tegras (2-3 racks minimum)
42 DL120G7's (1 rack)

I'd recommend leaving the first 100 Tegras and the iX machines in MTV until scl3 becomes available.

> (zandr, anything I forgot?)

There are significant design and engineering tasks unresolved before we can do a production-quality deployment of these boards.

Must have:
* Better mechanical mounting of the Tegra boards themselves.
* A better power supply solution. I'm not going to install 200 giant wall warts dangling off 1' power cords in a real data center.

Open questions:
* USB (adb): Is this useful, and should it be part of the production system? (bug 665926)
* HDMI: Is an HDMI switch sufficient to convince the board it has a monitor, and is this useful? This has been discussed a few times, but to my knowledge never tested.
* Remote Buttons: We should be able to rig relays (or open drain outputs) to the buttons and provide access to imaging mode and hardware resets. Open question as to whether this replaces switched PDUs, or has significant value.

That's what I have off the top of my head.
Updated•13 years ago
Assignee: server-ops-releng → zandr
Comment 2•13 years ago
So, I'm going to appropriate this bug to 'blog' the work on the production solution for the Tegras.

The thumbnail sketch is 14 tegras and a foopy in a 4U case, with switches/relays/usb hubs/etc integral to the chassis. Details will follow.

As such, duping the 'spec new foopy' bug here.
Reporter | ||
Updated•13 years ago
Whiteboard: [android_tier_1]
Comment 4•13 years ago
Since bugzilla really isn't the place for this, project planning and tracking is happening on the wiki.
Severity: normal → critical
Summary: fit out racks to support 200 tegras → tegras rackmount solution
Comment 5•13 years ago
Combining bugs and capturing info from bug 668526:

In order to make Tegras a tier-one solution, we need to design and build a rackmount solution.

Requirements:
1) Proper airflow
2) no 'waterfall of wall-warts' power supplies
3) remote power management
4) remote imaging for a large percentage of failure modes (bug 665926)
5) stable mechanical mounting

Current thoughts are around a 4U box with a foopy and 14 tegras.
Two high-current DC supplies to feed the tegras through usb relay boards.
USB for the tegras (and relay boards) connected to the foopy.
On-board unmanaged switch.

Design for enhancement if necessary:
Add relay boards if pressing buttons remotely is needed.
(unfortunate, but reasonable)
Add front panel USB/video connections for crashcarting.
(needing this should be considered a failure mode)

14 Tegras + 1 Foopy in 4U gives us a density of 140 Tegras/rack
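For reference, the 140 Tegras/rack figure can be reproduced with a short calculation. This is only a sketch: the 42U rack height is an assumption (the bug does not state it), and the function names are made up for illustration. Only the 4U chassis size and 14 Tegras per chassis come from the comment above.

```python
import math


def tegras_per_rack(rack_units=42, chassis_units=4, tegras_per_chassis=14):
    """Tegras that fit in one rack, counting whole chassis only.

    rack_units=42 is an assumed rack height, not a figure from the bug.
    """
    chassis_per_rack = rack_units // chassis_units  # leftover rack units are unused
    return chassis_per_rack * tegras_per_chassis


def racks_needed(total_tegras, rack_units=42, chassis_units=4, tegras_per_chassis=14):
    """Racks required to house total_tegras at the chassis density above."""
    chassis = math.ceil(total_tegras / tegras_per_chassis)
    chassis_per_rack = rack_units // chassis_units
    return math.ceil(chassis / chassis_per_rack)


if __name__ == "__main__":
    print(tegras_per_rack())   # density under the assumed 42U rack
    print(racks_needed(200))   # racks for the 200 tegras in the original request
```

Under that assumption, ten 4U chassis fit per rack with 2U left over, which matches the stated density. Any space reserved for switches or PDUs would lower the number.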
Comment 7•13 years ago
If we're hurting for tegra space, we can place the n900s in a box (place, not throw, hurl, smash, or other verbs :) with their power supplies and use their haxxor power and "racks" for tegras.
Updated•13 years ago
Severity: critical → normal
Comment 8•12 years ago
zandr has obtained phidget USB relay boards for prototyping
Updated•12 years ago
Assignee: zandr → jwatkins
Updated•12 years ago
Priority: -- → P2
Comment 9•12 years ago
So I've already seen pictures of this on twitter. Can we get an update here for posterity, please?
Assignee | ||
Comment 10•12 years ago
The pictures you have seen are of a panda chassis prototype. We are still working on a tegra version since design requirements are different.
Comment 11•12 years ago
(In reply to Zandr Milewski [:zandr] from comment #2)
> So, I'm going to appropriate this bug to 'blog' the work on the production
> solution for the Tegras.
>
> The thumbnail sketch is 14 tegras and a foopy in a 4U case, with
> switches/relays/usb hubs/etc integral to the chassis. Details will follow.
>
> As such, duping the 'spec new foopy' bug here.

This thumbnail feels obsolete now. We eventually want to get to "no need for foopy", but the ETA of that is unknown. However, we are already off Mac Minis for new foopies, using Linux HP machines [1U iirc]. So designing a chassis with a foopy as part of the design feels wrong.

(In reply to Amy Rich [:arich] [:arr] from comment #4)
> Since bugzilla really isn't the place for this, project planning and
> tracking is happening on the wiki.

I don't see a wiki page linked here; is there one available?

(In reply to Amy Rich [:arich] [:arr] from comment #5)
> Combining bugs and capturing info from bug 668526:
>
> In order to make Tegras a tier-one solution, we need to design and build a
> rackmount solution.
>
> Requirements:
> 1) Proper airflow
> 2) no 'waterfall of wall-warts' power supplies
> 3) remote power management
> 4) remote imaging for a large percentage of failure modes (bug 665926)
> 5) stable mechanical mounting
>
> Current thoughts are around a 4U box with a foopy and 14 tegras.
> Two high-current DC supplies to feed the tegras through usb relay boards.
> USB for the tegras (and relay boards) connected to the foopy.
> On-board unmanaged switch.

The USB connections for the tegras should be independent of which foopy controls them. That way, if foopy25 dies, for example, we don't necessarily lose the tegra, and we can reshuffle dead/hurting tegras to new foopies without needing IT hands-on to move them for us.

> Design for enhancement if necessary:
> Add relay boards if pressing buttons remotely is needed.
> (unfortunate, but reasonable)

Yes, we do need a means of remotely power-cycling the devices, preferably via SNMP to the PDUs as we do at present.

> Add front panel USB/video connections for crashcarting.
> (needing this should be considered a failure mode)

We need an *easy* way for IT to get hands-on with the devices for common tasks like swapping SD cards, crashcarting for other purposes, etc. Remote imaging is the ideal as well, but it is not the only failure mode we need to account for (for example, I don't want a single tegra that needs an sdcard swap to end up taking 13 other tegras out of service).

> 14 Tegras + 1 Foopy in 4U gives us a density of 140 Tegras/rack

This density will change once the foopy is no longer an inherent part of the design.
Comment 12•12 years ago
Zandr's comment 2 is a year old, so I'm not surprised a lot has changed. As I think you know, most of these problems have been solved, many in different ways than Zandr's. Jake, is this bug serving any purpose anymore? Should we just close it?
Updated•11 years ago
Whiteboard: [android_tier_1] → [android_tier_1] [2013Q1]
Updated•11 years ago
QA Contact: zandr → arich
Comment 13•11 years ago
The original impetus of bug 821400 was to replace Mac foopies with Linux ones. Just so I'm clear, do we intend to use a Linux foopy (i.e. not a Mac mini) in each tegra chassis?
Comment 14•11 years ago
The foopies won't be *in* the chassis (they aren't for pandas either). But to the extent possible, yes, we will use Linux foopies. We won't be moving Mac foopies or building new ones.
Updated•11 years ago
Whiteboard: [android_tier_1] [2013Q1] → [android_tier_1] [2013Q2]
Updated•11 years ago
Whiteboard: [android_tier_1] [2013Q2] → [android_tier_1]
Updated•11 years ago
Whiteboard: [android_tier_1]
Updated•11 years ago
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations
Comment 15•11 years ago
The decision has been jointly made by IT, releng, a-team, and product to move the tegras to Evelyn and not into chassis in a datacenter. They'll continue on as they currently are until they're decommissioned and won't ever be put in chassis.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX