Closed Bug 1265557 Opened 10 years ago Closed 10 years ago

Purchase new hardware for hg.mo SSH server

Categories

(Infrastructure & Operations :: DCOps, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gps, Assigned: van)

References

Details

Attachments

(1 file)

+++ This bug was initially created as a clone of Bug #1244874 +++

Bug 1244874 set out to purchase all the new hardware for hg.mozilla.org but we only purchased new hardware for the hgweb machines. We still have budget (~$17k) and intent to replace the hgssh server(s).

What held us up before was figuring out what we wanted to do with I/O for hgssh. gcox provided an excellent write-up of options in bug 1244874 comment #11.

Since that was written we've installed the new hgweb machines. They have SSDs capable of writing at 300+MB/s. The I/O performance difference between the old hgweb machines (which I assume were using spinning disks) and the new ones is night and day. The I/O performance difference between hgssh (netapp via NFS) and the old hgweb machines was already night and day, so the even faster hgweb I/O makes hgssh seem like a tortoise.

In bug 1248072, gcox provided some Netapp mounts to play around with. There were promising performance wins by using a "poor man's iSCSI" (mounting an ext4 filesystem on a loopback device from a block device exposed by the netapp). However, the performance still fell far short of what's possible with local SSDs. Furthermore, I raised concerns about durability. I'm not keen on using a local filesystem from a network mounted device.

gcox also suggested some "refactorings" to our NFS setup, including the potential switch to NFSv4. These are tempting. However, he lists NFSv4 as a "major" re-architecture and I'm sensitive to creating a new project to deploy NFSv4 just for Mercurial. Furthermore, I /think/ I recall Matt Mackall (Mercurial's project lead) saying things about NFSv4 not working well with Mercurial or something like that. There were NFS-related re-architectures requiring less effort than NFSv4. However, when it comes to NFS (or any network-based filesystem), I'm pessimistic about performance. Mercurial's I/O access patterns are such that the overhead of network I/O operations will kill performance. We would need something like FC or iSCSI to get decent performance, and that's way, way beyond our budget.

So this assessment kinda puts us in a bind. The easiest thing to do is make no major changes and continue using NFS as we are today. We know it works. And it has the durability and failover properties we want. But performance is relatively horrible compared to what the new hgweb machines are getting.

I'd *really* like to achieve better performance. I feel that `hg push` is a synchronous operation for a lot of people and anything we can do to make that faster will improve developer productivity by not requiring as many context switches.

So here's a new, compromise solution. We buy a new hgssh server that has local SSD(s). Writes to hg.mozilla.org go to the local filesystem (read: they are fast). We install a replication mirror/daemon on the hgssh server that consumes the replication log and applies changes into an NFS mounted volume. Basically this is what the hgweb machines are doing except we write into an NFS mount instead of a local filesystem. This achieves fast local writes without sacrificing the immediate backup and failover availability of the data.

There are some downsides to this approach.

First, there is a window where the NFS won't have all data from local disk. It is possible that the master could receive a write, crash, and data loss could ensue. However, data loss would only be from very recent pushes (something within 1-5s assuming the replication log is behaving as expected). I /think/ we can live with this (we just ask people to re-push).

Second, failover strategy could be complicated. I /think/ I want the failover to be that the warm SSH standby server serves from NFS. So if a push goes into the failed-over backup server, it goes direct to NFS. But this creates a problem with restoring the master server. How do we reconcile changes made to NFS when the master was offline? We could consume the replication log manually. But this requires manual intervention. If e.g. the master goes offline for a few minutes then comes back on its own, we could have a problem where it is automatically promoted into master again and its local filesystem state is behind NFS. We would need a mechanism to ensure a failed master doesn't come back online without human involvement. Perhaps we can do this with the load balancer? Perhaps the standby server can write a file on NFS saying NFS was written to and the master could look at it and refuse to accept writes until the file was removed [after the master's local filesystem was reconciled with NFS]? Perhaps we could employ a custom load balancer check that refuses to put the master server back in service if the standby server wrote something.

Distributed systems are hard. But I think we can devise something that is a combination of custom load balancer health checks, Mercurial hooks, and scripts to recover from failover that will allow the dual local filesystem + NFS mirror setup to work. It sounds like a lot of work, but trust me: the performance difference from pushing will be night and day if we are using local SSDs.
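To make the marker-file idea above a bit more concrete, here is a minimal sketch of how the standby could flag that it wrote to NFS and how the master could refuse writes until a human clears the flag. The marker name and all function names are hypothetical; nothing in this bug specifies them, and a real version would be wired into a Mercurial hook or a load balancer health check rather than called directly.

```python
import os

# Hypothetical marker file name; an assumption for illustration only.
MARKER_NAME = "standby-wrote-to-nfs"


def record_standby_write(nfs_root):
    """Run on the standby when it accepts a push directly onto NFS."""
    path = os.path.join(nfs_root, MARKER_NAME)
    # Opening in append mode creates the file if it does not exist yet.
    with open(path, "a"):
        pass
    return path


def master_may_accept_writes(nfs_root):
    """Return False while the standby's marker is present on NFS."""
    return not os.path.exists(os.path.join(nfs_root, MARKER_NAME))


def clear_marker_after_reconcile(nfs_root):
    """Run by a human once the master's local disk matches NFS again."""
    try:
        os.remove(os.path.join(nfs_root, MARKER_NAME))
    except FileNotFoundError:
        # Nothing to clear: the standby never wrote during the outage.
        pass
```

The same `master_may_accept_writes` check could back both a push hook and a health check, so the master stays out of service until someone has reconciled it.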
van: since we last did quotes for hgweb, Intel has announced their v4 Xeons featuring Broadwell CPUs. Since they are announced, I'd prefer to not purchase v3 Xeons since they are effectively out of date. Can you please check with our vendor regarding v4 Xeon availability? We're probably interested in the E5-2637 v4 or E5-2643 v4. Also, you may need to confirm that the v4 Xeons work with the blades we support.
Flags: needinfo?(vle)
honestly, if we want to move away from landing/serving data from NFS then we should just do that completely. the proposed complexity scares the hell out of me if/when something goes completely pear shaped (and you're on pto).

things to note:
1) we lose netapp snapshots for DR purposes
2) tooling will have to be made/adjusted for things like twig resets, new l10n repos, etc
3) tbh, it needs to be bulletproof

I'm sure I'll think of more things after I have more coffee...
The failover and disaster recovery situations for full detach from NFS are even scarier, IMO.

One idea was to use storage on a blade (can't remember what it is called). Basically there's some SSDs on a blade/daughter card that is attached to the master. If the master goes down, someone has to *physically* relocate the storage to the new master. Since we don't have people in the data center 24/7, we could be read only for a few hours or more.

A variation of what I proposed is to use local storage on the standby master instead of NFS. But failover ends up being almost completely the same. Except you don't have NFS/Netapp, so you lose some DR benefits.

Some of the work I was doing with "headless try" might be relevant here. Basically that proof-of-concept siphoned incoming data into a standalone file and uploaded it to S3. In theory, the series of standalone files is enough to reconstruct full repository history. We could leverage that in combination with local storage to gain back some DR. But failover is still a pain.

If we talk about removing the single master (Netapp/NFS) and have multiple potential masters, failover is going to be complicated. No way around it.

It is right for you to question the complexities of these proposals and prefer simplicity. Unfortunately, simplicity means the service is up to 10x slower than it could be :/
these are the current processors compatible with the g9 blades. please let me know if you're looking at any specific processor and i can inquire our VAR about it's availability. E5-2600 v4 series Processors HPE BL460c Gen9 Intel® Xeon® E5-2690v4 (2.6GHz/14-core/35MB/135W) FIO Processor Kit 819852-L21 HPE BL460c Gen9 Intel® Xeon® E5-2683v4 (2.1GHz/16-core/40MB/120W) FIO Processor Kit 819851-L21 HPE BL460c Gen9 Intel® Xeon® E5-2680v4 (2.4GHz/14-core/35MB/120W) FIO Processor Kit 819842-L21 HPE BL460c Gen9 Intel® Xeon® E5-2660v4 (2.0GHz/14-core/35MB/105W) FIO Processor Kit 819841-L21 HPE BL460c Gen9 Intel® Xeon® E5-2650v4 (2.2GHz/12-core/30MB/105W) FIO Processor Kit 819840-L21 HPE BL460c Gen9 Intel® Xeon® E5-2650Lv4 (1.7GHz/14-core/35MB/65W) FIO Processor Kit 819849-L21 HPE BL460c Gen9 Intel® Xeon® E5-2640v4 (2.4GHz/10-core/25MB/90W) FIO Processor Kit 819839-L21 HPE BL460c Gen9 Intel® Xeon® E5-2630v4 (2.2GHz/8-core/25MB/85W) FIO Processor Kit 819845-L21 HPE BL460c Gen9 Intel® Xeon® E5-2630Lv4 (1.8GHz/10-core/25MB/55W) FIO Processor Kit 819846-L21 HPE BL460c Gen9 Intel® Xeon® E5-2623v4 (2.6GHz/4-core/10MB/85W) FIO Processor Kit 819844-L21 HPE BL460c Gen9 Intel® Xeon® E5-2620v4 (2.1GHz/8-core/20MB/85W) FIO Processor Kit 819838-L21 HPE BL460c Gen9 Intel® Xeon® E5-2609v4 (1.7GHz/8-core/20MB/85W) FIO Processor Kit 819837-L21 HPE BL460c Gen9 Intel® Xeon® E5-2680v4 (2.4GHz/14-core/35MB/120W) FIO Processor Kit 819842-L21 HPE BL460c Gen9 Intel® Xeon® E5-2699v4 (2.2GHz/22-core/55MB/145W) FIO Processor Kit 819856-L21 HPE BL460c Gen9 Intel® Xeon® E5-2698v4 (2.2GHz/20-core/50MB/135W) FIO Processor Kit 819855-L21 HPE BL460c Gen9 Intel® Xeon® E5-2697v4 (2.3GHz/18-core/45MB/145W) FIO Processor Kit 819854-L21 HPE BL460c Gen9 Intel® Xeon® E5-2697Av4 (2.6GHz/16-core/40MB/145W) FIO Processor Kit 819857-L21 HPE BL460c Gen9 Intel® Xeon® E5-2695v4 (2.1GHz/18-core/45MB/120W) FIO Processor Kit 819853-L21 HPE BL460c Gen9 Intel® Xeon® E5-2667v4 (3.2GHz/8-core/25MB/135W) FIO Processor Kit 
819850-L21 HPE BL460c Gen9 Intel® Xeon® E5-2643v4 (3.4GHz/6-core/20MB/135W) FIO Processor Kit 819848-L21 HPE BL460c Gen9 Intel® Xeon® E5-2637v4 (3.5GHz/4-core/15MB/135W) FIO Processor Kit 819847-L21
Assignee: server-ops-dcops → vle
Flags: needinfo?(vle)
QA Contact: cshields
>We're probably interested in the E5-2637 v4 or E5-2643 v4.

i can ask for a quote, how do you want the hardware configured? will this be a 2 CPU system, 16GB with 1 SSD?

HPE BL460c Gen9 Intel® Xeon® E5-2643v4 (3.4GHz/6-core/20MB/135W) FIO Processor Kit 819848-L21
HPE BL460c Gen9 Intel® Xeon® E5-2637v4 (3.5GHz/4-core/15MB/135W) FIO Processor Kit 819847-L21
Let's get a quote for 2x E5-2637v4, 32 GB RAM, and the same SSD we got for hgweb.
(In reply to Gregory Szorc [:gps] from comment #3)
> The failover and disaster recovery situations for full detach from NFS are
> even scarier, IMO.

I think it depends on what you mean by 'full detach from NFS'. If the NetApp cluster catches fire, then yes. If we're talking about the NFS mount falling off, I think that's actually easier - it becomes a convenient dead man's switch, and all incoming changes are denied.

What I want to avoid is getting into any sort of weird, split brain scenario, where we have different, or extra, data written to multiple locations and being forced into possibly reconciling that *in addition* to dealing with whatever caused the initial failover.

Thoughts on things like lustre/gpfs/gfs2? Part of me thinks HA clustering might help, but then I remember all of the pain associated with it and drink heavily... :-\

Alternatively, maybe a half shelf of SSDs on the filer would get us more performance? With WAFL and NVRAM I'm skeptical that it would be the same as local SSDs, because NFS, but we can actually compare the cost (~40k) vs developer time spent waiting...
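For what it's worth, the "dead man's switch" behaviour could be sketched roughly like this: if the NFS mount is gone, deny every incoming change. The function name and the idea of calling it from a push hook are assumptions for illustration; `os.path.ismount` is the only real API used, and the check is injectable so it can be exercised without a real mount.

```python
import os


def allow_incoming_changes(mount_point, ismount=os.path.ismount):
    """Return True only while the NFS volume is actually mounted.

    mount_point is whatever path the repositories live under (hypothetical
    here). When the mount has fallen off, this returns False and the
    caller is expected to reject the push.
    """
    return bool(ismount(mount_point))
```

Failing closed like this avoids the split brain case, since nothing gets written anywhere when the shared storage is unavailable.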
a thought: I think I'd like to separate replacing the hardware and doing something, whatever it is, different with handling I/O. that is, figuring out what to do differently and making it all bulletproof shouldn't block replacing the hardware.
(In reply to Kendall Libby [:fubar] from comment #8)
> a thought: I think I'd like to separate replacing the hardware and doing
> something, whatever it is, different with handling I/O. that is, figuring
> out what to do differently and making it all bulletproof shouldn't block
> replacing the hardware.

I'm mostly OK with that. A risk is we purchase local storage we don't need or we don't purchase the local storage we want. e.g. we buy an 800 GB SSD and waste the money on that if we keep using the Netapp or we don't buy a 2nd SSD for redundancy.
816909-B21 HP 960GB 6G SATA RI-3 SFF SC SSD
$6,315 each
2x E5-2637v4
32 GB RAM
Dual 10GB Ethernet

816909-B21 HP 960GB 6G SATA RI-3 SFF SC SSD
$9,865 each
2x E5-2643v4
32 GB RAM
Dual 10GB Ethernet
For CPU core count, hgssh needs CPUs for:

* Serving `hg pull` requests from the mirrors
* Serving random `hg push` and `hg pull` requests
* Producing bundles every night

As long as the core count is >= the hgweb mirror count, I think we should be good. Once hgweb[1-10] get decommissioned, we'll have 4 hgweb mirrors in scl3. In the future, we'll likely establish some mirrors in AWS.

I think 8 cores (2x E5-2637v4) should be sufficient. But we can go with 12 cores (2x E5-2643v4) if we want head room. I'm leaning towards the 8 cores (which is more like 10 cores with hyperthreading) because I think that will be plenty.
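The sizing rule above can be checked with a few lines of arithmetic. This is only a sketch of the "core count >= mirror count" rule of thumb; the per-socket core counts come from the processor kit descriptions quoted earlier in this bug (4-core E5-2637v4, 6-core E5-2643v4), and the function names are made up for illustration.

```python
def physical_cores(sockets, cores_per_socket):
    """Total physical cores in a blade, ignoring hyperthreading."""
    return sockets * cores_per_socket


def enough_cores(total_cores, mirror_count):
    """Rule of thumb from this comment: one core per hgweb mirror."""
    return total_cores >= mirror_count


# The two dual-socket options under discussion.
options = {
    "2x E5-2637v4": physical_cores(2, 4),
    "2x E5-2643v4": physical_cores(2, 6),
}

# 4 hgweb mirrors are expected in scl3 once hgweb[1-10] are retired.
SCL3_MIRRORS = 4

for name, cores in options.items():
    spare = cores - SCL3_MIRRORS
    print(f"{name}: {cores} cores, {spare} spare beyond the scl3 mirrors")
```

Both options clear the rule for the 4 scl3 mirrors; the spare-core number is what would be left for pushes, bundle generation, and any future AWS mirrors.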
I agree with comment 8. Let's get replacement hardware with 1 SSD now so we unblock hardware purchasing without spending money we may not need. We can always upgrade the hardware with a RAID controller and 2nd SSD (and rebuild the nodes one at a time) if we decide to move to on-disk repos at a later date.

:gps and :fubar: what information/testing do you need to make a final decision on the hardware specs?
I'm pretty comfortable moving forward with 1 blade of 2x E5-2637v4's. I need to look at the list of SSD drives again.

(The tentative plan has been to re-up warranty on hgssh1 or 2 and use it as a standby. No matter what we do with the storage, I'm pretty comfortable with having failover to an older, slower server for a few days/weeks until the master server/hardware can be replaced/rebuilt. That's no worse than what we have today. And it doesn't require us spending money on a 99% idle warm standby.)
arr, fubar, and I chatted about things just now.

We decided to hold off on making a decision regarding local storage vs Netapp. We're going to get replacement servers first. If we decide to move to local storage, we can stuff some RAID controllers and SSDs in the blades at a later date.

We also decided to buy 2 servers so we have identical hardware. If something ever goes wrong with the master, failover should be easy since the servers are identical.

We'll be buying 2 2x E5-2637v4 w/ 32 GB RAM. We just need to figure out the storage.

van: can you please get a new quote with a "regular" SSD boot drive. 120 or 240 GB mixed use should be fine. From http://www8.hp.com/h20195/v2/GetPDF.aspx%2Fc04154378.pdf, I found the following potential SSDs:

HP 120GB 6G SATA Value Endurance SFF 2.5-in SC Enterprise Boot 3yr Wty Solid State Drive 717965-B21
HP 120GB 6G SATA Value Endurance LFF 3.5-in SC Enterprise Boot 3yr Wty Solid State Drive 718171-B21
HP 240GB 6G SATA Value Endurance SFF 2.5-in ENT Value 3yr Wty Non-hot Plug G1 Solid State Drive 756654-B21
HP 120GB 6G SATA Value Endurance SFF 2.5-in ENT Value 3yr Wty Non-hot Plug G1 Solid State Drive 756633-B21
HP 120GB 6G SATA Mixed Use-3 SFF 2.5-in SC 3yr Wty Solid State Drive 816965-B21
HP 120GB 6G SATA Value Endurance SFF 2.5-in SC Enterprise Value 3yr Wty M1 Solid State Drive 764923-B21
HP 120GB 6G SATA Value Endurance SFF 2.5-in SC Enterprise Value 3yr Wty G1 Solid State Drive 756621-B21
HP 120GB 6G SATA Value Endurance SFF 2.5-in Enterprise Value 3yr Wty M1 Solid State Drive 764914-B21
HP 120GB 6G SATA Value Endurance SFF 2.5-in Enterprise Value 3yr Wty G1 Solid State Drive 756630-B21

(I'll have to read the PDF again to see the nuanced differences between these drives.)

If you talk to someone to get the quote, we just want a vanilla boot drive - nothing fancy. It will likely be the cheapest SSD they offer :)
Flags: needinfo?(vle)
here are a few drive options from our vendor. we can only use the 2.5" drives.

HP 120GB 6G SATA Value Endurance SFF 2.5-in SC Enterprise Boot 3yr Wty Solid State Drive 717965-B21 $445
HP 240GB 6G SATA Value Endurance SFF 2.5-in SC Enterprise Value 3yr Wty Solid State Drive 717969-B21 $645
HP 120GB 6G SATA Value Endurance SFF 2.5-in SC Enterprise Value 3yr Wty M1 Solid State Drive 764923-B21 $455
HP 240GB 6G SATA Value Endurance SFF 2.5-in SC Enterprise Value 3yr Wty M1 Solid State Drive 764925-B21 $729
HP 240GB 6G SATA Mixed Use-3 SFF 2.5-in SC 3yr Wty Solid State Drive 816975-B21 $625
HP 120GB 6G SATA Mixed Use-3 SFF 2.5-in SC 3yr Wty Solid State Drive 816965-B21 $335
Flags: needinfo?(vle)
Man, they really gouge you on SSD prices :/ The $335 option (816965-B21) looks fine and is more than sufficient in terms of performance. Let's get a quote so we can place the order!
:van so to be clear, we want to get quotes on/order two of:

816965-B21 HP 120GB 6G SATA Mixed Use-3 SFF 2.5-in SC 3yr Wty Solid State Drive
2x E5-2637v4
32 GB RAM
Dual 10GB Ethernet
Attached file blade quote
Sorry for the delayed response. Quote is attached.

Each Blade configured with:
2 x HPE BL460c Gen9 E5-2637v4 Processors
HPE 32GB 1Rx4 PC4-2400T-R Kit ( 2 x 16GB DIMM)
1 x HP 120GB 6Gb SATA 2.5 MU-PLP SC S2 SSD
1 x HP B140i Smart Array (http://www8.hp.com/h20195/v2/GetPDF.aspx/c04390743.pdf)
1 x HP Ethernet 10Gb 2P 560FLB Adptr

This is a configure to order configuration so about 2 week lead time.

Please let me know of any questions.
vendor finally got back to us. looks like we don't need a PO for this since it's under 25k but will need 2 approvals per - https://mana.mozilla.org/wiki/display/FIN/Project+Pre-Approvals
Flags: needinfo?(gps)
Do we need the RAID controller? I recall arr saying something about adding this later if we decide to throw multiple local disks on the servers.
>Do we need the RAID controller? I recall arr saying something about adding this later if we decide to throw multiple local disks on the servers.

yup, i was curious about that too so i've emailed him.
It is embedded in the motherboard and there is no charge associated with it. It will only support SATA drives.

Key Features
· System memory used as read cache
· Smart Array RAID engine running in OS driver
· RAID functionality requires driver to be downloaded from HP website for Linux and VMware
· Support for migration of drives to P-Series controllers
· RAID 0, 1, 1+0, and RAID 5 without the need of cache module
· Windows, Linux and VMware support
· Online drive flash support
· HP Smart Storage Administrator (HP SSA)
Not sure if my approval counts, but I'll give it. needinfo arr to provide additional approval. jgriffin or lmandel can provide a 2nd manager approval if you need it.
Flags: needinfo?(gps) → needinfo?(arich)
I approve, Lawrence, please also approve
Flags: needinfo?(arich) → needinfo?(lmandel)
There are a lot of parts and options listed. Can one of you please add a comment with a summary of what we're planning to order and the total cost?
Flags: needinfo?(vle)
Flags: needinfo?(lmandel)
Flags: needinfo?(gps)
Flags: needinfo?(arich)
Sorry, Lawrence, we're ordering the quote attached to comment 18. That's two blades with the following specs:

2 x HPE BL460c Gen9 E5-2637v4 Processors
1 x HPE 32GB 1Rx4 PC4-2400T-R Kit ( 2 x 16GB DIMM)
1 x HP 120GB 6Gb SATA 2.5 MU-PLP SC S2 SSD
1 x HP B140i Smart Array (http://www8.hp.com/h20195/v2/GetPDF.aspx/c04390743.pdf)
1 x HP Ethernet 10Gb 2P 560FLB Adptr
Flags: needinfo?(vle)
Flags: needinfo?(lmandel)
Flags: needinfo?(gps)
Flags: needinfo?(arich)
Thanks. The total is ~$12k. I approve.
Flags: needinfo?(lmandel)
Terminal is processing our request. ETA is 2 weeks for the blades as they're CTO (configured to order).
update regarding the custom servers: Planned Ship Date 17-May-2016 Planned Delivery Date 24-May-2016
blades received and installed. opened 1277643 for MOC to install o/s.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Blocks: 1283185