Closed Bug 962830 Opened 10 years ago Closed 10 years ago

hardware for staging

Categories

(Infrastructure & Operations :: Infrastructure: Other, task)

Hardware: x86_64
OS: Linux
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: lars, Unassigned)

References

Details

Attachments

(1 file)

All too frequently, we're failing in production pushes of new code because we're blindsided by the differences between our staging and production environments. By handling only a fraction of the true production volume in staging, we're caught unaware of problems related to volume, capacity, and processing power.

A severe pain point has been synchronization of HBase. The current batch system bogs down both production and staging. We want to move to real-time synchronization, which means collecting and processing the same crashes in both environments.

We have been mandated to move to full processing of all crashes, rather than the sampling that we do today. To accomplish this, we need to be confident in our system before we push to production, which means staging must accurately reflect the production environment.

This requires more hardware for staging; a rough capacity sketch follows the lists below.

collector:  
   production 6
   staging 2

processors:
   production 10
   staging 2

Postgres database:
   production 4
   staging 1  (only actually need 2)

Elastic Search:
   production: many
   staging: not so many (? :adrian)

Other systems, such as middleware and admin, are not as relevant and can remain at current levels.
Elastic Search:
   production: 5 (more to come shortly)
   staging: 3
This is a costing piece.
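To make the costing conversation concrete, here is a rough back-of-the-envelope sketch (not from Socorro's codebase) that just applies the node counts above. The assumption that throughput scales linearly with node count, and the helper name staging_gap, are mine; treat it purely as an illustration of the gap between staging and production capacity.

# Back-of-the-envelope staging capacity check (illustrative only).
# Node counts come from this bug; the assumption that throughput scales
# linearly with node count is mine, not anything measured on Socorro.

PRODUCTION = {"collector": 6, "processor": 10, "postgres": 4, "elasticsearch": 5}
STAGING    = {"collector": 2, "processor": 2,  "postgres": 1, "elasticsearch": 3}

def staging_gap(target_fraction=1.0):
    """For each role, print how many extra staging nodes a target fraction
    of production capacity would imply under the linear-scaling assumption."""
    for role, prod_count in PRODUCTION.items():
        have = STAGING.get(role, 0)
        need = max(0, round(prod_count * target_fraction) - have)
        print("%-14s prod=%2d stage=%2d additional_for_%d%%=%d"
              % (role, prod_count, have, target_fraction * 100, need))

if __name__ == "__main__":
    staging_gap(1.0)  # staging mirrors production exactly

For a 1:1 mirror this works out to 4 more collectors, 8 more processors, 3 more Postgres nodes and 2 more Elastic Search nodes, which roughly lines up with the counts requested later in this bug; the Postgres and Elastic Search targets were revised as the discussion progressed.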
Blocks: 929669
Hey :cturra,

Could we get a quote on this for budgeting purposes? Thanks!!
-selena
Flags: needinfo?(cturra)
i am going to have to defer this request over to :cshields. he will need a hardware estimate (cpu, memory, etc) for these before he can provide a quote. :selena - would you be able to provide these?
Flags: needinfo?(cturra) → needinfo?(sdeckelmann)
(In reply to Chris Turra [:cturra] from comment #4)
> i am going to have to defer this request over to :cshields. he will need an
> hardware estimate (cpu, memory, etc) for these before he can provide a
> quote. :selena - would you be able to provide these?

What we currently have is: 

Processors: HP BL2x220c, 2x L5640, 1x250GB SATA, 12GB RAM
Collectors: HP BL460c G7, 1x L5630, 2x300GB SAS, 6GB RAM  (although, I believe these are SSDs)
Postgres: HP DL360 G7 (BBWC), 2x L5640, 2x300GB and 6x900GB, 72GB RAM


Is that enough information to go on?
Flags: needinfo?(sdeckelmann)
Flags: needinfo?(cshields)
I'm increasing our stage Postgres needs to 3 systems, per the 100% processing refactor of the Postgres schema.

Postgres database:
   production 4
   staging 3
My 2 cents: at least one of the new staging db servers must be delivered and set up so we can migrate in time for the 4/30 expiration of the current staging hardware.
Sorry, I'm behind on this stuff.

Will reach out to Rich for a quote but this bug is confusing and I need clarity first.  I'll be asking for the following (please answer my ??s below)

4x collectors (HP BL460c G7, 1x L5630, 2x300GB SAS, 6GB RAM)
8x processors (HP BL2x220c, 2x L5640, 1x250GB SATA, 12GB RAM)
2x elasticsearch nodes (what spec??)
?x postgres nodes (HP DL360 G7 (BBWC), 2x L5640, 2x300GB and 6x900GB, 72GB RAM)
Flags: needinfo?(sdeckelmann)
Flags: needinfo?(cshields)
Flags: needinfo?(adrian)
See my comment in bug 962840... perhaps we could use some of the old Bunker Seamicro nodes for this? Some of them have 1TB disks, some have SSDs, all have lots of CPU and RAM. No RAID, but this is staging anyway, and theoretically we can always repopulate from prod.

This would probably suffice for collectors, processors, and ES nodes... dunno about postgres.
Quoting bug 909884:

> An existing member of the cluster is socorro-es3.webapp.phx1.mozilla.com: https://inventory.mozilla.org/en-US/systems/show/4625/

> Model: HP - DL360 G7
> CPU: Intel(R) Xeon(R) CPU E5649  @ 2.53GHz
> RAM: 42GB

The new machines we received recently are a bit different: 

> single 6 core proc. 64 GB RAM. 2 x 300GB 10K (146GB 10K no longer available) and 6 x 1TB SATA
Flags: needinfo?(adrian)
Sorry for being confusing. Thanks for helping us clarify, Corey.

(In reply to Corey Shields [:cshields] from comment #9)
> Sorry, I'm behind on this stuff.
> 
> Will reach out to Rich for a quote but this bug is confusing and I need
> clarity first.  I'll be asking for the following (please answer my ??s below)

We need: 
2x postgres nodes (HP DL360 G7 (BBWC), 2x L5640, 2x300GB and 6x900GB, 72GB RAM)

(We need one for replication and read-only use with the middleware, and one for splitting our databases into two different schemas. In the end, we'll have 3 stage databases.)
Flags: needinfo?(sdeckelmann)
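As a purely illustrative aside on the replication/read-only split described above (this is not Socorro code): routing read-only middleware queries to a standby while writes stay on the primary could look roughly like the sketch below. The host names, the reports query, and the use of psycopg2 are all assumptions for the example.

# Illustrative only: send read-only middleware queries to a standby and
# everything else to the primary. Host names are hypothetical; Socorro's
# real connection handling lives in its own configuration layer.
import psycopg2

PRIMARY_DSN = "host=socorro-pg1.stage dbname=breakpad user=middleware"
STANDBY_DSN = "host=socorro-pg2.stage dbname=breakpad user=middleware"

def get_connection(read_only=False):
    """Return a standby connection for read-only work, otherwise the primary."""
    conn = psycopg2.connect(STANDBY_DSN if read_only else PRIMARY_DSN)
    conn.set_session(readonly=read_only)
    return conn

# Example: a report query that is safe to serve from the standby.
conn = get_connection(read_only=True)
try:
    with conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM reports "
                    "WHERE date_processed > now() - interval '1 day'")
        print(cur.fetchone()[0])
finally:
    conn.close()

The third box mentioned below, for splitting the schema into two databases, would simply add another DSN to this kind of routing.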
Corey - please make that:

3x postgres nodes (HP DL360 G7 (BBWC), 2x L5640, 2x300GB and 6x900GB, 72GB RAM)

As the current stage node is being end-of-life'd on 4/30/2014.
Assignee: server-ops-webops → infra
Component: WebOps: Socorro → Infrastructure: Other
QA Contact: nmaul → jdow
Note so it doesn't get lost - repurpose https://inventory.mozilla.org/en-US/systems/show/1548/ for socorro2.stage.
(In reply to Selena Deckelmann :selenamarie :selena from comment #15)
> Created attachment 8400201 [details]
> https://speakerdeck.com/selenamarie/cost-of-100-percent-processing-and-
> crashstorage-options-for-socorro
> 
> Speakerdeck slides about TCO

Oops, wrong bug. Sorry for the noise!
Consolidated and budget submitted to laura.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED