Closed Bug 962830 Opened 10 years ago Closed 10 years ago

hardware for staging

Categories

(Infrastructure & Operations :: Infrastructure: Other, task)

Hardware: x86_64
OS: Linux
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: lars, Unassigned)

References

Details

Attachments

(1 file)

All too frequently, we're failing in production pushes of new code because we're blindsided by the differences between our staging and production environments. By handling only a fraction of the true production volume in staging, we're caught unaware of problems related to volume, capacity, and processing power.

A severe pain point has been synchronization of HBase. The current batch system bogs down both production and staging. We want to move to real-time synchronization, which means collecting and processing the same crashes in both environments.

We have been mandated to move to full processing of all crashes, rather than the sampling that we do today. To accomplish this, we need to be confident in our system before we push to production, which means staging must accurately reflect the production environment.

This requires more hardware for staging; a rough capacity sketch follows the lists below.

collector:  
   production 6
   staging 2

processors:
   production 10
   staging 2

Postgres database:
   production 4
   staging 1  (only actually need 2)

Elastic Search:
   production: many
   staging: not so many (? :adrian)

Other systems, such as middleware and admin, are not as relevant and can remain at current levels.
Elastic Search:
   production: 5 (more to come shortly)
   staging: 3
This is a costing piece.
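To make the costing conversation concrete, here is a rough back-of-the-envelope sketch (not from Socorro's codebase) that just applies the node counts above. The assumption that throughput scales linearly with node count, and the helper name staging_gap, are mine; treat it purely as an illustration of the gap between staging and production capacity.

# Back-of-the-envelope staging capacity check (illustrative only).
# Node counts come from this bug; the assumption that throughput scales
# linearly with node count is mine, not anything measured on Socorro.

PRODUCTION = {"collector": 6, "processor": 10, "postgres": 4, "elasticsearch": 5}
STAGING    = {"collector": 2, "processor": 2,  "postgres": 1, "elasticsearch": 3}

def staging_gap(target_fraction=1.0):
    """For each role, print how many extra staging nodes a target fraction
    of production capacity would imply under the linear-scaling assumption."""
    for role, prod_count in PRODUCTION.items():
        have = STAGING.get(role, 0)
        need = max(0, round(prod_count * target_fraction) - have)
        print("%-14s prod=%2d stage=%2d additional_for_%d%%=%d"
              % (role, prod_count, have, target_fraction * 100, need))

if __name__ == "__main__":
    staging_gap(1.0)  # staging mirrors production exactly

For a 1:1 mirror this works out to 4 more collectors, 8 more processors, 3 more Postgres nodes and 2 more Elastic Search nodes, which roughly lines up with the counts requested later in this bug; the Postgres and Elastic Search targets were revised as the discussion progressed.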
Blocks: 929669
Hey :cturra,

Could we get a quote on this for budgeting purposes? Thanks!!
-selena
Flags: needinfo?(cturra)
i am going to have to defer this request over to :cshields. he will need a hardware estimate (cpu, memory, etc) for these before he can provide a quote. :selena - would you be able to provide these?
Flags: needinfo?(cturra) → needinfo?(sdeckelmann)
(In reply to Chris Turra [:cturra] from comment #4)
> i am going to have to defer this request over to :cshields. he will need an
> hardware estimate (cpu, memory, etc) for these before he can provide a
> quote. :selena - would you be able to provide these?

What we currently have is: 

Processors: HP BL2x220c, 2x L5640, 1x250GB SATA, 12GB RAM
Collectors: HP BL460c G7, 1x L5630, 2x300GB SAS, 6GB RAM  (although, I believe these are SSDs)
Postgres: HP DL360 G7 (BBWC), 2x L5640, 2x300GB and 6x900GB, 72GB RAM


Is that enough information to go on?
Flags: needinfo?(sdeckelmann)
Flags: needinfo?(cshields)
I'm increasing our stage Postgres needs to 3 systems, per the 100% processing refactor of the Postgres schema.

Postgres database:
   production 4
   staging 3
My 2 cents: at least one of the new staging db servers must be delivered and set up so we can migrate in time for the 4/30 expiration of the current staging hardware.
Sorry, I'm behind on this stuff.

Will reach out to Rich for a quote but this bug is confusing and I need clarity first.  I'll be asking for the following (please answer my ??s below)

4x collectors (HP BL460c G7, 1x L5630, 2x300GB SAS, 6GB RAM)
8x processors (HP BL2x220c, 2x L5640, 1x250GB SATA, 12GB RAM)
2x elasticsearch nodes (what spec??)
?x postgres nodes (HP DL360 G7 (BBWC), 2x L5640, 2x300GB and 6x900GB, 72GB RAM)
Flags: needinfo?(sdeckelmann)
Flags: needinfo?(cshields)
Flags: needinfo?(adrian)
See my comment in bug 962840... perhaps we could use some of the old Bunker Seamicro nodes for this? Some of them have 1TB disks, some have SSDs, all have lots of CPU and RAM. No RAID, but this is staging anyway, and theoretically we can always repopulate from prod.

This would probably suffice for collectors, processors, and ES nodes... dunno about postgres.
Quoting bug 909884:

> An existing member of the cluster is socorro-es3.webapp.phx1.mozilla.com: https://inventory.mozilla.org/en-US/systems/show/4625/

> Model: HP - DL360 G7
> CPU: Intel(R) Xeon(R) CPU E5649  @ 2.53GHz
> RAM: 42GB

The new machines we received recently are a bit different: 

> single 6 core proc. 64 GB RAM. 2 x 300GB 10K (146GB 10K no longer available) and 6 x 1TB SATA
Flags: needinfo?(adrian)
Sorry for being confusing. Thanks for helping us clarify, Corey.

(In reply to Corey Shields [:cshields] from comment #9)
> Sorry, I'm behind on this stuff.
> 
> Will reach out to Rich for a quote but this bug is confusing and I need
> clarity first.  I'll be asking for the following (please answer my ??s below)

We need: 
2x postgres nodes (HP DL360 G7 (BBWC), 2x L5640, 2x300GB and 6x900GB, 72GB RAM)

(We need one for replication and read-only use with the middleware, and one for splitting our databases into two different schemas. In the end, we'll have 3 stage databases.)
Flags: needinfo?(sdeckelmann)
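As a purely illustrative aside on the replication/read-only split described above (this is not Socorro code): routing read-only middleware queries to a standby while writes stay on the primary could look roughly like the sketch below. The host names, the reports query, and the use of psycopg2 are all assumptions for the example.

# Illustrative only: send read-only middleware queries to a standby and
# everything else to the primary. Host names are hypothetical; Socorro's
# real connection handling lives in its own configuration layer.
import psycopg2

PRIMARY_DSN = "host=socorro-pg1.stage dbname=breakpad user=middleware"
STANDBY_DSN = "host=socorro-pg2.stage dbname=breakpad user=middleware"

def get_connection(read_only=False):
    """Return a standby connection for read-only work, otherwise the primary."""
    conn = psycopg2.connect(STANDBY_DSN if read_only else PRIMARY_DSN)
    conn.set_session(readonly=read_only)
    return conn

# Example: a report query that is safe to serve from the standby.
conn = get_connection(read_only=True)
try:
    with conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM reports "
                    "WHERE date_processed > now() - interval '1 day'")
        print(cur.fetchone()[0])
finally:
    conn.close()

The third box mentioned below, for splitting the schema into two databases, would simply add another DSN to this kind of routing.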
Corey - please make that:

3x postgres nodes (HP DL360 G7 (BBWC), 2x L5640, 2x300GB and 6x900GB, 72GB RAM)

As the current stage node is being end-of-life'd on 4/30/2014.
Assignee: server-ops-webops → infra
Component: WebOps: Socorro → Infrastructure: Other
QA Contact: nmaul → jdow
Note so it doesn't get lost - repurpose https://inventory.mozilla.org/en-US/systems/show/1548/ for socorro2.stage.
(In reply to Selena Deckelmann :selenamarie :selena from comment #15)
> Created attachment 8400201 [details]
> https://speakerdeck.com/selenamarie/cost-of-100-percent-processing-and-
> crashstorage-options-for-socorro
> 
> Speakerdeck slides about TCO

Oops, wrong bug. Sorry for the noise!
Consolidated and budget submitted to laura.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED