new treeherder staging VMs

RESOLVED FIXED

Status

Infrastructure & Operations
Virtualization
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: fubar, Assigned: cknowles)

Tracking

Details

(Whiteboard: [treeherder][vm-create:8])

(Reporter)

Description

3 years ago
We'd like to build out a complete staging environment for Treeherder in SCL3, to better match production and to test/verify future loads. As jeads says:

--
Our current staging environment for Treeherder is deployed on AWS. It's completely different from the production environment, which is deployed on Mozilla's internal virtualization resources. Treeherder loads the full complement of build and test data from buildbot and provides a web service for other build/test systems to submit data through. We are now starting to see data from other build/test systems like taskcluster, jenkins, and autophone on our staging environment but have no way to estimate what the additional load on production will be. In addition, our AWS stage deployment is under heavy load at the moment due to limited resources.

So we have a number of pain points that are caused by our stage environment not mirroring production. We need a stage environment that can support the following things:

1.) Mimic production so we can do real load testing before turning on additional data submission systems in production.

2.) Simulate load from multiple clients to get a better idea of how things will behave in production.

3.) Test log parsing and ETL data processing throughput before changes go into production.

4.) Test changes to the homegrown Mozilla chief/commander deployment scripts https://github.com/mozilla/treeherder-service/tree/master/deployment/update before deploying them in production.

To do this internally we would need the same number and configuration of VMs that we have in production. We could reduce the data storage time interval on stage to something lower than what we have in production (6 months), maybe 1-2 months, to reduce our disk usage and still be able to accurately test changes in whole-system throughput.
--

Note in particular that this means the staging system would be processing as much data as production, or more, rather than the "standard" staging setup that sits idle except for brief test periods.

So, please may we have:

treeherder-processor[123].stage.private.scl3 - ~40G root (/) disk, 4G RAM, 2 CPUs
treeherder-etl[12].stage.private.scl3 - ~40G root (/) disk, 4G RAM, 2 CPUs
treeherder[123].stage.webapp.scl3 - ~40G root (/) disk, 6G RAM, 2 CPUs
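
For sizing purposes, the request above can be tallied with a short script. This is only a sketch: the `REQUESTED` table, `expand`, and `totals` are hypothetical names introduced here, the `[123]` notation is assumed to mean one VM per character in the brackets, and the ~40G figures are treated as exactly 40 GB.

```python
import re

# Requested VM groups, copied from the list above:
# (hostname pattern, disk in GB, RAM in GB, CPU count).
# Disk is approximate in the request ("~40G"); 40 is assumed here.
REQUESTED = [
    ("treeherder-processor[123].stage.private.scl3", 40, 4, 2),
    ("treeherder-etl[12].stage.private.scl3", 40, 4, 2),
    ("treeherder[123].stage.webapp.scl3", 40, 6, 2),
]


def expand(pattern):
    """Expand one [abc]-style group into one hostname per character."""
    match = re.search(r"\[([^\]]+)\]", pattern)
    if match is None:
        return [pattern]
    prefix = pattern[:match.start()]
    suffix = pattern[match.end():]
    return [prefix + ch + suffix for ch in match.group(1)]


def totals(groups):
    """Return (hostnames, total disk GB, total RAM GB, total CPUs)."""
    hosts = []
    disk = ram = cpus = 0
    for pattern, disk_gb, ram_gb, cpu_count in groups:
        names = expand(pattern)
        hosts.extend(names)
        disk += disk_gb * len(names)
        ram += ram_gb * len(names)
        cpus += cpu_count * len(names)
    return hosts, disk, ram, cpus


if __name__ == "__main__":
    hosts, disk, ram, cpus = totals(REQUESTED)
    print(f"{len(hosts)} VMs, {disk} GB disk, {ram} GB RAM, {cpus} CPUs")
```

Under those assumptions the request comes to 8 VMs with roughly 320 GB of disk, 38 GB of RAM, and 16 CPUs in total.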
Blocks: 1064438, 1074213
Whiteboard: [treeherder]
(Assignee)

Comment 1

3 years ago
Per our IRC conversation - due to ongoing work, these likely won't be up until tomorrow - will update should that change in either direction.

However, to make life easier, I've allocated the IP addresses so that you can get on with filing any ancillary bugs you need.

treeherder-processor1.stage.private.scl3.mozilla.com has address 10.22.75.22
treeherder-processor2.stage.private.scl3.mozilla.com has address 10.22.75.23
treeherder-processor3.stage.private.scl3.mozilla.com has address 10.22.75.24

treeherder-etl1.stage.private.scl3.mozilla.com has address 10.22.75.25
treeherder-etl2.stage.private.scl3.mozilla.com has address 10.22.75.26

treeherder1.stage.webapp.scl3.mozilla.com has address 10.22.81.29
treeherder2.stage.webapp.scl3.mozilla.com has address 10.22.81.30
treeherder3.stage.webapp.scl3.mozilla.com has address 10.22.81.31
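
When filing the ancillary bugs, it may help to confirm that DNS still matches the allocations listed above. The following is a minimal sketch, not an existing tool: `parse_host_output` and `find_mismatches` are hypothetical helpers, the input is assumed to be in the "NAME has address IP" form shown above, and the names will presumably only resolve from inside the relevant network.

```python
import socket

# Allocation lines copied from the comment above.
HOST_OUTPUT = """\
treeherder-processor1.stage.private.scl3.mozilla.com has address 10.22.75.22
treeherder-processor2.stage.private.scl3.mozilla.com has address 10.22.75.23
treeherder-processor3.stage.private.scl3.mozilla.com has address 10.22.75.24
treeherder-etl1.stage.private.scl3.mozilla.com has address 10.22.75.25
treeherder-etl2.stage.private.scl3.mozilla.com has address 10.22.75.26
treeherder1.stage.webapp.scl3.mozilla.com has address 10.22.81.29
treeherder2.stage.webapp.scl3.mozilla.com has address 10.22.81.30
treeherder3.stage.webapp.scl3.mozilla.com has address 10.22.81.31
"""


def parse_host_output(text):
    """Map hostname -> expected IPv4 address from 'X has address Y' lines."""
    allocations = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[1:3] == ["has", "address"]:
            allocations[parts[0]] = parts[3]
    return allocations


def find_mismatches(allocations, resolve=socket.gethostbyname):
    """Return {hostname: resolved address or None} for names that differ.

    `resolve` is injectable so the check can be exercised without DNS.
    """
    mismatches = {}
    for name, expected in allocations.items():
        try:
            actual = resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            actual = None
        if actual != expected:
            mismatches[name] = actual
    return mismatches


if __name__ == "__main__":
    for name, actual in find_mismatches(parse_host_output(HOST_OUTPUT)).items():
        print(f"{name}: resolved to {actual}")
```

An empty result means every name resolved to its allocated address; a value of `None` means the lookup failed.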
(Assignee)

Comment 2

3 years ago
Alright, those have been created.

Initial Puppet and Nagios setup has been completed, so these should be ready for your customization. If anything appears amiss, let me know.
Assignee: server-ops-virtualization → cknowles
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Whiteboard: [treeherder] → [treeherder][vm-create:8]
(Reporter)

Updated

3 years ago
Blocks: 1080595
No longer blocks: 1064438, 1074213
Product: mozilla.org → Infrastructure & Operations