Closed Bug 1174903 Opened 9 years ago Closed 9 years ago

Load Testing for Unified Telemetry beta release

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: sphilp, Unassigned)

References

Details

(Whiteboard: [unifiedTelemetry][b5])

Per discussions in IRC, :mreid's storage exploration, and historical heka stats, we expect about 1000rps on average, spiking to about 2000rps depending on time of day, with each request about 100KB.

This load test should ramp up to 2000rps at 100KB per request and then sustain that rate for about an hour.

May need to prepare staging
Note that the above estimates are for Pre-release volume (nightly/aurora/beta).
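As a rough check on what these estimates imply, the arithmetic can be sketched as below. The function name `load_profile` is made up for illustration, and the sketch assumes 1 KB = 1000 bytes and a constant request rate for the whole hour (the ramp-up is ignored).

```python
def load_profile(rps: int, payload_kb: int, duration_s: int) -> dict:
    """Throughput and total volume for a constant-rate load test.

    Assumes 1 KB = 1000 bytes and ignores HTTP/TLS overhead.
    """
    bytes_per_s = rps * payload_kb * 1000
    return {
        "mb_per_s": bytes_per_s / 1e6,
        "gbit_per_s": bytes_per_s * 8 / 1e9,
        "total_gb": bytes_per_s * duration_s / 1e9,
    }


# Peak and average figures from the estimates above, held for one hour.
peak = load_profile(rps=2000, payload_kb=100, duration_s=3600)
average = load_profile(rps=1000, payload_kb=100, duration_s=3600)

print("peak:", peak)
print("average:", average)
```

At the 2000rps peak this works out to about 200 MB/s (1.6 Gbit/s) of request bodies, or roughly 720 GB over the hour.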
Priority: -- → P1
Whiteboard: [unifiedTelemetry][b5]
Summary: Load Testing for Unified Telemetry → Load Testing for Unified Telemetry beta release
Load tested staging up to ~2000rps in bursts, seemed okay. Will do a sustained test tomorrow to confirm, but it looks like this should be fine.
The longer test killed heka with too many open file handles (the limit is something that Wes can set, I believe). Something to be aware of. Restarted the test with lower concurrency to keep the handle count down (while still hitting 2000rps), dropping from 1024 to 512 concurrent connections, and everything looks fine for beta numbers.
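For context on why halving concurrency can still reach the same rate: by Little's law, requests in flight = throughput x mean latency. The sketch below only applies that relation; the actual request latencies were not recorded in this bug, and the function names are illustrative.

```python
def max_mean_latency_s(concurrency: int, target_rps: float) -> float:
    """Largest mean request latency at which `concurrency` connections
    can still sustain `target_rps` (Little's law: L = lambda * W)."""
    return concurrency / target_rps


def achievable_rps(concurrency: int, mean_latency_s: float) -> float:
    """Upper bound on throughput for a fixed connection count and latency."""
    return concurrency / mean_latency_s


for connections in (1024, 512):
    budget_ms = max_mean_latency_s(connections, 2000) * 1000
    print(f"{connections} connections: mean latency must stay under {budget_ms:.0f} ms")
```

So 512 connections are enough for 2000rps as long as the mean request time stays below about 256 ms, while each open connection still costs the server a file handle.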

Something to look at before release is varying the submission size. We estimated 100KB on average, but some percentage of submissions could be smaller or larger, and testing with a mix of sizes would provide a more realistic load.
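One way to vary submission size is to draw each request's payload size from a weighted mix whose mean stays at 100KB. The mix below (`SIZE_MIX`) is entirely hypothetical; real proportions would have to come from observed submission data.

```python
import random

# Hypothetical mix: (payload size in KB, fraction of requests).
# Chosen only so that the weighted mean is 100 KB.
SIZE_MIX = [(20, 0.25), (100, 0.65), (300, 0.10)]


def expected_size_kb(mix):
    """Weighted mean payload size of the mix, in KB."""
    return sum(size * weight for size, weight in mix)


def sample_sizes_kb(mix, n, seed=0):
    """Draw n payload sizes (KB) according to the mix; seeded for repeatability."""
    rng = random.Random(seed)
    sizes = [size for size, _ in mix]
    weights = [weight for _, weight in mix]
    return rng.choices(sizes, weights=weights, k=n)


def make_payload(size_kb):
    """Build a dummy request body of the given size (1 KB = 1000 bytes)."""
    return b"x" * (size_kb * 1000)


sizes = sample_sizes_kb(SIZE_MIX, n=10_000)
print("expected mean KB:", expected_size_kb(SIZE_MIX))
print("sampled mean KB:", sum(sizes) / len(sizes))
```

Keeping the mean fixed means the overall bandwidth matches the original estimate, while the server sees both small and large bodies.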

One piece of infrastructure that wasn't part of this test, which Wes is going to set up in stage, is the tee server. I'm told it is network bound, so a simple scale/sizing test should suffice to make sure the tee instance can handle the traffic.
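A sizing estimate for a network-bound tee could look like the sketch below. It assumes the tee forwards every incoming request to `fanout` downstream destinations; the fanout of 2 is an assumption, since the tee's topology is not described here, as is the 1 KB = 1000 bytes convention.

```python
def tee_bandwidth_gbit(rps: int, payload_kb: int, fanout: int) -> dict:
    """Ingress and egress bandwidth for a tee that copies each request
    to `fanout` destinations (assumed behaviour, 1 KB = 1000 bytes)."""
    ingress = rps * payload_kb * 1000 * 8 / 1e9
    return {"ingress_gbit": ingress, "egress_gbit": ingress * fanout}


# Peak beta estimate with a hypothetical fanout of two destinations.
tee = tee_bandwidth_gbit(rps=2000, payload_kb=100, fanout=2)
print(tee)
```

Under these assumptions the tee instance would need about 1.6 Gbit/s inbound and 3.2 Gbit/s outbound at peak, which is the figure to compare against the instance's network capacity.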
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard