Closed Bug 1093202 Opened 10 years ago Closed 10 years ago

[Tiles][Back-end] Load testing the Tiles web-submission component

Categories

(Content Services Graveyard :: General, defect)

All
Mer
defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: kthiessen, Assigned: mostlygeek)

Details

This bug is mostly for posterity -- a place to record load test results.  Bugs discovered as a result of this load testing will be marked to block this bug.
Day 1: 2014-10-28


beeswithmachineguns is a better tool for this than blitz.io.
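
For reference, a minimal sketch of driving beeswithmachineguns for a run like this one, wrapping its CLI from Python. The instance count, security group, key pair, login user, and target URL are placeholders, not the actual values used in this test:

    # Sketch only: wraps the "bees" CLI; all names and numbers are placeholders.
    import subprocess

    TARGET = "http://stage-tiles.example.com/"  # hypothetical endpoint

    def run(cmd):
        print("$ " + " ".join(cmd))
        subprocess.check_call(cmd)

    # Spin up 50 attacking instances ("bees").
    run(["bees", "up", "-s", "50", "-g", "load-test",
         "-k", "loadtest-keypair", "-l", "ec2-user"])

    # Fire ~1.2M requests at the target with 250 concurrent connections,
    # mirroring the request volume mentioned below.
    run(["bees", "attack", "-u", TARGET, "-n", "1200000", "-c", "250"])

    # Tear the hive down afterwards so the instances stop billing.
    run(["bees", "down"])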

I did, in fact, get the AutoScaler to generate a new set of 3 servers.
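
For context, a minimal sketch (using boto3, with placeholder names and thresholds) of the kind of simple scaling policy that produces that scale-out-by-3 behavior; the actual stage-tiles configuration may well differ:

    # Sketch only: a simple "add 3 instances on high CPU" policy.
    # Group name, alarm thresholds, and periods are assumptions.
    import boto3

    autoscaling = boto3.client("autoscaling")
    cloudwatch = boto3.client("cloudwatch")

    # Add 3 instances to the group whenever the alarm below fires.
    policy = autoscaling.put_scaling_policy(
        AutoScalingGroupName="stage-tiles-web",   # placeholder ASG name
        PolicyName="scale-out-on-cpu",
        AdjustmentType="ChangeInCapacity",
        ScalingAdjustment=3,
        Cooldown=300,
    )

    # Fire when average CPU across the group exceeds 60% for 5 minutes.
    cloudwatch.put_metric_alarm(
        AlarmName="stage-tiles-high-cpu",
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "AutoScalingGroupName", "Value": "stage-tiles-web"}],
        Statistic="Average",
        Period=300,
        EvaluationPeriods=1,
        Threshold=60.0,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[policy["PolicyARN"]],
    )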

Sustained load of ~3K req/sec over about 15 minutes.  (4 requests failed out of 1.2 million.  Yay!)

Detailed logs and other supporting data to follow tomorrow morning.

So far, I'd say status looks green, but more testing is definitely called for.
Status: NEW → ASSIGNED
Day 2: 2014-10-29


Status: still green, no blockers.  AutoScaling continues to work as designed.

By grafting siege onto BWMG, Benson has created bees with siege engines! (Yay Ben!) 50 of these t2.micro instances can generate a load of 6.5K RPS.
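
Each bee runs something along these lines (those are real siege flags, but the concurrency, duration, and URL here are illustrative guesses):

    # Sketch of a single bee's siege invocation; values are placeholders.
    import subprocess

    subprocess.check_call([
        "siege",
        "-b",          # benchmark mode: no delay between requests
        "-c", "130",   # concurrent simulated users on this instance
        "-t", "15M",   # run for 15 minutes
        "http://stage-tiles.example.com/",  # hypothetical target URL
    ])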

This caused AutoScaling to ramp up, but nothing fell over.

By tweaking concurrency limits, I hope to push this strategy to 10K RPS today; at that point, if we use a hive of 100 rather than 50, we should be able to get 20K.
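
The arithmetic behind those targets, assuming throughput scales roughly linearly with hive size (which is exactly the assumption these runs are testing):

    # Back-of-the-envelope projection from today's numbers.
    observed_rps = 6500
    hive_size = 50
    per_bee_now = observed_rps / hive_size   # ~130 RPS per t2.micro today
    per_bee_tuned = 200                      # assumed rate after concurrency tweaks
    print(per_bee_tuned * 50)                # ~10K RPS with the current hive of 50
    print(per_bee_tuned * 100)               # ~20K RPS with a hive of 100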

Suggestions and encouragement welcome.  I will also be filing a bug somewhere to record these results.

Onward!
--KT.
Day 3: 2014-10-30


Hive of 50 t2.micros replaced with 10 c3.larges.  Managed to get 9K RPS out of those; scaling works as expected.

Small number of 500s (<0.1% of requests) as the ELBs scale up -- not sure that's avoidable.  In practice, I suspect it will just mean a re-fetch; anyone care to confirm?
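
To make the "re-fetch" assumption concrete: a client that retries once or twice on a transient 5xx would paper over errors at this rate. A generic sketch, not the actual Tiles client code:

    # Generic retry-on-5xx sketch; URL, timeout, and retry counts are arbitrary.
    import time
    import requests

    def fetch_with_retry(url, retries=2, backoff=0.5):
        for attempt in range(retries + 1):
            resp = requests.get(url, timeout=5)
            if resp.status_code < 500:
                return resp              # success, or a non-retryable client error
            time.sleep(backoff * (attempt + 1))
        return resp                      # give up after the final attempt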

Anyone who would like to is welcome to look at the Stackdriver graphs at https://app.stackdriver.com/groups/11386/stage-tiles for the last few days and tell me if anything looks fishy -- I'm mostly looking for catastrophic failures, and other people will have a better idea of how small glitches will affect the system.

Yesterday's last run used 20 c3.larges and got to 16K RPS for a moment, but couldn't sustain it.

Starting off today with 30 c3.larges, turning down the concurrency a little bit, and we'll see if we can get a sustained 15-20K RPS.  Wish me luck.

--KT.
Day 4: 2014-10-31  Status: YELLOW


Folks --

There does, in fact, appear to be a performance/functionality cliff at around 16.5K RPS ... it's reachable as a peak but not sustainable.

I'd like someone with better knowledge of the system than mine to take a look at the last few runs and swap theories about why that cliff happens there, when I've been able to sustain slightly lower rates.

I hate to bury this in the 'bad news on Friday evening' timeslot, but that's the way it has worked out.

I'll bring this up first thing in the Monday 10:30 meeting, and request help from others in the group.

Thanks,
--KT.
Depends on: 1093204
Day 5: 2014-11-03  Status: GREEN

After Benson replaced the one ELB with a set of three behind DNS round-robin, I was able to get a sustained 15-20K RPS total out of 30 c3.large wasps.  I'm going to run a couple of bake-in tests overnight, but the object lesson here seems to be that 12K req/sec or so per ELB is a safe maximum.
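
For anyone curious what "a set of three behind DNS round-robin" can look like, here is a rough Route 53 sketch with equal-weight CNAME records (zone ID, record name, and ELB hostnames are placeholders; Benson's actual change may have been made differently):

    # Sketch only: publish three ELBs under one name with equal weights.
    import boto3

    route53 = boto3.client("route53")
    elbs = [
        "tiles-elb-1.us-east-1.elb.amazonaws.com",
        "tiles-elb-2.us-east-1.elb.amazonaws.com",
        "tiles-elb-3.us-east-1.elb.amazonaws.com",
    ]

    changes = [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "tiles.stage.example.com.",
            "Type": "CNAME",
            "TTL": 60,
            "SetIdentifier": "elb-%d" % i,  # required when one name has several records
            "Weight": 1,                    # equal weights approximate round-robin
            "ResourceRecords": [{"Value": elb}],
        },
    } for i, elb in enumerate(elbs)]

    route53.change_resource_record_sets(
        HostedZoneId="Z3EXAMPLE",           # placeholder hosted zone
        ChangeBatch={"Changes": changes},
    )

With three records answering for one name, each ELB sees roughly a third of the traffic, which lines up with the ~12K req/sec-per-ELB ceiling observed above.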

If the bake-in tests succeed, I'll close out bug 1093204 and we'll proceed to the next hurdle.
Product: Mozilla Services → Content Services
The work for the initial deployment is complete; raising the load ceiling will be covered in bug 1093204.
I'm going to remove the dependency, however -- the 16.5K RPS ceiling was adequate for the initial deployment.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
No longer depends on: 1093204
Resolution: --- → FIXED
Marking this as VERIFIED to get it off my 'todo' query.
Status: RESOLVED → VERIFIED