Closed Bug 383744 Opened 18 years ago Closed 18 years ago

Load testing for crash reporting system

Categories

(mozilla.org Graveyard :: Server Operations, task)

All
Other
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: morgamic, Assigned: morgamic)

Details

Attachments

(2 files)

We need to start load testing the crash reporting setup. The two pieces we'd like to test are the collector and the reporter (web apps). Page to discuss: http://wiki.mozilla.org/Breakpad/Design/Loadtesting We are not sure what our expected req/s is, so the first step is likely to find the limits of the applications and decide from there. I talked about this on the wiki page. Aravind -- is there a good time we can set up to do some preliminary load testing? We'd need someone to watch and log app and db load during tests.
Assignee: server-ops → aravind
The collector can be tested largely independently, because it doesn't have any DB load. I expect a load of around 60,000 reports per day, but Jay says that it could peak much higher than that, so we should be prepared for a peak load of 180,000 reports/day. The .dump and .json files that aravind has saved from the processor should be useful for this, and I wrote a Python script that can be used to mimic clients sending minidumps: http://socorro.googlecode.com/svn/trunk/scripts/collector-loadtesting.py
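For a rough sense of scale, the daily figures above can be converted to an average request rate. This is only a back-of-the-envelope sketch: it assumes reports arrive evenly over 24 hours, which real traffic will not do, so short bursts could be well above these averages. The function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def reports_per_second(reports_per_day):
    """Average request rate, assuming reports are spread evenly over a day."""
    return reports_per_day / SECONDS_PER_DAY

expected = reports_per_second(60_000)   # expected daily load from the comment above
peak = reports_per_second(180_000)      # peak daily load from the comment above

print(f"expected: {expected:.2f} req/s, peak: {peak:.2f} req/s")
# prints "expected: 0.69 req/s, peak: 2.08 req/s"
```

Since these are averages, a load test would probably want to go some multiple above 2 req/s to cover uneven arrival.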
So is this on hold until we get the actual hardware? In the meantime it might be useful to profile all three pieces of the system to make sure we aren't missing any time wasters. I would suggest we focus on that while the hardware is spec'd/purchased/set up. Thoughts?
The collector is on the production cluster, so that can be tested now. There isn't much point in load-testing the processor/reporter/DB combination until it's on production hardware. I think it would be useful to profile the app code... though I'm not sure how to do it. I will be looking at the pathological case of the infinite-recursion crash in particular.
I was looking at using this: http://docs.python.org/lib/module-profile.html ...to call paster serve and the standalone processor, in order to get a handle on how resources are allocated and where time is spent. Not sure how to profile the collector (the trick will be calling it from the command line).
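A minimal sketch of what profiling a single entry point could look like. It uses `cProfile`, which offers the same interface as the `profile` module linked above with lower overhead, rather than `profile` itself. `process_report` is a hypothetical stand-in for whatever function does the real work, not a name from the Socorro code.

```python
import cProfile
import io
import pstats

def process_report(n=20000):
    # Hypothetical stand-in for the work done per crash report.
    return sum(i * i for i in range(n))

def profile_call(func, *args, **kwargs):
    """Run func under the profiler; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time and show the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()

result, report = profile_call(process_report)
print(report)
```

For a script that can already be run from the command line, `python -m cProfile -s cumulative script.py` gives a similar report without code changes.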
We will be getting some new servers next week, so I can set this up on one of them.
Okay, sorry this took a while. We now have a brand new server processing crash reports and serving the reporter app. https://crash-reports.mozilla.com/submit lives on the production web farm and accepts crash dumps. These are processed on the new breakpad server. The db backend serving this is also a production database. The reports are now available at http://crash-stats.mozilla.com. This is fronted by the netscaler and goes to the backend breakpad production server. I think we are in good shape to start load testing the app now. The db server is already monitored in nagios. I will get the other URLs into nagios as well.
Assignee: aravind → morgamic
Picking this up. We hope to deploy an update to breakpad that eliminates the memory leak this week, then use raw minidumps as POST data in combination with grinder to mimic high load. This should give us a good indication of what the collector's upper limits are. Our goal by Friday should be to update production reliably, then set a time for load testing next Monday or Tuesday evening during off-peak hours. Does that sound reasonable?
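To illustrate the "raw minidumps as POST data" idea, here is a sketch that builds a multipart/form-data body by hand. It is plain Python, not a grinder script, and it sends nothing over the network. The field names `upload_file_minidump` and `ProductName` are assumptions about what the collector expects, and the payload is placeholder bytes rather than a valid minidump.

```python
import uuid

def build_multipart(fields, file_field, filename, payload):
    """Return (body, content_type) for a multipart/form-data POST."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text form fields, one part each.
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    # The binary file part carrying the dump bytes.
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="{file_field}"; '
         f'filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8")
    )
    parts.append(payload)
    # Closing boundary.
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"

# Assumed field names and placeholder bytes, for illustration only.
body, content_type = build_multipart(
    {"ProductName": "Firefox"},
    "upload_file_minidump",
    "sample.dmp",
    b"MDMP\x00placeholder",
)
print(content_type, len(body))
```

The resulting body and `Content-Type` header could then be replayed against the collector by whichever load tool is chosen, with the saved .dump files supplying the payload.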
Status: NEW → ASSIGNED
morgamic: Can we close this bug if we don't have any actionable items in the near future?
Please re-open the bug once we (IT) have any actionable items.
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard