Closed Bug 881903 Opened 11 years ago Closed 11 years ago

Production support for Socorro + RabbitMQ

Categories

(Infrastructure & Operations Graveyard :: WebOps: Socorro, task, P1)

x86_64
Linux

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: selenamarie, Assigned: bburton)

References

Details

(Whiteboard: [2013q3])

Socorro is transitioning to RabbitMQ in production. Its peak supportable load on the current infrastructure appears to be around 500 crashes/second.

This translates to a theoretical maximum of 500 inserts and 500 removals per second. 

Our more typical load is about 60 inserts/removals per second. 
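
As a quick sanity check on the numbers above, here is a minimal sketch of the arithmetic. It assumes each crash results in exactly one queue insert and one removal, and it reads "60 inserts/removals per second" as 60 of each; both are assumptions about the figures, and the names used here are hypothetical.

```python
# Figures taken from the description; reading the typical load as
# "60 inserts and 60 removals" is an assumption.
PEAK_CRASHES_PER_SEC = 500
TYPICAL_CRASHES_PER_SEC = 60


def queue_ops_per_sec(crashes_per_sec: int) -> int:
    """Assume one insert plus one removal per crash: two queue operations."""
    return 2 * crashes_per_sec


peak_ops = queue_ops_per_sec(PEAK_CRASHES_PER_SEC)
typical_ops = queue_ops_per_sec(TYPICAL_CRASHES_PER_SEC)

# Ratio of peak to typical load, i.e. how much burst capacity the queue needs.
burst_factor = PEAK_CRASHES_PER_SEC / TYPICAL_CRASHES_PER_SEC

print(f"peak: {peak_ops} ops/sec, typical: {typical_ops} ops/sec")
print(f"burst factor: {burst_factor:.1f}x")
```

Under these assumptions the queue needs to absorb roughly eight times the typical rate at peak, which is the figure to compare against whatever the candidate cluster has been tested at.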

We have talked a bit about using an existing queue rather than standing up our own special services. 

This ticket is to discuss and come to a decision on what the right infrastructure would be and to make a plan for deploying this in early Q3.
For that kind of load, I'm very tempted to put this on our shared rabbitmq cluster. That's already set up for redundancy, and 500+500/sec is not enough that I'm even a little bit concerned about it. We've tested it much higher than that, and we can always add one or more nodes dedicated to Socorro if need be. This gets us off the ground much faster.

If this turns out to be a problem, it should be relatively easy to switch to a dedicated cluster later on.
Sounds sane to me.
(In reply to Jake Maul [:jakem] from comment #1)
> For that kind of load, I'm very tempted to put this on our shared rabbitmq
> cluster. That's already set up for redundancy, and 500+500/sec is not enough
> that I'm even a little bit concerned about it. We've tested it much higher
> than that, and we can always add one or more nodes dedicated to Socorro if
> need be. This gets us off the ground much faster.
> 
> If this turns out to be a problem, it should be relatively easy to switch to
> a dedicated cluster later on.

+1  

Thanks, Jake! I'm about to finish up my integration testing, at which point we'll be ready when you are. :)
Component: Server Operations: Web Operations → WebOps: Socorro
Product: mozilla.org → Infrastructure & Operations
Assignee: server-ops-webops → bburton
Priority: -- → P1
Whiteboard: [2013q3]
I'll file a bug to get flows for this in place.

:selenamarie, to confirm, processors and collectors need flows right? Admin host too?
Status: NEW → ASSIGNED
(In reply to Brandon Burton [:solarce] from comment #4)
> I'll file a bug to get flows for this in place.
> 
> :selenamarie, to confirm, processors and collectors need flows right? Admin
> host too?

Answered my own questions by looking at https://bugzilla.mozilla.org/show_bug.cgi?id=878999
See Also: → 878999
Depends on: 904718
Depends on: 567826
Planning to turn this on in production 9/24/13.
------------------------------------------------------------------------
r75664 | bburton@mozilla.com | 2013-09-24 16:30:19 -0700 (Tue, 24 Sep 2013) | 1 line

final socorro prod crashmover and processor configs for rabbitmq, bug 917501
------------------------------------------------------------------------

/prod]
-> % svn ci -m "disabling socorro-monitor in prod now that we're using rabbitmq, bug 917501"
Sending        prod/data-bin/update-socorro.sh
Transmitting file data .
Committed revision 75666.

-> % svn ci -m "disabling socorro-monitor in prod now that we're using rabbitmq, bug 917501"
Sending        trunk/manifests/nodes/socorro.pp
Transmitting file data .

-> % svn ci -m "disabling socorro-monitor in prod now that we're using rabbitmq, bug 917501"
Sending        trunk/manifests/nodes/socorro.pp
Transmitting file data .
Committed revision 75668.
Committed revision 75667.
-> % svn ci -m "removing socorro monitor check, bug 881903" modules/nagios/
Sending        modules/nagios/manifests/mozilla/services.pp
Transmitting file data .
Committed revision 75744.
With the removal of all the monitor stuff, we can RF this

See also http://www.twobraids.com/2013/09/the-socorro-monitor-rest-in-peace.html
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard