[tracker] switch from rabbitmq to pub/sub
Categories
(Socorro :: Infra, task, P2)
Tracking
(Not tracked)
People
(Reporter: willkg, Assigned: willkg)
References
Details
Socorro uses RabbitMQ to queue crash ids for processing. The queues get populated via three things:
- Antenna (the collector) saves crashes to AWS S3, which triggers Pigeon, which tosses crash ids into the socorro.normal queue.
- A user views a crash report for a crash that hasn't been processed, which adds a crash id to the socorro.priority queue.
- Someone requests a crash to be reprocessed, either from the report view or the Reprocess API, which adds crash ids to the socorro.reprocessing queue.
The queues get consumed by the RabbitMQCrashStore in the processor.
This bug covers figuring out a plan to redo that using Amazon SQS and coming up with a rough work estimate.
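For context, getting a crash id into one of those queues is just a small message publish. Here's a rough sketch of what a producer like Pigeon might do with pika; the host, queue name, and crash id are placeholders, not the actual Pigeon code:

    # Rough sketch (not the actual Pigeon code) of publishing a crash id
    # to one of the RabbitMQ queues with pika. Host, queue name, and the
    # crash id are placeholders.
    import pika

    def publish_crash_id(crash_id, queue="socorro.normal", host="localhost"):
        connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
        channel = connection.channel()
        # Make sure the queue exists and survives broker restarts.
        channel.queue_declare(queue=queue, durable=True)
        # The message body is just the crash id.
        channel.basic_publish(exchange="", routing_key=queue, body=crash_id)
        connection.close()

    publish_crash_id("0bba929f-8721-460c-dead-170720190101")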
Comment 1•6 years ago
I think we want to maintain three separate queues: normal, priority, and reprocessing. We should set up 3 SQS queues.
We can set up AWS S3 event notification to add things to SQS. If we decide to go with this plan, we should nix bug #1513080 because we won't need to adjust Antenna at all. I wonder what the shape of those events in SQS looks like.
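For reference, when S3 sends event notifications straight to SQS, each SQS message body is a JSON document with a "Records" list, and each record carries the bucket name and object key. Here's a rough sketch of pulling crash ids out of one of those messages; how the crash id maps onto the object key is an assumption, not the real bucket layout:

    # Rough sketch of parsing an S3 ObjectCreated event notification
    # delivered via SQS. The key-to-crash-id mapping is an assumption,
    # not the actual Socorro bucket layout.
    import json

    def crash_ids_from_s3_event(message_body):
        event = json.loads(message_body)
        crash_ids = []
        for record in event.get("Records", []):
            if not record.get("eventName", "").startswith("ObjectCreated"):
                continue
            key = record["s3"]["object"]["key"]
            # Assume the crash id is the last path segment of the object key.
            crash_ids.append(key.rsplit("/", 1)[-1])
        return crash_ids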
We can write an SQSCrashStore that does what the RabbitMQCrashStore is currently doing, where it cycles between the queues and yields crash ids. There's a rough sketch of that idea below.
We can either continue with the current architecture and write SQS equivalents of PriorityjobRabbitMQCrashStore and ReprocessingOneRabbitMQCrashStore, or rework how that works and do something saner. I think I'm voting for the latter.
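Here's a rough sketch of what an SQSCrashStore could look like, assuming boto3 and synchronous polling; the queue URLs, region, wait times, and method name are placeholders:

    # Rough sketch of an SQSCrashStore-style consumer that cycles between
    # queues and yields crash ids. Queue URLs, region, and polling details
    # are placeholders, not real Socorro configuration.
    import itertools
    import boto3

    class SQSCrashStore:
        def __init__(self, queue_urls, region="us-west-2"):
            self.client = boto3.client("sqs", region_name=region)
            self.queue_urls = queue_urls

        def new_crashes(self):
            # Round-robin across the queues, yielding crash ids as they show up.
            for queue_url in itertools.cycle(self.queue_urls):
                resp = self.client.receive_message(
                    QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=1
                )
                for msg in resp.get("Messages", []):
                    yield msg["Body"]
                    # Delete the message once the crash id has been handed off.
                    self.client.delete_message(
                        QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"]
                    )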
We need to be able to run an SQS equivalent in the local development environment. We're currently using localstack for a local S3 and it implements a local SQS. That might work nicely.
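If localstack pans out, pointing boto3 at it is mostly a matter of overriding the endpoint; something like this, where the endpoint URL and dummy credentials depend on how localstack is set up locally:

    # Sketch of an SQS client aimed at localstack for local development.
    # The endpoint URL and dummy credentials are assumptions about the
    # local setup.
    import boto3

    sqs = boto3.client(
        "sqs",
        endpoint_url="http://localstack:4566",  # localstack edge port; setup-dependent
        region_name="us-east-1",
        aws_access_key_id="local",
        aws_secret_access_key="local",
    )
    queue = sqs.create_queue(QueueName="local-socorro-normal")
    print(queue["QueueUrl"])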
I want to talk with Miles and Brian.
Comment 2•6 years ago
Brian pointed out that maybe we should skip SQS and just switch directly to Pub/Sub. I'll look at that, too.
Comment 3•6 years ago
Comment 4•6 years ago
I worked through requirements and talked with Brian and Miles, and the consensus is that we should switch to Pub/Sub.
We're going to try to do that this quarter. I'm going to rescope this bug to switching from RabbitMQ to Pub/Sub.
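For a sense of scale, publishing a crash id with the google-cloud-pubsub client is only a few lines. Rough sketch; the project and topic names are placeholders:

    # Rough sketch of publishing a crash id to a Pub/Sub topic with the
    # google-cloud-pubsub client. Project and topic names are placeholders.
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("my-project", "socorro-normal")

    # Pub/Sub payloads are bytes; the crash id is the whole payload.
    future = publisher.publish(topic_path, data=b"0bba929f-8721-460c-dead-170720190101")
    print(future.result())  # message id once the publish succeeds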
Comment 5•6 years ago
Rough scope of work:
- bug #1527343: (Will) Write a class the collector can use to produce Pub/Sub messages.
- bug #1527346: (Will) Write a class the webapp can use to add crash ids to the socorro.priority and socorro.reprocessing queues.
- bug #1527345: (Will) Write a class the processor will use to consume crash ids from the Pub/Sub queues (see the sketch after this list).
- (Will) Write tests.
- (ops) Set up socorro.normal, socorro.priority, and socorro.reprocessing queues for stage.
- (ops) Set up socorro.normal, socorro.priority, and socorro.reprocessing queues for prod.
- (ops; Will) Write configuration.
- (ops; Will) Deploy to stage.
- (ops) Set up queues in stage.
- (ops; Will) Deploy antenna with Pub/Sub configuration and code.
- (ops; Will) Deploy socorro processor and webapp with Pub/Sub configuration and code.
- (ops; Will) Update Datadog graphs for stage.
- (ops; Will) Verify.
- (ops; Will) Deploy to prod.
I'll break this up into bugs that block this bug.
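To give a sense of the consumer side in the list above, here's a rough sketch of the processor pulling crash ids from a Pub/Sub subscription with a synchronous pull. Project and subscription names are placeholders, and the request-dict call style assumes a newer google-cloud-pubsub client:

    # Rough sketch of the processor-side consumer: synchronously pull
    # crash ids from a Pub/Sub subscription and ack them. Names are
    # placeholders; the request-dict style assumes google-cloud-pubsub >= 2.0.
    from google.cloud import pubsub_v1

    def pull_crash_ids(project="my-project", subscription="socorro-normal-sub", max_messages=5):
        subscriber = pubsub_v1.SubscriberClient()
        sub_path = subscriber.subscription_path(project, subscription)
        response = subscriber.pull(
            request={"subscription": sub_path, "max_messages": max_messages}
        )
        crash_ids = [msg.message.data.decode("utf-8") for msg in response.received_messages]
        if response.received_messages:
            subscriber.acknowledge(
                request={
                    "subscription": sub_path,
                    "ack_ids": [msg.ack_id for msg in response.received_messages],
                }
            )
        return crash_ids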
Comment 6•6 years ago
All bugs have been completed and we've pushed everything to prod. Marking as FIXED.
Comment 7•9 months ago
Here's the project plan to switch to PubSub in case future me needs to find it again: https://docs.google.com/document/d/13_PbjSncCH60tLjkWst91B6TQQDaJYlSuSvPLqaWN5A/edit