Closed Bug 1455461 Opened 7 years ago Closed 6 years ago

Infra request for prototype Browser Error Alerting

Categories

(Cloud Services :: Operations: Miscellaneous, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: osmose, Assigned: miles)

In bug 1454549, we're prototyping an alerting system to run alongside the existing browser error collection, since Sentry's alerting is not suitable. To run this, we'd like to request some infra to run the service on for 3-6 months (until browser error collection is ended sometime in late 2018q3).

The alerting system is implemented as a Docker container intended to run in EC2. The WIP deployment docs for the service are available here: https://github.com/mozilla/bec-alerts#deployment

The resources we'd need in AWS include:
- An SQS queue for receiving and storing events from Sentry
  - Written to by Sentry, read by the processor
- An SES account for sending emails
  - Written to by the watcher
- An RDS Postgres resource
  - Written to and read by the processor and watcher
- EC2 instances to run the processor and watcher processes

The Firefox JS Error project on Sentry will need the built-in SQS plugin configured to write to the SQS queue, which miles says will probably need new netflows opened up along with the plugin config itself.

The actual rules and emails for alerts are hard-coded into the app and do not need to be set in configuration or in the database. We'll probably want a new Sentry project to send backend errors to.

Miles and I discussed using datadog for monitoring two main things we care about:
- Reads/deletions from the queue to ensure the processor is grabbing data.
- A health ping via DogStatsD that the watcher will send when it processes events.

There are no uptime requirements on this; it's a prototype, and if it breaks, it can just stay broken until miles or I can get around to fixing it.

Deploys will require running migrations on the database.
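For anyone picking this up later, here is a rough sketch of the two monitored paths described above: the processor draining the SQS queue, and the watcher's DogStatsD health ping. It is not taken from the bec-alerts code. It assumes the queue messages carry a JSON body, that `sqs` is an object with boto3's SQS client methods `receive_message` and `delete_message` (for example `boto3.client("sqs")`), and that a DogStatsD agent listens on its default UDP port 8125. The function names and the metric name `bec_alerts.watcher.health` are made up for illustration.

```python
import json
import socket


def drain_queue_once(sqs, queue_url, handle_event, max_messages=10):
    """Receive one batch from the queue, handle each event, then delete it.

    `sqs` is assumed to expose boto3's SQS client methods
    `receive_message` and `delete_message`.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=max_messages,  # SQS allows at most 10 per call
        WaitTimeSeconds=20,  # long polling; 20 seconds is the SQS maximum
    )
    processed = 0
    # "Messages" is absent from the response when the queue is empty.
    for message in response.get("Messages", []):
        # Assumption: the message body is a JSON document.
        handle_event(json.loads(message["Body"]))
        # Delete only after handling succeeds, so a failure leaves the
        # message on the queue for redelivery.
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message["ReceiptHandle"],
        )
        processed += 1
    return processed


def format_health_ping(metric="bec_alerts.watcher.health"):
    """Build a DogStatsD counter datagram of the form <name>:<value>|c."""
    return f"{metric}:1|c"


def send_health_ping(host="127.0.0.1", port=8125,
                     metric="bec_alerts.watcher.health"):
    """Send one counter increment to a DogStatsD agent over UDP."""
    payload = format_health_ping(metric).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload
```

Deleting each message only after it has been handled is what makes the "reads/deletions from the queue" metric a useful signal: if the processor stalls or crashes mid-batch, deletions stop while messages keep accumulating.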
We got the infra set up for this a while ago.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED