Closed Bug 755394 Opened 13 years ago Closed 13 years ago

No email delivery on BrowserID in production

Categories

(Cloud Services :: Operations: Miscellaneous, task, P1)

x86_64
Linux
task

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: lhilaiel, Assigned: petef)

Details

(Whiteboard: [qa+])

Tried three email addresses: lhilaiel@mozilla.com, wtf@hilaiel.com, and powned@hilaiel.com (the last two go through Gmail). No delivery after 10 minutes.
Ditto here. Waited 60+ minutes for luke.crouch+pa@gmail.com.
This should be fixed now; smtp.socketlabs.com IP changed. Full impact information and timeline coming soon.
Status: NEW → ASSIGNED
Summary: BrowserID production outbound email (used to verify email addresses on new accounts) was not delivered for 207 minutes (07:43 to 11:10 PDT, 5/15/2012), which effectively blocked new users from signing up. This was caused by our upstream email provider (SocketLabs) changing the IP address of their SMTP server. Our firewall rules are "default deny" for security, so we updated the rules to explicitly allow traffic to the new IP address and then manually flushed the mail queues.

Because of a Nagios check bug, which will be resolved today, we were not paged when the mail queues reached a non-zero size (we should have been alerted around 08:00). We are also updating our firewall ACLs with all of the possible SocketLabs IP addresses (per their documentation) to avoid the same type of problem in the future.

Timeline (PDT):
* 07:43:37: smtp.socketlabs.com IP address changes; outbound email becomes undeliverable
* 10:52:00: ops notified on IRC, petef acked
* 10:57:00: identified problem from logs (firewall blocking traffic to the new SocketLabs IP)
* 10:58:50: netops blocker bug 755393 filed to update the ACL
* 11:10:00: netops updated the ACL, petef flushed the postfix queues, emails delivered

Impact:
* 445 unique email messages were queued up
* 340 unique email addresses (so some people retried)

Problems & future fixes:
* Monitoring did not catch this. We do monitor mailq size, but Nagios' built-in check_mailq does not appear to be playing nicely with postfix. Filed bug 755410.
* The smtp.socketlabs.com IP might change again in the future. Found a SocketLabs FAQ entry on the subject and filed Netops bug 755403 to add all the possible IPs to our ACLs.
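The monitoring gap described above can be illustrated with a minimal Nagios-style check that parses Postfix's `mailq` output directly instead of relying on `check_mailq`. This is a hypothetical sketch, not the check actually deployed for bug 755410. It assumes Postfix's sendmail-compatible `mailq` prints either "Mail queue is empty" or a summary line such as "-- 12 Kbytes in 3 Requests.", and the thresholds (warn on any queued message, critical at an arbitrary 50) are illustrative only. It uses the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
"""Minimal Nagios-style check for the Postfix mail queue (a sketch).

Assumption: Postfix's `mailq` is on PATH and prints either
"Mail queue is empty" or a summary line like "-- 12 Kbytes in 3 Requests.".
"""
import re
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Matches the trailing summary line Postfix prints for a non-empty queue.
SUMMARY_RE = re.compile(r"^--\s+\d+\s+Kbytes\s+in\s+(\d+)\s+Requests?\.", re.MULTILINE)


def parse_queue_size(mailq_output):
    """Return the number of queued messages, or None if the output is unrecognized."""
    if "Mail queue is empty" in mailq_output:
        return 0
    match = SUMMARY_RE.search(mailq_output)
    return int(match.group(1)) if match else None


def classify(count, warn=1, crit=50):
    """Map a queue size to a (Nagios exit code, status line) pair.

    warn=1 alerts on any non-zero queue; crit=50 is an arbitrary example value.
    """
    if count is None:
        return UNKNOWN, "MAILQ UNKNOWN - could not parse mailq output"
    if count >= crit:
        return CRITICAL, f"MAILQ CRITICAL - {count} messages queued"
    if count >= warn:
        return WARNING, f"MAILQ WARNING - {count} messages queued"
    return OK, f"MAILQ OK - {count} messages queued"


def main():
    """Run mailq, print a one-line status, and return the Nagios exit code."""
    try:
        output = subprocess.run(
            ["mailq"], capture_output=True, text=True, timeout=30
        ).stdout
    except (OSError, subprocess.TimeoutExpired) as exc:
        print(f"MAILQ UNKNOWN - failed to run mailq: {exc}")
        return UNKNOWN
    code, message = classify(parse_queue_size(output))
    print(message)
    return code
```

To use it as a plugin, call `sys.exit(main())` from the script's entry point. Keeping the parsing in `parse_queue_size` separate from the `mailq` call makes it easy to test against captured output. With these example thresholds, the first stuck message would raise a WARNING instead of the queue growing silently.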
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: [qa+]
Verified that fast email delivery is working again in production. Made sure to try Gmail addresses as well.
Status: RESOLVED → VERIFIED