Closed
Bug 755394
Opened 13 years ago
Closed 13 years ago
No email delivery on BrowserID in production
Categories
(Cloud Services :: Operations: Miscellaneous, task, P1)
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: lhilaiel, Assigned: petef)
Details
(Whiteboard: [qa+])
Tried three email addresses: lhilaiel@mozilla.com, wtf@hilaiel.com, and powned@hilaiel.com (the last two go through Gmail).
No delivery after 10 minutes.
Comment 1•13 years ago
Ditto here. Waiting 60+ minutes for delivery to luke.crouch+pa@gmail.com.
Comment 2•13 years ago
This should be fixed now; the IP address of smtp.socketlabs.com changed. Full impact information and timeline coming soon.
Status: NEW → ASSIGNED
Comment 3•13 years ago
Summary:
BrowserID production outbound email (used to verify email addresses on new accounts) was not delivered for 207 minutes (07:43 - 11:10 PDT, 5/15/2012), which effectively blocked new accounts from signing up. The cause was our upstream email provider (SocketLabs) changing the IP address of their SMTP server. Our firewall rules are "default deny" for security, so traffic to the new address was blocked until we updated the rules to explicitly allow it; we then manually flushed the mail queues. Because of a bug in a Nagios check, which will be resolved today, we were not paged when the mail queues reached a non-zero size (we should have been alerted around 08:00). We are also updating our firewall ACLs with all of the possible SocketLabs IP addresses (per their documentation) to avoid the same type of problem in the future.
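The failure mode in the summary (an upstream hostname starts resolving to an address the default-deny firewall does not allow) could in principle be detected before mail backs up. The following is only a minimal sketch of such a check, not something that was deployed here: the function names are hypothetical, and the allowlist in the usage note uses documentation-reserved placeholder ranges, not real SocketLabs addresses.

```python
import ipaddress
import socket


def resolve_ipv4(hostname, port=25):
    """Return the set of IPv4 addresses the hostname currently resolves to."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def unlisted_addresses(resolved, allowed_networks):
    """Return the resolved addresses not covered by any allowed network.

    allowed_networks is a list of CIDR strings or single addresses
    (hypothetical input; in practice it would mirror the firewall ACL).
    """
    nets = [ipaddress.ip_network(n) for n in allowed_networks]
    return sorted(
        addr
        for addr in resolved
        if not any(ipaddress.ip_address(addr) in net for net in nets)
    )
```

A periodic job could call `unlisted_addresses(resolve_ipv4("smtp.socketlabs.com"), ["192.0.2.0/24"])` (placeholder range) and alert whenever the result is non-empty, since any address returned would be blocked by a default-deny rule set.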
Timeline (PDT):
07:43:37: smtp.socketlabs.com IP address changes, outbound email becomes undeliverable
10:52:00: ops notified on IRC, petef acked
10:57:00: identified problem from logs (firewall blocking traffic to new socketlabs IP)
10:58:50: netops blocker bug 755393 filed to update ACL
11:10:00: netops updated ACL, petef flushed postfix queues, emails delivered
Impact:
* 445 unique email messages were queued up
* 340 unique email addresses (so some people retried)
Problems & future fixes:
* Monitoring did not catch this. We do monitor mailq size, but Nagios' built-in check_mailq does not appear to work correctly with Postfix. Filed bug 755410.
* The smtp.socketlabs.com IP address might change again in the future. Found a SocketLabs FAQ entry on the subject and filed Netops bug 755403 to add all of the possible IPs to our ACLs.
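For the monitoring gap, a queue-size check can be sketched as below. This is not the fix tracked in bug 755410 and not the stock check_mailq plugin; it assumes Postfix's `mailq` ends its output with either "Mail queue is empty" or a summary line such as "-- 23 Kbytes in 3 Requests.", and the warning and critical thresholds are hypothetical. The exit codes follow the standard Nagios plugin convention.

```python
import re

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed Postfix `mailq` summary line, e.g. "-- 23 Kbytes in 3 Requests."
_SUMMARY = re.compile(r"^-- \d+ Kbytes in (\d+) Requests?\.$", re.MULTILINE)


def parse_queue_size(mailq_output):
    """Return the number of queued messages, or None if the output is unrecognized."""
    if "Mail queue is empty" in mailq_output:
        return 0
    match = _SUMMARY.search(mailq_output)
    return int(match.group(1)) if match else None


def check_status(queue_size, warn=5, crit=20):
    """Map a queue size to a Nagios status (thresholds are illustrative only)."""
    if queue_size is None:
        # Unparseable output is reported as UNKNOWN rather than silently OK.
        return UNKNOWN
    if queue_size >= crit:
        return CRITICAL
    if queue_size >= warn:
        return WARNING
    return OK
```

Returning UNKNOWN for unparseable output is the key design choice: a check that cannot read the queue should surface that fact instead of reporting a healthy state.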
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•13 years ago
Whiteboard: [qa+]
Comment 4•13 years ago
Verified fast email delivery is working again in Prod.
Made sure to try gmail...
Status: RESOLVED → VERIFIED