Closed Bug 1086923 Opened 10 years ago Closed 10 years ago

MDN: unable to connect to smtp.socketlabs.com

Categories

(Infrastructure & Operations Graveyard :: WebOps: Community Platform, task)

Platform: x86, macOS
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: groovecoder, Unassigned)

Details

(Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/1708] )

10 of these in the last 3 days:

socket.error: [Errno 110] Connection timed out [1]

I thought the first few might be transient errors, but now I think the email server is down or inaccessible?

It's trying to use smtp.socketlabs.com. (See settings_local.py)

[1] https://rpm.newrelic.com/accounts/263620/applications/3172075/traced_errors/2410522397
Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/1708]
best guess is that netflows are restricting this access outbound. copying :dcurado and :XioNoX from netops for their thoughts here.

 [cturra@developer1.webapp.scl3 ~]$ nc -zv smtp.socketlabs.com 465
 nc: connect to smtp.socketlabs.com port 465 (tcp) failed: Connection timed out
Flags: needinfo?(dcurado)
Flags: needinfo?(arzhel)
How about this:
can you ping it, but not get to port 465/tcp?
That's a great way to tell if the firewalls are blocking something.
We let all icmp through.

Thanks,
Dave
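Dave's heuristic (ICMP gets through, but the TCP port does not) can also be scripted for the TCP half. The helper below, `tcp_probe`, is hypothetical and not part of MDN or any Mozilla tooling; it is a minimal Python 3 sketch that separates the outcomes `nc -zv` reports: a completed handshake, an explicit refusal, or silence until the timeout. Silence is what a firewall dropping packets usually looks like, while a refusal means something on the path answered. Ping itself stays a separate check, since ICMP needs raw sockets.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify one TCP connection attempt, roughly like `nc -zv -w <timeout>`.

    Returns:
      "open"      the three-way handshake completed
      "refused"   something answered with a TCP reset (port closed or rejected)
      "timeout"   no answer at all, typical of packets being silently dropped
      "error: .." anything else (DNS failure, unreachable network, ...)
    """
    try:
        # create_connection tries each resolved address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except (socket.timeout, TimeoutError):
        # socket.timeout: our own timeout expired first.
        # TimeoutError: the kernel gave up with ETIMEDOUT, i.e. the
        # "[Errno 110] Connection timed out" seen in the traced errors.
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

For example, `tcp_probe("smtp.socketlabs.com", 465)` and `tcp_probe("smtp.socketlabs.com", 25)` would compare the two ports discussed in this bug; the explicit timeout keeps the check from hanging until someone interrupts it.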
pings are getting through...

[cturra@developer1.webapp.scl3 ~]$ ping -c5 smtp.socketlabs.com
PING in.socketlabs.com (142.0.180.14) 56(84) bytes of data.
64 bytes from lb.h.in.socketlabs.com (142.0.180.14): icmp_seq=1 ttl=114 time=95.7 ms
64 bytes from lb.h.in.socketlabs.com (142.0.180.14): icmp_seq=2 ttl=114 time=79.8 ms
64 bytes from lb.h.in.socketlabs.com (142.0.180.14): icmp_seq=3 ttl=114 time=79.2 ms
64 bytes from lb.h.in.socketlabs.com (142.0.180.14): icmp_seq=4 ttl=114 time=78.5 ms
64 bytes from lb.h.in.socketlabs.com (142.0.180.14): icmp_seq=5 ttl=114 time=78.7 ms

--- in.socketlabs.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4084ms
rtt min/avg/max/mdev = 78.548/82.422/95.743/6.675 ms
Can you specify the src FQDNs and/or IPs, and if there are any other dst hosts and/or any other ports?

Thanks.
:groovecoder - pls correct me if any of these details are incorrect (specifically the tcp port used in your code).

(In reply to Dave Curado :dcurado from comment #4)
> Can you specify the src FQDNs and/or IPs, and if there are any other dst
> hosts and/or any other ports?

looks to be happening on all dev/stage/prod nodes, which have a source of:

 dev:
  developer1.dev.webapp.scl3.mozilla.com (10.22.81.16)

 stage:
  developer1.stage.webapp.scl3.mozilla.com (10.22.81.17)

 prod:
  developer1.webapp.scl3.mozilla.com (10.22.81.18)
  developer2.webapp.scl3.mozilla.com (10.22.81.19)
  developer3.webapp.scl3.mozilla.com (10.22.81.20)


i have no idea about the dst.
Thanks for providing the source side of the information.
Will Luke know about the destination host(s)?
If not, can you guys figure out who knows and update the bug when you get the info?
That would be greatly appreciated.

Thanks
all i know about this socketlabs smtp service is what dns tells me.

 $ dig +short smtp.socketlabs.com
 in.socketlabs.com.
 142.0.179.10


:groovecoder - anything else you know here?
Flags: needinfo?(lcrouch)
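The earlier ping resolved smtp.socketlabs.com to 142.0.180.14, while this `dig` returned 142.0.179.10, so the name evidently maps to more than one address, and a single `nc` or `ping` run only exercises whichever address it is handed. A minimal sketch, assuming Python 3.9+ and a hypothetical helper name `resolve_ipv4`, that collects the IPv4 addresses the local resolver returns. Depending on how the provider rotates its records, one lookup may return only a subset, so it might need to be repeated over time.

```python
import socket


def resolve_ipv4(host: str, port: int = 25) -> list[str]:
    """Return the distinct IPv4 addresses `host` resolves to, sorted.

    Roughly the A-record part of `dig +short host`; the resolver follows
    CNAMEs (e.g. to in.socketlabs.com) but they are not reported here.
    """
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({sockaddr[0] for *_, sockaddr in infos})
```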
We don't override the EMAIL_PORT setting, so it should be the default port 25.
Flags: needinfo?(lcrouch)
:cturra - note the email tasks are executed by the developer-celeryN hosts.
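Because the port only comes from Django's default, one option is to spell it out in settings_local.py so that connectivity tests and firewall requests use the same number (the first `nc` test above used 465). A sketch of that fragment, assuming the rest of the existing settings, including the credentials, stay as they are; the explicit `EMAIL_PORT` line is a suggestion, not the current configuration.

```python
# settings_local.py (sketch): make the SMTP endpoint explicit instead of implied.
EMAIL_HOST = 'smtp.socketlabs.com'
# 25 is already Django's default; stating it documents which port the
# celery hosts need to reach through the firewall.
EMAIL_PORT = 25
# EMAIL_HOST_USER / EMAIL_HOST_PASSWORD: unchanged from the existing file.
```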
(In reply to Luke Crouch [:groovecoder] from comment #9)
> :cturra - note the email tasks are executed by the developer-celeryN hosts.

with this info... my tests look fine from those hosts. could this issue be outside of our control? 

[cturra@developeradm.private.scl3 ~]$ issue-multi-command celery nc -zv smtp.socketlabs.com 25
[2014-10-21 15:32:13] [developer-celery1.webapp.scl3.mozilla.com] running: nc -zv smtp.socketlabs.com 25
[2014-10-21 15:32:13] [developer-celery2.webapp.scl3.mozilla.com] running: nc -zv smtp.socketlabs.com 25
[2014-10-21 15:32:13] [developer-celery3.webapp.scl3.mozilla.com] running: nc -zv smtp.socketlabs.com 25
[2014-10-21 15:32:14] [developer-celery1.webapp.scl3.mozilla.com] finished: nc -zv smtp.socketlabs.com 25 (0.335s)
[developer-celery1.webapp.scl3.mozilla.com] out: Connection to smtp.socketlabs.com 25 port [tcp/smtp] succeeded!
[2014-10-21 15:32:14] [developer-celery3.webapp.scl3.mozilla.com] finished: nc -zv smtp.socketlabs.com 25 (0.393s)
[developer-celery3.webapp.scl3.mozilla.com] out: Connection to smtp.socketlabs.com 25 port [tcp/smtp] succeeded!
[2014-10-21 15:32:14] [developer-celery2.webapp.scl3.mozilla.com] finished: nc -zv smtp.socketlabs.com 25 (0.451s)
[developer-celery2.webapp.scl3.mozilla.com] out: Connection to smtp.socketlabs.com 25 port [tcp/smtp] succeeded!
Severity: critical → major
clearing the needinfo from me, as comment #10 suggests there, uh, isn't an issue?
(please pull me in again if needs be!)

Thanks
And I dropped the severity because this was paging me.
Flags: needinfo?(dcurado)
There's no outage report from SocketLabs. [1] Another possibility is the SocketLabs account we're using has been closed. :cturra - does WebOps have a new SocketLabs account?

[1] http://status.socketlabs.com/
Flags: needinfo?(cturra)
no, we (webops) have no access to socketlabs. in fact, this is the only site that i know of using them. additionally, looking at the configuration, it looks like morgamic was the one who created it.

from the settings file...

 # bug 869588
 EMAIL_HOST = 'smtp.socketlabs.com'
 EMAIL_HOST_USER = 'morgamic'
 EMAIL_HOST_PASSWORD = <REMOVED>
Flags: needinfo?(cturra)
(In reply to Luke Crouch [:groovecoder] from comment #13)
> There's no outage report from SocketLabs. [1] Another possibility is the
> SocketLabs account we're using has been closed. :cturra - does WebOps have a
> new SocketLabs account?
> 
> [1] http://status.socketlabs.com/

Oh this is likely. I heard something about people (in the SF office) asking about Socketlabs and which credit card it was going to. I'll try to track it down tomorrow if I can. 

Luke, 

Can you guys figure out how to get your own Socketlabs account in the meantime? If this one has been closed, that would be the way forward.
Flags: needinfo?(arzhel)
Hi there, we're seeing more timeouts. Please provide alternative SMTP credentials; MDN's users aren't getting emails because of this.
Severity: major → critical
Webops are looking at lowering the priority so Onduty doesn't get paged.
Severity: critical → normal
(In reply to Ludovic Hirlimann [:Usul] from comment #17)
> Webops are looking at lowering the priority so Onduty doesn't get paged.

:jezdez - socketlabs is not a service we've (IT) purchased or provided. as mentioned in comment 14, it was set up by morgamic. 

:groovecoder - if MDN needs to continue using this service, we'll need you guys to provide us with updated credentials for socketlabs.
Flags: needinfo?(lcrouch)
:cturra I've seen the comment; :groovecoder is on PTO today and can't organize a new socketlabs account. I don't have a credit card or access to the socketlabs account to create an alternative. Since morgamic isn't with the company anymore, I would guess this is a legacy setup that we should get rid of as soon as possible, ideally now.

:cyliang mentioned on IRC that Wil Clouser, one of the two people able to log into socketlabs, is also on PTO today. The other person is Jared Hirsch, whom she hasn't been able to get in touch with yet. All in all, I know it's not your fault, but I honestly don't know who else to talk to.

This is the third day we've been seeing SMTP timeouts, even though the bug was only filed yesterday (UTC), which is why I'm asking for your help here.
After much sturm und drang, we think that:
  1. The old morgamic credentials should be working (based on the fact that marketplace also uses them).
  2. There might be an issue with SMTP timeouts due to a new host being added to their service: "A new IP address of [ 54.213.1.165 ] has been added to DNS resolution of the smtp.socketlabs.com gateway".

@dcurado: Do you know if there is anything, networking-wise, that might require adding a new host to an ACL, a list of white-listed IPs, etc.? (The IPs that are probably already on that list can be found at https://support.socketlabs.com/index.php/Knowledgebase/Article/View/94.)
Flags: needinfo?(lcrouch) → needinfo?(dcurado)
Amended to point out that attempts to netcat to the new IP never return:

[cliang@developer-celery1.webapp.scl3 ~]$ nc -vz 54.213.1.165 25
^C
Here's how this request should have been written:

Please ensure that the following source hosts:
  developer1.dev.webapp.scl3.mozilla.com (10.22.81.16)
  developer1.stage.webapp.scl3.mozilla.com (10.22.81.17)
  developer1.webapp.scl3.mozilla.com (10.22.81.18)
  developer2.webapp.scl3.mozilla.com (10.22.81.19)
  developer3.webapp.scl3.mozilla.com (10.22.81.20)
can get to the following destination hosts:
  142.0.179.10
  142.0.180.14
  23.23.219.154
  54.86.14.32
  54.213.1.165
  54.187.77.82 
on the SMTP port (port 25/tcp).

You could also add that there may already be an existing policy, and that
54.187.77.82 is new and may need to be added.
The existing policy was:
  From zone: webapp, To zone: untrust
  Source addresses:
    developer-celery3: 10.22.81.83/32 
    developer-celery2: 10.22.81.82/32 
    developer-celery1: 10.22.81.40/32 
    developer3: 10.22.81.20/32 
    developer2: 10.22.81.19/32 
    developer1: 10.22.81.18/32 
    developer1.stage: 10.22.81.17/32 
    developer1.dev: 10.22.81.16/32
  Destination addresses:
    cidr-block.socketlabs.com: 142.0.176.0/20 
    lb.sg.in.socketlabs.com: 142.0.179.10/32 
    lb.h.in.socketlabs.com: 142.0.180.14/32 
    lb.rsc.in.socketlabs.com: 184.106.77.171/32 
    lb.east.aws.in.socketlabs.com: 23.23.219.154/32
  Application: junos-smtp
    IP protocol: tcp, ALG: 0, Inactivity timeout: 1800
      Source port range: [0-0] 
      Destination port range: [25-25]

The policy now is:
  From zone: webapp, To zone: untrust
  Source addresses:
    developer3: 10.22.81.20/32 
    developer2: 10.22.81.19/32 
    developer1: 10.22.81.18/32 
    developer-celery3: 10.22.81.83/32 
    developer-celery2: 10.22.81.82/32 
    developer-celery1: 10.22.81.40/32 
    developer1.stage: 10.22.81.17/32 
    developer1.dev: 10.22.81.16/32
  Destination addresses:
    socketlabs-54.187.77.82: 54.187.77.82/32 
    socketlabs-54.213.1.165: 54.213.1.165/32 
    socketlabs-54.86.14.32: 54.86.14.32/32 
    cidr-block.socketlabs.com: 142.0.176.0/20 
    lb.sg.in.socketlabs.com: 142.0.179.10/32 
    lb.h.in.socketlabs.com: 142.0.180.14/32 
    lb.rsc.in.socketlabs.com: 184.106.77.171/32 
    lb.east.aws.in.socketlabs.com: 23.23.219.154/32
  Application: junos-smtp
    IP protocol: tcp, ALG: 0, Inactivity timeout: 1800
      Source port range: [0-0] 
      Destination port range: [25-25]

So there were missing source hosts and missing destination hosts.
I hope this fixes the problem.
If not, please update this bug and specify the source IP, dest IP, port and protocol that is not
working for you.  Thanks!
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
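The comparison done by hand above (which requested destinations the old policy did not already permit) can be sketched with Python's standard `ipaddress` module. The address lists are copied from this bug; the helper name `uncovered` is hypothetical. Feeding it the output of a fresh DNS lookup instead of a hand-written list would be one way to spot the next new SocketLabs address before it causes timeouts.

```python
import ipaddress

# Destination addresses in the policy as it stood before this bug.
OLD_POLICY_DESTINATIONS = [
    "142.0.176.0/20",     # cidr-block.socketlabs.com
    "142.0.179.10/32",    # lb.sg.in.socketlabs.com
    "142.0.180.14/32",    # lb.h.in.socketlabs.com
    "184.106.77.171/32",  # lb.rsc.in.socketlabs.com
    "23.23.219.154/32",   # lb.east.aws.in.socketlabs.com
]

# Destination hosts from the rewritten request.
REQUESTED = [
    "142.0.179.10", "142.0.180.14", "23.23.219.154",
    "54.86.14.32", "54.213.1.165", "54.187.77.82",
]


def uncovered(addresses, networks):
    """Return the addresses (in input order) that fall outside every network."""
    nets = [ipaddress.ip_network(n) for n in networks]
    return [a for a in addresses
            if not any(ipaddress.ip_address(a) in net for net in nets)]


print(uncovered(REQUESTED, OLD_POLICY_DESTINATIONS))
# → ['54.86.14.32', '54.213.1.165', '54.187.77.82']
```

The three addresses it reports are the /32 entries that appear in the updated policy.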
Thanks!  I wasn't sure if there was an existing policy; now that I know that there is one, we'll know to file an ACL if SocketLabs updates their list again.
Just for documentation purposes, the socketlabs IP addresses came from [1], which hasn't changed since April 2012, but might change in the future :-)

[1] https://support.socketlabs.com/index.php/Knowledgebase/Article/View/94
Flags: needinfo?(dcurado)
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard