1086923 - MDN: unable to connect to smtp.socketlabs.com

Reporter

Description

•

10 years ago

10 of these in the last 3 days:

socket:error: [Errno 110] Connection timed out [1]

I thought the first few might be transient errors, but now I think the email server is down or inaccessible?

It's trying to use smtp.socketlabs.com. (See settings_local.py)

[1] https://rpm.newrelic.com/accounts/263620/applications/3172075/traced_errors/2410522397

:kanban

Updated

•

10 years ago

Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/1708]

Chris Turra [:cturra]

Comment 1

•

10 years ago

best guess is that netflows are restricting this access outbound. copying :dcurado and :XioNoX from netops for their thoughts here.

 [cturra@developer1.webapp.scl3 ~]$ nc -zv smtp.socketlabs.com 465
 nc: connect to smtp.socketlabs.com port 465 (tcp) failed: Connection timed out

Flags: needinfo?(dcurado)

Flags: needinfo?(arzhel)

Dave Curado :dcurado

Comment 2

•

10 years ago

How about this:
can you ping it, but not get to port 465/tcp?
That's a great way to tell if the firewalls are blocking something.
We let all icmp through.

Thanks,
Dave

Chris Turra [:cturra]

Comment 3

•

10 years ago

pings are getting through...

[cturra@developer1.webapp.scl3 ~]$ ping -c5 smtp.socketlabs.com
PING in.socketlabs.com (142.0.180.14) 56(84) bytes of data.
64 bytes from lb.h.in.socketlabs.com (142.0.180.14): icmp_seq=1 ttl=114 time=95.7 ms
64 bytes from lb.h.in.socketlabs.com (142.0.180.14): icmp_seq=2 ttl=114 time=79.8 ms
64 bytes from lb.h.in.socketlabs.com (142.0.180.14): icmp_seq=3 ttl=114 time=79.2 ms
64 bytes from lb.h.in.socketlabs.com (142.0.180.14): icmp_seq=4 ttl=114 time=78.5 ms
64 bytes from lb.h.in.socketlabs.com (142.0.180.14): icmp_seq=5 ttl=114 time=78.7 ms

--- in.socketlabs.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4084ms
rtt min/avg/max/mdev = 78.548/82.422/95.743/6.675 ms
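
The ping-versus-port comparison above can also be scripted. This is a minimal sketch, not the tooling anyone used in this bug: `check_tcp` is a hypothetical helper name, and the 5-second timeout is an arbitrary assumption. The point is that a port silently dropped by a firewall tends to show up as a timeout, while a reachable host with nothing listening usually answers with a refusal.

```python
import socket


def check_tcp(host, port, timeout=5.0):
    """Try one TCP connect and return 'open', 'refused', 'timeout' or 'error: ...'."""
    try:
        # create_connection resolves the name and applies the timeout to the connect.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        # No answer at all: consistent with packets being dropped on the way.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered with a reset: reachable, but nothing is listening.
        return "refused"
    except OSError as exc:
        # DNS failures, unreachable networks, and anything else.
        return "error: %s" % exc


if __name__ == "__main__":
    # Host and ports are the ones discussed in this bug.
    for port in (25, 465):
        print(port, check_tcp("smtp.socketlabs.com", port))
```

If ping works and this reports a timeout on the port in question, a firewall policy is a reasonable first suspect.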

Dave Curado :dcurado

Comment 4

•

10 years ago

Can you specify the src FQDNs and/or IPs, and if there are any other dst hosts and/or any other ports?

Thanks.

Chris Turra [:cturra]

Comment 5

•

10 years ago

:groovecoder - pls correct me if any of these details are incorrect (specifically tcp port used in your code).

(In reply to Dave Curado :dcurado from comment #4)
> Can you specify the src FQDNs and/or IPs, and if there are any other dst
> hosts and/or any other ports?

looks to be happening on all dev/stage/prod nodes, which have a source of:

 dev:
  developer1.dev.webapp.scl3.mozilla.com (10.22.81.16)

 stage:
  developer1.stage.webapp.scl3.mozilla.com (10.22.81.17)

 prod:
  developer1.webapp.scl3.mozilla.com (10.22.81.18)
  developer2.webapp.scl3.mozilla.com (10.22.81.19)
  developer3.webapp.scl3.mozilla.com (10.22.81.20)


i have no idea about the dst.

Dave Curado :dcurado

Comment 6

•

10 years ago

Thanks for providing the source side of the information.
Will Luke know about the destination host(s)?
If not, can you guys figure out who knows and update the bug when you get the info?
That would be greatly appreciated.

Thanks

Chris Turra [:cturra]

Comment 7

•

10 years ago

all i know about this socketlabs smtp service is what dns tells me.

 $ dig +short smtp.socketlabs.com
 in.socketlabs.com.
 142.0.179.10


:groovecoder - anything else you know here?

Flags: needinfo?(lcrouch)

Luke Crouch [:groovecoder]

Reporter

Comment 8

•

10 years ago

We don't override the EMAIL_PORT setting, so it should be the default port 25.
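
For reference, a minimal sketch of the relevant Django settings, assuming nothing else in settings_local.py touches email. EMAIL_HOST is the value named in this bug. The explicit EMAIL_PORT line is illustrative only and is not in our settings; it just spells out Django's documented default. Note that the nc test in comment 1 targeted port 465, not 25.

```python
# settings_local.py (sketch, not the real file)
EMAIL_HOST = 'smtp.socketlabs.com'

# Illustrative only: Django falls back to port 25 when EMAIL_PORT is not set,
# so this line states the default rather than changing anything.
EMAIL_PORT = 25
```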

Flags: needinfo?(lcrouch)

Luke Crouch [:groovecoder]

Reporter

Comment 9

•

10 years ago

:cturra - note the email tasks are executed by the developer-celeryN hosts.

Chris Turra [:cturra]

Comment 10

•

10 years ago

(In reply to Luke Crouch [:groovecoder] from comment #9)
> :cturra - note the email tasks are executed by the developer-celeryN hosts.

with this info... my tests look fine from those hosts. could this issue be outside of our control? 

[cturra@developeradm.private.scl3 ~]$ issue-multi-command celery nc -zv smtp.socketlabs.com 25
[2014-10-21 15:32:13] [developer-celery1.webapp.scl3.mozilla.com] running: nc -zv smtp.socketlabs.com 25
[2014-10-21 15:32:13] [developer-celery2.webapp.scl3.mozilla.com] running: nc -zv smtp.socketlabs.com 25
[2014-10-21 15:32:13] [developer-celery3.webapp.scl3.mozilla.com] running: nc -zv smtp.socketlabs.com 25
[2014-10-21 15:32:14] [developer-celery1.webapp.scl3.mozilla.com] finished: nc -zv smtp.socketlabs.com 25 (0.335s)
[developer-celery1.webapp.scl3.mozilla.com] out: Connection to smtp.socketlabs.com 25 port [tcp/smtp] succeeded!
[2014-10-21 15:32:14] [developer-celery3.webapp.scl3.mozilla.com] finished: nc -zv smtp.socketlabs.com 25 (0.393s)
[developer-celery3.webapp.scl3.mozilla.com] out: Connection to smtp.socketlabs.com 25 port [tcp/smtp] succeeded!
[2014-10-21 15:32:14] [developer-celery2.webapp.scl3.mozilla.com] finished: nc -zv smtp.socketlabs.com 25 (0.451s)
[developer-celery2.webapp.scl3.mozilla.com] out: Connection to smtp.socketlabs.com 25 port [tcp/smtp] succeeded!

Shyam Mani [:fox2mike]

Updated

•

10 years ago

Severity: critical → major

Dave Curado :dcurado

Comment 11

•

10 years ago

clearing the needinfo from me, as comment #10 suggests there, uh, isn't an issue?
(please pull me in again if needs be!)

Thanks

Shyam Mani [:fox2mike]

Comment 12

•

10 years ago

And I dropped the severity because this was paging me.

Flags: needinfo?(dcurado)

Luke Crouch [:groovecoder]

Reporter

Comment 13

•

10 years ago

There's no outage report from SocketLabs. [1] Another possibility is the SocketLabs account we're using has been closed. :cturra - does WebOps have a new SocketLabs account?

[1] http://status.socketlabs.com/

Flags: needinfo?(cturra)

Chris Turra [:cturra]

Comment 14

•

10 years ago

no, we (webops) have no access to socketlabs. in fact, this is the only site that i know of using them. additionally, looking at the configuration, it looks like morgamic was the one who created it.

from the settings file...

 # bug 869588
 EMAIL_HOST = 'smtp.socketlabs.com'
 EMAIL_HOST_USER = 'morgamic'
 EMAIL_HOST_PASSWORD = <REMOVED>

Flags: needinfo?(cturra)

Shyam Mani [:fox2mike]

Comment 15

•

10 years ago

(In reply to Luke Crouch [:groovecoder] from comment #13)
> There's no outage report from SocketLabs. [1] Another possibility is the
> SocketLabs account we're using has been closed. :cturra - does WebOps have a
> new SocketLabs account?
> 
> [1] http://status.socketlabs.com/

Oh this is likely. I heard something about people (in the SF office) asking about Socketlabs and which credit card it was going to. I'll try to track it down tomorrow if I can. 

Luke, 

Can you guys in the meantime figure out how to get your own Socketlabs account? In case this is closed...that would be the way forward.

Arzhel Younsi [:XioNoX]

Updated

•

10 years ago

Flags: needinfo?(arzhel)

Jannis Leidel [:jezdez]

Comment 16

•

10 years ago

Hi there, we're seeing more timeouts. Please provide alternative SMTP credentials; MDN's users aren't getting emails because of this.

Severity: major → critical

Ludovic Hirlimann [:Usul]

Comment 17

•

10 years ago

Webops are lowering the priority so Onduty doesn't get paged.

Severity: critical → normal

Chris Turra [:cturra]

Comment 18

•

10 years ago

(In reply to Ludovic Hirlimann [:Usul] from comment #17)
> Webops are lowering the priority so Onduty doesn't get paged.

:jezdez - socketlabs is not a service we've (IT) purchased or provided. as mentioned in comment 14, it was set up by morgamic.

:groovecoder - if MDN needs to continue using this service, we'll need you guys to provide us with updated credentials for socketlabs.

Flags: needinfo?(lcrouch)

Jannis Leidel [:jezdez]

Comment 19

•

10 years ago

:cturra I've seen the comment, but :groovecoder is on PTO today and can't organize a new socketlabs account. I don't have a credit card or access to the socketlabs account to create an alternative. Since morgamic isn't with the company anymore, I would guess this is a legacy that we should get rid of as soon as possible, ideally now.

:cyliang mentioned on IRC that one of the two people able to log into socketlabs, Wil Clouser, is also on PTO today. The other person is Jared Hirsch, who she wasn't able to get in touch with yet. All in all, I know it's not your fault, but I honestly don't know who to talk to instead.

This is the third day we're seeing SMTP timeouts, even though the bug was only filed yesterday (UTC), which is why I'm asking for your help here.

C. Liang [:cyliang]

Comment 20

•

10 years ago

After much sturm und drang, we think that:
  1. The old morgamic credentials should be working (based on the fact that marketplace also uses them).
  2. There might be an issue with SMTP timeouts due to a new host being added to their service: "A new IP address of [ 54.213.1.165 ] has been added to DNS resolution of the smtp.socketlabs.com gateway".

@dcurado: Do you know if there is anything, networking-wise, where we might need to add a new host to an ACL, list of white-listed IPs, etc.? (The list of IPs that are probably already in that list can be found at https://support.socketlabs.com/index.php/Knowledgebase/Article/View/94.)

Flags: needinfo?(lcrouch) → needinfo?(dcurado)

C. Liang [:cyliang]

Comment 21

•

10 years ago

Amended to point out that attempts to netcat to the new IP never return:

[cliang@developer-celery1.webapp.scl3 ~]$ nc -vz 54.213.1.165 25
^C
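
Since the gateway name can resolve to several addresses, one way to avoid a hanging netcat is to probe each resolved address with a timeout. A minimal sketch under stated assumptions: `resolve_ipv4` and `probe_all` are hypothetical helper names, the 5-second timeout is arbitrary, and a single lookup may only return a subset of the addresses if the DNS answer rotates.

```python
import socket


def resolve_ipv4(host):
    """Return the sorted, de-duplicated IPv4 addresses the local resolver gives for host."""
    infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def probe_all(host, port, timeout=5.0, resolve=resolve_ipv4):
    """Map each resolved address to True (TCP connect succeeded) or False."""
    results = {}
    for addr in resolve(host):
        try:
            # The timeout keeps a silently dropped connection from hanging forever.
            with socket.create_connection((addr, port), timeout=timeout):
                results[addr] = True
        except OSError:
            # Covers timeouts, refusals and unreachable networks alike.
            results[addr] = False
    return results


if __name__ == "__main__":
    # Host and port are the ones discussed in this bug.
    for addr, ok in probe_all("smtp.socketlabs.com", 25).items():
        print(addr, "ok" if ok else "FAILED")
```

Any address reported as failed from the celery hosts would be a candidate for the firewall policy.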

Dave Curado :dcurado

Comment 22

•

10 years ago

Here's how this request should have been written:

Please ensure that the following list of source hosts:
  developer1.dev.webapp.scl3.mozilla.com (10.22.81.16)
  developer1.stage.webapp.scl3.mozilla.com (10.22.81.17)
  developer1.webapp.scl3.mozilla.com (10.22.81.18)
  developer2.webapp.scl3.mozilla.com (10.22.81.19)
  developer3.webapp.scl3.mozilla.com (10.22.81.20)
can get to the following destination hosts:
  142.0.179.10
  142.0.180.14
  23.23.219.154
  54.86.14.32
  54.213.1.165
  54.187.77.82 
On the SMTP port (port 25/tcp)

You could also add that there may already be an existing policy, and that
54.187.77.82 is new and may need to be added.

Dave Curado :dcurado

Comment 23

•

10 years ago

The existing policy was:
  From zone: webapp, To zone: untrust
  Source addresses:
    developer-celery3: 10.22.81.83/32 
    developer-celery2: 10.22.81.82/32 
    developer-celery1: 10.22.81.40/32 
    developer3: 10.22.81.20/32 
    developer2: 10.22.81.19/32 
    developer1: 10.22.81.18/32 
    developer1.stage: 10.22.81.17/32 
    developer1.dev: 10.22.81.16/32
  Destination addresses:
    cidr-block.socketlabs.com: 142.0.176.0/20 
    lb.sg.in.socketlabs.com: 142.0.179.10/32 
    lb.h.in.socketlabs.com: 142.0.180.14/32 
    lb.rsc.in.socketlabs.com: 184.106.77.171/32 
    lb.east.aws.in.socketlabs.com: 23.23.219.154/32
  Application: junos-smtp
    IP protocol: tcp, ALG: 0, Inactivity timeout: 1800
      Source port range: [0-0] 
      Destination port range: [25-25]

The policy now is:
  From zone: webapp, To zone: untrust
  Source addresses:
    developer3: 10.22.81.20/32 
    developer2: 10.22.81.19/32 
    developer1: 10.22.81.18/32 
    developer-celery3: 10.22.81.83/32 
    developer-celery2: 10.22.81.82/32 
    developer-celery1: 10.22.81.40/32 
    developer1.stage: 10.22.81.17/32 
    developer1.dev: 10.22.81.16/32
  Destination addresses:
    socketlabs-54.187.77.82: 54.187.77.82/32 
    socketlabs-54.213.1.165: 54.213.1.165/32 
    socketlabs-54.86.14.32: 54.86.14.32/32 
    cidr-block.socketlabs.com: 142.0.176.0/20 
    lb.sg.in.socketlabs.com: 142.0.179.10/32 
    lb.h.in.socketlabs.com: 142.0.180.14/32 
    lb.rsc.in.socketlabs.com: 184.106.77.171/32 
    lb.east.aws.in.socketlabs.com: 23.23.219.154/32
  Application: junos-smtp
    IP protocol: tcp, ALG: 0, Inactivity timeout: 1800
      Source port range: [0-0] 
      Destination port range: [25-25]

So there were missing source hosts and missing destination hosts.
I hope this fixes the problem.
If not, please update this bug and specify the source IP, dest IP, port and protocol that is not
working for you.  Thanks!
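
To see which destinations a policy does not cover, the address lists can be compared with Python's standard `ipaddress` module. This is a rough sketch, not how the firewall evaluates anything: the lists are copied from comments 22 and 23, and `uncovered` is a hypothetical helper name.

```python
import ipaddress

# Destination entries of the old policy, as listed in comment 23.
OLD_DESTINATIONS = [
    "142.0.176.0/20", "142.0.179.10/32", "142.0.180.14/32",
    "184.106.77.171/32", "23.23.219.154/32",
]
# The three entries that the new policy adds.
NEW_DESTINATIONS = OLD_DESTINATIONS + [
    "54.187.77.82/32", "54.213.1.165/32", "54.86.14.32/32",
]
# Destination hosts from the request template in comment 22.
REQUESTED = [
    "142.0.179.10", "142.0.180.14", "23.23.219.154",
    "54.86.14.32", "54.213.1.165", "54.187.77.82",
]


def uncovered(addresses, allowed_cidrs):
    """Return the addresses that fall inside none of the allowed networks."""
    networks = [ipaddress.ip_network(cidr) for cidr in allowed_cidrs]
    return [
        addr for addr in addresses
        if not any(ipaddress.ip_address(addr) in net for net in networks)
    ]


if __name__ == "__main__":
    print("missing from old policy:", uncovered(REQUESTED, OLD_DESTINATIONS))
    print("missing from new policy:", uncovered(REQUESTED, NEW_DESTINATIONS))
```

Run against these lists, the old policy leaves the three 54.x addresses uncovered and the new policy covers every requested destination.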

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

C. Liang [:cyliang]

Comment 24

•

10 years ago

Thanks!  I wasn't sure if there was an existing policy; now that I know that there is one, we'll know to file an ACL if SocketLabs updates their list again.

Jared Hirsch [:jhirsch] (he/him) (Needinfo please)

Comment 25

•

10 years ago

Just for documentation purposes, the socketlabs IP addresses came from [1], which hasn't changed since April 2012, but might change in the future :-)

[1] https://support.socketlabs.com/index.php/Knowledgebase/Article/View/94

Dave Curado :dcurado

Updated

•

10 years ago

Flags: needinfo?(dcurado)

BMO Automation

Updated

•

6 years ago

Product: Infrastructure & Operations → Infrastructure & Operations Graveyard