Connections to symbolpush.mozilla.org are blocked

RESOLVED FIXED

Status

Infrastructure & Operations
NetOps
4 years ago
4 years ago

People

(Reporter: nthomas, Assigned: XioNoX)

(Reporter)

Description

4 years ago
RelEng use symbolpush.mozilla.org to upload symbols to Socorro via scp/ssh, and can no longer open a connection.

Example flow:
[cltbld@b-linux64-hp-0026.build.releng.scl3.mozilla.com ~]$ nc -vz -w10 symbolpush.mozilla.org 22
nc: connect to symbolpush.mozilla.org port 22 (tcp) timed out: Operation now in progress

The flow is something like 
 Source: 10.26.52.0/22, 10.26.36.0/22
 Dest  : 63.245.217.193, port 22
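
For anyone who wants to repeat the `nc -vz -w10` check from a host without netcat, here is a minimal Python sketch of the same TCP probe. The function name `can_connect` is made up for illustration; the 10-second default timeout mirrors the `-w10` flag above.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS failures alike.
        return False
```

Calling `can_connect("symbolpush.mozilla.org", 22)` from one of the source networks should return `False` while the flow is blocked and `True` once it is restored. It does not distinguish a timeout from a refusal, which `nc` does report.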

This is blocking the Fennec 33.0b10 release (the last beta before 33.0) and will prevent nightly builds for desktop, mobile, and b2g (i.e., it disrupts development and testing).

Tentatively marking as fallout from bug 1078504.
Looks like this is actually blocked from everywhere - I can't get to port 22 from home, either.  It might be as simple as a zlb config?
(Reporter)

Updated

4 years ago
Summary: Regression of flow to symbolpush.mozilla.org → Connections to symbolpush.mozilla.org are blocked
Note that "everywhere" includes SeaMonkey (scl3 community vlan), and this is blocking our beta as well.
(Reporter)

Comment 3

4 years ago
Quick regression window: worked up to 2014-10-06 06:55:03 PDT, busted by 13:00. Either end could shift inwards if I look at more data.
This works from admin1.phx1:

$ telnet symbolpush.mozilla.org 22
Trying 63.245.217.193...
Connected to symbolpush.mozilla.org.
Escape character is '^]'.
SSH-2.0-OpenSSH_5.3
From the moz-config:
$ diff core1.phx1.mozilla.net.conf core2.phx1.mozilla.net.conf
[blah blah]
772,787d787
<             /* 829039 */
<             term permit-symbolpush-ssh {
<                 from {
<                     source-address {
<                         0.0.0.0/0;
<                         /* Added in case we remove 0/0. */
<                         63.245.223.64/26;
<                     }
<                     destination-address {
<                         63.245.217.193/32;
<                     }
<                     protocol tcp;
<                     destination-port 22;
<                 }
<                 then accept;
<             }

core2's missing a rule.  Why it's here and not in a fw I leave to the bigger netops brains.
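
A rough way to catch this kind of drift without reading the whole diff is to compare the sets of term names in the two configs. The sketch below is only an illustration: the helper names and the sample text are made up (only `permit-symbolpush-ssh` comes from the diff above), and it ignores which filter a term belongs to, so same-named terms in different filters would be treated as one.

```python
import re
from typing import List, Set

# Matches lines such as "term permit-symbolpush-ssh {" in a Junos-style config.
TERM_RE = re.compile(r"^\s*term\s+(\S+)\s*\{", re.MULTILINE)


def term_names(config_text: str) -> Set[str]:
    """Collect every term name that appears in the config text."""
    return set(TERM_RE.findall(config_text))


def missing_terms(reference: str, other: str) -> List[str]:
    """Return term names present in `reference` but absent from `other`, sorted."""
    return sorted(term_names(reference) - term_names(other))


# Hypothetical, heavily trimmed stand-ins for the two core configs.
core1 = """
term permit-symbolpush-ssh {
    then accept;
}
term other-rule {
    then accept;
}
"""
core2 = """
term other-rule {
    then accept;
}
"""

print(missing_terms(core1, core2))  # ['permit-symbolpush-ssh']
```

In practice the two strings would be read from core1.phx1.mozilla.net.conf and core2.phx1.mozilla.net.conf. An empty list in both directions would suggest the term sets match, though not that the term bodies are identical.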
Lowering priority due to an ongoing higher impact issue (Bug 1078504). Please revisit in the AM.
Severity: critical → normal
(Reporter)

Updated

4 years ago
Blocks: 1079004
(Reporter)

Comment 7

4 years ago
We've worked around this for the most part, except on Windows, where most of the Nightly/Aurora users are. Could we sync the config in comment #5 in the next couple of hours?
Flags: needinfo?(arzhel)
(Assignee)

Comment 8

4 years ago
This should be good now, thanks gcox for making it easier. It was indeed a rule that was out of sync between the 2 core switches. It was working fine until core1 collapsed.
It's not in a firewall because that's a load balancer VIP and those don't live behind a firewall.
Assignee: network-operations → arzhel
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Flags: needinfo?(arzhel)
Resolution: --- → FIXED
(Reporter)

Comment 9

4 years ago
Thanks!