Closed Bug 646046 Opened 13 years ago Closed 12 years ago

turn off external network access for staging and preproduction machines

Categories: Infrastructure & Operations :: RelOps: General (task)
Platform: x86_64 Linux
Type: task
Priority: Not set
Severity: normal
Tracking: Not tracked
Status: RESOLVED INCOMPLETE
People: Reporter: bhearsum; Assigned: mrz
Attachments: 1 file

Please turn off external network access for the following machines:
moz2-linux-slave03, 04, 10, 17, 51
mv-moz2-linux-ix-slave01
moz2-linux64-slave07, 10
moz2-darwin9-slave03, 08, 68, 010
moz2-darwin10-slave01, 02, 03, 04, 010
mw32-ix-slave01
win32-slave03, 04, 010, 21, 60
talos-r3-fed-001, 002, 010
talos-r3-fed64-001, 002, 010
talos-r3-leopard-001, 002, 010
talos-r3-snow-001, 002, 010
talos-r3-w7-001, 002, 003, 010
talos-r3-xp-001, 002, 003, 010
t-r3-w764-001, 002, 010


But please allow them to access all Mozilla mirror nodes.
(In reply to comment #0)

> But please allow them to access all Mozilla mirror nodes.

a) Where can I get that list?
b) Is that really a good idea?
   Help me understand this requirement. It seems to me that we're going to lock down these machines tightly except for letting them talk to machines all over the internet that are probably friendly. That doesn't quite scan.

As an aside, I don't think we've dealt with NTP, so I'll allow 123 out for now and file a separate bug to fix that and close that off.
(In reply to comment #1)
> (In reply to comment #0)
> 
> > But please allow them to access all Mozilla mirror nodes.
> 
> a) Where can I get that list?

https://nagios.mozilla.org/sentry/ may be complete, probably best to check with justdave though.

> b) Is that really a good idea?
>    Help me understand this requirement. It seems to me that we're going to lock
> down these machines tightly except for letting them talk to machines all over
> the internet that are probably friendly. That doesn't quite scan.

As part of every release we have a test that goes through all of the releasetest snippets and checks links, which eventually point to various mirror nodes. We do plan to make changes that don't require us to hit Mozilla mirrors, but in the meantime we need to make sure this test continues to work.
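
For reference, a rough sketch of what that check boils down to (the snippet path and filenames here are made up; the real check lives in the release automation): walk the releasetest snippets, pull out the download URLs, and verify each one answers, which is why these slaves currently need to reach whichever mirror those links resolve to.

# Purely illustrative; the snippet location is hypothetical, not the real automation.
for snippet in /builds/releasetest-snippets/*.txt; do
    # pull every http(s) URL out of the snippet
    grep -oE 'https?://[^ ]+' "$snippet" | while read -r url; do
        # --spider: check that the link answers without downloading it
        if ! wget -q --spider --timeout=30 "$url"; then
            echo "BROKEN: $url (from $snippet)"
        fi
    done
done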

> As an aside, I don't think we've dealt with NTP, so I'll allow 123 out for now
> and file a separate bug to fix that and close that off.

Don't the build machines sync to ntp.build.m.o?
(In reply to comment #2)
 
> https://nagios.mozilla.org/sentry/ may be complete, probably best to check with
> justdave though.

Netops can't be asked to maintain a whitelist that looks like that, so we're going to have to do it ourselves. This means if we want to turn off access before we fix the releasetest snippet checks, we need to build and install a proxy so we can maintain that whitelist.

> Don't the build machines sync to ntp.build.m.o?

Not really, no. Filed bug 646056 which blocks the tracking bug.
(In reply to comment #3)
> (In reply to comment #2)
> 
> > https://nagios.mozilla.org/sentry/ may be complete, probably best to check with
> > justdave though.
> 
> Netops can't be asked to maintain a whitelist that looks like that, so we're
> going to have to do it ourselves. This means if we want to turn off access
> before we fix the releasetest snippet checks, we need to build and install a
> proxy so we can maintain that whitelist.

We just chatted about this a bit on IRC. Given that we add a new mirror every week or so, this option still isn't very good: we'll likely end up behind the list and cause erroneous test failures. We're strongly leaning towards blocking this on changing the releasetest checks to not touch external mirrors.
Nthomas came up with the idea of using a special bouncer region/country to shunt all requests from build machines to internal mirrors only. If we do that, we only need to whitelist the external IPs of those mirrors, which rarely change. Filed bug 646076 on this.
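
Once that lands, a quick sanity check from a build slave could look roughly like this (the bouncer URL and product string are illustrative, not the exact query the tests use): ask bouncer for a download and confirm the redirect points at an internal mirror.

# Illustrative check only; URL parameters are made up.
wget -S --max-redirect=0 -O /dev/null \
  'https://download.mozilla.org/?product=firefox-latest&os=linux64&lang=en-US' 2>&1 \
  | grep -i 'Location:'
# the Location: header should point at an internal mirror, not an external node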
Depends on: 646076
Assignee: server-ops-releng → zandr
Depends on: 656916
With bug 646076 resolved we can go ahead with this:
> Please turn off external network access for the following machines:
> moz2-linux-slave03, 04, 10, 17, 51
> mv-moz2-linux-ix-slave01
> moz2-linux64-slave07, 10
> moz2-darwin9-slave03, 08, 68, 010
> moz2-darwin10-slave01, 02, 03, 04, 010
> mw32-ix-slave01
> win32-slave03, 04, 010, 21, 60
> talos-r3-fed-001, 002, 010
> talos-r3-fed64-001, 002, 010
> talos-r3-leopard-001, 002, 010
> talos-r3-snow-001, 002, 010
> talos-r3-w7-001, 002, 003, 010
> talos-r3-xp-001, 002, 003, 010
> t-r3-w764-001, 002, 010



> But please allow them to access all Mozilla mirror nodes.

And of course, this is no longer necessary; these machines should be blocked from ALL external access.
I somehow missed bug 646056 until today. We're actually blocked on it (which is blocked on the DNS overhaul) before we can test this out in staging. Sorry for the back and forth.
Depends on: 646056
No longer depends on: 656916
We're finally ready for this, for realsies:
(In reply to comment #6)
> With bug 646076 resolved we can go ahead with this:
> > Please turn off external network access for the following machines:
> > moz2-linux-slave03, 04, 10, 17, 51
> > mv-moz2-linux-ix-slave01
> > moz2-linux64-slave07, 10
> > moz2-darwin9-slave03, 08, 68, 010
> > moz2-darwin10-slave01, 02, 03, 04, 010
> > mw32-ix-slave01
> > win32-slave03, 04, 010, 21, 60
> > talos-r3-fed-001, 002, 010
> > talos-r3-fed64-001, 002, 010
> > talos-r3-leopard-001, 002, 010
> > talos-r3-snow-001, 002, 010
> > talos-r3-w7-001, 002, 003, 010
> > talos-r3-xp-001, 002, 003, 010
> > t-r3-w764-001, 002, 010
Both of the currently dependent bugs are 95% done and don't block us from moving forward here, so I'm removing the dependency.
No longer depends on: 646056, 646076
Just to confirm before I lob this over to netops:

That list of machines is all staging/preproduction, so we do not need a downtime window for this change.

Put another way, if those machines are knocked out by this change, the tree stays open, yes?

Also, t-r3-w764-010 should not be on the list, as it's now talos-r3-fed-058
(In reply to comment #10)
> Just to confirm before I lob this over to netops:
> 
> That list of machines is all staging/preproduction, so we do not need a
> downtime window for this change.
> 
> Put another way, if those machines are knocked out by this change, the tree
> stays open, yes?

Correct.

> Also, t-r3-w764-010 should not be on the list, as it's now talos-r3-fed-058

K.
So, over to netops then.

Let's start with this set, because they're all in scl1.

> > win32-slave03, 04, 010, 21, 60
> > talos-r3-fed-001, 002, 010
> > talos-r3-fed64-001, 002, 010
> > talos-r3-leopard-001, 002, 010
> > talos-r3-snow-001, 002, 010
> > talos-r3-w7-001, 002, 003, 010
> > talos-r3-xp-001, 002, 003, 010
> > t-r3-w764-001, 002

These machines should no longer be allowed to connect out to the internet.
Assignee: zandr → network-operations
Component: Server Operations: RelEng → Server Operations: Netops
QA Contact: zandr → mrz
> > win32-slave03, 04, 010, 21, 60
are in SJC, so they should already be blocked.

Done for the others.

[root@talos-r3-fed64-001 ~]# wget google.fr
--2011-06-08 14:58:57--  http://google.fr/
Resolving google.fr... 74.125.115.99, 74.125.115.103, 74.125.115.104, ...
Connecting to google.fr|74.125.115.99|:80... ^C
Over to release engineering to make sure things are still working. Punt it back to Server Ops:Releng when you're happy that we haven't broken anything.
Assignee: network-operations → nobody
Component: Server Operations: Netops → Release Engineering
QA Contact: mrz → release
(In reply to comment #13)
> > > win32-slave03, 04, 010, 21, 60
> are in SJC, so it should already be blocked.

Is the implication here that build hosts in SJC already don't have internet access? I don't think that's the case...:
D:\mozilla-build\wget>hostname
win32-slave04

D:\mozilla-build\wget>wget google.fr
--05:13:19--  http://google.fr/
           => `index.html.1'
Resolving google.fr... 74.125.115.105, 74.125.115.106, 74.125.115.147, ...
Connecting to google.fr|74.125.115.105|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.google.fr/ [following]
--05:13:20--  http://www.google.fr/
           => `index.html.1'
Resolving www.google.fr... 74.125.115.105, 74.125.115.106, 74.125.115.147, ...
Reusing existing connection to google.fr:80.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

    [ <=>                                                           ] 9,475         --.--K/s

05:13:20 (68.09 MB/s) - `index.html.1' saved [9475]


Tossing this back to get these hosts dealt with.


> Done for the others.
> 
> [root@talos-r3-fed64-001 ~]# wget google.fr
> --2011-06-08 14:58:57--  http://google.fr/
> Resolving google.fr... 74.125.115.99, 74.125.115.103, 74.125.115.104, ...
> Connecting to google.fr|74.125.115.99|:80... ^C

Thanks!
Assignee: nobody → network-operations
Component: Release Engineering → Server Operations: Netops
QA Contact: release → mrz
(In reply to comment #15)
> (In reply to comment #13)
> > > > win32-slave03, 04, 010, 21, 60
> > are in SJC, so it should already be blocked.
> 
> Is the implication here that build hosts in SJC already don't have internet
> access? I don't think that's the case...:

Perhaps, though I agree that's not the case. However, before I take up any more of netops' time: 

(In reply to comment #14)
> Over to release engineering to make sure things are still working. Punt it
> back to Server Ops:Releng when you're happy that we haven't broken anything.

I don't see that statement in comment 15.
Assignee: network-operations → nobody
Component: Server Operations: Netops → Release Engineering
QA Contact: mrz → release
To clarify, my intent was to do this for a single DC first. Grabbing the win32 VMs in sjc1 was a mistake. If talos looks OK with access shut off, we'll move on to shutting down access out of sjc1.
(In reply to comment #17)
> To clarify, my intent was to do this for a single DC first. grabbing the
> win32 VMs in sjc1 was a mistake. If talos looks OK with access shut off,
> we'll move on to shutting down access out of sjc1

Gotcha, sorry about the confusion.
Assignee: nobody → bhearsum
I'm satisfied with the testing on the SCL machines, can we do the rest of them now? I believe the remaining are:
> > > moz2-linux-slave03, 04, 10, 17, 51
> > > mv-moz2-linux-ix-slave01
> > > moz2-linux64-slave07, 10
> > > moz2-darwin9-slave03, 08, 68, 010
> > > moz2-darwin10-slave01, 02, 03, 04, 010
> > > mw32-ix-slave01
> > > win32-slave03, 04, 010, 21, 60



Additionally, is it possible to get a list of all the specific things these new rules have blocked over the next 6 hours or so? E.g., if a machine is still attempting to connect to microsoft.com or time.apple.com, I'd like to see the source & destination IPs.
Assignee: bhearsum → server-ops-releng
Component: Release Engineering → Server Operations: RelEng
QA Contact: release → zandr
Attached file Blocked traffic
A small portion of the blocked traffic for this rule.
The firewall logs are full of blocked NTP traffic to time.apple.com; it would be nice if you could change that, or I can allow traffic there.
(In reply to comment #20)
> Created attachment 538576 [details]
> Blocked traffic
> 
> A small portion of the blocked traffic for this rule.
> The firewall logs are full of blocked ntp traffic to time.apple.com, it
> would be nice if you could change that, or I can allow traffic to there.

All of the machines in scl1 capable of doing so should be using dhcp to get their ntp servers (and dhcp is handing out 10.12.75.10, 10.12.75.12, 10.2.71.5 right now)
(In reply to comment #21)
> (In reply to comment #20)
> > Created attachment 538576 [details]
> > Blocked traffic
> > 
> > A small portion of the blocked traffic for this rule.
> > The firewall logs are full of blocked ntp traffic to time.apple.com, it
> > would be nice if you could change that, or I can allow traffic to there.
> 
> All of the machines in scl1 capable of doing so should be using dhcp to get
> their ntp servers (and dhcp is handing out 10.12.75.10, 10.12.75.12,
> 10.2.71.5 right now)

Mac and Windows actually don't know how to obey DHCP-provided ntp servers. Over in bug 646056 I rolled out changes that should've got all machines syncing to ntp.build.mozilla.org, instead.
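
For reference, a minimal sketch of what that amounts to on the Linux/Mac side (illustrative only; the actual files are Puppet-managed via bug 646056), plus a quick way to confirm a host really is syncing internally:

# /etc/ntp.conf, boiled down (illustrative; the real file is puppet-managed):
#   server ntp.build.mozilla.org iburst
#   driftfile /var/lib/ntp/drift
# then check which peer the daemon is actually using:
ntpq -p
# the "remote" column should show ntp.build.mozilla.org (or its address),
# not time.apple.com or another external pool host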

The strange part is that the six hosts listed in the log are:
10.12.49.190: talos-r3-fed64-010.build.scl1.mozilla.com.
10.12.50.111: talos-r3-xp-003.build.scl1.mozilla.com.
10.12.50.118: talos-r3-xp-010.build.scl1.mozilla.com.
10.12.50.163: talos-r3-w7-002.build.scl1.mozilla.com.
10.12.50.171: talos-r3-w7-010.build.scl1.mozilla.com.
10.12.50.55: talos-r3-snow-002.build.scl1.mozilla.com.

snow-002 hasn't synced with Puppet yet, so I can understand why it would still be talking to time.apple.com. But the others aren't even running OS X, and are in fact configured to talk to ntp.build.mozilla.org. Unless the minis have some sort of hardware-level NTP client, I'm baffled.
As for the other two hosts in that log, they are:
144.50.95.208.in-addr.arpa domain name pointer mirror.liberty.edu.
65.182.224.39 (doesn't resolve, but appears to be a PHP.net mirror)

I'm not concerned about blocking either of them. The former is a CentOS/Fedora mirror and was likely hit because someone ran 'yum search' or something. Not sure about the latter, but I can't find any references to php.net in mxr (that aren't comments), so it's doubtful anything bad is happening.
(In reply to comment #22)
> The strange part is that the six hosts listed in the log are:
> 10.12.49.190: talos-r3-fed64-010.build.scl1.mozilla.com.

Actually, this one only accessed the fedora mirror and php.net, so maybe there's something that gets installed on Windows through Boot Camp that mucks with time. Still looking into it.

> 10.12.50.111: talos-r3-xp-003.build.scl1.mozilla.com.
> 10.12.50.118: talos-r3-xp-010.build.scl1.mozilla.com.
> 10.12.50.163: talos-r3-w7-002.build.scl1.mozilla.com.
> 10.12.50.171: talos-r3-w7-010.build.scl1.mozilla.com.
> 10.12.50.55: talos-r3-snow-002.build.scl1.mozilla.com.
(In reply to comment #24)
> Actually, this one only accessed the fedora mirror and php.net, so maybe
> there's something that gets installed on Windows through Boot Camp that
> mucks with time. Still looking into it.

Indeed, there's an AppleTimeSrv, which claims to keep time in sync when booting back and forth between OS X and Windows. I'll track disabling it or otherwise fixing it to use our NTP server in bug 646056.
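
In case it helps, a rough sketch of what disabling it could look like on the Windows side (assuming the service really is registered as AppleTimeSrv; the actual fix will be worked out in bug 646056):

REM hypothetical sketch; the service name AppleTimeSrv is an assumption
sc query AppleTimeSrv

REM stop it and keep it from starting at boot
sc stop AppleTimeSrv
sc config AppleTimeSrv start= disabled

REM point the built-in Windows time service at the internal NTP host instead
w32tm /config /manualpeerlist:ntp.build.mozilla.org /syncfromflags:manual /update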

Thanks for pulling this log, it's super helpful!
As noted in https://bugzilla.mozilla.org/show_bug.cgi?id=650436#c4, this breaks Windows Activation. The current solution is to get the machines activated on the cage WiFi network, because telephone activation sucks.
AIUI, the alternatives for Windows activation are:
 * Network access to activation servers from build network
 * WiFi for activation
 * Dedicated switchport + long ethernet cable for activation

Others? Which is preferred?
(In reply to comment #27)
> AIUI, the alternatives for windows activation are:
>  * Network access to activation servers from build network

Non-trivial set of hosts: http://www.oleksiygayda.com/2010/08/how-to-windows-activation-firewall.html

>  * WiFi for activation

This seems to work OK, but it's yet more manual configuration to enable and then remember to disable the wireless connection.

>  * Dedicated switchport + long ethernet cable for activation

Which leaves us with only being able to activate one host at a time, and it needs to be a different long cable than the one I'm using for imaging (or I need to turn up deploystudio on vlan75 as well, which is not my first choice).

> Others? Which is preferred?

WiFi, so far.

http://technet.microsoft.com/en-us/library/ff793432.aspx is interesting reading but may not apply to our situation.
It looks like the MTV1 hosts were missed, specifically:
> > > > mv-moz2-linux-ix-slave01
> > > > moz2-darwin10-slave01, 02

I'm able to ping external hosts from these machines.
(In reply to comment #29)
> It looks like the MTV1 hosts were missed, specifically:
> > > > > mv-moz2-linux-ix-slave01
> > > > > moz2-darwin10-slave01, 02
> 
> I'm able to ping external hosts from these machines.

*un*able, sorry.
(In reply to comment #30)
> (In reply to comment #29)
> > It looks like the MTV1 hosts were missed, specifically:
> > > > > > mv-moz2-linux-ix-slave01
> > > > > > moz2-darwin10-slave01, 02
> > 
> > I'm able to ping external hosts from these machines.
> 
> *un*able, sorry.

No, no, my original comment was correct. Sorry for the churn here :(.
(In reply to comment #31)
> (In reply to comment #30)
> > (In reply to comment #29)
> > > It looks like the MTV1 hosts were missed, specifically:
> > > > > > > mv-moz2-linux-ix-slave01
> > > > > > > moz2-darwin10-slave01, 02
> > > 
> > > I'm able to ping external hosts from these machines.
> > 
> > *un*able, sorry.
> 
> No, no, my original comment was correct. Sorry for the churn here :(.

These are in mtv1. Again, I want to do this one DC at a time, and for mtv1 I want to do it as part of the new firewalls, so we're a little bit away from being ready to go.
So the current state is:
 scl1: staging machines blocked
 sjc1: staging machines open
 mtv1: staging machines open

From comment 32, there are new firewalls on the way for mtv1, so it's best to lump this change in with their turn-up; the immediate next step is the staging machines in sjc1.
Assignee: server-ops-releng → zandr
(In reply to comment #28) 
> > Others? Which is preferred?
> 
> WiFi, so far.
> 
> http://technet.microsoft.com/en-us/library/ff793432.aspx is interesting
> reading but may not apply to our situation.

Is it worth filing a separate IT bug to explore the various options here? What's our path forward?
I opened bug 667045 for the purpose.
(In reply to comment #28)
> (In reply to comment #27)
> > AIUI, the alternatives for windows activation are:
> >  * Network access to activation servers from build network
> 
> Non-trivial set of hosts:
> http://www.oleksiygayda.com/2010/08/how-to-windows-activation-firewall.html

I only see 9 hosts here. Does this list change frequently? Is there something else I'm missing about why we can't just whitelist 9 machines and be done?
Wrong bug - copied to bug 667045.
(In reply to comment #33)
> So the current state is:
>  scl1: staging machines blocked
>  sjc1: staging machines open
>  mtv1: staging machines open
> 
> From comment 32, there are new firewalls on the way for mtv1 so it's best to
> lump this change in with their turn-up, so the immediate next step is
> staging machines in sjc1.

I was corrected on this point the day I wrote it, but didn't update the bug. The firewalls in sjc1 are already overloaded, so we would need new firewalls to implement blocking there. However, sjc1 is quite the rat's nest at a variety of levels, so that DC will not be getting new firewalls, and thus will not have outgoing access blocked.

So this project is currently waiting on
 (a) bouncer fixes (bug 646076)
 (b) new firewalls in mtv1 (part of bug 649422)

(a) is blocked on PTO for the moment.  I'll get an update on (b). Note that we've been leaning on an already-overloaded netops for higher-priority items on the infra list for a while now.
(status description copied to bug 617414 - sorry, this was the wrong place for it)
Let's see how things look after the P2P link is enabled.
Wrong bug
Blocks: 498425
I have no idea why this bug is still open, and I got lost going through the 42 comments. Help?
Assignee: zandr → mrz
This bug is still open because we only got partway through disabling external access for staging/preproduction slaves, primarily because bug 646076 turned out not to be possible. I'm going to say that this bug has probably outlived its usefulness, though. We can file specific follow-up bugs to get access disabled for other machines when we're ready.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INCOMPLETE
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations