Closed Bug 646076 Opened 13 years ago Closed 11 years ago

set-up bouncer region/country/ip blocks for build network that only point to internal mirrors, and point build network machines at it

Categories

(Release Engineering :: General, defect, P3)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: bhearsum, Unassigned)

References

Details

(Whiteboard: [bouncer][network])

In order to make it possible to do bug 617414 in a maintainable way, we need to change our releasetest tests to look at only internal mirrors, instead of all of them. By doing so, we avoid needing to whitelist all mirrors.
Blocks: 646046
I've been trying to do this for the better part of the morning and haven't succeeded. Rather than continuing to bang my head and bug you guys on IRC, I'm going to toss this over for someone to grab when they have time.

The machine I've been trying to test with is 10.2.71.18.

So far, I have:
- Added a new Country "Mozilla Land!", which is part of the Stage region.
- Tried adding an IP block for this machine's external address (63.245.208.144), which was in Mozilla Land.
- Tried adding a second Mirror entry for dm-download02, that was *only* in the Stage region.
- Tried bumping up the rating on both of the dm-download02 Mirror entries.

Through all of the above, the machine continues to be redirected to various mirrors across the world.

At this point, I've reverted everything I've done, except the Country.
Assignee: bhearsum → server-ops-releng
Component: Release Engineering → Server Operations: RelEng
QA Contact: release → zandr
Summary: set-up bouncer region/country/ip blocks for build network that only point to internal mirrors → set-up bouncer region/country/ip blocks for build network that only point to internal mirrors, and point build network machines at it
Moving to Server Ops, as I know little of bouncer. cc: justdave
Assignee: server-ops-releng → server-ops
Component: Server Operations: RelEng → Server Operations
QA Contact: zandr → mrz
This is only a test
Severity: normal → blocker
Test works!
Severity: blocker → normal
Assignee: server-ops → justdave
(In reply to comment #1)
> - Tried adding an IP block for this machine's external address
> (63.245.208.144), which was in Mozilla Land.

If you didn't also remove that IP from whatever block it was already in, that's probably why it failed.
(In reply to comment #6)
> (In reply to comment #1)
> > - Tried adding an IP block for this machine's external address
> > (63.245.208.144), which was in Mozilla Land.
> 
> If you didn't also remove that IP from whatever block it was already in, that's
> probably why it failed.

Didn't follow, but after discussion in the RelEng meeting, nthomas will ping justdave offline.
Need to figure out the auth for https://tuxedo.stage.mozilla.com/ so that I can play with this in a safe place. I'll have to split any block containing the IP of interest in two.

fwenzel, what's the algorithm for converting 'Ip start addr' to 'Ip start' for an IP block in bouncer ?
(In reply to comment #8)
> fwenzel, what's the algorithm for converting 'Ip start addr' to 'Ip start' for
> an IP block in bouncer ?

n/m, after a few moments inspection it's
 1.2.3.4 --> 1*256^3 + 2*256^2 + 3*256 + 4 = 16909060
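
The conversion above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Bouncer's own code, and the function name `ip_to_int` is made up for this example.

```python
def ip_to_int(addr: str) -> int:
    """Convert a dotted-quad IPv4 address to the integer form used in 'Ip start'."""
    octets = [int(part) for part in addr.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a valid IPv4 address: {addr!r}")
    value = 0
    for octet in octets:
        # Shift the running total one byte to the left, then add the next octet.
        value = value * 256 + octet
    return value


print(ip_to_int("1.2.3.4"))  # 16909060, the worked example above
```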
The staging tuxedo instance is no good because the DB is really stale, so I overcame my reticence to futz with production.

This is what I needed to make it work:
* picked linux-ix-slave03, which has IP 63.245.220.220 after traversing NAT to outside world
* in the geo-ip settings changed 
 Ip start addr Ip start      Ip end addr     Ip end      Country 
 63.245.128.0   1073053696   63.246.14.221   1073090269  United States (US)
to 
 63.245.128.0   1073053696   63.245.220.119  1073077367  United States (US)
 63.245.220.220 1073077468   63.245.220.220  1073077468  Mozilla Land! (ZZ)
 63.245.220.221 1073077469   63.246.14.221   1073090269  United States (US)
* verified country 'Mozilla Land!' is in region 'Stage'
* On the 'Stage' region, set the 'GeoIP Throttle' to 100 (was 0), so all requests from this region go to Stage mirrors, which is just dm-download02
* waited a minute or so for propagation
* got a consistent response
$ curl -I "http://download.mozilla.org/?product=firefox-4.0.1&os=osx&lang=en-US"
[snip]
Location: http://dm-download02.mozilla.org/pub/mozilla.org/firefox/releases/4.0.1/mac/en-US/Firefox%204.0.1.dmg

Setting the GeoIP throttle back to 0 gives a random mirror, as does undoing the IP block changes. I have left bouncer as I found it (ie both changes reverted).

So does this bug become 'figure what space of addresses build machines can end up with' ? Dave mentioned that the geo-ip database needs updating, so we'll have to figure out how to persist the changes we need.
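
For anyone repeating the block split, here is a rough Python sketch of carving a single address out of an existing geo-ip block. It is an assumption about how one might compute the rows to type into the admin UI, not anything Bouncer provides; `split_block` and its tuple layout are hypothetical. Note that it produces contiguous blocks, so the first block ends at the address immediately before the carved-out one (63.245.220.219 for this NAT address), which differs from the 63.245.220.119 row in the table above.

```python
import ipaddress


def split_block(start, end, carve, country, new_country):
    """Split the block [start, end] so that `carve` becomes its own one-address block.

    Returns rows of (start_addr, start_int, end_addr, end_int, country).
    """
    lo, hi, ip = (int(ipaddress.IPv4Address(a)) for a in (start, end, carve))
    if not lo <= ip <= hi:
        raise ValueError("address to carve out is not inside the block")
    blocks = []
    if ip > lo:
        blocks.append((lo, ip - 1, country))    # everything before the address
    blocks.append((ip, ip, new_country))        # the single carved-out address
    if ip < hi:
        blocks.append((ip + 1, hi, country))    # everything after the address
    return [
        (str(ipaddress.IPv4Address(a)), a, str(ipaddress.IPv4Address(b)), b, c)
        for a, b, c in blocks
    ]


for row in split_block("63.245.128.0", "63.246.14.221", "63.245.220.220", "US", "ZZ"):
    print(row)
```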
Nick, thanks a bunch for figuring out the hard part here. 

(In reply to comment #10)
> So does this bug become 'figure what space of addresses build machines can
> end up with' ? Dave mentioned that the geo-ip database needs updating, so
> we'll have to figure out how to persist the changes we need.

This summary sounds correct to me. I'll work on pushing this through.

Taking this back from Server Ops because it seems like most of the work is in my court, now.
Assignee: justdave → nobody
Component: Server Operations → Release Engineering
QA Contact: mrz → release
It strikes me that one way to make this super easy to maintain is to have the build network resolve download.mozilla.org to an internal IP address. I have no idea if that's feasible or has negative consequences. Dave (or anyone, really), do you know if that's a reasonable option?
cc: Ravi to add the list of hide NATs for mtv1, scl1, sjc1.
Re: Comment 13

MTV1: 63.245.220.220/32
SJC1: 63.245.208.144/32
SCL1: 63.245.222.66/32
(In reply to comment #13)
> cc: Ravi to add the list of hide NATs for mtv1, scl1, sjc1.

(In reply to comment #14)
> Re: Comment 13
> 
> MTV1: 63.245.220.220/32
> SJC1: 63.245.208.144/32
> SCL1: 63.245.222.66/32

Does this imply that all machines inside of the build network will appear to have one of these IP addresses when connecting to download.mozilla.org?
Assignee: nobody → bhearsum
(In reply to comment #15)
> (In reply to comment #13)
> > cc: Ravi to add the list of hide NATs for mtv1, scl1, sjc1.
> 
> (In reply to comment #14)
> > Re: Comment 13
> > 
> > MTV1: 63.245.220.220/32
> > SJC1: 63.245.208.144/32
> > SCL1: 63.245.222.66/32
> 
> Does this imply that all machines inside of the build network will appear to
> have one of these IP addresses when connecting to download.mozilla.org?

Tossing back over the fence to get an answer to this.
Assignee: bhearsum → server-ops
Component: Release Engineering → Server Operations
QA Contact: release → mrz
(In reply to comment #15)
> > MTV1: 63.245.220.220/32
> > SJC1: 63.245.208.144/32
> > SCL1: 63.245.222.66/32
> 
> Does this imply that all machines inside of the build network will appear to
> have one of these IP addresses when connecting to download.mozilla.org?

Yes.
Thanks!
Assignee: server-ops → nobody
Component: Server Operations → Release Engineering
QA Contact: mrz → release
(In reply to comment #10)
> * in the geo-ip settings changed 
>  Ip start addr Ip start      Ip end addr     Ip end      Country 
>  63.245.128.0   1073053696   63.246.14.221   1073090269  United States (US)
> to 
>  63.245.128.0   1073053696   63.245.220.119  1073077367  United States (US)
>  63.245.220.220 1073077468   63.245.220.220  1073077468  Mozilla Land! (ZZ)
>  63.245.220.221 1073077469   63.246.14.221   1073090269  United States (US)
> * verified country 'Mozilla Land!' is in region 'Stage'
> * On the 'Stage' region, set the 'GeoIP Throttle' to 100 (was 0), so all
> requests from this region go to Stage mirrors, which is just dm-download02

I re-did these changes, and am now re-running a final verification builder from 4.0.1 to verify.

> So does this bug become 'figure what space of addresses build machines can
> end up with' ? Dave mentioned that the geo-ip database needs updating, so
> we'll have to figure out how to persist the changes we need.

The space of addresses that needs updating is small, thankfully, as confirmed by zandr/ravi/cshields. Maybe we could set-up a nagios check to ensure that the build network is getting correctly routed? Just on one host or a fake host of some sort, per colo.
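
A Nagios-style check for this could look roughly like the sketch below. It only inspects response headers that have already been fetched (for example with `curl -I`), so the fetching step, the function name `check_redirect`, and the set of allowed hosts are all assumptions for illustration. The exit codes follow the usual Nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
from urllib.parse import urlsplit

OK, CRITICAL, UNKNOWN = 0, 2, 3  # conventional Nagios plugin exit codes


def check_redirect(raw_headers, allowed_hosts):
    """Return (exit_code, message) for the response headers of one Bouncer request."""
    location = None
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "location":
            location = value.strip()
            break
    if location is None:
        return UNKNOWN, "UNKNOWN: no Location header in response"
    host = urlsplit(location).hostname
    if host in allowed_hosts:
        return OK, f"OK: redirected to {host}"
    return CRITICAL, f"CRITICAL: redirected to {host}, expected one of {sorted(allowed_hosts)}"


sample = (
    "Location: http://dm-download02.mozilla.org/pub/mozilla.org/firefox/"
    "releases/4.0.1/mac/en-US/Firefox%204.0.1.dmg\n"
)
print(check_redirect(sample, {"dm-download02.mozilla.org"}))
```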
Component: Release Engineering → Server Operations
Component: Server Operations → Release Engineering
Assignee: nobody → bhearsum
(In reply to comment #19)
> (In reply to comment #10)
> > * in the geo-ip settings changed 
> >  Ip start addr Ip start      Ip end addr     Ip end      Country 
> >  63.245.128.0   1073053696   63.246.14.221   1073090269  United States (US)
> > to 
> >  63.245.128.0   1073053696   63.245.220.119  1073077367  United States (US)
> >  63.245.220.220 1073077468   63.245.220.220  1073077468  Mozilla Land! (ZZ)
> >  63.245.220.221 1073077469   63.246.14.221   1073090269  United States (US)
> > * verified country 'Mozilla Land!' is in region 'Stage'
> > * On the 'Stage' region, set the 'GeoIP Throttle' to 100 (was 0), so all
> > requests from this region go to Stage mirrors, which is just dm-download02
> 
> I re-did these changes, and am now re-running a final verification builder
> from 4.0.1 to verify.

The final verification I ran looked good, so I added the other two external IPs to Mozilla Land!. And as it turns out, the one-IP-only blocks seem to override the ranged ones (either that, or newer entries override older ones), so I'm a bit less concerned about getting busted here. Regardless, I think it's good to have Nagios checks for this, I'll be filing a bug on that shortly.

One strange thing I did notice is that when retrieving Firefox products (eg, http://download.mozilla.org/?product=firefox-4.0.1&os=osx&lang=en-US) the redirect always sent me to dm-download02, but when retrieving the nagios test product (http://download.mozilla.org/?product=nagios-test-product&os=none), I got sent to random mirrors. Not sure why this is, but not going to block on it, since Firefox products have been working correctly for the past 12 hours.
This continues to work as expected, resolving.

(In reply to comment #20)
> One strange thing I did notice is that when retrieving Firefox products (eg,
> http://download.mozilla.org/?product=firefox-4.0.1&os=osx&lang=en-US) the
> redirect always sent me to dm-download02, but when retrieving the nagios
> test product
> (http://download.mozilla.org/?product=nagios-test-product&os=none), I got
> sent to random mirrors. Not sure why this is, but not going to block on it,
> since Firefox products have been working correctly for the past 12 hours.

I'm going to guess that this is because Firefox is monitored by Sentry, and nagios-test-product isn't. Regardless, doesn't affect resolution of this bug.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
This doesn't seem to be working fully. For some reason, all of the Windows requests are getting pointed at 3crowd:
curl -I "http://download.mozilla.org/?product=firefox-5.0b3-partial-5.0b2&os=win&lang=af&force=1"
HTTP/1.1 302 Found
Server: Apache
X-Backend-Server: pp-app-dist06
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0, private
Content-Type: text/html; charset=UTF-8
Date: Wed, 01 Jun 2011 16:51:26 GMT
Location: http://mozilla-crowdcache.3crowd.com/mozilla/firefox/releases/5.0b3/update/win32/af/firefox-5.0b2-5.0b3.partial.mar
Pragma: no-cache
Transfer-Encoding: chunked
Connection: Keep-Alive
Set-Cookie: dmo=10.8.84.200.1306947086368240; path=/; expires=Thu, 31-May-12 16:51:26 GMT
X-Powered-By: PHP/5.1.6


While all of the other ones are getting pointed at dm-download02:
curl -I "http://download.mozilla.org/?product=firefox-5.0b3-partial-5.0b2&os=linux64&lang=af&force=1"
HTTP/1.1 302 Found
Server: Apache
X-Backend-Server: pp-app-dist02
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0, private
Content-Type: text/html; charset=UTF-8
Date: Wed, 01 Jun 2011 16:51:36 GMT
Location: http://dm-download02.mozilla.org/pub/mozilla.org/firefox/releases/5.0b3/update/linux-x86_64/af/firefox-5.0b2-5.0b3.partial.mar
Pragma: no-cache
Transfer-Encoding: chunked
Connection: Keep-Alive
Set-Cookie: dmo=10.8.84.200.1306947096991603; path=/; expires=Thu, 31-May-12 16:51:36 GMT
X-Powered-By: PHP/5.1.6


We've got plenty of uptake, and dm-download02 appears to have the Windows files, so I'm not sure what's going on here.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Not sure if it's relevant or not, but I did notice that if lang and os are omitted from the download.m.o query string, we default to win32/en-US. Eg:
curl -I "http://download.mozilla.org/?product=firefox-5.0b3-partial-5.0b2"
HTTP/1.1 302 Found
Server: Apache
X-Backend-Server: pp-app-dist04
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0, private
Content-Type: text/html; charset=UTF-8
Date: Wed, 01 Jun 2011 17:04:10 GMT
Location: http://mozilla-crowdcache.3crowd.com/mozilla/firefox/releases/5.0b3/update/win32/en-US/firefox-5.0b2-5.0b3.partial.mar
Pragma: no-cache
Transfer-Encoding: chunked
Connection: Keep-Alive
Set-Cookie: dmo=10.8.84.200.1306947850717037; path=/; expires=Thu, 31-May-12 17:04:10 GMT
X-Powered-By: PHP/5.1.6
Interestingly, all the win32 requests are now going to dm-download02:
curl -I "http://download.mozilla.org/?product=firefox-5.0b3-partial-5.0b2&os=win&lang=af&force=1"
HTTP/1.1 302 Found
Server: Apache
X-Backend-Server: pp-app-dist08
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0, private
Content-Type: text/html; charset=UTF-8
Date: Wed, 01 Jun 2011 19:29:25 GMT
Location: http://dm-download02.mozilla.org/pub/mozilla.org/firefox/releases/5.0b3/update/win32/af/firefox-5.0b2-5.0b3.partial.mar
Pragma: no-cache
Transfer-Encoding: chunked
Connection: Keep-Alive
Set-Cookie: dmo=10.8.84.200.1306956565400815; path=/; expires=Thu, 31-May-12 19:29:25 GMT
X-Powered-By: PHP/5.1.6


The only change between my previous comment and now is that we've got slightly higher uptake (4000 vs 2500) on Windows. I think Linux and Mac were both at around 4000 when I re-opened this, too, which makes me wonder if Bouncer isn't obeying the '100' throttle that's set for the Stage region when uptake is low.
No longer blocks: 646046
Depends on: 663358
I hit this again, when we had tons of uptake:
Running on mv-moz2-linux-ix-slave13.build.mozilla.org:
Using config file update.cfg
Using  https://aus2.mozilla.org/update/1/Firefox/3.6.17/20110420140830/WINNT_x86-msvc/af/releasetest/update.xml?force=1
Calling <function run_with_timeout at 0xb7c9302c> with args: (['wget', '--no-check-certificate', '-q', '-O', 'update.xml', 'https://aus2.mozilla.org/update/1/Firefox/3.6.17/20110420140830/WINNT_x86-msvc/af/releasetest/update.xml?force=1'], 300, None, None, False, True), kwargs: {}, attempt #1
Executing: ['wget', '--no-check-certificate', '-q', '-O', 'update.xml', 'https://aus2.mozilla.org/update/1/Firefox/3.6.17/20110420140830/WINNT_x86-msvc/af/releasetest/update.xml?force=1']
Process stdio:

Process stderr:

Testing http://download.mozilla.org/?product=firefox-3.6.18-partial-3.6.17&os=win&lang=af&force=1
HTTP/1.1 302 Found
Server: Apache
X-Backend-Server: pm-app-dist05
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0, private
Content-Type: text/html; charset=UTF-8
Date: Tue, 21 Jun 2011 02:51:22 GMT
Location: http://mozilla.ftp.halifax.rwth-aachen.de/mozilla/firefox/releases/3.6.18/update/win32/af/firefox-3.6.17-3.6.18.partial.mar
Pragma: no-cache
Transfer-Encoding: chunked
Connection: Keep-Alive
Set-Cookie: dmo=10.2.84.100.1308624682463410; path=/; expires=Wed, 20-Jun-12 02:51:22 GMT
X-Powered-By: PHP/5.1.6

HTTP/1.1 200 OK
Server: nginx/1.0.0
Date: Tue, 21 Jun 2011 02:51:22 GMT
Content-Type: application/octet-stream
Content-Length: 2193254
Last-Modified: Wed, 15 Jun 2011 09:42:42 GMT
Connection: close
Accept-Ranges: bytes


Uptake looked like this:
Product 	OS 	Available 	Total
Firefox-3.6.18 	linux 	79826 	222115
Firefox-3.6.18 	osx 	71869 	222115
Firefox-3.6.18 	win 	51807 	222115
Firefox-3.6.18 	opensolaris-i386 	81179 	222115
Firefox-3.6.18 	opensolaris-sparc 	81179 	222115
Firefox-3.6.18 	solaris-i386 	81134 	222115
Firefox-3.6.18 	solaris-sparc 	81134 	222115
Could you clarify what the error is here? The 302 is to mozilla.ftp.halifax.rwth-aachen.de rather than 3crowd or dm-download02.
Based on the IP block/region modifications I did, I'm expecting all requests from within the build network to hit dm-download02, since it's the only member of the "stage" group.

Maybe I'm misunderstanding how this works, though?
More likely I spaced on reading the bug summary.
According to justdave, this is expected behaviour when a Region is overloaded:
[11:15] <bhearsum> justdave: will Bouncer redirect people to outside their assigned Region when that particular region is overloaded? i'm asking in the context of https://bugzilla.mozilla.org/show_bug.cgi?id=646076 only working intermittently
[11:15] <justdave> it should, yes.
Hmm, how is 'overloaded' defined ?
I sent mail to relops and justdave about this bug, to try and figure out a solution. (I didn't think it was worthwhile to have a bunch of back and forth in here). Once we figure out what to do, I'll update this bug with that information.
Catlee, justdave, and I chatted about this today and I think we've got a path forward. Justdave said that the one mirror currently in the Stage region is backed by an NFS share that's subject to other load, which is probably why it's getting overloaded from time to time.

(In reply to comment #30)
> Hmm, how is 'overloaded' defined ?

According to justdave, "does not respond within 5 seconds".

Therefore, if we have a mirror that's _not_ on that NFS share or an otherwise poorly performing machine, we should be able to avoid redirection. Additionally, bug 613620 (which has just been picked up by Rik from webdev) should allow us to set-up Bouncer to serve 503s instead of redirects to external mirrors if we _do_ become overloaded, which are easy to detect and retry on.
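
If Bouncer does end up serving 503s as described, the "detect and retry" part on the test side might look something like this sketch. The `fetch` callable, the attempt count, and the delay are all placeholders chosen for illustration; nothing here reflects the actual releasetest scripts.

```python
import time


def fetch_with_retry(fetch, url, attempts=5, delay=10.0, sleep=time.sleep):
    """Call fetch(url) until it returns something other than 503.

    `fetch` is a caller-supplied function returning (status_code, location).
    Raises RuntimeError if every attempt returns 503.
    """
    for attempt in range(1, attempts + 1):
        status, location = fetch(url)
        if status != 503:
            return status, location
        if attempt < attempts:
            # Give the overloaded mirror some time before asking again.
            sleep(delay)
    raise RuntimeError(f"{url} still returned 503 after {attempts} attempts")
```

Passing `fetch` and `sleep` in as arguments keeps the retry logic testable without touching the network.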

So, I'm going to look at setting up a mirror inside of the Build network, and getting that tracking mozilla-prereleases & in Bouncer.
Depends on: 670978
Depends on: 671075
releng-mirror01 is all set-up now, and I think I've got Bouncer configured correctly to point at it. I more or less started from scratch, here's what I did:
* Added a new region:
 Name: Build Network
 Priority: 5
 GeoIP Throttle: 100
 Mirrors: releng-mirror01.build.scl1.mozilla.com
* Added a new country:
 Code: ZZ
 Region: Build Network
 Country Name: Release Engineering
 Continent: NA
* Added a new mirror:
 Name: releng-mirror01.build.scl1.mozilla.com
 Base URL: http://releng-mirror01.build.scl1.mozilla.com/mozilla-prereleases
 Rating: 1
 Active: Yes
 Regions: Build Network

After waiting 5 or 10 minutes (to allow Bouncer/Sentry to catch up, I guess), things seem to be working, with machines within the build network getting redirected to releng-mirror01 100% of the time.

Dave or Nick, does the above look sane to you?
It looks like releng-mirror01, in its current form, may not even be able to keep up with the minuscule load that Sentry causes. Since I made the changes to Bouncer, Sentry has been marking it red every once in a while with:
Checking mirror releng-mirror01.build.scl1.mozilla.com ...
http://releng-mirror01.build.scl1.mozilla.com/mozilla-prereleases sent no response after 5 seconds!  Checking recent history...
A couple of things come to mind here. Just thinking out loud...

Did these errors occur while the disk was full? I saw it fill up again recently.

This might be related to the link into scl1 being overloaded, so sentry thinks there's a problem.
(In reply to comment #33)
Looks fine to me, but could you add the IP details for ZZ for completeness ? Is that just comment #14 ? Would be great to have the full path for IP -> Country -> Region -> Mirror documented.

(In reply to comment #35)
> This might be related to the link into scl1 being overloaded, so sentry
> thinks there's a problem.

I think it's likely this is the problem. FYI, sentry has a 5 second timeout for a response.
(In reply to comment #35)
> A couple of things come to mind here. Just thinking out loud...
> 
> Did these errors occur while the disk was full? I saw it fill up again
> recently.

Nope, I cleared that up prior to making the Bouncer changes.

(In reply to comment #36)
> (In reply to comment #35)
> > This might be related to the link into scl1 being overloaded, so sentry
> > thinks there's a problem.
> 
> I think it's likely this is the problem. FYI, sentry has a 5 second timeout
> for a response.

Ah! Assuming this is true, things should look better once the P2P link is enabled?
(In reply to comment #33)
> releng-mirror01 is all set-up now, and I think I've got Bouncer configured
> correctly to point at it. I more or less started from scratch, here's what I
> did:
> * Added a new region:
>  Name: Build Network
>  Priority: 5
>  GeoIP Throttle: 100
>  Mirrors: releng-mirror01.build.scl1.mozilla.com
> * Added a new country:
>  Code: ZZ
>  Region: Build Network
>  Country Name: Release Engineering
>  Continent: NA
> * Added a new mirror:
>  Name: releng-mirror01.build.scl1.mozilla.com
>  Base URL: http://releng-mirror01.build.scl1.mozilla.com/mozilla-prereleases
>  Rating: 1
>  Active: Yes
>  Regions: Build Network

Per your request, Nick, here's the IP Block configuration:
MTV1:
* IP Start: 1073077468 (63.245.220.220)
* IP End: 1073077468 (63.245.220.220)
* Country: ZZ
SJC1:
* IP Start: 1073074320 (63.245.208.144)
* IP End: 1073074320 (63.245.208.144)
* Country: ZZ
SCL1:
* IP Start: 1073077826 (63.245.222.66)
* IP End: 1073077826 (63.245.222.66)
* Country: ZZ
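
To spell out the full IP -> Country -> Region -> Mirror path, here is a small Python sketch built only from the settings quoted above and the block list. It is a toy model of the lookup, not Bouncer's schema or code; the dictionary names and `mirrors_for` are invented, and addresses outside the three ZZ blocks simply return None because no other blocks are modelled.

```python
import ipaddress

# (ip_start, ip_end, country_code), taken from the block list above.
IP_BLOCKS = [
    (1073077468, 1073077468, "ZZ"),  # MTV1, 63.245.220.220
    (1073074320, 1073074320, "ZZ"),  # SJC1, 63.245.208.144
    (1073077826, 1073077826, "ZZ"),  # SCL1, 63.245.222.66
]
COUNTRY_TO_REGION = {"ZZ": "Build Network"}
REGION_TO_MIRRORS = {"Build Network": ["releng-mirror01.build.scl1.mozilla.com"]}


def mirrors_for(addr):
    """Resolve an external IP to its candidate mirrors, or None if no block matches."""
    value = int(ipaddress.IPv4Address(addr))
    for start, end, country in IP_BLOCKS:
        if start <= value <= end:
            region = COUNTRY_TO_REGION[country]
            return REGION_TO_MIRRORS[region]
    return None


print(mirrors_for("63.245.222.66"))
```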
Let's see how things look after the P2P link is enabled.
Final verification for 6.0b5 hit releng-mirror01 for all of its tests, and passed. So, once it's more stable (hopefully after the P2P link is up) we should be all done here.
bug 677183 talks about the Bouncer changes I made affecting more than just the build network. I had assumed that the hide NATs were specific to the build network (though, now that I read back I don't see anything supporting that assumption). Can someone from IT confirm whether those IPs are for the whole colo, just the build network, or some other subset?
Assignee: bhearsum → nobody
Assignee: nobody → bhearsum
(In reply to Ben Hearsum [:bhearsum] from comment #41)
> bug 677183 talks about the Bouncer changes I made affecting more than just
> the build network. I had assumed that the hide NATs were specific to the
> build network (though, now that I read back I don't see anything supporting
> that assumption). Can someone from IT confirm whether those IPs are for the
> whole colo, just the build network, or some other subset?

Throwing over the fence to get this answered.
Assignee: bhearsum → server-ops-releng
Component: Release Engineering → Server Operations: RelEng
QA Contact: release → zandr
Can someone from netops answer this?  In particular, what set of internal hosts would appear at the IPs in comment 13, and would that include seamonkey's hosts?  mtv1 desktop systems?  Thanks!
Assignee: server-ops-releng → network-operations
Component: Server Operations: RelEng → Server Operations: Netops
QA Contact: zandr → mrz
And, secondarily, is it straightforward to assign the build vlan its own hide NATs so that bouncer can recognize its request sources as distinct from other Mozilla systems?
Re: Comment 43

This was answered in comment 14

Re: Comment 44

Yes.
Re: Comment 44

Just to clarify, the build networks in all three locations are already configured with dedicated hide NATs. The IP addresses provided in Comment 14 are used by build networks only.
(In reply to Derek Moore from comment #46)
> Re: Comment 44
> 
> Just to clarify, the build networks in all three locations are already
> configured with dedicated hide NATs. The IP addresses provided in Comment 14
> are used by build networks only.

Are these completely exclusive to the RelEng Build Network, or is it shared with the Community Build Network? bug 677183 hints that it is.
(In reply to Derek Moore from comment #46)
> Re: Comment 44
> 
> Just to clarify, the build networks in all three locations are already
> configured with dedicated hide NATs. The IP addresses provided in Comment 14
> are used by build networks only.

Could we get a list of subnets associated with each NAT? I think we may need to adjust that list.
Re: Comment 47

The community build network (63.245.210.0/26) is not behind a hide NAT. Each machine is individually addressable.
Re: Comment 48

63.245.208.144 contains:
    10.2.71.0/24 (sjc1 vlan 71)
    10.2.90.0/23 (sjc1 vlan 90)

63.245.220.220 contains:
    10.250.48.0/22 (mtv1 vlan 500)

63.245.222.66 contains
    10.12.40.0/22 (scl1 vlan 40)
    10.12.47.0/24 (scl1 vlan 47)
    10.12.48.0/22 (scl1 vlan 48)
    10.12.75.0/24 (scl1 vlan 75)
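
For reference, the mapping above can be turned into a quick lookup that tells you which hide NAT address an internal host should appear as. This is a sketch using Python's standard `ipaddress` module; `HIDE_NATS` and `external_ip_for` are names made up here, and the data is copied from the list above.

```python
import ipaddress

# Hide NAT address -> internal subnets behind it, as listed above.
HIDE_NATS = {
    "63.245.208.144": ["10.2.71.0/24", "10.2.90.0/23"],
    "63.245.220.220": ["10.250.48.0/22"],
    "63.245.222.66": ["10.12.40.0/22", "10.12.47.0/24", "10.12.48.0/22", "10.12.75.0/24"],
}


def external_ip_for(internal_addr):
    """Return the hide NAT address an internal host would appear as, or None."""
    addr = ipaddress.ip_address(internal_addr)
    for nat, subnets in HIDE_NATS.items():
        if any(addr in ipaddress.ip_network(net) for net in subnets):
            return nat
    return None


print(external_ip_for("10.2.71.18"))  # 63.245.208.144
```

The example address is the test machine from comment 1, whose external address was reported there as 63.245.208.144.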
Depends on: 677183
https://bugzilla.mozilla.org/show_bug.cgi?id=677183#c3 suggests that bouncer is throwing traffic from outside "Mozilla Land!" to this mirror as well. Is there a way to prevent that?

If that's the case, we might be doing this to users as well.
(In reply to Zandr Milewski [:zandr] from comment #51)
> https://bugzilla.mozilla.org/show_bug.cgi?id=677183#c3 suggests that bouncer
> is throwing traffic from outside "Mozilla Land!" to this mirror as well. Is
> there a way to prevent that?

I don't understand Bouncer well enough to tell you for sure, but based on my understanding it seems unlikely.

> If that's the case, we might be doing this to users as well.

Yes, indeed. I'm going to dig a bit for obvious answers, and revert if that yields nothing.

Is it possible to find out from Bouncer logs who has been getting redirected to releng-mirror01?
Component: Server Operations: Netops → Server Operations: RelEng
Didn't mean to change the component.
Component: Server Operations: RelEng → Server Operations: Netops
And so we are. (breaking users)

We need to turn this off and find another solution. I don't want to create a public mirror in scl1, which would be the other fix to this problem.
Backing out those changes right away.
(In reply to Ben Hearsum [:bhearsum] from comment #55)
> Backing out those changes right away.


> MTV1:
> * IP Start: 1073077468 (63.245.220.220)
> * IP End: 1073077468 (63.245.220.220)
> * Country: ZZ
> SJC1:
> * IP Start: 1073074320 (63.245.208.144)
> * IP End: 1073074320 (63.245.208.144)
> * Country: ZZ
> SCL1:
> * IP Start: 1073077826 (63.245.222.66)
> * IP End: 1073077826 (63.245.222.66)
> * Country: ZZ

I switched these blocks back to US.

Based on what we know now here (that actual users have been redirected to this mirror) and bug 677183, I'm starting to suspect that for some Betas Sentry has decided that all of the other internal mirrors (pv-mirror01, dm-download02, etc.) are overloaded, and that it should send people to releng-mirror01 instead. Basically, the inverse of comment #34.
(In reply to Ben Hearsum [:bhearsum] from comment #56)
> (In reply to Ben Hearsum [:bhearsum] from comment #55)
> > Backing out those changes right away.
> 
> 
> > MTV1:
> > * IP Start: 1073077468 (63.245.220.220)
> > * IP End: 1073077468 (63.245.220.220)
> > * Country: ZZ
> > SJC1:
> > * IP Start: 1073074320 (63.245.208.144)
> > * IP End: 1073074320 (63.245.208.144)
> > * Country: ZZ
> > SCL1:
> > * IP Start: 1073077826 (63.245.222.66)
> > * IP End: 1073077826 (63.245.222.66)
> > * Country: ZZ
> 
> I switched these blocks back to US.

...and marked releng-mirror01 as "inactive".
To clarify from the discussion I just had with Zandr on IRC, bouncer's geoip has a throttle percent on each region.  The throttle only affects the ORIGIN of the traffic, not the destination.  The throttled percent of traffic ORIGINATING in a region will be sent to mirrors within that region, so a region set to 100% should always keep all of the traffic originating in that region going to mirrors that are allocated to that region.  A region set to 50% (like North America, because we don't have enough mirror capacity for the traffic within North America) will serve 50% of the traffic originating in that region to mirrors allocated to that region, and the remaining 50% will get evenly spread across the entire global pool of mirrors (which would include your internal mirror).

Bug 613620 would partially fix this, but would only work if every single region we have explicitly gets a backup region fixed.  To truly fix this, we'd need to have a destination throttle in addition to the origin throttle, or somesuch.
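
The origin-throttle behaviour described above can be modelled with a short Python simulation. This is a simplified sketch of the explanation in this comment, not Bouncer's real selection code: it ignores mirror ratings, Sentry health, and product availability, and the North America mirror names are hypothetical.

```python
import random


def pick_mirror(origin_region, region_mirrors, throttle, rng):
    """Pick a mirror for one request under the origin-throttle model.

    region_mirrors: region name -> list of mirrors allocated to that region
    throttle: region name -> percentage (0-100) of that region's traffic kept in-region
    """
    local = region_mirrors.get(origin_region, [])
    if local and rng.random() * 100 < throttle.get(origin_region, 0):
        return rng.choice(local)
    # Everything else is spread over the whole global pool, internal mirror included.
    global_pool = [m for mirrors in region_mirrors.values() for m in mirrors]
    return rng.choice(global_pool)


regions = {
    "Build Network": ["releng-mirror01"],
    "North America": ["na-mirror-a", "na-mirror-b"],  # hypothetical mirror names
}
throttle = {"Build Network": 100, "North America": 50}
rng = random.Random(0)
picks = [pick_mirror("North America", regions, throttle, rng) for _ in range(1000)]
print(picks.count("releng-mirror01"), "of 1000 North America requests hit the internal mirror")
```

In this model the build network always stays on its own mirror, but a share of the unthrottled public traffic still lands on releng-mirror01, which is the leak seen in bug 677183.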
What would it take to run a private instance of bouncer?
(In reply to Zandr Milewski [:zandr] from comment #59)
> What would it take to run a private instance of bouncer?

This isn't a good option IMHO, for two reasons:
1) releasetest channel snippets would have to differ from the release channel ones -- which gives us less confidence that the release channel snippets are correct.
2) We wouldn't be testing the production Bouncer instance at all - which is a huge blind spot.

As a short term way to get bug 617414 in motion again this might be OK, but I wouldn't be comfortable doing this for an extended period of time.
Blocks: 498425
per meeting with IT yesterday:

Another option we discussed was simply to run these tests on a machine that was not locked down, i.e. outside the build network. Would that pose issues for reporting results?
I'm not super keen on having these run outside of our main pool, but I could live with it until Bouncer lets us do the original plan.

Were you thinking of a Build Slave that is located outside of the build network, or something else? If the former, we'd have to poke a hole in the firewall.
That violates the goal of this process -- if there's a machine in the build network that has access to the outside world, then getting stuff out of the build network just involves getting that stuff to the machine with access first.
(In reply to Dustin J. Mitchell [:dustin] from comment #63)
> That violates the goal of this process -- if there's a machine in the build
> network that has access to the outside world, then getting stuff out of the
> build network just involves getting that stuff to the machine with access
> first.

In *my* mind, the idea is that we'd have a Build Slave that exists outside of the build network attached to a master inside of it. It's not great, but it's not subject to what you describe above.
If we go that route, we also need to run mozmill update testing (nightlies+releases) on the slave(s) outside the build network as well.
(In reply to Aki Sasaki [:aki] from comment #65)
> If we go that route, we also need to run mozmill update testing
> (nightlies+releases) on the slave(s) outside the build network as well.

Nightly stuff should be OK because it only needs FTP, not Bouncer/mirrors. releasetest testing will hit this though, ugh :(.

Maybe it's worthwhile waiting for bug 613620. I just chatted with Rik, and he told me he's hoping to have it done by the end of the quarter.
(In reply to Ben Hearsum [:bhearsum] from comment #64)
> (In reply to Dustin J. Mitchell [:dustin] from comment #63)
> > That violates the goal of this process -- if there's a machine in the build
> > network that has access to the outside world, then getting stuff out of the
> > build network just involves getting that stuff to the machine with access
> > first.
> 
> In *my* mind, the idea is that we'd have a Build Slave that exists outside
> of the build network attached to a master inside of it. It's not great, but
> it's not subject to what you describe above.

Thus creating a route in through the firewalls. No different, IMO.

I think we're just going to have to wait on bug 613620 for this one. (the end of Q3 isn't that far away)
Depends on: 613620
OS: Linux → All
Hardware: x86_64 → All
Where does this bug stand?  It's owned by netops but I'm not clear what netops is to do.
Ravi- It's blocked by 613620. We can't do this without affecting normal users until that's fixed.
Depends on: 741774
No longer depends on: 741774
Depends on: 741774
No longer depends on: 741774
No longer blocks: 741774
Depends on: 741774
Depends on: 746624
I think this bug, which is about setting up the internal-only mirror, is a RelEng one. Actually turning off access to machines should be tracked in another bug blocking 617414.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 12 years ago
Resolution: --- → FIXED
Augh, Bugzilla! This is not fixed yet.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I can't follow this bug.  What's the Netops action?
So in comment 69 Zandr said we were blocked on bug 613620 which is now closed.  So I'll echo mrz and myself from comment 68 and ask what is left to do.

If there are action items can someone restate them?
(In reply to Ravi Pina [:ravi] from comment #73)
> So in comment 69 Zandr said we were blocked on bug 613620 which is now
> closed.  So I'll echo mrz and myself from comment 68 and ask what is left to
> do.
> 
> If there are action items can someone restate them?

bhearsum is driving this, and I'm sure he'll comment when he returns from vacation on Monday.
With bug 613620 fixed, I think this part needn't involve IT anymore. RelEng all has access to make adjustments to Bouncer mirrors, and I'm planning on doing so soon.
Assignee: network-operations → nobody
Component: Server Operations: Netops → Release Engineering: Automation (General)
QA Contact: mrz → catlee
Assignee: nobody → bhearsum
Do we have updates here? QA is still waiting on being able to use the internal mirrors for the Mozmill update tests. Do we have an ETA for when this bug can finally be resolved?
The required Bouncer code has landed in staging, bug 613620. Right now we're waiting for a fully functioning staging Bouncer set-up (bug 750798) so we can test the new code. Once that's done we can push it to production and set-up production RelEng and QA networks to be restricted to the internal mirrors.
Depends on: 750798
Assignee: bhearsum → nobody
Priority: -- → P3
Whiteboard: [bouncer][network]
Blocks: 884843
Product: mozilla.org → Release Engineering
I don't think we'll be doing this anymore - we've poked holes in the firewall instead.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → WONTFIX
I think WONTFIXing causes issues for bug 617414 and bug 498425 (which was WONTFIXed in favor of what seems like a dup bug 813629)
(In reply to Aki Sasaki [:aki] from comment #79)
> I think WONTFIXing causes issues for bug 617414 and bug 498425 (which was
> WONTFIXed in favor of what seems like a dup bug 813629)

I don't think this is an issue for bug 617414 anymore. The only reason we needed this (AFAIK) was because we previously weren't willing to poke holes in the firewall to allow machines to access real mirrors. I'm pretty sure we're fine with/intend to do that now.

I don't think this affects bugs 498425 or 813629 because we've dropped the concept of "internal-only mirror". All files go straight to the CDN as soon as they hit the "releases" directory.
Component: General Automation → General