[tracking bug] Migrate Balrog to AWS/CloudOps

RESOLVED FIXED

Status

Release Engineering
Balrog: Backend
RESOLVED FIXED
a year ago
10 months ago

People

(Reporter: mostlygeek, Assigned: mostlygeek)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Assignee)

Updated

a year ago
Depends on: 1248747
(Assignee)

Updated

a year ago
Depends on: 1248748
(Assignee)

Updated

a year ago
Depends on: 1248751
(Assignee)

Updated

a year ago
Depends on: 1248755
(Assignee)

Updated

a year ago
Depends on: 1248756
(Assignee)

Updated

a year ago
Depends on: 1248759
(Assignee)

Updated

a year ago
Depends on: 1248760
(Assignee)

Updated

a year ago
Depends on: 1248762
(Assignee)

Updated

a year ago
Assignee: nobody → bwong
Depends on: 1251335
Depends on: 1251338
Blocks: 1127875
(Assignee)

Updated

a year ago
Depends on: 1264801
Depends on: 1266392
(Assignee)

Updated

a year ago
Depends on: 1275722
Depends on: 1275911
(Assignee)

Updated

a year ago
Depends on: 1277611
Depends on: 1281622
(Assignee)

Updated

a year ago
Depends on: 1281638
(Assignee)

Updated

a year ago
Depends on: 1281906
Depends on: 1283492

Comment 1

a year ago
For lack of a better place, here's my analysis of the Zeus configs from bug 1281622, organized by domain:
* aus.mozillamessaging.com - dead, nothing to carry forward
* aus2.mozillamessaging.com - dead, nothing to carry forward
* aus4-dev.allizom.org - dying after we transition, nothing to carry forward
* aus4-admin-dev.allizom.org - dying after transition, nothing to carry forward
* aus5-dev.allizom.org - dying after we transition, nothing to carry forward

* aus2.mozilla.org
** DNS currently controlled by ns1/2.mozilla.org
** https 301's to aus4.mozilla.org
** Has request rules that:
*** Sets X-Forwarded-For to the client's IP address
*** Works around an old Apache DoS vulnerability

* aus3.mozilla.org
** DNS controlled by CloudOps
** Backended by the aus4 pool
** Has request rules that:
*** Sets X-Forwarded-For to the client's IP address
*** Works around an old Apache DoS vulnerability
*** Strips the Cache-Control header
*** References to some old, dead rules (WebOps tells me they don't exist anymore, so even though they're included in the list they don't apply)
** Has a response rule that overrides the Cache-Control header set by the server, and sets it to: "max-age=120"
** ...however, this is overridden by another instruction that sets it to "no-store, must-revalidate, post-check=0, pre-check=0, private"

* aus4.mozilla.org
** Same as aus3.mozilla.org, except no references to the dead request rules.

* aus5.mozilla.org
** Same as aus3.mozilla.org, except no references to dead request rules nor the one that sets X-Forwarded-For.

* aus4-admin.mozilla.org
** Nothing special here.

Also noticed that aus.mozilla.org forwards to mozilla.org now. That was only used for 1.5 IIRC, so it's probably not worth fixing.


So, two potentially actionable things coming out of this:
1) We need to figure out what we actually want to set Cache-Control to in the responses, and do that in the app.
2) Move aus2 DNS to CloudOps, if necessary.
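
For item 1, here is a minimal sketch of setting the header at the application layer, assuming the app is served through WSGI. The class name CacheControlMiddleware and the placeholder value "max-age=120" are hypothetical; the real value is exactly what still needs to be decided.

```python
# Hypothetical sketch, not Balrog's actual code.
# The value below is a placeholder; the right Cache-Control value is undecided.
CACHE_CONTROL = "max-age=120"


class CacheControlMiddleware:
    """WSGI middleware that puts exactly one Cache-Control header on every response."""

    def __init__(self, app, value=CACHE_CONTROL):
        self.app = app
        self.value = value

    def __call__(self, environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Drop any Cache-Control the wrapped app set, then add ours,
            # so there is a single source of truth instead of competing rules.
            headers = [(k, v) for k, v in headers if k.lower() != "cache-control"]
            headers.append(("Cache-Control", self.value))
            return start_response(status, headers, exc_info)

        return self.app(environ, patched_start_response)
```

Wrapping the app this way would replace both load balancer response rules with one setting that lives next to the code.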

What do you think about #2, Ben?
Flags: needinfo?(bwong)
Depends on: 1111032
(Assignee)

Comment 2

a year ago
Since AUS2 doesn't have cert pinning, after the migration I was going to get an SSL certificate for aus5.mozilla.org with an ALTNAME for aus2.mozilla.org. Then I'll just point aus2/aus5 at the same cluster.
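
A rough sketch of how the shared certificate could be sanity-checked before pointing both names at one cluster. It assumes a certificate dict in the shape returned by Python's ssl.SSLSocket.getpeercert(); the helper name names_missing_from_cert is hypothetical, and wildcard entries are deliberately not handled.

```python
def names_missing_from_cert(cert, hostnames):
    """Return the hostnames that are NOT listed as DNS subjectAltName entries.

    `cert` is assumed to look like the dict from ssl.SSLSocket.getpeercert(),
    e.g. {"subjectAltName": (("DNS", "aus5.mozilla.org"), ...)}.
    Matching is exact and case-insensitive; wildcards are not expanded.
    """
    san = {
        value.lower()
        for kind, value in cert.get("subjectAltName", ())
        if kind == "DNS"
    }
    return [name for name in hostnames if name.lower() not in san]
```

An empty result for ["aus2.mozilla.org", "aus5.mozilla.org"] would mean the one certificate covers both names.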
Flags: needinfo?(bwong)
Depends on: 1285969
(Assignee)

Updated

a year ago
Depends on: 1286333
Depends on: 1286824

Comment 3

11 months ago
(In reply to Benson Wong [:mostlygeek] from comment #2)
> Since AUS2 doesn't have cert pinning, after the migration I was going to get
> an SSL certificate for aus5.mozilla.org with an ALTNAME for
> aus2.mozilla.org. Then I'll just point aus2/aus5 at the same cluster.

Looks like aus2 hasn't been repointed yet. w0ts0n found this when he started decommissioning the old infra:
host -i aus2.mozilla.org
aus2.mozilla.org is an alias for aus2.external.zlb.scl3.mozilla.com.
aus2.external.zlb.scl3.mozilla.com has address 63.245.213.19
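
If this check needs repeating during decom, the alias line can be pulled out of the `host` output with a few lines of Python. This is only a sketch: the function name alias_target is made up, and it assumes the "is an alias for" wording shown above.

```python
import re


def alias_target(host_output, name):
    """Return the CNAME target reported for `name` in `host` output, or None."""
    pattern = re.compile(
        r"^%s is an alias for (\S+?)\.?$" % re.escape(name),
        re.MULTILINE,
    )
    match = pattern.search(host_output)
    return match.group(1) if match else None


# Sample taken from the output above.
SAMPLE = (
    "aus2.mozilla.org is an alias for aus2.external.zlb.scl3.mozilla.com.\n"
    "aus2.external.zlb.scl3.mozilla.com has address 63.245.213.19\n"
)

# True while the name still points at the old ZLB.
still_on_zlb = (alias_target(SAMPLE, "aus2.mozilla.org") or "").endswith(
    ".zlb.scl3.mozilla.com"
)
```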

Is this still in the works?
Flags: needinfo?(bwong)
(Assignee)

Comment 4

11 months ago
Is it worth it? 

I think we should just drop it. We would be adding new infrastructure for FF 2.0/3.0 clients. That's almost 10 years ago. I'm thinking if they haven't updated by now they're not going to do it.
Flags: needinfo?(bwong)

Comment 5

11 months ago
(In reply to Benson Wong [:mostlygeek] from comment #4)
> Is it worth it? 
> 
> I think we should just drop it. We would be adding new infrastructure for FF
> 2.0/3.0 clients. That's almost 10 years ago. I'm thinking if they haven't
> updated by now they're not going to do it.

I'm having trouble arguing with this logic. Nick, do you have strong feelings here?
Flags: needinfo?(nthomas)

Comment 6

10 months ago
It has been a long time, and we've been serving no updates with Balrog even though we thought we had switched over. However, it would block us completing bug 998721, and I'm not sure it's our call.
Flags: needinfo?(nthomas)

Comment 7

10 months ago
I'm not exactly sure who should be making this call. Robert, Sylvestre - how do you feel about no longer maintaining aus2.mozilla.org? It would mean we can't serve updates to Firefox 2.0-3.0.
Flags: needinfo?(sledru)
Flags: needinfo?(robert.strong.bugs)

Comment 8

10 months ago
What is the volume of these versions?

Anyway, I guess that if they are still using these versions, they are probably stuck forever.
Flags: needinfo?(sledru)

Comment 9

10 months ago
I'd like it if bug 998721 was fixed and that update offered to these users before it was removed.
Flags: needinfo?(robert.strong.bugs)

Comment 10

10 months ago
OK, so if we're to support aus2, we need to use a cert from a CA that was included in Firefox 2.0 & 3.0. DigiCert is NOT in that list, so that's not an option. Notable CAs in the list include: Entrust.net, Geotrust, StartCom, Thawte, Verisign. Looks like "VeriSign Class 3 Public PCA - G1.5" is an active root that Firefox 2.0 & 3.0 support. There's probably others as well - I haven't looked in depth.

Comment 11

10 months ago
(In reply to Ben Hearsum (:bhearsum) from comment #10)
> OK, so if we're to support aus2, we need to use a cert from a CA that was
> included in Firefox 2.0 & 3.0. DigiCert is NOT in that list, so that's not
> an option. Notable CAs in the list include: Entrust.net, Geotrust, StartCom,
> Thawte, Verisign. Looks like "VeriSign Class 3 Public PCA - G1.5" is an
> active root that Firefox 2.0 & 3.0 support. There's probably others as well
> - I haven't looked in depth.

I did a bit more looking, and most of the roots from these CAs that were in Firefox 2 & 3 are now retired. GeoTrust is the only one that seems promising: a few of the roots on https://www.geotrust.com/resources/root-certificates/index.html are embedded in Firefox 2 & 3. I wasn't able to figure out which (if any) of their products are actually issued from those roots, so I've sent them mail and am waiting on a response.

Comment 12

10 months ago
I'd forgotten, but aus2.m.o is not serving content; it just redirects to aus4.m.o (i.e. Balrog). The cert is still valid until May 11 12:00:00 2018 GMT. So the issue is that the redirect is happening on WebOps infra, which they would like to decommission, and CloudOps is asking whether it's worth setting anything up on their side?

If so, a couple of obvious options
* delay the decommissioning a couple of months to finish up bug 998721 and give it a chance to have an effect. We can possibly reduce infra before then because the redirect is in the ZLB, see bug 1281622
* modify the proxy CloudOps has to handle aus2.m.o, moving the existing cert over. Option of renewing, or not, in 2018
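
To put a number on how long either option buys, the expiry string quoted above can be turned into days remaining. A small sketch using Python's standard ssl.cert_time_to_seconds(); the helper name days_until_expiry is hypothetical.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after, now=None):
    """Days from `now` until the cert's notAfter time (negative once expired).

    `not_after` uses the certificate time format, e.g. "May 11 12:00:00 2018 GMT".
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    now = now or datetime.now(timezone.utc)
    return (expires - now).days
```

Calling days_until_expiry("May 11 12:00:00 2018 GMT") with no `now` argument measures from the current time.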

Comment 13

10 months ago
(In reply to Nick Thomas [:nthomas] from comment #12)
> I'd forgotten but aus2.m.o is not serving content, it just redirects to
> aus4.m.o (ie Balrog). The cert is still valid to May 11 12:00:00 2018 GMT.
> So the issue is that the redirect is happening on WebOps infra, which they
> would like to decommission, and CloudOps is asking if it's worth setting
> anything on their side ? 

Hm, for some reason I thought the cert was expiring much sooner than that. We probably don't need to bother with a renewal in that case (it's not looking likely to be possible anyways).

> If so, a couple of obvious options
> * delay the decommissioning a couple of months to finish up bug 998721 and
> give it a chance to have an effect. Can possibly reduce infra before then
> because the redirect is in the ZLB, see bug 1281622
> * modify the proxy CloudOps has to handle aus2.m.o, moving the existing cert
> over. Option of renewing, or not, in 2018

Ryan, Benson - what do you think? AFAICT, aus2 is actually in a working state at the moment, due to the redirect that Nick noted. Maybe we should just keep it alive in Zeus until the cert expires, if that's not a big burden.
Flags: needinfo?(rwatson)
Flags: needinfo?(bwong)

Comment 14

10 months ago
I am happy to leave the current config/redirect in our ZLB (I'll add a note pointing to this bug in the config) for as long as our ZLBs stick around. We are unsure whether we will still be using them in Q1/Q2 next year, but until that day comes we can leave it there.
Flags: needinfo?(rwatson)
(Assignee)

Comment 15

10 months ago
Leaving it on the zeus WFM.
Flags: needinfo?(bwong)

Comment 16

10 months ago
OK, with aus2 taken care of (for now) I think we're done here? bug 1283492 is open, but we're just waiting for an opportune time to do the last bit of catch-up, and then enable the cronjob.
Status: NEW → RESOLVED
Last Resolved: 10 months ago
Resolution: --- → FIXED