Bug 1248741 - [tracking bug] Migrate Balrog to AWS/CloudOps
Status: RESOLVED FIXED
Product: Release Engineering
Classification: Other
Component: Balrog: Backend
Importance: normal
Assigned To: Benson Wong [:mostlygeek]
QA Contact: Ben Hearsum (:bhearsum)
Mentors:
Depends on: 1111032 1248747 1248748 1248751 1248755 1248756 1248759 1248760 1248762 1251335 1251338 1264801 1266392 1275722 1275911 1277611 1281622 1281638 1281906 1283492 1285969 1286333 1286824
Blocks: 1127875
Reported: 2016-02-16 13:24 PST by Benson Wong [:mostlygeek]
Modified: 2016-09-12 12:50 PDT


Description Benson Wong [:mostlygeek] 2016-02-16 13:24:51 PST
Comment 1 Ben Hearsum (:bhearsum) 2016-07-06 06:51:57 PDT
For lack of a better place, here's my analysis of the Zeus configs from bug 1281622, organized by domain:
* aus.mozillamessaging.com - dead, nothing to carry forward
* aus2.mozillamessaging.com - dead, nothing to carry forward
* aus4-dev.allizom.org - dying after we transition, nothing to carry forward
* aus4-admin-dev.allizom.org - dying after transition, nothing to carry forward
* aus5-dev.allizom.org - dying after we transition, nothing to carry forward

* aus2.mozilla.org
** DNS currently controlled by ns1/2.mozilla.org
** https 301's to aus4.mozilla.org
** Has request rules that:
*** Sets X-Forwarded-For to the client's IP address
*** Works around an old Apache DoS vulnerability

* aus3.mozilla.org
** DNS controlled by CloudOps
** Backended by the aus4 pool
** Has request rules that:
*** Sets X-Forwarded-For to the client's IP address
*** Works around an old Apache DoS vulnerability
*** Strips the Cache-Control header
*** References to some old, dead rules (WebOps tells me they don't exist anymore, so even though they're included in the list they don't apply)
** Has a response rule that overrides the Cache-Control header set by the server, and sets it to: "max-age=120"
** ...however, this is overridden by another instruction that sets it to "no-store, must-revalidate, post-check=0, pre-check=0, private"

* aus4.mozilla.org
** Same as aus3.mozilla.org, except no references to the dead request rules.

* aus5.mozilla.org
** Same as aus3.mozilla.org, except no references to dead request rules nor the one that sets X-Forwarded-For.

* aus4-admin.mozilla.org
** Nothing special here.

Also noticed that aus.mozilla.org forwards to mozilla.org now. That was only used for 1.5 IIRC, so it's probably not worth fixing.


So, two potentially actionable things coming out of this:
1) We need to figure out what we actually want to set Cache-Control to in the responses, and do that in the app.
2) Move aus2 DNS to CloudOps, if necessary.

What do you think about #2, Ben?
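To illustrate item 1, here is a minimal sketch of the two ideas involved: response rules applied in order, so that the later Cache-Control rule wins as described for aus3, and a header set in the application instead of the load balancer. The names `apply_response_rules` and `cache_control_middleware` are hypothetical, the WSGI wrapper is an assumption about how the app might be fronted, and the header value is left as a parameter because the right value is still undecided.

```python
def apply_response_rules(headers, rules):
    """Apply (header name, value) rules in order; a later rule overrides an earlier one."""
    result = dict(headers)
    for name, value in rules:
        result[name] = value
    return result


def cache_control_middleware(app, value):
    """Wrap a WSGI app so every response carries exactly one Cache-Control header.

    `value` is whatever policy is eventually chosen; nothing here picks one.
    """
    def wrapped(environ, start_response):
        def custom_start(status, headers, exc_info=None):
            # Drop any Cache-Control the app already set, then add ours.
            headers = [(k, v) for k, v in headers if k.lower() != "cache-control"]
            headers.append(("Cache-Control", value))
            return start_response(status, headers, exc_info)
        return app(environ, custom_start)
    return wrapped
```

With the two aus3 response rules fed to `apply_response_rules` in the order listed above, the "no-store, ..." value is the one that survives, which matches the observed behaviour.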
Comment 2 Benson Wong [:mostlygeek] 2016-07-06 15:13:30 PDT
Since AUS2 doesn't have cert pinning, after the migration I was going to get an SSL certificate for aus5.mozilla.org with an ALTNAME for aus2.mozilla.org. Then I'll just point aus2/aus5 at the same cluster.
Comment 3 Ben Hearsum (:bhearsum) 2016-08-08 07:37:50 PDT
(In reply to Benson Wong [:mostlygeek] from comment #2)
> Since AUS2 doesn't have cert pinning, after the migration I was going to get
> an SSL certificate for aus5.mozilla.org with an ALTNAME for
> aus2.mozilla.org. Then I'll just point aus2/aus5 at the same cluster.

Looks like aus2 didn't get repointed yet; w0ts0n found this when he was starting decom of the old infra:
host -i aus2.mozilla.org
aus2.mozilla.org is an alias for aus2.external.zlb.scl3.mozilla.com.
aus2.external.zlb.scl3.mozilla.com has address 63.245.213.19

Is this still in the works?
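A check like the one above could be scripted roughly as follows. This is only a sketch: `parse_host_output` and `still_on_old_zlb` are hypothetical helper names, the parser handles only the two line shapes shown in the `host` output above, and treating a CNAME ending in `.zlb.scl3.mozilla.com` as "old infra" is an assumption based on that output.

```python
def parse_host_output(text):
    """Split `host` output into CNAME aliases and A-record addresses."""
    aliases, addresses = {}, {}
    for line in text.splitlines():
        if " is an alias for " in line:
            name, target = line.split(" is an alias for ")
            # `host` prints the target as a fully qualified name with a trailing dot.
            aliases[name.strip()] = target.strip().rstrip(".")
        elif " has address " in line:
            name, addr = line.split(" has address ")
            addresses.setdefault(name.strip(), []).append(addr.strip())
    return aliases, addresses


def still_on_old_zlb(text, hostname, old_suffix):
    """Return True if `hostname` is still a CNAME into the old load balancer zone."""
    aliases, _ = parse_host_output(text)
    return aliases.get(hostname, "").endswith(old_suffix)
```

Feeding it the output pasted above with `hostname="aus2.mozilla.org"` and `old_suffix=".zlb.scl3.mozilla.com"` reports that the name has not been repointed.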
Comment 4 Benson Wong [:mostlygeek] 2016-08-08 10:38:21 PDT
Is it worth it? 

I think we should just drop it. We would be adding new infrastructure for FF 2.0/3.0 clients. That's almost 10 years ago. I'm thinking if they haven't updated by now they're not going to do it.
Comment 5 Ben Hearsum (:bhearsum) 2016-08-12 06:22:32 PDT
(In reply to Benson Wong [:mostlygeek] from comment #4)
> Is it worth it? 
> 
> I think we should just drop it. We would be adding new infrastructure for FF
> 2.0/3.0 clients. That's almost 10 years ago. I'm thinking if they haven't
> updated by now they're not going to do it.

I'm having trouble arguing with this logic. Nick, do you have strong feelings here?
Comment 6 Nick Thomas [:nthomas] 2016-08-22 15:21:30 PDT
It has been a long time, and we've been serving no updates with Balrog even though we thought we had switched over. However, it would block us completing bug 998721, and I'm not sure it's our call.
Comment 7 Ben Hearsum (:bhearsum) 2016-08-23 06:29:36 PDT
I'm not exactly sure who should be making this call. Robert, Sylvestre - how do you feel about no longer maintaining aus2.mozilla.org? It would mean we can't serve updates to Firefox 2.0-3.0.
Comment 8 Sylvestre Ledru [:sylvestre] 2016-08-23 12:46:47 PDT
What is the volume of these versions?

Anyway, I guess that if they are still using these versions, they are probably stuck forever.
Comment 9 Robert Strong [:rstrong] (use needinfo to contact me) 2016-08-24 01:19:37 PDT
I'd like it if bug 998721 was fixed and that update offered to these users before it was removed.
Comment 10 Ben Hearsum (:bhearsum) 2016-08-24 12:18:24 PDT
OK, so if we're to support aus2, we need to use a cert from a CA that was included in Firefox 2.0 & 3.0. DigiCert is NOT in that list, so that's not an option. Notable CAs in the list include: Entrust.net, Geotrust, StartCom, Thawte, Verisign. Looks like "VeriSign Class 3 Public PCA - G1.5" is an active root that Firefox 2.0 & 3.0 support. There's probably others as well - I haven't looked in depth.
Comment 11 Ben Hearsum (:bhearsum) 2016-08-31 13:21:09 PDT
(In reply to Ben Hearsum (:bhearsum) from comment #10)
> OK, so if we're to support aus2, we need to use a cert from a CA that was
> included in Firefox 2.0 & 3.0. DigiCert is NOT in that list, so that's not
> an option. Notable CAs in the list include: Entrust.net, Geotrust, StartCom,
> Thawte, Verisign. Looks like "VeriSign Class 3 Public PCA - G1.5" is an
> active root that Firefox 2.0 & 3.0 support. There's probably others as well
> - I haven't looked in depth.

I did a bit more looking and most of these CAs' roots that were in Firefox 2 & 3 are now retired. GeoTrust is the only one that seems promising - a few of the roots on https://www.geotrust.com/resources/root-certificates/index.html are embedded in Firefox 2 & 3. I wasn't able to figure out which (if any) of their products are actually issued from those roots, so I've sent them mail and am waiting on a response.
Comment 12 Nick Thomas [:nthomas] 2016-08-31 16:26:24 PDT
I'd forgotten, but aus2.m.o is not serving content; it just redirects to aus4.m.o (i.e. Balrog). The cert is still valid to May 11 12:00:00 2018 GMT. So the issue is that the redirect is happening on WebOps infra, which they would like to decommission, and CloudOps is asking if it's worth setting anything up on their side?

If so, a couple of obvious options
* delay the decommissioning a couple of months to finish up bug 998721 and give it a chance to have an effect. Can possibly reduce infra before then because the redirect is in the ZLB, see bug 1281622
* modify the proxy CloudOps has to handle aus2.m.o, moving the existing cert over. Option of renewing, or not, in 2018
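For weighing these options, the remaining lifetime of the existing cert can be worked out from the notAfter string quoted above. A small sketch, assuming Python's standard `ssl` module; `days_until_expiry` is a hypothetical helper name, and the reference date is passed in explicitly so the result does not depend on when it is run.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after, now):
    """Whole days between `now` (timezone-aware) and a cert's notAfter string.

    `not_after` uses the OpenSSL text form, e.g. "May 11 12:00:00 2018 GMT".
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


if __name__ == "__main__":
    reference = datetime(2016, 8, 31, tzinfo=timezone.utc)
    print(days_until_expiry("May 11 12:00:00 2018 GMT", reference))
```

A negative result would mean the cert has already expired relative to the reference date.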
Comment 13 Ben Hearsum (:bhearsum) 2016-09-01 06:21:07 PDT
(In reply to Nick Thomas [:nthomas] from comment #12)
> I'd forgotten but aus2.m.o is not serving content, it just redirects to
> aus4.m.o (ie Balrog). The cert is still valid to May 11 12:00:00 2018 GMT.
> So the issue is that the redirect is happening on WebOps infra, which they
> would like to decommission, and CloudOps is asking if it's worth setting
> anything on their side?

Hm, for some reason I thought the cert was expiring much sooner than that. We probably don't need to bother with a renewal in that case (it's not looking likely to be possible anyways).

> If so, a couple of obvious options
> * delay the decommissioning a couple of months to finish up bug 998721 and
> give it a chance to have an effect. Can possibly reduce infra before then
> because the redirect is in the ZLB, see bug 1281622
> * modify the proxy CloudOps has to handle aus2.m.o, moving the existing cert
> over. Option of renewing, or not, in 2018

Ryan, Benson - what do you think? AFAICT, aus2 is actually in a working state at the moment, due to the redirect that Nick noted. Maybe we should just keep it alive in Zeus until the cert expires, if that's not a big burden.
Comment 14 Ryan Watson [:w0ts0n] 2016-09-01 06:31:15 PDT
I am happy to leave the current config/redirect in our ZLB (I'll add a note pointing to this bug in the config) for as long as our ZLBs stick around. We are unsure if we will still be using them in Q1/Q2 next year, but until that day comes, we can leave it there.
Comment 15 Benson Wong [:mostlygeek] 2016-09-01 09:49:51 PDT
Leaving it on Zeus WFM.
Comment 16 Ben Hearsum (:bhearsum) 2016-09-12 12:50:31 PDT
OK, with aus2 taken care of (for now) I think we're done here? Bug 1283492 is still open, but we're just waiting for an opportune time to do the last bit of catch-up, and then enable the cronjob.