Closed Bug 896078 Opened 7 years ago Closed 5 years ago

turn on OCSP stapling on many Mozilla https servers

Categories

(Cloud Services :: Operations: Marketplace, task)

task
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: keeler, Unassigned)

Details

(Whiteboard: [change - configuration])

OCSP stapling has landed in Nightly. It would be great to both dogfood and promote adoption of this by enabling it on Mozilla https servers.
Assignee: server-ops → server-ops-webops
Component: Server Operations → Server Operations: Web Operations
QA Contact: shyam → nmaul
Time to replace Apache 2.2.x with nginx ;)
(In reply to Reed Loden [:reed] from comment #1)
> Time to replace Apache 2.2.x with nginx ;)

My understanding is that most, if not all, of these servers are behind some load balancer. (Zeus?) The important thing to keep in mind is that the thing doing the SSL termination (usually the load balancer) must be the thing that implements OCSP stapling. So, if that's Zeus, then that means Zeus must be configured to support stapling. Two years ago, one of the first things I did was ask the engineers at Zeus to support OCSP stapling on behalf of Mozilla, so I'm sure they have it already.
CC'ing Opsec as well
We'd love to support this. The current load balancers are Riverbed Stingrays, aka "zeus". Riverbed has not implemented OCSP stapling.
(In reply to Brian Smith (:briansmith), was bsmith@mozilla.com (:bsmith) from comment #3)
> Two years ago, one of the first things I did
> was ask the engineers at Zeus to support OCSP stapling on behalf of Mozilla,
> so I'm sure they have it already.

Do you have an email copy of your conversation with Riverbed from 2 years ago?
We had a discussion with Riverbed last Friday, and are waiting to hear back from them.
Flags: needinfo?(brian)
(In reply to Julien Vehent [:ulfr] from comment #6)
> (In reply to Brian Smith (:briansmith), was bsmith@mozilla.com (:bsmith)
> from comment #3)
> > Two years ago, one of the first things I did
> > was ask the engineers at Zeus to support OCSP stapling on behalf of Mozilla,
> > so I'm sure they have it already.
> 
> Do you have an email copy of your conversation with Riverbed from 2 years
> ago? We had a discussion with Riverbed last Friday, and are waiting to
> hear back from them.

This conversation happened by phone, not by email.

(In reply to Joe Stevensen [:joes] from comment #5)
> We'd love to support this. The current load balancers are Riverbed
> Stingrays, aka "zeus". Riverbed has not implemented OCSP stapling.

Note that we'd mostly agreed already not to let this affect our plans for changing Gecko. We're not going to gate our changes on any vendor, even if it is the vendor that mozilla.org is using.
Flags: needinfo?(brian)
The tentative plan I'd heard was to attempt to disable OCSP/CRL checking for non-EV certs in 2013Q3 (relying only on OCSP stapling), and to disable it for EV certs in 2014Q1. These aren't set in stone yet, but my understanding is that whatever dates we decide on will be pretty firm.

I'm concerned this might be a little fast, but that discussion is not for this bug.


The only action we can take on this right now is to address Riverbed and get a timeline for when we can expect OCSP stapling support. Presuming the response is satisfactory, we can proceed with an upgrade when the time comes. If not, we'll have to make some sort of fallback plan.


(In reply to Reed Loden [:reed] from comment #1)
> Time to replace Apache 2.2.x with nginx ;)

We'd move to Apache 2.4.x sooner... it supports OCSP stapling, and would be a much simpler migration path. But as mentioned, we almost never do SSL termination at that layer anyway.
(In reply to Jake Maul [:jakem] from comment #8)
> The only action we can take on this right now is to address Riverbed and get
> a timeline for when we can expect OCSP stapling support. Presuming the
> response is satisfactory, we can proceed with an upgrade when the time
> comes. If not, we'll have to make some sort of fallback plan.

I meant to elaborate and say we've already asked them about it (less than a week ago) and are waiting on a response. Just today I passed along the intention to ultimately make Firefox rely solely on OCSP stapling, and that seemed to provoke a greater sense of importance (as one would expect).
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
(In reply to Jake Maul [:jakem] from comment #9)

> I meant to elaborate and say we've already asked them about it (less than a
> week ago) and are waiting on a response. Just today I passed along the
> intention to ultimately make Firefox rely solely on OCSP stapling, and that
> seemed to provoke a greater sense of importance (as one would expect).

Indeed it did - and we (Riverbed) have since announced support for OCSP Stapling in version 9.5 of the Stingray Traffic Manager, released in December 2013.
Whiteboard: [change - configuration]
Depends on: 958682
We enabled this for www.mozilla.org. Seems to be working well so far, but it's only been about 30 hours. :)

Do we have a list of top priority sites for enabling this? I imagine we eventually want it everywhere, but at the moment we don't have a good way to mass-enable it, so might be worthwhile to hit the high points first.

Also, do we have any expectation (or concerns) around what this might mean for our own services? I'm thinking in terms of load, bandwidth, CPU usage, response times, etc.

Other than a neat little "Yes" over at ssllabs.com, what sort of direct pros or cons should we be looking for? I would like to find a way to measure how useful this turns out to be. Specifically I'd like to be able to say things like "this reduces page load time for www.mozilla.org by X%", or "this increases load balancer CPU usage by Y%"... stuff like that. How do we measure this capability?
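Once before/after samples exist, turning them into an "X%" or "Y%" figure is simple arithmetic. The sketch below is a hypothetical helper, not existing Mozilla tooling; it assumes the timing or CPU samples are collected separately (for example by synthetic checks run before and after enabling stapling) and compares medians so a few outliers don't dominate.

```python
from statistics import median
from typing import Sequence


def percent_change(before: Sequence[float], after: Sequence[float]) -> float:
    """Return the change in median from `before` to `after`, as a percentage.

    Hypothetical helper. Negative values mean `after` is smaller (e.g. faster
    page loads); positive values mean it grew (e.g. higher CPU usage).
    """
    if not before or not after:
        raise ValueError("need at least one sample on each side")
    base = median(before)
    if base == 0:
        raise ValueError("baseline median is zero; percentage is undefined")
    return (median(after) - base) / base * 100.0
```

For example, `percent_change([100, 100], [75, 75])` gives `-25.0`, i.e. a 25% reduction.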
I wonder if we have telemetry data on the average response time of OCSP responders. OCSP Stapling will eliminate that delay entirely. But since OCSP requests aren't enforced by default, I doubt we can call this a direct performance improvement.

OCSP stapling moves us toward better handling of certificate revocation; it's a security improvement, but not something you can put numbers on very easily.

Pinging :geekboy who may have more info on what OCSP Stapling means for Firefox. I know :briansmith wanted to make OCSP Stapling a requirement for the FF green bar of EV certs, but I don't know if that plan is still alive.
Flags: needinfo?(sstamm)
(In reply to Jake Maul [:jakem] from comment #11)
> Do we have a list of top priority sites for enabling this? I imagine we
> eventually want it everywhere, but at the moment we don't have a good way to
> mass-enable it, so might be worthwhile to hit the high points first.
> 

I don't know, but how about login.persona.org and AMO?
ulfr: what kind of info are you looking for exactly?  

In general, stapling helps by reducing RTTs for validation and when you combine it with must-staple (bug 901698) you get a much better validation guarantee than regular ol' OCSP or CRL checking.  By policy we require all EV certs to use OCSP (and would prefer it if they stapled responses) to get the EV "green".  See https://wiki.mozilla.org/CA:ImprovingRevocation#No_EV_Treatment_when_OCSP_Fails_or_Not_Provided
Flags: needinfo?(sstamm)
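To make the soft-fail versus must-staple distinction concrete, here is a toy decision function. It is an illustrative model only, not Gecko's actual validation logic; the names `OCSPStatus` and `accept_connection` are hypothetical. It assumes "must-staple" means the certificate carries the TLS Feature extension (RFC 7633) asserting `status_request`, so a missing staple is treated as a hard failure instead of being silently ignored.

```python
from enum import Enum
from typing import Optional


class OCSPStatus(Enum):
    GOOD = "good"
    REVOKED = "revoked"
    UNKNOWN = "unknown"


def accept_connection(stapled: Optional[OCSPStatus],
                      fetched: Optional[OCSPStatus],
                      must_staple: bool) -> bool:
    """Toy revocation policy (hypothetical, not Firefox's implementation).

    stapled: status the server stapled into the handshake, or None.
    fetched: status from querying the responder directly, or None if the
             request failed, timed out, or fetching is disabled.
    must_staple: True if the certificate promises to always staple.
    """
    if stapled is not None:
        # A stapled response is used as-is; only an explicit revocation rejects.
        return stapled is not OCSPStatus.REVOKED
    if must_staple:
        # The certificate promised a staple but none arrived: hard-fail.
        return False
    # Plain soft-fail: a missing or failed fetch does not block the connection.
    return fetched is not OCSPStatus.REVOKED
```

The key difference is the `must_staple` branch: without it, an attacker who can block the responder simply gets the soft-fail path.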
(In reply to Julien Vehent [:ulfr] from comment #12)
> I wonder if we have telemetry data on the average response time of OCSP
> responders.

I seem to recall 300ms, but I can't find any data to support that at the moment.

> OCSP Stapling will eliminate that delay entirely. But since OCSP
> requests aren't enforced by default, I doubt we can call this a direct
> performance improvement.

It's definitely a direct performance improvement. Without stapling, the connection would block until the responder had given a response or timed out. OCSP requests may not be strictly enforced by default, but they're still fetched by default (in FF).
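The latency argument can be written as a small model: without a staple, the handshake waits for the responder or for a timeout; with a staple, there is no separate request. This is a hypothetical sketch; the timeout is a parameter because the actual Firefox value isn't stated here, the ~300ms figure recalled above is unconfirmed, and the small cost of the slightly larger handshake is ignored.

```python
from typing import Optional


def added_revocation_delay_ms(stapled: bool,
                              responder_latency_ms: Optional[float],
                              timeout_ms: float) -> float:
    """Extra handshake wait caused by revocation checking (toy model).

    stapled: the server sent an OCSP response in the handshake, so the
             client makes no separate request to the CA's responder.
    responder_latency_ms: responder answer time, or None if it never
             answers (the client then waits out the full timeout).
    timeout_ms: hypothetical client-side OCSP fetch timeout.
    """
    if stapled:
        return 0.0
    if responder_latency_ms is None:
        return float(timeout_ms)
    return float(min(responder_latency_ms, timeout_ms))
```

With an assumed 300ms responder, `added_revocation_delay_ms(False, 300, 2000)` is `300.0`, while the stapled case is always `0.0`.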

> I know :briansmith wanted to make OCSP Stapling a requirement for
> the FF green bar of EV certs, but I don't know if that plan is still alive.

We plan to eventually turn off OCSP fetching. EV requires revocation information, so the only way to reliably get that would be to do OCSP stapling.
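A minimal sketch of why turning off fetching leaves stapling as the only route to EV treatment. The function `ev_treatment` is hypothetical and collapses the policy into booleans; it only models the reasoning in this comment, not the browser's code.

```python
def ev_treatment(is_ev_cert: bool,
                 stapled_good: bool,
                 fetching_enabled: bool,
                 fetch_good: bool = False) -> bool:
    """Return True if the EV indicator would be shown (toy model).

    EV needs fresh "good" revocation information. It can come from a stapled
    OCSP response, or from a direct OCSP fetch, but only while the client
    still performs fetches.
    """
    if not is_ev_cert:
        return False
    if stapled_good:
        return True
    # No staple: fall back to a direct fetch only if fetching is still on.
    return fetching_enabled and fetch_good
```

Once `fetching_enabled` is `False`, the result depends only on `stapled_good`.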
(In reply to Jake Maul [:jakem] from comment #11)
> Do we have a list of top priority sites for enabling this? I imagine we
> eventually want it everywhere, but at the moment we don't have a good way to
> mass-enable it, so might be worthwhile to hit the high points first.
> 

Chris More maintains a list of domains sorted by importance. I picked a few from that list that we can start with (need to discard the ones that aren't hosted behind the ZLB):

www.mozilla.org
start.mozilla.org
addons.mozilla.org
support.mozilla.org
getfirebug.com
developer.mozilla.org
marketplace.firefox.com
bugzilla.mozilla.org
blog.mozilla.org
wiki.mozilla.org
input.mozilla.org
webmaker.org
hacks.mozilla.org
affiliates.mozilla.org
nightly.mozilla.org
browserquest.mozilla.org
sendto.mozilla.org
careers.mozilla.org
mozillalabs.com
openbadges.org
Assignee: server-ops-webops → nmaul
I've done all these that I can. Status info below. Note I have not verified these with a 3rd party service, only enabled the setting in Stingray (though we did verify it on www.mozilla.org, so I have confidence).

www.mozilla.org - done
start.mozilla.org - not HTTPS-aware (legacy Firefox users' homepage)
addons.mozilla.org - 302 Cloud Services
support.mozilla.org - done
getfirebug.com - done
developer.mozilla.org - done
marketplace.firefox.com - 302 Cloud Services
bugzilla.mozilla.org - done
blog.mozilla.org - done
wiki.mozilla.org - done
input.mozilla.org - done
webmaker.org - 302 Mozilla Foundation
hacks.mozilla.org - done
affiliates.mozilla.org - done
nightly.mozilla.org - done
browserquest.mozilla.org - 302 ???
sendto.mozilla.org - 302 ???
careers.mozilla.org - done
mozillalabs.com - done
openbadges.org - 302 ???
openbadges.org and sendto.m.o are handled through the Foundation, although the latter is used by their donation(?) contractor, so it might be under limited control.
Whiteboard: [change - configuration] → [kanban:https://kanbanize.com/ctrl_board/4/115] [change - configuration]
This should now be done for the vast majority of our web properties - everything hosted by our primary PHX1 and SCL3 ZLB clusters.

Remaining is PEK1 and our HCI environment, plus whatever Cloud Services may need to do (mainly Addons, Marketplace, Sync, Persona, FxAccounts).
Completed PEK1 and HCI.


Punting to Cloud Services because I don't think they have their own bug for this.
Component: WebOps: Other → Server Operations: AMO Operations
Product: Infrastructure & Operations → mozilla.org
QA Contact: nmaul → oremj
Assignee: nmaul → server-ops-amo
Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/115] [change - configuration] → [change - configuration]
Assignee: server-ops-amo → nobody
Component: Server Operations: AMO Operations → Operations: Marketplace
Product: mozilla.org → Mozilla Services
QA Contact: oremj → operations-mkt
We are unable to do this, because it is not supported by our current load balancers.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX
Is this an "unable to do this *yet*"?
I think we should submit a feature request to the vendor.
Also, looking at the history of this bug, it seems we have stapling on some Mozilla sites/servers, but not all. Should we mark this as fixed and file a new bug for the remaining sites/servers?
(In reply to Frederik Braun [:freddyb] from comment #22)
> Is this an "unable to do this *yet*"?
> I think we should submit a feature request to the vendor.

I filed a feature request with Citrix last year. We will plan on turning on the feature if they implement it.

(In reply to David Keeler [:keeler] (use needinfo?) from comment #23)
> Also, looking at the history of this bug, it seems we have stapling on some
> Mozilla sites/servers, but not all. Should we mark this as fixed and file
> a new bug for the remaining sites/servers?

r+
Resolution: WONTFIX → FIXED
Summary: turn on OCSP stapling on all Mozilla https servers → turn on OCSP stapling on many Mozilla https servers