Closed Bug 974151 Opened 10 years ago Closed 10 years ago

Production and staging telemetry-experiments.mozilla.org

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: benjamin, Assigned: cturra)

Details

(Whiteboard: [business - new app])

The telemetry experiments system will need a domain with staging and production versions with an SSL certificate. We will probably need to pin the certificate in the client the same way we pin the AMO cert, since it will be used to serve executable code updates.

The system will be much like FHR: scripts which produce flat files to serve, no dynamic server code.

We intend for this to go into production for nightly/aurora in 4 weeks.
Blocks: 973998
Assignee: server-ops → server-ops-webops
Component: Server Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
QA Contact: shyam → nmaul
Background: FHR is served from the CDN: the scripts to generate the static content run on the CDN origin. Jake set it up and knows all :)
FHR (fhr.cdn.mozilla.net) does not pin cert... how certain are we that this is a requirement here? It affects what we do with SSL and/or CDNs. Relevant note, most properties used by the browser (snippets, fxfeeds, fhr, crash-reports, etc) don't use pinned certs.
(In reply to Jake Maul [:jakem] from comment #2)
> FHR (fhr.cdn.mozilla.net) does not pin cert... how certain are we that this
> is a requirement here? It affects what we do with SSL and/or CDNs. Relevant
> note, most properties used by the browser (snippets, fxfeeds, fhr,
> crash-reports, etc) don't use pinned certs.

AUS is pinned to a couple of specific Certificate Authorities, but afaik we don't do any kind of pinning anywhere else.
I'm still working on getting guidance from the sec teams about that, but since this will be shipping executable code to Firefox users I expect it will likely be necessary to pin the cert the same way we do for AMO (and AUS, right?). "Ships executable code to the browser" seems to be our criterion for needing a pinned cert.
:bsmedberg - can you please point me to where the code for this project is hosted? similar to fhr, we'll setup an auto-deploying dev environment for your dev/testing and one for prod, which we can assist with deployments for.
Flags: needinfo?(benjamin)
Whiteboard: [business - new app]
We did security review yesterday and the conclusion is that we do need to pin the cert for this service the same way we do for AMO and AUS in the client.

Here is the initial code for the system which does nothing but publish an empty manifest: http://hg.mozilla.org/users/bsmedberg_mozilla.com/telemetry-experiment-server/

Build-time requirement: python and genshi
runtime requirement: mod_rewrite rules via .htaccess, but if you have a different desired solution for mapping URLs like /manifest/Firefox/27.0/beta to flat files like firefox-manifest.json I'm happy to do something different.
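
For illustration only, here is a minimal Python sketch of the URL-to-flat-file mapping described above. It is not the repository's actual rewrite rule. The function name, the regular expression, and the assumption that only the application name selects the file (version and channel being ignored) are all hypothetical.

```python
import re

# Hypothetical URL shape: /manifest/<app>/<version>/<channel>
_MANIFEST_RE = re.compile(r"^/manifest/([A-Za-z0-9_]+)/([^/]+)/([^/]+)/?$")


def manifest_file_for(path):
    """Return the flat-file name for a manifest URL path, or None if it does not match."""
    match = _MANIFEST_RE.match(path)
    if match is None:
        return None
    app = match.group(1)
    # Assumption: version and channel do not change which file is served.
    return "{}-manifest.json".format(app.lower())
```

Under these assumptions, `manifest_file_for("/manifest/Firefox/27.0/beta")` returns `"firefox-manifest.json"`, and the same lookup could be expressed as a single mod_rewrite rule or as a web server alias.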
Flags: needinfo?(benjamin)
Assignee: server-ops-webops → cturra
i have completed the dev setup for this environment. every 15 minutes an update script is run to do a hg pull/update and rebuild the webroot (destination path).

you can access dev at:

  https://telemetry-experiment-dev.allizom.org/
:bsmedberg - before i get too far into the production setup for this, i just wanted to confirm the url i am using is okay with you? generally, we use singular names and it looks like this is how you've also named your hg repo. the plan for production will be: telemetry-experiment
Flags: needinfo?(benjamin)
Hrm, everything else has been "experiments" (that's the name we expose in the UI). But I also don't care that much because the website itself isn't really user-visible; the HTML content is mainly intended for developers.
Flags: needinfo?(benjamin)
good news! the production environment is now setup for this new service and completely fronted with one of our CDNs (just like fhr).

 $ curl -I https://telemetry-experiment.cdn.mozilla.net/
 HTTP/1.1 200 OK
 Server: Apache
 X-Backend-Server: generic2.webapp.phx1.mozilla.com
 Content-Type: text/html; charset=UTF-8
 Strict-Transport-Security: max-age=15768000 ; includeSubDomains
 Accept-Ranges: bytes
 ETag: "1b9"
 Last-Modified: Wed, 12 Mar 2014 18:30:08 GMT
 X-Cache-Info: not cacheable; response specified "Cache-Control: no-cache"
 Content-Length: 441
 Expires: Thu, 13 Mar 2014 06:23:34 GMT
 Cache-Control: max-age=0, no-cache, no-store
 Pragma: no-cache
 Date: Thu, 13 Mar 2014 06:23:34 GMT
 Connection: keep-alive
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Looking at the production certificate, should this be pinned to "Cybertrust Public SureServer SV CA" in the client with a CN of "*.cdn.mozilla.net"?
Flags: needinfo?(cturra)
to be perfectly honest, issuer pinning concerns me in this case. akamai is hosting the ssl certificate for us, since they're the cdn we're using in this case. we have no control over which certificate authority and when they reissue. this could cause us all sorts of headaches down the road.
Flags: needinfo?(cturra)
Chris,

For certPinning (bug 744204, coming real soon now (TM)) for the cdn, I am currently pinning to the keys of several CAs (currently all of verisign, gte, equifax, geotrust, digicert, thawte and Baltimore). Are you comfortable with that set? If not, who do you think I should get in touch with to augment the set?
that seems like a fair approach, and in this case, the root ca is: Baltimore CyberTrust Root. does this issuer pinning code follow cert chains all the way back to the root?
(In reply to Chris Turra [:cturra] from comment #14)
> that seems like a fair approach, and in this case, the root ca is:
> Baltimore CyberTrust Root. does this issuer pinning code follow cert chains
> all the way back to the root?

Correct, the test ensures that there is an intersection between the specified keys and the keys in the computed chain (including the root)[1]. In the future we could even pin intermediates, once we have a non-optimistic algorithm for certificate path building.

[1] There is a default exception so that when a chain terminates in a non-built-in root we do not enforce pinning (self mitm)
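
The check described above can be sketched roughly as follows. This is a simplified Python illustration, not the code from bug 744204. The function name, the parameters, and the use of plain strings as key fingerprints are assumptions made for the example.

```python
def chain_satisfies_pins(chain_key_hashes, pinned_key_hashes, root_is_builtin):
    """Decide whether a computed certificate chain passes the pin check.

    chain_key_hashes: key fingerprints for every cert in the chain, root included.
    pinned_key_hashes: key fingerprints of the CAs the host is pinned to.
    root_is_builtin: True if the chain terminates in a built-in root.
    """
    if not root_is_builtin:
        # Exception from footnote [1]: pinning is not enforced for chains that
        # end in a non-built-in (user-installed) root.
        return True
    # The pin holds if at least one key in the chain is in the pinned set.
    return not set(chain_key_hashes).isdisjoint(pinned_key_hashes)
```

Because the root is part of the chain being tested, pinning the key of Baltimore CyberTrust Root would keep passing in this model even if the intermediate or the server certificate were reissued, as long as the new chain still ends at that root.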
I don't understand how the discussion here affects the situation for shipping this in Firefox 30. Using CertUtils.jsm I can validate any aspect of the site certificate that is exposed on nsIX509Certificate{,2,3} which includes the following attributes:

.nickname  (N/A)
.emailAddress (N/A)
.subjectName "CN=*.cdn.mozilla.net,OU=IT,O=Mozilla,L=Mountain View,ST=CALIFORNIA,C=US"
.commonName "*.cdn.mozilla.net"
.organization "Mozilla"
.organizationalUnit "IT"
.sha1Fingerprint "3D:FA:78:5B:D4:CF:A3:A6:0A:89:BF:70:39:1E:50:63:0B:EB:ED:8A"
.issuerName "CN=Cybertrust Public SureServer SV CA,O=Cybertrust Inc"
.serialNumber "01:00:00:00:00:01:44:0F:FD:39:11:20:74:BA"
.issuerCommonName "Cybertrust Public SureServer SV CA"

But unless we added some new feature to CertUtils.jsm there isn't a way to specify that it be a cert for *.cdn.mozilla.net and have specific root attributes.
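
To make the limitation concrete, here is a rough Python sketch of an attribute-only check. It is not the CertUtils.jsm implementation. The function name and the dict-based certificate representation are assumptions for illustration, and the values are the ones listed above.

```python
def leaf_cert_matches(cert_attrs, expected_attrs):
    """Return True if every expected attribute equals the site certificate's value."""
    return all(cert_attrs.get(name) == value
               for name, value in expected_attrs.items())


# Attributes of the site (leaf) certificate only, as listed in this comment.
site_cert = {
    "commonName": "*.cdn.mozilla.net",
    "organization": "Mozilla",
    "issuerCommonName": "Cybertrust Public SureServer SV CA",
}
```

A check like this can require a `commonName` of `*.cdn.mozilla.net` and a particular `issuerCommonName`, but it only ever sees the leaf certificate and its immediate issuer name, so there is nowhere to express a requirement on the root.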
It does not affect what is shipping in Firefox 30.
ok that leaves us back at what I should pin the client to in Firefox 30.
Flags: needinfo?(cturra)
i don't believe i should be the one making that decision. we're stuck with what we have from akamai. i would prefer, if possible, that we pin trust on the root certificate authority (Baltimore CyberTrust Root) rather than the intermediate issuer.
Flags: needinfo?(cturra)
I don't have the technical ability to pin on a root cert, which is why I expressed the pinning concern as maybe a blocker for not using the CDN. So now we're using the CDN and can't pin properly? Or can I just pin against the cert that we're actually using on the CDN?
I am uncomfortable with the idea of cert pinning altogether, at least in this form... doubly so when we're pinning against a cert purchased and provided by a 3rd party.


Here's the scenario that scares me most: say we pin this in Firefox 30 (the exact cert, intermediate, root, whatever). What happens when the cert is changed (expires, new provider, etc)? Obviously we can change the pin in *new* versions, but there's always a trickle of users that (for some reason) get stuck/abandoned on older versions... they're not going to get updated. Hence, this functionality will simply be broken for them somehow (I don't know what that UI looks like).

Who can accept responsibility for this type of risk? How many people (or what %) are we willing to abandon over pinning? How will we make sure this cert pinning is not simply forgotten about in future when changes need to be made?

I think these are questions that need a higher-level business owner to answer, not us folk on the ground implementing it. Someone like Bob Moss, perhaps?


I know this sounds very negative, but I feel it's important. We very nearly got badly burned by the pinning on aus3.mozilla.org just last year. This of course isn't *that* bad, but I'm still concerned by it. Pinning has multi-year ramifications, and so (IMO) deserves special consideration.
Some quick responses/questions on my way out the door:

1) Pin failures here are much less tragic than pin failures on AUS. Pin failures on AUS mean losing the ability to update all users forever. Pin failures on telemetry experiments mean losing the ability to push new experiments until we can update our users to look for new pins. (AIUI! Benjamin, correct me swiftly if I'm wrong) We should not treat them as comparable risk profiles (though in the absence of that clarity, I think Jake is right to raise it)

2) Once we get Camilo's cert pinning code in, do we intend to remove this special-purpose pinning and use that generic pinning instead? I think I hear that happening here, but that also scopes down our risk for this approach to N releases.

3) If we have a security requirement around pinning that can't be satisfied by our CDN, then we shouldn't use the CDN for it. Unless I'm misunderstanding things (possible! I'm standing up as I type this! There's wine waiting for me!) this is about delivering occasional chunks of experiment code to millions, but not tens or hundreds of millions, of users from time to time. Is akamai necessary/desirable for solving that problem, particularly given the peculiarities of that code's security properties?

4) Jake asks about how many people we're willing to abandon, but unless I've deeply misunderstood, in no case should this feature (pinned or otherwise) cause us to lose users when it breaks. Benjamin?

I can be the guy to yea/nay that risk on behalf of moz, but I'll need to understand those questions to do so. If they are as I suspect (pin failures are indeed not scary; we do plan to switch to camilo's pinning soon; we don't need CDN; we won't lose users) then I hereby do so.
1) correct, we lose the ability to push new experiments or remote-kill bad ones, but it doesn't affect other functionality
2) AIUI Camilo's pinning requires server cooperation, but yes, we plan to move to the less fragile thing soon.
3) During security review pinning was identified as a requirement, since we know that our code-deployment services have been the subject of certificate-spoofing attempts in the past.

The expected load on this service is 1-2M daily pings for a manifest. .xpi fetches might peak on experiment-deployment days at 150k/day but will typically be almost 0.

4) None of this should cause abandonment.
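
For a sense of scale, the load figures above can be converted to average request rates. This is a back-of-the-envelope Python sketch, assuming traffic is spread evenly over the day, which real traffic will not be. The function name is made up for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def average_per_second(requests_per_day):
    """Average request rate, assuming an even spread over 24 hours."""
    return requests_per_day / SECONDS_PER_DAY


print(round(average_per_second(1_000_000), 1))  # 11.6 manifest pings/s (low estimate)
print(round(average_per_second(2_000_000), 1))  # 23.1 manifest pings/s (high estimate)
print(round(average_per_second(150_000), 1))    # 1.7 .xpi fetches/s on a peak day
```

Peak rates would be higher than these averages, but the order of magnitude is a few tens of requests per second.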
How do you feel, Jake?
Given the decreased risk here as compared to something like AUS, I'm much less concerned than I was back in comment 21, months ago. If there's no risk of abandoning users, then the biggest risk would seem to be the accidental invalidation or delay of an experiment, should the CDN vendor change their cert on us unexpectedly. That's much more recoverable, because at the very least future versions would be able to pin on the new thing properly, and we'd just lose the time in-between.

r+ from me
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard