Closed Bug 1368807 Opened 7 years ago Closed 7 years ago

Normandy's content signature cert has expired

Categories

(Shield :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jhirsch, Assigned: mythmon)

References

Details

Saw this error when starting today's Nightly (55.0a1 (2017-05-30) (64-bit)) on macos 10.12.4:

1496174305060	extensions.shield-recipe-client.recipe-runner	ERROR	Could not fetch recipes from https://normandy.cdn.mozilla.net/api/v1: "Error: Recipe signature is not valid"

The trace isn't super helpful, but I'll include it for context:

App_append resource://gre/modules/Log.jsm:748:9
	log resource://gre/modules/Log.jsm:386:7
	error resource://gre/modules/Log.jsm:394:5
	run resource://shield-recipe-client/lib/RecipeRunner.jsm:129:7
	throw self-hosted:1174:9
Is this reproducible, or did it just happen once?

What is the value of the preference security.content.signature.root_hash? It should match the value given for Production here: http://normandy.readthedocs.io/en/latest/qa/environments.html#production

There may be other logging prefixed with extensions.shield-recipe-client around that message. That would also be useful to see.
Flags: needinfo?(jhirsch)
I'm no longer able to repro this bug (on today's Nightly).

The value of security.content.signature.root_hash does match production.

I don't have any logs saved beyond what I reported yesterday.

Feel free to close if this isn't a useful bug report.
Flags: needinfo?(jhirsch)
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE
Jared, I think this is a good bug report, and useful to report. I don't think there is anything useful to do with the data here.
I have a test profile from a few days ago that is still seeing this error. (I'm investigating an IndexedDB error with the Screenshots extension in Nightly, and have a few busted test profiles.)

So, here's the complete startup log from starting today's nightly with a profile that threw IndexedDB errors a few days ago, and is still experiencing those (perhaps related?) errors: https://gist.github.com/6a68/d99fb26cebc8ddd48713a04e7f67b0af

The Screenshots bug is https://github.com/mozilla-services/screenshots/issues/2911

The value of security.content.signature.root_hash matches the one in production.

The value of extensions.shield-recipe-client.user_id is e9bbe2a2-6cf6-2e42-84bb-65331775b835, in case that corresponds to anything useful on the server.

I'll leave this profile open for now, in case there's anything you'd like me to try.
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Flags: needinfo?(mcooper)
Sorry, but I don't seen any signature errors in the gist you posted, or any Shield logging at all. Shield doesn't run at startup, but instead on a 24 hour timer (similar to updates). We'll want to find logs for a time when Shield is actually active.

To increase the amount of log data we get, please set extensions.shield-recipe-client.logging.level to 0. This will print the debug logs as well.

You can force shield to run immediately run by pasting this in a chrome-context console (like the one on about:config):

Components.utils.import('resource://shield-recipe-client/lib/RecipeRunner.jsm');
RecipeRunner.testRun("https://normandy-mock.dev.mozaws.net/ConsoleLogBasic/api/v1")
  .then(console.log.bind(console, 'ok'), console.error.bind(console, 'error'));

That should say "ok undefined", and there should be some extra logs from Shield in your browser console.

Finally, if you can visit https://normandy.cdn.mozilla.net/api/v1/recipe/signed/ in that browser, and attach the output, that can help us rule out weird caching behaviors. Hopefully you aren't getting stale or bad data from the API.
Flags: needinfo?(mcooper)
Great, thanks. I set extensions.shield-recipe-client.logging.level to 0, then ran that code snippet, and got this in the browser console:

ok undefined  (unknown)
1496353862764	extensions.shield-recipe-client.recipe-runner	ERROR	Could not fetch recipes from https://normandy-mock.dev.mozaws.net/ConsoleLogBasic/api/v1: "Error: Recipe signature is not valid"  Log.jsm:748
	App_append resource://gre/modules/Log.jsm:748:9
	log resource://gre/modules/Log.jsm:386:7
	error resource://gre/modules/Log.jsm:394:5
	run resource://shield-recipe-client/lib/RecipeRunner.jsm:129:7
	throw self-hosted:1174:9


When I visit https://normandy.cdn.mozilla.net/api/v1/recipe/signed/, the page loads some JSON with a current timestamp inside the signature field, so I suspect it's not weirdly cached. I've gisted it here: https://gist.github.com/6a68/eded1b3a6eae2f027f16b2afa19e07b8

I do see a ton of CSP errors in the console when I visit the normandy page: 

Content Security Policy: Couldn’t process unknown directive ‘worker-src’ 
Content Security Policy: Directive ‘frame-src’ has been deprecated. Please use directive ‘child-src’ instead. 
Content Security Policy: The page’s settings blocked the loading of a resource at self (“default-src https://normandy.cdn.mozilla.net https://normandy-cloudfront.cdn.mozilla.net/”). Source: float: right; margin-right: 10px. signed
Content Security Policy: The page’s settings blocked the loading of a resource at self (“default-src https://normandy.cdn.mozilla.net https://normandy-cloudfront.cdn.mozilla.net/”). signed
Content Security Policy: The page’s settings blocked the loading of a resource at self (“default-src https://normandy.cdn.mozilla.net https://normandy-cloudfront.cdn.mozilla.net/”). Source: float:left. signed
Content Security Policy: The page’s settings blocked the loading of a resource at self (“default-src https://normandy.cdn.mozilla.net https://normandy-cloudfront.cdn.mozilla.net/”). signed
Content Security Policy: The page’s settings blocked the loading of a resource at self (“default-src https://normandy.cdn.mozilla.net https://normandy-cloudfront.cdn.mozilla.net/”). Source: clear: both. signed
Content Security Policy: The page’s settings blocked the loading of a resource at self (“default-src https://normandy.cdn.mozilla.net https://normandy-cloudfront.cdn.mozilla.net/”). signed
Content Security Policy: The page’s settings blocked the loading of a resource at self (“default-src https://normandy.cdn.mozilla.net https://normandy-cloudfront.cdn.mozilla.net/”). Source: 
        window.drf = {
          csrfHe.... signed:1022
TypeError: window.drf is undefined[Learn More] csrf.js:41:5
Content Security Policy: The page’s settings blocked the loading of a resource at self (“default-src https://normandy.cdn.mozilla.net https://normandy-cloudfront.cdn.mozilla.net/”). Source: onfocusin attribute on DIV element. signed
Content Security Policy: The page’s settings blocked the loading of a resource at self (“default-src https://normandy.cdn.mozilla.net https://normandy-cloudfront.cdn.mozilla.net/”). Source: 
        $(document).ready(function() {
.... signed:1034
The recipe signatures in the gist you posted all look good to me. I verified that the data and the signatures match using some tools in the normandy repo to independently verify them.

The CSP errors are normal, and shouldn't be related to this issue. They only happen when you view the html version of the page, so shouldn't affect Firefox, which uses the JSON version of the API.

I think to dig in further, we should try and get Firefox to tell us *why* the signatures are invalid. This information isn't available to the JS that is doing this logging. If we turn up the log level for the content signature verifier, I think we can get the info we need. This can be done by setting the environment variable MOZ_LOG="ContentSignatureVerifier:4" (based this code[0] and these docs[1]).

[0]: https://dxr.mozilla.org/mozilla-central/source/security/manager/ssl/ContentSignatureVerifier.cpp#39-40
[1]: https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Gecko_Logging
I was able to reproduce this locally, and I got these log lines after setting MOZ_LOG="ContentSignatureVerifier:4":

[Main Thread]: D/ContentSignatureVerifier CSVerifier: found directive p384ecdsa
[Main Thread]: D/ContentSignatureVerifier CSVerifier: found a ContentSignature directive
[Main Thread]: D/ContentSignatureVerifier CSVerifier: The supplied chain is bad
[Main Thread]: D/ContentSignatureVerifier CSVerifier: Signature verification failed

I'm digging in to this further.
Assignee: nobody → mcooper
Status: REOPENED → ASSIGNED
The signing certificates have expired, and our monitoring failed. I'll redeploy a fixed configuration today.

$ curl https://content-signature.cdn.mozilla.net/chains/normandy.content-signature.mozilla.org-20161125.prod.chain | openssl x509 -text
        Validity
            Not Before: Nov 25 16:20:36 2016 GMT
            Not After : May 26 16:20:36 2017 GMT
Assignee: mcooper → jvehent
Component: General → Security
Product: Shield → Cloud Services
QA Contact: jvehent
Summary: Error in today's Nightly: Recipe signature is not valid → Normandy's autograph staging cert has expired
Blocks: 1368141
The certificate has been renewed and our monitoring is being repaired. Please test.
Relud: Can we run `./manage.py update_recipe_signatures --force` to update our signatures with the new certs?
Flags: needinfo?(dthorn)
Relud: nevermind, Julien has informed me that it isn't necessary to update signatures for this to be fixed.

With curl I can verify that the certificate chain has been updated with dates valid for the next 6 months. However, I can still reproduce this problem on my test profile because Firefox is using a stale version of the certificate chain, despite there being no cache headers.

On a fresh profile, this problem is fixed.
Flags: needinfo?(dthorn)
Seems like the stale cache might be on the same nature as the issue discussed here -> https://bugzilla.mozilla.org/show_bug.cgi?id=1354151 ? I've noticed the stale certificate on a few old random profiles I had laying around (pointed to prod.) - but not reproducible on new profiles. 
Mike, can you maybe isolate a test case that reproduces this issue constantly? Maybe a mock test, that sets the certificate validity for a small interval like a second?
Flags: needinfo?(mcooper)
See Also: → 1354151
Giving this back to :mythmon.
Assignee: jvehent → mcooper
Component: Security → General
Product: Cloud Services → Shield
Summary: Normandy's autograph staging cert has expired → Normandy's content signature cert has expired
I don't think manipulating the cache headers on the server is going to help us reproduce this issue. We tried to get steps to reproduce this in the past, and I don't have any new ideas for how we can reproduce this.

We have just deployed a cache bust on the server to avoid the caching bug. I think this bug is fully fixed from Normandy's point of view, and we'll continue investigating the cache issue in bug 1354151.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago7 years ago
Flags: needinfo?(mcooper)
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.