Closed Bug 1701530 Opened 3 years ago Closed 3 years ago

HPKP issues prevent Lando from loading on clean operating system install

Categories: Cloud Services :: Operations: LandoUI (task)

Tracking: Not tracked

Status: RESOLVED FIXED

People: Reporter: tcampbell; Assigned: bobm

Attachments: 1 file

I did a fresh install of Win10 and Firefox Nightly, and I am unable to connect to https://lando.services.mozilla.com/. Instead I get a MOZILLA_PKIX_ERROR_KEY_PINNING_FAILURE error.

Using https://phabricator.services.mozilla.com/ works fine, but that site uses a DigiCert-issued certificate, while Lando seems to use the new Let's Encrypt R3 intermediate.

Since the patch in bug 1680372 landed, we shouldn't be pinning to the Let's Encrypt intermediate certificates anymore, but to the ISRG Root X1 root instead. I can't reproduce this bug; I tried on Desktop Release, Desktop Nightly and Android Beta. Brian Pitts reported on Slack that he was able to reproduce it on Android 87.0.0-rc.1.

I'm really confused what's happening here. Dana, do you have any idea?

Flags: needinfo?(dkeeler)

This morning it now works for me on Win10 after leaving the computer alone for a while. Aryx suggested that this may apply.

Firefox Android (nightly) does not work for me though. I simply get "Secure Connection Failed".

See Also: → 1696840
See Also: → 1680372

If we're pinning to "ISRG Root X1" but the certificates in the handshake chain to "DST Root CA X3" instead, wouldn't a clean install lack the Let's Encrypt R3 intermediate that is signed by ISRG Root X1? Does that mean we should add DST Root CA X3 to the pinset, or add the other intermediate to the extra_certificates list?
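For reference when discussing what goes into a pinset: a key pin covers a public key, not a whole certificate. It is the base64-encoded SHA-256 hash of the certificate's DER-encoded SubjectPublicKeyInfo (SPKI), the format HPKP (RFC 7469) defines; as far as I know, Firefox's preloaded pinsets are lists of such hashes, so a chain passes only if some certificate in the verified path carries a pinned key. The snippet below is a minimal sketch for computing that value with the openssl CLI; `spki_pin` and the file name `isrg-root-x1.pem` are hypothetical.

```shell
#!/usr/bin/env bash
# spki_pin (hypothetical helper): print the base64 SHA-256 hash of a
# certificate's SubjectPublicKeyInfo, i.e. the value a key pin is built from.
# Usage: spki_pin isrg-root-x1.pem   (the file name is only an example)
spki_pin() {
  # Extract the PEM public key, re-encode it as DER SPKI,
  # hash the DER bytes with SHA-256, then base64-encode the digest.
  openssl x509 -in "$1" -pubkey -noout |
    openssl pkey -pubin -outform der |
    openssl dgst -sha256 -binary |
    openssl enc -base64
}
```

Because the hash depends only on the key, two certificates with the same key, such as the two R3 variants discussed in this bug, produce the same pin; which root signed them does not change it.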

I have a theory about what's happening. The Let's Encrypt R3 intermediate is signed by the ISRG Root X1 root, and cross-signed by the DST Root CA X3 root. The Lando server presents the cross-signed version in the handshake:

$ openssl s_client -showcerts -connect lando.services.mozilla.com:443 < /dev/null
CONNECTED(00000003)
depth=2 O = Digital Signature Trust Co., CN = DST Root CA X3
verify return:1
depth=1 C = US, O = Let's Encrypt, CN = R3
verify return:1
depth=0 CN = lando.services.mozilla.com
verify return:1
---
Certificate chain
 0 s:CN = lando.services.mozilla.com
   i:C = US, O = Let's Encrypt, CN = R3
-----BEGIN CERTIFICATE-----
[...]
-----END CERTIFICATE-----
 1 s:C = US, O = Let's Encrypt, CN = R3
   i:O = Digital Signature Trust Co., CN = DST Root CA X3
-----BEGIN CERTIFICATE-----
[...]
-----END CERTIFICATE-----
[...]

Since we have a preloaded pin to the X1 root for *.services.mozilla.com, a fresh installation rejects this chain. However, after a while the browser downloads the intermediate certificates served by Remote Settings, including the R3 certificate signed by X1. Now the browser can see that Lando chains to the X1 root, and it finally accepts the connection.
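This theory can be checked from the command line by looking at who issued each certificate the server sends. Below is a minimal sketch: `chain_summary` is a hypothetical helper (not an existing tool) that parses `openssl s_client -showcerts` output like the transcript above, prints each subject/issuer pair, and exits 0 only if one of the presented certificates was issued by ISRG Root X1, i.e. if a client that trusts X1 but has no cached intermediates could reach the pinned root from the handshake alone. It assumes the `s:`/`i:` line format shown above; other OpenSSL versions may format subjects slightly differently.

```shell
#!/usr/bin/env bash
# chain_summary (hypothetical helper): read `openssl s_client -showcerts`
# output on stdin, print "depth: subject  <-  issuer" for every certificate
# the server sent, and exit 0 only if one of them was issued by ISRG Root X1.
#
# Example:
#   openssl s_client -showcerts -servername lando.services.mozilla.com \
#     -connect lando.services.mozilla.com:443 < /dev/null 2>/dev/null | chain_summary
chain_summary() {
  awk '
    # Subject lines look like " 0 s:CN = example": depth first, then "s:".
    /^ *[0-9]+ s:/ { depth = $1; sub(/^ *[0-9]+ s:/, ""); subj = $0; next }
    # The issuer line that follows looks like "   i:C = US, ...".
    /^ *i:/ {
      sub(/^ *i:/, "")
      printf "%s: %s  <-  %s\n", depth, subj, $0
      if ($0 ~ /CN ?= ?ISRG Root X1/) found = 1
      next
    }
    END { exit (found ? 0 : 1) }
  '
}
```

Fed the transcript above, it prints the two pairs and exits 1, because R3 was issued by DST Root CA X3. With Let's Encrypt's longer chain, the second certificate is R3 issued by ISRG Root X1, so it exits 0.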

(I'm just piecing this together from bits of information I picked up somewhere; I don't actually know how these things work.)

Sven's reasoning sounds correct. My two cents is that the server should send intermediates that chain to the root we're pinning that host to in Firefox.

Flags: needinfo?(dkeeler)

Thanks, Dana! We can't really choose what intermediates to send, since we use Google's managed certificates, which in turn use the chain they get from Let's Encrypt, and that currently uses the cross-signed intermediate. JC mentioned on Slack that this will probably change in the first full week in May, so one way to fix this is to wait five weeks.

Here's the public announcement for the Let's Encrypt certificate chain change: https://community.letsencrypt.org/t/providing-a-longer-certificate-chain-by-default/148738

After waiting for the change to go live in Let's Encrypt production, I issued a new Let's Encrypt certificate for https://api.profiler.firefox.com through Google, and it's serving the longer certificate chain.

:whd tested loading https://api.profiler.firefox.com/ in Firefox Nightly for Android, and it loaded successfully. He also tested loading https://lando.services.mozilla.com/, which failed with the MOZILLA_PKIX_ERROR_KEY_PINNING_FAILURE error, as expected.

This means that all we need to do is re-issue the Let's Encrypt certificate for https://lando.services.mozilla.com to pick up the new, longer certificate chain, and it should work in Firefox Nightly for Android again.

Unfortunately, re-issuing a Google-managed cert isn't easy. The only way to force renewal is to delete and recreate the certificate, but this will cause a service interruption. The cert was renewed about two weeks ago, and Google will renew it "about" one month before it expires on 2021-07-22. I guess the easiest option is to wait another six weeks until the problem fixes itself.
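To see how close a certificate is to that renewal point, `openssl x509 -checkend` can be used. A minimal sketch follows; `renewal_check` is a hypothetical helper, and its 30-day default window is only an approximation of Google's "about one month", not a documented threshold.

```shell
#!/usr/bin/env bash
# renewal_check (hypothetical helper): print a certificate's notAfter date and
# whether it already falls inside an approximate renewal window.
# The 30-day default stands in for Google's "about one month"; it is an
# assumption, not a documented Google threshold.
#
# Example (saves the leaf certificate the server presents):
#   openssl s_client -connect lando.services.mozilla.com:443 \
#     -servername lando.services.mozilla.com < /dev/null 2>/dev/null |
#     openssl x509 > lando.pem
#   renewal_check lando.pem
renewal_check() {
  local cert=$1 window_days=${2:-30}
  openssl x509 -in "$cert" -noout -enddate
  # -checkend N exits 0 if the certificate is still valid N seconds from now.
  if openssl x509 -in "$cert" -noout -checkend $((window_days * 86400)) > /dev/null; then
    echo "outside renewal window (${window_days} days)"
  else
    echo "inside renewal window (${window_days} days)"
  fi
}
```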

I filed a support case with Google to see if they can trigger an early renewal of the certificate: https://console.cloud.google.com/support/cases/detail/27849803?project=moz-fx-lando-prod-b46f

Update from Google Support: they renewed the certificate early for us and noted a workaround if we run into this again:

I see that the product team went ahead and renewed the certificate and now the site you mentioned yesterday opens without any warnings.

I also think that if something like this happens in the future, you can perform the renewal yourself. All you need to do is create an additional Google-managed certificate, wait until it is ACTIVE, and then retire/delete the old one:

  • Create a new SslCertificate for the same domain
  • Attach it to the Load Balancer
  • Wait until it becomes ACTIVE
  • Detach the old SslCertificate from the Load Balancer
  • Delete the old SslCertificate

Hope this helps, please let me know if you have any questions.
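The five steps above map onto gcloud commands roughly as follows. This is a sketch under assumptions, not a tested runbook: it assumes the site sits behind a global external HTTPS load balancer whose target HTTPS proxy is passed in, and that the certificates are Compute Engine Google-managed SslCertificate resources (a GKE-managed setup would look different). `rotate_cert`, `run`, `DRY_RUN` and all resource names are hypothetical; check the flags against the gcloud reference before running it for real.

```shell
#!/usr/bin/env bash
# Sketch of the manual renewal steps from Google Support, using gcloud.
# Assumptions (hypothetical, adjust to the real setup): a global external
# HTTPS load balancer whose target HTTPS proxy is passed as $4, and
# Google-managed certificates created with `gcloud compute ssl-certificates`.
# Set DRY_RUN=1 to print the commands instead of running them.
set -euo pipefail

run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

rotate_cert() {
  local old=$1 new=$2 domain=$3 proxy=$4

  # 1. Create a new Google-managed certificate for the same domain.
  run gcloud compute ssl-certificates create "$new" --domains="$domain" --global

  # 2. Attach it to the load balancer alongside the old one.
  run gcloud compute target-https-proxies update "$proxy" \
    --ssl-certificates="$old,$new" --global

  # 3. Wait until the new certificate reports ACTIVE (provisioning can be slow).
  if [[ "${DRY_RUN:-0}" != 1 ]]; then
    until [[ "$(gcloud compute ssl-certificates describe "$new" --global \
                 --format='value(managed.status)')" == ACTIVE ]]; do
      sleep 60
    done
  fi

  # 4. Detach the old certificate from the load balancer.
  run gcloud compute target-https-proxies update "$proxy" \
    --ssl-certificates="$new" --global

  # 5. Delete the old certificate.
  run gcloud compute ssl-certificates delete "$old" --global --quiet
}
```

A dry run such as `DRY_RUN=1 rotate_cert lando-cert-old lando-cert-new lando.services.mozilla.com lando-https-proxy` (hypothetical names) prints the commands without touching anything. Keeping both certificates attached until the new one is ACTIVE is the point of the procedure: unlike deleting and recreating the certificate, there is no window without a valid certificate.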

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
