Closed Bug 1519848 Opened 6 years ago Closed 6 years ago

CertManager did not renew cert for https://dustin.taskcluster-dev.net/

Categories

(Taskcluster :: Operations and Service Requests, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)

Details

This cert is now expired - CertManager should have prevented that!

(same for Irene's dev environment)
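For anyone wanting to catch this before the cert actually lapses, here is a minimal sketch of an expiry check using only the Python standard library. The function names `hours_until_expiry` and `check_host` are made up for illustration and are not part of cert-manager or any Taskcluster tooling. Note that an already-expired cert fails verification, so that case is reported via the verification error rather than a negative hour count.

```python
import socket
import ssl
import time


def hours_until_expiry(not_after, now=None):
    """Hours from `now` (epoch seconds) until a cert's notAfter time.

    `not_after` uses the format returned by getpeercert(),
    e.g. "Jan 15 00:00:00 2019 GMT".
    """
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 3600.0


def check_host(host, port=443, timeout=10.0):
    """Connect to host and describe the state of its certificate.

    Hypothetical helper: returns a human-readable string.
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
    except ssl.SSLCertVerificationError as err:
        # An expired cert lands here, since verification rejects it.
        return f"verification failed: {err.verify_message}"
    return f"{hours_until_expiry(cert['notAfter']):.0f} hours until expiry"
```

Something like `print(check_host("dustin.taskcluster-dev.net"))` run from cron would be enough to flag a cert that is close to expiry or already rejected.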

Error preparing issuer for certificate ingress-controller/root-url-tls: Ingress.extensions "certificate-challenge-ingress" is invalid: spec: Invalid value: []extensions.IngressRule(nil): either backend or rules must be specified

So yesterday I randomly tried running the 0.5.2 cert-manager docker image. It failed:

I  starting cert-manager v0.5.2 (revision 9e8c3ad899c5aafaa360ca947eac7f5ba6301035)
I  Using the following nameservers for DNS01 checks: [10.59.240.10:53]
I  attempting to acquire leader lease  cert-manager/cert-manager-controller...
I  Listening on http://0.0.0.0:9402
I  Stopping Prometheus metrics server...
I  Prometheus metrics server gracefully stopped
E  Error running prometheus metrics server: http: Server closed
F  Control loops exited

(wow, copy/pasta from stackdriver stinks)

I then reverted it to 0.5.0 and soon things succeeded:

I  successfully acquired lease cert-manager/cert-manager-controller 
I  Starting certificates controller 
I  Starting ingress-shim controller 
I  Starting clusterissuers controller 
I  Starting issuers controller 
I  certificates controller: syncing item 'ingress-controller/root-url-tls' 
I  Preparing certificate ingress-controller/root-url-tls with issuer 
I  Calling GetOrder 
I  Cleaning up previous order for certificate ingress-controller/root-url-tls 
I  Cleaning up old/expired challenges for Certificate ingress-controller/root-url-tls 
I  Cleaning up challenge for domain "dustin.taskcluster-dev.net" as part of Certificate ingress-controller/root-url-tls 
I  Calling CreateOrder 
I  Created order for domains: [{dns dustin.taskcluster-dev.net}] 
I  Calling GetAuthorization 
I  Calling HTTP01ChallengeResponse 
I  Cleaning up old/expired challenges for Certificate ingress-controller/root-url-tls 
I  Calling GetChallenge 
I  No existing HTTP01 challenge solver pod found for Certificate "ingress-controller/root-url-tls". One will be created. 
I  No existing HTTP01 challenge solver service found for Certificate "ingress-controller/root-url-tls". One will be created. 
I  Found status change for Certificate "root-url-tls" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2019-01-15 01:34:27.01518107 +0000 UTC m=+76.194659283 
I  Error preparing issuer for certificate ingress-controller/root-url-tls: http-01 self check failed for domain "dustin.taskcluster-dev.net" 
I  Certificate ingress-controller/root-url-tls scheduled for renewal in -748 hours 
E  certificates controller: Re-queuing item "ingress-controller/root-url-tls" due to error processing: http-01 self check failed for domain "dustin.taskcluster-dev.net" 
I  clusterissuers controller: syncing item 'letsencrypt-issuer' 
I  Skipping re-verifying ACME account as cached registration details look sufficient. 
I  clusterissuers controller: Finished processing work item "letsencrypt-issuer" 
I  ingress-shim controller: syncing item 'ingress-controller/certificate-challenge-ingress' 
I  Not syncing ingress ingress-controller/certificate-challenge-ingress as it does not contain necessary annotations 
I  ingress-shim controller: Finished processing work item "ingress-controller/certificate-challenge-ingress" 
I  ingress-shim controller: syncing item 'default/taskcluster-ingress' 
I  Not syncing ingress default/taskcluster-ingress as it does not contain necessary annotations 
I  ingress-shim controller: Finished processing work item "default/taskcluster-ingress" 
I  certificates controller: syncing item 'ingress-controller/root-url-tls' 
I  Preparing certificate ingress-controller/root-url-tls with issuer 
I  Calling GetOrder 
I  Calling GetAuthorization 
I  Calling HTTP01ChallengeResponse 
I  Cleaning up old/expired challenges for Certificate ingress-controller/root-url-tls 
I  Calling GetChallenge 
I  wrong status code '503' 
I  Found status change for Certificate "root-url-tls" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2019-01-15 01:34:31.315955638 +0000 UTC m=+80.495433874 
I  Error preparing issuer for certificate ingress-controller/root-url-tls: http-01 self check failed for domain "dustin.taskcluster-dev.net" 
I  Certificate ingress-controller/root-url-tls scheduled for renewal in -748 hours 
E  certificates controller: Re-queuing item "ingress-controller/root-url-tls" due to error processing: http-01 self check failed for domain "dustin.taskcluster-dev.net" 
I  certificates controller: syncing item 'ingress-controller/root-url-tls' 
I  Preparing certificate ingress-controller/root-url-tls with issuer 
I  Calling GetOrder 
I  Calling GetAuthorization 
I  Calling HTTP01ChallengeResponse 
I  Cleaning up old/expired challenges for Certificate ingress-controller/root-url-tls 
I  Calling GetChallenge 
I  Accepting challenge for domain "dustin.taskcluster-dev.net" 
I  Calling AcceptChallenge 
I  Waiting for authorization for domain "dustin.taskcluster-dev.net" 
I  Calling WaitAuthorization 
I  starting cert-manager v0.5.0 (revision 7924346bd84e41053cc508956b0a1b567c932416) 
I  Using the following nameservers for DNS01 checks: [10.59.240.10:53] 
I  Listening on http://0.0.0.0:9402 
I  attempting to acquire leader lease  cert-manager/cert-manager-controller... 
I  Successfully authorized domain "dustin.taskcluster-dev.net" 
I  Cleaning up challenge for domain "dustin.taskcluster-dev.net" as part of Certificate ingress-controller/root-url-tls 
I  Found status change for Certificate "root-url-tls" condition "ValidateFailed": "True" -> "False"; setting lastTransitionTime to 2019-01-15 01:35:15.84942152 +0000 UTC m=+125.028899739 
I  Renewing certificate... 
I  Calling GetOrder 
I  Calling FinalizeOrder 
I  Stopping Prometheus metrics server... 
I  Prometheus metrics server gracefully stopped 
E  Error running prometheus metrics server: http: Server closed 
I  Certificate ingress-controller/root-url-tls scheduled for renewal in -748 hours 
E  certificates controller: Re-queuing item "ingress-controller/root-url-tls" due to error processing: error getting certificate from acme server: context canceled 
F  Control loops exited 
I  successfully acquired lease cert-manager/cert-manager-controller 
I  Starting certificates controller 
I  Starting clusterissuers controller 
I  Starting ingress-shim controller 
I  Starting issuers controller 
I  ingress-shim controller: syncing item 'default/taskcluster-ingress' 
I  Not syncing ingress default/taskcluster-ingress as it does not contain necessary annotations 
I  ingress-shim controller: Finished processing work item "default/taskcluster-ingress" 
I  clusterissuers controller: syncing item 'letsencrypt-issuer' 
I  ingress-shim controller: syncing item 'ingress-controller/certificate-challenge-ingress' 
I  Not syncing ingress ingress-controller/certificate-challenge-ingress as it does not contain necessary annotations 
I  ingress-shim controller: Finished processing work item "ingress-controller/certificate-challenge-ingress" 
I  Calling GetAccount 
I  letsencrypt-issuer: verified existing registration with ACME server 
I  clusterissuers controller: Finished processing work item "letsencrypt-issuer" 
I  certificates controller: syncing item 'ingress-controller/root-url-tls' 
I  Preparing certificate ingress-controller/root-url-tls with issuer 
I  Calling GetOrder 
I  Calling GetAuthorization 
I  Cleaning up old/expired challenges for Certificate ingress-controller/root-url-tls 
I  Found status change for Certificate "root-url-tls" condition "ValidateFailed": "False" -> "False"; setting lastTransitionTime to 2019-01-15 01:36:35.842437981 +0000 UTC m=+79.880456714 
I  Renewing certificate... 
I  Calling GetOrder 
I  Calling FinalizeOrder 
I  successfully obtained certificate: cn="dustin.taskcluster-dev.net" altNames=[dustin.taskcluster-dev.net] url="https://acme-v02.api.letsencrypt.org/acme/order/43916368/271687349" 
I  Certificate renewed successfully 
I  Found status change for Certificate "root-url-tls" condition "Ready": "False" -> "True"; setting lastTransitionTime to 2019-01-15 01:36:37.925115987 +0000 UTC m=+81.963134685 
I  Certificate ingress-controller/root-url-tls scheduled for renewal in -748 hours 
I  certificates controller: Finished processing work item "ingress-controller/root-url-tls" 
I  certificates controller: syncing item 'ingress-controller/root-url-tls' 
I  Certificate ingress-controller/root-url-tls scheduled for renewal in 1438 hours 
I  certificates controller: Finished processing work item "ingress-controller/root-url-tls" 
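The "scheduled for renewal in N hours" lines can be turned into a rough time-to-expiry. This is a sketch only: it assumes cert-manager schedules renewal 30 days (720 hours) before expiry, which is not stated anywhere in the log, and the helper names `renewal_offsets` and `hours_to_expiry` are hypothetical.

```python
import re

# Assumption: renewal is scheduled 30 days before the cert expires.
RENEW_BEFORE_HOURS = 30 * 24

RENEWAL_LINE = re.compile(
    r"Certificate (\S+) scheduled for renewal in (-?\d+) hours"
)


def renewal_offsets(log_lines):
    """Return (certificate name, hours until renewal) for matching lines."""
    results = []
    for line in log_lines:
        match = RENEWAL_LINE.search(line)
        if match:
            results.append((match.group(1), int(match.group(2))))
    return results


def hours_to_expiry(renewal_in_hours, renew_before=RENEW_BEFORE_HOURS):
    """Convert a renewal offset into hours until expiry (negative = expired)."""
    return renewal_in_hours + renew_before


if __name__ == "__main__":
    sample = [
        "I  Certificate ingress-controller/root-url-tls scheduled for renewal in -748 hours",
        "I  Certificate ingress-controller/root-url-tls scheduled for renewal in 1438 hours",
    ]
    for name, offset in renewal_offsets(sample):
        print(name, offset, hours_to_expiry(offset))
```

Under that assumption, -748 hours works out to a cert that expired about 28 hours earlier, and 1438 hours to one with about 2158 hours (roughly 90 days) left, which fits a freshly issued Let's Encrypt cert.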

That suggests that something in 0.5.0 doesn't work and is fixed in 0.5.2 -- but that upgrading to 0.5.2 will require a little more than just changing the docker image version.

Irene confirmed this worked.

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Component: Operations → Operations and Service Requests