Closed Bug 1772644 Opened 3 years ago Closed 2 years ago

Apple: CRL issuance frequency deviates from CPS in some cases

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: certification_authority, Assigned: certification_authority)

Details

(Whiteboard: [ca-compliance] [crl-failure] [policy-failure])

While conducting a review of our validation services related to Bug 1771398, one of our engineers identified that although our CRL issuance frequency for our public TLS CAs was configured to be 24 hours, in some cases it was actually occurring at a 37.5 hour interval, which deviates from our stated practice in the Apple Public CPS. This has since been resolved. A full report will be provided no later than June 17, 2022.

Assignee: bwilson → certification_authority
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance]

1) How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

On May 27th, 2022, while conducting a review of our validation services related to Bug 1771398, one of our engineers identified that although our CRL issuance frequency for our public TLS and S/MIME CAs was configured to be 24 hours, in some cases it was actually occurring at a 37.5 hour interval, which deviates from our stated practice in the Apple Public CPS.

2) A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

  • 2020-04-29: Effective date of the Apple Public CPS version 5.0, which first stated that CRL issuance frequency is every 24 hours.
  • 2022-05-27 at 10:19 PT: While conducting a review of our validation services related to Bug 1771398, an engineer on our team identified, and alerted the compliance team, that although our CRL issuance frequency for our public TLS and S/MIME CAs was configured to be every 24 hours, in some cases it was actually occurring at a 37.5 hour interval.
  • 2022-05-27 at 10:30 PT: Our compliance team confirmed that this was an incident because our stated practice for CRL issuance frequency in section 4.9.7 of the Apple Public CPS is every 24 hours, not 37.5 hours.
  • 2022-05-27 at 10:44 PT: We began investigating the root cause and initially suspected a bug in our CA software, because the CRL issuance interval was configured for 24 hours, which aligned with the stated CPS practice.
  • 2022-05-27 at 11:39 PT: Our operations team began trying to reproduce the issue in our non-production environment.
  • 2022-05-27 at 13:45 PT: We opened a support ticket with our software vendor.
  • 2022-05-27 at 15:48 PT: Our operations team was unable to reproduce the issue in our non-production environment. We determined that making a change in production before a long holiday weekend carried too much risk: we were not certain the change would fix the issue, and it could break CRL generation altogether.
  • 2022-05-31 at 10:00 PT: Our operations team resumed investigation following the holiday weekend.
  • 2022-06-02 at 16:30 PT: Our operations team applied a fix by creating an additional CRL worker with fewer CAs.
  • 2022-06-03 at 16:32 PT: We confirmed that the fix had resolved the issue and CRL issuance frequency for our public TLS and S/MIME CAs was occurring at a 24 hour interval. We determined that we were unable to reproduce the issue in our non-production environment on 2022-05-27 because CRLs in our non-production environment are much smaller than production CRLs, so the CRL worker there was not overwhelmed and was able to honor the configured CRL issuance interval.
  • 2022-06-03 at 18:16 PT: We filed the Bugzilla bug and posted our initial issue report.
  • 2022-06-09 at 11:19 PT: Our operations team increased the logging around the CRL workers so we have more insight into how long they have been running.
  • 2022-06-16 at 16:00 PT: Our operations team began monitoring the size of CRLs so we know when to create additional workers.
  • 2022-06-16 at 16:00 PT: Our operations team added monitoring to alert if CRLs are not generated every 24 hours.

3) Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident. A statement that you have stopped will be considered a pledge to the community; a statement that you have not stopped requires an explanation.

This did not affect certificate issuance. Our operations team applied a fix so that CRL issuance frequency for our public TLS and S/MIME CAs began occurring at a 24 hour interval on 2022-06-02 at 16:30 PT.

4) In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.), please provide other similar statistics, aggregates, and a summary for each type of problem identified. This will help us measure the severity of each problem.

There were no problematic certificates issued in error, hence no certificates needed to be revoked and reissued.

5) In a case involving certificates, the complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem. In other cases not involving a review of affected certificates, please provide other similar, relevant specifics, if any.

There were no problematic certificates issued in error, hence no certificates needed to be revoked and reissued.

The following public TLS and S/MIME CAs, while configured to issue CRLs every 24 hours, were in some cases issuing CRLs at a 37.5 hour interval:

6) Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

How and why the mistakes were made or bugs introduced:

  • We had two CRL workers responsible for issuing CRLs for a large number of CAs, some with very large CRLs that can take hours to issue. As an unintended consequence, the CA software was not able to honor the configured CRL issuance interval. The CRL issuance interval was configured for 24 hours, which aligned with the stated CPS practice.
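The report does not describe how the CA software schedules CRL generation, so the following is only a toy model of the effect described above. It assumes (hypothetically) that a worker generates CRLs for its assigned CAs one after another and cannot start a new cycle until the previous one finishes. The function name and the per-CA durations are illustrative and are not taken from the report.

```python
def effective_interval_hours(configured_hours, generation_hours):
    # Hypothetical model: one worker generates CRLs for its CAs sequentially
    # and cannot begin a new cycle until the previous cycle has finished.
    cycle_hours = sum(generation_hours)
    # The achieved interval is the configured one, unless a full cycle
    # takes longer than that.
    return max(configured_hours, cycle_hours)


# Illustrative durations only (hours per CA); not figures from the report.
overloaded_worker = [12.0, 15.5, 10.0]
print(effective_interval_hours(24, overloaded_worker))

# Moving one large CA to an additional worker keeps each cycle under 24 hours.
rebalanced_worker = [12.0, 10.0]
additional_worker = [15.5]
print(effective_interval_hours(24, rebalanced_worker))
print(effective_interval_hours(24, additional_worker))
```

Under these made-up durations, the overloaded worker achieves a 37.5 hour interval despite the 24 hour setting, while both workers meet the configured interval after the split. Under the same assumption, this would also be consistent with the issue not reproducing where CRLs are much smaller.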

How the mistakes avoided detection until now:

  • Although CRL monitoring is in place, we were not monitoring that CRLs were being generated every 24 hours.

7) List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

  • We created an additional CRL worker with fewer CAs and we are now monitoring the size of CRLs so we know when to create additional workers.
  • We added monitoring to alert us if CRLs are not generated every 24 hours and increased the logging around the CRL workers so we have more insight into how long they have been running.
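A check of the kind described in these remediation steps could look roughly like the sketch below. It is not Apple's monitoring. It assumes the `thisUpdate` timestamps of successive CRLs for one CA have already been collected by some other means, and the function name `find_late_crls` and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Maximum allowed gap between consecutive CRLs, per the stated CPS practice.
MAX_GAP = timedelta(hours=24)


def find_late_crls(this_updates, max_gap=MAX_GAP):
    """Return (previous, current, gap) for each pair of consecutive CRLs
    whose thisUpdate values are more than max_gap apart."""
    ordered = sorted(this_updates)
    late = []
    for previous, current in zip(ordered, ordered[1:]):
        gap = current - previous
        if gap > max_gap:
            late.append((previous, current, gap))
    return late


# Hypothetical sample data: the last CRL arrives 37.5 hours after its predecessor.
sample = [
    datetime(2022, 5, 25, 0, 0, tzinfo=timezone.utc),
    datetime(2022, 5, 26, 0, 0, tzinfo=timezone.utc),
    datetime(2022, 5, 27, 13, 30, tzinfo=timezone.utc),
]

for previous, current, gap in find_late_crls(sample):
    print(f"ALERT: CRL issued {gap} after the previous one ({previous} -> {current})")
```

In practice an alert threshold slightly below the limit, together with a check on the time elapsed since the most recent CRL, would catch a stalled worker before the interval is actually exceeded.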

It seems to me that this bug also demonstrates the risk inherent in having the CPS set a hard interval ("Apple Public CA issues a new CRL every 24 hours.") rather than a soft interval (e.g. "...issues a new CRL at least once every 24 hours"), and in configuring systems to operate right at the limit (setting "24h" in the config, rather than e.g. "20h"). Based on the current phrasing, issuing a CRL 24 hours and 1 second after the previous one would be a violation of this CPS, as would issuing a non-emergency CRL 23 hours and 59 seconds after the previous one.
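To make the distinction concrete, here is a minimal sketch of the two readings as predicates over the gap between consecutive CRLs. The function names are hypothetical. The 24 hour limit comes from the CPS wording quoted above, and the 20 hour value is the headroom example given in this comment.

```python
from datetime import timedelta

CPS_LIMIT = timedelta(hours=24)


def complies_hard(gap):
    # Hard reading: "issues a new CRL every 24 hours".
    # Only a gap of exactly 24 hours satisfies the wording.
    return gap == CPS_LIMIT


def complies_soft(gap):
    # Soft reading: "issues a new CRL at least once every 24 hours".
    # Any gap up to and including 24 hours satisfies the wording.
    return gap <= CPS_LIMIT


slightly_late = timedelta(hours=24, seconds=1)
slightly_early = timedelta(hours=23, minutes=59)
with_headroom = timedelta(hours=20)  # e.g. a "20h" configuration

for gap in (slightly_late, slightly_early, with_headroom):
    print(gap, complies_hard(gap), complies_soft(gap))
```

Under the hard reading all three gaps fail. Under the soft reading only the late one fails, which is why a soft limit combined with a configured interval below the limit leaves room for normal variation in generation time.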

Is Apple also considering reviewing and updating the CPS to find and fix any places where hard intervals such as this are set?

We have considered this. In addition to the remediations we have already shared in this incident report, we are also actively reviewing and updating the Apple Public CPS to address the issue you’ve highlighted. We will be posting an updated CPS to our public repository with changes to sections 4.9.7 and 4.9.8 next week.

We have posted an updated version of the Apple Public CPS to our public repository with changes to sections 4.9.7 and 4.9.8. Apple CA is continuing to monitor this bug for comments or questions.

There are no outstanding tasks for this incident report. Since no other questions or concerns have been raised, can this incident be closed?

I will close this on or about Wed. 6-July-2022, unless further discussion needs to occur.

Flags: needinfo?(bwilson)
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] → [ca-compliance] [crl-failure] [policy-failure]