Please see our responses below.
You're correct that part of my concern is about the level of detail in the initial report in order to assess the scope, and in ensuring prompt responses, both to external problem reports and overall to incident reports. If I'm correctly interpreting Comment #4, the Problem Report came from an Apple employee (but not someone on the CA team), hence the redactions and additional disclosures that might not otherwise be included in a problem report if it was externally reported. That's understandable for this situation, and I think that the proposed remediation - ensuring that root programs and the root CAs are notified with sufficient detail - is a good path to avoiding it in the future. It's hopefully not necessary to emphasize that, as a publicly trusted CA, Apple may also receive external problem reports that require timely investigation and response, and so making sure the processes are robust to handle that, both publicly and with root programs, is of paramount importance.
Yes, you are correct in interpreting Comment #4, in that the Problem Report came from an Apple employee (but not someone on the CA team), hence the redactions and additional disclosures that might not otherwise be included in a problem report if it was externally reported. We are aware that problem reports may come externally and we have one consistent process for handling all reports regardless of the source.
With respect to the original report, and the request for the incident report, the concern was over this phrase: "we’ve determined that in some cases when the OCSP service receives a request it cannot process". My concern is that the original message does not help build a sense of the scope of the problem, and the impact to Relying Parties, which is critical to understanding the timeliness of the response. Providing more details about the cases you're aware of, which seem clear in the problem report shared with the reporter, is a critical part in understanding the issue and the risk/severity/how the CA is treating it. That doesn't mean it should be used to excuse delays ("fire? What fire? This is fine!"), but it is a key piece in being confident that the CA has everything in control and is responding appropriately.
I want to call out: had Apple included the initial three bullets from their problem report in the report shared with browsers, there would have been better understanding about the scope and nature of the issue, and thus understanding about the possible delays or challenges. Throw in an explanation about what steps were being taken to dig in / investigate further, and that would have been totally in line with the initial problem reports that help scope, while a CA works on a more formal problem report with investigation (and, presumably, approvals).
We acknowledge the level of detail in our initial incident report was not enough to help root programs or relying parties understand the full scope of the problem. We did not specify what CAs or types of certificates were impacted nor provide initial information that would help other CAs understand if they were also impacted by the same issue. We also acknowledge that including additional detail (such as the initial three bullets in our preliminary problem report) as well as what steps we were taking to further investigate would have helped the community have a better understanding about the scope and nature of the issue. As mentioned in Comment #4, we are working internally to modify our processes to provide more prompt postings and replies.
That said, there's one thing that stands out to me: Apple is one of the CAs that, like many others, was affected by the EJBCA issue. The description makes it sound like SHA-256 was not supported by the responder, but EJBCA has supported SHA-256 in the CertID since version 6.2.2, Released 3 September 2014. From the description, I've parsed that when an unrecognized algorithm is used for the CertID, it triggers the "unknown issuer" logic (documented in 6.12 over here), while if it recognizes the issuer, it'd provide authoritative responses. The change, to not have a default responder, will result in "Unauthorized" (per that documentation and your description)
Prior to our planned upgrade that began on 07-October-2019 and was completed on 18-October-2019, Apple’s OCSP service was on version 4.0.14 of EJBCA. That version neither supported requests that used SHA-256 in the CertID nor allowed us to disable the default OCSP responder so that the responder would respond ‘unauthorized’ for all unknown issuers. The problem report initially alerted us to the fact that our OCSP service was responding to any OCSP request that used a hash algorithm other than SHA-1 (e.g., SHA-256) for CertID with an ‘unknown’ response signed by a default responder. But more importantly, it alerted us to the fact that we were non-compliant with the Baseline Requirements section 4.9.9 whenever the OCSP service could not identify the issuer (as we were not signing the response with a certificate signed by the CA that issued the certificate whose revocation status was being checked). The fix was to complete the planned upgrade of EJBCA running on our OCSP service and disable the default responder. A positive side effect is that we can now also support SHA-256 for CertID, which is, as far as we know, neither required nor forbidden by any requirements or policy.
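To make the before-and-after behavior described above easier to follow, here is a minimal sketch in Python. It is an illustrative model only, not EJBCA's code or configuration: the names `Outcome`, `respond`, `supported_hashes`, and `default_responder_enabled` are hypothetical, the issuer name is a placeholder, and issuer matching is reduced to a set lookup.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Outcome:
    kind: str              # "authoritative", "unknown", or "unauthorized"
    signer: Optional[str]  # None models an unsigned OCSP error response


def respond(hash_alg: str, issuer: str, known_issuers: Set[str],
            supported_hashes: Set[str],
            default_responder_enabled: bool) -> Outcome:
    """Hypothetical model of how a responder picks a response type."""
    # The issuer can only be matched if the CertID hash algorithm is understood.
    recognized = hash_alg in supported_hashes and issuer in known_issuers
    if recognized:
        # Status is signed under the CA that issued the certificate.
        return Outcome("authoritative", signer=issuer)
    if default_responder_enabled:
        # Pre-upgrade behavior: 'unknown', signed by the default responder.
        return Outcome("unknown", signer="default-responder")
    # Post-upgrade behavior: 'unauthorized' error response, which is unsigned.
    return Outcome("unauthorized", signer=None)


issuers = {"Example Issuing CA"}  # placeholder issuer name

# Pre-upgrade model: SHA-1 only, default responder enabled.
before = respond("sha256", "Example Issuing CA", issuers, {"sha1"}, True)

# Post-upgrade model: SHA-256 understood, default responder disabled.
after_known = respond("sha256", "Example Issuing CA", issuers,
                      {"sha1", "sha256"}, False)
after_unknown = respond("sha256", "Some Other CA", issuers,
                        {"sha1", "sha256"}, False)
```

In this model, a SHA-256 request for a certificate from a known issuer falls through to the default responder before the upgrade, because the issuer cannot be matched. After the upgrade the same request is answered authoritatively, and only genuinely unknown issuers receive ‘unauthorized’.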
This would suggest that Apple is either running a bespoke OCSP responder, was running a significantly out-of-date OCSP responder, or that I've completely misunderstood the underlying root cause. I'm hoping it's the latter - I looked through the EJBCA change logs to try to understand if it was a bug that was fixed / configuration not supported, but I entirely admit I'm not familiar there.
As mentioned above, we were running on version 4.0.14 of EJBCA prior to the upgrade that began on 07-October-2019 and was completed on 18-October-2019. The software upgrade was tested, planned, and scheduled before this incident was identified. A separate bug will be opened with more details.
I ask, because it seems important for the overall ecosystem, particularly those that may rely on EJBCA, to understand a bit more about the underlying issue and how Apple's resolved it, since that helps make sure all CAs are able to learn and similarly examine and remediate their systems.
We think an important lesson that other CAs can take away from this incident is that CAs using EJBCA for their OCSP service should a) disable the default responder and b) run version 6.2.0 or above, so that the responder replies ‘unauthorized’ (as per RFC 6960) for all unknown issuers.
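As a rough illustration of the two conditions above, a CA could encode them as a simple self-check. This is a sketch under stated assumptions: the function names are hypothetical, the version is assumed to be a plain dotted numeric string such as `4.0.14`, and how the version and the default-responder setting are actually read from a deployment is left out.

```python
from typing import Tuple

MIN_VERSION = (6, 2, 0)  # minimum version named in the recommendation above


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version such as '4.0.14' into (4, 0, 14)."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_recommendation(version: str, default_responder_enabled: bool) -> bool:
    """True only if both conditions hold: new enough, default responder off."""
    new_enough = parse_version(version) >= MIN_VERSION
    return new_enough and not default_responder_enabled
```

Comparing parsed tuples rather than raw strings avoids ordering mistakes such as treating `6.10.1` as older than `6.2.0`.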
Similarly, in terms of "How do we help the ecosystem grow", and in line with improving the OCSP test cases, it'd be useful if Apple could share the OCSP test cases it has / what it tests (i.e. the test objective, not necessarily the test itself). From an ecosystem perspective, this may identify other test cases that should be added, or, alternatively, it may highlight Apple's good practices here, and be a model for other CAs to examine their own systems as holistically. Both of these are in the spirit of learning together and making the ecosystem better.
We hope sharing this additional information will prove informative to other CAs, relying parties, and root programs. In addition, after we’ve finished updating our OCSP test cases per Comment #3, we’ll share the test objectives with the community in the spirit of learning together to help the ecosystem grow.