Closed Bug 1716163 Opened 6 months ago Closed 5 months ago

GLOBALTRUST: Revoked test website not using revoked certificate

Categories

(NSS :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: agwa-bugs, Assigned: ca)

Details

(Whiteboard: [ca-compliance])

Attachments

(2 files)

GLOBALTRUST has disclosed the following URL as their revoked test website:

https://testrevoked-2020-server-qualified-ev-1.e-monitoring.at/

As of 2021-06-12 at 22:46 UTC, this website is serving the following certificate:

https://crt.sh/?sha256=C0:CA:D9:D1:0B:F5:0C:23:21:05:31:0A:82:9D:B0:3C:DE:B5:21:26:D1:6A:16:3E:F1:B7:2C:64:FA:A2:48:6C&opt=ocsp

This certificate is not revoked. I have attached a signed OCSP response as evidence.
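To reproduce evidence like the crt.sh link above, you can fingerprint the certificate a test page is actually serving and look it up on crt.sh. The following is a minimal sketch using only Python's standard library. The helper names are hypothetical. When `ssl.get_server_certificate` is called without `ca_certs`, it does not verify the chain. That means it also works against a page that deliberately serves a revoked or expired certificate.

```python
import hashlib
import ssl


def sha256_fingerprint(der: bytes) -> str:
    """SHA-256 of a DER certificate, uppercase hex with colons (the form used in the crt.sh link)."""
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def served_certificate_fingerprint(host: str, port: int = 443) -> str:
    """Fetch the leaf certificate a server presents and fingerprint it.

    No ca_certs is passed, so the chain is not validated and the call
    succeeds even for a deliberately revoked or expired test certificate.
    """
    pem = ssl.get_server_certificate((host, port))
    return sha256_fingerprint(ssl.PEM_cert_to_DER_cert(pem))


def crtsh_url(fingerprint: str) -> str:
    # opt=ocsp asks crt.sh to show the live OCSP status alongside the certificate.
    return f"https://crt.sh/?sha256={fingerprint}&opt=ocsp"
```

Calling `crtsh_url(served_certificate_fingerprint("testrevoked-2020-server-qualified-ev-1.e-monitoring.at"))` would produce a lookup URL of the same form as the one in this report. The result depends on whatever the site serves at the time of the call.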

Flags: needinfo?(ca)
Assignee: bwilson → ca
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance]

Thank you for pointing this out. The certificate has been revoked. We will provide a complete incident report by Friday, June 18th, at the latest.

Flags: needinfo?(ca)

Resetting N-I

Flags: needinfo?(ca)

Dear All,

Attached is the full incident report.

Flags: needinfo?(ca)

Q: If the problem is lack of compliance to an RFC, Baseline Requirement or Mozilla
Policy requirement: were you aware of this requirement? If not, why not? If so, was
an attempt made to meet it? If not, why not? If so, why was that attempt flawed? Do
any processes need updating for making sure your CA complies with the latest
version of the various requirements placed upon it?
A: The underlying requirements, as defined for instance in various root store policies, were well known and always followed.

BR section 2.2 specifically requires a CA to host separate web pages that have i.) a valid, ii.) a revoked, and iii.) an expired subscriber certificate. Ergo, the underlying requirements of the M.R.S.P. were not "always followed".
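The requirement cited here can be written as a simple check over a CA's disclosed test pages. What follows is a minimal Python sketch of a simplified reading of the rule, not the normative BR text. `Status`, `DisclosedPage` and `br_2_2_problems` are hypothetical names. The sketch assumes that the observed status of each page, derived from OCSP/CRL and the validity period, has already been determined elsewhere.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    VALID = "valid"
    REVOKED = "revoked"
    EXPIRED = "expired"


@dataclass(frozen=True)
class DisclosedPage:
    url: str
    expected: Status  # what the CA disclosed the page as
    observed: Status  # what a relying party actually sees (assumed computed elsewhere)


def br_2_2_problems(pages: list[DisclosedPage]) -> list[str]:
    """Simplified check: a page of each kind is disclosed, and each serves what it claims."""
    problems = []
    disclosed = {page.expected for page in pages}
    for required in Status:
        if required not in disclosed:
            problems.append(f"missing {required.value} test page")
    for page in pages:
        if page.observed is not page.expected:
            problems.append(
                f"{page.url}: expected {page.expected.value}, observed {page.observed.value}"
            )
    return problems


# Hypothetical inputs: the second page mirrors the situation in this bug.
pages = [
    DisclosedPage("https://valid.example", Status.VALID, Status.VALID),
    DisclosedPage("https://revoked.example", Status.REVOKED, Status.VALID),
]
for problem in br_2_2_problems(pages):
    print(problem)
```

Given these inputs, the check reports the missing expired page and the "revoked" page that is actually serving a valid certificate.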

Q: List of steps your CA is taking to resolve the situation and ensure that such situation or
incident will not be repeated in the future, accompanied with a binding timeline of when
your CA expects to accomplish each of these remediation steps
A: There are no pending remediation steps necessary.

What steps have been taken to resolve this issue?

The erroneous situation itself would have ended well before the next self-audit anyway.

Do you have any further information as to why that would be the case?

The revocation was therefore simply scheduled for the next working day, Monday, 2021-06-14. This was also done, although in the meantime the Bugzilla report in question had already been filed.

Could you explain why you believe it to be OK to not comply with the BR (albeit for a limited time period)?

Attached is the full incident report.

Could you in the future post your incident reports verbatim in the issue, instead of as an attachment?

(In reply to Matthias from comment #4)

Thank you Matthias for your comment. We will consider your feedback for any future communications.

BR section 2.2 specifically requires a CA to host separate web pages that have i.) a valid, ii.) a revoked, and iii.) an expired subscriber certificate. Ergo, the underlying requirements of the M.R.S.P. were not "always followed".

I agree with you that it could have been stated more clearly that the requirements were always followed except in the present incident.

What steps have been taken to resolve this issue?

As indicated in the "timeline", there have been thorough adaptations of internal guidelines. For instance, it is now explicitly stated that this special case is to be treated as a time-critical revocation case, and that the installation of certificates for test websites must be accompanied by staff who are also authorized for certification and revocation.
To further reduce the risk of human error, the system monitoring was improved this morning (that is, after the incident report deadline) to take this case into account.

Do you have any further information as to why that would be the case?

As stated in the report, the revocation was scheduled for Monday, June 14th. The next self-audit, scheduled for the second week of July, would have found a revoked certificate, as would the next external audit (presumably in the first quarter of 2022).

Could you explain why you believe it to be OK to not comply with the BR (albeit for a limited time period)?

GLOBALTRUST would like to make a clear statement that we do not believe that any deviation is "OK". However, I must admit that the word "although" may have left room for this interpretation.

Could you in the future post your incident reports verbatim in the issue, instead of as an attachment?

OK

(In reply to Daniel Zens from comment #5)

BR section 2.2 specifically requires a CA to host separate web pages that have i.) a valid, ii.) a revoked, and iii.) an expired subscriber certificate. Ergo, the underlying requirements of the M.R.S.P. were not "always followed".

I agree with you that it could have been stated more clearly that the requirements were always followed except in the present incident.

You might also have missed your open Bug 1716123 whilst considering your statement, but all right.

What steps have been taken to resolve this issue?

As indicated in the "timeline", there have been thorough adaptations of internal guidelines. For instance, it is now explicitly stated that this special case is to be treated as a time-critical revocation case, and that the installation of certificates for test websites must be accompanied by staff who are also authorized for certification and revocation.
To further reduce the risk of human error, the system monitoring was improved this morning (that is, after the incident report deadline) to take this case into account.

These thorough adaptations of internal guidelines were not thoroughly detailed in the report, making it difficult to determine if these changes were anything more than the worst case of 'we updated the version number after looking at the documents for a few hours'. The items you mentioned would have been great in the "list of steps your CA is taking" section.

The section "List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when" is used to describe in detail the changes made by the CA so that this issue should or will not happen again. As the section was devoid of content, I was expecting a more detailed description of meaningful changes made in your policies.

Could you explain why you believe it to be OK to not comply with the BR (albeit for a limited time period)?

GLOBALTRUST would like to make a clear statement that we do not believe that any deviation is "OK". However, I must admit that the word "although" may have left room for this interpretation.

Based on your incident report:

The internal procedural instructions for this case of certificate issuance, which the employees involved had also adhered to, expressly stipulated that the certificate was to be installed and only then revoked. This is because the server used does not allow the installation of a certificate that has already been revoked. For this reason, there is always a certain "latency period" during which the website displays a certificate that has not been revoked; ideally, of course, this lasts only seconds or minutes.

Assuming that this part of your policy hasn't materially changed, it seems that it is still policy that your 'revoked' web-page will host a non-revoked certificate for some time, each time you replace the certificate; which is a structural BR non-compliance (albeit time-limited, it is non-compliance).

I have additional concerns regarding the (potential lack of) prompt availability of the 'revoked' status in CRL and OCSP if the revocation is only done after installing the certificate on the BR-mandated 'revoked' web page. The 'revoked' status of a certificate is only materially 'revoked' if a relying party can verify that the certificate is revoked, i.e. through a 'revoked' OCSP response, or the certificate appearing in the CRLs.
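The distinction drawn here can be expressed directly: deciding to revoke is one thing, and the revocation being observable by relying parties is another. Below is a minimal Python sketch. It assumes that the OCSP certStatus and the serial numbers in the current CRL have already been fetched and validated elsewhere. The function names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def materially_revoked(serial: int, ocsp_status: Optional[str], crl_serials: set[int]) -> bool:
    """True once a relying party can observe the revocation via OCSP or the CRL.

    ocsp_status is the certStatus of a validated OCSP response ("good",
    "revoked" or "unknown"), or None if no response could be obtained.
    """
    return ocsp_status == "revoked" or serial in crl_serials


def noncompliance_window(installed_at: datetime, revocation_visible_at: datetime) -> timedelta:
    """How long the 'revoked' page served a certificate not yet observably revoked.

    With install-then-revoke ordering this is always positive; if revocation
    data is already visible before the certificate goes live, it is zero.
    """
    return max(revocation_visible_at - installed_at, timedelta(0))


# Hypothetical timestamps: installed at 09:00, revocation visible at 09:07.
print(noncompliance_window(datetime(2021, 1, 4, 9, 0), datetime(2021, 1, 4, 9, 7)))  # 0:07:00
```

Because the window ends when revocation data becomes visible, not when someone clicks "revoke", even a fast manual process leaves a non-zero window whenever installation comes first.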

I want to echo Matthias' Comment #6, which matches my impressions as well.

In particular, it does seem like the current process still leaves the opportunity for non-compliance on a technical level, and also rests heavily on a human/manual process, which we know can be error-prone. I'd be curious to understand whether GLOBALTRUST has given any consideration to more meaningful technical changes to ensure compliance automatically.

For example, to avoid the risk of non-compliance, you could imagine a system that:

  • Replaces the web server with one that definitely allows revoked certificates to be configured.
  • Automates the issuance (and revocation) of the certificate so that it does not require human interaction.
  • Has a process that, prior to the expiration of the current revoked certificate, "stages" the new replacement certificate, ensures revocation data is globally propagated, and then switches out the old certificate for the new one.
  • Performs monitoring and probing, both internal and external, to make sure that the necessary views are being served.
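The staged rotation in the third item could be orchestrated roughly as follows: revoke first, wait until the revocation is observable, then swap. This is a minimal Python sketch, not GLOBALTRUST's system. `Cert`, `rotate_revoked_test_cert` and the `issue`/`revoke`/`revocation_visible`/`deploy` callables are all hypothetical. They stand in for a CA issuance API, an OCSP/CRL probe, and a web server that accepts revoked certificates (the first item).

```python
import time
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass
class Cert:
    serial: int
    not_after: datetime


def rotate_revoked_test_cert(
    current: Cert,
    issue: Callable[[], Cert],
    revoke: Callable[[Cert], None],
    revocation_visible: Callable[[Cert], bool],
    deploy: Callable[[Cert], None],
    now: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
    poll_interval: float = 60.0,
) -> Cert:
    """Issue and revoke the replacement *before* deploying it.

    The new certificate is only swapped in once its revocation is observable,
    so the 'revoked' page never serves a certificate relying parties see as good.
    """
    new = issue()
    revoke(new)
    while not revocation_visible(new):
        # Give up loudly rather than let the current certificate lapse unnoticed.
        if now() >= current.not_after:
            raise RuntimeError("revocation not visible before current certificate expired")
        time.sleep(poll_interval)
    deploy(new)
    return new
```

Because each step is injected as a plain function, the ordering can be unit-tested with fakes before it touches real infrastructure, in the same spirit as testing ceremony scripts with test keys.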

This sort of control is something that could be fully automated, and seems like it would more systemically address the risks here.

It also represents a sort of "design philosophy" that can be considered throughout the CA process: how much can we automate here, and which processes truly need to be manual? Even the issuance of sub-CA certificates has been the source of repeated incidents in the past, which is why it's important to do things like "scripts to generate, which can be tested with test keys", rather than manual ceremonies. Or, for CAs issuing OV/EV certificates, "automated ingestion from the RDS/QGIS", rather than relying on RAs entering data manually/copy-and-pasting.

Flags: needinfo?(ca)

Thank you Matthias and Ryan, it is encouraging to hear your recommendations. We will implement the feedback regarding what is expected in reporting. We also look forward to taking your comments and interpretations into consideration for our further operational and infrastructural activities.

As far as the design philosophy mentioned by Ryan is concerned, I agree completely. It is also our experience and opinion that this is the better approach. We have already successfully replaced error-prone manual steps in many cases. For instance, the issuance and enrollment of OCSP responders and CRLs was fully automated years ago and has provided us with a high-quality service since, which also shows that the technical basis for Ryan's suggestion #2 is already available. We had already considered handling the test pages automatically, but the topic was not given high priority because renewing the test websites is still a comparatively rare event. At this point, however, I cannot reliably say whether the next wave of test websites will already be handled this way.

Flags: needinfo?(ca)

From our viewpoint, this issue has been resolved and nothing new is expected.
Unless the community has any further questions, I request that this bug be closed.

Flags: needinfo?(bwilson)

I will schedule this to be closed on or about next Wed. 2021-07-07, unless I hear otherwise.

Status: ASSIGNED → RESOLVED
Closed: 5 months ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED