Closed Bug 1630870 Opened 4 years ago Closed 4 years ago

GlobalSign: Certificate issued with RSASSA-PSS public key

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: paul.brown, Assigned: paul.brown)

Details

(Whiteboard: [ca-compliance] [dv-misissuance])

This is an initial incident report: GlobalSign has issued one certificate with an RSASSA-PSS public key. A full incident report will follow.

How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

Our post-issuance check flagged a certificate with an RSASSA-PSS key at 14:26 UTC on 16 April 2020.

A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

Investigated certificate and revoked at 15:41 UTC 16 April 2020
Started investigation on why pre-issuance checker failed to flag issue and stop issuance
Checked for any other certificate from our systems with same error

A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.
The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

One certificate: https://crt.sh/?id=2697208174

Assignee: wthayer → paul.brown
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance]

Paul: I wasn't sure, were you planning on filling out a proper incident report explaining why it happened and what's being done to fix it?

The initial report appears incomplete, but I wanted to make sure you were following up on it.

Flags: needinfo?(paul.brown)

Hi Ryan
Yes, we are still planning a full incident report. However, during our investigation we have seen that other certificates with the same key type are being stopped, and we have no other certificate issued with this algorithm, so we have been talking with PrimeKey about why this one in particular got through.

Flags: needinfo?(paul.brown)

Here follows the FULL INCIDENT REPORT for this issue:

  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

Our post-issuance check flagged a certificate issued with an RSASSA-PSS key at 14:26 UTC on 16 April 2020.

  1. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

16 April 2020
Received message from post issuance linter warning of certificate issuance
Investigated certificate and revoked at 15:41 UTC 16 April 2020
Started investigation on why pre-issuance checker failed to flag issue and stop issuance
Checked for any other certificate from our systems with same error

19 April 2020
Engineers went to the data center to check the validator CA configuration and found that the validator was operational, since certificates were being validated.
Checked logs from the 16th and also found certificates were being validated. Opened a ticket with PrimeKey for help in understanding why no validator might have been actioned for this specific certificate.

19-28 April 2020
Continued conversations with PrimeKey about possible causes, reviewing different settings, log files and CA settings; none of this helped find a solution.

29 April 2020
Compliance Team requested extraction of the whole month's logs from the CA and found a change implemented on the 19th which seemed to turn the validator on for that CA. Checked with engineer on what was changed on 19, which was confirmed nothing but that he had clicked on "Save" after selecting validator in CA settings.

30 April 2020
Confirmed on a different CA that highlighting the specific validator on the CA configuration was an additional required step after adding the CA profile to the validator in the validator settings.
Checked in turn that this added the same items to the log details, then sent a certificate to that CA and confirmed that it was validated; highlighting the validator was therefore a required action for enforcing it.

  1. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

Our systems have been updated and we have now correctly added validator and pre-issuance checks for the affected CA.

  1. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.
    The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

One certificate was issued with the problem: https://crt.sh/?id=2697208174

  1. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

The check for "RSA" at the point of CSR input allowed this key through, since RSASSA-PSS was considered part of the whitelisted RSA suite.
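
As an illustration only (this is not taken from the GlobalSign or EJBCA code, and every function and constant name below is hypothetical), the difference between a family-name check and an exact allowlist can be sketched as follows. The sketch assumes the check has access to the algorithm OID from the CSR's SubjectPublicKeyInfo; the OID values are the standard ones for rsaEncryption, id-RSASSA-PSS and id-ecPublicKey, and the contents of the allowlist are an example rather than a policy statement.

```python
# Hypothetical sketch: allowlisting key algorithms by exact OID
# instead of by a loose "RSA" family name.

RSA_ENCRYPTION_OID = "1.2.840.113549.1.1.1"   # rsaEncryption
RSASSA_PSS_OID = "1.2.840.113549.1.1.10"      # id-RSASSA-PSS
EC_PUBLIC_KEY_OID = "1.2.840.10045.2.1"       # id-ecPublicKey

# Example allowlist only; a real one would follow the CA's own policy.
ALLOWED_SPKI_OIDS = frozenset({RSA_ENCRYPTION_OID, EC_PUBLIC_KEY_OID})


def family_name_check(algorithm_name: str) -> bool:
    """Loose check: accepts anything whose name starts with 'RSA'."""
    return algorithm_name.upper().startswith("RSA")


def oid_allowlist_check(spki_algorithm_oid: str) -> bool:
    """Strict check: accepts only OIDs that are explicitly allowlisted."""
    return spki_algorithm_oid in ALLOWED_SPKI_OIDS


if __name__ == "__main__":
    print(family_name_check("RSASSA-PSS"))          # loose check lets it through
    print(oid_allowlist_check(RSASSA_PSS_OID))      # strict check rejects it
    print(oid_allowlist_check(RSA_ENCRYPTION_OID))  # plain RSA still accepted
```

The point of the strict variant is that a new or unexpected algorithm identifier is rejected by default rather than accepted because its name resembles an allowed one.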

Internal documentation for CA setup did not include validator (pre-issuance) configuration setup.

Engineer A (at time of CA setup) presumed that applying CAs to validator was sufficient to start the validation for that CA. Despite the current version of the CA manual more clearly indicating how to configure a validator, the phrasing of the initial version of the manual left room for interpretation, causing assumptions by the engineers. Additionally, the actual interface is confusing since it appears to indicate that the CA is being validated when it is not.

Engineer B (on 19 April), presuming the validator was already set because it was available in the CA settings, tried clicking the validator and hit Save.

The issue took so long to conclude because engineers were focusing on logs from 16 April and before, whereas the fix was introduced on 19 April.

  1. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

Documentation for CA setup has been updated to include validator setup procedure. This step will now include sending a CSR for issuance to the CA and the Engineer shall confirm validator is validating.

We have also agreed the following mitigations, both to prevent future mis-issuance and to help identify the exact state that causes this problem, so that it is not repeated. We can then work with the vendor to fully resolve any problem in the code or confusion in the interface.

June 2020: We will be utilising our log shipping and monitoring tools to perform on-going comparisons between per-CA issuance and validator events, enabling us to identify any failure to submit a precert to a validator for the CAs with this configured. This will be performed on a regular basis.
July - September 2020: We are improving broad compliance checks in our RA application to trap for issues including unsuitable key algorithms before requests are submitted to CA for issuance.
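
The June 2020 comparison between per-CA issuance and validator events could look roughly like the sketch below. This is an assumption-laden illustration, not the actual monitoring tooling: the event shape (dictionaries with `ca` and `serial` keys) and the function name are hypothetical, and real log records would first need to be parsed into this form.

```python
from collections import defaultdict


def find_unvalidated_issuance(issuance_events, validator_events):
    """Return {ca_name: [serials]} for issuance events with no matching validator event.

    Both arguments are iterables of dicts with hypothetical keys 'ca' and 'serial'.
    """
    validated = {(e["ca"], e["serial"]) for e in validator_events}
    missing = defaultdict(list)
    for event in issuance_events:
        if (event["ca"], event["serial"]) not in validated:
            missing[event["ca"]].append(event["serial"])
    return dict(missing)


if __name__ == "__main__":
    issued = [
        {"ca": "CA-A", "serial": "01"},
        {"ca": "CA-B", "serial": "02"},
        {"ca": "CA-B", "serial": "03"},
    ]
    validated = [
        {"ca": "CA-A", "serial": "01"},
        {"ca": "CA-B", "serial": "03"},
    ]
    # Any CA listed in the result issued something the validator never saw.
    print(find_unvalidated_issuance(issued, validated))
```

A non-empty result for a CA that is supposed to have a validator configured would be the signal to investigate its configuration.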

Thanks for the update. I'm having difficulty understanding the root cause and the mitigation, although I appreciate you sharing details about the investigation.

Checked with engineer on what was changed on 19, which was confirmed nothing but that he had clicked on "Save" after selecting validator in CA settings.

I'm not sure how to parse "which was confirmed nothing but that" - could you help rephrase this?

Confirmed on a different CA that highlighting the specific validator on the CA configuration was an additional required step after adding the CA profile to the validator in the validator settings.

I'm not sure I fully understand this, but it sounds like it's describing where things broke down. Could you provide a bit more description here of the situation and the problem, assuming a lack of familiarity with this particular interface?

Despite the current version of the CA manual more clearly indicating how to configure a validator, the phrasing of the initial version of the manual left room for interpretation, causing assumptions by the engineers. Additionally, the actual interface is confusing since it appears to indicate that the CA is being validated when it is not.

Could you clarify a bit more detail about "initial version of the manual" and "current version"? It's unclear if "current version" means after this issue was identified, or if the initial version was some version much older and the engineer had not been retrained on the updates?

It does sound that there's something that might be applicable to all CAs (running on PrimeKey software) due to potential confusion, if I'm understanding correctly. Does that sound correct? I'm trying to figure if it would behove Mozilla and/or PrimeKey to communicate this to participating CAs, to ensure things are configured.

This step will now include sending a CSR for issuance to the CA and the Engineer shall confirm validator is validating.

This sounds like a good improvement. Have you systemically examined other configuration steps for opportunities to confirm things are working correctly (i.e. beyond validators)?

Flags: needinfo?(paul.brown)

Hi Ryan,

Apologies for the delay.

Thanks for the update. I'm having difficulty understanding the root cause and the mitigation, although I appreciate you sharing details about the investigation.

Checked with engineer on what was changed on 19, which was confirmed nothing but that he had clicked on "Save" after selecting validator in CA settings.

    I'm not sure how to parse "which was confirmed nothing but that" - could you help rephrase this

What we meant was that the engineer believed he was not changing any settings. In fact, by highlighting the validator and clicking "Save", he changed the settings without realising it.

Confirmed on a different CA that highlighting the specific validator on the CA configuration was an additional required step after adding the CA profile to the validator in the validator settings.

2. I'm not sure I fully understand this, but it sounds like it's describing where things broke down. Could you provide a bit more description here of the situation and the problem, assuming a lack of familiarity with this particular interface?

The configuration of a validator requires 1) enabling the CA in the validator-specific configuration and 2) highlighting the validator in the CA-specific configuration.

The user interface for (1) the validator configuration indicates "Apply for certificate profiles" with a multi-selection box.

The user interface for (2) the CA configuration states "Other data", "Validators" and a multi-selection box.

Adding the CA to the validator in (1) does not turn it on. Given that the validator was shown in (2), it was understood as being enabled by the engineers. However, it requires explicit highlighting in the CA configuration list (2) to enforce the validator.
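
The two-step dependency described above can be modelled in a few lines. This is only a simplified sketch of the logic as explained in this comment, not EJBCA's data model or API: the class names, field names and the assumption of one certificate profile per CA are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ValidatorConfig:
    name: str
    # Step (1): profiles chosen under "Apply for certificate profiles".
    applied_profiles: set = field(default_factory=set)


@dataclass
class CaConfig:
    name: str
    profile: str  # simplification: one certificate profile per CA
    # Step (2): validators highlighted under "Other data" > "Validators".
    selected_validators: set = field(default_factory=set)


def is_enforced(validator: ValidatorConfig, ca: CaConfig) -> bool:
    """The validator runs for this CA only if BOTH steps were done."""
    return (ca.profile in validator.applied_profiles
            and validator.name in ca.selected_validators)


def half_configured(validator: ValidatorConfig, cas) -> list:
    """CAs where step (1) was done but step (2) was missed."""
    return [ca.name for ca in cas
            if ca.profile in validator.applied_profiles
            and validator.name not in ca.selected_validators]


if __name__ == "__main__":
    lint = ValidatorConfig("lint", {"tls-server"})
    cas = [CaConfig("CA-A", "tls-server", {"lint"}),
           CaConfig("CA-B", "tls-server")]
    print(half_configured(lint, cas))
```

A report of half-configured CAs like this is the state the affected CA was in before 19 April: visible in both screens, but not enforced.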

Despite the current version of the CA manual more clearly indicating how to configure a validator, the phrasing of the initial version of the manual left room for interpretation, causing assumptions by the engineers. Additionally, the actual interface is confusing since it appears to indicate that the CA is being validated when it is not.

3. Could you clarify a bit more detail about "initial version of the manual" and "current version"? It's unclear if "current version" means after this issue was identified, or if the initial version was some version much older and the engineer had not been retrained on the updates? It does sound that there's something that might be applicable to all CAs (running on PrimeKey software) due to potential confusion, if I'm understanding correctly. Does that sound correct? I'm trying to figure if it would behove Mozilla and/or PrimeKey to communicate this to participating CAs, to ensure things are configured.

The original manual is from 2019; the latest manual is from the latest version of the software (released this year). Indeed, the engineer had not been retrained, nor had our CA implementation guide been updated.

This step will now include sending a CSR for issuance to the CA and the Engineer shall confirm validator is validating.

4. This sounds like a good improvement. Have you systemically examined other configuration steps for opportunities to confirm things are working correctly (i.e. beyond validators)?

Yes, we examined each of our steps for configuring a new CA to verify that there was a check in place confirming it was working as expected. The additional item we have added is the log check for validator functionality.

Flags: needinfo?(paul.brown)

Thanks. I think this provides a bit more useful context to evaluate where things went wrong.

To recap from Comment #3, here's my understanding of the current mitigations:

  • 2020-04-30: Documentation has been updated to better document how to set up validators.
  • 2020-04-30: Playbook for updating validators includes sending a CSR that the validator should reject, to confirm the validator is validating.
  • 2020-06-??: Log shipping/monitoring tools configured to compare, on an ongoing/automated basis, per-CA issuance/validator events, to ensure that expected and actual configurations are correct.
  • 2020-09-??: Updating the RA platform to block these earlier in the process, so that they're rejected before the validator process.

I think Comment #5 identified several other failure modes / concerns that might be useful to systemically address:

  • It seems that the CA interface itself is error-prone and thus lends itself to these mistakes. Have you thought about changing the interface (either directly or by working with your vendor) to ensure better user interface design?
  • It seems that there were changes to the CA software that didn't lead to retraining.
    • In this case, it seems it was useful clarifications that could have mitigated this, as opposed to an actual change. However, it seems it's also possible that changes can be introduced that are meaningful and different from previous versions.
    • Have you examined how you're managing that process? CA software updates, RA software updates, retraining, clarifications, etc? It seems that if this issue had been spotted, the improved process, documentation updates, and training could have been accomplished sooner.
Flags: needinfo?(paul.brown)
Thanks. I think this provides a bit more useful context to evaluate where things went wrong.

To recap from Comment #3, here's my understanding of the current mitigations:

    2020-04-30: Documentation has been updated to better document how to set up validators.
    2020-04-30: Playbook for updating validators includes sending a CSR that the validator should reject, to confirm the validator is validating.
    2020-06-??: Log shipping/monitoring tools configured to compare, on an ongoing/automated basis, per-CA issuance/validator events, to ensure that expected and actual configurations are correct.
    2020-09-??: Updating the RA platform to block these earlier in the process, so that they're rejected before the validator process.

Yes, this understanding is correct.

I think Comment #5 identified several other failure modes / concerns that might be useful to systemically address:

    It seems that the CA interface itself is error-prone and thus lends itself to these mistakes. Have you thought about changing the interface (either directly or by working with your vendor) to ensure better user interface design?

We are still in an ongoing conversation with the provider about this particular issue. Additionally, we expect to migrate to our new platform (Atlas). The migration to Atlas happens on a per-product basis, with enterprise TLS products planned for 2020-2021 and retail SSL for 2021-2022. Once the full migration is completed, the current CA software will be decommissioned.

    It seems that there were changes to the CA software that didn't lead to retraining.

        In this case, it seems it was useful clarifications that could have mitigated this, as opposed to an actual change. However, it seems it's also possible that changes can be introduced that are meaningful and different from previous versions.

        Have you examined how you're managing that process? CA software updates, RA software updates, retraining, clarifications, etc? It seems that if this issue had been spotted, the improved process, documentation updates, and training could have been accomplished sooner.

There were no changes to the software, only to the documentation. The particular section that was updated (i.e. configuring validators) was not raised for review because it was not linked to an actual code, software or functionality change for the validators. Normally we would only review our documentation when a software change has taken place (e.g. if there is a specific ticket attached to a new software release or feature).

However, our original reading of the initial documentation led us to write an implementation guide that omitted the second step of the configuration, which in turn led to a new engineer not fully implementing the feature.

Flags: needinfo?(paul.brown)

Is the documentation public?

I ask, because it seems important to share with other CAs to ensure they can similarly examine their systems and verify correct configuration, and having references we can point to is important.

Flags: needinfo?(paul.brown)

Kathleen: I think you may be interested in this bug, as it appears to have been caused by confusion around how to properly configure EJBCA (Comment #6), which EJBCA has updated documentation for (see Comment #9). It might be useful for a CA communication and/or reminder for CAs to examine their linting/validator configurations to ensure that they are as expected. Or could rely on CAs following Bugzilla, which I doubt they do :)

I'm setting Next-Update based on the Comment #6 deliverable of:

2020-06-??: Log shipping/monitoring tools configured to compare, on an ongoing/automated basis, per-CA issuance/validator events, to ensure that expected and actual configurations are correct.

Flags: needinfo?(kwilson)
Whiteboard: [ca-compliance] → [ca-compliance] Next update - 30-June, 2020

(In reply to Ryan Sleevi from comment #10)

Kathleen: I think you may be interested in this bug, as it appears to have been caused by confusion around how to properly configure EJBCA (Comment #6), which EJBCA has updated documentation for (see Comment #9). It might be useful for a CA communication and/or reminder for CAs to examine their linting/validator configurations to ensure that they are as expected. Or could rely on CAs following Bugzilla, which I doubt they do :)

Thanks, Ryan. Ben is going to take care of messaging about EJBCA expectations in m.d.s.p (and maybe refer to the discussion from CABF), so clearing my NI.

Flags: needinfo?(kwilson)
QA Contact: wthayer → bwilson
Flags: needinfo?(bwilson)

I sent the following email to the m.d.s.p. list:

Often CA configurations and settings are complex and can be difficult to manage. We would like to remind CA operators that they need to be familiar with the configuration and operation of all aspects of CA software and ensure that they have adequate documentation and training.
For example, in April, a CA operator in the Mozilla Root Program received a post-issuance warning that a certificate with an RSASSA-PSS key had made it through the EJBCA pre-issuance check.[1][2] Apparently, “Check for RSA” on CSR input allowed an RSASSA-PSS key through because it was considered part of the RSA suite that was whitelisted. Internal documentation for CA setup did not include correct validator (pre-issuance) configuration setup.
The CA operator started an investigation into why this occurred. Upon investigation the CA operator discovered that the validator had started functioning due to a configuration change occurring unbeknownst to an engineer when he clicked on save after selecting the validator in CA settings. The CA operator explained that highlighting the specific validator was an additional required step after adding a certificate profile in the validator settings. This additional step was not clearly stated in the CA software manual.
The vendor has explained that this misunderstanding was due to the fact that validators need to be enabled on a certificate-profile basis, in order to allow the same CA to host multiple profiles without validators conflicting with each other. As certificate profiles can be shared amongst multiple CAs, the validator needs to be selected there as well.
The vendor also recommends that CA operators use the provided human readable configuration export tool to run and diff after upgrades and configuration changes to verify that nothing unintended has changed.
In summary, the general purpose of this email is to urge all CA operators to be familiar with configuration processes of the CA software that they use, and specifically to alert users of EJBCA to the procedural measures described above.
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1630870
[2] EJBCA software by Primekey has a pre-issuance “validator” system for keys, amongst which an external validator to run linters. See https://doc.primekey.com/ejbca/ejbca-operations/ejbca-ca-concept-guide/validators-overview/post-processing-validators
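
The vendor's recommendation to export the configuration and diff it after changes can be approximated with standard tooling. The sketch below is an illustration only: it assumes the export is plain text, it does not invoke or reproduce the EJBCA export tool itself, and the function name and sample export lines are hypothetical.

```python
import difflib


def config_changes(before: str, after: str) -> list:
    """Return only the removed ('- ') and added ('+ ') lines between two exports."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


if __name__ == "__main__":
    # Made-up export content, purely for demonstration.
    before = "ca: Example CA\nvalidators: []\n"
    after = "ca: Example CA\nvalidators: [lint]\n"
    for line in config_changes(before, after):
        print(line)
```

Reviewing such a diff after each upgrade or configuration change makes an unintended or unnoticed setting change, like the one on 19 April, visible at the time it happens.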

Flags: needinfo?(bwilson)

As a follow-up to this incident report, we would like to report that log shipping has been put in place and validator events have been successfully compared for the past two months on a per-CA basis to verify the expected configuration. We have also added configuration parsing to verify that the validator is configured on the appropriate CAs, including CAs where no certificates were issued during the period.
Could you confirm whether more information is required, or whether this ticket can now be closed?

I am closing this ticket as completed/fixed.

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] Next update - 30-June, 2020 → [ca-compliance] [dv-misissuance]