Closed Bug 1970968 Opened 2 months ago Closed 1 month ago

Microsoft PKI Services: Incorrect Revocation Reason Code

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: CentralPKI, Assigned: CentralPKI)

Details

(Whiteboard: [ca-compliance] [crl-failure])

Attachments

(1 file)

Preliminary Incident Report

Summary

  • Incident Description: Microsoft PKI Services selected the wrong Revocation Reason code while revoking a batch of 25,000 certificates yesterday afternoon (2025-06-05 ~4 PM PDT). We intended to select “superseded”, but the batch was inadvertently marked as “affiliationChanged”.
    Please note that this incident arose from remediation activities being performed for https://bugzilla.mozilla.org/show_bug.cgi?id=1965612.

  • Relevant policies:

    • Adherence to CPS definitions of Revocation Reasons - Section 4.9.3.1 of v3.2.1 of the MS PKI Services Public TLS CPS
    • Mozilla Root Store Requirements – Section 6.0 Revocation
  • Source of incident disclosure:
    MS PKI Services detected the incorrect reason code shortly after executing the revocation (2025-06-05 ~6PM PDT).

Assignee: nobody → CentralPKI
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance] [crl-failure]

Full Incident Report

Summary

  • CA Owner CCADB unique ID:
    A002577
  • Incident description:
    Microsoft PKI Services selected the wrong Revocation Reason code while revoking a batch of certificates as part of a repair item associated with Bug 1965612. While revoking a batch of 25,000 certificates at approx. 4:00 PM PDT on 2025-06-05, our Certificate Authority operator selected an incorrect Revocation Reason code. We intended to select “superseded” but inadvertently marked this batch as “affiliationChanged”.
  • Timeline summary:
    • Non-compliance start date:
      2025-06-05 23:00:00
    • Non-compliance identified date:
      2025-06-06 01:00:00
    • Non-compliance end date:
      2025-06-06 01:00:00
  • Relevant policies:
    • Adherence to CPS definitions of Revocation Reasons - Section 4.9.3.1 of v3.2.1 of the MS PKI Services Public TLS CPS
    • Mozilla Root Store Requirements – Section 6.0 Revocation
  • Source of incident disclosure:
    MS PKI Services detected the incorrect reason code shortly after executing the revocation (2025-06-05 ~6PM PDT).
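
The timeline summary dates above line up with the Pacific times quoted in the description if they are read as UTC. The report does not state a time zone for them, so UTC is an assumption here. A minimal sketch of that conversion, using a fixed UTC-7 offset for Pacific Daylight Time (which is in effect in June):

```python
from datetime import datetime, timedelta, timezone

# Assumption: the timeline summary values are UTC; the report does not say so.
PDT = timezone(timedelta(hours=-7), "PDT")  # Pacific Daylight Time, UTC-7


def to_pdt(utc_text: str) -> str:
    """Convert a 'YYYY-MM-DD HH:MM:SS' UTC timestamp to a PDT string."""
    utc_time = datetime.strptime(utc_text, "%Y-%m-%d %H:%M:%S").replace(
        tzinfo=timezone.utc
    )
    return utc_time.astimezone(PDT).strftime("%Y-%m-%d %H:%M %Z")


start = to_pdt("2025-06-05 23:00:00")       # non-compliance start date
identified = to_pdt("2025-06-06 01:00:00")  # non-compliance identified date
print(start)
print(identified)
```

Under that assumption the start time corresponds to roughly 4 PM PDT and the identification time to roughly 6 PM PDT on 2025-06-05, matching the description.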

Impact

  • Total number of certificates:
    25,000
  • Total number of "remaining valid" certificates:
    0 (all were revoked in the batch)
  • Affected certificate types:
    Organization Validated TLS Subscriber Certificates
  • Incident heuristic:
    These subscriber certificates were associated with the remediation activities for Bug 1965612 and are listed in the Appendix. All were revoked on 2025-06-05 with reason code affiliationChanged, and all were associated with Microsoft Azure TLS RSA 03 CA.
  • Was issuance stopped in response to this incident, and why or why not?:
    No. Issuance was not stopped as this was a revocation-side process failure. Certificate issuance processes and profiles were not impacted.
  • Analysis:
    Not applicable. The Whiteboard field does not contain revocation-delay, and no delay occurred. All affected certificates were revoked as part of the incident.
  • Additional considerations:
    N/A
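
The incident heuristic above amounts to a three-part filter: revocation date, reason code, and issuing CA. A minimal sketch of that filter over revocation records follows. The record layout, field names, and sample serial numbers are hypothetical and are not taken from Microsoft's tooling; only the date, reason code, and CA name come from this report.

```python
from datetime import date

# Values taken from the incident heuristic in this report.
REVOKED_ON = date(2025, 6, 5)
REASON = "affiliationChanged"
ISSUER = "Microsoft Azure TLS RSA 03 CA"


def matches_heuristic(record: dict) -> bool:
    """Return True if a revocation record falls within the affected set."""
    return (
        record["revoked_on"] == REVOKED_ON
        and record["reason"] == REASON
        and record["issuer"] == ISSUER
    )


# Hypothetical sample records, for illustration only.
records = [
    {"serial": "0A01", "revoked_on": date(2025, 6, 5),
     "reason": "affiliationChanged", "issuer": ISSUER},
    {"serial": "0A02", "revoked_on": date(2025, 6, 5),
     "reason": "superseded", "issuer": ISSUER},
    {"serial": "0A03", "revoked_on": date(2025, 5, 29),
     "reason": "affiliationChanged", "issuer": ISSUER},
]

affected = [r["serial"] for r in records if matches_heuristic(r)]
print(affected)
```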

Timeline

  • 2025-04-25 Third-party researcher emailed a Certificate Problem Report to Microsoft PKI Services identifying mismatches between Subscriber certificates and CPS document language related to bug 1962829
  • 2025-05-09 Bug 1965612 was opened for failure to revoke certificates impacted in bug 1962829 in accordance with BR 4.9.1.1.
  • 2025-05-29 Batched bulk revocation began as a repair item for Bug 1965612
  • 2025-06-04 Final scheduling of revocation batch for 2025-06-05 completed
  • 2025-06-05 Revocation batch executed. 25,000 certificates revoked with reason code affiliationChanged
  • 2025-06-05 Non-compliance identified: human review flagged an anomaly in revocation reason usage
  • 2025-06-06 Microsoft PKI team initiated internal incident review
  • 2025-06-06 Incident disclosed in Bugzilla as Bug 1970968

Related Incidents

Bug | Date Opened | Description
1965612 | 2025-05-09 | This incident triggered the revocation batch during which the incorrect reason code (“affiliationChanged”) was used.
1793467 | 2022-10-03 | Google Trust Services used an incorrect reason code in CRLs (“certificateHold”), similar in nature to this issue where the wrong revocation metadata was applied.
1907949 | 2024-07-15 | iTrusChina included an undefined revocation reason code (“7”) in CRLs, representing a parallel case of misapplying revocation metadata.
1914365 | 2024-08-22 | SHECA made several revocations, but later found that the CRLReason code of some revoked certificates does not comply with regulations. These certificates should not use privilegeWithdrawn as the CRLReason code.

Root Cause Analysis

Contributing Factor #1: Revocation tooling defaults to affiliationChanged

  • Description:
    The administrative revocation tooling we used pre-populated affiliationChanged as the default reason code, with the option to change it. The reason codes are displayed in alphabetical order, so affiliationChanged appears first. The peer approval step also did not catch this error.
    This introduces a latent risk: if the operator forgets or overlooks the field, the default is silently applied.

  • Timeline:

    • 2021-03-30 Tooling updated revocation reasons to be displayed in alphabetical order
    • 2025-06-05 Revocation batch of 25,000 certificates was submitted and processed with the revocation reason code “affiliationChanged”
  • Detection:
    The issue was identified shortly after execution on June 5, 2025, through human review that flagged the unexpected reason code. Prior to this, the default behavior had not resulted in a visible issue, in part because correct revocation reasons were selected by operators.

  • Interaction with other factors:
    The presence of a default value amplified the risk of human oversight. In this incident, the operator unintentionally left the default unchanged, and the reviewers did not notice that the selection had not been updated.

  • Root Cause Analysis methodology used:
    5-Whys and postmortem tool configuration review.
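
The contributing factor described above is that a pre-selected reason code is applied whenever the operator does not touch the field. One way to remove that risk is to give the field no default at all and reject any request that omits it. The following is a minimal sketch of that idea, not Microsoft's tooling: the names `RevocationReason`, `RevocationBatch`, and `build_batch` are hypothetical, and the enum lists only the reason codes commonly permitted for TLS subscriber certificates.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence, Tuple


class RevocationReason(Enum):
    # Listed alphabetically; note that affiliationChanged sorts first,
    # which is why an alphabetical list pre-selects it.
    AFFILIATION_CHANGED = "affiliationChanged"
    CESSATION_OF_OPERATION = "cessationOfOperation"
    KEY_COMPROMISE = "keyCompromise"
    PRIVILEGE_WITHDRAWN = "privilegeWithdrawn"
    SUPERSEDED = "superseded"


@dataclass(frozen=True)
class RevocationBatch:
    serial_numbers: Tuple[str, ...]
    reason: RevocationReason


def build_batch(
    serial_numbers: Sequence[str],
    reason: Optional[RevocationReason] = None,
) -> RevocationBatch:
    """Build a batch only when the operator has explicitly chosen a reason."""
    if reason is None:
        # No silent fallback: a missing selection is an error, not a default.
        raise ValueError("revocation reason must be selected explicitly")
    if not isinstance(reason, RevocationReason):
        raise TypeError("reason must be a RevocationReason member")
    return RevocationBatch(tuple(serial_numbers), reason)


batch = build_batch(["01", "02"], RevocationReason.SUPERSEDED)
print(batch.reason.value)
```

With this shape, calling `build_batch(["01", "02"])` without a reason raises `ValueError` instead of producing a batch marked affiliationChanged.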

Lessons Learned

  • What went well:
    • Timely detection: The anomaly in the revocation reason code was flagged within approximately two hours of execution, allowing for rapid containment and correction.
  • What didn’t go well:
    • Tooling defaulted to an inappropriate revocation reason: The revocation interface automatically populated affiliationChanged, which was inappropriate for bulk revocations tied to certificate replacement. The default was silently accepted unless manually changed, and there were no safeguards or prompts to confirm the selected reason.
    • Approval process failed to verify revocation reason code: Although peer approval was in place, the workflow lacked an explicit expectation to validate the revocation reason. Reviewers focused on batch scope and volume, and the reason code field was overlooked.
    • Lack of automated validation of reason-code during mass revocation: The tooling did not include logic to detect mismatches between the type of revocation being performed (e.g., superseded due to planned replacement) and the selected reason code.
  • Where we got lucky:
    • The misuse of affiliationChanged did not result in operational trust issues: Had this incident involved a more severe misuse (e.g., using keyCompromise incorrectly) it could have led to larger consequences or misinterpretation of the incident by relying parties.
  • Additional:
    N/A

Action Items

Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status
Modify revocation tooling to remove affiliationChanged as the default reason code and require explicit selection by the operator. | Prevent | Root Cause #1 | Tooling commits and release notes publicly document change; internal validation confirms no default is applied. Can be verified via updated internal policy and change log audit. | TBD | New
Implement contextual validation in revocation tooling to flag inconsistent reason codes (e.g., flag use of affiliationChanged for supersession events). | Detect | Root Cause #1 | Validation logic is unit-tested and enforced; logs show alert behavior on test case injection. Compliance reviews include test results. | TBD | New
Update revocation SOP and reviewer checklist to include explicit review of reason codes as part of the approval process. | Prevent | Root Cause #1 | SOP and checklist updates published to internal documentation; reviewers sign off on usage; effectiveness measured by audit sampling. | TBD | New
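
The contextual validation action item above can be pictured as a lookup from the operational context of a batch to the reason code expected for it, with any mismatch flagged before execution. The sketch below is illustrative only: the context labels, the mapping, and the function name `find_reason_mismatches` are hypothetical, and only the pairing of certificate replacement with “superseded” reflects this report.

```python
# Hypothetical mapping from batch context to the expected reason code.
# Only the "certificate_replacement" entry reflects this incident.
EXPECTED_REASON_BY_CONTEXT = {
    "certificate_replacement": "superseded",
    "key_compromise_report": "keyCompromise",
}


def find_reason_mismatches(context: str, selected_reason: str) -> list:
    """Return human-readable findings; an empty list means no mismatch."""
    findings = []
    expected = EXPECTED_REASON_BY_CONTEXT.get(context)
    if expected is None:
        # Unknown contexts are not silently accepted.
        findings.append(
            f"no expected reason registered for context '{context}'; "
            "manual review required"
        )
    elif selected_reason != expected:
        findings.append(
            f"context '{context}' expects '{expected}' "
            f"but '{selected_reason}' was selected"
        )
    return findings


# The combination from this incident is flagged; the intended one is not.
flagged = find_reason_mismatches("certificate_replacement", "affiliationChanged")
clean = find_reason_mismatches("certificate_replacement", "superseded")
print(flagged)
print(clean)
```

A check like this could run both in the tooling and as a line item on the reviewer checklist, so that the reason code is verified independently of batch scope and volume.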

Appendix

Related Incidents/Documentation:

Weekly Update


We have updated the due dates for action items mentioned in our Full Incident Report

Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status
Modify revocation tooling to remove affiliationChanged as the default reason code and require explicit selection by the operator. | Prevent | Root Cause #1 | Tooling commits and release notes publicly document change; internal validation confirms no default is applied. Can be verified via updated internal policy and change log audit. | 6/20/2025 | Done
Implement contextual validation in revocation tooling to flag inconsistent reason codes (e.g., flag use of affiliationChanged for supersession events). | Detect | Root Cause #1 | Validation logic is unit-tested and enforced; logs show alert behavior on test case injection. Compliance reviews include test results. | TBD | New
Update revocation SOP and reviewer checklist to include explicit review of reason codes as part of the approval process. | Prevent | Root Cause #1 | SOP and checklist updates published to internal documentation; reviewers sign off on usage; effectiveness measured by audit sampling. | 6/20/2025 | Done

Weekly Status Update


We’ve closed two repair items related to the reported bugs. One remaining item—contextual validation for revocation tooling reason codes—will be removed from public tracking, as the underlying risk is already mitigated through complementary safeguards we’ve put in place as part of the first repair item. Internally, it will be managed as part of our ongoing quality improvements to ensure consistency and long-term resilience.

Action Item Description | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status
Modify revocation tooling to remove affiliationChanged as the default reason code and require explicit selection by the operator. | Prevent | Root Cause #1 | Tooling commits and release notes publicly document change; internal validation confirms no default is applied. Can be verified via updated internal policy and change log audit. | 6/20/2025 | Done
Update revocation SOP and reviewer checklist to include explicit review of reason codes as part of the approval process. | Prevent | Root Cause #1 | SOP and checklist updates published to internal documentation; reviewers sign off on usage; effectiveness measured by audit sampling. | 6/20/2025 | Done

Report Closure Summary


  • Incident description:
    Microsoft PKI Services selected the wrong Revocation Reason code while revoking a batch of certificates as part of a repair item associated with Bug 1965612. While revoking a batch of 25,000 certificates at approx. 4:00 PM PDT on 2025-06-05, our Certificate Authority operator selected an incorrect Revocation Reason code. We intended to select “superseded” but inadvertently marked this batch as “affiliationChanged”.

  • Incident Root Cause(s):
    The revocation tooling pre-selected affiliationChanged as the default reason code because reason codes are displayed in alphabetical order. This default behavior introduced a latent risk of unintentional selection if operators overlooked the field. In the affected batch, peer reviewers also failed to detect the incorrect reason code, allowing it to proceed. Tooling that did not force the operator to make a selection, combined with human oversight, led to the inappropriate use of affiliationChanged for supersession events. Root cause analysis was conducted using the 5-Whys methodology and a postmortem review of tool configuration.

  • Remediation description:
    Three action items were identified and addressed in response to this incident. These included: (1) removing the default reason code from revocation tooling, (2) updating internal SOPs and reviewer checklists, and (3) evaluating implementation of contextual validation. The first two were completed and publicly tracked. As mentioned above for the third item, the underlying risk is already mitigated through complementary safeguards we’ve put in place.

  • Commitment summary:
    All identified action items were completed. Beyond these action items, we remain committed to continuous improvements in our tooling to minimize opportunities for human errors.

All Action Items disclosed in this report have been completed as described, and we request its closure.

Flags: needinfo?(incident-reporting)

This is a final call for comments or questions on this Incident Report.

Otherwise, it will be closed on approximately 2025-07-08.

Whiteboard: [ca-compliance] [crl-failure] → [close on 2025-07-08] [ca-compliance] [crl-failure]

Weekly Status Update


The closure report associated with this bug has been submitted. Please close if no other comments are provided.

Status: ASSIGNED → RESOLVED
Closed: 1 month ago
Flags: needinfo?(incident-reporting)
Resolution: --- → FIXED
Whiteboard: [close on 2025-07-08] [ca-compliance] [crl-failure] → [ca-compliance] [crl-failure]