Closed Bug 1628292 Opened 4 years ago Closed 4 years ago

Buypass: Failure to revoke PSD2 QWACs within mandated 5 days

Categories

(CA Program :: CA Certificate Compliance, task)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mads.henriksveen, Assigned: mads.henriksveen)

Details

(Whiteboard: [ca-compliance] [leaf-revocation-delay])

This is an incident report for a failure to revoke 16 PSD2 QWACs issued by Buypass within the BR-mandated 5 days.

The issuance of the affected certificates is described in bug 1626078 where we reported this as an issue (problem) and not a misissuance.

  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

Buypass became aware of a problem with the PSD2 QWACs through an email received from a member of the PSD2 community on Thursday 26 March 2020 (see bug 1626078 for details).

  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

All times are local Norwegian times (CEST).

2020-03-26, 13:02: Buypass received the email notifying us about the problem.

2020-03-26, 15:07: We replied to the notifier and acknowledged the problem.

2020-03-27, 16:00: A fix was implemented and verified in test.

2020-03-29, 17:00: The fix was deployed to production and verified before we resumed issuance of the PSD2 QWACs that had been postponed because of this issue.

2020-03-30, 22:01: We reported this as an incident (bug 1626078).

2020-03-30, 22:26: Ryan Sleevi commented on the bug and made us aware that this was a misissuance and should be handled accordingly.

2020-03-30, 23:30: We decided to handle this as misissuance and replace and revoke all affected certificates.

2020-03-31, 01:30: We contacted all customers and informed them about what to do over the following days.

2020-04-04, 18:30: All affected certificates were revoked.
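
As a rough illustration of the revocation deadline implied by this timeline, the sketch below computes when a 5-day window would have ended if counted from the first notification, and how far past it the actual revocation fell. It assumes the window is counted as 5 x 24 hours from the time of awareness and treats the timestamps as naive local times, ignoring time-zone and daylight-saving offsets, so the result is approximate to within an hour. The function and constant names are hypothetical.

```python
from datetime import datetime, timedelta

# Timestamps copied from the timeline (local Norwegian time, treated as naive).
AWARE = datetime(2020, 3, 26, 13, 2)    # email notifying Buypass of the problem
REVOKED = datetime(2020, 4, 4, 18, 30)  # all affected certificates revoked

# Assumption: the 5-day window is counted as 5 x 24 hours from awareness.
REVOCATION_WINDOW = timedelta(days=5)


def revocation_deadline(aware: datetime) -> datetime:
    """Latest revocation time if the window starts at the moment of awareness."""
    return aware + REVOCATION_WINDOW


def delay_past_deadline(aware: datetime, revoked: datetime) -> timedelta:
    """Time by which revocation exceeded the deadline (zero if it was on time)."""
    return max(revoked - revocation_deadline(aware), timedelta(0))


deadline = revocation_deadline(AWARE)
overdue = delay_past_deadline(AWARE, REVOKED)

print("deadline:", deadline.isoformat(sep=" ", timespec="minutes"))  # 2020-03-31 13:02
print("overdue: ", overdue)                                          # 4 days, 5:28:00
```

Under these assumptions the window would have closed on 2020-03-31 at 13:02, so the revocation on 2020-04-04 at 18:30 came a little over four days late.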

  3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

Buypass stopped issuing PSD2 QWACs immediately upon becoming aware of the problem.

  4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

The 16 affected certificates were issued in the period February 4th 2020 to March 20th 2020.

  5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

The affected certificates are:
https://crt.sh/?id=2557648789
https://crt.sh/?id=2528429797
https://crt.sh/?id=2512093551
https://crt.sh/?id=2503683683
https://crt.sh/?id=2604810665
https://crt.sh/?id=2471346305
https://crt.sh/?id=2599785898
https://crt.sh/?id=2599780360
https://crt.sh/?id=2591181055
https://crt.sh/?id=2591109199
https://crt.sh/?id=2591055509
https://crt.sh/?id=2591042445
https://crt.sh/?id=2591008891
https://crt.sh/?id=2590154860
https://crt.sh/?id=2590152271
https://crt.sh/?id=2414909564

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

We defined this as an issue and not a misissuance when we were first notified. The reason was partly that the initial notification presented it as a possible ambiguity, and partly our erroneous understanding of the exact encoding of the cabfOrganizationIdentifier.

However, we did consider it an issue, and resolved it by implementing a fix according to a recommendation given in the PSD2 community.

Based on this we also decided to report this as an incident to share this information with the broader community.

It was the feedback on this incident that made it clear to us that this was a violation of the EV Guidelines. We must emphasize that we do not intend to place the responsibility for such decisions on the root stores. It was simply this clarification that made us aware of our misunderstanding.

Therefore we acknowledge that the affected certificates should have been defined as misissuance at the time we became aware of the problem and revoked within 5 days from that point in time.

  7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

Buypass is focused on being a trusted CA and we put much effort into complying with all relevant requirements, including BR and EVGL.

We acknowledge that the decision to define this as an issue and not a misissuance was taken on the wrong basis. This was a human decision and we will try to ensure that such important decisions are more thoroughly handled in Buypass to reduce the probability of making wrong decisions in the future. We will tighten up our procedures for taking such decisions, e.g. by involving more persons.

We will also consider to propose improvements to the language in EVGL 9.8.2 to remove any ambiguity causing such misunderstanding.

Assignee: wthayer → mads.henriksveen
Status: UNCONFIRMED → ASSIGNED
Type: defect → task
Ever confirmed: true
Whiteboard: [ca-compliance]

Mads: I appreciate the "human decision" and that Buypass "will try to ensure that such important decisions are more thoroughly handled".

The goal of these postmortems isn't about assigning individual blame, it's about trying to work collaboratively, as an industry, to find out ways we can improve this, and implement those improvements.

I haven't seen any proposed language changes from Buypass on the particular language, but I think it'd also be useful and important to understand how Buypass plans to prevent such issues going forward.

For example, if there is an incident report, and it's revealed that the language is ambiguous, does Buypass have a process to gather feedback from the broader community in order to assist in that (e.g. the questions@cabforum.org list, asking Root Programs)?

If there is an ambiguity, has Buypass considered failing closed - that is, preemptively revoking the certificates if they /might/ be wrong, rather than not revoking and assuming they're fine?

These are the sorts of process changes that can help more quantifiably progress and address these issues.

Similarly, has Buypass examined its other past problem reports, to see if perhaps wrong decisions were made? When it's revealed that something has failed, it's useful to go back and look to see if there were other instances of failures that could lead towards improvements.

Flags: needinfo?(mads.henriksveen)

Buypass continuously works to improve procedures and practices for our CA operations, including management of incidents. We have raised awareness about what constitutes an incident, i.e. how to identify an incident, and have also defined procedures for how to act when incidents are identified. Based on experience from the last couple of years, I will definitely say that the internal incident awareness and incident management processes have been significantly improved.

The procedures for how to handle incidents include human decisions. Traditionally it has been easy to prefer the option that causes the least problems for affected customers (at least where the risk to the wider community has been considered negligible).
However, based on our experience and problems reported in the past, we understand that there are additional considerations that must be taken into account, and I can assure you that we will place even more emphasis on investigating alternatives before making the final decision.

We do follow the discussions in CA/Browser Forum and m.d.s.p closely and use these as valuable sources of information to improve our procedures and practices. We appreciate the open discussions and incident reporting as recommended by Mozilla in order to create a more secure web. We fully support the work Mozilla has done in this area.

Flags: needinfo?(mads.henriksveen)

Mads,
I would like to close this bug, but I'm wondering, are there any amendments that you intend to propose for EVGL 9.8.2 (to remove potential ambiguities) per comment 1?
Thanks,
Ben

Whiteboard: [ca-compliance] → [ca-compliance] [delayed-revocation-leaf]
Flags: needinfo?(mads.henriksveen)

Ben, I intend to propose an amendment for EVGL 9.8.2 (and also 9.2.8). I will contact you directly to discuss this.

Flags: needinfo?(mads.henriksveen)

Nearly four months ago (2020-04-08), in Comment #0 Buypass stated:

We will also consider to propose improvements to the language in EVGL 9.8.2 to remove any ambiguity causing such misunderstanding.

In Comment #5, it's now proposed to correct privately. I don't think that really fits with the spirit here of public and transparent incident response, and responses that help the whole of the CA community benefit. I'm concerned especially that it took 3 months and a follow-up from Ben before any progress was made. This was despite Buypass's response in Comment #3, which presumably involved it reviewing https://wiki.mozilla.org/CA/Responding_To_An_Incident#Keeping_Us_Informed around the expectations for incident reports.

I realize this is a revocation delay bug, and so this isn't about discussing the original incident. At the core, these bugs reflect intentional choices by CAs to violate policies, and are about understanding the systemic factors in play here. The lack of timely updates is a systemic failure to follow expectations, and I'd hope a more thoughtful examination, from bottom to top, about incident handling will happen here.

At the core, the proposed response in Comment #0:

This was a human decision and we will try to ensure that such important decisions are more thoroughly handled in Buypass to reduce the probability of making wrong decisions in the future. We will tighten up our procedures for taking such decisions, e.g. by involving more persons.

Can be shown as not really having improved things, as evidenced from this bug.

I'm hoping a more thorough and detailed plan about how Buypass handles incidents will be forthcoming, helping build a complete understanding end-to-end, in sufficient detail that another CA could reasonably implement the same practices and expect to fully comply with Mozilla policy. This is how we learn from these incidents.

Flags: needinfo?(mads.henriksveen)
QA Contact: wthayer → bwilson

When reporting this incident, we considered the language in EVGL 9.8.2 to be ambiguous and intended to propose changes. However, as we have little experience with proposing and managing ballots in CA/B Forum, this was not given high priority. We did not consider the handling of this proposal as a part of this bug. We understand that this was incorrect and that we should have made an update.

When Ben reminded us about this in comment #4, we decided to attempt to propose changes and contacted him privately to get advice on how to proceed. It was not our intention to discuss the proposal with Ben privately, only to get advice on the process.

While working on the proposal, we realized that a clarification in the form of a minor change in the language would be sufficient. As there already was a draft ballot focusing on cleanup and clarifications in CA/B Forum, we suggested (on Thursday last week) including our change in that ballot.

Our incident procedures cover all phases of an incident. This includes detection/identification, classification, reporting (including incident reporting in Bugzilla), root cause analyses, actions to be taken, and follow-up of the incident during its complete lifetime. Our procedures focus on the immediate and important actions to be taken and are less focused on proper follow-up until incidents are closed.

We have made changes to our procedures related to incidents reported in Bugzilla to ensure that incidents are properly updated according to Mozilla’s expectations (as described in https://wiki.mozilla.org/CA/Responding_To_An_Incident#Keeping_Us_Informed).

Flags: needinfo?(mads.henriksveen)

There is no new information to this bug now. This update is to ensure that we respond according to Mozilla’s expectations (weekly update).

There is no new information to this bug.

Flags: needinfo?(bwilson)

There is no new information to this bug.

There is no new information to this bug.

Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] [delayed-revocation-leaf] → [ca-compliance] [delayed-revocation-leaf] Next update 2020-12-01

I will close this bug on or about 20-November-2020 unless there are issues to explore further.

Flags: needinfo?(bwilson)
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] [delayed-revocation-leaf] Next update 2020-12-01 → [ca-compliance] [leaf-revocation-delay]