Updated

•

5 years ago

Assignee: bwilson → Robin.Alden

Blocks: 1563579

Status: UNCONFIRMED → ASSIGNED

Ever confirmed: true

Whiteboard: [ca-compliance]

https://crt.sh/?q=A10D502F5B6770CC633EA6BB4C472E6971B42C7AF33156D639E579B616C82BF2
https://crt.sh/?q=D0A3B6E663D2AA07FA386AA70B4FA34A861473533684067AE4551FD38FE70558

Comment 2

•

5 years ago

How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

https://crt.sh/?q=A10D502F5B6770CC633EA6BB4C472E6971B42C7AF33156D639E579B616C82BF2
24-Jun-2020 12:59 UTC - Email received at the SSL abuse address detailing incorrect Subject information in the certificate.

https://crt.sh/?q=D0A3B6E663D2AA07FA386AA70B4FA34A861473533684067AE4551FD38FE70558
24-Jun-2020 13:18 UTC - Email received at the SSL abuse address detailing incorrect Subject information in the certificate.

A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

https://crt.sh/?q=A10D502F5B6770CC633EA6BB4C472E6971B42C7AF33156D639E579B616C82BF2:
24-Jun-2020 16:38 UTC - Reply sent by Sectigo staff to acknowledge the report.
24-Jun-2020 19:59 UTC - Message communicated to customer/partner regarding incorrect information, stating revocation will be actioned 'Sunday, June 28th, 2020 at 4:30pm EST'.
24-Jun-2020 20:06 UTC - Partner confirmed and requested more time to have the new certificate installed.
24-Jun-2020 20:09 UTC - Sectigo staff confirmed new revocation on 'Monday, June 29th at 10:00am EST'.
29-Jun-2020 14:12 UTC - Certificate was revoked.
29-Jun-2020 14:19 UTC - Email sent by Sectigo to reporter confirming revocation.

https://crt.sh/?q=D0A3B6E663D2AA07FA386AA70B4FA34A861473533684067AE4551FD38FE70558:
24-Jun-2020 16:39 UTC - Reply sent by Sectigo staff to acknowledge the report.
24-Jun-2020 19:40 UTC - Message communicated to customer regarding incorrect information, stating revocation will be actioned 'Sunday, June 28th, 2020 at 4:30pm EST'.
26-Jun-2020 20:30 UTC - Reply sent by Sectigo staff to the reporter, thanking them for the report and confirming the certificate will be revoked on the date and time above.
29-Jun-2020 14:12 UTC - Certificate was revoked.
29-Jun-2020 14:18 UTC - Email sent by Sectigo to reporter confirming revocation.
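
The scheduled revocation times quoted above are given in 'EST', while the rest of the timeline is in UTC. As a rough sketch for reconciling the two, the snippet below converts both scheduled times to UTC under two readings of 'EST': literal Eastern Standard Time (UTC-5) and Eastern Daylight Time (UTC-4), which is what US Eastern clocks observe in late June. The fixed offsets and the helper name `to_utc` are assumptions made for illustration; the report itself does not say which reading was intended.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, assumed for illustration: the report only says "EST".
EST = timezone(timedelta(hours=-5), "EST")  # literal Eastern Standard Time
EDT = timezone(timedelta(hours=-4), "EDT")  # Eastern Daylight Time (in effect in June)


def to_utc(year, month, day, hour, minute, tz):
    """Interpret a wall-clock time in tz and return it as a UTC datetime."""
    local = datetime(year, month, day, hour, minute, tzinfo=tz)
    return local.astimezone(timezone.utc)


# The two scheduled revocation times quoted in the timeline.
scheduled = {
    "Sunday, June 28th, 4:30pm": (2020, 6, 28, 16, 30),
    "Monday, June 29th, 10:00am": (2020, 6, 29, 10, 0),
}

for label, parts in scheduled.items():
    for tz in (EST, EDT):
        utc_time = to_utc(*parts, tz)
        print(f"{label} {tz.tzname(None)} = {utc_time:%Y-%m-%d %H:%M} UTC")
```

Under the UTC-4 reading, the rescheduled time for the first certificate works out to 14:00 UTC on 29 June, a few minutes before the 14:12 UTC revocation recorded in the timeline.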

Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

NA

A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

See 4.

Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

The original misissuance was related to the issues in: 1575022

List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

Issues with validation are being addressed in: 1575022
I have reached out directly to the reporter to further investigate the responses that were not received when sent from our ticketing system (Salesforce). The last report was made before 3 of the responses were sent, so they may have been received by now.
(Reporter confirmed they received the responses. They have suggested a change to better help with email delivery, which we are investigating with Salesforce.)

Reporter

Comment 3

•

5 years ago

Thanks for the incident report.

I received your responses of "Thank you for bringing this to our attention. We will look into this immediately and inform you of any relevant updates." which are not considered preliminary reports.

When you sent emails to your customers about revocation, why did you not also send this to me, as required by BR 4.9.5?

Within 24 hours after receiving a Certificate Problem Report, the CA SHALL investigate the facts and
circumstances related to a Certificate Problem Report and provide a preliminary report on its findings to both
the Subscriber and the entity who filed the Certificate Problem Report
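
To make the 24-hour window concrete, here is a minimal sketch that applies it to the second certificate in the timeline above. It assumes that the 26-Jun-2020 20:30 UTC reply was the first message to the reporter that confirmed the findings; that reading comes from the timeline wording and is not stated outright. The helper names `utc` and `preliminary_report_status` are made up for this example.

```python
from datetime import datetime, timedelta, timezone

PRELIMINARY_REPORT_WINDOW = timedelta(hours=24)  # from the BR text quoted above


def utc(day, hour, minute):
    """Build a UTC timestamp in June 2020 (the month of the events in this bug)."""
    return datetime(2020, 6, day, hour, minute, tzinfo=timezone.utc)


def preliminary_report_status(received, reported_to_reporter):
    """Return (deadline, elapsed, on_time) for one Certificate Problem Report."""
    deadline = received + PRELIMINARY_REPORT_WINDOW
    elapsed = reported_to_reporter - received
    return deadline, elapsed, reported_to_reporter <= deadline


# Second certificate: report received 24-Jun-2020 13:18 UTC.
received = utc(24, 13, 18)
# Assumption: the 26-Jun-2020 20:30 UTC reply is the first one confirming the findings.
replied = utc(26, 20, 30)

deadline, elapsed, on_time = preliminary_report_status(received, replied)
print(f"deadline: {deadline:%Y-%m-%d %H:%M} UTC")
print(f"elapsed:  {elapsed}")
print(f"on time:  {on_time}")
```

The same function can be run over every report in a ticketing export to flag any whose first substantive reply fell outside the window.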

Flags: needinfo?(nick)

Comment 4

•

5 years ago

Thanks, George.

We do aim to provide preliminary reports for these Certificate Problem Reports confirming the issues, and in these cases our staff did not provide this report confirming your findings until around 48 hours later when the revocation date was given. As above, your findings were accurate and the certificates revoked in these cases.
I note other reports you have made were confirmed (or explained) within 24 hours.

Flags: needinfo?(nick)

Reporter

Comment 5

•

5 years ago

I believe an incident report should be produced explaining how these two reports were not sent preliminary reports within 24 hours and what Sectigo is going to do to ensure they will be in the future.

Comment 6

•

5 years ago

I agree with Comment #5, this report seems to have overlooked the underlying/systemic issue.

Flags: needinfo?(Robin.Alden) → needinfo?(nick)

Comment 7

•

5 years ago

I do not believe there is a systemic issue, but rather human error on these two reports (which were received close together and handled by the same staff member at the same time).

An initial response was made by a member of staff (not an auto-response), but this did not confirm the findings and revocation back to the reporter.
A communication was made to the subscriber within the 24 hour window, confirming revocation, but in error this was not also copied to the reporter.

I have checked each of the 24 reports sent by the original reporter throughout June and found all but the two listed here had preliminary reports confirming revocation within the required window.

I am happy to expand that into a full incident report?

Flags: needinfo?(nick)

Reporter

Comment 8

•

5 years ago

Your initial incident report hasn't really explained how the issue occurred or what Sectigo is doing to make sure it doesn't happen again. Under Responding To An Incident:

For example, it’s not sufficient to say that “human error” or “lack of training” was a root cause for the incident, nor that “training has been improved” as a solution. While a lack of training may have contributed to the issue, it’s also possible that error-prone tools or practices were required, and making those tools less reliant on training is the correct solution. When training or a process is improved, the CA is expected to provide specific details about the original and corrected material, and specifically detail the changes that were made, and how they tie to the issue. Training alone should not be seen as a sufficient mitigation, and focus should be made on removing error-prone manual steps from the system entirely.

I think a new incident report should be posted with all of these steps explained clearly.

Comment 9

•

5 years ago

I agree with George, and made a similar remark in the incident report provided in Bug 1650845 to the same effect, as it involved a separate reporter with a remarkably similar root cause.

Flags: needinfo?(nick)

Comment 10

•

5 years ago

We are still investigating what changes we can make to our customer-service systems for our staff responding to incident reports (that cannot be handled in an automated manner). This may require changes across our third-party ticketing system as well as our internal order-management systems.

We may combine the response with bug 1650845 and do a single (new) report with remediation for both.

Flags: needinfo?(nick)

Ben Wilson

Comment 11

•

5 years ago

When can we expect to see more information on this revamp?

Comment 12

•

5 years ago

Ben: We had a meeting late last week of the Incident Response group, and discussed a requirements document from the team directly handling these reports.
It's already being worked into a development specification.

Since the Sectigo/Comodo carve-out, we have used Salesforce as the primary system for our staff to communicate with partners and customers - Validation, Support and our Abuse teams.
Given the greatly increased volume of reports that we deal with (which we would still like to automate), we realise that, specifically for abuse reporting, a different system is needed: one tailored to handling these reports, verifying keys where needed, having humans check any Subject information or other parts that cannot be automated, completing revocation, and then ensuring the various involved parties are notified correctly and on time.

I don't have an ETA for development yet, but will of course share once we do, with timely updates as needed.

Comment 13

•

5 years ago

It's now been nearly two weeks, and I would say "timely updates" have not been provided.

Flags: needinfo?(rob)

Flags: needinfo?(nick)

Assignee

Comment 14

•

5 years ago

We're working on a system change which will, in cases where we agree with the reporting party and revoke the certificate, automate the preliminary report to let the reporting party know what action we're taking. I met with the manager of our customer service team last week to begin spec'ing this out. I don't yet have an ETA, but hope to have a first draft spec completed by the end of this week or early next.

Rob Stradling

Updated

•

5 years ago

Flags: needinfo?(rob)

Flags: needinfo?(rich)

Flags: needinfo?(nick)

Assignee

Comment 15

•

5 years ago

Still working on the spec, so no additional updates yet.

Assignee

Comment 16

•

5 years ago

Draft spec is near completion, but no additional updates at this time.

Assignee

Comment 17

•

5 years ago

No additional updates yet.

Flags: needinfo?(rich)

Comment 18

•

5 years ago

Rich: How are things progressing on the draft spec? What's causing the delay from Comment #14's projection to present?

Flags: needinfo?(rich)

Assignee

Comment 19

•

5 years ago

In response to Ryan, comment 18;
The spec is 90% complete. Unfortunately, both I and the CS manager with whom I've been collaborating on it have been pulled in other directions over the past few weeks. I hope to pull it back up in the coming week and pin down the last bits it needs before sending on to the dev team, but I also know that there are several items higher on the priority list for both of us, so it will depend on both of us being able to work through our respective task lists and also being able to get our schedules synced to bring this over the finish line. Know that this has not fallen off the radar. It's still high on my task list but, as inevitably happens, it has been bumped down a couple slots due to unforeseen things coming up.

Comment 20

•

5 years ago

Rich: I'm hoping you can be more candid and transparent in the answer here. This doesn't really seem to fit with https://wiki.mozilla.org/CA/Responding_To_An_Incident#Keeping_Us_Informed

"I got busy, they got busy, it's important, but there's other things more important" doesn't really inspire confidence or trust. If there are competing concerns, being transparent about what those concerns are helps us recognize "OK, they've got the right priorities, and their priorities are what's best for users". Yet vague statements, especially with no concrete timelines or commitments other than hopes and aspirations, don't really do that.

I think Sectigo is at an inflection point where it really needs to be quite candid and transparent, given the systemic patterns of issues that are non-communication, a lack of transparency, and a lack of accountability. If you believe you've got the right priorities, share them. Otherwise, it feels like this is Sectigo not taking compliance seriously, and I doubt that's the impression Sectigo wants to leave.

Assignee

Comment 21

•

5 years ago

In response to Ryan Sleevi;

Ryan, first off I’m happy to report that the initial draft of the functional specification for the above mentioned improvements to the revocation process has been completed and sent to the dev team for review and feedback. I will continue to report back here on progress until such time as the new code is complete and live.

I hope you can appreciate that this is an unusual period for us. We’ve had a host of extraordinary needs raise their heads all on top of each other. We have been trying to bring our audit to conclusion and to examine and potentially replace 21,000 or so OV certificates. We put in upgrades to systems to fix the bugs that allowed these misissuances, and we’ve been aggressively cleaning up our outstanding Bugzilla backlog. We are working on behind-the-scenes improvements like the one being discussed here. In the midst of all this we recently had a root expiration that went thermonuclear due to previously unknown bugs in OpenSSL and other popular tools. One might say these are mostly self-imposed problems, and though that may be, that doesn’t change the fact that Sectigo’s compliance team is firing at a lot of targets right now. So rightly or wrongly, I prioritized things like examining the OV certificate population over the specification for this bug.

You ask what our priorities are? We’re working to improve and clean up our public CA issuance and certificate base and to build the systems and business processes necessary to achieve total certificate agility with all public certificates. That is a complex and multivariate project, not so much a task as a large number of separate tasks of various scope and nature, some (or many) of which are not fully understood or perhaps even known about at this stage.

Our BHAGs are “No misissued certificates” and “All public certificates can be replaced within the specified timeframes with zero negative effect on subscribers or relying parties.” These are a tall order, but that’s why they’re BHAGs. We don’t have to get all the way there to make meaningful improvements that will matter to us, our subscribers, and the community as a whole. When I recently pointed out that an internal process control found a flaw that in prior times would have gotten through, that is a sign to us that the initiative is having an impact. We like to see that kind of thing.

The carve out from Comodo Group was a tough time for us. We had twenty years’ worth of completely intertwined systems that had to be disentangled ASAP, a vast hairball of legacy code to deal with, and a skeleton crew of employees that numbered well under half of what we needed to operate in any reasonable fashion. At that time we put first things first, addressing the most egregious needs and risk factors and working our way down the list. We have been grinding at it for nearly three years and will continue to grind for years to come. Examples of what we have worked on and improved have been topics of conversation in this forum (see 1645868 comment #41 for instance). We are not as automated as we would like to be. Sometimes an individual employee makes a wrong call. We still stub our toes on previously invisible software flaws. This will continue for a while. Every day we’re a little better than the day before, and that’s how these things get done.

Flags: needinfo?(rich)

Comment 22

•

5 years ago

That is a complex and multivariate project, not so much a task as a large number of separate tasks of various scope and nature, some (or many) of which are not fully understood or perhaps even known about at this stage.

I think the opportunity here is to share concretely what those separate tasks are, and to make sure that this bug and the related Bug 1563579, which are fundamentally highlighting that there seems to be systemic risk in continuing to trust Sectigo, help reassure folks by showing what's being prioritized and where time was spent.

Similarly, I hope you can understand how highlighting the Comodo/Sectigo split, which does seem to have resulted in better outcomes with respect to governance, also opens up questions about pending acquisitions, to ensure that the resources and support will be provided, rather than further pared back.

It sounds like Sectigo has identified a number of key areas for change, and is executing on them. I think when it comes to comments like Comment #19, we need more transparency about what's being done. Comment #21 speaks to goals, and that's laudable, but I think there's understandable concern that actions speak louder here. I understand that not everything will be ready yet, some is still being done, but if you're going to provide an update like Comment #19, it should really be more detailed and concrete about what's soaking that time.

When it comes to big hairy audacious goals, those are understandably the product of many smaller things, so sharing updates about progress being made to those BHAGs, especially if it's detracting from near-term feedback, is critical. I think that in the length of Comment #21, I only really see one concrete example of that, namely:

So rightly or wrongly, I prioritized things like examining the OV certificate population over the specification for this bug.

That's incredibly useful! That's the sort of detail that we're looking for. There's no information not appropriate to share, so even if it was down to the level of timesheets of "Rich was focused on this, Rob on this, Inigo was working to close out this issue, and Tim's focused on this; with everyone focused on those, this issue didn't get brought over the finish line. However, the focus helped us close those out, with minimal customer disruption, and that allows us to provide more attention to this issue going forward" is the sort of thing that helps highlight priority. From the perspective of module peers, we're no strangers to the fact that some bugs require a lot more time investment than others, but this is where sharing is key for CAs.

Hopefully this helps provide more concrete feedback here regarding updates like Comment #19, and frankly, like Comment #21.

Similarly, when you talk about "draft spec", that feels a bit like "training has been improved" as a solution, just with "code" in between. I think sharing more here is what we want. What was the old procedure? What's the new draft spec proposing? You can share those before review with the dev team, because that also allows updates like "The dev team pointed out X and Y were infeasible, but offered A, B, C as approaches that would do the same or better, so the final plan is A, B, and Z". Those are the things that help us learn, by helping us also highlight what doesn't work.

Flags: needinfo?(rich)

Rob Stradling

Updated

•

5 years ago

Assignee: Robin.Alden → rich

Assignee

Comment 23

•

5 years ago

We met with the dev team to review the specification and answer any questions they might have. Out of that meeting we have a few actions to clarify the specification, but mostly it’s finalized. We pinned down the time required to code the changes to 5 to 10 days, and are now looking to get a concrete timeframe for getting that block of time inserted into the dev schedule. I’ll update on that as soon as I’m able.

In comment 22, Ryan said:

Similarly, when you talk about "draft spec", that feels a bit like "training has been improved" as a solution, just with "code" in between.

Point taken. To be more specific, we are undertaking to automate more of the actions required for both revoking the certificate, and notifying the various parties who need to be notified, including the reporting party.

Specifically within our certificate lifecycle management system, we are making the following improvements to the certificate revocation workflow:

Add a field to contain the email address of the reporting party, and auto-generate the response to inform the reporting party as to the action to be taken as a result of the report. Automatic reporting to certificate Subscriber already exists.
Add the exact time at which the initial report came in so that all other actions can be calculated based upon that time.
Calculate the time by which the certificate MUST be revoked, based upon the time the report came in and the stated reason for the revocation, so that the maximum revocation window matches BR requirements, and prevent the agent processing the revocation from setting a scheduled revocation time outside that window. Note: Currently the ability to schedule the revocation exists, but it allows the agent to specify any date/time they choose without reference to the date and time of the report or anything else.

These changes will help ensure both that the reporting party is advised of preliminary findings and that the certificate is revoked within the required timeframe.
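
As a rough illustration of the three workflow changes listed above, the sketch below models a problem report with a reporter email field, a recorded receipt time, a deadline derived from the revocation reason, and a check that rejects any scheduled revocation outside that window. It is not Sectigo's implementation: the class name `ProblemReport`, the reason codes, the message wording, and the example data are all hypothetical, and the reason-to-window mapping is only a placeholder for whatever the BRs and the CA's own policy require.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical reason codes and maximum windows, for illustration only.
# A real mapping would be taken from the BRs and the CA's own policy.
MAX_REVOCATION_WINDOW = {
    "key_compromise": timedelta(hours=24),
    "inaccurate_certificate_info": timedelta(days=5),
}


@dataclass
class ProblemReport:
    reporter_email: str    # new field: where the preliminary report is sent
    received_at: datetime  # exact time the report came in (timezone-aware, UTC)
    reason: str            # key into MAX_REVOCATION_WINDOW

    def revocation_deadline(self) -> datetime:
        """Latest permitted revocation time, counted from receipt of the report."""
        return self.received_at + MAX_REVOCATION_WINDOW[self.reason]

    def schedule_revocation(self, when: datetime) -> datetime:
        """Accept a scheduled revocation time only if it is inside the window."""
        deadline = self.revocation_deadline()
        if when > deadline:
            raise ValueError(
                f"scheduled time {when.isoformat()} is after "
                f"the deadline {deadline.isoformat()}"
            )
        return when

    def preliminary_report(self, scheduled: datetime) -> str:
        """Draft the message telling the reporting party what action is planned."""
        return (
            f"To: {self.reporter_email}\n"
            f"We have confirmed your report ({self.reason}). The certificate "
            f"is scheduled for revocation at {scheduled:%Y-%m-%d %H:%M} UTC."
        )


# Made-up example data, not taken from this bug.
report = ProblemReport(
    reporter_email="reporter@example.com",
    received_at=datetime(2020, 1, 1, 9, 0, tzinfo=timezone.utc),
    reason="inaccurate_certificate_info",
)
scheduled = report.schedule_revocation(report.received_at + timedelta(days=4))
print(report.preliminary_report(scheduled))
```

Deriving the deadline from the stored receipt time, rather than letting the agent type in any date, is the design point here: a schedule outside the window fails at entry instead of being discovered after the fact.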

Flags: needinfo?(rich)

Assignee

Comment 24

•

5 years ago

Still working with the dev team to get an ETA. Hope to have that pinned down no later than early next week.

Assignee

Comment 25

•

5 years ago

I've been advised by the dev team lead that these improvements to our revocation processing should be live by the end of November.

Assignee

Comment 26

•

5 years ago

No further update at this time.

Assignee

Comment 27

•

5 years ago

No further update at this time.