Closed Bug 1658995 Opened 4 years ago Closed 4 years ago

Microsoft: Firewall log data retention

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Dustin.Hollenback, Assigned: Dustin.Hollenback)

Details

(Whiteboard: [ca-compliance] [uncategorized])

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36 Edg/84.0.522.59

Steps to reproduce:

Note: This bug encompasses impacted Issuing CAs managed by DSRE PKI at Microsoft, which includes the following Issuing CAs:
• Microsoft IT TLS CA 1
• Microsoft IT TLS CA 2
• Microsoft IT TLS CA 4
• Microsoft IT TLS CA 5
• Microsoft RSA TLS CA 01**
• Microsoft RSA TLS CA 02**
** While these CAs are currently online, they were not fully deployed and had not issued any subscriber certificates at the time of the missing log files.

How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.
We identified this at 2020-07-29 20:03 UTC during a manual log file review. Although the logs were not retained, we confirmed that our security automation had reviewed them and that there were no incidents during the window covered by the missing logs. The issue is that we did not meet the 7-year retention requirement that the Baseline Requirements place on these firewall logs.

A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

2020-07-21 15:23 UTC Start of the gap in firewall log data
2020-07-21 17:04 UTC End of the gap in firewall log data
2020-07-29 20:03 UTC Issue identified after manual log review
2020-08-06 20:44 UTC Confirmed that missing logs were rolled and not recoverable. We started our investigation for the root cause.
2020-08-13 07:36 UTC Confirmed that our security log reviewing software was in place and there were no incidents during the time of these missing logs.
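
For reference, the length of the gap and the time between the end of the gap and its discovery can be derived from the first three timeline entries. The snippet below is only an arithmetic sketch over those timestamps; the helper name `utc` and the variable names are illustrative and not part of the report.

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M"

def utc(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM' timeline entry as a UTC datetime."""
    return datetime.strptime(stamp, FMT).replace(tzinfo=timezone.utc)

# Timestamps copied from the timeline above.
gap_start = utc("2020-07-21 15:23")
gap_end = utc("2020-07-21 17:04")
identified = utc("2020-07-29 20:03")

gap_length = gap_end - gap_start        # how much log data is missing
detection_delay = identified - gap_end  # how long the gap went unnoticed

print(gap_length)       # 1:41:00
print(detection_delay)  # 8 days, 2:59:00
```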

Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.
This is a compliance-only issue that does not directly impact certificates. We have not stopped certificate issuance.

A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.
There are no problematic certificates.

The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.
There are no problematic certificates.

Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
These are the items that we have been able to identify as contributing to the gap in log retention:
• Monitoring did not provide an alert
• The devices can hold only a limited volume of logs before the logs are rolled, which leaves a short window for recovery.

List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.
We are still working to identify the root cause and the remediation steps. We will provide an update within the next week as we collect more information.

Actual results:

See above.

Expected results:

See above.

Assignee: bwilson → Dustin.Hollenback
Status: UNCONFIRMED → ASSIGNED
Type: defect → task
Ever confirmed: true
Whiteboard: [ca-compliance]
Whiteboard: [ca-compliance] → [ca-compliance] Next Update - 21-August 2020

After further investigation, we found two main factors behind the loss of these logs: limited monitoring sensitivity, and a limited time window in which to recover the logs once the issue was identified.

Monitoring:

  • The existing monitoring was not as sensitive as it needs to be, and it could not be tuned to be more sensitive. It can only alert on issues that last multiple hours at a minimum.
  • We are adding monitoring tools that will be more sensitive. This will require code changes for the monitoring and tweaks to some of the automation for collecting the logs. We will continue using frequent human review until this additional monitoring is implemented.
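
As an illustration only, the kind of check that more sensitive monitoring could perform is sketched below: scan the timestamps of collected log records and flag any silence longer than a threshold. This is a hypothetical sketch, not the tooling described in this bug; the function `find_gaps`, the five-minute threshold, and the synthetic one-record-per-minute data are all assumptions.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, max_silence):
    """Return (start, end) pairs where consecutive records are more than max_silence apart."""
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > max_silence:
            gaps.append((earlier, later))
    return gaps

# Synthetic data (assumption): one record per minute, with a silent stretch
# matching the window reported in the timeline (15:23 to 17:04).
base = datetime(2020, 7, 21, 15, 0)
records = [base + timedelta(minutes=m) for m in range(0, 24)]      # 15:00 .. 15:23
records += [base + timedelta(minutes=m) for m in range(124, 150)]  # 17:04 .. 17:29

gaps = find_gaps(records, max_silence=timedelta(minutes=5))
for start, end in gaps:
    print(f"no records between {start} and {end}")
```

With a threshold of two hours instead of five minutes, the same data produces no findings, which mirrors how a monitor that only reacts to multi-hour outages would miss a gap of this length.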

Log Recovery:

  • Due to a recent architectural change, firewall traffic increased by ~2x. This shortened the period of logs available on the firewalls, which contributed to our inability to recover them once we were aware of the issue. We added storage to extend this period to a length from which the teams are confident we can recover in the future.
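
The relationship between traffic and on-device retention can be sketched with simple arithmetic: for a fixed log buffer, the retention window is the buffer size divided by the logging rate, so doubling the rate halves the window. All figures below are hypothetical and chosen only to show the proportion; the report does not state the actual buffer size or log volume, and `retention_hours` is an illustrative helper.

```python
def retention_hours(buffer_bytes: float, bytes_per_hour: float) -> float:
    """Hours of logs a fixed-size buffer holds before the oldest entries are rolled."""
    return buffer_bytes / bytes_per_hour

# Hypothetical figures, not taken from the report.
buffer_bytes = 500e9         # on-device log storage
rate_before = 2e9            # log volume per hour before the architectural change
rate_after = 2 * rate_before # traffic increased by ~2x

before = retention_hours(buffer_bytes, rate_before)
after = retention_hours(buffer_bytes, rate_after)
restored = retention_hours(2 * buffer_bytes, rate_after)  # storage doubled as well

print(before, after, restored)  # 250.0 125.0 250.0
```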

At this time, we consider this issue mitigated because we have high confidence that we can recover missing logs in the future. We are still tracking some changes internally, such as finalizing a more robust monitoring solution and several other process improvements that we identified during our investigation, but those do not prevent us from considering this mitigated.

I believe that this bug can be closed. I will schedule it for closure on 21-Sept-2020 unless there are additional questions or issues to be considered, discussed, or remediated.

Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] Next Update - 21-August 2020 → [ca-compliance] Next Update - 21-Sept-2020
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Summary: Microsoft: Firewall log data retention → Microsoft DSRE PKI: Firewall log data retention

Updating the summary back to how we think about the organization (one organization, Microsoft), rather than how it internally divides the work.

Summary: Microsoft DSRE PKI: Firewall log data retention → Microsoft: Firewall log data retention
Product: NSS → CA Program
Whiteboard: [ca-compliance] Next Update - 21-Sept-2020 → [ca-compliance] [uncategorized]