Bug 1919304 (Closed) - Opened 2 months ago - Closed 19 days ago

GlobalSign: Caching headers inaccurate for subset of CRLs

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: christophe.bonjean, Assigned: christophe.bonjean)

Details

(Whiteboard: [ca-compliance] [crl-failure])

GlobalSign received a report that our CRL distribution points returned inaccurate caching headers for a subset of CRLs.

We will provide an incident report no later than Tuesday 24 September.

Assignee: nobody → christophe.bonjean
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance] [crl-failure]

Incident Report

Summary

For the CRLs published through our Atlas platform, the HTTP caching headers (Expires, Last-Modified and ETag) were inaccurate between August 22, 2024 13:38 and September 13, 2024 15:10.

Fresh CRLs were generated, published and served to relying parties, but due to a fault in the update process, the corresponding HTTP caching header values were not properly updated.

Impact

The issue impacted only our Atlas platform: CRLs for 185 CAs were published with incorrect (expired) HTTP caching headers between August 22, 2024 13:38 and September 13, 2024 15:10 (UTC).

Clients revalidating cached CRLs with conditional GET requests based on the last modified date would have received a 304 "Not Modified" response during this period, even after a fresh CRL had been published, and would therefore have continued to use their previously cached copy.
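For illustration, the following minimal sketch (using Python's requests library and a placeholder URL, not one of our actual distribution points) shows the revalidation pattern affected by the issue:

    import requests

    CRL_URL = "http://crl.example.com/ca.crl"  # placeholder URL for illustration

    # Initial fetch: the client caches the CRL body together with the
    # validators advertised by the server.
    first = requests.get(CRL_URL)
    last_modified = first.headers.get("Last-Modified")
    etag = first.headers.get("ETag")

    # Later revalidation: the client presents the stored validators.
    revalidated = requests.get(
        CRL_URL,
        headers={"If-Modified-Since": last_modified, "If-None-Match": etag},
    )

    if revalidated.status_code == 304:
        # During the incident window this branch was taken even after a
        # fresh CRL had been published, because Last-Modified and ETag
        # were never updated, so the client kept its stale copy.
        print("304 Not Modified - reusing cached CRL")
    else:
        print("200 OK - new CRL downloaded")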

Timeline

All times are UTC.

Date (DD/MM/YYYY - HH:MM)  Description
22/08/2024 - 13:38  Updates performed on first CRL server. QA checks completed; server starts publishing expired HTTP caching headers (Expires, Last-Modified and ETag).
27/08/2024 - 12:36  Updates performed on second CRL server. QA checks completed; server starts publishing expired HTTP caching headers (Expires, Last-Modified and ETag).
13/09/2024 - 12:32  Received email from the reporter informing us of the issue being reported in crt.sh.
13/09/2024 - 12:52  Report escalated to the compliance team.
13/09/2024 - 12:53  Investigation started by the compliance team; root-cause analysis begun.
13/09/2024 - 14:49  Root cause identified: stale configuration files caused by a bug in the update process.
13/09/2024 - 15:10  Updated the configuration of both CRL servers with correct HTTP headers. Issue resolved.
13/09/2024 - 15:18  Updated the scripts for publishing CRLs and reloading the configuration.
19/09/2024 - 06:21  Monitoring extended to also cover HTTP headers.

Root Cause Analysis

The web servers of our CRL nodes rely on configuration files for the values of the HTTP headers (Expires, Last-Modified, ETag). Each CRL file has a corresponding configuration file that is updated whenever a new CRL is generated, to ensure that the HTTP header values match the CRL timestamps. After the configuration file is updated, the web server must be reloaded to load the new HTTP header settings into memory.
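As a minimal sketch of this arrangement (the file name and the key/value output format are illustrative assumptions, not our production configuration format), the header values can be derived from the CRL's own timestamps:

    import hashlib
    from cryptography import x509

    def header_config_for_crl(path):
        """Derive HTTP caching header values from a DER-encoded CRL."""
        with open(path, "rb") as f:
            der = f.read()
        crl = x509.load_der_x509_crl(der)

        http_date = "%a, %d %b %Y %H:%M:%S GMT"
        return (
            # Last-Modified tracks the CRL's thisUpdate field.
            f"Last-Modified: {crl.last_update.strftime(http_date)}\n"
            # Expires tracks the CRL's nextUpdate field.
            f"Expires: {crl.next_update.strftime(http_date)}\n"
            # A content-based ETag changes whenever the CRL bytes change.
            f'ETag: "{hashlib.sha256(der).hexdigest()}"\n'
        )

    print(header_config_for_crl("ca.crl"))  # hypothetical file name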

The issue occurred after an operating system upgrade. On the upgraded systems, the server reload command did not execute as expected, so the HTTP header configuration was not reloaded. The script that performs the update and reloads the server configuration lacked appropriate error handling for the reload command and appeared to complete successfully. From that moment on, the new HTTP header settings were no longer loaded into memory: fresh CRLs were still served, but with incorrect (expired) HTTP headers.
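A minimal sketch of the fix, assuming a systemd-managed web server and a hypothetical service name: the reload step must be checked rather than assumed to succeed.

    import subprocess
    import sys

    def reload_web_server():
        # check=True raises CalledProcessError on a non-zero exit status,
        # so a failed reload aborts the publishing run loudly instead of
        # letting the script appear to complete successfully.
        try:
            subprocess.run(
                ["systemctl", "reload", "webserver"],  # hypothetical service name
                check=True,
                capture_output=True,
            )
        except subprocess.CalledProcessError as err:
            print("Web server reload failed: " + err.stderr.decode(), file=sys.stderr)
            raise  # fail the run so stale headers are never served silently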

QA testing was performed during the upgrades, but only against the initial batch of CRLs. At that point the HTTP headers were still correct; the issue only manifested a few hours later, when new CRLs were generated and the reload failed.

CRL monitoring is also performed on a recurring basis for all CRLs, but it covers only regular HTTP requests (i.e. no conditional GET requests with an If-Modified-Since header) and checks on the published CRL files. The monitoring process therefore did not identify the outdated HTTP headers.
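A check along the following lines (a sketch with a placeholder URL, not our actual monitoring code) would have caught the mismatch, by comparing the advertised Last-Modified value against the thisUpdate field of the CRL actually served:

    from datetime import datetime

    import requests
    from cryptography import x509

    def crl_headers_consistent(url):
        resp = requests.get(url)
        resp.raise_for_status()
        crl = x509.load_der_x509_crl(resp.content)

        # Parse the HTTP date; both values are in UTC.
        last_modified = datetime.strptime(
            resp.headers["Last-Modified"], "%a, %d %b %Y %H:%M:%S GMT"
        )

        # A stale header no longer matches the thisUpdate timestamp of
        # the freshly generated CRL that the server is serving.
        return last_modified == crl.last_update

    assert crl_headers_consistent("http://crl.example.com/ca.crl")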

Lessons Learned

What went well

  • After identifying the root cause, the configuration issue was resolved in a timely and efficient manner.

What didn't go well

  • Monitoring of the CRL services was missing a check for correctness of HTTP headers.

Where we got lucky

  • Only a limited number of clients relied on the caching headers and conditional GET requests, so the issue did not impact all relying parties.

Action Items

Action Item | Kind | Due Date | Status
Update scripts for publishing CRLs and reloading the configuration | Prevent | 13/09/2024 - 15:18 | Done
Extend monitoring for correctness of HTTP headers | Detect | 19/09/2024 - 06:21 | Done
Improve QA process to cover testing of HTTP header responses and configuration reload behavior | Prevent | 08/10/2024 | Pending

There were no scheduled deliverables this week. We are on track to deliver the other actions as per the schedule.

We completed the updates to the QA process to ensure coverage of HTTP header responses and configuration reload behavior.

This concludes the identified remedial activities. Unless there are any further questions, we believe this issue can be closed.

Flags: needinfo?(bwilson)

I will close this on Friday, 18-Oct-2024, unless there are still issues to discuss.

Status: ASSIGNED → RESOLVED
Closed: 19 days ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED