Closed Bug 1699822 Opened 4 years ago Closed 4 years ago

Fully migrate download-stats.mozilla.org to the GCP ingestion infra

Categories

(Data Platform and Tools Graveyard :: Operations, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: whd, Unassigned)

References

Details

This is a follow-up to bug #1357257, because the state in which we closed that bug still has an AWS component. That component recently broke (bug #1699645), and this new bug is for removing it in favor of using the standard ingestion endpoint (or another GCLB) directly.

I think it makes sense operationally to simply create a new TLS cert for download-stats.mozilla.org and add it to the existing GCLB. While DSMO is strictly HTTP-only, this would avoid any confusion around DSMO now responding on port 443 with an invalid cert. We currently accept plain HTTP without a forced upgrade to HTTPS on the telemetry endpoint (which means DSMO would work if pointed there), but we've had multiple conversations over the years about whether we should remove that. If we did plan to enforce HTTPS-only for telemetry at some point, we would likely need to create a separate GCLB for DSMO (or move DSMO to HTTPS).

We certainly need to do this by EOY, but given the recent breakage we should likely prioritize it sooner rather than later. It might make sense to roll deployment of this into bug #1666498, especially if it requires a separate GCLB.

As a stop-gap solution until bug #1666498 is completed, we're looking at changing the download-stats.mozilla.org domain to point directly at the ingestion edge. This would cause a cert mismatch if you hit that domain using HTTPS, but as the current DSMO endpoint only listens on HTTP (which is what the stub installer uses), we don't foresee that being an issue while the referenced bug is being worked on.
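To make the cert mismatch concrete, here is a minimal sketch of the name check a TLS client performs. The matching rule is deliberately simplified (exact names plus a single leftmost wildcard label), and the SAN list is an illustrative assumption, not the contents of the real edge certificate.

```python
from typing import Iterable


def hostname_matches(pattern: str, hostname: str) -> bool:
    """Return True if a certificate name pattern covers hostname.

    Simplified rule: exact match, or a '*' that replaces exactly the
    leftmost label. Real clients apply further restrictions.
    """
    pattern_labels = pattern.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] == "*":
        return pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels


def cert_covers(san_names: Iterable[str], hostname: str) -> bool:
    """Return True if any subjectAltName entry covers hostname."""
    return any(hostname_matches(name, hostname) for name in san_names)


# ASSUMPTION: illustrative SAN list for the edge cert, not the real one.
EDGE_CERT_SANS = ["incoming.telemetry.mozilla.org"]
DSMO = "download-stats.mozilla.org"

# Stop-gap state: DNS points at the edge, but the cert does not name DSMO.
print(cert_covers(EDGE_CERT_SANS, DSMO))
# After a DSMO name is also served by the load balancer.
print(cert_covers(EDGE_CERT_SANS + [DSMO], DSMO))
```

Plain HTTP requests never reach this check, which is why the mismatch only matters to clients that try HTTPS.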

A few follow-up questions came up when I brought this up in the SecOps team meeting today:

  • Is the stub installer signed and served over TLS? Is the initial update/full install signed? (I believe this is yes and yes, but wanted to confirm)
  • Do we have an estimate for how long we'd see the cert mismatch error?
  • Are there any complications to standing up a download-stats.mozilla.org endpoint with a valid cert on GCP or just migrating the endpoint but not the whole pipeline that I missed (like expected traffic or CAA records)?

Is the stub installer signed and served over TLS? Is the initial update/full install signed? (I believe this is yes and yes, but wanted to confirm)

NI :mhowell for confirmation on this.

Do we have an estimate for how long we'd see the cert mismatch error?

We'd likely see the cert mismatch until bug #1666498 is completed, which I anticipate is on the order of 1-2 weeks (but could be longer, given how long that bug has been around). We'd need to provision the cert but once it's been provisioned, adding it to the GCLB would be very fast. In point of fact we can decouple the work of migrating off the AWS tee for incoming.tmo from this DNS migration by CNAMEing to prod.data-ingestion.prod.dataops.mozgcp.net directly, and that GCLB can be assigned the newly-generated DSMO cert directly.
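A rough sketch of what the proposed CNAME looks like from a resolver's point of view. The lookup is injected as a plain function so the example runs without network access. The record table and the `follow_cnames` helper are hypothetical and exist only for illustration.

```python
from typing import Callable, List, Optional

# A lookup takes a name and returns its CNAME target, or None if there is none.
CnameLookup = Callable[[str], Optional[str]]


def follow_cnames(name: str, lookup: CnameLookup, max_hops: int = 8) -> List[str]:
    """Return the chain of names reached by following CNAMEs from name."""
    chain = [name]
    while len(chain) <= max_hops:
        target = lookup(chain[-1])
        if target is None:
            break
        target = target.rstrip(".").lower()
        if target in chain:
            raise ValueError(f"CNAME loop at {target}")
        chain.append(target)
    return chain


# Hypothetical record table describing the proposed end state.
PROPOSED = {
    "download-stats.mozilla.org": "prod.data-ingestion.prod.dataops.mozgcp.net.",
}

chain = follow_cnames("download-stats.mozilla.org", PROPOSED.get)
print(" -> ".join(chain))
```

With a single hop straight to the GCLB hostname, nothing on AWS remains in the resolution path, which is the decoupling described above.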

Are there any complications to standing up a download-stats.mozilla.org endpoint with a valid cert on GCP or just migrating the endpoint but not the whole pipeline that I missed (like expected traffic or CAA records)?

The whole pipeline (except this tee) was already migrated to GCP in bug #1357257. I don't think there are any expected-traffic or CAA-record concerns related to DSMO.

Flags: needinfo?(mhowell)

(In reply to Greg Guthe [:g-k] [:gguthe] from comment #2)

  • Is the stub installer signed and served over TLS? Is the initial update/full install signed? (I believe this is yes and yes, but wanted to confirm)

You're right, the stub and full installers are all signed and all served over TLS. The stub installer verifies that the full installer has a signature that is trusted and also has the expected subject and issuer strings before running it.
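The check described here can be pictured roughly as follows. This is not the stub installer's actual code. The `SignatureInfo` type, the function name, and the placeholder subject and issuer strings are all hypothetical, and the sketch assumes the platform has already evaluated trust in the signature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignatureInfo:
    """Hypothetical summary of a code signature after platform verification."""
    chain_trusted: bool  # the platform trusts the signature
    subject: str
    issuer: str


def should_run_installer(sig: SignatureInfo, expected_subject: str,
                         expected_issuer: str) -> bool:
    """Run the full installer only if the signature is trusted AND both
    the subject and issuer strings match the expected values."""
    return (
        sig.chain_trusted
        and sig.subject == expected_subject
        and sig.issuer == expected_issuer
    )


# Placeholder values; the real expected strings are not given in this bug.
EXPECTED_SUBJECT = "Example Publisher"
EXPECTED_ISSUER = "Example Code Signing CA"

good = SignatureInfo(True, EXPECTED_SUBJECT, EXPECTED_ISSUER)
wrong_issuer = SignatureInfo(True, EXPECTED_SUBJECT, "Some Other CA")

print(should_run_installer(good, EXPECTED_SUBJECT, EXPECTED_ISSUER))
print(should_run_installer(wrong_issuer, EXPECTED_SUBJECT, EXPECTED_ISSUER))
```

Nothing in this decision depends on how the download-stats ping is transported, which is why a cert mismatch on DSMO does not affect installer integrity.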

Flags: needinfo?(mhowell)

Thanks :whd and :mhowell. Since this doesn't impact the integrity of the initial stub and main installers, the stopgap works for me, but :whd's option would be preferable to avoid the cert mismatch error:

we can decouple the work of migrating off the AWS tee for incoming.tmo from this DNS migration by CNAMEing to prod.data-ingestion.prod.dataops.mozgcp.net directly, and that GCLB can be assigned the newly-generated DSMO cert directly.

I've added the cert to the GCLB via https://github.com/mozilla-services/cloudops-infra/pull/2989, and curl -H 'Host: download-stats.mozilla.org' https://prod.data-ingestion.prod.dataops.mozgcp.net/__version__ works for me locally, so we should be good to cut DNS over to the current edge. We'll need to change this again to prod.ingestion-edge.prod.dataops.mozgcp.net once bug #1666498 is completed.
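A complementary pre-cutover check, sketched in Python. With curl, `-H 'Host: ...'` changes only the HTTP header, while SNI and certificate verification still use the hostname in the URL. The sketch below instead connects to the edge hostname but presents download-stats.mozilla.org as both the SNI name and the Host header, so the handshake fails unless the certificate served for that name covers it. The helper names `build_request` and `probe` are hypothetical.

```python
import socket
import ssl


def build_request(host: str, path: str = "/__version__") -> bytes:
    """Build a minimal HTTP/1.1 GET request carrying the given Host header."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def probe(edge: str, host: str, path: str = "/__version__",
          timeout: float = 10.0) -> str:
    """Connect to edge:443, presenting host as SNI and Host header.

    Returns the HTTP status line. Raises ssl.SSLCertVerificationError if
    the certificate served for that SNI name does not cover host.
    """
    context = ssl.create_default_context()  # verifies chain and hostname
    with socket.create_connection((edge, 443), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(build_request(host, path))
            response = b""
            # Read until the status line is complete or the server closes.
            while b"\r\n" not in response:
                chunk = tls.recv(4096)
                if not chunk:
                    break
                response += chunk
    return response.split(b"\r\n", 1)[0].decode("ascii", "replace")
```

Calling `probe("prod.data-ingestion.prod.dataops.mozgcp.net", "download-stats.mozilla.org")` needs network access and should return a status line if the new cert is being served for the DSMO name.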

DNS has been cut over and seems to be working fine from here for both HTTP and HTTPS.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Product: Data Platform and Tools → Data Platform and Tools Graveyard
See Also: → 1811648