Closed Bug 1573822 Opened 6 years ago Closed 5 years ago

Create GCP Project for Publicly Available Datasets

Categories

(Data Platform and Tools Graveyard :: Operations, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: frank, Assigned: jason)

Details

(Whiteboard: [DataOps])

This supports the Public Data initiative. The project will host all of our public datasets, both as publicly accessible BigQuery (BQ) datasets and as public GCS buckets hosting JSON/YAML files.

When this project is created, we'll need write access to it. One way of doing that is to give write access to the bq-load-gke-1 GKE cluster, which runs the bigquery-etl Airflow job.

We will also need to coordinate creation of the underlying datasets within the project; these are the ones that will carry public-level permissions. There are two options: [1] have the Airflow job create them (it would check whether the appropriate dataset exists), or [2] add all known datasets during the GCP-ingestion deploy.

The downside of [1] is that it moves a somewhat ops-owned piece into the bigquery-etl job, but [2] would make it difficult to add new datasets that don't exist in the telemetry pipeline, and would create a slew of likely-unused datasets in that project. I am leaning towards [1].
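
A rough sketch of what option [1] could look like inside the Airflow job, assuming the job holds a client whose create_dataset(dataset, exists_ok=...) behaves like the one in google-cloud-bigquery. The DatasetClient protocol, the ensure_datasets helper, and any project or dataset names passed to it are hypothetical and not taken from bigquery-etl; granting the public-level permissions is left out.

```python
from typing import Iterable, List, Protocol


class DatasetClient(Protocol):
    """Hypothetical minimal interface; a google-cloud-bigquery Client should satisfy it."""

    def create_dataset(self, dataset: str, exists_ok: bool = False) -> object: ...


def ensure_datasets(
    client: DatasetClient, project: str, dataset_ids: Iterable[str]
) -> List[str]:
    """Make sure each dataset exists and return the fully qualified IDs."""
    ensured = []
    for dataset_id in dataset_ids:
        full_id = f"{project}.{dataset_id}"
        # exists_ok=True keeps the call idempotent: an existing dataset
        # is left untouched instead of raising a conflict error.
        client.create_dataset(full_id, exists_ok=True)
        ensured.append(full_id)
    return ensured
```

The job would call this once per run, before writing any tables, with the list of datasets it is about to populate. That keeps dataset creation next to the code that knows which datasets are actually needed, which is the main argument for [1].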

We'll take a look at this for Q4 work.

Whiteboard: [DataOps]

Hey Melissa, Anna is going to start working on this; any idea if this and bug 1573826 are doable in the near future?

I'll get it assigned out when we triage next week.

Assignee: nobody → jthomas

Do we have a suggestion on what to name this project? I was going to go with:

moz-fx-data-public-prod and moz-fx-data-public-nonprod. We may not need the nonprod version but it might be useful for testing.

Flags: needinfo?(fbertsch)

I'd like to see mozilla-public-data as the name. In particular, I'd like to avoid the -prod suffix as irrelevant to end users. The -fx piece also likely has no obvious meaning to folks outside Mozilla.

I think Jeff's proposed name sounds good. :jason, if we run this by branding / comms / other stakeholders and they want a different name, would it be possible to change this before we "go live" (for some definition of "live")?

In other words, are we blocked on the name?

Flags: needinfo?(jthomas)

> In other words, are we blocked on the name?

We can't change the GCP project ID once the project has been created. Any change would require us to recreate the project and every resource in it. I wouldn't say we are blocked, but changing the name later would take effort, possibly significant depending on the resources involved.
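
Since the ID is fixed at creation, it may be worth checking the candidates against GCP's documented format rules up front: 6 to 30 characters; lowercase letters, digits, and hyphens only; starts with a letter; does not end with a hyphen. A minimal sketch, where the helper name is made up and global uniqueness and restricted words are not checked:

```python
import re

# Format rules only: 6-30 chars, starts with a lowercase letter,
# contains lowercase letters, digits, or hyphens, and does not end with a hyphen.
_PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def is_valid_project_id(project_id: str) -> bool:
    """Return True if project_id satisfies the GCP project ID format rules."""
    return _PROJECT_ID_RE.fullmatch(project_id) is not None


candidates = [
    "mozilla-public-data",
    "moz-fx-data-public-prod",
    "moz-fx-data-public-nonprod",
]
for candidate in candidates:
    print(candidate, is_valid_project_id(candidate))
```

All three names proposed in this bug pass the format check, so the choice between them is purely about branding.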

Flags: needinfo?(jthomas)

Cool. Chatted w/ stakeholders. mozilla-public-data works. Good suggestion :klukas. We are not blocked by the naming at this point.

The mozilla-public-data GCP project has been created. The data-platform and airflow service accounts have BigQuery admin privileges.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Product: Data Platform and Tools → Data Platform and Tools Graveyard