Create GCP Project for Publicly Available Datasets
Categories
(Data Platform and Tools Graveyard :: Operations, task)
Tracking
(Not tracked)
People
(Reporter: frank, Assigned: jason)
Details
(Whiteboard: [DataOps])
This supports the Public Data initiative. This project will host all of our public datasets: publicly accessible datasets in BigQuery (BQ), and public GCS buckets hosting JSON/YAML files.
When this project is created, we'll need write access to it. One way of doing that is to give write access to the bq-load-gke-1 GKE cluster, which runs the bigquery-etl Airflow job.
We will also need to coordinate creation of the underlying datasets within the project; they will be the ones with public-level permissions. One way [1] is to have the Airflow job do it (it would check whether the appropriate dataset exists); another [2] is to add all known datasets during the GCP-ingestion deploy.
The downside of [1] is that it moves a somewhat ops-owned piece into the bigquery-etl job, but [2] will make it difficult to add new datasets that don't exist in the telemetry pipeline, and will create a slew of likely-unused datasets in that project. I am leaning towards [1].
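For illustration only, option [1] could look roughly like the sketch below: before loading, the job checks which datasets already exist and creates only the missing ones. Everything named here is hypothetical (the `ensure_datasets` helper, the dataset names, the in-memory stand-in for the project); it is not the actual bigquery-etl code. With the google-cloud-bigquery client, `list_existing` could presumably wrap `Client.list_datasets()` and `create` could wrap `Client.create_dataset()`; granting public read access is left out.

```python
from typing import Callable, Iterable, List


def ensure_datasets(
    wanted: Iterable[str],
    list_existing: Callable[[], Iterable[str]],
    create: Callable[[str], None],
) -> List[str]:
    """Create each wanted dataset that does not exist yet.

    Returns the names that were created, in the order requested.
    """
    existing = set(list_existing())
    created = []
    for name in wanted:
        if name not in existing:
            create(name)
            existing.add(name)  # avoid creating a duplicate entry twice
            created.append(name)
    return created


# Hypothetical in-memory stand-in for the project, so the sketch
# runs without GCP credentials.
fake_project = {"example_public_a"}
created = ensure_datasets(
    wanted=["example_public_a", "example_public_b"],
    list_existing=lambda: fake_project,
    create=fake_project.add,
)
print(created)
```

Because only datasets that are actually requested get created, this avoids the slew of unused datasets that [2] would produce.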
Comment 1•6 years ago
We'll take a look at this for Q4 work.
Comment 2•5 years ago
Hey Melissa, Anna is going to start working on this; any idea if this and bug 1573826 are doable in the near future?
Comment 3•5 years ago
I'll get it assigned out when we triage next week.
Comment 4•5 years ago
Do we have a suggestion on what to name this project? I was going to go with:
moz-fx-data-public-prod and moz-fx-data-public-nonprod. We may not need the nonprod version but it might be useful for testing.
Comment 5•5 years ago
I'd like to see mozilla-public-data as the name. In particular, I'd like to avoid the -prod suffix as irrelevant to end users. The -fx piece also likely has no obvious meaning to folks outside Mozilla.
Comment 6•5 years ago
I think Jeff's proposed name sounds good. :jason if we run this by branding / comms / other stakeholders and they want a different name, would it be possible to change this before we "go live" (for some definition of "live")?
In other words, are we blocked on the name?
Comment 7•5 years ago
> In other words, are we blocked on the name?
We can't change the GCP project ID once the project has been created. Any change would require us to recreate the project and every resource we had created in it. I wouldn't say we are blocked, but renaming later would take effort, possibly significant depending on the resources involved.
Comment 8•5 years ago
Cool. Chatted w/ stakeholders. mozilla-public-data works. Good suggestion :klukas. We are not blocked by the naming at this point.
Comment 9•5 years ago
Comment 10•5 years ago
The mozilla-public-data GCP project has been created. The data-platform and airflow service accounts have BigQuery admin privileges.