Give airflow bq job create privs in shared-prod
Categories
(Data Platform and Tools Graveyard :: Operations, enhancement)
Tracking
(Not tracked)
People
(Reporter: klukas, Assigned: jason)
Details
Currently, Airflow does not have permission to create BQ jobs in shared-prod, only in derived-datasets. As we transition tables to shared-prod, it will become more important to be able to run jobs in that project rather than derived-datasets.
Perhaps the best solution here would be to create a new GCP connection for shared-prod so that jobs that want to execute in shared-prod would set gcp_conn_id="google_cloud_shared_prod". That may require having an entirely separate kube cluster, though.
Absent that, it would be nice to be able to pass a project name to our bigquery-etl jobs and have the queries themselves run in shared-prod, taking advantage of unqualified table names, etc., within that project.
This is not urgent, but will become more of a pain over time as we write ETL that targets shared-prod.
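As a sketch of the "pass a project name" idea (the function name and default are assumptions for illustration, not the actual bigquery-etl interface), the Python client already supports this: the project passed to `bigquery.Client` determines where jobs are created and where unqualified table names resolve.

```python
# Hypothetical sketch: run a query in a target project so that jobs are
# created there and unqualified table names resolve there. Requires the
# google-cloud-bigquery package; names here are illustrative only.
def run_query(sql: str, project: str = "moz-fx-data-shared-prod"):
    # Deferred import so the module loads without the package installed.
    from google.cloud import bigquery

    # The job is created (and billed) in `project`; unqualified table
    # references in `sql` resolve against that project's datasets.
    client = bigquery.Client(project=project)
    return client.query(sql).result()
```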
Updated•6 years ago
Reporter
Comment 1•6 years ago
Checking in on this. Do you see a clear path forward?
Assignee
Comment 2•6 years ago
> Absent that, it would be nice to be able to pass a project name to our bigquery-etl jobs and have the queries themselves run in shared-prod, able to take advantage of non-qualified table names, etc. within that project.
I think all we need to do here is give the current GKE cluster in moz-fx-data-derived-datasets BigQuery User and BigQuery Job User access in the shared projects. The bigquery client should do the right thing as long as you pass in the shared project id. The actual gcp_conn_id does not matter, since it is only used to let Airflow schedule pods/deployments within the GKE cluster.
We should eventually create a new GKE cluster, as the eventual goal is to deprecate the moz-fx-data-derived-datasets project and its associated resources. I've discussed this with ops and we decided to create a new GKE cluster in the current Airflow project. This new GKE cluster will have access to the shared projects, similar to the current GKE cluster. I will file a bug for that work.
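A minimal sketch of that point, assuming the google-cloud-bigquery Python client (helper names are illustrative, not bigquery-etl code): credentials come from the environment, but the job is created in whichever project is passed to the client, which is why the pod's service account needs BigQuery Job User in the shared projects.

```python
def qualified_dataset(project: str, dataset: str) -> str:
    # Project-qualified dataset id, e.g. "moz-fx-data-shared-prod.telemetry".
    return f"{project}.{dataset}"

def run_in_project(sql: str, project: str, dataset: str):
    # Deferred import: requires the google-cloud-bigquery package.
    from google.cloud import bigquery

    # Credentials come from the environment (here, the GKE pod's service
    # account in moz-fx-data-derived-datasets); the job itself is created
    # in `project`, so that account needs BigQuery Job User there.
    client = bigquery.Client(project=project)
    config = bigquery.QueryJobConfig(
        # default_dataset lets the SQL use unqualified table names.
        default_dataset=qualified_dataset(project, dataset)
    )
    return client.query(sql, job_config=config).result()
```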
Reporter
Comment 3•6 years ago
(In reply to Jason Thomas [:jason] from comment #2)
> I think all we need to do here is give the current GKE cluster in moz-fx-data-derived-datasets BigQuery User and BigQuery Job User access in the shared projects. The bigquery client should do the right thing as long as you pass in the shared project id. The actual gcp_conn_id does not matter since it allows airflow to schedule pods/deployments within the GKE cluster.
Is this something we can do in the near term? This would probably get us unblocked from being able to move most derived datasets into shared-prod.
Assignee
Comment 4•6 years ago
Yes, I will work on it today and update the bug once it is completed.
Assignee
Comment 5•6 years ago
Assignee
Comment 6•6 years ago
Done. Filed bug 1593134 for standing up the new cluster.
Updated•3 years ago