Closed Bug 1582791 Opened 6 years ago Closed 6 years ago

Give airflow bq job create privs in shared-prod

Categories

(Data Platform and Tools Graveyard :: Operations, enhancement)

Type: enhancement
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: klukas, Assigned: jason)

Details

Currently, Airflow does not have permission to create BQ jobs in shared-prod, only in derived-datasets. As we transition tables to shared-prod, it will become more important to be able to run jobs in that project rather than derived-datasets.

Perhaps the best solution here would be to create a new GCP connection for shared-prod so that jobs that want to execute in shared-prod would set gcp_conn_id="google_cloud_shared_prod". That may require having an entirely separate kube cluster, though.
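A minimal sketch of the first option. Airflow can read a connection from an environment variable named `AIRFLOW_CONN_<CONN_ID>` whose value is a connection URI. The helper below builds that variable for the proposed `google_cloud_shared_prod` connection. It assumes the Airflow 1.10-era `extra__google_cloud_platform__*` extras and assumes `moz-fx-data-shared-prod` is the shared-prod project ID. The key path is hypothetical and optional.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode


def gcp_connection_env(conn_id: str, project: str,
                       key_path: Optional[str] = None) -> Tuple[str, str]:
    """Return (env var name, connection URI) for an Airflow GCP connection.

    Assumes the Airflow 1.10-era convention of prefixing GCP connection
    extras with ``extra__google_cloud_platform__``.
    """
    extras = {"extra__google_cloud_platform__project": project}
    if key_path is not None:
        # Hypothetical service-account key file; omit it to rely on the
        # default credentials available inside the cluster.
        extras["extra__google_cloud_platform__key_path"] = key_path
    name = "AIRFLOW_CONN_" + conn_id.upper()
    uri = "google-cloud-platform://?" + urlencode(extras)
    return name, uri


# Assumed project ID for shared-prod.
name, uri = gcp_connection_env("google_cloud_shared_prod", "moz-fx-data-shared-prod")
print(f"{name}='{uri}'")
```

Jobs would then select it with `gcp_conn_id="google_cloud_shared_prod"`; whether that also needs a separate cluster is the open question raised above.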

Absent that, it would be nice to be able to pass a project name to our bigquery-etl jobs and have the queries themselves run in shared-prod, able to take advantage of non-qualified table names, etc. within that project.
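A rough sketch of what "pass a project name" could mean at the API level: the BigQuery REST `jobs.insert` call takes the project in its URL, so the job runs (and is billed) in that project. In standard SQL, `dataset.table` references without a project resolve to the project running the query, and `defaultDataset` additionally allows bare table names. The helper name, dataset, and table below are hypothetical, and `moz-fx-data-shared-prod` is an assumed project ID.

```python
import json
from typing import Any, Dict, Optional, Tuple


def build_query_job(sql: str, project: str,
                    dataset: Optional[str] = None) -> Tuple[str, Dict[str, Any]]:
    """Build a BigQuery REST jobs.insert request that runs `sql` in `project`.

    Hypothetical helper: it only builds the request; sending it (with an
    authorized HTTP client) is left out.
    """
    url = f"https://bigquery.googleapis.com/bigquery/v2/projects/{project}/jobs"
    query_config: Dict[str, Any] = {"query": sql, "useLegacySql": False}
    if dataset is not None:
        # Lets the query refer to tables in `dataset` without any qualifier.
        query_config["defaultDataset"] = {"projectId": project, "datasetId": dataset}
    body = {"configuration": {"query": query_config}}
    return url, body


url, body = build_query_job(
    "SELECT COUNT(*) FROM example_table",  # hypothetical table
    project="moz-fx-data-shared-prod",     # assumed shared-prod project ID
    dataset="example_dataset",             # hypothetical dataset
)
print(url)
print(json.dumps(body, indent=2))
```

With the `google-cloud-bigquery` client the same idea is `bigquery.Client(project=...)` plus a `QueryJobConfig` whose `default_dataset` is set.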

This is not urgent, but will become more of a pain over time as we write ETL that targets shared-prod.

Assignee: nobody → jthomas

Checking in on this. Do you see a clear path forward?

> Absent that, it would be nice to be able to pass a project name to our bigquery-etl jobs and have the queries themselves run in shared-prod, able to take advantage of non-qualified table names, etc. within that project.

I think all we need to do here is give the current GKE cluster in moz-fx-data-derived-datasets BigQuery User and BigQuery Job User access in the shared projects. The BigQuery client should do the right thing as long as you pass in the shared project ID. The actual gcp_conn_id does not matter, since it is only used to let Airflow schedule pods/deployments within the GKE cluster.
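A minimal sketch of the grant, assuming it is done by editing the shared project's IAM policy: fetch it with `getIamPolicy`, bind the cluster's service account to `roles/bigquery.user` (BigQuery User) and `roles/bigquery.jobUser` (BigQuery Job User), and write it back with `setIamPolicy` (`gcloud projects add-iam-policy-binding` does the same from the CLI). The service-account email and the sample policy are hypothetical.

```python
import copy
from typing import Any, Dict, Iterable

# Role IDs for "BigQuery User" and "BigQuery Job User".
BIGQUERY_ROLES = ("roles/bigquery.user", "roles/bigquery.jobUser")


def add_member_to_roles(policy: Dict[str, Any], member: str,
                        roles: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of an IAM policy with `member` bound to each role.

    `policy` has the getIamPolicy shape:
    {"bindings": [{"role": ..., "members": [...]}], "etag": ...}.
    Unconditional bindings are reused, so applying this twice is a no-op.
    """
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for role in roles:
        binding = next(
            (b for b in bindings if b["role"] == role and "condition" not in b),
            None,
        )
        if binding is None:
            binding = {"role": role, "members": []}
            bindings.append(binding)
        members = binding.setdefault("members", [])
        if member not in members:
            members.append(member)
    return updated


# Hypothetical service account used by the GKE cluster's workloads.
sa = "serviceAccount:airflow-gke@moz-fx-data-derived-datasets.iam.gserviceaccount.com"
policy = {
    "bindings": [{"role": "roles/bigquery.user", "members": ["group:example@example.com"]}],
    "etag": "BwX=",  # keep the etag so setIamPolicy can detect concurrent edits
}
new_policy = add_member_to_roles(policy, sa, BIGQUERY_ROLES)
print(new_policy)
```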

We should eventually create a new GKE cluster, as the goal is to deprecate the moz-fx-data-derived-datasets project and its associated resources. I've discussed this within ops, and we decided to create a new GKE cluster in the current Airflow project. This new cluster will have access to the shared projects, similar to the current GKE cluster. I will file a bug for that work.

(In reply to Jason Thomas [:jason] from comment #2)

> I think all we need to do here is give the current GKE cluster in moz-fx-data-derived-datasets BigQuery User and BigQuery Job User access in the shared projects. The bigquery client should do the right thing as long as you pass in the shared project id. The actual gcp_conn_id does not matter since it allows airflow to schedule pods/deployments within the GKE cluster.

Is this something we can do in the near term? It would probably unblock us from moving most derived datasets into shared-prod.

Yes, I will work on it today and update the bug once it is done.

Done. Filed bug 1593134 for standing up the new cluster.

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Product: Data Platform and Tools → Data Platform and Tools Graveyard