Closed Bug 1713005 Opened 5 years ago Closed 5 years ago

New GCP Project Request: firefox-client-personalities-outreachy

Categories

(Data Platform and Tools :: General, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ccd, Assigned: whd)

Details

(Whiteboard: [gcp-project-request])

** Please fill out the following information and needinfo the data engineering contact you specified below (unless the contact is yourself), don't forget to change the title to use your project name! **

GCP-compatible project name (e.g. missioncontrol-v2-dev, adi-forecasting-dev): firefox-client-personalities-outreachy
LDAP of people who require administrative privileges for this project: Leif Oines, Corey Dow-Hygelund
Project timeline (maximum 6 months, projects may be renewed if development is still ongoing at the end of that period): 4 months
Approximate budget for this project (if expected to be greater than $1000):
Whether this project will be used to import external data into GCP, and if so, from where (if the answer is yes, needinfo a member of Data SRE for an ops evaluation): No
Data Engineering contact for this project: Mark Reid

For more information, please see the gcp project cookbook on docs.telemetry.mozilla.org.

Given that provisioning this project will require some tweaks to restrict access I'm going to handle the provisioning step. :mreid will remain the DE contact once the project is provisioned.

firefox-client-personalities-outreachy is too long for a GCP project ID (the limit is 30 characters). I'm tentatively going with a project ID of client-personalities-outreachy. The outreachy suffix disables the automated provisioning of access to standard mozilla-confidential DE infrastructure, e.g. mozdata, shared BigQuery, and Pub/Sub.
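GCP project IDs must be 6 to 30 characters long, may contain only lowercase ASCII letters, digits, and hyphens, must start with a letter, and must not end with a hyphen. The check below is a minimal sketch of that rule (`is_valid_project_id` is a hypothetical helper, not part of any Google library). It shows why the original name was rejected and why the shortened ID fits exactly at the limit.

```python
import re

# GCP project ID format: 6-30 chars, lowercase letters/digits/hyphens,
# starts with a letter, does not end with a hyphen.
_PROJECT_ID_RE = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")


def is_valid_project_id(project_id: str) -> bool:
    """Return True if project_id satisfies the GCP project ID format rules."""
    return _PROJECT_ID_RE.fullmatch(project_id) is not None


for candidate in ("firefox-client-personalities-outreachy", "client-personalities-outreachy"):
    print(f"{candidate} ({len(candidate)} chars): {is_valid_project_id(candidate)}")
# firefox-client-personalities-outreachy (38 chars): False
# client-personalities-outreachy (30 chars): True
```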

Assignee: nobody → whd

I've provisioned client-personalities-outreachy via https://github.com/mozilla-services/data-sandbox-terraform/pull/54.

The notebook instance is available at https://3cf6c7c3be9346e9-dot-us-west1.notebooks.googleusercontent.com/lab?authuser=0. I'd expect Leif and Corey to copy notebooks from the moz-fx-data-bq-data-science notebook instance to this one. The notebook does not have access to the public internet, but it does have access to GCP services, and in particular it has BigQuery write access to the client-personalities-outreachy.analysis dataset. I'd expect Leif and Corey to copy whatever derived data they need from the canonical locations into this dataset, and they should have access to do that. If the analysis won't require writing any data back to BigQuery, we can replace the notebook service account's write access with read access.
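One way to copy derived data into the writable dataset is a `CREATE OR REPLACE TABLE ... AS SELECT` statement run from the notebook. The sketch below only builds the SQL string. The helper name `copy_into_analysis_sql`, the destination table name, and the source table ID are hypothetical placeholders, not tables named in this bug. Fully qualified names are backtick-quoted since the project ID contains hyphens.

```python
def copy_into_analysis_sql(source_table_id: str, dest_table: str, where: str = "TRUE") -> str:
    """Build a BigQuery CTAS statement that materializes a filtered copy of
    source_table_id into the client-personalities-outreachy.analysis dataset.

    All arguments are supplied by the caller; `where` is inserted verbatim,
    so it must be trusted SQL.
    """
    dest_table_id = f"client-personalities-outreachy.analysis.{dest_table}"
    return (
        f"CREATE OR REPLACE TABLE `{dest_table_id}` AS\n"
        "SELECT *\n"
        f"FROM `{source_table_id}`\n"
        f"WHERE {where}"
    )


sql = copy_into_analysis_sql(
    "example-project.example_dataset.example_table",  # hypothetical source table
    "clients_sample",  # hypothetical destination table name
    where="submission_date = '2021-05-01' AND subsample_id < 10",
)
print(sql)
```

Passing the resulting string to `client.query(sql).result()` with the `google-cloud-bigquery` client would execute it, provided the notebook service account is allowed both to run query jobs in the project and to write to the dataset.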

Access to provision notebooks is currently restricted to DE+SRE, since notebooks must be provisioned in a specific way (no public IP, specific service account that the external user can act as).
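The two provisioning constraints can be expressed as a check over the instance configuration and the service account's IAM policy. This is a minimal sketch under stated assumptions: the instance is described by a dict using the Notebooks API v1 field names `noPublicIp` and `serviceAccount`, and "can act as" means the external user holds `roles/iam.serviceAccountUser` (which grants `iam.serviceAccounts.actAs`) on that service account. The function name, service account email, and user are hypothetical.

```python
def notebook_policy_problems(instance, sa_policy, external_user):
    """Return a list of policy violations; an empty list means both constraints hold.

    instance:      notebook instance config using (assumed) Notebooks API v1 field names.
    sa_policy:     IAM policy of the instance's service account, {"bindings": [...]}.
    external_user: IAM member string, e.g. "user:someone@example.com".
    """
    problems = []

    # Constraint 1: the instance must not be given a public IP address.
    if not instance.get("noPublicIp", False):
        problems.append("instance has a public IP")

    # Constraint 2: a dedicated service account that the external user can act as.
    service_account = instance.get("serviceAccount")
    can_act_as = any(
        binding.get("role") == "roles/iam.serviceAccountUser"
        and external_user in binding.get("members", [])
        for binding in sa_policy.get("bindings", [])
    )
    if not service_account:
        problems.append("no dedicated service account configured")
    elif not can_act_as:
        problems.append(f"{external_user} cannot act as {service_account}")

    return problems


# Hypothetical values for illustration only.
instance = {
    "noPublicIp": True,
    "serviceAccount": "notebook-sa@client-personalities-outreachy.iam.gserviceaccount.com",
}
sa_policy = {"bindings": [{"role": "roles/iam.serviceAccountUser", "members": ["user:intern@example.com"]}]}
print(notebook_policy_problems(instance, sa_policy, "user:intern@example.com"))  # []
```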

Leif and Corey have been granted viewer access to the project: https://console.cloud.google.com/ai-platform/notebooks/list/instances?organizationId=442341870013&project=client-personalities-outreachy. Depending on whether the above configuration is sufficient, we may need to take additional steps to grant access to various parties. As configured, the intern's Gmail account should only be able to use the notebook instance via the proxy link and will not be able to view the project in the GCP console.

Once we've verified that access works as expected, we can add this project to the sandbox projects mana page and close out this bug and bug #1712808. We expect the regular triage of sandbox projects to let us remove access to the project in a timely fashion after the internship ends.

Thanks! The intern is able to load the notebook. However, neither she nor I can query any dataset using Python; we receive the following error:

403 POST https://bigquery.googleapis.com/bigquery/v2/projects/client-personalities-outreachy/jobs?prettyPrint=false: Access Denied: Project client-personalities-outreachy: User does not have bigquery.jobs.create permission in project client-personalities-outreachy

Can this be fixed, or is there something missing in the query?

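Dataset-specific query:

```python
import datetime

from google.cloud import bigquery

# Queries create jobs in the sandbox project named in the 403 error above.
client = bigquery.Client(project="client-personalities-outreachy")
# client_28_day_table_id is defined elsewhere in the notebook (not shown here).

sql = f"""
SELECT * EXCEPT (client_id),
  DENSE_RANK() OVER (ORDER BY client_id ASC) AS client_id  -- pseudonymized id
FROM `{client_28_day_table_id}`
WHERE submission_date = @q_date
  AND subsample_id < 10
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[bigquery.ScalarQueryParameter("q_date", "DATE", datetime.date(2021, 5, 1))]
)
df = client.query(sql, job_config=job_config).to_dataframe()
```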
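The query pseudonymizes `client_id` by replacing it with `DENSE_RANK() OVER (ORDER BY client_id ASC)`: each distinct ID maps to a consecutive integer starting at 1 in sort order, and rows for the same client share a number. The plain-Python sketch below reproduces that mapping for illustration (the helper name and sample IDs are hypothetical). Because the rank is computed over only the rows that survive the `WHERE` filter, the same client can receive different numbers for different dates or subsamples.

```python
def dense_rank(values):
    """Map each value to its 1-based dense rank in ascending order, like SQL DENSE_RANK()."""
    ranks = {v: i for i, v in enumerate(sorted(set(values)), start=1)}
    return [ranks[v] for v in values]


client_ids = ["c-9f", "c-1a", "c-9f", "c-5b"]  # hypothetical client IDs
print(dense_rank(client_ids))  # [3, 1, 3, 2]
```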

This public-dataset query fails with the same error:

```
%%bigquery
SELECT
    source_year AS year,
    COUNT(is_male) AS birth_count
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year DESC
LIMIT 15
```

The service account needed some project-level permissions to actually use BigQuery, which I added in https://github.com/mozilla-services/data-sandbox-terraform/pull/54/commits/89a1d06181e15f8e8ee5f868dd6cb0f4a617c76b. I'm going to update the mana doc to include this project and close this out.
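The 403 arises because every query creates a job, and `bigquery.jobs.create` is evaluated on the project that runs the job, separately from any read or write access on the datasets involved. The sketch below checks a principal's project-level roles against a deliberately partial set of predefined roles known to include that permission (`roles/bigquery.jobUser`, `roles/bigquery.user`, `roles/bigquery.admin`). It is illustrative only and does not claim to reflect which role the linked commit actually grants; the helper name is hypothetical.

```python
# Partial list of predefined roles that include bigquery.jobs.create.
# Not exhaustive: basic roles and custom roles can also grant it.
# Data roles such as roles/bigquery.dataViewer and roles/bigquery.dataEditor do NOT.
ROLES_WITH_JOBS_CREATE = {
    "roles/bigquery.jobUser",
    "roles/bigquery.user",
    "roles/bigquery.admin",
}


def can_create_query_jobs(project_roles) -> bool:
    """True if any project-level role is known to grant bigquery.jobs.create."""
    return any(role in ROLES_WITH_JOBS_CREATE for role in project_roles)


# Hypothetical before/after for the notebook service account.
print(can_create_query_jobs(["roles/bigquery.dataEditor"]))  # False: reproduces the 403
print(can_create_query_jobs(["roles/bigquery.dataEditor", "roles/bigquery.jobUser"]))  # True
```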

I added the project to https://mana.mozilla.org/wiki/display/DATA/Active+GCP+Prototype+Projects with an end date of 2021-09-27 (4 months from the provisioning date), but if there's a more precise end date I can change it.
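The end date is a calendar-month offset from the provisioning date. A minimal sketch of that arithmetic follows, clamping to the last day of the month when needed (`add_months` is a hypothetical helper). The provisioning date 2021-05-27 used here is inferred by working back 4 months from the stated end date, and the same helper can confirm the end date stays within the request template's 6-month maximum.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Shift start by a whole number of calendar months, clamping the day to the month's end."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


provisioned = date(2021, 5, 27)  # inferred from the stated end date
end = add_months(provisioned, 4)
print(end)  # 2021-09-27
print(end <= add_months(provisioned, 6))  # True: within the 6-month maximum
```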

I'm still waiting on :mreid (NI) to either approve https://github.com/mozilla-services/data-sandbox-terraform/pull/54 or delegate DE point of contact to a member of his team. Once this step is done and we've confirmed the intern can access the data we can close this out.

Flags: needinfo?(mreid)

This looks good to me from an approval point of view, but I defer to :amiyaguchi as the point of contact in DE for this project (I'll tag him for review in the PR too).

Flags: needinfo?(mreid)

The PR is merged and I've updated the DE contact on the mana page. :ccd once you've confirmed that your intern can access the notebook and data and that 2021-09-27 is an appropriate end date, please close this out.

Re-open or ping me if there are any issues with the access.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED

This prototype project is scheduled for deletion. Please confirm or recommend an appropriate end date.

Flags: needinfo?(cdowhygelund)

I confirm the project is ready for deletion.

Flags: needinfo?(cdowhygelund)