Closed Bug 1661565 Opened 5 years ago Closed 5 years ago

Backfill pings for `#/environment/system/gfx/adapters/N/GPUActive` from 2020-07-04 to 2020-08-20

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: amiyaguchi, Assigned: amiyaguchi)

References

Details

(Whiteboard: [dataquality])

Attachments

(2 files)

2020-08-27-GPUActive errors start.png, 5 years ago, Anthony Miyaguchi [:amiyaguchi], 19.37 KB, image/png
Bug 1661565 - Backfill GPU active pings #11, 5 years ago, Anthony Miyaguchi [:amiyaguchi], 52 bytes, text/x-github-pull-request

Anthony Miyaguchi [:amiyaguchi]

Assignee

Description

5 years ago

Attached image 2020-08-27-GPUActive errors start.png

Bug 1657142 fixed schema errors related to GPUActive fields. The errors span 2020-07-04 through 2020-08-20, when the fix landed. Pings from these dates can be backfilled from the errors table.

https://sql.telemetry.mozilla.org/queries/74259/source#185637

Jeff Klukas [:klukas] (UTC-4)

Comment 1

5 years ago

Let me know if you want to talk through setup for the backfill. It looks like it's actually been quite a while since we've done a backfill from the errors table: https://github.com/mozilla/bigquery-backfill/tree/master/backfill/2020-01-23-sync-ping

Anthony Miyaguchi [:amiyaguchi]

Assignee

Comment 2

5 years ago

There's a more recent backfill done here, but it'd certainly be a good idea to talk through the setup when the time comes around: https://github.com/mozilla/bigquery-backfill/tree/master/backfill/2020-03-30-gcs-error

Frank Bertsch [:frank]

Comment 3

5 years ago

Anthony, are you going to take this?

Flags: needinfo?(amiyaguchi)

Anthony Miyaguchi [:amiyaguchi]

Assignee

Comment 4

5 years ago

Yes, I'll take this, but I'll leave it as a P3.

Assignee: nobody → amiyaguchi

Flags: needinfo?(amiyaguchi)

Anthony Miyaguchi [:amiyaguchi]

Assignee

Updated

5 years ago

Priority: P3 → P1

Anthony Miyaguchi [:amiyaguchi]

Assignee

Comment 5

5 years ago

I've taken a closer look at this, now 4 months after the bug was first filed. There are 4.1 million documents in total to be backfilled. However, these documents no longer exist in the live table, which has a 30-day retention policy.

Reading through the 2020-01-23 sync ping backfill, it looks like a few modifications are needed for this to go through successfully.

From the sync backfill:

1. Process from errors into a live table in a backfill project.
2. Append live tables in the backfill project to the live tables in the shared prod project.
3. Run copy_deduplicate from the live tables into the stable tables.

This assumes that the live table exists in full. Reading through copy_deduplicate, it looks like it does a WRITE_TRUNCATE. It seems like this could lead to data loss if the live table only held documents from the backfilled errors.
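
To make the WRITE_TRUNCATE concern concrete, here is a minimal in-memory sketch in Python. It does not call BigQuery or the real copy_deduplicate script. The function `copy_deduplicate_truncate` and the sample rows are hypothetical stand-ins that only model "replace the stable partition with the deduplicated live partition".

```python
from typing import Dict, List

Row = Dict[str, str]


def dedupe_by_document_id(rows: List[Row]) -> List[Row]:
    """Keep the first row seen for each document_id."""
    seen = set()
    unique = []
    for row in rows:
        if row["document_id"] not in seen:
            seen.add(row["document_id"])
            unique.append(row)
    return unique


def copy_deduplicate_truncate(live_partition: List[Row]) -> List[Row]:
    """Hypothetical model of a WRITE_TRUNCATE load: the result replaces the stable partition."""
    return dedupe_by_document_id(live_partition)


# Stable partition for one day, as it exists in prod before the backfill.
stable_before = [{"document_id": "a"}, {"document_id": "b"}]
# Live partition rebuilt only from the errors table (the original live rows have expired).
live_from_errors = [{"document_id": "c"}, {"document_id": "c"}]

stable_after = copy_deduplicate_truncate(live_from_errors)
lost = {r["document_id"] for r in stable_before} - {r["document_id"] for r in stable_after}
print(sorted(lost))
```

In this model the documents that were already in the stable partition are the ones that end up in `lost`, which is the data-loss scenario described above.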

I can imagine a process that appends directly to the stable table, with the following pseudo-SQL as the source data:

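-- renamed from `date` so the variable does not collide with the date() function
declare target_date date;
set target_date = date "2020-08-20";

-- documents should be deduplicated in the live table too...
select *
from backfill.telemetry_live.bhr_v4
where date(submission_timestamp) = target_date
and document_id not in (
    select distinct document_id
    from shared_prod.telemetry_stable.bhr_v4
    -- assumes the stable table is filtered on submission_timestamp, like the live table
    where date(submission_timestamp) = target_date
)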
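
The same anti-join can be sketched in plain Python to show what the query is meant to select. This is only an in-memory illustration under assumed data: `rows_to_append` and the sample rows are hypothetical, and the sketch also deduplicates within the backfill rows, which the comment in the query says should happen.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]


def rows_to_append(backfill_live: Iterable[Row], prod_stable: Iterable[Row]) -> List[Row]:
    """Return deduplicated backfill rows whose document_id is not already in prod_stable."""
    existing = {row["document_id"] for row in prod_stable}
    selected: List[Row] = []
    for row in backfill_live:
        doc_id = row["document_id"]
        if doc_id in existing:
            continue
        existing.add(doc_id)  # also deduplicates within the backfill rows
        selected.append(row)
    return selected


prod_stable = [{"document_id": "a"}, {"document_id": "b"}]
backfill_live = [{"document_id": "b"}, {"document_id": "c"}, {"document_id": "c"}]

# Appending only the selected rows leaves the existing stable rows untouched.
appended = prod_stable + rows_to_append(backfill_live, prod_stable)
print([row["document_id"] for row in appended])
```

Because the result is appended rather than written with truncation, existing stable rows are never replaced.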

One note is that these tables may reintroduce client ids that have already gone through the shredder process. Presumably, these dates will be reprocessed using the entire deletion-request table, so this should not be an issue.

Is the modification above, skipping the copy_deduplicate stage, reasonable?

Flags: needinfo?(jklukas)

Anthony Miyaguchi [:amiyaguchi]

Assignee

Comment 6

5 years ago

I had a quick conversation with :klukas to go through the plan. We can append directly from a stable table in the backfill project into the shared prod project without having to go through copy_deduplicate.

The process will look something like this:

1. Filter out the set of documents directly from payload bytes format. Apply de-duplication and remove relevant client ids from deletion requests as necessary.
2. Run the beam job to populate stable tables in the backfill project.
3. Run a bq cp from backfill to shared prod.
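
The de-duplication and deletion-request filtering in the first step can be sketched as follows. This is an in-memory Python illustration, not the actual job: `prepare_documents`, the `client_id` field name, and the sample data are assumptions made for the example.

```python
from typing import Dict, Iterable, List, Set

Document = Dict[str, str]


def prepare_documents(documents: Iterable[Document], deleted_client_ids: Set[str]) -> List[Document]:
    """Drop documents from clients with a deletion request, then deduplicate by document_id."""
    seen_document_ids: Set[str] = set()
    prepared: List[Document] = []
    for doc in documents:
        if doc.get("client_id") in deleted_client_ids:
            continue  # client asked for deletion; do not reintroduce its data
        if doc["document_id"] in seen_document_ids:
            continue  # duplicate submission of the same document
        seen_document_ids.add(doc["document_id"])
        prepared.append(doc)
    return prepared


documents = [
    {"document_id": "d1", "client_id": "c1"},
    {"document_id": "d1", "client_id": "c1"},  # duplicate
    {"document_id": "d2", "client_id": "c2"},  # client with a deletion request
    {"document_id": "d3", "client_id": "c3"},
]
kept = prepare_documents(documents, deleted_client_ids={"c2"})
print([doc["document_id"] for doc in kept])
```

Filtering before the copy means the rows that reach shared prod need no further cleanup for these two concerns.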

Anthony Miyaguchi [:amiyaguchi]

Assignee

Updated

5 years ago

Flags: needinfo?(jklukas)

Anthony Miyaguchi [:amiyaguchi]

Assignee

Comment 7

5 years ago

Attached file Bug 1661565 - Backfill GPU active pings #11

Anthony Miyaguchi [:amiyaguchi]

Assignee

Comment 8

5 years ago

I've created the set of stable tables that can be copied into the prod project now. The procedure is something like this:

1. Mirror tables from the production project for the live and stable tables.
2. Copy the subset of payload bytes errors into the backfill project, then run the beam decoder job on it to populate the live table in the backfill project.
3. Optionally prune the set of empty tables from the live dataset, then run copy_deduplicate over the live dataset.
4. Prune the set of empty tables from the stable dataset, then run shredder_delete on the stable dataset.
5. Append backfill stable tables from backfill to shared prod.
6. Delete old error set from shared prod and append errors from backfill into shared prod.
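
As an illustration of the append step, the sketch below builds one `bq cp --append_table` command per day in the backfill window. It only constructs the argument lists and does not run them. The project names `backfill-project` and `prod-project`, the choice of `telemetry_stable.bhr_v4` as the table, and the assumption that the append was done one daily partition at a time are all hypothetical; the PR holds the commands that were actually used.

```python
from datetime import date, timedelta
from typing import Iterator, List


def backfill_dates(start: date, end: date) -> Iterator[date]:
    """Yield each day from start to end, inclusive."""
    for offset in range((end - start).days + 1):
        yield start + timedelta(days=offset)


def append_command(src_project: str, dst_project: str, table: str, day: date) -> List[str]:
    """Build a `bq cp --append_table` command for one daily partition."""
    partition = f"{table}${day:%Y%m%d}"  # partition decorator, e.g. table$20200704
    return [
        "bq", "cp", "--append_table",
        f"{src_project}:{partition}",
        f"{dst_project}:{partition}",
    ]


# Placeholder project and table names; the date range comes from the bug title.
commands = [
    append_command("backfill-project", "prod-project", "telemetry_stable.bhr_v4", day)
    for day in backfill_dates(date(2020, 7, 4), date(2020, 8, 20))
]
print(len(commands))
print(" ".join(commands[0]))
```

Keeping the commands as argument lists makes them easy to review before passing each one to `subprocess.run`.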

Running copy_deduplicate and shredder_delete inside the backfill project is pretty reasonable. The options do need to be inspected closely and there are some caveats to usage, but it's mostly straightforward.

Anthony Miyaguchi [:amiyaguchi]

Assignee

Comment 9

5 years ago

The backfill is complete, with the stable tables and errors appended to prod. There was a slight issue with the table clustering being incorrect on the stable tables in the backfill project, so I had to mirror them in a separate project to cluster the data before the append could be done in prod (notes in the PR).

Status: NEW → RESOLVED

Closed: 5 years ago

Resolution: --- → FIXED

Anna Scholtz [:ascholtz]

Updated

3 years ago

Whiteboard: [data-quality] → [dataquality]

Categories

(Data Platform and Tools :: General, task, P1)

Security

(public)