Closed Bug 1286220 Opened 8 years ago Closed 8 years ago

Backfill: Reprocess Landfill data for the affected range

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mreid, Assigned: robotblake)

With an S3 Input reading from the Landfill bucket, process each day from 20160704 through 20160709.

Compute all the standard S3 outputs, but send them to a temporary location:
s3://net-mozaws-prod-us-west-2-pipeline-analysis/backfill_bug1285621/...normal partitions...
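A minimal sketch of enumerating the days to reprocess, assuming the range is inclusive at both ends. The function name `backfill_days` is hypothetical. The temporary prefix is copied from the description above, and the partition layout beneath it is deliberately not reconstructed here.

```python
from datetime import datetime, timedelta

# Temporary output prefix quoted in the bug description. The partition
# layout under it is elided in the bug and is not filled in here.
TEMP_PREFIX = "s3://net-mozaws-prod-us-west-2-pipeline-analysis/backfill_bug1285621/"


def backfill_days(first="20160704", last="20160709"):
    """Return every day from first through last (inclusive) as YYYYMMDD strings."""
    start = datetime.strptime(first, "%Y%m%d").date()
    end = datetime.strptime(last, "%Y%m%d").date()
    if end < start:
        raise ValueError("last day precedes first day")
    span = (end - start).days + 1
    return [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(span)]
```

Calling `backfill_days()` with the defaults yields the six days named above, which can then drive one reprocessing run per day.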

Verify that the updated data looks correct by comparing it against a similar unaffected day. Do the record counts look OK? Does the total size in MB look OK? Note that we expect a smaller number of files per day during backfill, but the total size should be "normal".
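The comparison against an unaffected day could be sketched roughly as follows. The function name `compare_to_baseline`, the dict keys, and the 10% tolerance are all assumptions made for illustration; the bug does not state a threshold. File counts are left out on purpose, since fewer files per day are expected during backfill.

```python
def compare_to_baseline(backfill, baseline, tolerance=0.10):
    """Compare record count and total bytes of a backfilled day to a baseline day.

    Both arguments are dicts with "records" and "bytes" keys (hypothetical
    shape). Returns {metric: (ok, ratio)}, where ratio is backfill / baseline
    and ok is True when the ratio is within `tolerance` of 1.0.
    """
    result = {}
    for metric in ("records", "bytes"):
        if baseline[metric] <= 0:
            raise ValueError("baseline %s must be positive" % metric)
        ratio = backfill[metric] / baseline[metric]
        result[metric] = (abs(ratio - 1.0) <= tolerance, ratio)
    return result
```

A day would be considered suspicious if either metric comes back with `ok` set to False.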

One day at a time (to minimize "missing data" time):
  - delete data from the prod bucket for that day
  - move data from the temp location to the prod location
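The per-day swap above can be sketched as an ordered plan: all deletes for the day first, then the moves. This only builds the plan and does not talk to S3. The function name `plan_day_swap`, the tuple shape, and the prefixes used by callers are hypothetical, and the key lists are assumed to come from a separate listing of each bucket for that day.

```python
def plan_day_swap(prod_keys, temp_keys, temp_prefix, prod_prefix):
    """Build the ordered operations that replace one day's prod data.

    Returns a list of (operation, source, destination) tuples: every
    existing prod key is deleted, then every temp key is moved to the
    same relative path under prod_prefix.
    """
    ops = [("delete", key, None) for key in prod_keys]
    for key in temp_keys:
        if not key.startswith(temp_prefix):
            raise ValueError("key outside temp prefix: %s" % key)
        relative = key[len(temp_prefix):]
        ops.append(("move", key, prod_prefix + relative))
    return ops
```

Running one day's plan to completion before starting the next keeps the window of missing data limited to a single day.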

This may require some messing around with cross-IAM S3 permissions.
Assignee: nobody → bimsland
Blocks: 1285621
Points: --- → 1
Priority: -- → P1
Blocks: 1286226
The data has been reprocessed and copied over to the prod bucket, except for 20160705, which should finish copying in the next hour or two. However, a quick test of get_pings(sc, app="Firefox", channel="nightly", submission_date="20160704", fraction=0.1) from Spark returns no results. The Lambda job that does the indexing shows a high number of invocation errors during the reupload due to throttling, so it looks like we may need to repopulate the SimpleDB index for 201607.
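The same spot check could be repeated for each backfilled day along these lines. The helper `days_without_pings` is hypothetical, and the counting function is passed in so the sketch does not depend on a Spark context.

```python
def days_without_pings(days, count_pings):
    """Return the days for which count_pings(day) reports zero pings.

    count_pings is any callable taking a YYYYMMDD string and returning
    an integer count for that submission date.
    """
    return [day for day in days if count_pings(day) == 0]
```

In a Spark session, `count_pings` might wrap the call quoted above, for example by counting the result of get_pings for each submission_date, assuming that result supports a count. Any day returned here would still be missing from the index.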
20160705 just completed. :rvitillo, could you repopulate the simpledb indices for 201607?
Flags: needinfo?(rvitillo)
(In reply to Wesley Dawson [:whd] from comment #2)
> 20160705 just completed. :rvitillo, could you repopulate the simpledb
> indices for 201607?

I repopulated the indices for all affected dates.
Flags: needinfo?(rvitillo)
Ok, get_pings is returning results now, so I think it's safe to close this bug and move on to restoring the derived datasets.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Blocks: 1287585
Product: Cloud Services → Cloud Services Graveyard