Closed
Bug 1286220
Opened 8 years ago
Closed 8 years ago
Backfill: Reprocess Landfill data for the affected range
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mreid, Assigned: robotblake)
References
Details
With an S3 Input reading from the Landfill bucket, process each day from 20160704 through 20160709.

Compute all the standard S3 outputs, but send them to a temporary location:
s3://net-mozaws-prod-us-west-2-pipeline-analysis/backfill_bug1285621/...normal partitions...

Verify that the updated data looks correct (compared to a similar unaffected day). Do record counts look ok? Do total MBs of data look ok? Note that we expect a smaller number of files per day during backfill, but the total size should be "normal".

One day at a time (to minimize "missing data" time):
- delete data from the prod bucket for that day
- move data from the temp location to the prod location

This may require some messing around with cross-iam S3 permissions.
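The size check described above could be sketched roughly as follows. This is only an illustration, not the procedure that was actually used: the helper names, the 10% tolerance, and the use of boto3 to list objects are all assumptions, and the bucket and prefix for each day must be supplied by the caller because the partition layout is not spelled out here. Record counts would require reading the data itself and are not covered.

```python
from datetime import datetime, timedelta


def affected_days(start="20160704", end="20160709"):
    """Return every submission date in the range, inclusive, as YYYYMMDD strings."""
    first = datetime.strptime(start, "%Y%m%d").date()
    last = datetime.strptime(end, "%Y%m%d").date()
    return [(first + timedelta(days=i)).strftime("%Y%m%d")
            for i in range((last - first).days + 1)]


def summarize(objects):
    """Summarize an iterable of (key, size_in_bytes) pairs."""
    sizes = [size for _key, size in objects]
    return {"files": len(sizes), "bytes": sum(sizes)}


def size_looks_normal(backfill, reference, tolerance=0.10):
    """Check that total bytes are within `tolerance` of the reference day.

    File counts are ignored on purpose: a backfilled day is expected to
    have fewer files, but a similar total size. The 10% default is an
    arbitrary, hypothetical threshold.
    """
    if reference["bytes"] == 0:
        return False
    drift = abs(backfill["bytes"] - reference["bytes"]) / reference["bytes"]
    return drift <= tolerance


def list_prefix(bucket, prefix):
    """Yield (key, size) for each object under a prefix.

    Assumes boto3 is installed and credentials are configured; imported
    lazily so the pure helpers above work without it.
    """
    import boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"], obj["Size"]
```

A day would then be compared with something like `size_looks_normal(summarize(list_prefix(temp_bucket, temp_prefix)), summarize(list_prefix(prod_bucket, reference_prefix)))`, where the bucket and prefix arguments are placeholders for the real locations.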
Updated•8 years ago
Comment 1•8 years ago
The data has been reprocessed and copied over to the prod bucket, except for 20160705, which should finish copying in the next hour or two. However, a quick test of get_pings(sc, app="Firefox", channel="nightly", submission_date="20160704", fraction=0.1) from Spark returns no results. The Lambda job that does the indexing shows a high number of invocation errors during the reupload due to throttling, so it looks like we may need to repopulate the SimpleDB index for 201607.
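The same spot check could be repeated for every affected day rather than just 20160704. The sketch below is a hypothetical helper, not part of any existing tooling: `days_without_pings` and `fetch_sample` are names made up for illustration, and the fetch function is passed in so the logic can be exercised without a Spark context.

```python
def days_without_pings(days, fetch_sample):
    """Return the days for which fetch_sample(day) comes back empty.

    `fetch_sample` is a caller-supplied function taking a YYYYMMDD string
    and returning a list of sampled pings. With Spark it might wrap the
    call quoted above, for example (assuming get_pings returns an RDD):

        lambda day: get_pings(sc, app="Firefox", channel="nightly",
                              submission_date=day, fraction=0.1).take(1)
    """
    return [day for day in days if len(fetch_sample(day)) == 0]
```

Any day returned by this helper would be a candidate for reindexing.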
Comment 2•8 years ago
20160705 just completed. :rvitillo, could you repopulate the simpledb indices for 201607?
Flags: needinfo?(rvitillo)
Comment 3•8 years ago
(In reply to Wesley Dawson [:whd] from comment #2)
> 20160705 just completed. :rvitillo, could you repopulate the simpledb
> indices for 201607?

I repopulated the indices for all affected dates.
Flags: needinfo?(rvitillo)
Comment 4•8 years ago
Ok, get_pings is returning results now, so I think it's safe to close this bug and move on to restoring the derived datasets.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Updated•6 years ago
Product: Cloud Services → Cloud Services Graveyard