Closed
Bug 1234286
Opened 9 years ago
Closed 8 years ago
Some crash_summary_* and main_summary_* tables are missing
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: benjamin, Unassigned)
Details
Missing: crash_summary_20151214, crash_summary_20151217. The matching main_summary_* tables are also missing.
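A gap like this can be spotted by comparing the expected daily table names against the ones that exist. The following is a minimal sketch, not the pipeline's actual tooling: the function name `missing_daily_tables` and the hard-coded list of existing tables are hypothetical, and in practice the list would come from the database catalog.

```python
from datetime import date, timedelta


def missing_daily_tables(existing, prefix, start, end):
    """Return expected '<prefix>_YYYYMMDD' names in [start, end] that are absent.

    `existing` is any iterable of table names (assumed to be fetched elsewhere).
    """
    present = set(existing)
    missing = []
    day = start
    while day <= end:
        name = f"{prefix}_{day:%Y%m%d}"
        if name not in present:
            missing.append(name)
        day += timedelta(days=1)
    return missing


# Illustrative input only: the days around the two missing tables in this bug.
existing_tables = [
    "crash_summary_20151213",
    "crash_summary_20151215",
    "crash_summary_20151216",
    "crash_summary_20151218",
]
gaps = missing_daily_tables(
    existing_tables, "crash_summary", date(2015, 12, 13), date(2015, 12, 18)
)
print(gaps)  # ['crash_summary_20151214', 'crash_summary_20151217']
```

The same call with the prefix "main_summary" would cover the matching main_summary_* tables.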
Comment 1•9 years ago
We ran into a malformed "SEARCH_COUNTS" histogram, which blew up the derived streams. PR to fix it: https://github.com/mozilla-services/data-pipeline/pull/176
Comment 2•8 years ago
Backfill jobs for those two days are running now. I will update / close this bug when they complete.
Comment 3•8 years ago
Missing tables have been populated.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Comment 4•8 years ago
For future reference, steps for backfilling:
- Fix the problem in the job code
- Update the job code using the relevant 'package.sh' script[1]
- Remove any existing data for the backfill target days from redshift and S3 (drop tables, delete data files)
- Run a scheduled job with a "run command" of "./run.sh 20151214" (one for each backfill date)
- Once the job has launched, delete the scheduled job. This can safely be done before the job finishes.
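The per-date parts of these steps can be written down as a small checklist generator. This is only a sketch under stated assumptions: `backfill_plan` is a hypothetical helper, the table prefixes are taken from this bug's title, and the S3 data file locations are not stated here, so the S3 cleanup is left out.

```python
from datetime import datetime


def backfill_plan(dates, table_prefixes=("crash_summary", "main_summary")):
    """Build, per backfill date, the tables to drop and the run command to schedule.

    Dates are YYYYMMDD strings, matching the "./run.sh 20151214" form above.
    """
    plan = []
    for day in dates:
        # Reject anything that is not a real 8-digit calendar date.
        if len(day) != 8 or not day.isdigit():
            raise ValueError(f"expected YYYYMMDD, got {day!r}")
        datetime.strptime(day, "%Y%m%d")  # raises ValueError for e.g. 20151332
        plan.append(
            {
                "date": day,
                "drop_tables": [f"{prefix}_{day}" for prefix in table_prefixes],
                "run_command": f"./run.sh {day}",
            }
        )
    return plan


for step in backfill_plan(["20151214", "20151217"]):
    print(step["date"], step["drop_tables"], step["run_command"])
```

Each entry corresponds to one scheduled job, which can be deleted once it has launched.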
Reporter | ||
Comment 5•8 years ago
20160102 has also failed
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter | ||
Updated•8 years ago
Flags: needinfo?(mreid)
Comment 6•8 years ago
I kicked off the backfill job a bit over an hour ago. I will update when it completes. Meanwhile, I'll add some monitoring / alerting to the job.
Flags: needinfo?(mreid)
Comment 7•8 years ago
The missing tables are now available.
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 8•8 years ago
20160106 is broken now as well.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter | ||
Updated•8 years ago
Flags: needinfo?(mreid)
Comment 9•8 years ago
Backfill job kicked off. Trink deployed the updated code to fix the underlying problem too, so this particular problem shouldn't come up again. I filed Bug 1238676 to add monitoring so we can deal with this in a more timely way going forward. Again, I'll update the bug when the job completes, which I estimate will be around Jan 12 @ 09:00 UTC.
Flags: needinfo?(mreid)
Comment 10•8 years ago
The tables are now available.
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 11•8 years ago
20160115 is also broken
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter | ||
Updated•8 years ago
Flags: needinfo?(mreid)
Comment 12•8 years ago
This is a new issue. The job for that day appears not to have run. I'm running it now; the tables should be available in the morning. I'll clear the needinfo when the missing tables are actually available. Thanks for your patience while we get monitoring up and running in bug 1238676.
Updated•8 years ago
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED
Comment 15•8 years ago
Update on Comment 12: it looks like the analysis.t.m.o service encountered an OOM error. We tracked the cause down to running a whole bunch of backfill jobs at the same time. We're planning to boost the ec2 instance type to give a bit more breathing room for that case. The job from 20160122 is missing due to a job timeout after 1400 minutes (!!). I'm running the backfill now. It shouldn't take that long to process a day, so we will need to investigate further if the job times out again.
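One way to avoid piling up concurrent backfills is to cap how many run at once and give each a timeout. The sketch below illustrates the idea only and is not how analysis.t.m.o schedules jobs: `run_backfill` and `run_backfills` are hypothetical helpers, the local "./run.sh" invocation stands in for a scheduled job, and the 1400-minute default simply mirrors the timeout mentioned above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_backfill(day, timeout_minutes=1400):
    """Run ./run.sh for one day; True on success, False on failure or timeout."""
    try:
        result = subprocess.run(["./run.sh", day], timeout=timeout_minutes * 60)
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def run_backfills(days, run_one=run_backfill, max_parallel=1):
    """Run backfills with at most `max_parallel` in flight; map day -> success."""
    days = list(days)
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        outcomes = list(pool.map(run_one, days))
    return dict(zip(days, outcomes))


# Demo with a stand-in runner so nothing is actually launched.
def fake_runner(day):
    return day != "20160122"  # pretend this day's job timed out


status = run_backfills(["20160115", "20160122"], run_one=fake_runner, max_parallel=2)
print(status)  # {'20160115': True, '20160122': False}
```

Days that come back False are the ones to rerun or investigate.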
Comment 16•8 years ago
The tables for 20160122 are now in place. The backfill job ran in about the usual amount of time, around 14 hours.
Updated•8 years ago
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED
Updated•6 years ago
Product: Cloud Services → Cloud Services Graveyard