Closed Bug 850782 Opened 12 years ago Closed 12 years ago

Too much backfilling. Is crontabber failing to self-heal?

Categories

(Socorro :: General, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 852991

People

(Reporter: peterbe, Unassigned)

Details

In the space of a week, we've had to run backfill 4 times: https://bugzilla.mozilla.org/buglist.cgi?bug_id=848370%2C848947%2C849212%2C850668&list_id=6006091 According to Lonnen, 2 of those were because the ADU data from metrics failed. We need to investigate whether other problems happened because crontabber failed. In other words, why isn't it self-healing appropriately?
If anybody has any other tips or explanations, please comment.
(In reply to Peter Bengtsson [:peterbe] from comment #1)
> If anybody got any other tips to explanations, please comment.

IIRC the ADU job runs once per day, but crontabber cannot retry failed jobs more often than their normal schedule. It would be nice to retry failed daily jobs once per hour instead of waiting until the next day, for instance.
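A minimal sketch of the retry rule proposed here, assuming each job records when it last ran and whether that run succeeded. `next_run` and `retry_interval` are hypothetical names for illustration, not crontabber's actual API.

```python
from datetime import datetime, timedelta


def next_run(last_run: datetime, last_succeeded: bool,
             frequency: timedelta, retry_interval: timedelta) -> datetime:
    """Return when a job should run next (hypothetical rule, not crontabber's)."""
    if last_succeeded:
        # Normal case: wait the job's full frequency (e.g. one day).
        return last_run + frequency
    # Failed: retry sooner, but never later than the normal schedule.
    return last_run + min(retry_interval, frequency)


# A daily job that failed at 10:00 would be retried at 11:00.
print(next_run(datetime(2013, 3, 14, 10), False,
               timedelta(days=1), timedelta(hours=1)))
```

Capping with `min` keeps a job whose retry interval is longer than its frequency on its normal schedule.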
(In reply to Robert Helmer [:rhelmer] from comment #2)
> IIRC ADU job runs once per day, but crontabber cannot retry failed jobs more
> often than their normal schedule. It would be nice to retry failed daily
> jobs once per hour instead of waiting until the next day, for instance.

That *would* be nice, but it's not trivial... for backfill-based ones.
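To illustrate why hourly retries are awkward for backfill-based jobs: such a job has to run once for every period it missed since its last success, oldest first, and must remember how far it got. A minimal sketch, assuming a daily job keyed by date; `dates_to_backfill` and `run_backfill` are hypothetical helpers, not crontabber's implementation.

```python
from datetime import date, timedelta
from typing import Callable, List


def dates_to_backfill(last_success: date, today: date) -> List[date]:
    """Every day after last_success, up to and including today."""
    return [last_success + timedelta(days=n)
            for n in range(1, (today - last_success).days + 1)]


def run_backfill(job: Callable[[date], None],
                 last_success: date, today: date) -> date:
    """Run job once per missed day, oldest first (hypothetical helper).

    Stops at the first failing day and returns the last day that
    succeeded, so the next attempt (hourly or daily) resumes there
    instead of skipping or repeating days.
    """
    for day in dates_to_backfill(last_success, today):
        try:
            job(day)
        except Exception:
            break  # e.g. that day's upstream data has not arrived yet
        last_success = day
    return last_success
```

The retry schedule alone is not enough here: every retry has to persist the returned date so that it replays exactly the days still missing.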
(In reply to Peter Bengtsson [:peterbe] from comment #3)
> That *would* be nice but it's not trivial. ...for backfill based ones.

Our hackaround for this in the old system was to run ADU hourly and ignore the failures (really, the SP (stored procedure) should throw a warning, not raise an error). Perhaps ADU should be made into an hourly job, with nothing depending directly upon it anymore (this is like the change we made for ftpscraper in bug 845949)? Although maybe this wouldn't really solve anything, because we still can't recover from ADU changes that come in after the nightly reporting starts (~10 AM UTC).
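A minimal sketch of the hourly, warn-instead-of-fail approach described above, assuming ADU data for a date shows up at an unknown hour and that loading it is safe to repeat. `hourly_adu_check`, `fetch_adu`, and the dict used as storage are hypothetical stand-ins, not Socorro's stored procedure.

```python
import logging
from datetime import date
from typing import Callable, Dict, Optional

log = logging.getLogger("hourly_adu")  # hypothetical logger name


def hourly_adu_check(target: date,
                     fetch_adu: Callable[[date], Optional[int]],
                     store: Dict[date, int]) -> bool:
    """Load ADU counts for target once they exist (hypothetical job).

    Meant to be scheduled hourly. Returns True once the data is stored.
    """
    if target in store:
        return True  # loaded by an earlier run; running again is a no-op
    count = fetch_adu(target)
    if count is None:
        # Data not published yet: warn instead of raising, so the
        # scheduler does not record a failure and the next hour retries.
        log.warning("ADU data for %s not available yet", target)
        return False
    store[target] = count
    return True
```

Because a missing day only produces a warning, nothing downstream is blocked by a recorded failure. As noted above, though, data arriving after the nightly reporting has started still misses that night's reports.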
I'm making this depend on bug 851184 because I first want to investigate how badly ordered our list of jobs is.
Depends on: 851184
Bad news regarding bug 851184 [1]: I built the code, submitted the PR [2], and ran it against our configured jobs [3], and "unfortunately" the order we have the jobs in is the "perfect" one. In other words, poorly ordered jobs are not an explanation for any of our troubles. We (lonnen and I are doing this together) will still investigate what errors could have caused this problem and why it didn't self-heal fast enough.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=851184
[2] https://github.com/mozilla/socorro/pull/1141
[3] https://github.com/mozilla/socorro/blob/master/socorro/cron/crontabber.py#L31
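The ordering question amounts to checking that every job is listed after the jobs it depends on. A minimal sketch, assuming the configuration can be read as a list of (job name, dependencies) pairs; this is not the code from the PR, and the job names below are made up.

```python
from typing import List, Sequence, Tuple


def misordered(jobs: Sequence[Tuple[str, Sequence[str]]]) -> List[str]:
    """Names of jobs listed before at least one of their dependencies.

    A job whose dependency is missing from the list entirely is
    reported too.
    """
    seen = set()
    bad = []
    for name, depends_on in jobs:
        if any(dep not in seen for dep in depends_on):
            bad.append(name)
        seen.add(name)
    return bad


# Hypothetical job list: "report" depends on "adu" and "matview".
jobs = [("adu", []), ("matview", ["adu"]), ("report", ["adu", "matview"])]
print(misordered(jobs))  # [] means every job follows its dependencies
```

An empty result corresponds to the "perfect" ordering found above: each job's dependencies have already run by the time it is reached.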
The only problem with crontabber was that the nightly builds matview didn't work, and it hasn't worked for almost two weeks! See https://bugzilla.mozilla.org/show_bug.cgi?id=852991#c2
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE