Closed Bug 857100 Opened 12 years ago Closed 12 years ago

Investigate missing aggregations for April 1

Categories

(Socorro :: General, task)

task
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: laura, Unassigned)

References

Details

We have ADU for last night, but aggregations do not appear in the UI. Work out why and fix it.
$ grep -E "DEBUG" /var/log/socorro/crontabber.log | grep -vE "FTPScraper|Emails|Bugzilla|Duplicates|ReportsClean" | grep "successful" 2013-04-01 03:07:21,081 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.matviews.SignaturesCronApp'> on 2013-04-01 10:00:02.037445+00:00 2013-04-01 03:07:22,901 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.matviews.BuildADUCronApp'> on 2013-04-01 10:00:16.365841+00:00 2013-04-01 03:08:09,145 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.matviews.CorrelationsCronApp'> on 2013-04-01 10:00:23.561458+00:00 2013-04-01 03:08:11,668 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.matviews.CrashesByUserBuildCronApp'> on 2013-04-01 10:00:21.426379+00:00 2013-04-01 03:08:12,692 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.matviews.NightlyBuildsCronApp'> on 2013-04-01 10:00:01.313985+00:00 2013-04-01 03:08:13,202 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.matviews.HomePageGraphBuildCronApp'> on 2013-04-01 10:00:49.036329+00:00 2013-04-01 03:08:14,070 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.matviews.ADUCronApp'> on 2013-04-01 10:00:48.600524+00:00 2013-04-01 03:08:14,627 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.matviews.ProductVersionsCronApp'> on 2013-04-01 10:08:14.092286+00:00 2013-04-01 03:08:15,415 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.matviews.TCBSBuildCronApp'> on 2013-04-01 10:00:49.560122+00:00 2013-04-01 03:08:16,358 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.matviews.HomePageGraphCronApp'> on 2013-04-01 10:00:47.507316+00:00 2013-04-01 03:12:18,348 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.matviews.TCBSCronApp'> on 2013-04-01 10:00:14.664714+00:00 2013-04-01 03:14:15,189 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.matviews.ExplosivenessCronApp'> on 2013-04-01 10:00:50.349892+00:00 2013-04-01 03:14:17,583 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.matviews.CrashesByUserCronApp'> on 2013-04-01 10:00:17.633244+00:00 2013-04-02 03:00:02,490 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.matviews.NightlyBuildsCronApp'> on 2013-04-02 10:00:01.313985+00:00 2013-04-02 03:00:02,873 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.matviews.ProductVersionsCronApp'> on 2013-04-02 10:00:02.553741+00:00
(reformatting the output above to be bugzilla friendly) $ grep -E "DEBUG" /var/log/socorro/crontabber.log | grep -vE "FTPScraper|Emails|Bugzilla|Duplicates|ReportsClean" | grep "successful" ran SignaturesCronApp on 2013-04-01 10:00:02.037445+00:00 ran BuildADUCronApp on 2013-04-01 10:00:16.365841+00:00 ran CorrelationsCronApp on 2013-04-01 10:00:23.561458+00:00 ran CrashesByUserBuildCronApp on 2013-04-01 10:00:21.426379+00:00 ran NightlyBuildsCronApp on 2013-04-01 10:00:01.313985+00:00 ran HomePageGraphBuildCronApp on 2013-04-01 10:00:49.036329+00:00 ran ADUCronApp on 2013-04-01 10:00:48.600524+00:00 ran ProductVersionsCronApp on 2013-04-01 10:08:14.092286+00:00 ran TCBSBuildCronApp on 2013-04-01 10:00:49.560122+00:00 ran HomePageGraphCronApp on 2013-04-01 10:00:47.507316+00:00 ran TCBSCronApp on 2013-04-01 10:00:14.664714+00:00 ran ExplosivenessCronApp on 2013-04-01 10:00:50.349892+00:00 ran CrashesByUserCronApp on 2013-04-01 10:00:17.633244+00:00 ran NightlyBuildsCronApp on 2013-04-02 10:00:01.313985+00:00 ran ProductVersionsCronApp on 2013-04-02 10:00:02.553741+00:00
No cause identified yet, but only NightlyBuilds and ProductVersions daily jobs report success from the overnight. Strange. Confirmed with `grep -E "on 2013-04-02" /var/log/socorro/crontabber.log | grep -vE "Duplicates|Bugzilla|FTPScraper|Emails|ReportsClean"`
Interesting, why didn't the other jobs run today? Or did they start and not complete?
Kairo: don't know yet. lonnen, peterbe: can we safely backfill so Kairo can do his work?
Yeah this is odd, here's the current tcbs-matview state from crontabbers.json: "tcbs-matview": { ... "last_run": "2013-04-02 10:00:02.939109", "last_success": "2013-04-01 10:00:14.664714", "error_count": 0, "last_error": {} }, In /var/log/socorro/crontabbers.log I see it logging the "last_run" time when it tried to start: 2013-04-02 03:00:02,939 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.TCBSCronApp'> However there is no corresponding "successfully ran ..." log after that, and no errors either.
@Laura -- Backfill anytime, it won't destroy anything. All the info we're going to get is already in those logs.
(In reply to Robert Helmer [:rhelmer] from comment #6) > Yeah this is odd, here's the current tcbs-matview state from > crontabbers.json: > > "tcbs-matview": { > ... > "last_run": "2013-04-02 10:00:02.939109", > "last_success": "2013-04-01 10:00:14.664714", > "error_count": 0, > "last_error": {} > }, > > In /var/log/socorro/crontabbers.log I see it logging the "last_run" time > when it tried to start: > > 2013-04-02 03:00:02,939 DEBUG - MainThread - about to run <class > 'socorro.cron.jobs.matviews.TCBSCronApp'> > > However there is no corresponding "successfully ran ..." log after that, and > no errors either. This could be very bad. It means that the backfill concluded that there was nothing to run. The "about to run" message happens if the dependencies are all passing and the next_run says it's time to run. Then, when it goes into the backfill machinery it does this based on last_success to know how many days it's going to catch up on. So, since this is run at "2013-04-02 03:00:02" (which is 10:00:02 in UTC) that means 24 hours has NOT passed since "2013-04-01 10:00:14.664714". (see, 23 hours, 59 minutes and 47 seconds) I fear this is a bug. Here in fact: https://github.com/mozilla/socorro/blob/master/socorro/cron/base.py#L163 What it does is that it avoids any drift by forcing the time to be "10:00" but it forgets the seconds (and milliseconds). So, this means it should kick in tomorrow but it'll be 1 day off. I'll write a patch.
Status: NEW → ASSIGNED
New pull request: https://github.com/mozilla/socorro/pull/1174 It's late in the afternoon for heavy date logic but I think it's solid.
Commit pushed to master at https://github.com/mozilla/socorro https://github.com/mozilla/socorro/commit/f6342821711f55053156d9fbd7fa4d645ba23c9f fixes bug 857100 - reset time by seconds and microseconds too, r=rhelmer
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Blocks: 860233
You need to log in before you can comment on or make changes to this bug.