Closed
Bug 777670
Opened 12 years ago
Closed 12 years ago
crontabber not running daily matviews since 2012-07-25
Categories
(Socorro :: Infra, task)
Socorro
Infra
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: rhelmer, Assigned: peterbe)
Details
(Whiteboard: [qa-])
Attachments
(1 file)
12.79 KB,
text/plain
|
Details |
I'm not sure what to make of the log, looks like it tried to run things? 2012-07-26 03:00:02,069 DEBUG - MainThread - skipping <class 'socorro.cron.jobs.weekly_reports_partitions.WeeklyReportsPartitionsCronApp'> because it's not time to run 2012-07-26 03:00:02,069 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.nightly_builds.NightlyBuildsCronApp'> 2012-07-26 03:00:02,092 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.ProductVersionsCronApp'> 2012-07-26 03:00:02,628 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.matviews.ProductVersionsCronApp'> on 2012-07-26 10:00:02.092426+00:00 2012-07-26 03:00:02,652 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.SignaturesCronApp'> 2012-07-26 03:00:02,675 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.OSVersionsCronApp'> 2012-07-26 03:00:02,699 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.TCBSCronApp'> 2012-07-26 03:00:02,721 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.ADUCronApp'> 2012-07-26 03:00:02,745 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.DailyCrashesCronApp'> 2012-07-26 03:00:02,771 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.HangReportCronApp'> 2012-07-26 03:00:02,795 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.RankCompareCronApp'> 2012-07-26 03:00:02,821 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.NightlyBuildsCronApp'> 2012-07-26 03:00:02,853 DEBUG - MainThread - skipping <class 'socorro.cron.jobs.duplicates.DuplicatesCronApp'> because it's not time to run 2012-07-26 03:00:02,854 DEBUG - MainThread - skipping <class 'socorro.cron.jobs.reports_clean.ReportsCleanCronApp'> because it's not time to run 2012-07-26 03:00:02,854 DEBUG - MainThread - skipping <class 'socorro.cron.jobs.bugzilla.BugzillaCronApp'> because it's not time to run According to crontabbers.json there were no errors and it ran: "tcbs-matview": { "next_run": "2012-07-27 10:00:00.000000", "last_error": {}, "first_run": "2012-07-25 23:19:56.679210", "last_run": "2012-07-26 10:00:02.699569", "error_count": 0 }, However there is no data for 2012-07-25 in TCBS etc. I checked and we do have data in raw_adu (if the problem was something like this I'd expect an error). It looks like crontabber finished in under 5m which is also suspicious.
Reporter | ||
Comment 1•12 years ago
|
||
Running backfill as breakpad_rw: breakpad=> select backfill_matviews('2012-01-25', '2012-01-25');
Reporter | ||
Comment 2•12 years ago
|
||
(In reply to Robert Helmer [:rhelmer] from comment #1) > Running backfill as breakpad_rw: > > breakpad=> select backfill_matviews('2012-01-25', '2012-01-25'); Oops of course I actually meant: breakpad=> select backfill_matviews('2012-07-25', '2012-07-25'); (restarted the psql connection etc)
Reporter | ||
Comment 3•12 years ago
|
||
Backfill completed: breakpad=> select backfill_matviews('2012-07-25', '2012-07-25'); INFO: backfilling reports_clean INFO: backfilling 06:00:00 of reports_clean starting at 2012-07-25 00:00:00+00 CONTEXT: SQL statement "SELECT backfill_reports_clean( first_rc, last_rc )" PL/pgSQL function "backfill_matviews" line 48 at PERFORM INFO: backfilling 06:00:00 of reports_clean starting at 2012-07-25 06:00:00+00 CONTEXT: SQL statement "SELECT backfill_reports_clean( first_rc, last_rc )" PL/pgSQL function "backfill_matviews" line 48 at PERFORM INFO: backfilling 06:00:00 of reports_clean starting at 2012-07-25 12:00:00+00 CONTEXT: SQL statement "SELECT backfill_reports_clean( first_rc, last_rc )" PL/pgSQL function "backfill_matviews" line 48 at PERFORM INFO: backfilling 06:00:00 of reports_clean starting at 2012-07-25 18:00:00+00 CONTEXT: SQL statement "SELECT backfill_reports_clean( first_rc, last_rc )" PL/pgSQL function "backfill_matviews" line 48 at PERFORM INFO: backfilling 06:00:00 of reports_clean starting at 2012-07-26 00:00:00+00 CONTEXT: SQL statement "SELECT backfill_reports_clean( first_rc, last_rc )" PL/pgSQL function "backfill_matviews" line 48 at PERFORM INFO: backfilling 02:00:00 of reports_clean starting at 2012-07-26 06:00:00+00 CONTEXT: SQL statement "SELECT backfill_reports_clean( first_rc, last_rc )" PL/pgSQL function "backfill_matviews" line 48 at PERFORM INFO: backfilling other matviews for 2012-07-25 INFO: adu INFO: tcbs NOTICE: table "new_tcbs" does not exist, skipping CONTEXT: SQL statement "DROP TABLE IF EXISTS new_tcbs" PL/pgSQL function "backfill_matviews" line 58 at SQL statement INFO: daily crashes INFO: signatures INFO: hang report INFO: nightly builds INFO: rank_compare INFO: correlations backfill_matviews ------------------- t (1 row)
Reporter | ||
Comment 4•12 years ago
|
||
peterbe, could you please take a look at why this didn't work? See comment 0, logs are on sp-admin01
Assignee: nobody → peterbe
Reporter | ||
Updated•12 years ago
|
Summary: crontabber did not run TCBS etc. for 2012-07-25 → crontabber did not run daily matviews for 2012-07-25
Assignee | ||
Comment 5•12 years ago
|
||
Josh: Can you double-check that the code is right: https://github.com/mozilla/socorro/blob/master/socorro/cron/jobs/matviews.py#L51 In particular: 1. Is the name of the stored procedure spelled correctly? 'update_tcbs' 2. Note that the code doesn't do any "external" transaction commits. 3. Note that the jobs based on the 'MatViewBackfillBase' class take the "now date" (which would have been "2012-07-26 10:00:02") minus 1 day and with the hour part trimmed off, e.g. "2012-07-25". 3b. Because this ran earlier in the release-blur but at a different hour it would have already been fed the "2012-07-25" date.
Assignee | ||
Comment 6•12 years ago
|
||
Josh: Ignore my previous comment. We've figure out what the problem was. Most of the matviews were backfill based and they don't just look at the "next_run" attribute. Because when they run, the also look at the last_run and based on that it takes last_run + frequency till it exceeds "now". So for those it doesn't just run once, it runs as many times as it has missed. Because we manually fudged the "next_run" to correct for the misunderstand of UTC vs. Pacific time, we forgot to also fudge the last_run. And because last_run + frequency < 24 hours, it wasn't time to run them.
Assignee | ||
Comment 7•12 years ago
|
||
I'm now pretty confident the tcbs-matview job will run the next time. In fact, all backfilling jobs that specify a time will run. I've added some tests which hopefully prove it: https://github.com/mozilla/socorro/pull/743 If a job doesn't have a `last_success` to base it's backfilling calculation on, it will instead use the `first_run` date.
Reporter | ||
Comment 8•12 years ago
|
||
Looks like it tried to run, but for the wrong day: InternalError: nightly_builds has already been run for 2012-07-25. InternalError: TCBS has already been run for the day 2012-07-25. etc. Running backfill now: breakpad=> select backfill_matviews('2012-07-26', '2012-07-26');
Summary: crontabber did not run daily matviews for 2012-07-25 → crontabber did not run daily matviews for 2012-07-25 or 26
Reporter | ||
Comment 9•12 years ago
|
||
No TCBS or graphs update, although it does look like everything ran, or at least logged that it did: 2012-07-28 03:00:01,961 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.nightly_builds.NightlyBuildsCronApp'> 2012-07-28 03:00:03,984 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.nightly_builds.NightlyBuildsCronApp'> on 201 2-07-28 10:00:01.467704+00:00 2012-07-28 03:00:04,267 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.ProductVersionsCronApp'> 2012-07-28 03:00:04,507 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.matviews.ProductVersionsCronApp'> on 2012-07 -28 10:00:04.267924+00:00 2012-07-28 03:00:04,709 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.SignaturesCronApp'> 2012-07-28 03:03:37,224 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.matviews.SignaturesCronApp'> on 2012-07-28 1 0:00:01.881395+00:00 2012-07-28 03:03:37,249 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.OSVersionsCronApp'> 2012-07-28 03:03:38,375 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.matviews.OSVersionsCronApp'> on 2012-07-28 1 0:02:56.304722+00:00 2012-07-28 03:03:38,397 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.TCBSCronApp'> 2012-07-28 03:07:44,977 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.matviews.TCBSCronApp'> on 2012-07-28 10:02:5 7.743661+00:00 2012-07-28 03:07:45,003 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.ADUCronApp'> 2012-07-28 03:07:45,042 DEBUG - MainThread - error when running <class 'socorro.cron.jobs.matviews.ADUCronApp'> on None Traceback (most recent call last): File "/data/socorro/application/socorro/cron/crontabber.py", line 675, in _run_one for last_success in self._run_job(job_class, config, info): File "/data/socorro/application/socorro/cron/crontabber.py", line 89, in main function(when) File "/data/socorro/application/socorro/cron/crontabber.py", line 128, in _run_proxy self.run(connection, date) File "/data/socorro/application/socorro/cron/jobs/matviews.py", line 31, in run cursor.callproc(self.get_proc_name(), [target_date]) InternalError: update_adu has already been run for 2012-07-27 2012-07-28 03:07:45,067 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.DailyCrashesCronApp'> 2012-07-28 03:07:53,173 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.matviews.DailyCrashesCronApp'> on 2012-07-28 10:02:57.869434+00:00 2012-07-28 03:07:53,773 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.HangReportCronApp'> 2012-07-28 03:38:14,925 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.matviews.HangReportCronApp'> on 2012-07-28 1 0:02:57.909317+00:00 2012-07-28 03:38:14,983 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.RankCompareCronApp'> 2012-07-28 03:38:17,150 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.matviews.RankCompareCronApp'> on 2012-07-28 10:02:58.012266+00:00 2012-07-28 03:38:17,176 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.NightlyBuildsCronApp'> 2012-07-28 03:38:19,663 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.matviews.NightlyBuildsCronApp'> on 2012-07-2 8 10:03:00.275062+00:00 2012-07-28 03:38:19,801 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.duplicates.DuplicatesCronApp'> 2012-07-28 03:42:06,553 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.duplicates.DuplicatesCronApp'> on 2012-07-28 10:23:49.277860+00:00
Summary: crontabber did not run daily matviews for 2012-07-25 or 26 → crontabber did not run daily matviews for 2012-07-[25,26,27]
Reporter | ||
Comment 10•12 years ago
|
||
Started backfill for yesterday: breakpad=> select backfill_matviews('2012-07-27', '2012-07-27');
Reporter | ||
Comment 11•12 years ago
|
||
peterbe, I don't see a cursor.commit() in either of the run() methods of the matview base classes: https://github.com/mozilla/socorro/blob/master/socorro/cron/jobs/matviews.py#L19 Is this in autocommit mode or done elsewhere? If not then could this be it, since psycopg2 uses transactions by default, right?
Assignee | ||
Comment 12•12 years ago
|
||
(In reply to Robert Helmer [:rhelmer] from comment #11) > peterbe, I don't see a cursor.commit() in either of the run() methods of the > matview base classes: > > https://github.com/mozilla/socorro/blob/master/socorro/cron/jobs/matviews. > py#L19 > > Is this in autocommit mode or done elsewhere? If not then could this be it, > since psycopg2 uses transactions by default, right? It does NOT do any transaction commits in the python code. However, I asked Josh about this twice and he said it wasn't needed. Perhaps he misinterpreted the question. Perhaps we need to quickly add a connection.commit() to all the matviews.py jobs.
Comment 13•12 years ago
|
||
Commit pushed to master at https://github.com/mozilla/socorro https://github.com/mozilla/socorro/commit/3aefac0030199e60553fc6e44f44fd413e336fe0 bug777670 - commit the transaction on matviews, r=rhelmer
Reporter | ||
Comment 14•12 years ago
|
||
Did not work again tonight, looks like a scheduling problem again maybe? 2012-07-29 03:00:01,465 DEBUG - MainThread - skipping <class 'socorro.cron.jobs.weekly_reports_partitions.WeeklyReportsPartitionsCronApp'> because it's not time to run 2012-07-29 03:00:01,465 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.nightly_builds.NightlyBuildsCronApp'> 2012-07-29 03:00:01,490 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.ProductVersionsCronApp'> 2012-07-29 03:00:01,661 DEBUG - MainThread - successfully ran <class 'socorro.cron.jobs.matviews.ProductVersionsCronApp'> on 2012-07-29 10:00:01.491261+00:00 2012-07-29 03:00:01,682 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.SignaturesCronApp'> 2012-07-29 03:00:01,712 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.OSVersionsCronApp'> 2012-07-29 03:00:01,736 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.TCBSCronApp'> 2012-07-29 03:00:01,760 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.ADUCronApp'> 2012-07-29 03:00:01,804 DEBUG - MainThread - error when running <class 'socorro.cron.jobs.matviews.ADUCronApp'> on None Traceback (most recent call last): File "/data/socorro/application/socorro/cron/crontabber.py", line 675, in _run_one for last_success in self._run_job(job_class, config, info): File "/data/socorro/application/socorro/cron/crontabber.py", line 89, in main function(when) File "/data/socorro/application/socorro/cron/crontabber.py", line 128, in _run_proxy self.run(connection, date) File "/data/socorro/application/socorro/cron/jobs/matviews.py", line 32, in run cursor.callproc(self.get_proc_name(), [target_date]) InternalError: update_adu has already been run for 2012-07-27 2012-07-29 03:00:01,829 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.DailyCrashesCronApp'> 2012-07-29 03:00:01,852 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.HangReportCronApp'> 2012-07-29 03:00:01,879 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.RankCompareCronApp'> 2012-07-29 03:00:01,901 DEBUG - MainThread - about to run <class 'socorro.cron.jobs.matviews.NightlyBuildsCronApp'> 2012-07-29 03:00:01,924 DEBUG - MainThread - skipping <class 'socorro.cron.jobs.duplicates.DuplicatesCronApp'> because it's not time to run 2012-07-29 03:00:01,925 DEBUG - MainThread - skipping <class 'socorro.cron.jobs.reports_clean.ReportsCleanCronApp'> because it's not time to run 2012-07-29 03:00:01,925 DEBUG - MainThread - skipping <class 'socorro.cron.jobs.bugzilla.BugzillaCronApp'> because it's not time to run
Summary: crontabber did not run daily matviews for 2012-07-[25,26,27] → crontabber not running daily matviews since 2012-07-25
Reporter | ||
Comment 15•12 years ago
|
||
Since the hourly jobs seem to be running fine, instead of doing a full backfill I am just running the old dailyMatviews job. It only takes around 10min for the most important things (tcbs, signature, etc) and then quite a bit longer for the hang report. If crontabber doesn't work again tomorrow, I'll run it this way in the morning.
Reporter | ||
Comment 16•12 years ago
|
||
Did not work tonight, likely due to previous failures (currently there is no way to distinguish "already ran" from actual failures, which will be fixed separately in the matviews) I have the Socorro v16 daily_matviews scripts checked out, and am running them now since this takes way less time than backfilling.
Comment 17•12 years ago
|
||
Peter and I talked in IRC and agreed that we should roll back to the "old" crontabber until staging is in a good state. Let's get that done today (7/30) so nobody has to be hand-running stuff at 3am tomorrow morning. Thanks for the diligence and hard work, guys.
Reporter | ||
Comment 18•12 years ago
|
||
(In reply to Laura Thomson :laura from comment #17) > Peter and I talked in IRC and agreed that we should roll back to the "old" > crontabber until staging is in a good state. > > Let's get that done today (7/30) so nobody has to be hand-running stuff at > 3am tomorrow morning. > > Thanks for the diligence and hard work, guys. OK, why don't we leave the new code in place and just bring the old code back? That way we can switch or fallback just by changing the system crontab, and it'll be less disruptive than the way we've tried the past few times.
Assignee | ||
Comment 19•12 years ago
|
||
(In reply to Robert Helmer [:rhelmer] from comment #18) > (In reply to Laura Thomson :laura from comment #17) > > Peter and I talked in IRC and agreed that we should roll back to the "old" > > crontabber until staging is in a good state. > > > > Let's get that done today (7/30) so nobody has to be hand-running stuff at > > 3am tomorrow morning. > > > > Thanks for the diligence and hard work, guys. > > OK, why don't we leave the new code in place and just bring the old code > back? That way we can switch or fallback just by changing the system > crontab, and it'll be less disruptive than the way we've tried the past few > times. I suggest we comment out the matviews once from crontabber.ini and leave the code in place. Also, we'll need to re-surrect the old daily_matviews.py code plus scripts.
Reporter | ||
Comment 20•12 years ago
|
||
(In reply to Peter Bengtsson [:peterbe] from comment #19) > (In reply to Robert Helmer [:rhelmer] from comment #18) > > (In reply to Laura Thomson :laura from comment #17) > > > Peter and I talked in IRC and agreed that we should roll back to the "old" > > > crontabber until staging is in a good state. > > > > > > Let's get that done today (7/30) so nobody has to be hand-running stuff at > > > 3am tomorrow morning. > > > > > > Thanks for the diligence and hard work, guys. > > > > OK, why don't we leave the new code in place and just bring the old code > > back? That way we can switch or fallback just by changing the system > > crontab, and it'll be less disruptive than the way we've tried the past few > > times. > > I suggest we comment out the matviews once from crontabber.ini and leave the > code in place. Also, we'll need to re-surrect the old daily_matviews.py code > plus scripts. Makes sense. I'll do a PR.
Reporter | ||
Comment 21•12 years ago
|
||
peterbe, r? https://github.com/mozilla/socorro/pull/751
Comment 22•12 years ago
|
||
Commits pushed to master at https://github.com/mozilla/socorro https://github.com/mozilla/socorro/commit/80075bf2c97fc1fb280a44df6e51bc23566fd023 bug 777670 - resurrect old daily_matviews until crontabber is ready https://github.com/mozilla/socorro/commit/5bdea1bf5dfa97994887fafb09ea01c85a07cde8 Merge pull request #751 from rhelmer/bug777670-resurrect-daily_matviews bug 777670 - resurrect old daily_matviews until crontabber is ready
Updated•12 years ago
|
Whiteboard: [qa-]
Comment 23•12 years ago
|
||
Commits pushed to master at https://github.com/mozilla/socorro https://github.com/mozilla/socorro/commit/43fc29564690c352f376ae7bc208ab17a3378986 bug777670 - backfills slow jobs with time https://github.com/mozilla/socorro/commit/9da90ff79a5994b9fae717267b4506ea54b66923 Merge pull request #743 from peterbe/bug777670-backfills-slow-with-time bug777670 - backfills slow jobs with time
Assignee | ||
Comment 24•12 years ago
|
||
This is a bug from the days when we tried to deploy crontabber. ...which we later retracted. crontabber is a different beast these days and will hopefully soon be reattempted.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INCOMPLETE
You need to log in
before you can comment on or make changes to this bug.
Description
•