Closed
Bug 1024580
Opened 10 years ago
Closed 10 years ago
Socorro didn't (fully) update to current data today (2014-06-12)
Categories
(Socorro :: Backend, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: kairo, Assigned: selenamarie)
References
Details
I noticed today that while some underlying data was present (my custom reports ran fine and those need e.g. product_adu to be updated, AFAIK by the adu-matview job), the Socorro UI was not displaying today's data correctly.
The ADU push came late today (10am UTC instead of usually ~3:20am), so data could have been somewhat delayed anyhow, but looking at things, this wasn't even the issue. Something went down in a strange way in Crontabber land.
What I first noticed was that in https://crash-stats.mozilla.com/crontabber-state/, all daily jobs have a next_run time of "in X hours", which should mean they ran fine and are scheduled again for tomorrow - but some have a last_success of "Y hours ago" (where Y is 24-X, as expected when they have been run fine today) but a lot of them have "a day ago" which suggests they did not run today. Together with the next_run being in the future (would be in the past if today's run was scheduled but not run yet, i.e. overdue), it looks like they were not even scheduled to run today!
Reporter | ||
Comment 1•10 years ago
|
||
Two things that can play into this are that due to bug 1023867, we had almost all daily jobs blocked for hours yesterday and run very late, and bug 1010609 has applied config changes to schedule the non-correlations jobs at 5am instead of 10am UTC (but didn't reset them yet).
I focused together with peterbe on two example jobs that are fairly similar but one did run and one didn't: tcbs-build-matview (not run) and tcbs-matview (run correctly today).
Looking at the times they actually ran last, I came to suspect that those jobs that did perform their very late run yesterday before midnight UTC were correctly scheduled today but those that happened to be run after midnight, i.e. already today, were not scheduled for their regular run today (but for tomorrow right away).
This query from peterbe seems to confirm that with the two examples mentioned:
breakpad=# select app_name,log_time, success, duration from crontabber_log where app_name like '%tcbs%' order by log_time desc limit 10;
app_name | log_time | success | duration
-------------------+-------------------------------+------------------------+----------------
tcbs-build-matview | 2014-06-12 10:22:26.090643+00 | 2014-06-12 05:00:00+00 | 00:00:36.58112
tcbs-matview | 2014-06-12 00:38:32.627713+00 | 2014-06-11 05:00:00+00 | 00:06:08.47249
tcbs-build-matview | 2014-06-11 23:45:49.365399+00 | 2014-06-11 10:00:00+00 | 00:00:37.30777
tcbs-matview | 2014-06-10 18:42:07.702854+00 | 2014-06-10 10:00:00+00 | 00:05:14.33292
tcbs-build-matview | 2014-06-10 10:22:04.266779+00 | 2014-06-10 10:00:00+00 | 00:00:38.205
tcbs-matview | 2014-06-09 11:02:11.661655+00 | 2014-06-09 10:00:00+00 | 00:06:26.26315
tcbs-build-matview | 2014-06-09 10:21:23.024723+00 | 2014-06-09 10:00:00+00 | 00:00:30.6378
tcbs-matview | 2014-06-08 11:04:24.315397+00 | 2014-06-08 10:00:00+00 | 00:06:09.19385
tcbs-build-matview | 2014-06-08 10:23:07.009557+00 | 2014-06-08 10:00:00+00 | 00:00:31.00978
tcbs-matview | 2014-06-07 17:43:44.773219+00 | 2014-06-07 10:00:00+00 | 00:05:05.78041
Reporter | ||
Comment 2•10 years ago
|
||
Note that interestingly, all those jobs run since midnight seem to say they ran for the new 5am timing from bug 1010609 while those ran before midnight have the old 10am time. Also, the /crontabber-state and /api/CrontabberState pages say that the runs for tomorrow are all scheduled for the new 5am time.
Reporter | ||
Comment 3•10 years ago
|
||
Selena said on IRC that it probably was the push of the changed config that created the issue, but bug 1024172 says the push was done at 23:15 UTC (and the comment with that log is from 16:26, 11 minutes later):
[2014-06-11 23:15:12] Finished commit_www
But looking at the crontabber log, it looks like the issues did not start there, they only started at midnight UTC:
breakpad=> select app_name,log_time, success, duration from crontabber_log where log_time > '2014-06-11 00:00:00' and app_name like '%matview' order by log_time desc;
app_name | log_time | success | duration
-----------------------------------------+-------------------------------+-------------------------------+----------------
correlations-core-matview | 2014-06-12 18:34:04.46661+00 | 2014-06-12 07:30:00+00 | 00:25:12.4421
correlations-module-matview | 2014-06-12 17:40:36.632819+00 | 2014-06-12 08:00:00+00 | 03:45:22.14739
correlations-addon-matview | 2014-06-12 13:38:39.205578+00 | 2014-06-12 07:00:00+00 | 02:38:28.14928
adu-matview | 2014-06-12 10:35:20.451389+00 | 2014-06-12 05:00:00+00 | 00:00:00.74012
nightly-builds-matview | 2014-06-12 10:25:49.191775+00 | 2014-06-12 05:00:00+00 | 00:00:03.53962
tcbs-build-matview | 2014-06-12 10:22:26.090643+00 | 2014-06-12 05:00:00+00 | 00:00:36.58112
crashes-by-user-build-matview | 2014-06-12 10:21:26.238424+00 | 2014-06-12 05:00:00+00 | 00:00:44.84509
build-adu-matview | 2014-06-12 10:20:41.18487+00 | 2014-06-12 05:00:00+00 | 00:00:01.27609
signatures-matview | 2014-06-12 10:20:39.729457+00 | 2014-06-12 05:00:00+00 | 00:11:39.44403
signature-summary-products-matview | 2014-06-12 10:09:00.021468+00 | 2014-06-12 05:00:00+00 | 00:01:59.69654
signature-summary-installations-matview | 2014-06-12 10:03:51.180705+00 | 2014-06-12 05:00:00+00 | 00:02:15.01764
product-versions-matview | 2014-06-12 10:01:35.703209+00 | 2014-06-12 10:01:35.312672+00 | 00:00:00.3807
adu-matview | 2014-06-12 10:01:35.148604+00 | | 00:00:00.03901
raw-update-channel-matview | 2014-06-12 10:01:30.561645+00 | 2014-06-12 05:00:00+00 | 00:00:00.11114
android-devices-matview | 2014-06-12 10:01:30.284566+00 | 2014-06-12 05:00:00+00 | 00:00:54.71838
graphics-device-matview | 2014-06-12 10:00:35.248821+00 | 2014-06-12 05:00:00+00 | 00:00:28.33271
explosiveness-matview | 2014-06-12 00:40:24.667944+00 | 2014-06-11 05:00:00+00 | 00:01:15.3627
exploitability-matview | 2014-06-12 00:38:56.210562+00 | 2014-06-11 05:00:00+00 | 00:00:10.39105
crashes-by-user-matview | 2014-06-12 00:38:45.559467+00 | 2014-06-11 05:00:00+00 | 00:00:07.12553
tcbs-matview | 2014-06-12 00:38:32.627713+00 | 2014-06-11 05:00:00+00 | 00:06:08.47249
signature-summary-uptime-matview | 2014-06-12 00:21:53.954498+00 | 2014-06-11 05:00:00+00 | 00:04:08.72477
home-page-graph-matview | 2014-06-12 00:07:39.35401+00 | 2014-06-11 05:00:00+00 | 00:00:04.00544
nightly-builds-matview | 2014-06-11 23:49:17.93129+00 | 2014-06-11 10:00:00+00 | 00:00:03.55446
tcbs-build-matview | 2014-06-11 23:45:49.365399+00 | 2014-06-11 10:00:00+00 | 00:00:37.30777
crashes-by-user-build-matview | 2014-06-11 23:44:44.372092+00 | 2014-06-11 10:00:00+00 | 00:00:42.27451
build-adu-matview | 2014-06-11 23:44:01.892927+00 | 2014-06-11 10:00:00+00 | 00:00:01.59691
signatures-matview | 2014-06-11 23:44:00.114891+00 | 2014-06-11 10:00:00+00 | 00:11:16.14642
correlations-core-matview | 2014-06-11 23:32:43.79662+00 | 2014-06-11 11:30:00+00 | 00:29:33.33036
signature-summary-products-matview | 2014-06-11 23:03:10.264574+00 | 2014-06-11 10:00:00+00 | 00:02:33.93785
correlations-addon-matview | 2014-06-11 22:36:46.355177+00 | 2014-06-11 11:00:00+00 | 02:36:44.61951
signature-summary-installations-matview | 2014-06-11 20:00:01.287253+00 | 2014-06-11 10:00:00+00 | 00:02:20.57461
product-versions-matview | 2014-06-11 19:57:40.501262+00 | 2014-06-11 19:57:39.968419+00 | 00:00:00.52633
adu-matview | 2014-06-11 19:57:39.806435+00 | 2014-06-11 10:00:00+00 | 00:00:01.42956
correlations-module-matview | 2014-06-11 19:26:36.442943+00 | 2014-06-11 12:00:00+00 | 03:53:28.37117
raw-update-channel-matview | 2014-06-11 10:01:32.215334+00 | 2014-06-11 10:00:00+00 | 00:00:00.09673
android-devices-matview | 2014-06-11 10:01:31.944959+00 | 2014-06-11 10:00:00+00 | 00:00:54.79721
graphics-device-matview | 2014-06-11 10:00:35.645732+00 | 2014-06-11 10:00:00+00 | 00:00:28.34705
(37 rows)
Interestingly, up to midnight, all jobs said they completed for the 10am job, afterwards they say they completed for the 5am one. So it looks like the changed config somehow magically started taking effect exactly at midnight.
Assignee | ||
Comment 4•10 years ago
|
||
Backfilling for 6/11/04.
breakpad=# select backfill_matviews('2014-06-11', '2014-06-11', False); -- 'False' is to avoid redoing all the reports clean, which went fine yesterday.
Assignee | ||
Updated•10 years ago
|
Assignee: nobody → sdeckelmann
Status: NEW → ASSIGNED
Reporter | ||
Comment 5•10 years ago
|
||
So, this is from IRC and I think sheds some light on this:
<KaiRo> selenamarie: ah, ok. so we just happened to start the next run after the config change around midnight. and that confused the hell out of crontabber as the next run would be scheduled less than 1d (24h) after the previous one now
<KaiRo> do I get that right?
<selenamarie> KaiRo: yes, i think so. there's some nuance in there - the "confusion" was really ours. the scheduler did what we told it to do and at least one of the dependencies for "make this thing run TODAY" was met
<selenamarie> KaiRo: the underlying issue is that we can't think of jobs as "queued up" -- they are scheduled for Day X, and if we change a config, it resets everything sort of like groundhog day (the movie)
<KaiRo> hmm, ok
<selenamarie> also complicating things is that crontabber runs on UTC, but we reconfigure things in PT
<selenamarie> so... the day changeover isn't always apparent to us
<peterbe> selenamarie: yeah, messing with the backfill is scary business. That's why `--job=my-backfill-app --force` gets ignored.
<KaiRo> selenamarie: ok, so we understand why this happened. that's a good step.
<peterbe> selenamarie: Now we didn't use PST time.
<peterbe> selenamarie: What lonnen did was he changed from 10amUTC to 05amUTC.
<KaiRo> selenamarie: now, when tcbs-matview (and others) run tomorrow, will that fill in the data for 2014-06-11 as well or only the data for -12?
<selenamarie> peterbe: i just mean that we did a push of a config change, near midnight UTC, without considering how it would affect ongoing backfill
<peterbe> KaiRo: it will *not* fill in 11.
<selenamarie> peterbe: i shall refer to this as "the Gremlins problem" from here on out
<peterbe> 11th*
<KaiRo> peterbe: I think selenamarie meant those people doing the prod pushes are living in Pacific Time
<peterbe> ah
<KaiRo> peterbe: ok, so we need to backfill everything for the 11th?
<peterbe> KaiRo: not everything. Only those matviews that didn't get 11th. Right selenamarie ?
<KaiRo> if we do have that granular a backfill, I guess :)
<selenamarie> peterbe: yeah
<selenamarie> we don't have that granular a backfill :)
<selenamarie> but i can at least prevent it from redoing all the reports clean stuff
Assignee | ||
Comment 6•10 years ago
|
||
Backfill complete for 6/11.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•