Closed
Bug 1010257
Opened 10 years ago
Closed 7 years ago
product_crash_ratio and /daily data coming very late every now and then since 2014-05-13
Categories
(Socorro :: Database, task)
Socorro
Database
Tracking
(Not tracked)
RESOLVED
INVALID
People
(Reporter: kairo, Unassigned)
Details
My daily jobs pull data out of Socorro with a query similar to this: SELECT adu_date, version_string as version, adjusted_crashes as crashes, adu_count as adu FROM product_crash_ratio WHERE product_name = 'Firefox' AND version_string IN ('29.0.1','29.0','30.0b','30.0b3','31.0a2','32.0a1') AND adu_date BETWEEN '2014-05-10' AND '2014-05-14' ORDER BY adu_date DESC, version DESC; This doesn't include data for yesterday, 20154-05-13, even though it should. https://crash-stats.mozilla.com/daily?p=Firefox also misses the data for this day (might pull from the same table or set of tables in the background).
Comment 1•10 years ago
|
||
Here's the views and tables: breakpad=# \d+ product_crash_ratio View "public.product_crash_ratio" Column | Type | Modifiers | Storage | Description --------------------+--------------+-----------+----------+------------- product_version_id | integer | | plain | product_name | citext | | extended | version_string | citext | | extended | adu_date | date | | plain | crashes | bigint | | plain | adu_count | bigint | | plain | throttle | numeric(5,2) | | main | adjusted_crashes | integer | | plain | crash_ratio | numeric | | main | View definition: SELECT crcounts.product_version_id, product_versions.product_name, product_versions.version_string, crcounts.report_date AS adu_date, sum(crcounts.report_count)::bigint AS crashes, sum(crcounts.adu) AS adu_count, product_release_channels.throttle::numeric(5,2) AS throttle, sum(crcounts.report_count::numeric / product_release_channels.throttle)::integer AS adjusted_crashes, crash_hadu(sum(crcounts.report_count)::bigint, sum(crcounts.adu), product_release_channels.throttle) AS crash_ratio FROM crashes_by_user_rollup crcounts JOIN product_versions ON crcounts.product_version_id = product_versions.product_version_id JOIN product_release_channels ON product_versions.product_name = product_release_channels.product_name AND product_versions.build_type = product_release_channels.release_channel GROUP BY crcounts.product_version_id, product_versions.product_name, product_versions.version_string, crcounts.report_date, product_release_channels.throttle; breakpad=# \d+ crashes_by_user_rollup View "public.crashes_by_user_rollup" Column | Type | Modifiers | Storage | Description --------------------+---------+-----------+----------+------------- product_version_id | integer | | plain | report_date | date | | plain | os_short_name | citext | | extended | report_count | bigint | | plain | adu | integer | | plain | View definition: SELECT crashes_by_user.product_version_id, crashes_by_user.report_date, crashes_by_user.os_short_name, sum(crashes_by_user.report_count) AS report_count, min(crashes_by_user.adu) AS adu FROM crashes_by_user JOIN crash_types USING (crash_type_id) WHERE crash_types.include_agg GROUP BY crashes_by_user.product_version_id, crashes_by_user.report_date, crashes_by_user.os_short_name; breakpad=# \d+ crashes_by_user Table "public.crashes_by_user" Column | Type | Modifiers | Storage | Stats target | Description --------------------+---------+-----------+----------+--------------+------------- product_version_id | integer | not null | plain | | os_short_name | citext | not null | extended | | crash_type_id | integer | not null | plain | | report_date | date | not null | plain | | report_count | integer | not null | plain | | adu | integer | not null | plain | | Indexes: "crashes_by_user_key" PRIMARY KEY, btree (product_version_id, report_date, os_short_name, crash_type_id) Foreign-key constraints: "crashes_by_user_crash_type_id_fkey" FOREIGN KEY (crash_type_id) REFERENCES crash_types(crash_type_id) Has OIDs: no
Reporter | ||
Comment 2•10 years ago
|
||
Note that this was updated fine every day so far, but seems to have not been today for some reason. My first guess was that the matview update process died for some reason before finishing this but Selena said on IRC that she saw no errors.
Comment 3•10 years ago
|
||
Investigated this - the way we're doing dependencies is delaying this unnecessarily (every nightly job that depends on reports_clean is depending on the very latest hour of reports_clean to be up-to-date, when it only really matters if reports_clean is up-to-date as of yesterday UTC). Also, I notice that crontabber is running a job due at 11am (CorrelationsAddonCronApp) before a job due at 10am (CrashesByUserCronApp) - when I added the Correlations*CronApp jobs I purposely set the time higher because I assumed that crontabber would finish up all the other matviews first, since the correlations ones can be quite slow: socorro.cron.jobs.matviews.CorrelationsAddonCronApp|1d|11:00 socorro.cron.jobs.matviews.CrashesByUserCronApp|1d|10:00
Flags: needinfo?(peterbe)
Comment 4•10 years ago
|
||
(In reply to Robert Helmer [:rhelmer] from comment #3) > Investigated this - the way we're doing dependencies is delaying this > unnecessarily (every nightly job that depends on reports_clean is depending > on the very latest hour of reports_clean to be up-to-date, when it only > really matters if reports_clean is up-to-date as of yesterday UTC). > I filed bug 1010362 to improve this situation.
Reporter | ||
Comment 5•10 years ago
|
||
I see the data for yesterday now. 1) Is that because this ran to completion by itself? 2) Do I need to expect getting this so very late every day until bug 1010362 is fixed?
Comment 6•10 years ago
|
||
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #5) > I see the data for yesterday now. > 1) Is that because this ran to completion by itself? Yes > 2) Do I need to expect getting this so very late every day until bug 1010362 > is fixed? There are two bugs here - that one, and crontabber isn't honoring the start time (or I misunderstand what it's for). Fixing either of these will help - bug 1010362 I could get done pretty quickly I'll take it.
Reporter | ||
Comment 7•10 years ago
|
||
FWIW, today's data came in as early as usual and my jobs could pull it at the normal time it runs, which is pretty early.
Comment 8•10 years ago
|
||
(In reply to Robert Helmer [:rhelmer] from comment #3) > Investigated this - the way we're doing dependencies is delaying this > unnecessarily (every nightly job that depends on reports_clean is depending > on the very latest hour of reports_clean to be up-to-date, when it only > really matters if reports_clean is up-to-date as of yesterday UTC). > > Also, I notice that crontabber is running a job due at 11am > (CorrelationsAddonCronApp) before a job due at 10am (CrashesByUserCronApp) - > when I added the Correlations*CronApp jobs I purposely set the time higher > because I assumed that crontabber would finish up all the other matviews > first, since the correlations ones can be quite slow: > > socorro.cron.jobs.matviews.CorrelationsAddonCronApp|1d|11:00 > socorro.cron.jobs.matviews.CrashesByUserCronApp|1d|10:00 So this is sadly quite non-trivial. I think. I dug into crontabber and made it print the jobs instead of running them. So, first the code gets the list of jobs out from the configuration (the massive JOBS_LIST list). Also, I made it only list the ones we're interested in and you get:: ReportsCleanCronApp CrashesByUserCronApp CorrelationsAddonCronApp CorrelationsCoreCronApp CorrelationsModuleCronApp Then it calls the function _reorder_class_list() whose responsibility is to rearrange the order so that we run "parents" before we run "children". If it didn't do this, there's a risk that you try to run a child before a parent and if that happens the child will have to wait one whole cycle till it gets run. Here's the output of that function: ReportsCleanCronApp CorrelationsModuleCronApp CorrelationsAddonCronApp CorrelationsCoreCronApp CrashesByUserCronApp It clearly demonstrates the problem you explained. The code is here: https://github.com/mozilla/crontabber/blob/master/crontabber/base.py#L24-L84 Lonnen and I wrote that. I don't remember exactly how it works but I can look into it. The thing is, the CrashesByUserCronApp depends on two other apps and CrashesBy*CronApp only depends on 1. Perhaps this whole thing can be solved by using an ordereddict for the sub-list of dependencies. So, for that I've filed https://github.com/mozilla/crontabber/issues/21
Flags: needinfo?(peterbe)
Comment 9•10 years ago
|
||
Come to think of it, this is only a problem the first time. The next day, the crashesperuser will kick in at 10 and the correlations will kick in at 11. I.e. when the heartbeat beats at, say, 10:05 it'll see that `reports-clean` passed and `adu` passed because `reports-clean` passed so the next jobs to consider are: 1. Correlations*CronApp 2. CrashesByUserCronApp That's the wrong order but when it sees that the clock is 10:05 and that Correlations*CronApp isn't supposed to run until 11:00 it'll skip those and process CrashesByUserCronApp. Right?
Reporter | ||
Comment 10•10 years ago
|
||
Sometimes it feels like this bug is mocking me. Almost like any day I'm eager to see the data, it runs into the bug and comes really late - and on the weekend, when I don't care, it comes right away (i.e. jobs done in the right order). And if we do bug 1010609, this just goes away, right? Could we just go there?
Summary: product_crash_ratio and /daily missing data for 2014-05-13 → product_crash_ratio and /daily data coming very late every now and then since 2014-05-13
Updated•10 years ago
|
Assignee: selenamarie → nobody
Comment 11•10 years ago
|
||
I'm not sure why this bug was assigned to me -- the details of it went in a non-Postgres directly. :) :rhelmer, :peterbe is this something either of you can tackle?
Reporter | ||
Comment 12•10 years ago
|
||
FWIW, today is the third day in a row that we have run into this bug.
Comment 13•10 years ago
|
||
(In reply to Selena Deckelmann :selenamarie :selena from comment #11) > I'm not sure why this bug was assigned to me -- the details of it went in a > non-Postgres directly. :) > > :rhelmer, :peterbe is this something either of you can tackle? This one is a crontabber bug, I think I'd rather leave it to :peterbe :) I have bug 1010362 assigned which should help the problem by making the dependencies more sane.
Comment 14•10 years ago
|
||
Oh also bug 1010609 (start matviews earlier in the morning) could be a quick-fix here, too.
Reporter | ||
Comment 15•10 years ago
|
||
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #12) > FWIW, today is the third day in a row that we have run into this bug. And today is the 6th in a row. Looks like we run into this issue every day now. :(
Comment 16•7 years ago
|
||
The Postgres based daily report is gone and replaced with Crashes Per Day which is based on SuperSearch.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INVALID
You need to log in
before you can comment on or make changes to this bug.
Description
•