Closed
Bug 1010609
Opened 11 years ago
Closed 11 years ago
start matviews earlier in the morning
Categories
(Socorro :: Database, task)
Tracking
(Not tracked)
RESOLVED
FIXED
89
People
(Reporter: rhelmer, Assigned: rhelmer)
Details
(Whiteboard: [config change])
Kairo says ADI usually comes in by 4 AM, any reason not to set the matviews to start earlier?
Comment 1•11 years ago
When ADI comes in late, it often comes in significantly late, and then we just run once the data is available anyhow, thanks to the retry magic in crontabber. That retry magic might actually be the best reason to run those jobs pretty much as soon as possible in the normal case.
With a query like this one we can see when ADI actually came in for different days this year:
SELECT date,count(*),max(received_at) FROM raw_adu WHERE date BETWEEN '2014-01-01' AND '2014-05-15' GROUP BY date ORDER BY date ASC;
There is clearly a shift at the DST switch. I wonder whether the received_at dates are UTC or Pacific; I think they might be local time, but it looks like the DB thinks they're UTC (because I see "+00" at the end of those dates).
In any case, the data seems to say that in non-DST times we get ADI between 4:15 and 4:30, and in DST times between 3:15 and 3:30. Outliers happen but are not extremely common, and they are often (way) after 12:00. Bug 1010257 comment #3 sounds like we usually start the jobs at 10 now, so in many of those cases we do our retry dance anyhow. If I assume that's all Pacific Time, then there's a lot of room to slide the start time earlier.
Comment 2•11 years ago
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #1)
> There is clearly a shift with the DST switch, I wonder if the received_at
> dates are UTC or Pacific, actually, I think it might be local time but it
> looks like the DB thinks it's UTC (because I see "+00" at the end of those
> dates).
We only use the TIMESTAMP WITH TIME ZONE type in our database. As a result, the database literally stores the dates in UTC and converts them to whatever time zone is specified either in the query or in the client connection.
In this particular case, the database uses 'now()' as a trigger.
For more information about Postgres' date/time types: http://www.postgresql.org/docs/current/static/datatype-datetime.html
I looked at all the dates from 2013. Here's where things appeared to change in 2013:
2013-11-01 | 41684 | 2013-11-02 03:12:06.660936+00
2013-11-02 | 38141 | 2013-11-03 03:11:44.881357+00
2013-11-03 | 38786 | 2013-11-04 04:11:36.319836+00
2013-11-04 | 43067 | 2013-11-05 23:18:12.025172+00
2013-11-05 | 44312 | 2013-11-06 04:13:38.031002+00
And DST ended in the US on Nov 3 last year.
So yeah, the hour shift was because of daylight savings.
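The shift can be checked by converting the UTC values above to US Pacific time. This is only a minimal sketch: it assumes Python 3.9+ with the standard-library `zoneinfo` module and an available time zone database, and it uses `America/Los_Angeles` as a stand-in for "Pacific". The helper name `to_pacific` is made up for illustration, and the 23:18 outlier row is left out.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# max(received_at) values copied from the query output above (stored as UTC).
received_at_utc = [
    "2013-11-02 03:12:06",
    "2013-11-03 03:11:44",
    "2013-11-04 04:11:36",
    "2013-11-06 04:13:38",
]

PACIFIC = ZoneInfo("America/Los_Angeles")  # assumed zone for "Pacific"


def to_pacific(utc_text):
    """Parse a naive UTC timestamp string and convert it to Pacific time."""
    utc_dt = datetime.strptime(utc_text, "%Y-%m-%d %H:%M:%S")
    return utc_dt.replace(tzinfo=timezone.utc).astimezone(PACIFIC)


local_times = [to_pacific(text) for text in received_at_utc]
for text, local in zip(received_at_utc, local_times):
    # %Z prints the zone abbreviation that applies on that date.
    print(text, "UTC ->", local.strftime("%Y-%m-%d %H:%M:%S %Z"))
```

The UTC hour moves from 03 to 04 across Nov 3 while the Pacific wall-clock hour stays the same, which is what you would expect from a push scheduled in Pacific local time.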
Comment 3•11 years ago
(In reply to Selena Deckelmann :selenamarie :selena from comment #2)
> We only use the TIMESTAMP WITH TIMEZONE type in our database. As a result,
> database literally stores the dates in UTC and converts them to whatever
> timezone is specified either in the query or in the client connection.
OK, so the "+00" there means that it's actually UTC? I'm just asking because the ADI data coming in at 3 or 4am UTC is much earlier than is expected (but it's awesome, we could really shift *much* earlier in the day with that).
Comment 4•11 years ago
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #3)
> (In reply to Selena Deckelmann :selenamarie :selena from comment #2)
> > We only use the TIMESTAMP WITH TIMEZONE type in our database. As a result,
> > database literally stores the dates in UTC and converts them to whatever
> > timezone is specified either in the query or in the client connection.
>
> OK, so the "+00" there means that it's actually UTC? I'm just asking because
> the ADI data coming in at 3 or 4am UTC is much earlier than is expected (but
> it's awesome, we could really shift *much* earlier in the day with that).
Yes. Here is some documentation on time zone designations: https://en.wikipedia.org/wiki/ISO_8601#Time_zone_designators
The caveat here is that while it typically arrives at that time, as you know, it can vary quite a bit. If we adjust the timing of reports, I don't want to set the expectation that we have any control over ADI that arrives late or that the Socorro team can do anything about making it happen faster. Like, if we move it, getting tickets assigned to us every time ADI is late seems like a Really Bad Deal.
Comment 5•11 years ago
(In reply to Selena Deckelmann :selenamarie :selena from comment #4)
> The caveat here being that while it typically arrives at that time, as you
> know, it can vary quite a bit. If we adjust the timing of reports, I don't
> want to set the expectation that we have any control over ADI that arrives
> late or that the Socorro team can do anything about making it happen faster.
> Like, if we move it, getting tickets every time ADI is late assigned to us
> seems like a Really Bad Deal.
Sure. We know it is being pushed by metrics (though we are working on a pull mechanism instead). Also, we usually do not file bugs against Socorro on late ADI any more because Socorro can recover from late pushes easily enough nowadays; instead, we ask metrics about the issues. ;-)
Running our jobs at, say, 5am UTC would mean that we'd run it at 5/6pm in dmajor's timezone for example, which means he still might have some chance to see results at the end of his day if he needs to. And for Europeans it would be available early in our work day so we could already work with the data well enough.
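To see what a 05:00 UTC start looks like on local clocks, here is a rough sketch. It assumes Python 3.9+ with `zoneinfo`; the two zones are picked purely for illustration (dmajor's time zone is not stated here, so it is not modeled), and `local_run_time` is a hypothetical helper.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Zones chosen for illustration only; they are assumptions, not from this bug.
PACIFIC = ZoneInfo("America/Los_Angeles")
CENTRAL_EUROPE = ZoneInfo("Europe/Berlin")


def local_run_time(year, month, day, zone):
    """Return the local wall-clock time of a 05:00 UTC run on the given date."""
    run_utc = datetime(year, month, day, 5, 0, tzinfo=timezone.utc)
    return run_utc.astimezone(zone)


# One date inside DST and one outside it, since the local hour differs.
for date in [(2014, 6, 10), (2014, 1, 10)]:
    for name, zone in [("Pacific", PACIFIC), ("Central Europe", CENTRAL_EUROPE)]:
        local = local_run_time(*date, zone)
        print(date, name, local.strftime("%Y-%m-%d %H:%M %Z"))
```

In both cases the run lands early in the European morning and late in the previous Pacific evening.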
Comment 6•11 years ago
I think it makes sense. The whole 2am Pacific time was a desperate construct of times past.
One concern I have is that changing the time of a backfill job is potentially tricky. Without actually testing it, I'm not sure I know exactly what to expect. Considering that we're not changing the date (in UTC) but only the hour, most of our matviews should be fine, right? They don't accept a full datetime, do they?
In any case, if there are any problems, we're much more comfortable these days with doing a simple reset.
Comment 7•11 years ago
(In reply to Peter Bengtsson [:peterbe] from comment #6)
> I think it makes sense. The whole 2am Pacific time was a desperate construct
> of times past.
Let's coordinate the change so that if something bad happens (like we hit huge load on the DBs or something) we can kill off the jobs and adjust the times again.
I'll be out Fri/Mon/Tues, but otherwise will be around for the next few weeks.
Comment 8•11 years ago
(In reply to Peter Bengtsson [:peterbe] from comment #6)
> One concern I have is that changing the time a backfill job is potentially
> tricky. Actually without actually testing it I'm not so sure I know exactly
> what to expect. Considering that we're not actually changing the date (in
> UTC) but only the hour, that means most of our matviews should be fine,
> right? They don't accept a full datetime do they?
Every now and then we already run them later (because we fail and retry if the ADI push is late), so I believe we are not hour-sensitive there.
Comment 9•11 years ago
Can we move forward with this?
We are again and again getting into the situation of bug 1010257 where some of the data I use for assessing our stability comes in very late in the European work day and I may not have the info requested of me for the meetings I report to.
Given that the sources of that data are cut off at 5pm PDT (midnight UTC), it's really strange to people why I can't report on the numbers at 10am PDT the next day.
Comment 10•11 years ago
Commits pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/09d0a42f7518f0daa1166a2482681436c4fd8078
fixes bug 1010609 by running daily crons earlier
ADU is usually in between 0400 and 0430 so 0500 should be an acceptable time to
run. If it is late the self healer will kick in. Also adds a slightly wider gap
between daily jobs and the correlations job.
https://github.com/mozilla/socorro/commit/d33d0e61069d8b7ae64b7e7f415c9e31ff7bf499
Merge pull request #2095 from lonnen/1010609-rise-and-shine
fixes bug 1010609 by running daily crons earlier
Updated•11 years ago
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 11•11 years ago
Did you reset all the jobs on the production crontabber? They are all in a state of running at 10:00 + 24h forever.
Comment 12•11 years ago
(In reply to Peter Bengtsson [:peterbe] from comment #11)
> Did you reset all the jobs on the production crontabber? They are all in a
> state of running at 10:00 + 24h forever.
How do we do this? ```crontabber_app.py --reset-job``` for every one that was changed?
Flags: needinfo?(peterbe)
Comment 13•11 years ago
(In reply to Robert Helmer [:rhelmer] from comment #12)
> (In reply to Peter Bengtsson [:peterbe] from comment #11)
> > Did you reset all the jobs on the production crontabber? They are all in a
> > state of running at 10:00 + 24h forever.
>
> How do we do this? ```crontabber_app.py --reset-job``` for every one that
> was changed?
Yes.
Either crontabber_app.py --reset-job=socorro.cron.jobs.matviews.SignaturesCronApp
or crontabber_app.py --reset-job=signatures-matview
Flags: needinfo?(peterbe)
Comment 14•11 years ago
I have not reset any jobs yet; the auto closer merely closed the bug. I still need to reset them on stage and prod.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•11 years ago
Whiteboard: [config change]
Updated•11 years ago
Target Milestone: --- → 89
Comment 15•11 years ago
There was an investigation of overnight crontabber issues earlier today. Peter -- from what I read this morning it sounds like jobs started at the new time with no need to reset. Do I still need to go through and manually reset them?
Flags: needinfo?(peterbe)
Comment 16•11 years ago
Bug 1024580 is about the problems of this morning, FWIW.
Comment 17•11 years ago
No need to manually reset them, because they're on the 5am regime now. I had forgotten how it works: it takes the periodicity (24h) and makes the next_run out of that, but before it saves it, it uses the specified time to set the hour and minute on the next_run.
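A rough sketch of that behavior, for illustration only. `compute_next_run` is a hypothetical helper and not crontabber's actual code; the dates are made up, and zeroing the seconds is my assumption.

```python
from datetime import datetime, timedelta


def compute_next_run(last_run, frequency, time_of_day):
    """Add the frequency to last_run, then force the configured hour and minute.

    Hypothetical helper for illustration; not crontabber's actual code.
    """
    hour, minute = (int(part) for part in time_of_day.split(":"))
    next_run = last_run + frequency
    # Overwrite the time-of-day with the configured one before "saving".
    # Zeroing seconds and microseconds is an assumption of this sketch.
    return next_run.replace(hour=hour, minute=minute, second=0, microsecond=0)


# A job that last ran under the old 10:00 schedule...
last_run = datetime(2014, 6, 10, 10, 0)
# ...picks up the new 05:00 time the next time next_run is computed.
next_run = compute_next_run(last_run, timedelta(hours=24), "05:00")
print(next_run)
```

Under that model, a job stuck on "10:00 + 24h" moves to the new hour on its own after one more run, which would explain why no reset was needed.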
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Flags: needinfo?(peterbe)
Resolution: --- → FIXED