
Fix crontabber to target specific hours for hourly jobs

RESOLVED FIXED in 28

Status

Socorro
Backend
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: jberkus, Assigned: peterbe)

Tracking

unspecified
x86
Mac OS X

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [qa-])

(Reporter)

Description

5 years ago
Currently, when running hourly jobs, crontabber runs according to the current clock time.  For example, for reportsClean, it takes the current clock time, subtracts 2 hours, and runs for an hour window ending at that time.

Instead, we should be targeting a specific hour of the clock/calendar, e.g. '2012-07-27 16:00:00'-'2012-07-27 17:00:00'.  This also means that backfill will make more sense/be possible.
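
A minimal sketch of what targeting a clock hour could look like, assuming a hypothetical helper `hour_window` (it is not part of crontabber) and the same 2-hour lag that reportsClean uses today. The only difference from the current behaviour is that the end of the window is floored to the top of the hour:

```python
from datetime import datetime, timedelta


def hour_window(now, lag_hours=2):
    """Return (start, end) of the clock hour ending `lag_hours` before `now`.

    Hypothetical helper for illustration only.
    """
    # Floor to the top of the hour so the window never depends on the
    # minute/second at which the job happened to start.
    end = (now - timedelta(hours=lag_hours)).replace(
        minute=0, second=0, microsecond=0
    )
    start = end - timedelta(hours=1)
    return start, end


if __name__ == "__main__":
    start, end = hour_window(datetime(2012, 7, 27, 19, 23, 52))
    print(start, end)
```

Any start time between 19:00:00 and 19:59:59 gives the same window, which is what makes a rerun or a backfill well defined.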
(Assignee)

Comment 1

5 years ago
It depends on the job. It's not generic to the crontabber framework. The matviews, for example, take today's date minus one day, strip the hour/minute/second part, and feed in just the date.

We have two options to do specific hours and your expert advice is needed. 

1. We specify a dedicated time when a job is supposed to run. E.g. 16:00 and let the app itself just take the current hour and minute as input into whatever it does. 

2. Let it run any time during the day but inside the job itself we strip the hour/minute and replace it with our own. E.g. it's run at 2012-07-31 12:34 but we change it to 2012-07-31 16:00:00 before it's passed on to the stored procedure or whatever it's used for.
(Assignee)

Updated

5 years ago
Assignee: peterbe → josh
(Assignee)

Comment 2

5 years ago
Josh, I changed the assignment over to you for the moment. You'll need to address the options and point at the relevant jobs where it applies. I'll take over the bug once that's settled. More than happy to do the actual coding.
(Reporter)

Comment 3

5 years ago
The jobs I'm talking about are reports-clean and reports-duplicates.  product-versions, while it does run hourly, is cumulative and does not take a parameter.

I don't understand your two options.

So, the hourly targeted jobs (reports-clean and reports-duplicates) need to run for each clock hour, just as the daily jobs run for each calendar day.  So if, for some reason, crontabber has been down for 8 hours, we need to run the reports-clean hourly job for each of those 8 hours.  This becomes much easier to track if we're targeting clock hours, i.e. 16:00 to 17:00, not an arbitrary delta from when the job kicked in (e.g. 16:23:52 to 17:23:52). 

For one thing, the problem with the arbitrary delta is that if the time at which the crontabber starts the hourly job shifts for some reason (such as adding new jobs or an outage), we'll have errors and/or data loss.  Second, I really don't understand how you can make backfill work with an arbitrary delta.
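
To illustrate why clock hours make backfill trackable, here is a rough sketch that lists the hours still owed after an outage. The function name `missed_hours` and the idea of storing the start of the last processed hour are assumptions for the example, not how crontabber keeps state:

```python
from datetime import datetime, timedelta


def missed_hours(last_done, now):
    """Yield the start of each complete clock hour after `last_done`.

    `last_done` is assumed to be the (hour-aligned) start of the last
    hour that was processed successfully. Hypothetical helper.
    """
    one_hour = timedelta(hours=1)
    current_hour = now.replace(minute=0, second=0, microsecond=0)
    hour = last_done + one_hour
    # Only hours that have fully elapsed are eligible; the hour that is
    # still in progress is left for a later run.
    while hour < current_hour:
        yield hour
        hour += one_hour


if __name__ == "__main__":
    for hour in missed_hours(datetime(2012, 7, 27, 8), datetime(2012, 7, 27, 17, 23, 52)):
        print(hour, "to", hour + timedelta(hours=1))
```

With an arbitrary delta there is no equivalent list to enumerate, because the window boundaries depend on when each run happened to start.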

Back to you!
Assignee: josh → peterbe

Updated

5 years ago
Target Milestone: 20 → 21
(Assignee)

Comment 4

5 years ago
It's been a little while now so without looking at the code, I'm pretty certain this problem has been solved. I know for a fact that it has been solved for daily jobs. 

So, if you say "run every second day" at "17:00" and the job takes a whole minute to complete, it won't run the second time at 17:01 and the one after that at 17:02. No, it'll run at 17:00 and 17:00 and 17:00.

However, since it is synchronous, you could potentially have configured 2 slow jobs to run at 17:00. If the first takes 1 minute, the second job will always start at 17:01.

An earlier version took the periodicity, e.g. 2 days (==48 hours), and set next_run=NOW()+48 hours.
In the most recent version (pending review) it's next_run=NOW()+48 hours @ 17:00. 
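
Roughly, the difference between the two calculations looks like this. It is a simplified sketch with made-up function names (`next_run_drifting`, `next_run_pinned`), not the code that is in review:

```python
from datetime import datetime, timedelta


def next_run_drifting(now, frequency):
    """Earlier behaviour: the time of day drifts with each run."""
    return now + frequency


def next_run_pinned(now, frequency, hour, minute=0):
    """Newer behaviour: add the frequency, then pin the time of day."""
    return (now + frequency).replace(
        hour=hour, minute=minute, second=0, microsecond=0
    )


if __name__ == "__main__":
    # Hypothetical case: the job was due at 17:00 but finished at 17:01.
    finished = datetime(2012, 9, 1, 17, 1)
    every_second_day = timedelta(days=2)
    print(next_run_drifting(finished, every_second_day))
    print(next_run_pinned(finished, every_second_day, hour=17))
```

The pinned version lands on 17:00 two days later regardless of how long the job took, whereas the drifting version carries the extra minute forward.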

I guess the best way forward is for me to confirm that this works equally well for the hourly jobs. I'm more than happy to do that with a fully manual test once we land the code in review.
(Assignee)

Updated

5 years ago
Depends on: 781010
(Reporter)

Comment 5

5 years ago
Peter,

I am certain that it hasn't been fixed, since you've never understood what I was asking for.  We really need to meet about this, preferably with Selena.
(Assignee)

Comment 6

5 years ago
(In reply to [:jberkus] Josh Berkus from comment #5)
> Peter,
> 
> I am certain that it hasn't been fixed, since you've never understood what I
> was asking for.  We really need to meet about this, preferably with Selena.

Ok. Fair enough. I'm going PTO for 3 days after today. 

Selena and I will be in the office today if that would be a good time for a chat?
(Assignee)

Comment 7

5 years ago
Some notes from today's meeting about this.

* I believe the functionality required is all there once all current patches are reviewed. 

* My only "weakness in confidence" is that of hourly jobs and backfilling being done with the correct backfilled hour. 

** e.g. If a job is set to run on the hour every day (01:00, 02:00, 03:00, etc.) and for some reason it doesn't for a while (dependencies failing, for example), then when it backfills, does it send the exact hour as the parameter? I.e. is the datetime parameter sent in something like "2012-09-24 03:00:00, 2012-09-24 04:00:00, 2012-09-24 05:00:00"?

** Once the patches land I will set up a live simulation on my laptop and attempt to reproduce and check that it works as expected.
(Assignee)

Updated

5 years ago
Target Milestone: 21 → 22
(Assignee)

Updated

5 years ago
Target Milestone: 22 → 23
(Assignee)

Updated

5 years ago
Target Milestone: 23 → 24
(Assignee)

Updated

5 years ago
Target Milestone: 24 → 25
(Assignee)

Updated

5 years ago
Target Milestone: 25 → 26
(Assignee)

Updated

5 years ago
Target Milestone: 26 → 27
(Assignee)

Updated

5 years ago
Target Milestone: 27 → 28
(Assignee)

Comment 8

5 years ago
Good news! I believe it works!

So I set up a simulation containing 3 jobs.
One, Two (depends on One) and Three (depends on One and Two)
They're configured to run every 1 hour. The config is simple::

    jobs='''
        sluggish.jobs.SlowOne|1h
        sluggish.jobs.SlowTwo|1h
        sluggish.jobs.SlowThree|1h
    '''

I called it "sluggish" because the jobs are built to be deliberately slow. They take several seconds to complete.

They're all backfill based. 

What I also did was insert a 10% chance (on each job!) that they'd raise some exception. 

All the job does is basically this: http://www.hastebin.com/pomasodine.py

So as you can see, internally it barfs if it's fed the same date parameter more than once. Just like many of the stored procedures do. 
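
For readers without access to the paste, the "barf on a repeated date" idea can be sketched as below. This is a hypothetical stand-in written for this comment (the class and exception names are made up), not a copy of the linked code, and it leaves out the sleeping and the random exceptions:

```python
from datetime import datetime


class DuplicateDateError(Exception):
    """Raised when a job is handed a date it has already processed."""


class OnceOnlyJob:
    """Hypothetical job that refuses to process the same date twice."""

    def __init__(self):
        self.seen = set()

    def run(self, date):
        if date in self.seen:
            raise DuplicateDateError("already processed %s" % date)
        self.seen.add(date)


if __name__ == "__main__":
    job = OnceOnlyJob()
    job.run(datetime(2012, 9, 24, 3))
    job.run(datetime(2012, 9, 24, 4))
    try:
        job.run(datetime(2012, 9, 24, 3))
    except DuplicateDateError as exc:
        print("rejected:", exc)
```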

So I run this for a while and then look at all the log files it creates and here is the result: https://gist.github.com/4034425

The good news is that no date (I call it "date" but it's actually a datetime) is ever repeated. That's despite a complex dependency graph, sporadic exceptions and me not always being at the computer to run it every 5 minutes. The backfilling just works (TM)!

The "bad" news is that crontabber has no way to target a particular minute for the hourly jobs. The reason you see *:27 in the log files above is that the first run started at 27 minutes past the hour and it (re-)cycled from there. So, if the inner job that your crontabber app wraps needs a specific minute of the hour, you have to set it yourself. For example, a stored procedure might need to kick off at exactly 15 minutes past the hour; then you have to do this::

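    def run(self, connection, date):
        # crontabber passes in whatever minute the cycle happened to start on,
        # so pin the minute the stored procedure expects
        date = date.replace(minute=15)
        assert date.strftime('%M') == '15'
        cursor = connection.cursor()
        # DB-API callproc() takes the parameters as a sequence
        cursor.callproc('my_stored_proc', [date])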

So, in other words: it works!!
(Assignee)

Updated

5 years ago
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED

Updated

5 years ago
Whiteboard: [qa-]