Closed Bug 1470703 Opened 6 years ago Closed 6 years ago

Monitor timing data for crontabber jobs

Categories

(Socorro :: Backend, task, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: osmose, Assigned: willkg)

Attachments

(1 file)

Our monitoring infrastructure in Datadog has no insight into crontabber currently. During the last all-hands we discussed wanting to know timing data for crontabber. This bug covers determining what data we want to collect about crontabber and where we want to send it, then implementing that collection.

As a starting point, we could send Datadog a ping containing the runtime and name of a crontabber job after it has finished running.
Priority: -- → P2
Crontabber is tricky because the app lives in a library that other people may use. I think if we want to do this, the best plan is to fork crontabber and vendor it in the Socorro repo. That gives us a lot more flexibility. If we did that, adding metrics would be roughly a 20-minute task.
My idea is to capture timing in two gauges [0], e.g. crontabber.job_success_runtime and crontabber.job_failure_runtime, that are chosen based on whether the job succeeded or failed. We would store the job name (and any other useful metadata) in tags.

Then we can track normal runtimes for each job and alert if any of them fail.

[0] We can use a gauge instead of a histogram because the same job will never complete more than once within a ten-second interval, right?
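
To make the two-gauge idea concrete, here is a rough sketch, not the code that eventually landed. The metric names come from the comment above; everything else (the `RecordingMetrics` stand-in, the `run_job_with_metrics` wrapper, and the `job:<name>` tag format) is hypothetical and only for illustration. A real statsd-style client with a comparable `gauge` method could presumably be swapped in for the in-memory stand-in.

```python
import time
from typing import Callable, List, Tuple


class RecordingMetrics:
    """Hypothetical in-memory stand-in for a metrics client.

    It only records what would have been sent, so the sketch runs
    without any monitoring backend.
    """

    def __init__(self) -> None:
        self.gauges: List[Tuple[str, float, List[str]]] = []

    def gauge(self, name: str, value: float, tags: List[str]) -> None:
        self.gauges.append((name, value, tags))


def run_job_with_metrics(
    job_name: str,
    job: Callable[[], None],
    metrics: RecordingMetrics,
) -> None:
    """Run a job and emit its runtime to one of two gauges.

    The gauge is chosen by outcome; the job name travels in a tag
    (the "job:<name>" format is an assumption).
    """
    tags = ["job:" + job_name]
    start = time.monotonic()
    try:
        job()
    except Exception:
        # Job failed: record the runtime, then let the error propagate
        # so the scheduler still sees the failure.
        elapsed = time.monotonic() - start
        metrics.gauge("crontabber.job_failure_runtime", elapsed, tags)
        raise
    else:
        elapsed = time.monotonic() - start
        metrics.gauge("crontabber.job_success_runtime", elapsed, tags)


if __name__ == "__main__":
    metrics = RecordingMetrics()
    # "example-job" is a made-up job name.
    run_job_with_metrics("example-job", lambda: time.sleep(0.01), metrics)
    for name, value, tags in metrics.gauges:
        print(name, round(value, 3), tags)
```

With the job name in a tag, a single pair of metric names covers every job, and an alert can key off any data point arriving on the failure gauge.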
Brian: I don't think jobs kick off more often than every 5 minutes.

I decided to vendor crontabber before doing this since that makes this easier to do. Making this block on that work.
Depends on: 1478080
Grabbing this to do now.
Assignee: nobody → willkg
Status: NEW → ASSIGNED
Commits pushed to master at https://github.com/mozilla-services/socorro

https://github.com/mozilla-services/socorro/commit/12fd5e489b78b0aefb002a8458977b57561b2687
fix bug 1470703: add crontabber job failure/success metrics

https://github.com/mozilla-services/socorro/commit/0558bf2bad26b1f0282ba231a3dc139eb8ef7169
Merge pull request #4606 from willkg/1470703-timing-crontabber

fix bug 1470703: add crontabber job failure/success metrics
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED