Bug 1470703
Opened 6 years ago
Closed 6 years ago
Monitor timing data for crontabber jobs
Categories
(Socorro :: Backend, task, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: osmose, Assigned: willkg)
Attachments
(1 file)
Our monitoring infrastructure in Datadog currently has no insight into crontabber. During the last all-hands we discussed wanting timing data for crontabber jobs. This bug covers determining what data we want to collect about crontabber and where we want to send it, then implementing that collection. As a starting point, we could send Datadog a ping containing the name and runtime of each crontabber job after it finishes running.
Updated • 6 years ago (Reporter)
Priority: -- → P2
Comment 1 • 6 years ago (Assignee)
Crontabber is tricky because the app lives in a library that other people may also use. I think if we want to do this, the best plan involves forking crontabber and vendoring it in the Socorro repo. That gives us a lot more flexibility. If we did that, then adding metrics around things is roughly a 20-minute task.
Comment 2 • 6 years ago
My idea is to capture timing in two gauges [0], e.g. crontabber.job_success_runtime and crontabber.job_failure_runtime, that are chosen based on whether the job succeeded or failed. We would store the job name (and any other useful metadata) in tags. Then we can track normal runtimes for each job and alert if any of them fail.

[0] We can use a gauge instead of a histogram because the same job will never complete more than once within a ten second interval, right?
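A minimal sketch of the two-gauge idea, for illustration only. The gauge names come from the comment above; `RecordingMetrics`, `run_job_with_timing`, and the `job:<name>` tag format are hypothetical and are not taken from crontabber or from the change that eventually landed. The metrics object is a stand-in that only records calls, so any client exposing a similar `gauge(name, value, tags=...)` method could be substituted.

```python
import time


class RecordingMetrics:
    """Hypothetical stand-in for a metrics client; it only records gauge calls."""

    def __init__(self):
        self.gauges = []

    def gauge(self, name, value, tags=None):
        self.gauges.append((name, value, list(tags or [])))


def run_job_with_timing(job_name, job_fn, metrics, clock=time.perf_counter):
    """Run job_fn and report its runtime in seconds to one of two gauges.

    The gauge is chosen by outcome, and the job name travels in a tag.
    """
    start = clock()
    try:
        result = job_fn()
    except Exception:
        # The job failed: record the runtime, then let the error propagate.
        metrics.gauge(
            "crontabber.job_failure_runtime",
            clock() - start,
            tags=["job:" + job_name],
        )
        raise
    # The job succeeded: record the runtime on the success gauge.
    metrics.gauge(
        "crontabber.job_success_runtime",
        clock() - start,
        tags=["job:" + job_name],
    )
    return result
```

With this shape, alerting on failures amounts to watching for any data point on the failure gauge, while the success gauge gives a per-job runtime baseline through the tag.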
Comment 3 • 6 years ago (Assignee)
Brian: I don't think jobs kick off more often than every 5 minutes. I decided to vendor crontabber before doing this since that makes this easier to do. Making this block on that work.
Depends on: 1478080
Comment 4 • 6 years ago (Assignee)
Grabbing this to do now.
Assignee: nobody → willkg
Status: NEW → ASSIGNED
Comment 5 • 6 years ago (Assignee)
Comment 6 • 6 years ago
Commits pushed to master at https://github.com/mozilla-services/socorro

https://github.com/mozilla-services/socorro/commit/12fd5e489b78b0aefb002a8458977b57561b2687
fix bug 1470703: add crontabber job failure/success metrics

https://github.com/mozilla-services/socorro/commit/0558bf2bad26b1f0282ba231a3dc139eb8ef7169
Merge pull request #4606 from willkg/1470703-timing-crontabber
fix bug 1470703: add crontabber job failure/success metrics
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED