Closed
Bug 774161
Opened 13 years ago
Closed 13 years ago
reduce the # of cron errors sent to infra-dbnotices
Categories
(Data & BI Services Team :: DB: MySQL, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: cshields, Assigned: scabral)
Details
(Whiteboard: [2012q3])
(internal IT goal) Focus some time in Q3 on reducing the number of these errors. If some help is needed in adding a bit more intelligence to our cron scripts so they are more resilient w/r/t error cases, reach out to Rob for help.
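One common way to make cron scripts more resilient without losing real errors is to keep them silent on success. The bug does not show any of the existing scripts, so this is only a minimal Python sketch under that assumption. cron mails whatever a job writes to stdout or stderr, so a wrapper that captures a job's output and echoes it only when the job exits non-zero sends mail only when something actually failed. The `run_quietly` name is hypothetical. The moreutils `chronic` command behaves in essentially the same way.

```python
import subprocess
import sys
from typing import Sequence


def run_quietly(cmd: Sequence[str]) -> int:
    """Run cmd and capture its output; echo it only if the command fails.

    cron mails anything a job prints, so a job that stays silent on
    success only generates mail when something actually broke.
    """
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    if result.returncode != 0:
        # Pass everything through so the resulting cron mail is actionable.
        sys.stdout.write(result.stdout)
        sys.stderr.write(result.stderr)
    return result.returncode
```

In a crontab, the job would be wrapped as something like `run_quietly.py /path/to/job.sh`. That wrapper script and the path are hypothetical.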
Comment 1•13 years ago
The goal is to "reduce the number of these errors." How are we going to measure success here since it doesn't appear to be an all or nothing task. Is there some percentage of clean up we are looking for?
Comment 2•13 years ago (Assignee)
So, some numbers:
From a randomly sampled week (July 1-7), there were 49 distinct e-mail threads containing 2,048 messages in total (each thread had multiple messages). But only a few distinct errors are behind them. For example, the kill_pigs-from-cron.pl script accounted for 34 of the 49 threads.
So I'd say a 50% reduction is a reasonable target.
Comment 3•13 years ago (Assignee)
While looking into the numbers, I noticed that the kill_pigs-from-cron.pl messages all come from:
app-bugs01
app-bugs02
app-bugs03
It should be relatively easy to keep these out of cron mail. They aren't actually errors, they're output, but the output is fairly useless: all we get is the PID and the number of seconds, for example:
"killing pid 22635 at 360 seconds."
That doesn't really help unless the general log is turned on, because the slow query log doesn't record a query until it finishes (that's how it knows how long the whole query took). If the script showed us which query was being run, it would be more useful.
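Here is a minimal Python sketch of the suggestion to include the query in the kill message. kill_pigs-from-cron.pl itself is Perl and its logic isn't shown in this bug. The `ProcRow` type, the `kill_candidates` function, the 360-second threshold (taken from the sample message), and the 200-character truncation are all hypothetical. The sketch filters rows shaped like `SHOW FULL PROCESSLIST` output (where `Info` holds the statement text) and builds a message that carries the SQL alongside the PID and runtime. Actually issuing `KILL <id>` is left to whatever database driver the real script uses.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class ProcRow(NamedTuple):
    """Subset of one SHOW FULL PROCESSLIST row (hypothetical container)."""
    id: int
    user: str
    command: str
    time: int            # seconds spent in the current state
    info: Optional[str]  # statement text; None when the thread is idle


def kill_candidates(rows: Iterable[ProcRow], max_seconds: int = 360,
                    skip_users: Tuple[str, ...] = ("system user",)
                    ) -> List[Tuple[int, str]]:
    """Return (pid, message) pairs for queries running >= max_seconds."""
    out = []
    for row in rows:
        # Only running statements are candidates; skip idle/system threads.
        if row.command != "Query" or row.info is None:
            continue
        if row.user in skip_users or row.time < max_seconds:
            continue
        query = " ".join(row.info.split())  # collapse newlines and spacing
        if len(query) > 200:
            query = query[:200] + "..."
        msg = (f"killing pid {row.id} (user {row.user}) "
               f"at {row.time} seconds: {query}")
        out.append((row.id, msg))
    return out
```

With the query text in the message, a kill notice says what was slow even when the general log is off, which also makes it easier to decide whether the query is worth optimizing.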
I think it's reasonable to set a goal of optimizing the queries that get killed, so that kills happen less often. If they're just crazy searches, we may want to log the kill information locally on the machine instead of sending it to infra-dbnotices (if it's truly just noise).
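Logging kill information locally could look roughly like this minimal Python sketch. It assumes the script can be made to log through a file handler instead of printing. The logger name `kill_pigs`, the `make_kill_logger` helper, and any log path (for example `/var/log/kill_pigs.log`) are hypothetical. Because nothing goes to stdout, cron has nothing to mail, but the kills stay on the host for later review.

```python
import logging


def make_kill_logger(path: str) -> logging.Logger:
    """Return a logger that appends kill events to a local file.

    cron mails anything written to stdout/stderr, so sending routine
    kill notices to a file keeps them off the list while leaving an
    audit trail on the machine.
    """
    logger = logging.getLogger("kill_pigs")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # don't also pass records to the root logger
    if not logger.handlers:   # attach only once; repeat calls reuse it
        handler = logging.FileHandler(path)
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

A local file like this would also need rotation (for example via logrotate) so that it doesn't grow without bound.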
Comment 4•13 years ago (Assignee)
Adding bug 775248: another frequent cron error (every 15 minutes) is that addons1 can't copy binary logs to db-backup1.ops.phx1.mozila.com for incremental backups. Once that's fixed, we should change the script to copy from addons7 anyway.
Comment 5•13 years ago (Assignee)
As for the backup logs, both the hostname and the VLAN have changed, so I changed the script to use:
BACKUPDEST="backup1.db.phx1.mozilla.com"
I changed this on addons1 itself, and am working on changing puppet now.
Comment 6•13 years ago (Assignee)
This file is not under puppet control, so that's not a problem. This particular cron script is A-OK.
Comment 7•13 years ago (Assignee)
We are down to:
10 threads from Aug 16-23, with 38 new messages. I'm calling this resolved. Next quarter we can work on the specific errors that weren't addressed here, now that the volume is low enough that we can actually read the threads.
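Comment 1 asked how success would be measured, and comment 2 proposed a 50% reduction. The quoted figures can be checked with a quick calculation, taken at face value. The two windows differ slightly in length (July 1-7 is 7 days and Aug 16-23 is 8 days), so treat the result as approximate. `percent_reduction` is just an illustrative helper.

```python
def percent_reduction(before: int, after: int) -> float:
    """Percentage drop from `before` to `after`; positive means fewer."""
    return 100.0 * (before - after) / before


# Figures quoted in this bug (Jul 1-7 sample vs. Aug 16-23).
thread_drop = percent_reduction(49, 10)      # e-mail threads
message_drop = percent_reduction(2048, 38)   # individual messages
kill_pigs_share = 100.0 * 34 / 49            # kill_pigs threads in the Jul sample

print(f"threads: {thread_drop:.1f}% fewer")        # threads: 79.6% fewer
print(f"messages: {message_drop:.1f}% fewer")      # messages: 98.1% fewer
print(f"kill_pigs share: {kill_pigs_share:.1f}%")  # kill_pigs share: 69.4%
```

By either measure, the drop is well past the 50% target.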
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•11 years ago
Product: mozilla.org → Data & BI Services Team