Closed Bug 843843 Opened 11 years ago Closed 11 years ago

Remove the old socorro cron log nagios check

Categories

(mozilla.org Graveyard :: Server Operations, task)

x86
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ericz, Assigned: ericz)

Details

As per :lonnen, we should remove the check for this alert as it is no longer pertinent as of today.

< nagios-phx1> | Thu 14:50:41 PST [152] sp-admin01.phx1.mozilla.com:Socorro Admin - cron_bugzilla.log is CRITICAL: FILE_AGE CRITICAL: /var/log/socorro/cron_bugzilla.log is 5951 seconds old and 9806793 bytes
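
For context, the alert above is a file-age check: it goes critical when the log has not been written to recently, which is expected once the job that wrote it stops running. Below is a minimal Python sketch of that kind of check, not the actual plugin or its configuration. The function name `file_age_status` and the threshold values in the usage note are hypothetical; the real thresholds are not stated in this bug.

```python
import os
import time

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def file_age_status(path, warn_seconds, crit_seconds, now=None):
    """Return (exit_code, message) based on how long ago `path` was modified.

    `warn_seconds` and `crit_seconds` are illustrative thresholds, not the
    values used on sp-admin01.
    """
    now = time.time() if now is None else now
    st = os.stat(path)
    age = int(now - st.st_mtime)  # seconds since the last write

    if age > crit_seconds:
        label, code = "CRITICAL", CRITICAL
    elif age > warn_seconds:
        label, code = "WARNING", WARNING
    else:
        label, code = "OK", OK

    message = f"FILE_AGE {label}: {path} is {age} seconds old and {st.st_size} bytes"
    return code, message
```

With assumed thresholds such as `file_age_status("/var/log/socorro/cron_bugzilla.log", 3600, 5400)`, a log last written 5951 seconds ago would report CRITICAL, matching the shape of the message above.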

IRC conversation:
[16:37] <     lonnen> | ericz: we made changes to our crontab today, and introduced a new single cron that manages and runs scripts that used to be crons
[16:37] <     lonnen> | ericz: used to be meaning... this morning
[16:37] <       ericz> | Does that mean it's ok if /var/log/socorro/cron_bugzilla.log is old?
[16:37] <     lonnen> | ericz: yes. it also means we should disable that alert
[16:38] <     lonnen> | because that job is no longer running
[16:40] <       ericz> | lonnen: Ok, should we disable that alert everywhere (assuming it runs in more than one place) or just specific servers/environments?
[16:41] <     lonnen> | ericz: I believe it runs on the admin node for stage and prod (and dev, but I don't know if nagios is hooked up on dev)
[16:41] <     lonnen> | ericz: and yeah, everywhere. although it should already be disabled on dev and stage
Do the other cron checks need to go too? This is the full list:

/var/log/socorro/cron_bugzilla.log
/var/log/socorro/cron_status.log
/var/log/socorro/cron_create_partitions.log
/var/log/socorro/cron_submitter-crash-reports.allizom.org.log
We're going to hold on this for a bit as there is debate about rolling back to the old checks.
Due to the looming weekend, we're going to config off this new cron system and re-enable the old cron job system. We'll need to re-enable the cron checks listed above, including the bugzilla cron.

Apologies for the confusion before.
No problem. None of them have been removed yet (just one was ack'd), so we should be good for the weekend. We can reconvene next week.
Assignee: server-ops → eziegenhorn
Blocks: 844283
No longer blocks: 844283
Thank you!
We have pushed the same change to our crontab back into production. We are watching closely overnight to make sure we've fixed the bugs from last week.

The following may alert overnight:

/var/log/socorro/cron_bugzilla.log
/var/log/socorro/cron_status.log
/var/log/socorro/cron_create_partitions.log
/var/log/socorro/cron_submitter-crash-reports.allizom.org.log


If all is well in the morning, they will need to be removed.
We've had two good overnight runs, so I think it's safe to proceed.
It's a no-change Friday, but from the nagios side it seems only the following two are in a "Critical" state:
/var/log/socorro/cron_bugzilla.log
/var/log/socorro/cron_submitter-crash-reports.allizom.org.log

The rest are OK:
/var/log/socorro/cron_create_partitions.log
/var/log/socorro/cron_status.log
/var/log/socorro/socorro-monitor.log

So to make sure, which ones are we now keeping?
From the original request, it seems we were only removing:
/var/log/socorro/cron_bugzilla.log

So to make sure this is done correctly, please clarify which ones you want to remove come Monday.
Flags: needinfo?(chris.lonnen)
Sorry for the confusion. Please remove:

/var/log/socorro/cron_bugzilla.log
Flags: needinfo?(chris.lonnen)
Removed socorro-admin-cron_bugzilla check.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard