Remove the old socorro cron log nagios check

RESOLVED FIXED

Status

mozilla.org Graveyard
Server Operations
5 years ago
3 years ago

People

(Reporter: ericz, Assigned: ericz)

(Assignee)

Description

5 years ago
As per :lonnen, we should remove the check for this alert as it is no longer pertinent as of today.

< nagios-phx1> | Thu 14:50:41 PST [152] sp-admin01.phx1.mozilla.com:Socorro Admin - cron_bugzilla.log is CRITICAL: FILE_AGE CRITICAL: /var/log/socorro/cron_bugzilla.log is 5951 seconds old and 9806793 bytes
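For context, the alert above follows the usual Nagios file-age pattern: compare the log's mtime against thresholds and report the age and size. A minimal sketch in Python, assuming hypothetical thresholds (the real warning/critical limits for this check are not stated in this bug) and an illustrative helper name `file_age_status`; it is not the actual plugin running on sp-admin01:

```python
import os
import time

# Hypothetical thresholds; the limits used by the real check are not given here.
WARN_AGE_SECONDS = 3600
CRIT_AGE_SECONDS = 5400


def file_age_status(path, now=None, warn=WARN_AGE_SECONDS, crit=CRIT_AGE_SECONDS):
    """Return (exit_code, message) using Nagios plugin exit codes.

    0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
    """
    now = time.time() if now is None else now
    try:
        st = os.stat(path)
    except OSError as exc:
        # Reporting UNKNOWN for an unreadable file is this sketch's choice.
        return 3, f"FILE_AGE UNKNOWN: cannot stat {path}: {exc}"

    age = int(now - st.st_mtime)
    detail = f"{path} is {age} seconds old and {st.st_size} bytes"
    if age >= crit:
        return 2, f"FILE_AGE CRITICAL: {detail}"
    if age >= warn:
        return 1, f"FILE_AGE WARNING: {detail}"
    return 0, f"FILE_AGE OK: {detail}"


if __name__ == "__main__":
    code, message = file_age_status("/var/log/socorro/cron_bugzilla.log")
    print(message)
    raise SystemExit(code)
```

Once the cron job stops writing to the log, the age only grows, so a check like this stays CRITICAL until it is removed or disabled.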

IRC conversation:
[16:37] <     lonnen> | ericz: we made changes to our crontab today, and introduced a new single cron that manages and runs scripts that used to be crons
[16:37] <     lonnen> | ericz: used to be meaning... this morning
[16:37] <       ericz> | Does that mean it's ok if /var/log/socorro/cron_bugzilla.log is old?
[16:37] <     lonnen> | ericz: yes. it also means we should disable that alert
[16:38] <     lonnen> | because that job is no longer running
[16:40] <       ericz> | lonnen: Ok, should we disable that alert everywhere (assuming it runs in more than one place) or just specific servers/environments?
[16:41] <     lonnen> | ericz: I believe it runs on the admin node for stage and prod (and dev, but I don't know if nagios is hooked up on dev)
[16:41] <     lonnen> | ericz: and yeah, everywhere. although it should already be disabled on dev and stage

Do the other cron checks need to go too? This is the full list:

/var/log/socorro/cron_bugzilla.log
/var/log/socorro/cron_status.log
/var/log/socorro/cron_create_partitions.log
/var/log/socorro/cron_submitter-crash-reports.allizom.org.log
(Assignee)

Comment 2

5 years ago
We're going to hold off on this for a bit, as there is debate about rolling back to the old checks.

Comment 3

5 years ago
With the weekend looming, we're going to disable the new cron system via config and re-enable the old cron job system. We'll need to re-enable the cron checks listed above, including the bugzilla cron.

Apologies for the confusion before.
(Assignee)

Comment 4

5 years ago
No problem. None of them have been removed yet (only one was ack'd), so we should be good for the weekend. We can reconvene next week.
(Assignee)

Updated

5 years ago
Assignee: server-ops → eziegenhorn

Updated

5 years ago
Blocks: 844283

Updated

5 years ago
No longer blocks: 844283

Comment 5

5 years ago
Thank you!

Comment 6

5 years ago
We have pushed the same crontab change back into production. We are watching closely overnight to make sure we've fixed the bugs from last week.

The following may alert overnight:

/var/log/socorro/cron_bugzilla.log
/var/log/socorro/cron_status.log
/var/log/socorro/cron_create_partitions.log
/var/log/socorro/cron_submitter-crash-reports.allizom.org.log


If all is well in the morning, they will need to be removed.

Comment 7

5 years ago
We've had two good overnight runs, so I think it's safe to proceed.
It's a no-change Friday, but on the nagios side it seems only the following two are in a "Critical" state:
/var/log/socorro/cron_bugzilla.log
/var/log/socorro/cron_submitter-crash-reports.allizom.org.log

The rest are OK:
/var/log/socorro/cron_create_partitions.log
/var/log/socorro/cron_status.log
/var/log/socorro/socorro-monitor.log

So to make sure, which ones are we now keeping?
From the original request, it seems we were only removing:
/var/log/socorro/cron_bugzilla.log

So to make sure this is done correctly, please clarify which ones you want removed come Monday.
Flags: needinfo?(chris.lonnen)

Comment 9

5 years ago
Sorry for the confusion. Please remove:

/var/log/socorro/cron_bugzilla.log
Flags: needinfo?(chris.lonnen)
(Assignee)

Comment 10

5 years ago
Removed socorro-admin-cron_bugzilla check.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard