As per :lonnen, we should remove the check for this alert as it is no longer pertinent as of today.
< nagios-phx1> | Thu 14:50:41 PST  sp-admin01.phx1.mozilla.com:Socorro Admin - cron_bugzilla.log is CRITICAL: FILE_AGE CRITICAL: /var/log/socorro/cron_bugzilla.log is 5951 seconds old and 9806793 bytes
IRC conversation:
[16:37] < lonnen> | ericz: we made changes to our crontab today, and introduced a new single cron that manages and runs scripts that used to be crons
[16:37] < lonnen> | ericz: used to be meaning... this morning
[16:37] < ericz> | Does that mean it's ok if /var/log/socorro/cron_bugzilla.log is old?
[16:37] < lonnen> | ericz: yes. it also means we should disable that alert
[16:38] < lonnen> | because that job is no longer running
[16:40] < ericz> | lonnen: Ok, should we disable that alert everywhere (assuming it runs in more than one place) or just specific servers/environments?
[16:41] < lonnen> | ericz: I believe it runs on the admin node for stage and prod (and dev, but I don't know if nagios is hooked up on dev)
[16:41] < lonnen> | ericz: and yeah, everywhere. although it should already be disabled on dev and stage
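For context on why a retired cron job trips this alert: a file-age check goes critical once the log's modification time falls too far behind the clock, whether or not the job is still supposed to run. The following is a minimal sketch of that logic, not the actual Nagios plugin. The function name `file_age_status` and the 240/600 second thresholds are assumptions for illustration; the thresholds configured for the Socorro checks are not stated in this bug.

```python
import os
import time

# Nagios-style exit states.
OK, WARNING, CRITICAL = 0, 1, 2

def file_age_status(path, warn_age=240, crit_age=600, now=None):
    """Return (state, message) based on how old `path` is.

    warn_age and crit_age are in seconds and are hypothetical values,
    not the thresholds used by the real check in this bug.
    """
    now = time.time() if now is None else now
    st = os.stat(path)
    age = int(now - st.st_mtime)  # seconds since the file was last written
    if age > crit_age:
        state, label = CRITICAL, "CRITICAL"
    elif age > warn_age:
        state, label = WARNING, "WARNING"
    else:
        state, label = OK, "OK"
    message = f"FILE_AGE {label}: {path} is {age} seconds old and {st.st_size} bytes"
    return state, message
```

Under these assumed thresholds, a log that has not been written for 5951 seconds reports CRITICAL, which matches the shape of the alert above. Once the job that writes the log is gone, the only ways to quiet the check are to remove it or to bring the old job back.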
Do the other cron checks need to go too? This is the full list:
/var/log/socorro/cron_bugzilla.log
/var/log/socorro/cron_status.log
/var/log/socorro/cron_create_partitions.log
/var/log/socorro/cron_submitter-crash-reports.allizom.org.log
We're going to hold on this for a bit as there is debate about rolling back to the old checks.
Due to the looming weekend, we're going to config off this new cron system and re-enable the old cron job system. We'll need to re-enable the cron checks listed above, including the bugzilla cron. Apologies for the confusion before.
No problem. None of them have been removed yet (just one was ack'd) so we should be good for the weekend. We can reconvene next week.
We have pushed the same change to our crontab back into production. We are watching closely overnight to make sure we've fixed the bugs from last week. The following may alert overnight:
/var/log/socorro/cron_bugzilla.log
/var/log/socorro/cron_status.log
/var/log/socorro/cron_create_partitions.log
/var/log/socorro/cron_submitter-crash-reports.allizom.org.log
If all is well in the morning, they will need to be removed.
We've had two good overnight runs, so I think it's safe to proceed.
It's a no-change Friday, but from the Nagios side it seems only the following two are in a "Critical" state:
/var/log/socorro/cron_bugzilla.log
/var/log/socorro/cron_submitter-crash-reports.allizom.org.log
The rest are OK:
/var/log/socorro/cron_create_partitions.log
/var/log/socorro/cron_status.log
/var/log/socorro/socorro-monitor.log
So which ones are we now keeping? From the original request, it seems we were only removing:
/var/log/socorro/cron_bugzilla.log
To make sure this is done correctly, please clarify which ones you want removed come Monday.
Sorry for the confusion. Please remove: /var/log/socorro/cron_bugzilla.log
Removed socorro-admin-cron_bugzilla check.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard