Closed Bug 1037873 Opened 10 years ago Closed 10 years ago

nagios monitoring for sp-admin01.phx1.mozilla.com:/var/log/socorro/crontabber.log

Categories

(Infrastructure & Operations Graveyard :: WebOps: Socorro, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rhelmer, Assigned: cliang)

References

Details

(Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/643] )

Can we get a simple monitor on sp-admin01:/var/log/socorro/crontabber.log? If it hasn't been written to in ~3 hours we should warn, and page at 6 hours (we can tighten this up later as we make our jobs faster).

This would catch things like bug 1037870
No longer blocks: 1037870
See Also: → 1037870
Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/643]
Assignee: server-ops-webops → cliang
I've added a log file age check that should warn if the file is older than 3 hours and page if it is older than 6 hours. (The age is calculated in seconds.)

This should be similar to the file age check that is run on the processors, which warns if /var/log/socorro/socorro-processor.log hasn't been written to in 60 seconds and pages if it's been more than 5 minutes since the last write to that file.
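For readers unfamiliar with this kind of check, here is a rough sketch of the logic described above: compare the log file's last-modified time against a warning and a critical threshold, both in seconds. This is not the code of the Nagios check_file_age plugin. The function names, the strict "greater than" comparison, and the UNKNOWN result for a missing file are assumptions made for illustration, and the real plugin may behave differently. The thresholds are the 3-hour and 6-hour values from this bug, which work out to the 10800 and 21600 passed to check_command in the patch.

```python
import os
import time

# Thresholds from this bug, in seconds.
WARN_SECONDS = 3 * 60 * 60  # 3 hours = 10800
CRIT_SECONDS = 6 * 60 * 60  # 6 hours = 21600

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_age(age_seconds, warn=WARN_SECONDS, crit=CRIT_SECONDS):
    """Map a file age in seconds to a Nagios-style status code.

    Assumption: an age must strictly exceed a threshold to trip it.
    """
    if age_seconds > crit:
        return CRITICAL
    if age_seconds > warn:
        return WARNING
    return OK


def check_file_age(path, warn=WARN_SECONDS, crit=CRIT_SECONDS, now=None):
    """Return a status code based on how long ago `path` was last written.

    Assumption: a missing or unreadable file is reported as UNKNOWN
    in this sketch.
    """
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return UNKNOWN
    if now is None:
        now = time.time()
    return classify_age(now - mtime, warn, crit)


if __name__ == "__main__":
    # Hypothetical invocation against the log file named in this bug.
    raise SystemExit(check_file_age("/var/log/socorro/crontabber.log"))
```

Keeping classify_age separate from the filesystem call means the threshold logic can be exercised without touching a real log file, which helps if the thresholds are tightened later.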


Index: modules/nagios/manifests/mozilla/services.pp
===================================================================
--- modules/nagios/manifests/mozilla/services.pp	(revision 92208)
+++ modules/nagios/manifests/mozilla/services.pp	(working copy)
@@ -3642,6 +3642,18 @@
                 ]
             }
         },
+        'socorro-admin-crontab-log' => {
+            service_description => "Socorro Admin - crontab log file age",
+            check_command => 'check_file_age!10800!21600!/var/log/socorro/crontabber.log',
+            contact_groups => 'sysalerts, socorroalerts',
+            hostgroups => $::fqdn ? {
+                'nagios1.private.phx1.mozilla.com' => [
+                    'socorro-admin'
+                ],
+                default => [
+                ]
+            }
+        },
         'socorro-web-http' => {
             service_description => 'crash-stats.m.c - http string',
             check_command => 'check_http_string!crash-stats.mozilla.com!/products/Firefox!\'Mozilla Crash Reports\'',
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard