Closed Bug 1055600 Opened 10 years ago Closed 8 years ago

add nagios monitoring for long running aws_* scripts on aws-manager

Categories

(Release Engineering :: General, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: massimo, Unassigned)

We have just noticed that aws_process_cloudtrail.py got stuck on 1st Aug (2 weeks ago); we clearly need to monitor this script.

Right now, the only monitoring on aws-manager is for aws_stop_idle.log.

From a quick check, the following scripts create a lockfile:

* aws_manager-aws_clean_log_dir.py.sh
* aws_manager-aws_get_cloudtrail_logs.py.sh
* aws_manager-aws_process_cloudtrail_logs.py.sh
* aws_manager-aws_publish_amis.py.sh
* aws_manager-aws_sanity_checker.py.sh
* aws_manager-aws_stop_idle_servo.sh
* aws_manager-aws_watch_pending.py.sh
* aws_manager-aws_watch_pending_servo.sh
* aws_manager-bld-linux64-ec2-golden.sh
* aws_manager-delete_old_spot_amis.py.sh
* aws_manager-spot_sanity_check.py.sh
* aws_manager-tag_spot_instances.py.sh
* aws_manager-try-linux64-ec2-golden.sh
* aws_manager-tst-emulator64-ec2-golden.sh
* aws_manager-tst-linux32-ec2-golden.sh
* aws_manager-tst-linux64-ec2-golden.sh

We might want to monitor them too.
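
Since the scripts listed above each create a lockfile, one possible approach is a Nagios-style check on lockfile age. The following is only a minimal sketch, not an existing check: the function names, the thresholds, and the assumption that the lockfile's mtime reflects when the script started are all hypothetical. The only fixed convention it relies on is the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
"""Nagios-style check: alert when a lockfile has existed for too long."""
import os
import sys
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_lockfile(path, warn_seconds, crit_seconds, now=None):
    """Return (exit_code, message) based on the age of the lockfile at `path`.

    Assumption: the lockfile's mtime is roughly the time the script started.
    """
    now = time.time() if now is None else now
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        # No lockfile: the script is not running, so nothing can be stuck.
        return OK, "OK: %s not present" % path
    except OSError as exc:
        return UNKNOWN, "UNKNOWN: cannot stat %s: %s" % (path, exc)

    age = now - mtime
    if age >= crit_seconds:
        return CRITICAL, "CRITICAL: %s is %d seconds old" % (path, age)
    if age >= warn_seconds:
        return WARNING, "WARNING: %s is %d seconds old" % (path, age)
    return OK, "OK: %s is %d seconds old" % (path, age)


def main(argv):
    """Parse PATH WARN_SECONDS CRIT_SECONDS, print the status, return the code."""
    if len(argv) != 4:
        print("UNKNOWN: usage: %s PATH WARN_SECONDS CRIT_SECONDS" % argv[0])
        return UNKNOWN
    try:
        warn_seconds, crit_seconds = int(argv[2]), int(argv[3])
    except ValueError:
        print("UNKNOWN: thresholds must be integers")
        return UNKNOWN
    code, message = check_lockfile(argv[1], warn_seconds, crit_seconds)
    print(message)
    return code
```

To use it as a plugin, the file would end with `sys.exit(main(sys.argv))`. A script stuck for two weeks, like the cloudtrail one, would trip any critical threshold measured in hours.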
See Also: → 1001416
The goldens are monitored via nagios; some of the watcher scripts now have cron jobs that kill them off if they run too long, and others are monitored via papertrail.
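
For reference, the "kill it if it runs too long" idea can be sketched as follows. This is not the cron job mentioned above: it swaps in a different technique, a wrapper that starts the script itself and enforces a time limit through `subprocess.run(timeout=...)`. The function name and the timeout values are hypothetical.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Run `cmd`; return its exit code, or None if it was killed for overrunning."""
    try:
        completed = subprocess.run(cmd, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child process before raising TimeoutExpired.
        return None
    return completed.returncode


# A quick command finishes normally; a slow one is killed after one second.
quick = run_with_timeout([sys.executable, "-c", "pass"], 30)
slow = run_with_timeout([sys.executable, "-c", "import time; time.sleep(30)"], 1)
```

Note that only the direct child is killed; anything the script itself spawned would need separate handling.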
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Component: Tools → General