Closed Bug 1156801 Opened 9 years ago Closed 9 years ago

Alert via papertrail when a golden ami has been running a long time

Categories

(Infrastructure & Operations :: RelOps: General, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Callek, Assigned: arich)

I'd say 2+ days for now, but maybe 1 day or so is a better metric:

 Apr 21 06:02:25 aws-manager2.srv.releng.scl3.mozilla.com aws_sanity_checker.py: 19 tst-linux32-ec2-golden (i-9972264e, us-east-1) up for 11d 4h:36m (0h:0m since last build) 

as one example.

(note that the script that generates THAT log is only run once a day)
This would require the script that does the checking to write that output to syslog, or to stdout/stderr (which we could pipe to syslog), instead of or in addition to the large log file that currently holds those script results. Who would be a good person to talk to about the necessary modifications to the script?
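As a rough illustration of the "in addition to" option, here is a minimal sketch using Python's standard logging module. It is not the actual aws_sanity_checker.py code; the function name build_logger and the log_path parameter are hypothetical, and it assumes the script is free to print its results to stdout.

```python
import logging
import sys


def build_logger(log_path, stream=sys.stdout):
    """Return a logger that writes each record to a file and to a stream.

    Hypothetical helper: `log_path` stands in for the existing large log
    file, and `stream` is what would be piped on to syslog.
    """
    logger = logging.getLogger("aws_sanity_checker")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep records out of the root logger
    formatter = logging.Formatter("%(message)s")
    for handler in (logging.FileHandler(log_path), logging.StreamHandler(stream)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

With something like this in place, the cron entry could pipe stdout through the standard logger(1) utility (for example `... | logger -t aws_sanity_checker`) so the lines reach syslog without dropping the existing file.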
Flags: needinfo?(bugspam.Callek)
This was previously part of the cron mail, but the log line I quoted is already in papertrail.

I suspect that meets the requirements you cite.
Flags: needinfo?(bugspam.Callek)
Ah, this is from sanity_checker, not the idle killer, gotcha. So you want an alert every time we see a log entry on the aws-manager servers that contains "ec2-golden" and "up for"? Papertrail can't do regex searches, only substring matches, so we can't build in logic to look for > 1d or whatever timeframe.
Ahhh, that "up for" value can omit days entirely: if this script runs while a golden is being generated, it could easily say "up for 4h" or some such.

I want it to at least match on days, but matching merely "d" is not tenable. Knowing that, we might just need to leave this bug be until either papertrail supports regex searches or we can automatically act on a papertrail notice of the generic message and say "ok, this one has been up more than 2 days...".
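For the "automatically act on a notice" idea, the threshold check itself is simple once something receives the raw line. This is a minimal sketch, not part of any existing tool: the function names are hypothetical, and the regex assumes the uptime is always formatted like the quoted example ("up for 11d 4h:36m"), with the day part dropped when the instance is under a day old.

```python
import re
from datetime import timedelta

# The day component is optional: a young instance is assumed to log
# only hours and minutes, e.g. "up for 4h:36m".
UPTIME_RE = re.compile(r"up for (?:(\d+)d )?(\d+)h:(\d+)m")


def parse_uptime(line):
    """Return the uptime found in `line` as a timedelta, or None if absent."""
    match = UPTIME_RE.search(line)
    if match is None:
        return None
    days, hours, minutes = match.groups()
    return timedelta(days=int(days or 0), hours=int(hours), minutes=int(minutes))


def is_long_running_golden(line, threshold=timedelta(days=2)):
    """True when `line` describes a golden instance up for at least `threshold`."""
    if "ec2-golden" not in line:
        return False
    uptime = parse_uptime(line)
    return uptime is not None and uptime >= threshold
```

The default threshold of 2 days follows the "2+ days for now" suggestion above; passing timedelta(days=1) would give the stricter variant.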
I matched on "d " as well. Alert created:

https://papertrailapp.com/searches/4463694/edit

If this doesn't perform as expected, we can revisit (you can also edit as appropriate by logging in to the papertrail account).
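To make the behaviour of a substring-only alert concrete, here is a small sketch of the matching logic. The exact terms in the saved search are not shown in this bug, so the tuple below is an assumption pieced together from the discussion, and the function name is hypothetical.

```python
# Assumed search terms; the real saved search may differ.
REQUIRED_SUBSTRINGS = ("ec2-golden", "up for", "d ")


def alert_would_fire(line, required=REQUIRED_SUBSTRINGS):
    """Mimic a substring-only search: every term must appear somewhere in the line."""
    return all(term in line for term in required)
```

Under these assumptions the alert fires for any golden whose uptime includes a day component, so it cannot tell 1 day from 11 days, and a line logged while a golden is only hours old does not match.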
Assignee: relops → arich
Blocks: 1150557
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED