Bug 1156801
Opened 10 years ago
Closed 10 years ago
Alert via papertrail when a golden ami has been running a long time
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: Callek, Assigned: arich)
I'd say 2+ days for now, but maybe 1 day or so is a better metric:
Apr 21 06:02:25 aws-manager2.srv.releng.scl3.mozilla.com aws_sanity_checker.py: 19 tst-linux32-ec2-golden (i-9972264e, us-east-1) up for 11d 4h:36m (0h:0m since last build)
as one example.
(note that the script that generates THAT log is only run once a day)
Comment 1 (Assignee) • 10 years ago
This would require the checking script to write that output to syslog, or to stdout/stderr (where we could pipe it to syslog), instead of or in addition to the large log file that we currently use for those script results. Who would be a good person to talk to about the necessary modifications to the script?
Flags: needinfo?(bugspam.Callek)
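For illustration only, here is a minimal Python sketch of the kind of change described above: the script emits one line per finding either to stderr (which a wrapper could pipe to syslog) or straight to syslog. The names `build_logger` and `syslog_handler` are hypothetical, and `/dev/log` as the local syslog socket is an assumption about the host; none of this is taken from the actual script.

```python
import logging
import logging.handlers
import sys


def build_logger(name="aws_sanity_checker", handler=None):
    """Return a logger that writes one line per finding.

    With no handler given, lines go to stderr so that a wrapper can pipe
    them to syslog. Pass syslog_handler() to write to syslog directly.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate lines if called twice
    logger.propagate = False  # keep output out of the root logger
    if handler is None:
        handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger


def syslog_handler(address="/dev/log"):
    # ASSUMPTION: "/dev/log" is the local syslog socket (typical on Linux).
    return logging.handlers.SysLogHandler(address=address)
```

Keeping the handler as a parameter means the existing large log file could stay in place, with syslog added as a second destination rather than a replacement.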
Comment 2 (Reporter) • 10 years ago
This was previously part of the cron mail, but the log line I quoted is currently in papertrail.
I suspect that meets the requirements you cite.
Flags: needinfo?(bugspam.Callek)
Comment 3 (Assignee) • 10 years ago
Ah, this is from sanity_checker, not the idle killer, gotcha. So you want an alert every time we see a log entry on the aws-manager servers that contains "ec2-golden" and "up for"? Papertrail can't do regex searches, only substring matches, so we can't build in logic to look for > 1d or whatever timeframe.
Comment 4 (Reporter) • 10 years ago
Ahhh, the "up for" value can omit days entirely: if this script runs while a golden is being generated, it could easily say "up for 4h" or some such.
But I want it to at least match on days, and matching merely "d" is not tenable. Given that, we might just need to leave this bug be until we either have a way to use regex in papertrail, or can automatically act on a papertrail notice of the generic message to say "ok, this one has been up more than 2 days..."
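As a rough sketch of the "automatically act on the generic message" idea, something like the following Python could parse the uptime out of a forwarded log line and decide whether it crosses the threshold. The function names are hypothetical, the 2-day default comes from the description, and the days-omitted form ("up for 4h:36m") is an assumption based on the comment above rather than an observed log line.

```python
import re
from datetime import timedelta

# Matches "up for 11d 4h:36m" and, by assumption, "up for 4h:36m"
# (days omitted) on lines that mention an ec2-golden instance.
UPTIME_RE = re.compile(
    r"ec2-golden .*?up for (?:(?P<days>\d+)d )?(?P<hours>\d+)h:(?P<minutes>\d+)m"
)


def golden_uptime(line):
    """Return the uptime as a timedelta, or None if the line does not match."""
    match = UPTIME_RE.search(line)
    if match is None:
        return None
    return timedelta(
        days=int(match.group("days") or 0),
        hours=int(match.group("hours")),
        minutes=int(match.group("minutes")),
    )


def should_alert(line, threshold=timedelta(days=2)):
    """True when a golden instance has been up for at least `threshold`."""
    uptime = golden_uptime(line)
    return uptime is not None and uptime >= threshold
```

Because the comparison is done on a `timedelta`, moving from "2+ days" to "1 day or so" would only mean changing the `threshold` argument.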
Comment 5 (Assignee) • 10 years ago
I matched on "d " as well. Alert created:
https://papertrailapp.com/searches/4463694/edit
If this doesn't perform as expected, we can revisit (you can also edit as appropriate by logging in to the papertrail account).
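To make the behaviour of a substring-only match concrete, here is a small Python approximation. It assumes the saved search simply requires each term to appear somewhere in the line; the exact terms and semantics of the Papertrail search are not shown in this bug, so `TERMS` and `substring_alert` are illustrative guesses.

```python
# ASSUMPTION: these are the terms of the saved search; only "d " is
# confirmed above, the other two come from the earlier comments.
TERMS = ("ec2-golden", "up for", "d ")


def substring_alert(line, terms=TERMS):
    """True when every term occurs somewhere in the line (no regex involved)."""
    return all(term in line for term in terms)
```

Under these assumptions a line such as "up for 4h:36m" would not fire, but "up for 1d 0h:5m" would, so the match cannot enforce a 2+ day threshold by itself.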
Assignee: relops → arich
Blocks: 1150557
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED