[log-aggregator1.srv.releng.mdc1.mozilla.com] File Age - /var/log/messages is CRITICAL
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
People
(Reporter: riman, Unassigned)
Details
nagios-releng-mdc1 started sending the following alert during my shift:
Sun 07:44:29 UTC [8002] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is CRITICAL: FILE_AGE CRITICAL: /var/log/messages is 5032 seconds old and 7706 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
Comment 1•6 years ago (Reporter)
All good now, I'll close the bug.
Sun 08:12:30 UTC [8004] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is OK: FILE_AGE OK: /var/log/messages is 46 seconds old and 8402 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
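For context, here is a minimal Python sketch of what a file-age check of this kind evaluates. The thresholds are assumptions chosen only to be consistent with the alerts pasted in this bug (WARNING seen up to 3956 seconds, CRITICAL from 4035 seconds); the actual Nagios check configuration is not shown here, and `classify_age` and `check_file_age` are hypothetical names.

```python
import os
import time

# Assumed thresholds in seconds; the real values are not stated in this bug.
WARN_SECONDS = 3600
CRIT_SECONDS = 4000


def classify_age(age_seconds, warn=WARN_SECONDS, crit=CRIT_SECONDS):
    """Map a file age in seconds to a Nagios-style state name."""
    if age_seconds >= crit:
        return "CRITICAL"
    if age_seconds >= warn:
        return "WARNING"
    return "OK"


def check_file_age(path, now=None):
    """Build a status line from the file's mtime and size."""
    st = os.stat(path)
    now = time.time() if now is None else now
    age = int(now - st.st_mtime)
    state = classify_age(age)
    return f"FILE_AGE {state}: {path} is {age} seconds old and {st.st_size} bytes"
```

The point of the sketch is that the check only looks at the modification time, so it goes non-OK whenever nothing has been written to the file for long enough, whatever the reason.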
Comment 2•6 years ago
Today it began sending those critical/warning messages again.
Sat 07:56:41 UTC [8000] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is WARNING: FILE_AGE WARNING: /var/log/messages is 3865 seconds old and 2472832 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
Sat 08:00:41 UTC [8001] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is CRITICAL: FILE_AGE CRITICAL: /var/log/messages is 4105 seconds old and 2472832 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
Sat 08:16:40 UTC [8002] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is CRITICAL: FILE_AGE CRITICAL: /var/log/messages is 5065 seconds old and 2472832 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
Sat 08:32:40 UTC [8003] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is CRITICAL: FILE_AGE CRITICAL: /var/log/messages is 6025 seconds old and 2472832 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
Sat 08:48:40 UTC [8004] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is CRITICAL: FILE_AGE CRITICAL: /var/log/messages is 6985 seconds old and 2472832 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
Sat 09:04:40 UTC [8005] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is CRITICAL: FILE_AGE CRITICAL: /var/log/messages is 7945 seconds old and 2472832 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
Sat 09:12:40 UTC [8006] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is OK: FILE_AGE OK: /var/log/messages is 7 seconds old and 2473702 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
Sat 11:46:40 UTC [8007] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is WARNING: FILE_AGE WARNING: /var/log/messages is 3843 seconds old and 2480348 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
Sat 11:50:41 UTC [8008] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is CRITICAL: FILE_AGE CRITICAL: /var/log/messages is 4083 seconds old and 2480348 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
Sat 11:54:41 UTC [8009] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is OK: FILE_AGE OK: /var/log/messages is 30 seconds old and 2481044 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
Sat 11:58:22 UTC [8010] nagios1.private.releng.mdc1.mozilla.com:signing scriptworker queue is CRITICAL: CLUSTER CRITICAL: signing scriptworker queue: 1 ok, 21 warning, 0 unknown, 0 critical (http://m.mozilla.org/signing+scriptworker+queue)
Sat 12:08:22 UTC [8011] nagios1.private.releng.mdc1.mozilla.com:signing scriptworker queue is OK: CLUSTER OK: signing scriptworker queue: 22 ok, 0 warning, 0 unknown, 0 critical (http://m.mozilla.org/signing+scriptworker+queue)
Sat 16:16:40 UTC [8012] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is WARNING: FILE_AGE WARNING: /var/log/messages is 3934 seconds old and 2512868 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
Sat 16:18:39 UTC [8013] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is CRITICAL: FILE_AGE CRITICAL: /var/log/messages is 4054 seconds old and 2512868 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
Sat 16:34:41 UTC [8014] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is CRITICAL: FILE_AGE CRITICAL: /var/log/messages is 5015 seconds old and 2512868 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
Sat 16:50:40 UTC [8015] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is CRITICAL: FILE_AGE CRITICAL: /var/log/messages is 5975 seconds old and 2512868 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
I have acknowledged the alert
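A pattern worth noting in the alerts above is that the reported size stays the same while the age keeps growing, i.e. nothing is being appended to the file during each episode. A small Python sketch that pulls those fields out of pasted alert lines follows; the regular expression is derived only from the FILE_AGE format seen in this bug, and `parse_alert` and `summarize` are hypothetical helper names.

```python
import re

# Matches the plugin output portion of the pasted Nagios lines.
ALERT_RE = re.compile(
    r"FILE_AGE (?P<state>OK|WARNING|CRITICAL): (?P<path>\S+) is "
    r"(?P<age>\d+) seconds old and (?P<size>\d+) bytes"
)


def parse_alert(line):
    """Return state, path, age and size from one alert line, or None."""
    m = ALERT_RE.search(line)
    if m is None:
        return None
    return {
        "state": m["state"],
        "path": m["path"],
        "age": int(m["age"]),
        "size": int(m["size"]),
    }


def summarize(lines):
    """Summarize FILE_AGE alerts; lines for other checks are skipped."""
    alerts = [a for a in map(parse_alert, lines) if a is not None]
    bad = [a for a in alerts if a["state"] != "OK"]
    return {
        "alerts": len(alerts),
        "non_ok": len(bad),
        "max_age": max((a["age"] for a in bad), default=0),
        "sizes_while_non_ok": sorted({a["size"] for a in bad}),
    }
```

A single value in `sizes_while_non_ok` for an episode means the file did not grow at all while the check was alerting.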
Comment 3•6 years ago
It has also sent the following alerts today:
Sun 06:06:41 UTC [8024] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is WARNING: FILE_AGE WARNING: /var/log/messages is 3915 seconds old and 4988 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
Sun 06:08:41 UTC [8025] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is CRITICAL: FILE_AGE CRITICAL: /var/log/messages is 4035 seconds old and 4988 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
Sun 06:24:41 UTC [8026] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is CRITICAL: FILE_AGE CRITICAL: /var/log/messages is 4995 seconds old and 4988 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
Sun 06:40:41 UTC [8027] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is CRITICAL: FILE_AGE CRITICAL: /var/log/messages is 5955 seconds old and 4988 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
Sun 06:56:41 UTC [8028] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is CRITICAL: FILE_AGE CRITICAL: /var/log/messages is 6915 seconds old and 4988 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
Sun 07:12:41 UTC [8000] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is CRITICAL: FILE_AGE CRITICAL: /var/log/messages is 7875 seconds old and 4988 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
I've acknowledged the alert
Comment 4•6 years ago
In the meantime it has recovered:
log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is OK: FILE_AGE OK: /var/log/messages is 67 seconds old and 14319 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
We will close the bug for now. If the problem recurs, we will re-open it.
Comment 5•6 years ago
Re-opening: the problem persists. Last time, the OK arrived about an hour after the alert was acknowledged.
Sun 17:06:41 UTC [8007] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is WARNING: FILE_AGE WARNING: /var/log/messages is 3922 seconds old and 30883 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
Sun 17:08:41 UTC [8008] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is CRITICAL: FILE_AGE CRITICAL: /var/log/messages is 4042 seconds old and 30883 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
Sun 17:24:41 UTC [8009] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is CRITICAL: FILE_AGE CRITICAL: /var/log/messages is 5002 seconds old and 30883 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
I have acknowledged the alert.
Comment 6•6 years ago
The OK status has now arrived, but I will keep the bug open for tracking.
Sun 17:42:41 UTC [8010] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is OK: FILE_AGE OK: /var/log/messages is 96 seconds old and 31118 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
Comment 7•6 years ago (Reporter)
The issue is back again. Could someone have a look, please? It has alerted our channel several times in the last few days.
<nagios-releng-mdc1>
Sun 21:00:40 UTC [8011] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is WARNING: FILE_AGE WARNING: /var/log/messages is 3956 seconds old and 37764 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
Sun 21:02:40 UTC [8012] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is CRITICAL: FILE_AGE CRITICAL: /var/log/messages is 4076 seconds old and 37764 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
Sun 21:18:40 UTC [8013] log-aggregator1.srv.releng.mdc1.mozilla.com:File Age - /var/log/messages is CRITICAL: FILE_AGE CRITICAL: /var/log/messages is 5036 seconds old and 37764 bytes (http://m.mozilla.org/File+Age+-+/var/log/messages)
Comment 8•6 years ago (Reporter)
Logs from Papertrail:
- log-aggregator1.srv.releng.mdc1.mozilla.com rsyslogd-2165: netstream session 0x7f318ba9ce90 will be closed due to error [try http://www.rsyslog.com/e/2165 ]
- log-aggregator1.srv.releng.mdc1.mozilla.com crond: pam_limits(crond:session): unknown limit item 'nofiles'
Comment 9•6 years ago
(In reply to Radu Iman[:riman] from comment #8)
I found that the "nofiles" error goes back through March 3rd, and that config setting has been in place since 2015. It may be a typo or an incorrect setting, but I believe it is unrelated to this problem.
https://github.com/mozilla-releng/build-puppet/blob/b07896a502b0dd8e1c9698f35a3677668043b8ba/modules/log_aggregator/files/etc/limits.conf
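On the "unknown limit item" message: pam_limits only accepts a fixed set of item names, and the one for open files is `nofile`, not `nofiles`. A minimal Python sketch of a lint for that kind of typo follows. The item list is taken from limits.conf(5) and may be incomplete for newer PAM versions, `unknown_limit_items` is a hypothetical helper, and the sample text in the usage note is illustrative only, not the contents of the linked file.

```python
# Item names documented in limits.conf(5); newer PAM releases may add more.
VALID_ITEMS = {
    "core", "data", "fsize", "memlock", "nofile", "rss", "stack", "cpu",
    "nproc", "as", "maxlogins", "maxsyslogins", "priority", "locks",
    "sigpending", "msgqueue", "nice", "rtprio",
}


def unknown_limit_items(text):
    """Return (line number, item) pairs for unrecognized limit items."""
    bad = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Each entry has the form: <domain> <type> <item> <value>
        fields = line.split()
        if len(fields) >= 3 and fields[2] not in VALID_ITEMS:
            bad.append((lineno, fields[2]))
    return bad
```

For example, `unknown_limit_items("* soft nofiles 8192\n* hard nofile 8192\n")` flags only the first line.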
Comment 10•6 years ago
I've restarted rsyslog on log-aggregator1.srv.releng.mdc1.
top - 03:59:44 up 556 days, 10:02, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 108 total, 1 running, 107 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 0.3%sy, 0.0%ni, 99.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 3924412k total, 3550292k used, 374120k free, 131000k buffers
Swap: 4194300k total, 1998608k used, 2195692k free, 171268k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1363 root 20 0 5472m 2.7g 1236 S 1.0 71.7 37419:09 rsyslogd
Let's see if the new process repeats the delays in updating the log on disk.
Comment 11•6 years ago
I see that messages from other mdc1 hosts continue to be relayed to Papertrail. (I tested by logging in to a macOS worker in mdc1 and verified that a log entry for my ssh session appeared in the Papertrail logs.)
Comment 12•6 years ago (Reporter)
It has looked good for the last few hours. We will update this bug if the issue recurs.
Current Status: OK (for 0d 4h 44m 5s)
Status Information: FILE_AGE OK: /var/log/messages is 58 seconds old and 81204 bytes
Updated•5 years ago