Closed
Bug 1493209
Opened 7 years ago
Closed 7 years ago
Investigate problem on Syslog-proxy1.dmz.mdc1.mozilla.com
Categories
(Infrastructure & Operations :: MOC: Service Requests, task)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: phrozyn, Assigned: Usul)
Details
Today I noticed Nagios errors indicating that our workers that connect to this host were timing out with:
"Sep 21 15:23:18 mozdef1.private.mdc1.mozilla.com contegix-auditd-worker: amqp.exceptions.NotFound: Queue.declare: (404) NOT_FOUND - failed to perform operation on queue 'auditd' in vhost 'contegix' due to timeout"
So I went to log into the admin interface for RabbitMQ running on that host.
It gives me "Management API returned status code 500 -" and the entire page is blank, as though it can't access its queues.
So I pinged MOC, and Usul responded.
This bug is for tracking this issue.
Comment 1 (Assignee) • 7 years ago
Ran puppet:
Error: /Stage[main]/Yum/Resources[yumrepo]: Failed to generate additional resources using 'generate': Section "base" is already defined, cannot redefine in /etc/yum.repos.d/CentOS-Base.repo
Info: Applying configuration version 'db9abfeddeaae62e368e6cfb18154817f5d2dc11'
Notice: /Stage[main]/Rabbitmq::Updateconfig/Exec[update-rabbitmq-vhosts]/returns: executed successfully
Notice: Finished catalog run in 28.05 seconds
sudo yum-wrapper update was unhappy:
[root@syslog-proxy1.dmz.mdc1 yum.repos.d]# grep mirro * |wc -l
31
[root@syslog-proxy1.dmz.mdc1 yum.repos.d]#
[root@syslog-proxy1.dmz.mdc1 yum.repos.d]# grep mdc1 * |wc -l
10
[root@syslog-proxy1.dmz.mdc1 yum.repos.d]#
Tarred the files
[root@syslog-proxy1.dmz.mdc1 yum.repos.d]# grep mirro * |awk -F: '{print $1}'|uniq
base.repo
CentOS-Base.repo
CentOS-CR.repo
CentOS-Debuginfo.repo
CentOS-fasttrack.repo
centosplus.repo
CentOS-Sources.repo
cr.repo
extras.repo
fasttrack.repo
updates.repo
[root@syslog-proxy1.dmz.mdc1 yum.repos.d]# grep mirro * |awk -F: '{print $1}'|uniq |xargs rm
rm -f C7*
Error: Package: erlang-ic-20.3.8.7-1.el7.x86_64 (erlang-solutions-direct)
Requires: erlang-kernel(x86-64) = 20.3.8.7-1.el7
Removing: erlang-kernel-20.3-1.el7.centos.x86_64 (@erlang-solutions-direct)
erlang-kernel(x86-64) = 20.3-1.el7.centos
Updated By: erlang-kernel-21.0.6-1.el7.x86_64 (erlang-solutions-direct)
erlang-kernel(x86-64) = 21.0.6-1.el7
Available: erlang-kernel-R16B-03.18.el7.x86_64 (epel)
erlang-kernel(x86-64) = R16B-03.18.el7
Available: erlang-kernel-17.1-1.1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-kernel(x86-64) = 17.1-1.1.el7.centos
Available: erlang-kernel-17.1-3.el7.centos.x86_64 (erlang-solutions-direct)
erlang-kernel(x86-64) = 17.1-3.el7.centos
Available: erlang-kernel-17.3-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-kernel(x86-64) = 17.3-1.el7.centos
Available: erlang-kernel-17.4-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-kernel(x86-64) = 17.4-1.el7.centos
Available: erlang-kernel-17.5-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-kernel(x86-64) = 17.5-1.el7.centos
Available: erlang-kernel-17.5.3-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-kernel(x86-64) = 17.5.3-1.el7.centos
Available: erlang-kernel-18.0-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-kernel(x86-64) = 18.0-1.el7.centos
Available: erlang-kernel-18.1-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-kernel(x86-64) = 18.1-1.el7.centos
Available: erlang-kernel-18.2-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-kernel(x86-64) = 18.2-1.el7.centos
Available: erlang-kernel-18.3-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-kernel(x86-64) = 18.3-1.el7.centos
Available: erlang-kernel-19.0-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-kernel(x86-64) = 19.0-1.el7.centos
Available: erlang-kernel-19.1-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-kernel(x86-64) = 19.1-1.el7.centos
Available: erlang-kernel-19.2-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-kernel(x86-64) = 19.2-1.el7.centos
Available: erlang-kernel-19.3-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-kernel(x86-64) = 19.3-1.el7.centos
Available: erlang-kernel-20.0-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-kernel(x86-64) = 20.0-1.el7.centos
Available: erlang-kernel-20.1-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-kernel(x86-64) = 20.1-1.el7.centos
Available: erlang-kernel-20.2-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-kernel(x86-64) = 20.2-1.el7.centos
Available: erlang-kernel-20.3.8.7-1.el7.x86_64 (erlang-solutions-direct)
erlang-kernel(x86-64) = 20.3.8.7-1.el7
Available: erlang-kernel-21.0-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-kernel(x86-64) = 21.0-1.el7.centos
Available: erlang-kernel-21.0.5-1.el7.x86_64 (erlang-solutions-direct)
erlang-kernel(x86-64) = 21.0.5-1.el7
Error: Package: erlang-ic-20.3.8.7-1.el7.x86_64 (erlang-solutions-direct)
Requires: erlang-erts(x86-64) = 20.3.8.7-1.el7
Removing: erlang-erts-20.3-1.el7.centos.x86_64 (@erlang-solutions-direct)
erlang-erts(x86-64) = 20.3-1.el7.centos
Updated By: erlang-erts-21.0.6-1.el7.x86_64 (erlang-solutions-direct)
erlang-erts(x86-64) = 21.0.6-1.el7
Available: erlang-erts-R16B-03.18.el7.x86_64 (epel)
erlang-erts(x86-64) = R16B-03.18.el7
Available: erlang-erts-17.1-1.1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-erts(x86-64) = 17.1-1.1.el7.centos
Available: erlang-erts-17.1-3.el7.centos.x86_64 (erlang-solutions-direct)
erlang-erts(x86-64) = 17.1-3.el7.centos
Available: erlang-erts-17.3-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-erts(x86-64) = 17.3-1.el7.centos
Available: erlang-erts-17.4-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-erts(x86-64) = 17.4-1.el7.centos
Available: erlang-erts-17.5-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-erts(x86-64) = 17.5-1.el7.centos
Available: erlang-erts-17.5.3-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-erts(x86-64) = 17.5.3-1.el7.centos
Available: erlang-erts-18.0-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-erts(x86-64) = 18.0-1.el7.centos
Available: erlang-erts-18.1-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-erts(x86-64) = 18.1-1.el7.centos
Available: erlang-erts-18.2-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-erts(x86-64) = 18.2-1.el7.centos
Available: erlang-erts-18.3-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-erts(x86-64) = 18.3-1.el7.centos
Available: erlang-erts-19.0-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-erts(x86-64) = 19.0-1.el7.centos
Available: erlang-erts-19.1-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-erts(x86-64) = 19.1-1.el7.centos
Available: erlang-erts-19.2-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-erts(x86-64) = 19.2-1.el7.centos
Available: erlang-erts-19.3-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-erts(x86-64) = 19.3-1.el7.centos
Available: erlang-erts-20.0-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-erts(x86-64) = 20.0-1.el7.centos
Available: erlang-erts-20.1-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-erts(x86-64) = 20.1-1.el7.centos
Available: erlang-erts-20.2-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-erts(x86-64) = 20.2-1.el7.centos
Available: erlang-erts-20.3.8.7-1.el7.x86_64 (erlang-solutions-direct)
erlang-erts(x86-64) = 20.3.8.7-1.el7
Available: erlang-erts-21.0-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-erts(x86-64) = 21.0-1.el7.centos
Available: erlang-erts-21.0.5-1.el7.x86_64 (erlang-solutions-direct)
erlang-erts(x86-64) = 21.0.5-1.el7
Error: Package: erlang-ic-20.3.8.7-1.el7.x86_64 (erlang-solutions-direct)
Requires: erlang-stdlib(x86-64) = 20.3.8.7-1.el7
Removing: erlang-stdlib-20.3-1.el7.centos.x86_64 (@erlang-solutions-direct)
erlang-stdlib(x86-64) = 20.3-1.el7.centos
Updated By: erlang-stdlib-21.0.6-1.el7.x86_64 (erlang-solutions-direct)
erlang-stdlib(x86-64) = 21.0.6-1.el7
Available: erlang-stdlib-R16B-03.18.el7.x86_64 (epel)
erlang-stdlib(x86-64) = R16B-03.18.el7
Available: erlang-stdlib-17.1-1.1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-stdlib(x86-64) = 17.1-1.1.el7.centos
Available: erlang-stdlib-17.1-3.el7.centos.x86_64 (erlang-solutions-direct)
erlang-stdlib(x86-64) = 17.1-3.el7.centos
Available: erlang-stdlib-17.3-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-stdlib(x86-64) = 17.3-1.el7.centos
Available: erlang-stdlib-17.4-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-stdlib(x86-64) = 17.4-1.el7.centos
Available: erlang-stdlib-17.5-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-stdlib(x86-64) = 17.5-1.el7.centos
Available: erlang-stdlib-17.5.3-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-stdlib(x86-64) = 17.5.3-1.el7.centos
Available: erlang-stdlib-18.0-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-stdlib(x86-64) = 18.0-1.el7.centos
Available: erlang-stdlib-18.1-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-stdlib(x86-64) = 18.1-1.el7.centos
Available: erlang-stdlib-18.2-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-stdlib(x86-64) = 18.2-1.el7.centos
Available: erlang-stdlib-18.3-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-stdlib(x86-64) = 18.3-1.el7.centos
Available: erlang-stdlib-19.0-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-stdlib(x86-64) = 19.0-1.el7.centos
Available: erlang-stdlib-19.1-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-stdlib(x86-64) = 19.1-1.el7.centos
Available: erlang-stdlib-19.2-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-stdlib(x86-64) = 19.2-1.el7.centos
Available: erlang-stdlib-19.3-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-stdlib(x86-64) = 19.3-1.el7.centos
Available: erlang-stdlib-20.0-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-stdlib(x86-64) = 20.0-1.el7.centos
Available: erlang-stdlib-20.1-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-stdlib(x86-64) = 20.1-1.el7.centos
Available: erlang-stdlib-20.2-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-stdlib(x86-64) = 20.2-1.el7.centos
Available: erlang-stdlib-20.3.8.7-1.el7.x86_64 (erlang-solutions-direct)
erlang-stdlib(x86-64) = 20.3.8.7-1.el7
Available: erlang-stdlib-21.0-1.el7.centos.x86_64 (erlang-solutions-direct)
erlang-stdlib(x86-64) = 21.0-1.el7.centos
Available: erlang-stdlib-21.0.5-1.el7.x86_64 (erlang-solutions-direct)
erlang-stdlib(x86-64) = 21.0.5-1.el7
You could try using --skip-broken to work around the problem
Assignee: nobody → ludovic
Comment 2 (Assignee) • 7 years ago
sudo yum-wrapper update --skip-broken
Install 1 Package
Upgrade 72 Packages
Remove 1 Package
Skipped (dependency problems) 106 Packages
Total download size: 195 M
Is this ok [y/d/N]: y
Rebooted.
Comment 3 (Reporter) • 7 years ago
I'll log in and take a look at the Erlang issue. We had digi update RabbitMQ and Erlang to later versions during the migration to work around some TLS issues. It may be that the Erlang repo simply can't be reached, or something similar.
Updated (Reporter) • 7 years ago
Summary: Investigate problem on Syslog-proxy1.private.mdc1.mozilla.com → Investigate problem on Syslog-proxy1.dmz.mdc1.mozilla.com
Comment 4 (Reporter) • 7 years ago
After the initial handshake timeout (we were patching mozdef, so any connections the workers had initiated timed out once the workers were shut down), we see a file descriptor limit alarm:
2018-09-21 03:47:34.603 [warning] <0.271.0> file descriptor limit alarm set.~n~n********************************************************************~n*** New connections will not be accepted until this alarm clears ***~n********************************************************************~n
About 40 seconds later it clears, then immediately retriggers and clears again:
2018-09-21 03:48:14.708 [warning] <0.271.0> file descriptor limit alarm cleared~n
2018-09-21 03:48:14.717 [warning] <0.271.0> file descriptor limit alarm set.~n~n********************************************************************~n*** New connections will not be accepted until this alarm clears ***~n********************************************************************~n
2018-09-21 03:48:14.719 [warning] <0.271.0> file descriptor limit alarm cleared~n
This happens once more, and then we get:
2018-09-21 03:58:49.283 [warning] <0.1130.0> Ranch acceptor reducing accept rate: out of file descriptors
2018-09-21 03:58:50.263 [error] <0.273.0> ** Generic server vm_memory_monitor terminating
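To see how often the alarm flaps without reading the log by eye, the set/cleared transitions can be pulled out with a short script. This is a minimal sketch, assuming the log lines keep the shape quoted above (timestamp, level, pid, message); the function name `alarm_transitions` is hypothetical and is not part of RabbitMQ.

```python
import re

# Matches the "file descriptor limit alarm set/cleared" lines quoted in this bug.
# The timestamp is taken as the first two whitespace-separated fields.
ALARM_RE = re.compile(
    r"^(?P<ts>\S+ \S+) .*file descriptor limit alarm (?P<state>set|cleared)"
)


def alarm_transitions(lines):
    """Return a list of (timestamp, state) tuples for fd alarm events."""
    events = []
    for line in lines:
        match = ALARM_RE.search(line)
        if match:
            events.append((match.group("ts"), match.group("state")))
    return events
```

Feeding it the RabbitMQ log file line by line and counting the `set` entries gives the number of times new connections were refused.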
After the reboot:
File Descriptors:
The current system limit is 791357, and 1408 are in use.
We should monitor this with a Nagios alert on:
cat /proc/sys/fs/file-nr
The first number is the number of file handles in use (allocated), and the last number is the maximum the system can handle.
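A check along these lines could be sketched as follows. This is only an illustration, not an existing Nagios plugin: the function names and the 80%/90% thresholds are assumptions, and the exit codes follow the usual Nagios convention (0 OK, 1 WARNING, 2 CRITICAL). Note that `/proc/sys/fs/file-nr` is system-wide, so it would not by itself catch a per-process limit being hit.

```python
def parse_file_nr(text):
    """Parse /proc/sys/fs/file-nr: allocated handles, unused handles, system maximum."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return allocated, unused, maximum


def check_file_nr(text, warn_pct=80.0, crit_pct=90.0):
    """Return (nagios_exit_code, message) for the given file-nr contents.

    warn_pct and crit_pct are hypothetical thresholds; tune them per host.
    """
    allocated, _unused, maximum = parse_file_nr(text)
    used_pct = 100.0 * allocated / maximum
    if used_pct >= crit_pct:
        code, label = 2, "CRITICAL"
    elif used_pct >= warn_pct:
        code, label = 1, "WARNING"
    else:
        code, label = 0, "OK"
    message = f"{label} - {allocated}/{maximum} file handles allocated ({used_pct:.2f}%)"
    return code, message


def main(path="/proc/sys/fs/file-nr"):
    """Read the live counters, print the status line, and return the exit code."""
    with open(path) as handle:
        code, message = check_file_nr(handle.read())
    print(message)
    return code
```

To use it as a plugin, end the script with `raise SystemExit(main())` so that Nagios sees the exit code.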
Comment 5 (Reporter) • 7 years ago
ulimit -aH showed the open file limit to be 4096, which is way too low for times when we are patching mozdef hosts. That means rabbit is piling up messages and may even be caching them to disk for persistence.
So I am raising the fd hard limit to 100k, with a soft limit of 65535.
If it ever hits this limit, it should only be during our patching.
After a reboot we have the new limit:
[root@syslog-proxy1.dmz.mdc1 ~]# ulimit -aH
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 31204
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 100000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 31204
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Comment 6 (Reporter) • 7 years ago
[root@syslog-proxy1.dmz.mdc1 ~]# cat /proc/767/limits
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 8388608 unlimited bytes
Max core file size 0 unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 31204 31204 processes
Max open files 1024 4096 files
Max locked memory 65536 65536 bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 31204 31204 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
Max realtime timeout unlimited unlimited us
[root@syslog-proxy1.dmz.mdc1 ~]# su - rabbitmq -s /bin/sh -c 'ulimit -n'
65535
[root@syslog-proxy1.dmz.mdc1 ~]# rabbitmqctl status | grep -A 4 limit
{vm_memory_limit,3280923852},
{disk_free_limit,50000000},
{disk_free,50213949440},
{file_descriptors,
[{total_limit,924}, <-- this is hard coded but I believe it changes dynamically.
{total_used,112},
{sockets_limit,829},
{sockets_used,90}]},
{processes,[{limit,1048576},{used,1353}]},
{run_queue,0},
{uptime,453},
{kernel,{net_ticktime,60}}]
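The limits of a running process can differ from what a fresh login shell reports, so it can help to read them from `/proc/<pid>/limits` directly. Below is a minimal sketch that extracts the "Max open files" row; the function names are hypothetical. The `reserve=100` default is an inference from the numbers in this comment only (a soft limit of 1024 next to a `total_limit` of 924), not something confirmed in this bug.

```python
def _to_int(value):
    """Convert a limits field to int, mapping 'unlimited' to None."""
    return None if value == "unlimited" else int(value)


def open_file_limits(limits_text):
    """Return (soft, hard) for the 'Max open files' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # fields: ['Max', 'open', 'files', soft, hard, 'files']
            return _to_int(fields[3]), _to_int(fields[4])
    raise ValueError("no 'Max open files' row found")


def inferred_total_limit(soft_limit, reserve=100):
    """ASSUMPTION: total_limit = soft limit minus a fixed reserve of 100.

    This only restates the relationship visible in the output above.
    """
    return soft_limit - reserve
```

Applied to the output above, `open_file_limits` returns `(1024, 4096)` and `inferred_total_limit(1024)` gives 924, which matches the `total_limit` reported by `rabbitmqctl status`.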
Comment 7 (Reporter) • 7 years ago
Closing this bug; will reopen if we find this issue recurring.
Updated (Reporter) • 7 years ago
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME