Closed Bug 1484428 Opened 7 years ago Closed 6 years ago

Clamd didn't restart properly after a reboot on smtp1

Categories

(Infrastructure & Operations :: Infrastructure: Mail, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Usul, Assigned: cshields)

References

Details

Attachments

(1 file)

<nagios-mdc1> (IRC) Sat 09:05:23 UTC [9056] [Unknown] smtp1.mail.mdc1.mozilla.com:Uptime is WARNING: WARNING: Linux smtp1.mail.mdc1.mozilla.com 3.10.0-862.11.6.el7.x86_64 - up 3 minutes (http://m.mozilla.org/Uptime) <nagios-mdc1> (IRC) Sat 09:10:30 UTC [9059] [Unknown] smtp1.mail.mdc1.mozilla.com:nf_conntrack_count is UNKNOWN: WARNING: iptables module not loaded? (http://m.mozilla.org/nf_conntrack_count) <nagios-mdc1> Sat 09:15:01 UTC [9063] [Unknown] smtp1.mail.mdc1.mozilla.com:clamd Procs is CRITICAL: PROCS CRITICAL: 0 processes with args 'clamd', UID = 995 (clamav) (http://m.mozilla.org/clamd+Procs) [lhirlimann@smtp1.mail.mdc1 ~]$ sudo systemctl status clamd@amavisd.serviceclamd@amavisd.service - clamd scanner (amavisd) daemon Loaded: loaded (/usr/lib/systemd/system/clamd@.service; static; vendor preset: disabled) Active: active (running) since Sat 2018-08-18 09:01:55 UTC; 15min ago Process: 756 ExecStart=/usr/sbin/clamd -c /etc/clamd.d/%i.conf (code=exited, status=0/SUCCESS) Main PID: 3134 (clamd) CGroup: /system.slice/system-clamd.slice/clamd@amavisd.service └─3134 /usr/sbin/clamd -c /etc/clamd.d/amavisd.conf Aug 18 09:01:53 smtp1.mail.mdc1.mozilla.com clamd[3134]: Mail files support e... Aug 18 09:01:53 smtp1.mail.mdc1.mozilla.com clamd[3134]: OLE2 support enabled. Aug 18 09:01:53 smtp1.mail.mdc1.mozilla.com clamd[3134]: PDF support enabled. Aug 18 09:01:53 smtp1.mail.mdc1.mozilla.com clamd[3134]: SWF support enabled. Aug 18 09:01:53 smtp1.mail.mdc1.mozilla.com clamd[3134]: HTML support enabled. Aug 18 09:01:53 smtp1.mail.mdc1.mozilla.com clamd[3134]: XMLDOCS support enab... Aug 18 09:01:53 smtp1.mail.mdc1.mozilla.com clamd[3134]: HWP3 support enabled. Aug 18 09:01:53 smtp1.mail.mdc1.mozilla.com clamd[3134]: Self checking every ... Aug 18 09:01:55 smtp1.mail.mdc1.mozilla.com systemd[1]: Started clamd scanner... Aug 18 09:11:53 smtp1.mail.mdc1.mozilla.com clamd[3134]: SelfCheck: Database ... Hint: Some lines were ellipsized, use -l to show in full. [lhirlimann@smtp1.mail.mdc1 ~]$ ps aux |grep cla root 278 0.0 0.0 0 0 ? S< 09:01 0:00 [xfs-reclaim/sda] root 430 0.0 0.0 0 0 ? S< 09:01 0:00 [xfs-reclaim/sda] amavis 3134 0.0 13.6 717300 530740 ? Ssl 09:01 0:00 /usr/sbin/clamd -c /etc/clamd.d/amavisd.conf lhirlim+ 10708 0.0 0.0 112704 976 pts/0 R+ 09:18 0:00 grep --color=auto cla [lhirlimann@smtp1.mail.mdc1 ~]$ sudo service clamd@clamd status Redirecting to /bin/systemctl status clamd@clamd.serviceclamd@clamd.service - clamd scanner (clamd) daemon Loaded: loaded (/usr/lib/systemd/system/clamd@.service; static; vendor preset: disabled) Active: inactive (dead) [lhirlimann@smtp1.mail.mdc1 ~]$ sudo service clamd@clamd start Redirecting to /bin/systemctl start clamd@clamd.service Job for clamd@clamd.service failed because the control process exited with error code. See "systemctl status clamd@clamd.service" and "journalctl -xe" for details. [lhirlimann@smtp1.mail.mdc1 ~]$ sudo systemctl status clamd@clamd ● clamd@clamd.service - clamd scanner (clamd) daemon Loaded: loaded (/usr/lib/systemd/system/clamd@.service; static; vendor preset: disabled) Active: activating (start) since Sat 2018-08-18 09:21:50 UTC; 197ms ago Control: 10951 (clamd) CGroup: /system.slice/system-clamd.slice/clamd@clamd.service └─10951 /usr/sbin/clamd -c /etc/clamd.d/clamd.conf Aug 18 09:21:50 smtp1.mail.mdc1.mozilla.com systemd[1]: clamd@clamd.service holdoff time over, scheduling restart. Aug 18 09:21:50 smtp1.mail.mdc1.mozilla.com systemd[1]: Starting clamd scanner (clamd) daemon... Aug 18 09:21:50 smtp1.mail.mdc1.mozilla.com clamd[10951]: Received 0 file descriptor(s) from systemd. Aug 18 09:21:50 smtp1.mail.mdc1.mozilla.com clamd[10951]: clamd daemon 0.99.4 (OS: linux-gnu, ARCH: x86_64, CPU: x86_64) Aug 18 09:21:50 smtp1.mail.mdc1.mozilla.com clamd[10951]: Running as user clamav (UID 995, GID 992) Aug 18 09:21:50 smtp1.mail.mdc1.mozilla.com clamd[10951]: Log file size limited to 4294967295 bytes. Aug 18 09:21:50 smtp1.mail.mdc1.mozilla.com clamd[10951]: Reading databases from /var/clamav Aug 18 09:21:50 smtp1.mail.mdc1.mozilla.com clamd[10951]: Not loading PUA signatures. Aug 18 09:21:50 smtp1.mail.mdc1.mozilla.com clamd[10951]: Bytecode: Security mode set to "TrustSigned". Hint: Some lines were ellipsized, use -l to show in full.
<nagios-mdc1> Usul: rechecking all services on host smtp1.mail.mdc1.mozilla.com <nagios-mdc1> Sat 09:22:20 UTC [9067] [Unknown] smtp1.mail.mdc1.mozilla.com:clamd Procs is OK: PROCS OK: 1 process with args 'clamd', UID = 995 (clamav) (http://m.mozilla.org/clamd+Procs) nagios-mdc1> (IRC) Sat 09:35:39 UTC [9070] [Unknown] smtp1.private.mdc1.mozilla.com:clamd Procs is CRITICAL: PROCS CRITICAL: 0 processes with args 'clamd', UID = 995 (clamav) (http://m.mozilla.org/clamd+Procs) ran puppet. [root@smtp1.mail.mdc1 clamav]# systemctl restart clamd@amavisd.service [root@smtp1.mail.mdc1 clamav]# systemctl start clamd@clamd.service fixed it. Sat Aug 18 09:39:40 2018 -> ERROR: Can't unlink the socket file /var/run/clamav/clamd.sock Sat Aug 18 09:39:57 2018 -> ERROR: LOCAL: Socket file /var/run/clamav/clamd.sock could not be bound: No such file or directory Sat Aug 18 09:39:57 2018 -> ERROR: Can't unlink the socket file /var/run/clamav/clamd.sock ERROR: NotifyClamd: Can't find or parse configuration file /etc/clamd.conf ERROR: NotifyClamd: Can't find or parse configuration file /etc/clamd.conf ERROR: NotifyClamd: Can't find or parse configuration file /etc/clamd.conf Maybe the reboot caused an issue. Let see how things evolve.
Summary: Clamd didn't restart properly after a reboot → Clamd didn't restart properly after a reboot on smtp1
(IRC) Fri 09:04:56 UTC [9629] [Unknown] smtp1.mail.mdc2.mozilla.com:Uptime is WARNING: WARNING: Linux smtp1.mail.mdc2.mozilla.com 3.10.0-862.11.6.el7.x86_64 - up 3 minutes (http://m.mozilla.org/Uptime) Fri 09:07:05 UTC [9630] [Unknown] smtp1.mail.mdc2.mozilla.com:clamd Procs is CRITICAL: PROCS CRITICAL: 0 processes with args 'clamd', UID = 994 (clamav) (http://m.mozilla.org/clamd+Procs) Fri 09:11:39 UTC [9634] [Unknown] smtp1.mail.mdc2.mozilla.com:Swap is CRITICAL: SWAP CRITICAL - 1% free (0 MB out of 2046 MB) (http://m.mozilla.org/Swap) (IRC) Fri 09:15:05 UTC [9636] [Unknown] smtp1.mail.mdc2.mozilla.com:Out of memory - killed process is WARNING: Log errors: Aug 24 09:05:22 smtp1.mail.mdc2.mozilla.com kernel: [ 264.073540] Out of memory: Kill process 3060 (clamd) score 88 or sacrifice child (http://m.mozilla.org/Out+of+memory+-+killed+process) Processes eating up the memory are clamd. seeing lines like Thu Apr 26 20:07:39 2018 -> ERROR: Can't open file or directory in the clamd.log. seeing ERROR: NotifyClamd: Can't find or parse configuration file /etc/clamd.conf in freshclam.log. swap recovered when I stopped the services. After restart journald had the following Aug 24 09:31:40 smtp1.mail.mdc2.mozilla.com amavis[2257]: (02257-03) Checking: NMceFIMotLky MYNETS [10.50.77.18] <root@genericadm.private.phx1.mozilla.com> -> <infra-shared@infra.m Aug 24 09:31:44 smtp1.mail.mdc2.mozilla.com amavis[2257]: (02257-03) p001 1 Content-Type: text/plain, 7bit, size: 113, SHA1 digest: 1a9d595a54f4da5a8ec35aeb0262767c41ed3bc3 Aug 24 09:31:46 smtp1.mail.mdc2.mozilla.com amavis[2257]: (02257-03) (!)connect to /var/run/clamav/clamd.sock failed, attempt #1: Can't connect to a UNIX socket /var/run/clamav/cla Aug 24 09:31:46 smtp1.mail.mdc2.mozilla.com amavis[2257]: (02257-03) ClamAV-clamd: All attempts (1) failed connecting to /var/run/clamav/clamd.sock, retrying (1) Aug 24 09:31:47 smtp1.mail.mdc2.mozilla.com amavis[2257]: (02257-03) (!)connect to /var/run/clamav/clamd.sock failed, attempt #1: Can't connect to a UNIX socket /var/run/clamav/cla Aug 24 09:31:47 smtp1.mail.mdc2.mozilla.com amavis[2257]: (02257-03) (!)ClamAV-clamd: All attempts (1) failed connecting to /var/run/clamav/clamd.sock, retrying (2) Aug 24 09:31:49 smtp1.mail.mdc2.mozilla.com amavis[2273]: (02273-03) bounce unverifiable, originating, <root@syslog1.private.mdc1.mozilla.com> -> <infra-shared@infra.mozilla.com> Aug 24 09:31:49 smtp1.mail.mdc2.mozilla.com clamd[12093]: Loaded 6620892 signatures. Aug 24 09:31:49 smtp1.mail.mdc2.mozilla.com postfix/smtpd[12096]: connect from localhost4.localdomain[127.0.0.1] ~
from /var/log/maillog : Aug 24 09:48:18 smtp1.mail.mdc2.mozilla.com amavis[2254]: (02254-05) ESMTP :10024 /var/amavis/tmp/amavis-20180824T090325-02254-kj_NRIG6: <root@proxy2.dmz.phx1.mozilla.com> -> <infra-shared@infra.mozilla.com> SIZE=4958 Received: from smtp.mozilla.org ([127.0.0.1]) by localhost (smtp1.mail.mdc2.mozilla.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP for <infra-shared@infra.mozilla.com>; Fri, 24 Aug 2018 09:48:14 +0000 (UTC) Aug 24 09:48:18 smtp1.mail.mdc2.mozilla.com amavis[2241]: (02241-05) (!)ClamAV-clamscan av-scanner FAILED: /usr/bin/clamscan KILLED, signal 9 (0009) at (eval 131) line 951. Aug 24 09:48:19 smtp1.mail.mdc2.mozilla.com amavis[2241]: (02241-05) (!!)AV: ALL VIRUS SCANNERS FAILED
<nagios-mdc1> Sat 09:05:45 UTC [9188] [Unknown] smtp1.mail.mdc1.mozilla.com:clamd Procs is CRITICAL: PROCS CRITICAL: 0 processes with args 'clamd', UID = 995 (clamav) (http://m.mozilla.org/clamd+Procs) <nagios-mdc1> (IRC) Sat 09:06:09 UTC [9189] [Unknown] smtp1.mail.mdc1.mozilla.com:Uptime is WARNING: WARNING: Linux smtp1.mail.mdc1.mozilla.com 3.10.0-862.11.6.el7.x86_64 - up 5 minutes (http://m.mozilla.org/Uptime) This is the 3rd day in a row. Corey, can you look into this please?
Flags: needinfo?(cshields)
nagios-mdc2> Fri 09:12:00 UTC [9646] [Unknown] smtp1.mail.mdc2.mozilla.com:clamd Procs is CRITICAL: PROCS CRITICAL: 0 processes with args 'clamd', UID = 994 (clamav) (http://m.mozilla.org/clamd+Procs) <nagios-mdc2> Fri 09:12:01 UTC [9647] [Unknown] smtp1.mail.mdc2.mozilla.com:Swap is CRITICAL: SWAP CRITICAL - 9% free (167 MB out of 2046 MB) (http://m.mozilla.org/Swap) restarted the services.
Assignee: infra → cshields
Flags: needinfo?(cshields)
Working with cshields, bumped the CPU/RAM of the smtp1.mail.mdc2 to 2CPU, 8G RAM.
Today it happened on the other host : <nagios-mdc1> (IRC) Sat 09:36:33 UTC [9251] [Unknown] smtp1.private.mdc1.mozilla.com:Swap is WARNING: SWAP WARNING - 19% free (344 MB out of 1906 MB) (http://m.mozilla.org/Swap) <nagios-mdc1> (IRC) Sat 09:41:33 UTC [9256] [Unknown] smtp1.private.mdc1.mozilla.com:Swap is CRITICAL: SWAP CRITICAL - 1% free (4 MB out of 1906 MB) (http://m.mozilla.org/Swap) <nagios-mdc1> (IRC) Sat 09:42:45 UTC [9258] [Unknown] smtp1.private.mdc1.mozilla.com:Load is WARNING: WARNING - load average: 20.09, 20.95, 19.12 (http://m.mozilla.org/Load)
Error: /Stage[main]/Clamav/Package[clamav-server]/ensure: change from absent to present failed: Execution of '/bin/yum -d 0 -e 0 -y list clamav-server' returned 1: Error: No matching Packages to list Notice: /Stage[main]/Postfix::Mta/Ldap_users::Groupmember[user_amavis_to_group_amavis]/Exec[usermod -G clamav -a amavis]: Dependency Package[clamav-server] has failures: true Warning: /Stage[main]/Postfix::Mta/Ldap_users::Groupmember[user_amavis_to_group_amavis]/Exec[usermod -G clamav -a amavis]: Skipping because of failed dependencies [lhirlimann@smtp1.mail.mdc1 ~]$ rpm -qa |grep clamav clamav-data-0.100.1-1.el7.noarch clamav-update-0.100.1-1.el7.x86_64 clamav-server-systemd-0.100.1-1.el7.x86_64 clamav-filesystem-0.100.1-1.el7.noarch clamav-lib-0.100.1-1.el7.x86_64 clamav-0.100.1-1.el7.x86_64 [lhirlimann@smtp1.mail.mdc1 ~]$ looks like our puppet module needs an update : # TODO: This part here is ugly really should clean it up if ($::osfamily == 'RedHat') { if (versioncmp($::operatingsystemmajrelease, '6') == 0) { package { 'clamd': ensure => $package_ensure, name => 'clamd', } package { 'clamav-db': ensure => $package_ensure, name => 'clamav-db', } } elsif (versioncmp($::operatingsystemmajrelease, '7') == 0) { package { 'clamav-server': ensure => $package_ensure, } package { 'clamav-update': ensure => $package_ensure } package { 'clamav-server-systemd': ensure => $package_ensure 47,1 22% Sudo yum search clamav : ============================= N/S matched: clamav ============================== clamav-filesystem.noarch : Filesystem structure for clamav clamav-milter-systemd.x86_64 : Systemd initscripts for the clamav : sendmail-milter clamav-scanner-systemd.x86_64 : systemd initscripts for clamav scanner daemon clamav-server-systemd.x86_64 : Systemd initscripts for clamav server clamav-unofficial-sigs.noarch : Scripts to download unoffical clamav signatures clamav.x86_64 : End-user tools for the Clam Antivirus scanner clamav-data.noarch : Virus signature data for the Clam Antivirus scanner clamav-devel.x86_64 : Header files and libraries for the Clam Antivirus scanner clamav-lib.x86_64 : Dynamic libraries for the Clam Antivirus scanner clamav-milter.x86_64 : Milter module for the Clam Antivirus scanner clamav-update.x86_64 : Auto-updater for the Clam Antivirus scanner data-files according to https://pkgs.org/download/clamav-server it should be clamd
(update) worked on this some Friday, and fell into a rabbit hole of epel not being updated across our mrepo servers due to a 404 mirror site. We increased resources on this VM which should help when clam is running, however I don't have a confident fix to it running on reboot yet.
Attached patch Fix puppetSplinter Review
When one runs on the host it barfs at : Notice: /Stage[main]/Postfix::Mta/Ldap_users::Groupmember[user_amavis_to_group_amavis]/Exec[usermod -G clamav -a amavis]: Dependency Package[clamav-server] has failures: true Warning: /Stage[main]/Postfix::Mta/Ldap_users::Groupmember[user_amavis_to_group_amavis]/Exec[usermod -G clamav -a amavis]: Skipping because of failed dependencies
Attachment #9010891 - Flags: review?(cshields)
It also seems , but I can't prove it - that things work better once iptables are loaded.
(IRC) Fri 09:03:44 UTC [9732] [Unknown] smtp1.mail.mdc2.mozilla.com:Uptime is WARNING: WARNING: Linux smtp1.mail.mdc2.mozilla.com 3.10.0-862.14.4.el7.x86_64 - up 2 minutes (http://m.mozilla.org/Uptime) (In reply to Ludovic Hirlimann [:Usul] from comment #11) > It also seems , but I can't prove it - that things work better once iptables > are loaded. so stopping the services and running puppet fixes the issues.
Attachment #9010891 - Flags: review?(cshields) → review+
Comment on attachment 9010891 [details] [diff] [review] Fix puppet checked in 15db7cd8025833346616697fe20cd2e21fcc0a2e
Attachment #9010891 - Flags: checked-in+
fixing puppet fixed the issue. It didn't come back in a bad state since the puppet change.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: