Closed
Bug 1125159
Opened 10 years ago
Closed 10 years ago
zlb[38] complaining about nf_conntrack
Categories
(Infrastructure & Operations Graveyard :: WebOps: Other, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: Usul, Assigned: gozer)
References
Details
(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/337] )
Fri 06:22:15 PST [1410] zlb8.ops.phx1.mozilla.com:nf_conntrack_count is WARNING: WARN: nf_conntrack table size 948496/1048576, 90% (http://m.mozilla.org/nf_conntrack_count)
<Usul> gozer, it could be a fallout of bug 1119899
<gozer> Yes, too many people out sick here for a while now....
<gozer> Usul: yeah, could very well be
<Usul> anything we can do before we run out of conntracks
<gozer> Usul: yeah, working on it
<linda> :)
<gozer> Usul: the issue is that we are counting wrong, so we are not really running out atm
<Usul> gozer <3
<gozer> Usul: nothing much I can do here, unfortunately
<gozer> not sure the hotfix is related, shouldn’t be touching aus4
<Usul> :(
<gozer> I suspect AUS4 way more, as this puppy went live relatively recently
<gozer> Usul: nothing I can do, but doesn’t mean there is a problem ATM, like I said, that check is somewhat misleading
<gozer> it counts used slots in the conntrack tables vs. maximum
<gozer> but lots of these entries are effectively empty placeholders, waiting to be replaced, an optimization of sorts
<gozer> Usul: 609484 such spare entries in there
<Usul> ok
<Usul> so I'm going to ack the alert
<gozer> Usul: yes, for now, nothing more can be done, unfortunately
<Usul> kk
<gozer> Usul: but I suggest bringing someone from releng in on this, bhearsum, actually, since he’s the AUS4 dude
<Usul> of course he's not around now :(
<gozer> Usul: like I said, it’s not a serious problem ATM, remember, there are an extra ~600,000 spare entries in that table
<gozer> Usul: I’ll follow up on that later today too, cc me on the zlb bug please
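For reference, a quick way to eyeball the numbers this check is comparing, assuming the standard Linux procfs paths (the "spare"/placeholder entries gozer describes aren't broken out separately here):
# cat /proc/sys/net/netfilter/nf_conntrack_count   # entries currently in the table
# cat /proc/sys/net/netfilter/nf_conntrack_max     # ceiling the 90% warning is measured against
# conntrack -C                                     # same count via conntrack-tools, if installed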
Comment 1•10 years ago
Thanks for the cc. I don't really understand this bug or what (if anything) you'd like from me. Please let me know if we should talk about this or I need to do anything!
Comment 2•10 years ago
<nagios-phx1:#sysadmins> Sun 05:26:06 PST [1035] zlb8.ops.phx1.mozilla.com:nf_conntrack_count is WARNING: WARN: nf_conntrack table size 945525/1048576, 90% (http://m.mozilla.org/nf_conntrack_count)
This alerted again; AUS4 is indeed taking a lot of connections at the moment.
I've increased nf_conntrack_max and decreased nf_conntrack_tcp_timeout_established to stop it alerting until a better solution can be worked out.
# echo 2097152 > /proc/sys/net/netfilter/nf_conntrack_max
# sysctl net.netfilter.nf_conntrack_tcp_timeout_established=43200
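These were applied live and won't survive a reboot; if the workaround needs to persist, the usual approach would be the equivalent sysctl.conf entries (a sketch only, not something applied here):
# cat >> /etc/sysctl.conf <<'EOF'
net.netfilter.nf_conntrack_max = 2097152
net.netfilter.nf_conntrack_tcp_timeout_established = 43200
EOF
# sysctl -p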
Comment 4•10 years ago
zlb3 alerted today:
nagios-scl3 Mon 05:52:01 PST [5164] zlb3.ops.scl3.mozilla.com:nf_conntrack_count is WARNING: WARN: nf_conntrack table size 983694/1048576, 93%
Comment 5•10 years ago
Do we need to bump it up to the levels that aus3 is at, maybe? It's serving pretty much all of the traffic that aus3 used to.
Comment 6•10 years ago
As far as I can see, AUS4 is being served from the same ZLBs as AUS3. These are machine-level settings that haven't changed for either until now, so I'm unclear what you mean by bumping it up to other levels?
Comment 7•10 years ago
(In reply to Peter Radcliffe [:pir] from comment #6)
> As far as I can see AUS4 is being served from the same ZLBs as AUS3. These
> are machine-level settings that haven't changed for either until now so I'm
> unclear what you mean by bumping it up to other levels?
Based on your previous comment I assumed that nf_conntrack_max was something set per domain (e.g., aus3 and aus4 could have different settings for it). Looks like I'm wrong, sorry for the fly-by!
Comment 8•10 years ago
I have also applied the same workaround pir used on zlb8 to zlb3:
# echo 2097152 > /proc/sys/net/netfilter/nf_conntrack_max
# sysctl net.netfilter.nf_conntrack_tcp_timeout_established=43200
Comment 9•10 years ago
Might this be related to the aus3.m.o --> aus4.m.o redirects? There's also aus2.m.o --> aus3.m.o, but that's lower traffic and has been there for a long time.
Comment 10•10 years ago
This is also being reported in bug 1069798.
Put back the workaround from above on zlb8. Situation doesn't seem to have changed.
Comment 11•10 years ago
And the workaround is back on zlb3 as well.
Updated•10 years ago
Summary: zlb8 complaining about nf_conntrack → zlb[38] complaining about nf_conntrack
Comment 12•10 years ago
<digi:#systems> if we don't have any stateful rules we should disable conntrack
<digi:#systems> /etc/modprobe.d/blacklist.conf
<digi:#systems> iptables -L will tell you
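A rough sketch of what digi is suggesting, assuming the usual module names; blacklisting only stops automatic loading, so anything that explicitly needs conntrack (NAT rules, for example) would still pull it in, and the modules only actually go away after a reboot or rmmod:
# iptables -L -n | grep -i state   # confirm nothing matches on connection state first
# echo "blacklist nf_conntrack" >> /etc/modprobe.d/blacklist.conf
# echo "blacklist nf_conntrack_ipv4" >> /etc/modprobe.d/blacklist.conf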
Comment 13•10 years ago
We've altered how AUS3/4 behave since this last occurred, and those changes have kept the conntrack issue from being a problem.
:pir, can we downgrade the conntrack check to IRC only? It's very helpful *if* there's an incident, but otherwise it's not the sort of thing that should page-and-escalate on its own.
Flags: needinfo?(pradcliffe+bugzilla)
Comment 14•10 years ago
Should be done.
pir@wedge> svn diff
Index: puppet/trunk/modules/nagios/manifests/mozilla/services.pp
===================================================================
--- puppet/trunk/modules/nagios/manifests/mozilla/services.pp (revision 103408)
+++ puppet/trunk/modules/nagios/manifests/mozilla/services.pp (working copy)
@@ -5068,6 +5068,7 @@
service_description => "nf_conntrack_count",
check_command => 'check_iptables',
normal_check_interval => 30,
+ contact_groups => 'sysalertsonly',
hostgroups => $::fqdn ? {
'nagios1.private.scl3.mozilla.com' => [
'external-zeus'
pir@wedge> svn ci -m "make nf_conntrack_count an IRC only alert, bug 1125159"
Sending puppet/trunk/modules/nagios/manifests/mozilla/services.pp
Transmitting file data .
Committed revision 103409.
Flags: needinfo?(pradcliffe+bugzilla)
Comment 16•10 years ago
Thanks for all the time and effort on this folks!
Updated•6 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard