Closed Bug 1012281 Opened 11 years ago Closed 11 years ago

puppet foreman plugin OOM'ing httpd in scl3, corp.phx1

Categories

(Infrastructure & Operations :: Infrastructure: Puppet, task)

Other
Other
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nagiosapi, Assigned: Atoll)

Details

(Whiteboard: [id=nagios1.private.scl3.mozilla.com:358911])

Automated alert report from nagios1.private.scl3.mozilla.com:
Hostname: puppetmaster2.private.scl3.mozilla.com
Service: Puppetmaster backend httpd
State: CRITICAL
Output: CRITICAL - Socket timeout after 10 seconds
Runbook: http://m.allizom.org/Puppetmaster+backend+httpd
Assignee: nobody → infra
Component: Server Operations: MOC → Infrastructure: Puppet
Product: mozilla.org → Infrastructure & Operations
QA Contact: jdow
fixed itself.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
:jakem added a firewall rule that permits the puppet servers to contact the third Zeus. It's unclear why this weekend's upgrade work exposed the missing rule, but we also discovered that the foreman reporting plugin handles network timeouts *very badly*, gradually OOM'ing httpd. Reopening until we re-enable foreman tomorrow.
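For illustration only: a minimal Python sketch of one general way to stop a report upload from blocking a worker indefinitely when the endpoint is unreachable. This is not the foreman plugin's code (which is not shown in this bug); `send_report`, the `opener` parameter, the 5-second default, and the example URL are all hypothetical names and values chosen for the sketch.

```python
import json
import socket
import urllib.error
import urllib.request


def send_report(url, report, timeout=5.0, opener=urllib.request.urlopen):
    """POST a report as JSON; return True on a 2xx reply, False otherwise.

    The explicit timeout bounds how long a worker can stay blocked on an
    unreachable endpoint (assumed default: 5 seconds).
    """
    body = json.dumps(report).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, socket.timeout, OSError):
        # Give up on this report rather than retrying or buffering it
        # without limit; the caller decides what to do with a False result.
        return False
```

A caller might use it as `send_report("http://foreman.example/reports", {"host": "puppetmaster2"})` and log a warning on `False`. The `opener` argument exists only so the network call can be swapped out in tests.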
Assignee: infra → rsoderberg
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Summary: Puppetmaster backend httpd on puppetmaster2.private.scl3.mozilla.com is CRITICAL: CRITICAL - Socket timeout after 10 seconds → puppet foreman plugin OOM'ing httpd in scl3, corp.phx1
:jakem resolved the sticky routing issues on the Zeus, so in theory we should be good to go.
Is this still an issue?
Nope.
Status: REOPENED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED