Infrastructure & Operations
WebOps: Other
6 years ago
4 years ago


(Reporter: cshields, Assigned: jakem)





6 years ago
While investigating MDN pages that were not appearing, I noticed pm-dekiwiki03 was missing health checks in Zeus. It has 4800 hits in the log today, compared to 145k on each of 01 and 02. Zeus appears to have repeatedly put this node into the pool and pulled it back out.

I've kicked dekiwiki and Apache on this node, but given the holiday I've just disabled it in the pool for now until it can be looked at further next week.

Comment 1

6 years ago
I've undone this. All 3 nodes are back in the pool.

Worth noting that 03 was set to get 3x as much traffic as 01 and 02... it's a beefier box. I've reset this back to 1:1:1.
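
For a rough sense of what that weighting change means, here is a minimal sketch. It assumes the balancer splits requests in simple proportion to node weight; how Zeus actually applies weights is not stated here, and the function and variable names are hypothetical. The hit counts are the approximate figures from the original report.

```python
from fractions import Fraction

def expected_shares(weights):
    """Return each node's expected fraction of requests, assuming
    requests are split in simple proportion to the weights."""
    total = sum(weights.values())
    return {node: Fraction(w, total) for node, w in weights.items()}

# Hypothetical weight tables for the three pool nodes.
old_weights = {"01": 1, "02": 1, "03": 3}   # 03 weighted 3x
new_weights = {"01": 1, "02": 1, "03": 1}   # reset to 1:1:1

# Approximate log hit counts from the original report (145k is rounded).
observed_hits = {"01": 145_000, "02": 145_000, "03": 4_800}
observed_03 = observed_hits["03"] / sum(observed_hits.values())

for label, weights in (("3x on 03", old_weights), ("1:1:1", new_weights)):
    shares = expected_shares(weights)
    print(label, {node: f"{float(s):.0%}" for node, s in shares.items()})

print(f"observed share on 03: {observed_03:.1%}")
```

Under that assumption, 03 would have been expected to take 60% of requests at the old weighting and about a third at 1:1:1, while the logged hits put it under 2% for the day.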

The failures we've been seeing are primarily along the lines of "MDN cannot talk to Bugzilla". I suspect these types of queries were piling up on 03 and causing lots of failures. The problematic template has been temporarily altered so that it does not actually call Bugzilla until the underlying problem can be rectified.

I don't know who set it up or when, but Zeus health checks are turned on for this pool now.

I'm going to close this out. I'm 99.9% sure it's heavily related to the other bugs we have on this (Bug 712237 being the main one).
Last Resolved: 6 years ago
Resolution: --- → FIXED
Component: Server Operations: Web Operations → WebOps: Other
Product: → Infrastructure & Operations