Closed Bug 1203528 Opened 9 years ago Closed 9 years ago

Investigate MDN down-time incident 2015-09-09

Categories

(developer.mozilla.org Graveyard :: General, defect)

Type: defect
Priority: Not set
Severity: major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: groovecoder, Assigned: groovecoder)

Details

(Keywords: in-triage)

:cyliang - you mentioned getting the Apache core dumps. Can you pastebin them and link here, or maybe attach them here?
Flags: needinfo?(cliang)
Group: mozilla-employee-confidential
Flags: needinfo?(cliang)
The core dumps are on developeradm.private.scl3, in /tmp/cores.  It turns out that they are too big to attach to the bug, even after being compressed.
Group: mozilla-employee-confidential
Assignee: nobody → lcrouch
Severity: normal → major
Keywords: in-triage
Dang, I missed them. :(

[lcrouch@developeradm.private.scl3 ~]$ ls -l /tmp/cores/
total 0

In any case, this incident and a couple of others were immediately preceded by a spike in $compare transactions. We previously made some changes to mitigate the risk of $compare transactions overwhelming us [1][2], but on Monday I made a more major change to prevent the $compare transaction from executing its CPU-intensive operations at all during an HTTP request. [3] It may just shift the load from the web processes to the celery processes, but at least it will prevent these runaway $compare operations from overwhelming our web node CPUs.

So, I'm going to call this incident investigated and resolved, knowing that we still haven't quite cleaned up everything.

[1] https://github.com/mozilla/kuma/pull/3386
[2] https://github.com/mozilla/kuma/pull/3447
[3] https://github.com/mozilla/kuma/pull/3497
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Product: developer.mozilla.org → developer.mozilla.org Graveyard