Something is hosed on mozillians-dev.allizom.org. Poking around...
generic1 /var/log/messages:
Oct 4 06:10:11 node202 abrt: saved core dump of pid 23317 to /tmp/core.23317 (13541376 bytes)
Oct 4 06:10:11 node202 abrt: saved core dump of pid 23338 to /tmp/core.23338 (13541376 bytes)
Oct 4 06:10:12 node202 abrt: saved core dump of pid 23318 to /tmp/core.23318 (13541376 bytes)
Oct 4 06:10:12 node202 abrt: saved core dump of pid 23315 to /tmp/core.23315 (13541376 bytes)
Probably unrelated...
generic2:
No recent Apache error logs, so I don't think Apache is up.
Apache cycled on -dev. -dev should be back up. Need to investigate why this occurred.
FWIW this affected basket-dev.allizom.org as well (Bug 684363 comment 35).
Do we have basic monitoring on this?
(In reply to Stephen Donner [:stephend] from comment #4)
> Do we have basic monitoring on this?
Currently there are no HTTP health checks for these dev hosts. I believe this is caused by the graceful restarts on the host during the update process which was requested for mozillians. After several days of restarts, Apache segfaults and the parent process gets stuck in a strange state. This issue is not seen on the stage environment, where graceful restarts do not occur. I have updated the script on -dev to reload the mozillians WSGI daemon using the method described here: http://code.google.com/p/modwsgi/wiki/ReloadingSourceCode#Reloading_In_Daemon_Mode
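For anyone following along, the daemon-mode reload described on that wiki page comes down to updating the modification time of the WSGI script file: mod_wsgi notices the change on the next request and restarts the daemon processes, so Apache itself is not restarted. The snippet below is only a rough sketch of that idea, not the actual update script on -dev; the function name and the example path are hypothetical.

```python
import os


def touch_wsgi_script(path):
    """Bump the mtime of a WSGI script file.

    In mod_wsgi daemon mode, a changed mtime on the script file makes
    the daemon processes restart on the next request, without a
    graceful restart of Apache.
    """
    # Passing None sets both access and modification time to "now".
    os.utime(path, None)


# Hypothetical usage at the end of an update script
# (the path is an assumption, not the real one on -dev):
#   touch_wsgi_script("/path/to/mozillians/wsgi/playdoh.wsgi")
```

This only works if the application runs in daemon mode (WSGIDaemonProcess); in embedded mode a script change is not enough to reload all of the code.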
Verified FIXED: [15:35:51.035] GET https://mozillians-dev.allizom.org/en-US/ [HTTP/1.1 200 OK 53ms]
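Since the dev hosts currently have no HTTP health checks, the manual GET above could be automated along these lines. This is a minimal sketch under my own assumptions, not an existing monitoring check: the function name, the timeout, and the "200 means healthy" rule are all illustrative.

```python
import urllib.error
import urllib.request


def is_healthy(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return True if a GET on `url` ends in HTTP 200, else False.

    `opener` is injectable so the check can be exercised without
    network access; by default it is urllib.request.urlopen, which
    follows redirects and raises HTTPError for 4xx/5xx responses.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error
        # status all count as "not healthy".
        return False


# Hypothetical usage from a cron job or monitoring wrapper:
#   is_healthy("https://mozillians-dev.allizom.org/en-US/")
```

A check like this would have flagged the outage as soon as Apache stopped answering, instead of waiting for someone to notice the site was down.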