Closed Bug 1049885 Opened 11 years ago Closed 11 years ago

Air Mozilla Chief logs Internal Server Error

Categories

(Infrastructure & Operations :: IT-Managed Tools, task)

task
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: peterbe, Assigned: cturra)

Details

(Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/697] )

I'm unable to make a stage push because I'm getting an Internal Server Error. See http://genericadm.private.phx1.mozilla.com/chief/air.stage/history
looks like redis had crashed on this chief node :( i have restarted the redis process and everything should be back. i will mark this bug as r/fixed, but i am going to look into adding this scenario to my redis process checking script/cron.
Assignee: server-ops-webops → cturra
Status: NEW → RESOLVED
Closed: 11 years ago
OS: Mac OS X → All
Hardware: x86 → All
Resolution: --- → FIXED
Yay! I was just able to upgrade stage. Running prod upgrade now.
just a quick follow up, i've added the following to the redis check script to explicitly be sure redis is actually alive (we're generally more concerned about it misbehaving and didn't have an `are you actually alive` check).

$ svn diff
Index: modules/webapp/templates/admin/data-bin/redis-memory-check.sh.erb
===================================================================
--- modules/webapp/templates/admin/data-bin/redis-memory-check.sh.erb (revision 91617)
+++ modules/webapp/templates/admin/data-bin/redis-memory-check.sh.erb (working copy)
@@ -14,6 +14,19 @@
   $REDIS_SERVICE_NAME start > /dev/null
 }
 
+function start {
+  $REDIS_SERVICE_NAME start > /dev/null
+}
+
+
+# first, check if redis is running.
+PORT_CHECK=`nc -z localhost 6388 > /dev/null; echo $?`
+if [ $PORT_CHECK -gt 0 ]; then
+  start
+  # since we've now started redis, we can exit the script
+  exit 0
+fi
+
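For anyone reading along, the liveness check in the diff above can be sketched as a standalone function. This is only a minimal sketch and not the script that was committed: the name `ensure_running` and the idea of passing the probe and start commands in as arguments are assumptions made here, so the logic can be exercised without a real redis instance.

```shell
#!/bin/sh
# Sketch only: ensure_running is a hypothetical helper, not part of the
# committed redis-memory-check.sh.erb.
#
# $1: command that exits 0 when the service is reachable
# $2: command that starts the service
ensure_running() {
    probe_cmd=$1
    start_cmd=$2

    # Test the probe's exit status directly instead of capturing $? in a variable.
    if $probe_cmd > /dev/null 2>&1; then
        echo "alive"
        return 0
    fi

    # Probe failed: try to start the service and report what happened.
    if $start_cmd > /dev/null 2>&1; then
        echo "started"
        return 0
    fi

    echo "start-failed"
    return 1
}
```

With the values from the diff, a call might look like `ensure_running "nc -z localhost 6388" "$REDIS_SERVICE_NAME start"`. Reporting a failed start separately is a design choice of this sketch; the committed check exits 0 after calling `start` regardless of the result.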
Status: RESOLVED → VERIFIED
looks like the code i committed to this redis check was stuck trying to start the redis process while another was dead but not killed in the background. i have added some logic that should address this. in the meantime, chief is back.

$ curl -I http://genericadm.private.phx1.mozilla.com/chief/air.dev/history
HTTP/1.1 200 OK
Date: Thu, 07 Aug 2014 18:29:01 GMT
Server: gunicorn/0.14.5
Content-Type: text/html; charset=utf-8
Content-Length: 1621
Connection: close
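The logic that was added for the stale-process case is not shown in this bug, so the following is only a guess at one way to handle it. It assumes the service leaves a pid file behind; `clear_stale_process` and the pid-file approach are hypothetical and may differ from what was actually committed.

```shell
#!/bin/sh
# Sketch only: clear_stale_process and the pid-file approach are assumptions,
# not the logic that was actually added to the check script.
#
# $1: path to a pid file left behind by the service
clear_stale_process() {
    pid_file=$1

    # No pid file means there is nothing to clean up.
    [ -f "$pid_file" ] || return 0

    pid=$(cat "$pid_file")

    # kill -0 sends no signal; it only reports whether the pid can be signalled.
    if [ -n "$pid" ] && kill -0 "$pid" 2> /dev/null; then
        # Ask the process to exit, then force it if it is still around.
        kill "$pid" 2> /dev/null || true
        sleep 1
        if kill -0 "$pid" 2> /dev/null; then
            kill -9 "$pid" 2> /dev/null || true
        fi
    fi

    # Remove the pid file so the next start is not confused by it.
    rm -f "$pid_file"
    return 0
}
```

In a check script, this would run after the port probe fails and before the service is started again, with the pid file path supplied by whoever knows where the service writes it.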