It looks like the update_pushlog manage command stopped running sometime last Thursday. I think we can officially label this scheduled command as 'pesky'...

https://datazilla.mozilla.org/refdata/pushlog/list/?days_ago=5&branches=Mozilla-Inbound
https://datazilla.mozilla.org/refdata/pushlog/list?days_ago=1&branches=Mozilla-Inbound

This seems to occur every few weeks or so (see https://bugzilla.mozilla.org/show_bug.cgi?id=823690). I'm unable to reproduce this in my development environment, so I cannot tell whether it's a bug or an environment issue, but the command seems to hang on something regularly.

Are there any error messages in the cron log that give a clue as to why this command is hanging? If an old command is still running in the process list, could someone collect some output from 'strace' on it to determine what it might be hanging on in production?

Once the process list has been examined, go ahead and kill any old update_pushlog jobs still running, clear out any update_pushlog.lock files present in the run directory, and try running the following command from the command line:

python $DATAZILLA_HOME/manage.py update_pushlog --repo_host=hg.mozilla.org --hours=24

to confirm that it's not generating any errors.
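For the lock file step, a rough sketch of how a leftover update_pushlog.lock could be flagged before anyone removes it by hand. This assumes the lock is a plain file whose modification time reflects when the job started; that is a guess, not something confirmed about how update_pushlog manages its lock. The function name, the 24 hour threshold, and the `now` parameter are all made up for illustration.

```python
import os
import time


def lock_is_stale(lock_path, max_age_hours=24.0, now=None):
    """Return True if lock_path exists and is older than max_age_hours.

    Hypothetical helper: it assumes the lock file's mtime marks the start
    of the job that created it.
    """
    try:
        mtime = os.path.getmtime(lock_path)
    except OSError:
        return False  # no lock file present, so nothing is stale
    if now is None:
        now = time.time()
    return (now - mtime) > max_age_hours * 3600
```

Calling `lock_is_stale("/path/to/run/dir/update_pushlog.lock")` (the path here is a placeholder) would return True only for a lock that has been sitting around for more than a day. That is a hint that the owning process should be looked at before the file is deleted, not proof that the job is hung.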
looks like the cron job hung on Jan 10th. i have killed that process and removed the lock file. the next scheduled cron run kicked off update_pushlog and is currently running...

[email@example.com ~]# ps aux | grep [u]pdate_pushlog
root 5483 0.0 1.6 352424 32512 ? Ss Jan10 0:00 /usr/bin/python /data/datazilla/src/datazilla.mozilla.org/datazilla/manage.py update_pushlog --repo_host=hg.mozilla.org --hours=24
[firstname.lastname@example.org ~]# kill 5483
[email@example.com ~]# rm /root/update_pushlog.lock
rm: remove regular empty file `/root/update_pushlog.lock'? y
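If the process-list check gets scripted, something like the following could pull the PID and start date out of `ps aux` output in the format shown above. It is only a sketch: the function name is hypothetical, and it assumes the usual 11-column `ps aux` layout with the command line as the last column.

```python
def find_pushlog_processes(ps_output, needle="update_pushlog"):
    """Return (pid, start, command) tuples for `ps aux` lines mentioning needle."""
    matches = []
    for line in ps_output.splitlines():
        # `ps aux` prints 11 columns; limit the split so COMMAND stays whole.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skips the header row and malformed lines
        if needle in fields[10]:
            matches.append((int(fields[1]), fields[8], fields[10]))
    return matches
```

Fed the output above, this would report PID 5483 with a START of Jan10, which is how the hung job stands out from the freshly started one. Unlike the `grep [u]pdate_pushlog` trick, it does not filter out a `grep update_pushlog` process if one happens to be in the captured output.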
Assignee: server-ops-webops → cturra
Status: NEW → ASSIGNED
Whiteboard: [triaged 20130113][push interrupt]
Thanks for doing that. It looks like that fixed it again, but without more information I doubt I will be able to determine the cause of the problem. I will try a few more experiments in dev to see if I can reproduce it...

Please be sure to run strace on the running process before killing it in the future:

[firstname.lastname@example.org ~]# strace -p processid_of_command

so in this case it would have been:

[email@example.com ~]# strace -p 5483

The output of strace might provide some useful information about what the program is hanging on in production. Until next time...
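To make the "strace before kill" step harder to forget, here is a minimal sketch of a helper that attaches strace to a PID for a short while and saves the trace to a file. The function names, the 30 second window, and the choice of the extra flags (-f, -tt, -o) are assumptions for illustration; plain `strace -p <pid>` as shown above works just as well by hand. It needs Python 3.3+ for the `timeout` argument and enough privileges to attach to the process.

```python
import subprocess


def build_strace_command(pid, output_path):
    """Build an strace invocation that attaches to pid and writes to output_path."""
    # -p attaches to a running process, -f follows child processes/threads,
    # -tt prefixes each line with a timestamp, -o writes the trace to a file.
    return ["strace", "-f", "-tt", "-p", str(int(pid)), "-o", output_path]


def capture_strace(pid, output_path, seconds=30):
    """Attach strace for up to `seconds`, then detach. Returns output_path."""
    proc = subprocess.Popen(build_strace_command(pid, output_path))
    try:
        proc.wait(timeout=seconds)  # returns early if the traced process exits
    except subprocess.TimeoutExpired:
        proc.terminate()  # on SIGTERM strace detaches and leaves the process running
        proc.wait()
    return output_path
```

For the hung job in this bug that would have been something like `capture_strace(5483, "/tmp/update_pushlog.strace")` (the output path is a placeholder), run before `kill 5483`. A process that is truly hung will often show a single system call that never returns, which is the clue being asked for here.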
:jeads - noted! you're absolutely right, an strace would have been useful. i will keep that in mind if this happens again.
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations