Bug 830440
Opened 12 years ago
Closed 12 years ago
Datazilla update_pushlog Cron Job Not Running
Categories
(Infrastructure & Operations Graveyard :: WebOps: Other, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jeads, Assigned: cturra)
Details
(Whiteboard: [triaged 20130113][push interrupt])
It looks like the update_pushlog manage command stopped running sometime last Thursday. I think we can officially label this scheduled command as 'pesky'...
https://datazilla.mozilla.org/refdata/pushlog/list/?days_ago=5&branches=Mozilla-Inbound
https://datazilla.mozilla.org/refdata/pushlog/list?days_ago=1&branches=Mozilla-Inbound
This seems to occur every few weeks or so (see https://bugzilla.mozilla.org/show_bug.cgi?id=823690). I'm unable to reproduce it in my development environment, so I cannot tell whether it's a bug or an environment issue, but the command seems to hang on something regularly.
Are there any error messages in the cron log to give a clue why this command seems to be hanging?
If there is an old command still running in the process list, could someone collect some output from 'strace' on it to determine what it might be hanging on in production?
Once the process list has been examined, go ahead and kill any old update_pushlog jobs still running, clear out any update_pushlog.lock files present in the run directory, and try running the following command from the command line:
python $DATAZILLA_HOME/manage.py update_pushlog --repo_host=hg.mozilla.org --hours=24
to confirm that it's not generating any errors.
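As a rough aid for the lock file check above, here is a minimal Python sketch that reports whether an update_pushlog.lock file looks stale based on its modification time. The helper names, the DATAZILLA_RUN_DIR environment variable, and the two-hour threshold are all assumptions made for illustration; none of them come from Datazilla itself.

```python
import os
import time


def lock_age_seconds(lock_path, now=None):
    """Return the lock file's age in seconds, or None if it does not exist."""
    try:
        mtime = os.stat(lock_path).st_mtime
    except FileNotFoundError:
        return None
    current = time.time() if now is None else now
    return max(0.0, current - mtime)


def is_stale(lock_path, max_age_seconds, now=None):
    """Return True when the lock exists and is older than max_age_seconds."""
    age = lock_age_seconds(lock_path, now=now)
    return age is not None and age > max_age_seconds


if __name__ == "__main__":
    # Hypothetical settings: the run directory variable and the threshold
    # are assumptions, not values taken from the production host.
    run_dir = os.environ.get("DATAZILLA_RUN_DIR", ".")
    path = os.path.join(run_dir, "update_pushlog.lock")
    if is_stale(path, max_age_seconds=2 * 60 * 60):
        print("stale lock: %s" % path)
```

A lock that is much older than the cron interval suggests the process holding it has hung, which is the point at which the process list is worth examining before anything is killed.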
Comment 1 • Assignee • 12 years ago
looks like the cron job hung on jan 10th. i have killed that process and removed the lock file. the next scheduled cron run kicked off update_pushlog and is currently running...
[root@datazillaadm.private.scl3 ~]# ps aux | grep [u]pdate_pushlog
root 5483 0.0 1.6 352424 32512 ? Ss Jan10 0:00 /usr/bin/python /data/datazilla/src/datazilla.mozilla.org/datazilla/manage.py update_pushlog --repo_host=hg.mozilla.org --hours=24
[root@datazillaadm.private.scl3 ~]# kill 5483
[root@datazillaadm.private.scl3 ~]# rm /root/update_pushlog.lock
rm: remove regular empty file `/root/update_pushlog.lock'? y
Assignee: server-ops-webops → cturra
Status: NEW → ASSIGNED
Whiteboard: [triaged 20130113][push interrupt]
Comment 2 • Reporter • 12 years ago
Thanks for doing that. It looks like that fixed it again, but without more information I doubt I will be able to determine the cause of the problem. I will try a few more experiments in dev to see if I can reproduce it...
In the future, please be sure to run strace on the running process before killing it.
[root@datazillaadm.private.scl3 ~]# strace -p processid_of_command
so in this case it would have been:
[root@datazillaadm.private.scl3 ~]# strace -p 5483
The output of strace might provide some useful information regarding what the program is hanging on in production. Until next time...
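If it helps next time, here is a small Python sketch of how the strace capture could be scripted so the output lands in a file before the process is killed. The function names, the output path, and the 30-second window are assumptions for illustration; the strace options used (-p, -f, -tt, -o) are standard ones.

```python
import signal
import subprocess


def build_strace_command(pid, output_path):
    """Build an strace invocation that attaches to an already running process."""
    pid = int(pid)
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    # -p attaches to the PID, -f follows threads and child processes,
    # -tt prefixes each line with a timestamp, -o writes the trace to a file.
    return ["strace", "-f", "-tt", "-p", str(pid), "-o", output_path]


def capture_strace(pid, output_path, seconds=30):
    """Trace the process for a limited time, then detach and return the file path."""
    proc = subprocess.Popen(build_strace_command(pid, output_path))
    try:
        proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        # SIGINT is the equivalent of Ctrl-C, which makes strace detach
        # and leaves the traced process running.
        proc.send_signal(signal.SIGINT)
        proc.wait()
    return output_path


if __name__ == "__main__":
    # Hypothetical usage with the PID from the process list above;
    # the output path is an assumption. Requires root, like strace itself here.
    print(capture_strace(5483, "/tmp/update_pushlog.strace"))
```

If the process is blocked in a single system call, the last line of the captured file should show what it is waiting on, such as a read on a socket or a lock.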
Comment 3 • Assignee • 12 years ago
:jeads - noted! you're absolutely right, an strace would have been useful. i will keep that in mind if this happens again.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated • 11 years ago
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Updated • 6 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard