Closed
Bug 1124783
Opened 11 years ago
Closed 9 years ago
Careers-prod syncjobvite not running?
Categories
(Infrastructure & Operations Graveyard :: WebOps: Engagement, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: osmose, Assigned: cliang)
References
Details
(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/2073] )
Attachments
(1 file: 2.73 KB, text/x-log)
In bug 1119413, comment 5 we got a report that, after we updated the frequency of the syncjobvite cron job in bug 1119536, the jobs weren't actually being synced. I manually ran the sync command and it updated properly.
Can we confirm that the cronjob is running, and if so, is it possible to see the output from the last day or so? Thanks!
Assignee | ||
Comment 1•11 years ago
|
||
Based on the cron log, it looks like the job is running regularly. [1] Right now, it looks like errors are supposed to go to a common email address: for now, I've re-directed it to myself just to make sure there aren't any mailbox issues.
Unfortunately, the flock command that is being used to prevent multiple copies of the script from running also seems to swallow STDOUT. [2] If you think that it's safe / won't cause more harm than good, I could remove the flock and have output dumped to a file.
[1] Jan 22 12:45:01 genericadm.private.phx1.mozilla.com CROND[13296]: (root) CMD (/usr/bin/flock -w 10 /var/lock/careers-prod /data/genericrhel6/src/careers.mozilla.org/lumbergh/manage.py syncjobvite > /dev/null 2>&1)
Jan 22 12:48:01 genericadm.private.phx1.mozilla.com CROND[25414]: (root) CMD (/usr/bin/flock -w 10 /var/lock/careers-prod /data/genericrhel6/src/careers.mozilla.org/lumbergh/manage.py syncjobvite > /dev/null 2>&1)
Jan 22 12:51:01 genericadm.private.phx1.mozilla.com CROND[27367]: (root) CMD (/usr/bin/flock -w 10 /var/lock/careers-prod /data/genericrhel6/src/careers.mozilla.org/lumbergh/manage.py syncjobvite > /dev/null 2>&1)
[2] Running '/usr/bin/flock -w 10 /var/lock/careers-prod /data/genericrhel6/src/careers.mozilla.org/lumbergh/manage.py syncjobvite' -> no output. Running '/data/genericrhel6/src/careers.mozilla.org/lumbergh/manage.py syncjobvite' --> output
Assignee: server-ops-webops → cliang
Reporter | ||
Comment 3•11 years ago
|
||
(In reply to C. Liang [:cyliang] from comment #1)
> Based on the cron log, it looks like the job is running regularly. [1]
> Right now, it looks like errors are supposed to go to a common email
> address: for now, I've re-directed it to myself just to make sure there
> aren't any mailbox issues.
>
> Unfortunately, the flock command that is being used to prevent multiple
> copies of the script from running also seems to swallow STDOUT. [2] If you
> think that it's safe / won't cause more harm than good, I could remove the
> flock and have output dumped to a file.
>
>
> [1] Jan 22 12:45:01 genericadm.private.phx1.mozilla.com CROND[13296]: (root)
> CMD (/usr/bin/flock -w 10 /var/lock/careers-prod
> /data/genericrhel6/src/careers.mozilla.org/lumbergh/manage.py syncjobvite >
> /dev/null 2>&1)
> Jan 22 12:48:01 genericadm.private.phx1.mozilla.com CROND[25414]: (root) CMD
> (/usr/bin/flock -w 10 /var/lock/careers-prod
> /data/genericrhel6/src/careers.mozilla.org/lumbergh/manage.py syncjobvite >
> /dev/null 2>&1)
> Jan 22 12:51:01 genericadm.private.phx1.mozilla.com CROND[27367]: (root) CMD
> (/usr/bin/flock -w 10 /var/lock/careers-prod
> /data/genericrhel6/src/careers.mozilla.org/lumbergh/manage.py syncjobvite >
> /dev/null 2>&1)
>
> [2] Running '/usr/bin/flock -w 10 /var/lock/careers-prod
> /data/genericrhel6/src/careers.mozilla.org/lumbergh/manage.py syncjobvite'
> -> no output. Running
> '/data/genericrhel6/src/careers.mozilla.org/lumbergh/manage.py syncjobvite'
> --> output
I think dumping it to a file for a day and then having the admins make some small changes to see if we can note when the changes show up would be fine. Since we've confirmed the job is running, though, I'd first like to get the admins to test if their changes still aren't being pulled in by the automatic job. I'll update once I hear back from them.
Reporter | ||
Comment 4•11 years ago
|
||
Looks like updates aren't happening. Since the site isn't being updated we know there's updates in the queue waiting to be synced, so we don't need to have it sit for a day, just the output of like two or three runs would be fine. Thanks!
Updated•11 years ago
|
Flags: needinfo?(cliang)
Assignee | ||
Comment 5•11 years ago
|
||
The logs look fairly innocent. (Example posted below. [1] They looked the same for six runs.) However, in doing this tweak, I found that there was a set of existing careers cron processes that seem to have seized up since January 19th:
root 16386 16379 0 Jan19 ? 00:00:00 /bin/sh -c /usr/bin/flock -w 10 /var/lock/careers-prod /data/genericrhel6/src/careers.mozilla.org/lumbergh/manage.py syncjobvite > /dev/null 2>&1
root 16396 16386 0 Jan19 ? 00:00:00 /usr/bin/flock -w 10 /var/lock/careers-prod /data/genericrhel6/src/careers.mozilla.org/lumbergh/manage.py syncjobvite
These may have been the source of the problem. >_< I manually cleared these out.
[1] /data/genericrhel6/src/careers.mozilla.org/lumbergh/vendor-local/src/django/django/db/backends/mysql/base.py:114: Warning: Incorrect string value: '\xE2\x80\xA8<br...' for column 'description' at row 1
return self.cursor.execute(query, args)
Synced: 25
Added: 0
Removed: 0
Removed departments: 0
Flags: needinfo?(cliang)
Reporter | ||
Comment 6•11 years ago
|
||
(In reply to C. Liang [:cyliang] from comment #5)
> The logs look fairly innocent. (Example posted below. [1] They looked the
> same for six runs.) However, in doing this tweak, I found that there was a
> set of existing careers cron processes that seem to have seized up since
> January 19th:
>
> These may have been the source of the problem. >_< I manually cleared
> these out.
That seems to have been the problem! I've confirmed with HR that their updates are being synced correctly to the site again. Thanks! :D
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 7•10 years ago
|
||
Hi team - It looks like this sync has stopped running again. :(
One of the reqs approved two days ago has not posted on the careers page. I also made a small change to one of the current reqs showing on the page, waited 15 minutes, and it has not updated.
The small change that should be showing up is that "Mozilla Corporation" in the final paragraph should be in bold here: http://careers.mozilla.org/en-US/position/oJ2w0fw1
Help!
Thank you,
Andrea
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•10 years ago
|
Flags: needinfo?(cliang)
Assignee | ||
Comment 8•10 years ago
|
||
azingerman: The cron job that was having issues is the one that updates the development environment for careers (careers-dev.allizom.org).
I don't think that the code behind the careers.mozilla.org site is automatically updated via a job. The only job that I see is a manage.py syncjobvite job which looks like it ran relatively recently (times in PDT):
Apr 14 13:54:01 genericadm.private.phx1.mozilla.com CROND[10618]: (root) CMD (/usr/bin/flock -w 10 /var/lock/careers-prod /data/genericrhel6/src/careers.mozilla.org/lumbergh/manage.py syncjobvite > /dev/null 2>&1)
I have run that syncjobvite job manually (without the redirect) and did not see any errors:
[cliang@genericadm.private.phx1 cron.d]$ sudo /data/genericrhel6-dev/src/careers-dev.allizom.org/lumbergh/manage.py syncjobvite
Synced: 42
Added: 0
Removed: 0
Removed departments: 0
Flags: needinfo?(cliang)
Comment 9•10 years ago
|
||
C. Liang, the change Andrea is referring to is not a code change, but a change to copy entered into jobvite.
It's possible the site is updating and something else is going on.
Andrea can you confirm this change is in jobvite and if so add the details here. I am curious what exact html you are using. So something like "<b>Mozilla</b>"
Comment 10•10 years ago
|
||
Thanks for the explanation, Ben! Yes, the change I made is in Jobvite, and unfortunately it's a WYSIWYG text editor - https://mozilla.box.com/texteditor
It doesn't allow for HTML... or else I'd use it (it would get rid of a lot of formatting issues!). :)
Comment 11•10 years ago
|
||
Hi guys,
One of the recruiters opened a ticket with Jobvite on a related issue, and this is what they had to say:
It looks like Mozilla updated the locations for Content Partnerships Manager role. When these changes are saved in Jobvite, the system assigns a new requisition ID.
Since Mozilla is using our API feature, the API needs to be "called" so that the job listings (and their corresponding ID's) can be refreshed. It looks like the jobs haven't been refreshed yet, because the "Apply" link is referencing the pre-edited version of this job (it's also why the job's location still says "All" instead of filtering to San Francisco, Mountain View, and New York).
I recommend asking your web engineer to refresh the job reqs.
I don't know if this helps the not-updating situation, but I wanted to let you know!
Thank you,
Andrea
Comment 12•10 years ago
|
||
OK found the issue.
C. Liang the script you ran was on dev and DID update this listing with the change.
http://careers-dev.allizom.org/en-US/position/oJ2w0fw1
Note mozilla corporation in the last paragraph contained in the white area is bold.
This change has not occurred on production which makes me think this script is not running there.
Flags: needinfo?(cliang)
Assignee | ||
Comment 13•10 years ago
|
||
I'm wondering if:
1) the job doesn't work correctly from cron if there is a large number of files to sync -or-
2) something has changed such that the job is not running correctly when invoked via cron but does work when invoked with a terminal attached. =\
I can see many entries in the cron log that show that cron is executing the script.[1] If I temporarily stop the cron job and run the script manually, it syncs a large number of jobs[2] and I see that the changes mentioned in comment #12 are now present in production.
[1] Apr 17 09:24:01 genericadm.private.phx1.mozilla.com CROND[13834]: (root) CMD (/usr/bin/flock -w 10 /var/lock/careers-prod /data/genericrhel6/src/careers.mozilla.org/lumbergh/manage.py syncjobvite > /dev/null 2>&1)
[2] $ sudo /data/genericrhel6/src/careers.mozilla.org/lumbergh/manage.py syncjobvite
Synced: 41
Added: 2
Removed: 4
Removed departments: 0
Flags: needinfo?(cliang)
Comment 14•10 years ago
|
||
Great, looks like the missing req is now showing on the careers page too. Is there anything I can do to help make sure the sync keeps running? :)
Thank you!
Andrea
Comment 15•10 years ago
|
||
Cyliang can you advise on where this problem may be.. either with the code or with infra?
I am unclear if I should have one of my developers look into this or if it is something you are investigating.
Thanks.
Flags: needinfo?(cliang)
Comment 16•10 years ago
|
||
Hi guys, until we can find the underlying cause of this, can we request to have this manually updated at least once a day? It would be great to get a fresh update today. :) Thank you!
Comment 17•10 years ago
|
||
Andrea: Agree we need to get this sorted sooner rather than later.
Cyliang, can you respond to comment 15? I am not sure if this is something on our side, like a possible error condition that is not accounted for, or if it is something on the machine.
Assignee | ||
Comment 18•10 years ago
|
||
Ben / Andrea: The behavior I’m seeing indicates that cron is running the job as it should be. Since flock seems to swallow all the output from the manage.py invocation, I’ve reworked things so that the file lock is being handled in a script so that I can capture output to a log file in /tmp.
Since I’m only capturing the output of the latest run, would it be possible to coordinate with someone to do some testing at a particular time & day? I’d like someone to make a set of changes that need to be synced to jobvite and I want to see what the output of the cron job is.
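The reworked wrapper script itself is not shown in this bug. As a rough illustration only, the idea of taking the lock inside a script and keeping the output of the latest run could look something like the Python sketch below. The function name `run_locked` and its parameters are hypothetical, and it uses an advisory `fcntl.flock` lock, which may differ from how the real script locks.

```python
import fcntl
import subprocess
import sys


def run_locked(command, lock_path, log_path):
    """Run command under an exclusive lock, writing its output to log_path.

    Returns the command's exit code, or None if another run holds the lock.
    Hypothetical helper: not the script actually deployed in this bug.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking: give up at once if a previous run is still active.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        # Open in "w" mode so the log holds only the latest run.
        with open(log_path, "w") as log_file:
            result = subprocess.run(
                command, stdout=log_file, stderr=subprocess.STDOUT
            )
        return result.returncode
    # The lock is released when lock_file is closed on leaving the with block.
```

A cron entry could then call a small script that invokes `run_locked(["/path/to/manage.py", "syncjobvite"], "/var/lock/careers-prod", "/tmp/careers-syncjobvite.log")`, where the manage.py path is a placeholder. One property of this approach is that the kernel drops an advisory lock when the process holding it exits, so a crashed run does not leave the lock held.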
Flags: needinfo?(cliang)
Comment 19•10 years ago
|
||
Cyliang: Andrea will be your person to test. Folks on my team do not have access to jobvite to make changes.
Comment 20•10 years ago
|
||
Hi cyliang - please let me know when would be a good time for you and I will make as many changes as you want. :) Thank you!
Comment 21•10 years ago
|
||
Checking back in.. were you able to co-ordinate and find the problem?
Flags: needinfo?(cliang)
Assignee | ||
Comment 22•10 years ago
|
||
I've been out on PTO, which put us into a bit of a holding pattern.
Andrea: How about Monday, May 4th, at 11 AM PDT? If it works for you, we can meet up on Vidyo, in my room, to coordinate (you telling me when a change has gone in so I know when to watch the log file).
Flags: needinfo?(cliang) → needinfo?(azingerman)
Comment 23•10 years ago
|
||
Sure thing. I will be in your Vidyo room at 11am. Thank you!
Flags: needinfo?(azingerman)
Assignee | ||
Comment 24•10 years ago
|
||
It looks like the automated job is syncing to jobvite, so we shouldn't need the manual update.
I've filed a separate bug (1155751) to see if there is a way to proactively discover jobs that are in the "stuck" state and, hopefully, automatically attempt to un-stick them.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 10 years ago
Resolution: --- → FIXED
Comment 25•10 years ago
|
||
Hi guys,
This may be an issue again. Michael from MoFo created a Req today (hours ago) that has not been updated on the Careers site. It's called "Gigabit Fund Manager". Can you check to see if it's stuck on the backend pretty please? :)
Thanks!
Andrea
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 26•10 years ago
|
||
Note there were some huge infrastructure changes this week. I am unsure if this is related but it is entirely possible. C., can you double check? The data does look to be in the jobvite feed.
Flags: needinfo?(cliang)
Assignee | ||
Comment 27•10 years ago
|
||
Is it possible that the job is trying to insert UTF8 data into a column that is set for latin1 character encoding?
The syncjobvite job seems to be encountering issues interacting with the database. (See attached stacktrace.) mpressman, one of the DB folk, pointed out to me that, of the three tables that are in the careers database, two of the tables have description columns in utf8 while the other has one that is in latin1.
I've looked at django_jobvite but it's not 100% clear to me which table the command is trying to interact with. I don't know if this is a case where we're now finally getting UTF8 data for this field or if something that was supposed to "normalize" this data has failed to do so.
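For context on the encoding question: the bytes quoted in the warning in comment 5, `\xE2\x80\xA8`, are the UTF-8 encoding of U+2028 (LINE SEPARATOR), which has no latin1 representation. The sketch below is only an illustration of how one might spot such characters in a description before it reaches the database. The helper name `latin1_offenders` and the sample string are made up, and Python's `latin-1` codec is used as an approximation of MySQL's latin1 character set, which is not identical.

```python
def latin1_offenders(text):
    """Return the characters in text that Python's latin-1 codec cannot encode.

    Hypothetical helper; MySQL's latin1 differs slightly from ISO-8859-1,
    so treat the result as an approximation.
    """
    offenders = []
    for ch in text:
        try:
            ch.encode("latin-1")
        except UnicodeEncodeError:
            offenders.append(ch)
    return offenders


# Made-up sample containing U+2028, the character behind \xE2\x80\xA8.
sample = "Mozilla Corporation\u2028<br>"
print([hex(ord(ch)) for ch in latin1_offenders(sample)])
```

If descriptions from the feed routinely contain characters like this, that would be consistent with the suspicion that a latin1 column is involved, though it does not show which table is affected.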
Flags: needinfo?(cliang)
Comment 28•10 years ago
|
||
Giorgos can you take a look at C's comment and let us know what you think?
Flags: needinfo?(giorgos)
Comment 29•10 years ago
|
||
Hey Ben, just checking in to let you know that the Careers Page and Jobvite are still not syncing. This is critical for us (we are losing high volumes of potential candidates since these roles are not being posted, and requisitions will take longer to fill because they're just sitting with no candidates).
Let us know if there is anything we can help provide to keep this moving- we appreciate all of you and your team's help resolving this issue.
Rachel Berenbaum
Comment 30•10 years ago
|
||
Understood - unfortunately I need to wait for Giorgos to take a look. Note he is in Greece so hopefully this can be resolved by tomorrow morning PT.
Comment 31•10 years ago
|
||
Looks like the missing jobs are appearing. I am double checking there is no additional work in bug 1221147. Once that is closed out I will resolve this one.
Flags: needinfo?(giorgos)
Comment 32•10 years ago
|
||
Looks like it is working again!
Comment 33•10 years ago
|
||
Everything is sorted.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Comment 34•10 years ago
|
||
Hi there,
Is this not working again? I posted a job about half an hour ago and it doesn't appear to be on the Careers site...
https://hire.jobvite.com/CompanyJobs/Careers.aspx?nl=1&k=Job&j=oIbb2fwQ
Michael.
Comment 35•10 years ago
|
||
I checked and the job mentioned above is in the feed but not on the site.
C can you take a look.
Side Note in the near future careers will be moved to AWS and if all goes as planned we should have some pro-active monitoring that will alert us to these types of problems.
Status: RESOLVED → REOPENED
Flags: needinfo?(cliang)
Resolution: FIXED → ---
Comment 36•10 years ago
|
||
Note that we made this change yesterday (bug 1229388) which should have affected only careers-dev
Comment 37•10 years ago
|
||
This looks to have updated.. will resolve but leave the needinfo for an explanation.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 38•10 years ago
|
||
Looks like the lock file wasn't cleaned up properly (dated Nov 30 09:42). I've manually cleared that file and the log file looks "normal":
$ cat /tmp/careers-syncjobvite.log
Synced: 58
Added: 0
Removed: 0
Removed departments: 0
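Since a leftover lock file has now stalled the sync, one possible way to notice it earlier is to check the lock file's age. The sketch below is an assumption-laden illustration, not something deployed here: it assumes the lock is a plain file whose modification time reflects when the run started, and the names `lock_age_seconds` and `is_stale` and any threshold are hypothetical.

```python
import os
import time


def lock_age_seconds(lock_path, now=None):
    """Return the lock file's age in seconds, or None if it does not exist."""
    try:
        mtime = os.path.getmtime(lock_path)
    except FileNotFoundError:
        return None
    if now is None:
        now = time.time()
    return now - mtime


def is_stale(lock_path, max_age_seconds, now=None):
    """True if the lock file exists and is older than max_age_seconds."""
    age = lock_age_seconds(lock_path, now)
    return age is not None and age > max_age_seconds
```

With a job that is scheduled every few minutes, a check such as `is_stale(lock_path, 3600)` (an arbitrary one-hour threshold) would flag a lock file like the one dated Nov 30 long before anyone noticed missing listings.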
Assignee | ||
Updated•10 years ago
|
Flags: needinfo?(cliang)
Comment 40•10 years ago
|
||
Hi there,
We're encountering this problem again. Requisitions that have been opened the past few days are not appearing on the Careers Page. Could we look to see what's happening?
Thanks!
Rachel
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 41•10 years ago
|
||
Can you give me a job that has been entered but is not appearing to double check. I will need info C. Liang to diagnose.
Please note we are incredibly close to moving this site to AWS (a few days). When we do, we will have monitoring and alerting on the jobvite update. That means my team will be alerted when this fails (it should fail less often) and hopefully we will fix before you notice.
Flags: needinfo?(cliang)
Comment 42•10 years ago
|
||
Wonderful - that will be great. The requisition number is #3402, Front End Developer Intern - London.
Comment 43•10 years ago
|
||
Ok on the bright side it is appearing on the test instance we have running on AWS so we know it is in the feed.
https://careers-prod.us-west.moz.works/listings/
I think this issue is likely due to what C has fixed for us before in comment #38.
Comment 44•10 years ago
|
||
C - ping, can you take a look at this today?
Assignee | ||
Comment 45•10 years ago
|
||
Removed errant lock file (dated Jan 10 11:51).
It looks like the job ran successfully:
Synced: 68
Added: 2
Removed: 1
Removed departments: 0
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Flags: needinfo?(cliang)
Resolution: --- → FIXED
Comment 46•10 years ago
|
||
Thanks everyone!
Comment 47•9 years ago
|
||
Hi team - this sync seems to have stalled again since yesterday. The MoFo job "Project Manager, State of the Web" should be renamed to "Lead Writer and Project Manager, State of the Web" when it's working again. Thank you!
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 48•9 years ago
|
||
Andrea, thanks for flagging. If this ever does happen again please open a new bug.
So this is partially my fault. We received a ping stating this failed during a period where we were getting a ton of alerts. I muted the alert but neglected to inform Giorgos, who due to his timezone likely did not see it.
Giorgos - Dead Man's Snitch still has this as paused and it has not checked in today. Can you double check what is going on?
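For readers unfamiliar with this style of monitoring: the job reports in after each successful run, and the monitor alerts when a check-in is missed. The actual integration for this site is not shown in the bug, so the following is only a generic sketch of the pattern. Both `job` and `check_in` are hypothetical callables, with `check_in` standing in for whatever request the monitoring service expects.

```python
def run_with_check_in(job, check_in):
    """Run job(); call check_in() only if the job finished without raising.

    Returns True when the check-in was sent, False otherwise.
    """
    try:
        job()
    except Exception:
        # No check-in on failure: the missing heartbeat triggers the alert.
        return False
    check_in()
    return True
```

The pattern only helps while the alert is active, which matches the situation described above: with the snitch paused, a missed check-in goes unnoticed.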
Flags: needinfo?(giorgos)
Comment 49•9 years ago
|
||
Hi Ben - Apologies, I will open new bugs from now on.
Comment 50•9 years ago
|
||
I'm seeing the new position now.
The update job has been restarted and a fix for the latest issue is under review: https://github.com/mozilla/lumbergh/pull/166
Thanks!
Status: REOPENED → RESOLVED
Closed: 10 years ago → 9 years ago
Flags: needinfo?(giorgos)
Resolution: --- → FIXED
Updated•9 years ago
|
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard