Open
Bug 1336410
Opened 7 years ago
Updated 5 years ago
Fix warning/error "Could not lock PID file" in jobqueue code
Categories: bugzilla.mozilla.org :: Email Notifications, defect
Status: NEW
People: reporter dylan, unassigned
Feb 1 21:29:41 jobqueue2.bugs.scl3.mozilla.com jobqueue.pl[24518]: Sucessfully daemonized
Feb 1 21:29:41 jobqueue2.bugs.scl3.mozilla.com jobqueue.pl[24518]: Could not lock PID file /var/run/bugzilla-queue8.worker.pid: Resource temporarily unavailable at local/lib/perl5/Daemon/Generic.pm line 181.

This happens because we re-execute jobqueue.pl, which calls Bugzilla::JobQueue::Runner->new(). Bugzilla::JobQueue::Runner inherits from Daemon::Generic:
https://metacpan.org/source/MUIR/Daemon-Generic-0.84/lib/Daemon/Generic.pm#L40

So the error comes from the lock() call at
https://metacpan.org/source/MUIR/Daemon-Generic-0.84/lib/Daemon/Generic.pm#L181

Now this is interesting. Why would lock() fail? Because the file is already locked, and it is already locked because we re-execute jobqueue.pl from
https://github.com/mozilla-bteam/bmo/blob/master/Bugzilla/JobQueue.pm#L114
This was done to reduce memory fragmentation/leaking in bug 832893.

Reading the code for Daemon::Generic, I am surprised that this works at all. Each time jobqueue.pl is run, Daemon::Generic->new is called, and if gd_pidfile is set, it attempts to lock the pid file. Presumably the pid file isn't set initially, because the first two locking attempts seem to succeed; only the third, on line 181, fails. It's also not clear whether this is happening in the parent or in the child.

It's also not clear that error handling is correct in this case. When the connection in the subprocess worker dies, is that error communicated to the queue runner? Is this related to the entries in ts_error that cause jobs not to be processed?

Finally, this way of clearing memory results in *more* memory being used. If we used fork() alone (without exec), each child process would share much of its memory with its parent. Here, however, we have N job queue runners, each with 1 subprocess worker, so we pay a higher constant overhead in exchange for lower memory usage over time.
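The lock conflict described above can be reproduced in isolation. This is a minimal Perl sketch, not Daemon::Generic's actual code. It assumes Linux, where Perl's flock() uses flock(2) and a lock belongs to an open file description, so a second open() of the same file is treated independently even within one process. The helper try_lock_pidfile and the temporary path are hypothetical. One handle holds the pid file lock, and a fresh open of the same path (standing in for the re-executed jobqueue.pl) is refused with EWOULDBLOCK, whose message on Linux is "Resource temporarily unavailable". The lock becomes available again only after the holder closes its handle.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Errno qw(EWOULDBLOCK);
use Fcntl qw(:flock O_RDWR O_CREAT);
use File::Temp qw(tempdir);

# Hypothetical helper: open $path afresh (a new open file description) and
# try a non-blocking exclusive lock, as a pid-file daemon might on startup.
# Returns the locked handle, or undef with $! set to the flock() error.
sub try_lock_pidfile {
    my ($path) = @_;
    sysopen(my $fh, $path, O_RDWR | O_CREAT) or die "open $path: $!";
    return $fh if flock($fh, LOCK_EX | LOCK_NB);
    my $errno = $! + 0;    # save errno before close() can overwrite it
    close $fh;
    $! = $errno;
    return undef;
}

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/queue8.worker.pid";    # hypothetical stand-in for the real pid file

# The running queue runner already holds the lock...
my $holder = try_lock_pidfile($path) or die "initial lock failed: $!";

# ...so a second process image opening the same file again is refused.
my $second       = try_lock_pidfile($path);
my $second_errno = $! + 0;
my $second_err   = "$!";
my $second_busy  = !defined($second) && $second_errno == EWOULDBLOCK;
print "second attempt: ", ($second ? "locked" : "refused: $second_err"), "\n";

# Once the holder releases its handle, a fresh attempt succeeds.
close $holder;
my $third = try_lock_pidfile($path);
print "after release: ", ($third ? "locked" : "refused: $!"), "\n";
```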
I think that with the memory-leak fixes I've been working on, we can switch back to the traditional model.
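Assuming the "traditional model" means forking a worker from the already-loaded parent instead of re-executing jobqueue.pl, it also suggests an answer to the error-handling question raised earlier: a child's failure can reach the parent through its exit status and waitpid(). This is a minimal Perl sketch under that assumption. run_in_child and the sample jobs are hypothetical and are not part of Bugzilla::JobQueue.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run $job in a forked child. The child starts with the
# parent's already-loaded code shared copy-on-write, and anything the job
# leaks is returned to the OS when the child exits. The exit status carries
# success (0) or failure (non-zero) back to the parent.
sub run_in_child {
    my ($job) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        my $ok = eval { $job->(); 1 };
        warn "job failed: $@" unless $ok;
        # _exit skips END blocks and destructors copied from the parent.
        POSIX::_exit($ok ? 0 : 1);
    }

    waitpid($pid, 0) == $pid or die "waitpid failed: $!";
    # Map "killed by signal N" to 128 + N so it is never mistaken for success.
    return ($? & 127) ? 128 + ($? & 127) : $? >> 8;
}

my $ok_status   = run_in_child(sub { my %seen = map { $_ => 1 } 1 .. 1000; });
my $fail_status = run_in_child(sub { die "database connection lost\n" });
print "ok job exit status: $ok_status\n";
print "failing job exit status: $fail_status\n";
```

In this sketch the child never calls Bugzilla::JobQueue::Runner->new() again, so it would not try to lock the pid file a second time; the parent remains the only lock holder.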
Reporter, updated 7 years ago
Summary: Consider reverting bug 832893 → Fix warning/error "Could not lock PID file" in jobqueue code
Reporter, updated 5 years ago
Assignee: dylan → nobody