The default bug view has changed. See this FAQ.

change jobqueue.pl to spawn worker processes to deliver bugmail to avoid memory leaks

RESOLVED FIXED in Bugzilla 4.4

Status

()

Bugzilla
Email Notifications
--
critical
RESOLVED FIXED
4 years ago
2 months ago

People

(Reporter: glob, Assigned: glob)

Tracking

(Blocks: 1 bug)

4.0.9
Bugzilla 4.4
Dependency tree / graph
Bug Flags:
approval +
approval4.4 +

Details

Attachments

(1 attachment, 2 obsolete attachments)

(Assignee)

Description

4 years ago
recently we've seen several occurrences of jobqueue.pl consuming large amounts of memory, pushing the webheads into swap.

for example, most recently had jobqueues consuming 1.4g each.


given the push frequency, we're not sure if this is a new issue, or it's a recent change.
As we start caching more and more objects I can imaging it having more stuff cached in memory the longer it runs.  I know we do a bunch of cleanup stuff on each request when handling web pages, but jobqueue isn't running inside a web request, so it's possible that cleanup is never getting done...  and this is almost surely an upstream issue.
Assignee: nobody → email-notifications
Component: General → Email Notifications
Product: bugzilla.mozilla.org → Bugzilla
QA Contact: default-qa
Version: Production → 4.0.9
(Assignee)

Comment 2

4 years ago
the cleanup happens via bug 822547.

i want to investigate this on bmo first, due to the number of extensions.  i will shift upstream if investigations point to it being an upstream issue.
Assignee: email-notifications → glob
Component: Email Notifications → General
Product: Bugzilla → bugzilla.mozilla.org
QA Contact: default-qa
Version: 4.0.9 → Production
(Assignee)

Comment 3

4 years ago
aside from higher baseline memory usage, i don't see bmo's jobqueue.pl consuming memory at a higher rate than trunk's (in fact it was less).

it's hard to tell if it's a memory leak, or just fragmentation due to all the small allocations perl does.  the cached stuff is cleared for each run, and i haven't been able to create a circular reference in the cache.

i think it may be easier to change how jobqueue.pl works..

while (1) {
  if (there are pending jobs in the queue) {
    system "jobqueue.pl -f process_pending_jobs"
  }
  sleep 5
}
Component: General → Bugzilla-General
Product: bugzilla.mozilla.org → Bugzilla
QA Contact: default-qa
Version: Production → 4.0.9
(Assignee)

Comment 4

4 years ago
Created attachment 704786 [details] [diff] [review]
patch v1
Attachment #704786 - Flags: review?(LpSolit)
(Assignee)

Updated

4 years ago
Summary: jobqueue.pl looks like it's leaking memory → change jobqueue.pl to spawn worker processes to deliver bugmail to avoid memory leaks

Comment 5

4 years ago
Comment on attachment 704786 [details] [diff] [review]
patch v1

I'm really not the right person to review this patch. I cannot test jobqueue nor do I use mod_perl.


>=== modified file 'Bugzilla/JobQueue.pm'

> use parent qw(TheSchwartz);
>+use fields qw(_pidfile);

From what http://perldoc.perl.org/base.html says:

"Unless you are using the fields pragma, consider this module discouraged in favor of the lighter-weight parent."

To me, this means that if we use the fields pragma, you cannot use parent.


That's the single comment I can make. :)
Attachment #704786 - Flags: review?(LpSolit) → review?(justdave)
(Assignee)

Comment 6

4 years ago
(In reply to Frédéric Buclin from comment #5)
> I cannot test jobqueue nor do I use mod_perl.

fwiw this isn't related to mod_perl.

i test jobqueue by setting use_mailer_queue to 1, triggering bugmail, and then running |./jobqueue.pl -d -f start|

to test this change, i wrote a simple script which calls MessageToMTA() hundreds of times.

> > use parent qw(TheSchwartz);
> >+use fields qw(_pidfile);
> 
> From what http://perldoc.perl.org/base.html says:
> 
> "Unless you are using the fields pragma, consider this module discouraged in
> favor of the lighter-weight parent."
> 
> To me, this means that if we use the fields pragma, you cannot use parent.

nice catch (although it appears to work fine with "parent").
i've changed "use parent" to "use base" locally, will hold off on uploading a new patch until the current one is reviewed.
Comment on attachment 704786 [details] [diff] [review]
patch v1

Review of attachment 704786 [details] [diff] [review]:
-----------------------------------------------------------------

Nice. Looks fine and works for my testing. Except for what LpSolit mentioned, r=dkl
Attachment #704786 - Flags: review+
(Assignee)

Updated

4 years ago
Flags: approval?
(Assignee)

Comment 8

4 years ago
during some more testing, i realized this patch has problems with stopping.

if a sub-process worker is running, and you ask the daemon to stop, it waits until the worker has finished then stops.  the stop message should cascade from the parent to the worker processes.
Flags: approval?
(Assignee)

Updated

4 years ago
Attachment #704786 - Flags: review?(justdave)
(Assignee)

Comment 9

4 years ago
Created attachment 710780 [details] [diff] [review]
patch v2

this revision allows the parent jobqueue.pl to catch an INT signal and ask the worker jobqueue.pl to stop.

i also changed how the 'once' command works, in order to make it match the 'onepass'.  with this patch, 'once' and 'onepass' creates a pidfile and runs in the background by default.  anyone calling 'once' will have to include '-f' to avoid deamonisation.
Attachment #704786 - Attachment is obsolete: true
Attachment #710780 - Flags: review?(dkl)
Comment on attachment 710780 [details] [diff] [review]
patch v2

Review of attachment 710780 [details] [diff] [review]:
-----------------------------------------------------------------

Only issue I found is that when using /etc/init.d/bugzilla-queue to start/top the daemon on RHEL/CentOS systems, it automatically stores the pidfile as /var/run/bugzilla-queue.pid instead of <cgi_path>/data/bugzilla-queue.pid. Which is fine as long as the service runs as 'root'. If you do not run as 'root' and instead run as 'dkl' or other, the script dies when trying to create /var/run/bugzilla-queue.worker.pid as permission denied. Even though /var/run/bugzilla-queue.pid is owned by dkl.root as configured by the SystemV init scripts. Running jobqueue.pl -d -f start locally from the cgi_path dir works fine as my data dir is owned by dkl.dkl. Maybe not an issue we need to worry about but one none the less since there will not be just a single jobqueue process being started by /etc/init.d/bugzilla-queue when your patch is applied.

r=dkl
Attachment #710780 - Flags: review?(dkl) → review+
(Assignee)

Comment 11

4 years ago
(In reply to David Lawrence [:dkl] from comment #10)
> Only issue I found is that when using /etc/init.d/bugzilla-queue to
> start/top the daemon on RHEL/CentOS systems, it automatically stores the
> pidfile as /var/run/bugzilla-queue.pid instead of
> <cgi_path>/data/bugzilla-queue.pid. Which is fine as long as the service
> runs as 'root'. If you do not run as 'root' and instead run as 'dkl' or
> other, the script dies when trying to create
> /var/run/bugzilla-queue.worker.pid as permission denied. Even though
> /var/run/bugzilla-queue.pid is owned by dkl.root as configured by the
> SystemV init scripts.

this shouldn't be a problem because init.d scripts should always be started by root, not by a normal user:

  print "$initscript installed.",
        " To start the daemon, do \"$dest_file start\" as root.\n";

it's odd that our init.d script creates and chowns the pidfile, instead of leaving that managed by the daemon::generic.
Flags: approval?
(In reply to Byron Jones ‹:glob› from comment #11)
> (In reply to David Lawrence [:dkl] from comment #10)
> > Only issue I found is that when using /etc/init.d/bugzilla-queue to
> > start/top the daemon on RHEL/CentOS systems, it automatically stores the
> > pidfile as /var/run/bugzilla-queue.pid instead of
> > <cgi_path>/data/bugzilla-queue.pid. Which is fine as long as the service
> > runs as 'root'. If you do not run as 'root' and instead run as 'dkl' or
> > other, the script dies when trying to create
> > /var/run/bugzilla-queue.worker.pid as permission denied. Even though
> > /var/run/bugzilla-queue.pid is owned by dkl.root as configured by the
> > SystemV init scripts.
> 
> this shouldn't be a problem because init.d scripts should always be started
> by root, not by a normal user:
> 
>   print "$initscript installed.",
>         " To start the daemon, do \"$dest_file start\" as root.\n";

Yes you need to start/stop the service as root. But as the initiscript is written, the PIDFILE is created as root before the daemon is even started. Then the PIDFILE is passed to the daemon as a parameter for it to use. The daemon itself is ran as a custom user specified by the /etc/sysconfig/bugzilla-queue config file. In my case I had the daemon user set to 'dkl' which cannot create an additional worker PIDFILE in the same dir as the parent jobqueue PIDFILE. Setting the USER=root in /etc/sysconfig/bugzilla-queue of course fixes all this but if we always have to be 'root' the configuration option should be removed or documented.

dkl
(Assignee)

Comment 13

4 years ago
in that case perhaps it would be better to always create the worker's pid file in the data directory.
Comment on attachment 710780 [details] [diff] [review]
patch v2

Changing to r- so that the script can be updated to store the worker pid in the data/ dir instead of /var/run when executed from an initscript.

dkl
Attachment #710780 - Flags: review+ → review-
(Assignee)

Comment 15

4 years ago
Created attachment 711355 [details] [diff] [review]
patch v3

always store the worker's pidfile in bugzilla's data/ directory.
Attachment #710780 - Attachment is obsolete: true
Attachment #711355 - Flags: review?(dkl)
Comment on attachment 711355 [details] [diff] [review]
patch v3

r=dkl
Attachment #711355 - Flags: review?(dkl) → review+
We should add a comment to /etc/sysconfig/bugzilla-queue (generated by Bugzilla::JobQueue::Runner::gd_can_install) and also to the relnotes stating that the value for USER in the bugzilla-queue config file will need to have write permissions to the data/ directory of the Bugzilla installation.

dkl
Keywords: relnote

Updated

4 years ago
Status: NEW → ASSIGNED
Flags: approval?
Flags: approval4.4+
Flags: approval+
Target Milestone: --- → Bugzilla 4.4

Updated

4 years ago
Component: Bugzilla-General → Email Notifications
(Assignee)

Comment 18

4 years ago
Committing to: bzr+ssh://bjones%40mozilla.com@bzr.mozilla.org/bugzilla/trunk/
modified jobqueue.pl
modified Bugzilla/JobQueue.pm
modified Bugzilla/JobQueue/Runner.pm
modified t/011pod.t
Committed revision 8573.

Committing to: bzr+ssh://bjones%40mozilla.com@bzr.mozilla.org/bugzilla/4.4/
modified jobqueue.pl
modified Bugzilla/JobQueue.pm
modified Bugzilla/JobQueue/Runner.pm
Committed revision 8514.
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED

Comment 19

4 years ago
(In reply to David Lawrence [:dkl] from comment #17)
> We should add a comment to /etc/sysconfig/bugzilla-queue (generated by
> Bugzilla::JobQueue::Runner::gd_can_install) and also to the relnotes stating
> that the value for USER in the bugzilla-queue config file will need to have
> write permissions to the data/ directory of the Bugzilla installation.

No need, it already does. So there is no need to relnote anything as there is nothing new here.
Keywords: relnote
OS: Mac OS X → All
Hardware: x86 → All

Comment 20

4 years ago
Which version of Daemon::Generic did you have when you encountered this problem? If your problem is the same as https://rt.cpan.org/Public/Bug/Display.html?id=36413, then we should require Daemon::Generic 0.71.

Comment 21

4 years ago
Err... I pasted the wrong link. I meant this one:

https://rt.cpan.org/Public/Bug/Display.html?id=58287
(Assignee)

Comment 22

4 years ago
(In reply to Frédéric Buclin from comment #21)
> https://rt.cpan.org/Public/Bug/Display.html?id=58287

sorry, but i don't think that issue (hanging on logout/STDIN not detached correctly) is related to jobqueue's memory consumption.

Updated

3 years ago
Blocks: 1006288

Updated

2 months ago
Blocks: 1336410
You need to log in before you can comment on or make changes to this bug.