Closed Bug 1296032 Opened 8 years ago Closed 5 years ago

Investigate the use of Apache2::SizeLimit->set_check_interval($interval) to increase performance

Categories

(bugzilla.mozilla.org :: General, task)

Production
task
Not set
normal

Tracking

RESOLVED WORKSFORME

People

(Reporter: dylan, Assigned: dylan)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

According to the docs:

> Since checking the process size can take a few system calls on some platforms 
> (e.g. linux), you may not want to check the process size for every request.

Originating this in BMO, but this would migrate to upstream if it pans out. Note that the memory leak fixes need to get to upstream too.
Spending less time measuring memory usage means the web worker can get back to serving requests sooner. Depending on how long the size check actually takes, it is possible this will measurably improve BMO's performance.
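For reference, a minimal sketch of how these SizeLimit knobs are typically wired up in a mod_perl startup file (the values and the cleanup-handler note are illustrative, not the exact BMO configuration):

    use Apache2::SizeLimit;

    # Sizes are in KB: kill a child whose unshared memory grows past ~400MB.
    Apache2::SizeLimit->set_max_unshared_size(400_000);

    # Only pay for the size check (a few system calls, e.g. reading /proc
    # on Linux) once every N requests instead of on every request.
    Apache2::SizeLimit->set_check_interval(100);

SizeLimit does its check from the cleanup phase (e.g. a PerlCleanupHandler), so it runs after the response has been sent; the cost shows up as delay before the worker can pick up its next request.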
Blocks: bmo-slow
A check interval of 100 would be a good, safe starting value based on the size-limit logs.
Assignee: nobody → dylan
Attached patch 1296032_2.patch (Splinter Review)
This is a pretty safe number -- most processes survive for at least that many requests before hitting the size limit.
Attachment #8856712 - Flags: review?(glob)
Comment on attachment 8856712 [details] [diff] [review]
1296032_2.patch

Review of attachment 8856712 [details] [diff] [review]:
-----------------------------------------------------------------

::: mod_perl.pl
@@ +66,4 @@
>      $limit = 400_000;
>  }
>  Apache2::SizeLimit->set_max_unshared_size($limit);
> +Apache2::SizeLimit->set_check_interval(100);

when i had an eye on this, which was before your excellent memory leak reduction work, it wasn't unusual for a process to prematurely blow its memory budget after handling only a small number of requests - generally after processing a search request that returned a large bug list.

do you have any data on the shortest lived processes?
what are the time cost savings here?
what will the impact be on the servers if they exceed the size limit long before their 100th request?
It is hard to get a feel for how much "time" this costs, because the check runs after
a request is complete: the cost is the gap between one request finishing and the worker being ready to accept the next.


Out of 8741 instances of the size limiter running, 1000 were after fewer than 100 requests.
Out of that same number, only 98 were after fewer than 10 requests.
The lowest value seen is 2, so:

It is safe to set this to 2,
and it may be safe to set it to 10,
and if we offload all static requests to web6, we could use a very high value indeed.

 egrep -o 'REQUESTS=[0-9]+' 2017-04-*-sizelimit.log | awk -F'=' '{print $2}' | st
N       min     max     sum             mean    stddev
8741    2       1520    2.1777e+06      249.136 191.308
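For scale against the stats above: 1000 out of 8741 size-limiter runs is roughly 11% happening before the 100th request, while 98 out of 8741 is about 1% happening before the 10th.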
Comment on attachment 8856712 [details] [diff] [review]
1296032_2.patch

Review of attachment 8856712 [details] [diff] [review]:
-----------------------------------------------------------------

> Out of 8741 instances of the size limiter running, 1000 were [killed] after fewer than 100 requests.

that's 11.5%.  this makes me feel that setting check_interval to 100 is too high.

> Out of that same number, only 98 were after fewer than 10 requests.

i'd be happier with 10, with a note to the MOC to ensure they diligently report any memory issues on the webheads so that value can be lowered if required.
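For concreteness, and assuming the patch otherwise stays as-is, the adjustment being asked for would presumably just be

    Apache2::SizeLimit->set_check_interval(10);

in place of the value 100 in mod_perl.pl.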
Attachment #8856712 - Flags: review?(glob) → review-
Type: defect → task
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME
