Closed
Bug 650801
Opened 13 years ago
Closed 13 years ago
Cannot allocate memory errors on buildbot masters
Categories
(Release Engineering :: General, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: catlee, Assigned: catlee)
References
Details
Attachments
(3 files)
2.73 KB,
patch
|
dustin
:
review+
catlee
:
checked-in+
|
Details | Diff | Splinter Review |
1.08 KB,
patch
|
dustin
:
review+
|
Details | Diff | Splinter Review |
254.40 KB,
image/png
|
Details |
Our masters have been hitting this exception on occasion: 2011-04-18 07:20:08-0700 [Broker,7059,10.12.50.71] Running ['/builds/buildbot/tests_master5/bin/python', '/builds/buildbot/tests_master5/lib/python2.6/site-packages/buildbotcustom/bin/log_uploader.py', 'stage.mozilla.org', '-u', 'ffxbld', '-i', '/home/cltbld/.ssh/ffxbld_dsa', '-b', 'mozilla-aurora', '-p', 'macosx64', '././mozilla-aurora_snowleopard-debug_test-mochitests-4', '4'] 2011-04-18 07:20:08-0700 [Broker,7059,10.12.50.71] Error running ['/builds/buildbot/tests_master5/bin/python', '/builds/buildbot/tests_master5/lib/python2.6/site-packages/buildbotcustom/bin/log_uploader.py', 'stage.mozilla.org', '-u', 'ffxbld', '-i', '/home/cltbld/.ssh/ffxbld_dsa', '-b', 'mozilla-aurora', '-p', 'macosx64', '././mozilla-aurora_snowleopard-debug_test-mochitests-4', '4'] 2011-04-18 07:20:08-0700 [Broker,7059,10.12.50.71] Log output: 2011-04-18 07:20:08-0700 [Broker,7059,10.12.50.71] Unhandled Error Traceback (most recent call last): File "/tools/python-2.6.5/lib/python2.6/threading.py", line 484, in run self.__target(*self.__args, **self.__kwargs) File "/builds/buildbot/tests_master5/lib/python2.6/site-packages/twisted/python/threadpool.py", line 207, in _worker result = context.call(ctx, function, *args, **kwargs) File "/builds/buildbot/tests_master5/lib/python2.6/site-packages/twisted/python/context.py", line 59, in callWithContext return self.currentContext().callWithContext(ctx, func, *args, **kw) File "/builds/buildbot/tests_master5/lib/python2.6/site-packages/twisted/python/context.py", line 37, in callWithContext return func(*args,**kw) --- <exception caught here> --- File "/builds/buildbot/tests_master5/lib/python2.6/site-packages/buildbotcustom/status/log_handlers.py", line 95, in handleLogs subprocess.check_call(cmd, stdout=output, stderr=subprocess.STDOUT) File "/tools/python-2.6.5/lib/python2.6/subprocess.py", line 493, in check_call retcode = call(*popenargs, **kwargs) File "/tools/python-2.6.5/lib/python2.6/subprocess.py", line 480, in call return Popen(*popenargs, **kwargs).wait() File "/tools/python-2.6.5/lib/python2.6/subprocess.py", line 633, in __init__ errread, errwrite) File "/tools/python-2.6.5/lib/python2.6/subprocess.py", line 1049, in _execute_child self.pid = os.fork() exceptions.OSError: [Errno 12] Cannot allocate memory
Assignee | ||
Comment 1•13 years ago
|
||
I wonder if this could be somehow related to having too many threads running? ps -eLf | grep tests_master6 | wc -l 172 Each SubprocessLogHandler has its own thread pool. Maybe they should all use reactor.callInThread instead of maintaining their own pool.
Updated•13 years ago
|
Assignee: nobody → catlee
Assignee | ||
Updated•13 years ago
|
Priority: -- → P2
Updated•13 years ago
|
Summary: Cannot allocate memory errors → Cannot allocate memory errors on buildbot masters
Assignee | ||
Comment 2•13 years ago
|
||
Use the global thread pool instead of creating one thread pool per log handler.
Attachment #526970 -
Flags: review?(dustin)
Comment 3•13 years ago
|
||
Comment on attachment 526970 [details] [diff] [review] stop using individual thread pools Looks good, although I don't see why you changed SubprocessLogHandler?
Attachment #526970 -
Flags: review?(dustin) → review+
Assignee | ||
Comment 4•13 years ago
|
||
(In reply to comment #3) > Comment on attachment 526970 [details] [diff] [review] > --> https://bugzilla.mozilla.org/attachment.cgi?id=526970 > stop using individual thread pools > > Looks good, although I don't see why you changed SubprocessLogHandler? Needed to get rid of the now unused 'size' parameter.
Assignee | ||
Comment 5•13 years ago
|
||
Comment on attachment 526970 [details] [diff] [review] stop using individual thread pools http://hg.mozilla.org/build/buildbotcustom/rev/5fd234770d41
Attachment #526970 -
Flags: checked-in+
Assignee | ||
Updated•13 years ago
|
Flags: needs-reconfig?
Assignee | ||
Updated•13 years ago
|
Flags: needs-reconfig? → needs-reconfig+
Assignee | ||
Comment 6•13 years ago
|
||
So it looks like this is actually caused by....not enough memory!
I don't have exact details, but a quick test of this script reproduces the same failure on my machine (with all swap disabled):
<<<
import os
big = "s"
while True:
print len(big)
if os.fork() == 0:
break
os.wait()
big = big * 2
>>>
when len(big) == 1073741824, I have over 500 MB of ram free on my machine yet the fork fails. I'm guessing that the kernel is making sure there would be enough room in memory for a process with the same resident set size as the calling process. Using vfork() here may help, but python doesn't expose that system call.
Adding more swap space does count as adding more memory, and so could be a good short term solution to this problem.
Assignee | ||
Comment 7•13 years ago
|
||
With the json web status disabled, an idle master maxes out at 396,376 MB according to 'ps -o rss -p <pid>' With the json web status enabled, we max out at 11,673,00 MB.
Attachment #529205 -
Flags: review?(dustin)
Comment 8•13 years ago
|
||
Comment on attachment 529205 [details] [diff] [review] Die json! > 11,673,00 MB. I think you mean 1,167,300 MB - not 11GB :) Still, a sizeable savings!
Attachment #529205 -
Flags: review?(dustin) → review+
Comment 9•13 years ago
|
||
For posterity here is a screenshot of a week long memory usage by the masters [1] that currently have this patch deployed since Saturday. The memory usage dropped in all of them by different amounts and some of the masters have gone back up but not as far as they were originally. I am curious to see what this will look like after this week's business days come into play.
Assignee | ||
Updated•13 years ago
|
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•11 years ago
|
Product: mozilla.org → Release Engineering
You need to log in
before you can comment on or make changes to this bug.
Description
•