Closed Bug 576601 Opened 14 years ago Closed 13 years ago

Add more memory to PM01/03

Categories

(Release Engineering :: General, defect, P3)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: jhford, Unassigned)

Details

(Whiteboard: [buildmasters])

Attachments

(2 files)

This morning (July 2, 2010) PM01 ran out of memory [1]. This master has 3 GB of ESX memory and 509 MB of swap. As a short-term solution I have created a 2 GB swap file (/builds/swapfile) on pm01 and pm03 and added it to /etc/fstab. Longer term, we should run our production masters on a 64-bit OS with more than 4 GB of physical RAM.

Before the swap file was added, pm01 had:
             total       used       free     shared    buffers     cached
Mem:          3043       2567        476          0         53         71
-/+ buffers/cache:       2442        600
Swap:          509        509
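For anyone reading the numbers above, here is a minimal sketch of how they can be interpreted, assuming the listing is classic `free`-style output in megabytes. The function names `parse_free` and `summarize` are hypothetical helpers, not part of any tool on the masters.

```python
# Sample text copied from the comment above (units assumed to be MB).
FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:          3043       2567        476          0         53         71
-/+ buffers/cache:       2442        600
Swap:          509        509
"""


def parse_free(text):
    """Return {'Mem': {...}, 'Swap': {...}} from classic free-style output."""
    lines = text.strip().splitlines()
    columns = lines[0].split()  # total, used, free, shared, buffers, cached
    rows = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        if label not in ("Mem", "Swap"):
            continue  # skip the derived "-/+ buffers/cache" row
        values = [int(v) for v in rest.split()]
        # zip stops at the shorter sequence, so short rows are fine
        rows[label] = dict(zip(columns, values))
    return rows


def summarize(rows):
    """Compute memory held by applications and the share of swap in use."""
    mem, swap = rows["Mem"], rows["Swap"]
    # Memory used by applications = used - buffers - cached
    app_used = mem["used"] - mem["buffers"] - mem["cached"]
    return {
        "app_used": app_used,
        "app_used_pct": round(100.0 * app_used / mem["total"], 1),
        "swap_used_pct": round(100.0 * swap["used"] / swap["total"], 1),
    }


if __name__ == "__main__":
    print(summarize(parse_free(FREE_OUTPUT)))
```

The computed application usage (2443) differs by one from the 2442 printed in the listing, presumably because of rounding to megabytes. The point is the same either way: roughly 80% of RAM was held by applications and swap was completely full.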

I created the swap file with:

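# Write 2048 blocks of 1 MiB of zeros (2 GiB total) to the new swap file
dd if=/dev/zero of=/builds/swapfile bs=$((1024 * 1024)) count=$((2 * 1024))
# Format the file as swap space
mkswap /builds/swapfile
# Register the swap file so it is enabled at boot
echo '/builds/swapfile swap swap defaults 0 0' >> /etc/fstab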
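As a quick check of the sizes involved, the arithmetic in the `dd` invocation can be sketched as below. `dd_output_bytes` is a hypothetical helper. The combined total assumes the new file is used alongside the existing 509 MB of swap reported earlier.

```python
MIB = 1024 * 1024


def dd_output_bytes(block_size, count):
    """Bytes written by dd for a given bs= and count= (full blocks only)."""
    return block_size * count


# bs=$((1024 * 1024)) count=$((2 * 1024)) from the commands above
swap_bytes = dd_output_bytes(block_size=MIB, count=2 * 1024)
swap_mib = swap_bytes // MIB

# Assumption: the existing 509 MB of swap stays enabled alongside the new file.
original_swap_mib = 509
total_swap_mib = original_swap_mib + swap_mib

if __name__ == "__main__":
    print("swap file: %d bytes (%d MiB)" % (swap_bytes, swap_mib))
    print("total swap once enabled: about %d MiB" % total_swap_mib)
```

The usable size is slightly smaller than the file, since mkswap reserves the first page for its header. The fstab entry takes effect at boot or on `swapon -a`; the commands listed do not show that step.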




[1] - The following exceptions (total 1) were detected on production-master01.build.mozilla.org pm01:

Exception in /builds/buildbot/builder_master1/twistd.log.2:
2010-07-02 08:15:12-0700 [Broker,14677,10.250.49.159] Unhandled Error
        Traceback (most recent call last):
          File "/tools/buildbot/lib/python2.6/site-packages/buildbot-0.8.0-py2.6.egg/buildbot/status/builder.py", line 1352, in buildFinished
            w.callback(self)
          File "/tools/buildbot/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py", line 238, in callback
            self._startRunCallbacks(result)
          File "/tools/buildbot/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py", line 307, in _startRunCallbacks
            self._runCallbacks()
          File "/tools/buildbot/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py", line 323, in _runCallbacks
            self.result = callback(self.result, *args, **kw)
        --- <exception caught here> ---
          File "/tools/buildbot/lib/python2.6/site-packages/buildbot-0.8.0-py2.6.egg/buildbot/status/builder.py", line 2003, in _buildFinished
            w.buildFinished(name, s, results)
          File "/tools/buildbot/lib/python2.6/site-packages/buildbot-0.8.0-py2.6.egg/buildbot/status/mail.py", line 356, in buildFinished
            return self.buildMessage(name, build, results)
          File "/tools/buildbot/lib/python2.6/site-packages/buildbot-0.8.0-py2.6.egg/buildbot/status/tinderbox.py", line 238, in buildMessage
            logEncoding = "base64"
        exceptions.MemoryError:
OS: Mac OS X → All
Do you know whether it was the builder or the scheduler master process that had grown very large on pm01?

And doesn't this just work around the fact that buildbot is growing without bound or has some sort of leak? Has an upstream ticket been filed?
The small jump in Week 24 Day 3 is probably m-1.9.1 and m-1.9.2 moving to buildbot 0.8.0.

The steady increase that started on Week 24 Day 4 is probably m-c moving to buildbot 0.8.0.
A big jump here on Week 24 Day 4, which I think is Thu June 17th. m-c moved to buildbot 0.8.0 that day, going by the pushes to buildbot-configs.

It looks like the builder master was restarted at 12:53 PDT today but memory and swap usage is still high. Current memory usage:
 type     scheduler      builder
 resident   193M          1.3G      
 swap       1.3G           43M

So restarting the scheduler would be the big win here. Would we need a downtime so that we don't miss pushes/sendchanges?
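A per-process breakdown like the one above can be collected on Linux from /proc/<pid>/status. This is a minimal sketch, not the method used for the table: `parse_status_kb` and `read_process_memory` are hypothetical helpers, the sample values are invented for illustration, and the `VmSwap` field is only reported by newer kernels, so it may be absent on older hosts.

```python
def parse_status_kb(status_text, fields=("VmRSS", "VmSwap")):
    """Extract selected fields (in kB) from the text of /proc/<pid>/status."""
    result = {}
    for line in status_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in fields:
            parts = rest.split()  # e.g. ["2048", "kB"]
            if parts:
                result[name] = int(parts[0])
    return result


def read_process_memory(pid):
    """Read resident and swapped sizes for one process (Linux only)."""
    with open("/proc/%d/status" % pid) as f:
        return parse_status_kb(f.read())


if __name__ == "__main__":
    # Invented sample, not data from pm01.
    sample = "Name:\ttwistd\nVmRSS:\t    2048 kB\nVmSwap:\t    1024 kB\n"
    print(parse_status_kb(sample))
```

Calling `read_process_memory` with the PID of each master process would give the resident and swapped sizes side by side, which shows which restart frees the most swap.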
pushing up the daisies.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
Product: mozilla.org → Release Engineering