This morning (July 2, 2010) pm01 ran out of memory. This master has 3 GB of ESX memory and 509 MB of swap. As a short-term solution I have created a 2 GB swap file (/builds/swapfile) on pm01 and pm03 and added it to /etc/fstab. Longer term, we should run our production masters on a 64-bit OS with more than 4 GB of physical RAM.

Before adding the swap file, pm01 had:

             total       used       free     shared    buffers     cached
Mem:          3043       2567        476          0         53         71
-/+ buffers/cache:       2442        600
Swap:          509        509

I created the swap file with:

dd if=/dev/zero of=/builds/swapfile bs=$((1024 * 1024)) count=$((2 * 1024))
mkswap /builds/swapfile
echo '/builds/swapfile swap swap defaults 0 0' >> /etc/fstab

- The following exceptions (total 1) were detected on production-master01.build.mozilla.org pm01:

Exception in /builds/buildbot/builder_master1/twistd.log.2:
2010-07-02 08:15:12-0700 [Broker,14677,10.250.49.159] Unhandled Error
Traceback (most recent call last):
  File "/tools/buildbot/lib/python2.6/site-packages/buildbot-0.8.0-py2.6.egg/buildbot/status/builder.py", line 1352, in buildFinished
    w.callback(self)
  File "/tools/buildbot/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py", line 238, in callback
    self._startRunCallbacks(result)
  File "/tools/buildbot/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py", line 307, in _startRunCallbacks
    self._runCallbacks()
  File "/tools/buildbot/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py", line 323, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
--- <exception caught here> ---
  File "/tools/buildbot/lib/python2.6/site-packages/buildbot-0.8.0-py2.6.egg/buildbot/status/builder.py", line 2003, in _buildFinished
    w.buildFinished(name, s, results)
  File "/tools/buildbot/lib/python2.6/site-packages/buildbot-0.8.0-py2.6.egg/buildbot/status/mail.py", line 356, in buildFinished
    return self.buildMessage(name, build, results)
  File "/tools/buildbot/lib/python2.6/site-packages/buildbot-0.8.0-py2.6.egg/buildbot/status/tinderbox.py", line 238, in buildMessage
    logEncoding = "base64"
exceptions.MemoryError:
Do you know if it was the builder or the scheduler master process that had grown very large on pm01? And doesn't this just work around the fact that buildbot is growing without bound, or has some sort of leak? Is an upstream ticket filed?
Created attachment 455825 [details] memory usage on pm03 for the last month

The small jump on Week 24 Day 3 is probably m-1.9.1 and m-1.9.2 moving to buildbot 0.8.0. The steady increase that started on Week 24 Day 4 is probably m-c moving to buildbot 0.8.0.
Created attachment 455827 [details] memory usage on pm01 for the last month

A big jump here on Week 24 Day 4, which I think is Thu June 17th. m-c moved to buildbot 0.8.0 that day, going by the pushes to buildbot-configs. It looks like the builder master was restarted at 12:53 PDT today, but memory and swap usage is still high. Current memory usage:

type        scheduler   builder
resident    193M        1.3G
swap        1.3G        43M

So restarting the scheduler would be the big win here. Need a downtime to not miss pushes/sendchanges?
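The per-process resident and swap figures above can be pulled from /proc on Linux. A minimal sketch; in practice the PIDs would come from each master's twistd.pid file, but the shell's own PID is used here so the snippet runs anywhere:

```shell
# Report resident (VmRSS) and swapped-out (VmSwap) memory for one process.
# $$ (this shell) stands in for the scheduler/builder master PIDs, which
# would normally be read from the masters' twistd.pid files.
pid=$$
awk '/^VmRSS:|^VmSwap:/ {printf "%-8s %s %s\n", $1, $2, $3}' /proc/"$pid"/status
```

VmSwap requires a reasonably recent kernel (2.6.34+); on older kernels only VmRSS is reported and the swapped-out portion has to be inferred from the machine-wide numbers.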
pushing up the daisies.