Closed Bug 982440 Opened 12 years ago Closed 12 years ago

releng jenkins is down

Categories

(Release Engineering :: General, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Unassigned)

Details

Lots of messages in /var/log/kern.1 on March 7 about oomkill, and no jenkins process running. It doesn't want to start up again:

$ sudo start jenkins
sudo: unable to resolve host ip-10-134-48-37
start: Job failed to start

/var/log/upstart/jenkins.log has:

/proc/self/fd/9: 4: /proc/self/fd/9: /bin/maintain-plugins.sh: not found

Possibly http://stackoverflow.com/questions/10133964/jenkins-failed-to-start-in-linux
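For anyone cross-checking, a minimal sketch of how the OOM-killer entries in a kernel log could be listed and tallied per killed process. The function names `oom_events` and `oom_victims` are hypothetical, and the grep/sed patterns are assumptions about the kernel's message wording ("invoked oom-killer", "Out of memory: Kill process ..."), which varies between kernel versions.

```shell
#!/bin/sh
# Sketch only: helper names and message patterns are assumptions,
# not taken from this bug.

oom_events() {
    # $1: path to a kernel log, e.g. /var/log/kern.1
    grep -E 'invoked oom-killer|Out of memory: Kill' "$1"
}

oom_victims() {
    # Count OOM kills per process name, most frequent first.
    # Assumes lines shaped like "Out of memory: Kill process 1234 (cc1) ..."
    oom_events "$1" |
        sed -n 's/.*Out of memory: Kill[a-z]* process [0-9]* (\([^)]*\)).*/\1/p' |
        sort | uniq -c | sort -rn
}
```

Something like `oom_victims /var/log/kern.1` would then show which processes were being killed most often, assuming the log uses that wording.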
I've edited /etc/init/jenkins.conf to comment out the call to maintain-plugins.sh, and jenkins is back up. Leaving open for the experts to cross check.
Promptly oomed again, killing off cc1 processes (compiling for buildbot instances in test-masters.sh?). First instance was Mar 7 07:17:02 UTC, after:

Mar 7, 2014 7:11:31 AM hudson.triggers.SCMTrigger$Runner run
INFO: SCM changes detected in tools_tests. Triggering #572
Mar 7, 2014 7:11:33 AM hudson.triggers.SCMTrigger$Runner run
INFO: SCM changes detected in buildbot-configs_tests. Triggering #1731
Mar 7, 2014 7:16:40 AM hudson.triggers.SCMTrigger$Runner run
INFO: SCM changes detected in Balrog. Triggering #34
Mar 7, 2014 7:16:40 AM hudson.triggers.SCMTrigger$Runner run
INFO: SCM changes detected in Balrog Snippet Comparison. Triggering #52

Trying turning the "# of executors" down to 1.
And oomed again in buildbot-configs_tests, so it's definitely related to test-masters.sh. Maybe fallout from http://hg.mozilla.org/build/tools/rev/272a20b083d0 adding another master in parallel? I'm grasping ... halp!
I added some swap and changed the instance type to m3.medium.
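For reference, a rough sketch of one common way to add a swap file on Linux. The comment above does not say how the swap was actually added, so the path `/swapfile`, the 1024 MB size, and the helper names `run` and `add_swap` are all assumptions for illustration. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch only: path, size, and helper names are assumptions,
# not the values used on the releng jenkins instance.
SWAPFILE="${SWAPFILE:-/swapfile}"
SIZE_MB="${SIZE_MB:-1024}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

add_swap() {
    run dd if=/dev/zero of="$SWAPFILE" bs=1M count="$SIZE_MB"  # allocate the file
    run chmod 600 "$SWAPFILE"                                  # root-only access
    run mkswap "$SWAPFILE"                                     # write swap signature
    run swapon "$SWAPFILE"                                     # enable it now
}
```

Calling `add_swap` prints the four commands. Setting `DRY_RUN=0` and running as root would execute them. Swap enabled this way does not persist across reboots unless an entry is also added to /etc/fstab.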
Been fine since comment #4. Reopen if I'm telling porkie pies.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED