Closed Bug 982440 Opened 10 years ago Closed 10 years ago

releng jenkins is down

Categories

(Release Engineering :: General, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Unassigned)

Details

Lots of messages in /var/log/kern.1 on March 7 about oomkill, and no jenkins process running.
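To confirm how often the OOM killer fired, the kernel log can be counted rather than eyeballed. A minimal sketch, assuming the log uses the usual kernel phrasing ("invoked oom-killer", "Out of memory"); the function name count_oom_events is made up for illustration and is not part of any tool on the box.

```shell
#!/bin/sh
# Hypothetical helper: count lines in a kernel log that mention the OOM killer.
# Assumes the usual kernel phrasing ("invoked oom-killer", "Out of memory").
count_oom_events() {
    # grep -c prints the number of matching lines; -i ignores case,
    # -E enables the alternation. grep exits non-zero when nothing matches,
    # so '|| true' keeps callers running under 'set -e' alive.
    grep -ciE 'invoked oom-killer|out of memory' "$1" || true
}

# Usage (path taken from the report above):
#   count_oom_events /var/log/kern.1
```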

Doesn't want to start up again:
 $ sudo start jenkins
 sudo: unable to resolve host ip-10-134-48-37
 start: Job failed to start

/var/log/upstart/jenkins.log has
 /proc/self/fd/9: 4: /proc/self/fd/9: /bin/maintain-plugins.sh: not found

Possibly
 http://stackoverflow.com/questions/10133964/jenkins-failed-to-start-in-linux
I've edited /etc/init/jenkins.conf to comment out the call to maintain-plugins.sh, and jenkins is back up. Leaving this open for the experts to cross-check.
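An alternative to commenting the call out is to guard it, since Upstart runs script stanzas with sh -e and a missing helper then aborts the whole job. A sketch of such a guard, not what is actually in jenkins.conf; run_if_present is a hypothetical name, and the helper path is the one from the log.

```shell
#!/bin/sh
# Hypothetical guard: run a helper script only if it exists and is executable.
# Otherwise print a warning to stderr and return success, so a missing helper
# does not abort startup.
run_if_present() {
    helper="$1"
    shift
    if [ -x "$helper" ]; then
        "$helper" "$@"
    else
        echo "warning: $helper not found or not executable, skipping" >&2
    fi
}

# Usage, with the path from the log above:
#   run_if_present /bin/maintain-plugins.sh
```

If the helper exists but fails, its exit status is still passed through, so real errors are not hidden.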
Promptly oomed again, with the OOM killer taking out cc1 processes (compiling for the buildbot instances in test-masters.sh?).

First instance was Mar 7 07:17:02 UTC, after:

Mar 7, 2014 7:11:31 AM hudson.triggers.SCMTrigger$Runner run
INFO: SCM changes detected in tools_tests. Triggering  #572
Mar 7, 2014 7:11:33 AM hudson.triggers.SCMTrigger$Runner run
INFO: SCM changes detected in buildbot-configs_tests. Triggering  #1731
Mar 7, 2014 7:16:40 AM hudson.triggers.SCMTrigger$Runner run
INFO: SCM changes detected in Balrog. Triggering  #34
Mar 7, 2014 7:16:40 AM hudson.triggers.SCMTrigger$Runner run
INFO: SCM changes detected in Balrog Snippet Comparison. Triggering  #52

Trying to turn the "# of executors" down to 1.
And it oomed again in buildbot-configs_tests, so it's definitely related to test-masters.sh. Maybe fallout from
 http://hg.mozilla.org/build/tools/rev/272a20b083d0
adding another master in parallel? I'm grasping ... halp!
I added some swap and changed the instance type to m3.medium.
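For reference, adding a swap file on Linux usually comes down to four commands. The sketch below only prints them so they can be reviewed first; the path and size are placeholders, not the values used on this instance, and print_swap_commands is a made-up name.

```shell
#!/bin/sh
# Sketch only: print the commands that would create and enable a swap file.
# The path and size are placeholders, not values from this bug.
print_swap_commands() {
    swapfile="$1"   # e.g. /swapfile (hypothetical)
    size_mb="$2"    # e.g. 1024 (hypothetical)
    echo "dd if=/dev/zero of=$swapfile bs=1M count=$size_mb"
    echo "chmod 600 $swapfile"
    echo "mkswap $swapfile"
    echo "swapon $swapfile"
}

# Review the output, then run the commands as root:
#   print_swap_commands /swapfile 1024
```

Swap enabled this way does not survive a reboot unless it is also listed in /etc/fstab.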
Been fine since comment #4. Reopen if I'm telling porkie pies.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED