Closed Bug 795893 (b-linux64-hp-0028) Opened 12 years ago Closed 10 years ago

b-linux64-hp-0028 problem tracking

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P3)

x86_64
Linux

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: rail, Unassigned)

References

Details

(Whiteboard: [buildduty][buildslaves][capacity])

Per slavealloc it's "firmware patched, out of rotation for bug 779487". Neither the host itself nor its PDU (bld-centos6-hp-009-mgmt.build.mozilla.org) is responding.
Depends on: 807381
Loaned to dgherman in bug 708381 for crazy testing.
(In reply to Ben Hearsum [:bhearsum] from comment #1)
> Loaned to dgherman in bug 708381 for crazy testing.

Oops, meant bug 807381.
Now it's back, taking Android builds and failing them with:

INFO: copying /home/cltbld/.android to /builds/mock_mozilla/mozilla-centos6-i386/root/builds/.android
ERROR: [Errno 2] No such file or directory: '/home/cltbld/.android'
Traceback (most recent call last):
  File "/usr/sbin/mock_mozilla", line 862, in <module>
    main(retParams)
  File "/usr/sbin/mock_mozilla", line 823, in main
    shutil.copy(src, dest)
  File "/usr/lib64/python2.6/shutil.py", line 84, in copy
    copyfile(src, dst)
  File "/usr/lib64/python2.6/shutil.py", line 50, in copyfile
    with open(src, 'rb') as fsrc:
IOError: [Errno 2] No such file or directory: '/home/cltbld/.android'
Disabled in slavealloc.
It seems to have re-enabled itself, and is happily burning builds that it can't upload.
Callek says he re-disabled it.
arr, do we have a reference image/machine for this class of machines?
New machines don't have reference images.  They are completely managed via puppet once they do a basic kickstart.
So, I confirmed it is puppetizing correctly. I also noticed that new-puppet has no reference to .android right now, while the old centos5 puppet does: http://mxr.mozilla.org/build/source/puppet-manifests/os/centos.pp#73

I suspect this was intentional (for the same reason new puppet doesn't contain ssh keys), but I can't find any docs that say to add these files. rail, any insight (being someone involved with setting up ec2 slaves, which are based on this same puppet image)?
Flags: needinfo?(rail)
We don't manage the slave secrets yet; that's bug 792836. You need to copy ssh keys and android stuff (.android and .mozpass.cfg) from another slave.
Flags: needinfo?(rail)
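A quick pre-flight check along these lines could catch the missing files before the slave is re-enabled. This is only a sketch: the helper name `missing_secrets` is hypothetical, it is written for Python 3 (the slave itself ran Python 2.6), and it only looks for the two files named above, since the ssh key paths are not given in this bug.

```python
import os

# Files that, per the comment above, are not managed by puppet yet and
# have to be copied from another slave. ssh keys are left out because
# their exact paths are not stated in this bug.
REQUIRED_SECRETS = (".android", ".mozpass.cfg")


def missing_secrets(home):
    """Return the required paths that do not exist under `home` (hypothetical helper)."""
    return [
        os.path.join(home, name)
        for name in REQUIRED_SECRETS
        if not os.path.exists(os.path.join(home, name))
    ]


if __name__ == "__main__":
    missing = missing_secrets("/home/cltbld")
    if missing:
        print("Do not enable yet; missing: " + ", ".join(missing))
    else:
        print("Secrets present.")
```

Running it as cltbld on the slave before flipping the slavealloc flag would have flagged the same path that mock_mozilla later failed on.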
Enabled in slavealloc after I verified that it did indeed not have ~cltbld/.android/ and ~cltbld/.mozpass.cfg.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Buildbot is not running on this host, and hasn't been since this morning:

2013-02-04 07:50:29-0800 [Broker,client]   argv: ['python', 'tools/buildfarm/maintenance/count_and_reboot.py', '-f', '../reboot_count.txt', '-n', '1', '-z']
2013-02-04 07:50:29-0800 [Broker,client]  environment: {'LANG': 'en_US.UTF-8', 'CCACHE_HASHDIR': '', 'TERM': 'linux', 'SHELL': '/bin/bash', 'SHLVL': '1', 'HOSTNAME': 'bld-centos6-hp-009.build.scl1.mozilla.com', 'G_BROKEN_FILENAMES': '1', 'HISTSIZE': '1000', 'HISTCONTROL': 'ignoredups', 'PWD': '/builds/slave/m-cen-lnx-l10n-ntly', 'LOGNAME': 'cltbld', 'USER': 'cltbld', 'MAIL': '/var/spool/mail/cltbld', 'PATH': '/usr/local/bin:/usr/lib64/ccache:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/cltbld/bin', 'LESSOPEN': '|/usr/bin/lesspipe.sh %s', 'HOME': '/home/cltbld', '_': '/tools/buildbot/bin/python'}
2013-02-04 07:50:29-0800 [Broker,client]   using PTY: False
2013-02-04 07:50:37-0800 [-] Received SIGTERM, shutting down.
2013-02-04 07:50:37-0800 [-] stopCommand: halting current command <buildslave.commands.shell.SlaveShellCommand instance at 0x29a53b0>
2013-02-04 07:50:37-0800 [-] command interrupted, attempting to kill
2013-02-04 07:50:37-0800 [-] trying to kill process group 53952
2013-02-04 07:50:37-0800 [-]  signal 9 sent successfully
2013-02-04 07:50:37-0800 [Broker,client] lost remote
...
2013-02-04 07:50:37-0800 [Broker,client] lost remote step
2013-02-04 07:50:37-0800 [Broker,client] Lost connection to buildbot-master13.build.scl1.mozilla.com:9001
2013-02-04 07:50:37-0800 [Broker,client] Stopping factory <buildslave.bot.BotFactory instance at 0x2a24c68>
2013-02-04 07:50:37-0800 [-] Main loop terminated.
2013-02-04 07:50:37-0800 [-] Server Shut Down.
[cltbld@bld-centos6-hp-009 ~]$ date
Mon Feb  4 10:39:12 PST 2013
[cltbld@bld-centos6-hp-009 ~]$ uptime
 10:39:15 up  2:47,  1 user,  load average: 0.00, 0.00, 0.00

Which means Buildbot somehow didn't come up properly after boot. (The host went down for a normal reboot.)
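For the record, the arithmetic behind that conclusion can be sketched as follows. The timestamps are taken from the log and shell output above; `boot_time` is just an illustrative helper, and `uptime` only has minute resolution, so the result is approximate.

```python
from datetime import datetime, timedelta


def boot_time(now, uptime):
    """Estimate when the host booted from the current time and its uptime."""
    return now - uptime


# Values copied from the output above (all PST).
now = datetime(2013, 2, 4, 10, 39, 15)       # time of the `uptime` call
up = timedelta(hours=2, minutes=47)          # "up  2:47"
last_log = datetime(2013, 2, 4, 7, 50, 37)   # "Server Shut Down."

booted = boot_time(now, up)
print(booted)             # 2013-02-04 07:52:15
print(booted - last_log)  # 0:01:38
```

So the host came back roughly two minutes after Buildbot shut down, yet nothing was logged afterwards, which is consistent with Buildbot never starting on that boot.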

[cltbld@bld-centos6-hp-009 ~]$ facter fqdn
bld-centos6-hp-009.build.scl1.mozilla.com

Puppet Dashboard says its last run was successful (2013-02-04 07:34 PST).

A fresh manual reboot fixed it.
Product: mozilla.org → Release Engineering
Depends on: 939408
No space left on device. Disabled in slavealloc.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 1001518
This got re-enabled by someone on April 25.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 10 years ago
Resolution: --- → FIXED
Alias: bld-centos6-hp-009 → b-linux64-hp-0028
Summary: bld-centos6-hp-009 problem tracking → b-linux64-hp-0028 problem tracking
https://tbpl.mozilla.org/php/getParsedLog.php?id=47948225&tree=Mozilla-Aurora

Error: unable to free 20.00 GB of space. Free space only 18.66 GB

Disabled.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
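The same threshold can be checked by hand before re-enabling the slave. This is a minimal sketch and not the script that produced the error above: the helper names are made up, it needs Python 3.3+ for `shutil.disk_usage`, and treating "GB" as 1024**3 bytes is an assumption.

```python
import shutil

GB = 1024 ** 3  # assumption: the log message may use a different definition of GB


def free_gb(path):
    """Free space on the filesystem containing `path`, in GB."""
    return shutil.disk_usage(path).free / GB


def enough_space(free, required):
    """True if `free` GB satisfies a requirement of `required` GB."""
    return free >= required


if __name__ == "__main__":
    # Numbers from the log above: 18.66 GB free against a 20.00 GB requirement.
    print(enough_space(18.66, 20.00))  # False
    print("free here: %.2f GB" % free_gb("."))
```

On the slave, passing the builds directory to `free_gb` would show whether a cleanup has actually brought it back over the 20 GB requirement.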
Cleaned up for chemspills, re-enabled.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Please do not re-enable this slave. We are retiring linux hardware build slaves in bug 1106922.
Blocks: 1106922
Resolution: FIXED → WONTFIX
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard