Closed Bug 922783 (bld-lion-r5-050) Opened 11 years ago Closed 7 years ago

bld-lion-r5-050 problem tracking

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P3)

x86_64
macOS

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Unassigned)

References

Details

(Whiteboard: [buildduty][buildslaves][capacity])

ping check failing, trying a pdu reboot
Back in production.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Re-image requested.
Depends on: 958648
Back in production.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 10 years ago
Resolution: --- → FIXED
https://tbpl.mozilla.org/php/getParsedLog.php?id=33932642&tree=Mozilla-Inbound (and, by the time anyone reboots it, hundreds of others)

rm: tools: Input/output error

Might want some diagnostics on that disk.

Disabled in slavealloc, even though that won't do anything.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 974068
Enabled after diagnostics in bug 974068.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Perma-retrying; disabled in slavealloc

https://tbpl.mozilla.org/php/getParsedLog.php?id=38008207&tree=Mozilla-Inbound

rm: tools/trychooser/index.html: Invalid argument
rm: tools/trychooser/jquery-ui.css: Invalid argument
rm: tools/trychooser/jquery.min.js: Invalid argument
rm: tools/trychooser/trychooser.css: Invalid argument
rm: tools/trychooser/trychooser.js: Invalid argument
rm: tools/trychooser/tryload.js: Invalid argument
rm: tools/trychooser: Directory not empty
rm: tools: Directory not empty
program finished with exit code 1
elapsedTime=2086.710640
========= Finished clobber build tools failed (results: 5, elapsed: 34 mins, 46 secs) (at 2014-04-17 06:04:57.571328) =========
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 997977
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
QA Contact: armenzg → bugspam.Callek
Resolution: --- → FIXED
Attempting SSH reboot...Failed.
Attempting PDU reboot...Failed.
Filed IT bug for reboot (bug 1067698)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Being the only slave that hits bug 1130905 sounds more than fishy to me.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The fact that we continue to allow mozharness tests to just fill up slave disks with trash until they can no longer run, and then reimage the slave to allow them to run again, seems more than fishy to me.

Nevertheless, temporarily reenabled this slave to get it a job which runs tests, to see whether the mozharness fix from bug 1130905 will cause it to properly output the old message which will tell us that the test has completely crapped it up and it has to be reimaged.
So Jordan and I debugged that a bit and found the following problem:

[root@bld-lion-r5-050.build.releng.scl3.mozilla.com ~]# ls -la /Volumes/Stub
total 0
d--x--x--x  3 cltbld  admin  102  8 Feb 14:40 .
drwxrwxrwt@ 4 root    admin  136 11 Feb 13:36 ..
drwx------  4 root    admin  136  8 Feb 14:40 .Spotlight-V100

This folder was mounted by root, so it's clear that our build process, which does not run as root, cannot mount anything.

Also, as Jordan wrote on IRC, it was not a current mount:

<jlund|buildduty> I don't think this is a mount
<jlund|buildduty> hdiutil: eject failed - No such file or directory
<jlund|buildduty> tries rm -rf
<jlund|buildduty> [root@bld-lion-r5-050.build.releng.scl3.mozilla.com Volumes]# rm -rf Stub
<jlund|buildduty> success

So the mozinstall problem should be fixed now. If it manifests again, please let us know immediately. In that case it would be good to know how this mount was initially created and by whom.
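
For anyone who wants to catch a leftover like /Volumes/Stub before it breaks a job, here is a rough sketch of the check that was done by hand above. It is not part of mozinstall or any releng tooling; the function names are made up for illustration, and it assumes a leftover shows up as a plain directory under /Volumes that is no longer an active mount point. It only reports entries, it does not delete anything.

```python
import os
import pwd
import stat


def describe_volume_entry(path):
    """Report whether `path` is a live mount point, plus its owner and mode."""
    info = os.lstat(path)
    try:
        owner = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        # The uid has no passwd entry; fall back to the numeric id.
        owner = str(info.st_uid)
    return {
        "path": path,
        "is_mount": os.path.ismount(path),
        "owner": owner,
        "mode": stat.filemode(info.st_mode),
    }


def find_stale_entries(volumes_dir="/Volumes"):
    """Return directories under `volumes_dir` that are not active mount points."""
    stale = []
    for name in sorted(os.listdir(volumes_dir)):
        path = os.path.join(volumes_dir, name)
        # Skip symlinks (e.g. the boot volume link on macOS) and plain files.
        if os.path.islink(path) or not os.path.isdir(path):
            continue
        entry = describe_volume_entry(path)
        if not entry["is_mount"]:
            stale.append(entry)
    return stale


if __name__ == "__main__" and os.path.isdir("/Volumes"):
    for entry in find_stale_entries():
        print("{mode} {owner} {path} (not a mount)".format(**entry))
```

An entry reported here that is owned by a different user than the one running the build, or that has no read/write bits, would match what was seen on this slave.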
Re-enabling. CC'ing myself to check back before closing.
Flags: needinfo?(jlund)
last 7 jobs are green
Status: REOPENED → RESOLVED
Closed: 10 years ago → 9 years ago
Flags: needinfo?(jlund)
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Depends on: 1263641
Resolution: FIXED → ---
Back to the pool after re-image.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 8 years ago
Resolution: --- → FIXED
Back online after being re-imaged.
Depends on: 1333145
This machine appears to be having memory issues:
build-script-build(56444,0x102004000) malloc: *** error for object 0x7f9c2c01c608: incorrect checksum for freed object - object was probably modified after being freed.
from
https://treeherder.mozilla.org/logviewer.html#?job_id=114117969&repo=comm-central&lineNumber=2594
Status: RESOLVED → UNCONFIRMED
Ever confirmed: false
Resolution: FIXED → ---
Re-imaged and back online
Status: UNCONFIRMED → RESOLVED
Closed: 8 years ago → 7 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard