Closed
Bug 795795
(bld-lion-r5-052)
Opened 12 years ago
Closed 7 years ago
bld-lion-r5-052 problem tracking
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: nthomas, Unassigned)
References
Details
(Whiteboard: [problemtracking])
Nagios says PING OK but not responsive to ssh or other nagios checks on buildbot or disk. Power cycled on the PDU.
Reporter
Comment 1•12 years ago
Back online.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment 2•12 years ago
Blew up this morning - the two examples I have from its last fourteen failed runs are https://tbpl.mozilla.org/php/getParsedLog.php?id=16627221&tree=Fx-Team and https://tbpl.mozilla.org/php/getParsedLog.php?id=16641088&tree=Mozilla-Inbound where both creating and removing files gets a blank stare and an "invalid argument."
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 3•12 years ago
Time for a re-image. Pulled it out of the pool.
Comment 4•12 years ago
Back in the production pool. Fingers crossed.
Updated•12 years ago
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 5•12 years ago
Machine is dead again and I can't reboot it via the pdu.
Comment 6•12 years ago
This host is in scl3; we can get it back up on the next trip to scl3. Please open a separate bug for scl3 reboots.
Thanks.
Comment 7•12 years ago
Reboot was a success, but the patient still should be dead:
https://tbpl.mozilla.org/php/getParsedLog.php?id=16999342&tree=Mozilla-Inbound
tar: FirefoxNightlyDebug.app/Contents/MacOS/webapprt/components: Write error
Comment 8•12 years ago
Conveniently, according to buildapi and nagios, it did die after burning two jobs.
Comment 9•12 years ago
Then someone brought it back, and it burned https://tbpl.mozilla.org/php/getParsedLog.php?id=17114143&tree=Mozilla-Inbound
Comment 10•12 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=17120542&tree=Mozilla-Inbound
IOError: [Errno 5] Input/output error still mostly smells like disk, though exercising bad memory while writing or reading will smell that way when it's really RAM.
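For anyone scripting triage of failures like this one, the errno on the exception is a steadier signal than the message text. A minimal sketch, assuming Python 3 (where IOError is an alias of OSError); the helper name `looks_like_io_fault` is hypothetical and not part of any tool mentioned in this bug. EIO only says the kernel failed a read or write, so it cannot tell disk from RAM:

```python
import errno


def looks_like_io_fault(exc: OSError) -> bool:
    """Return True when the error is EIO ("Input/output error", errno 5).

    Hypothetical helper: EIO means the kernel could not complete the
    read or write, which points at hardware, not at the job itself.
    """
    return exc.errno == errno.EIO


if __name__ == "__main__":
    err = OSError(errno.EIO, "Input/output error")
    print(looks_like_io_fault(err))
```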
Comment 11•12 years ago
disabled in slavealloc
Reporter
Comment 12•12 years ago
I've rebooted this slave in the hopes it'll stop flapping on nagios' buildbot check.
Updated•12 years ago
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Reporter
Comment 13•12 years ago
This slave hasn't done any jobs since 2012-11-16, per comment #11. Since then it's been reimaged in bug 827845. I'm not confident that will have fixed up the issues, because bug 807692 was an even earlier reimage. If we do want to try it again then puppet needs fixing.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 14•12 years ago
Reimaged after diagnostics found corrupted files.
Puppeted up, added keys, reenabled, rebooted.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 15•12 years ago
Not sure why, but this slave didn't make it back to the production pool.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 16•12 years ago
Apr 3 07:06:20 bld-lion-r5-052 org.mozilla.build.buildslave[467]: Error sending notice to nagios (ignored)
Apr 3 07:06:20 bld-lion-r5-052 org.mozilla.build.buildslave[467]: dyld: Library not loaded: @executable_path/../.Python
Apr 3 07:06:20 bld-lion-r5-052 org.mozilla.build.buildslave[467]: Referenced from: /tools/buildbot-0.8.4-pre-moz2/bin/python
Apr 3 07:06:20 bld-lion-r5-052 org.mozilla.build.buildslave[467]: Reason: image not found
Apr 3 07:06:20 bld-lion-r5-052 com.apple.launchd.peruser.501[179] (org.mozilla.build.buildslave[467]): Exited with code: 251
Apr 3 07:06:20 bld-lion-r5-052 ReportCrash[469]: Saved crash report for python[468] version ??? (???) to /Users/cltbld/Library/Logs/DiagnosticReports/python_2013-04-03-070620_bld-lion-r5-052.crash
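The dyld message above says the interpreter could not find `.Python` one directory up from its own location. A minimal sketch of a check for that condition, assuming an old-style virtualenv layout with the interpreter under `bin/`; the function name `missing_dot_python` is hypothetical:

```python
import os


def missing_dot_python(venv_python: str) -> bool:
    """Return True when '<venv root>/.Python' is absent or a dangling link.

    '@executable_path/../.Python' resolves to the parent of the directory
    that holds the interpreter, so that is where we look.
    """
    bin_dir = os.path.dirname(os.path.abspath(venv_python))
    venv_root = os.path.dirname(bin_dir)
    # os.path.exists() follows symlinks, so a broken link also counts as missing.
    return not os.path.exists(os.path.join(venv_root, ".Python"))


if __name__ == "__main__":
    # Path taken from the log lines above.
    print(missing_dot_python("/tools/buildbot-0.8.4-pre-moz2/bin/python"))
```

A True result matches the "image not found" reason in the log; it does not say why the link went missing.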
Comment 17•12 years ago
http://debugfix.com/2011/11/dyld-library-loaded-executable_path-python/ says:
"You’re trying to use a virtual environment created on a different computer, or you’ve upgraded / reformated your machine and you’re using the virtenv from your backup of the old machine."
It also says the solution is to recreate the virtualenv. This doesn't sound right. It makes me think that the machine got the wrong image or something. The reimaging did happen over 2 months ago, so let's try again.
Comment 18•12 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=21434464&tree=Mozilla-Inbound (disconnect while cloning) - is this in production while being worked on?
Comment 19•12 years ago
(In reply to Phil Ringnalda (:philor) from comment #18)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=21434464&tree=Mozilla-Inbound
> (disconnect while cloning) - is this in production while being worked on?
Coop, you were poking at this.
Flags: needinfo?(coop)
Comment 20•12 years ago
(In reply to Phil Ringnalda (:philor) from comment #18)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=21434464&tree=Mozilla-Inbound
> (disconnect while cloning) - is this in production while being worked on?
Not on purpose. Looks like it was re-enabled in slavealloc before I had fixed the auto-login issue, but not by me. Was it enabled the whole time?
Flags: needinfo?(coop)
Updated•12 years ago
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 21•12 years ago
(In reply to Chris Cooper [:coop] from comment #20)
> (In reply to Phil Ringnalda (:philor) from comment #18)
> > https://tbpl.mozilla.org/php/getParsedLog.php?id=21434464&tree=Mozilla-Inbound
> > (disconnect while cloning) - is this in production while being worked on?
>
> Not on purpose. Looks like it was re-enabled in slavealloc before I had
> fixed the auto-login issue, but not by me. Was it enabled the whole time?
Looks like I either forgot to disable it, or re-enabled it before discovering the python issue. Sorry!
Comment 22•12 years ago
Two consecutive checktest runs have failed on this slave.
https://tbpl.mozilla.org/php/getParsedLog.php?id=21558666&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=21555506&tree=Mozilla-Inbound
Comment 23•12 years ago
disabled in slavealloc
Reporter
Updated•12 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 24•12 years ago
Going to try this one out in staging for a bit:
http://dev-master01.build.scl1.mozilla.com:8044/buildslaves/bld-lion-r5-052
Comment 25•12 years ago
lipo: can't write to output file: obj-firefox/i386/dist/universal/test-package-stage/xpcshell/tests/xpcom/tests/unit/TestTimers (Bad address)
/builds/slave/m-cen-osx64-000000000000000000/build/build/macosx/universal/unify: lipo create fat failed for:
obj-firefox/i386/dist/test-package-stage/xpcshell/tests/xpcom/tests/unit/TestTimers,
obj-firefox/x86_64/dist/test-package-stage/xpcshell/tests/xpcom/tests/unit/TestTimers,
obj-firefox/i386/dist/universal/test-package-stage/xpcshell/tests/xpcom/tests/unit/TestTimers
Can't exec "rm": Input/output error at /builds/slave/m-cen-osx64-000000000000000000/build/build/macosx/universal/unify line 265.
That's...not good.
Updated•12 years ago
Whiteboard: [buildduty][buildslaves][capacity] → [buildduty][buildslaves][capacity][needs diagnostics]
Comment 26•12 years ago
Hard drive has been replaced in bug 885396. Back in production.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Assignee
Updated•12 years ago
Product: mozilla.org → Release Engineering
Comment 27•11 years ago
Attempting SSH reboot...Failed.
Attempting PDU reboot...Failed.
Filed IT bug for reboot (bug 1055444)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
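The SSH, then PDU, then IT-bug sequence in the comment above can be sketched as an ordered fallback. This is an illustration only, not the actual reboot tooling: `reboot_with_escalation` and the callables passed to it are hypothetical, and the real SSH and PDU commands are not shown in this bug.

```python
from typing import Callable, Iterable, Optional, Tuple

# A step is a label plus a callable that takes a hostname and reports success.
RebootStep = Tuple[str, Callable[[str], bool]]


def reboot_with_escalation(host: str, steps: Iterable[RebootStep]) -> Optional[str]:
    """Try each reboot method in order and return the first label that works.

    Returns None when every method fails, which is the cue to file an IT bug.
    """
    for name, attempt in steps:
        print(f"Attempting {name} reboot...", end="")
        try:
            ok = attempt(host)
        except Exception:
            # Treat any error from a step as a failed attempt and move on.
            ok = False
        print("Success." if ok else "Failed.")
        if ok:
            return name
    return None


if __name__ == "__main__":
    # Stand-in callables; both fail, as they did for this machine.
    steps = [("SSH", lambda h: False), ("PDU", lambda h: False)]
    if reboot_with_escalation("bld-lion-r5-052", steps) is None:
        print("File IT bug for reboot")
```

Passing the steps in as callables keeps the ordering logic separate from whatever actually talks to the host or the PDU.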
Updated•11 years ago
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → FIXED
Comment 28•10 years ago
mozinstall failures
https://treeherder.mozilla.org/logviewer.html#?job_id=12739116&repo=mozilla-inbound
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 29•10 years ago
Attempting SSH reboot...Failed.
Attempting PDU reboot...Failed.
Filed IT bug for reboot (bug 1194615)
Updated•10 years ago
Status: REOPENED → RESOLVED
Closed: 11 years ago → 10 years ago
Resolution: --- → FIXED
Updated•8 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•8 years ago
Whiteboard: [buildduty][buildslaves][capacity][needs diagnostics] → [problemtracking]
Updated•7 years ago
Priority: P3 → --
Comment 30•7 years ago
Back online after re-image.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 7 years ago
Resolution: --- → FIXED
Updated•7 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard