Closed Bug 795795 (bld-lion-r5-052) Opened 12 years ago Closed 7 years ago

bld-lion-r5-052 problem tracking

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Unassigned)

References

Details

(Whiteboard: [problemtracking])

Nagios says PING OK but not responsive to ssh or other nagios checks on buildbot or disk. Power cycled on the PDU.
Back online.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Blew up this morning - the two examples I have from its last fourteen failed runs are https://tbpl.mozilla.org/php/getParsedLog.php?id=16627221&tree=Fx-Team and https://tbpl.mozilla.org/php/getParsedLog.php?id=16641088&tree=Mozilla-Inbound where both creating and removing files get a blank stare and an "invalid argument."
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Time for a re-image. Pulled it out of the pool.
Depends on: 807692
Back in the production pool. Fingers crossed.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Machine is dead again and I can't reboot it via the pdu.
Status: RESOLVED → REOPENED
Depends on: 809024
Resolution: FIXED → ---
This host is in scl3, we can get it back up next trip to scl3. Please open a separate bug for scl3 reboots. Thanks.
Depends on: 810524
No longer depends on: 809024
Reboot was a success, but the patient still should be dead: https://tbpl.mozilla.org/php/getParsedLog.php?id=16999342&tree=Mozilla-Inbound
tar: FirefoxNightlyDebug.app/Contents/MacOS/webapprt/components: Write error
Conveniently, according to buildapi and nagios, it did die after burning two jobs.
https://tbpl.mozilla.org/php/getParsedLog.php?id=17120542&tree=Mozilla-Inbound IOError: [Errno 5] Input/output error still mostly smells like disk, though exercising bad memory while writing or reading will smell that way when it's really RAM.
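One rough way to gather more evidence for the disk-or-RAM question above is a write-then-read-back check on the suspect volume. The following is only a sketch under stated assumptions: `write_and_verify` is a hypothetical helper, not a tool used in this bug, and the block count and size are arbitrary. The read-back may be served from the OS page cache rather than the platters, so an "ok" result proves little, while an I/O error or a checksum mismatch is a useful hint that proper hardware diagnostics are needed.

```python
import hashlib
import os


def write_and_verify(path, blocks=64, block_size=1 << 20):
    """Write random data to path, read it back, and compare SHA-256 digests.

    Returns "ok", "mismatch", or "io-error: <details>".
    Hypothetical helper for illustration; not part of any Mozilla tooling.
    """
    written = hashlib.sha256()
    try:
        with open(path, "wb") as fh:
            for _ in range(blocks):
                chunk = os.urandom(block_size)
                written.update(chunk)
                fh.write(chunk)
            # Push the data out of Python and OS buffers as far as we can.
            fh.flush()
            os.fsync(fh.fileno())

        read_back = hashlib.sha256()
        with open(path, "rb") as fh:
            for chunk in iter(lambda: fh.read(block_size), b""):
                read_back.update(chunk)
    except OSError as exc:
        # Errors such as "[Errno 5] Input/output error" land here.
        return "io-error: %s" % exc

    return "ok" if written.digest() == read_back.digest() else "mismatch"
```

Running it repeatedly against a scratch file on the build partition, and again after a reboot, would give a slightly better picture than a single pass.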
disabled in slavealloc
I've rebooted this slave in the hopes it'll stop flapping on nagios' buildbot check.
Depends on: 827845
Depends on: 835756
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
This slave hasn't done any jobs since 2012-11-16, per comment #11. Since then it's been reimaged in bug 827845. I'm not confident that will have fixed up the issues, because bug 807692 was an even earlier reimage. If we do want to try it again then puppet needs fixing.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reimaged after diagnostics found corrupted files. Puppeted up, added keys, reenabled, rebooted.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Not sure why, but this slave didn't make it back to the production pool.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Apr 3 07:06:20 bld-lion-r5-052 org.mozilla.build.buildslave[467]: Error sending notice to nagios (ignored)
Apr 3 07:06:20 bld-lion-r5-052 org.mozilla.build.buildslave[467]: dyld: Library not loaded: @executable_path/../.Python
Apr 3 07:06:20 bld-lion-r5-052 org.mozilla.build.buildslave[467]: Referenced from: /tools/buildbot-0.8.4-pre-moz2/bin/python
Apr 3 07:06:20 bld-lion-r5-052 org.mozilla.build.buildslave[467]: Reason: image not found
Apr 3 07:06:20 bld-lion-r5-052 com.apple.launchd.peruser.501[179] (org.mozilla.build.buildslave[467]): Exited with code: 251
Apr 3 07:06:20 bld-lion-r5-052 ReportCrash[469]: Saved crash report for python[468] version ??? (???) to /Users/cltbld/Library/Logs/DiagnosticReports/python_2013-04-03-070620_bld-lion-r5-052.crash
http://debugfix.com/2011/11/dyld-library-loaded-executable_path-python/ says: "You’re trying to use a virtual environment created on a different computer, or you’ve upgraded / reformated your machine and you’re using the virtenv from your backup of the old machine." It also says the solution is to recreate the virtualenv. This doesn't sound right... it makes me think that the machine got the wrong image or something. The reimaging did happen over 2 months ago, so let's try again.
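The dyld message refers to `@executable_path/../.Python`, which for `/tools/buildbot-0.8.4-pre-moz2/bin/python` resolves to a `.Python` entry at the root of the virtualenv. A quick way to see whether that entry is absent or points at something that no longer exists is sketched below. `python_link_status` is a hypothetical helper, and it assumes the entry is the usual symlink that older virtualenv versions created on OS X; it does not check whether the target is the right Python build.

```python
import os


def python_link_status(env_root):
    """Classify the .Python entry at the root of a virtualenv.

    Returns "missing" if there is no such entry, "dangling" if it is a
    symlink whose target does not exist, and "ok" otherwise.
    Hypothetical helper for illustration only.
    """
    link = os.path.join(env_root, ".Python")
    if not os.path.lexists(link):
        # Nothing at all at <env_root>/.Python
        return "missing"
    if not os.path.exists(link):
        # The symlink itself exists but its target does not.
        return "dangling"
    return "ok"
```

For this slave the call would be `python_link_status("/tools/buildbot-0.8.4-pre-moz2")`. Either "missing" or "dangling" would be consistent with the "image not found" reason in the log, and would point toward recreating the virtualenv or reimaging again.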
Depends on: 857585
Depends on: 858040
https://tbpl.mozilla.org/php/getParsedLog.php?id=21434464&tree=Mozilla-Inbound (disconnect while cloning) - is this in production while being worked on?
(In reply to Phil Ringnalda (:philor) from comment #18)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=21434464&tree=Mozilla-Inbound
> (disconnect while cloning) - is this in production while being worked on?
Coop, you were poking at this.
Flags: needinfo?(coop)
(In reply to Phil Ringnalda (:philor) from comment #18)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=21434464&tree=Mozilla-Inbound
> (disconnect while cloning) - is this in production while being worked on?
Not on purpose. Looks like it was re-enabled in slavealloc before I had fixed the auto-login issue, but not by me. Was it enabled the whole time?
Flags: needinfo?(coop)
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
(In reply to Chris Cooper [:coop] from comment #20)
> (In reply to Phil Ringnalda (:philor) from comment #18)
> > https://tbpl.mozilla.org/php/getParsedLog.php?id=21434464&tree=Mozilla-Inbound
> > (disconnect while cloning) - is this in production while being worked on?
>
> Not on purpose. Looks like it was re-enabled in slavealloc before I had
> fixed the auto-login issue, but not by me. Was it enabled the whole time?
Looks like I either forgot to disable it, or re-enabled it before discovering the python issue. Sorry!
disabled in slavealloc
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 882500
lipo: can't write to output file: obj-firefox/i386/dist/universal/test-package-stage/xpcshell/tests/xpcom/tests/unit/TestTimers (Bad address)
/builds/slave/m-cen-osx64-000000000000000000/build/build/macosx/universal/unify: lipo create fat failed for: obj-firefox/i386/dist/test-package-stage/xpcshell/tests/xpcom/tests/unit/TestTimers, obj-firefox/x86_64/dist/test-package-stage/xpcshell/tests/xpcom/tests/unit/TestTimers, obj-firefox/i386/dist/universal/test-package-stage/xpcshell/tests/xpcom/tests/unit/TestTimers
Can't exec "rm": Input/output error at /builds/slave/m-cen-osx64-000000000000000000/build/build/macosx/universal/unify line 265.
That's...not good.
Whiteboard: [buildduty][buildslaves][capacity] → [buildduty][buildslaves][capacity][needs diagnostics]
Depends on: 885396
Hard drive has been replaced in bug 885396. Back in production.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Attempting SSH reboot...Failed. Attempting PDU reboot...Failed. Filed IT bug for reboot (bug 1055444)
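The message above follows a fixed escalation order: SSH reboot, then PDU reboot, then an IT bug. A minimal sketch of that ladder is shown here, assuming each step is supplied as a callable that returns a truthy value on success. `reboot_with_escalation` and the step callables are hypothetical names for illustration; the actual tooling that produced this message is not shown in this bug.

```python
def reboot_with_escalation(steps):
    """Try each (name, action) pair in order until one succeeds.

    Each action is a zero-argument callable returning True on success.
    Returns (name_of_successful_step_or_None, list_of_log_lines).
    Hypothetical sketch; not the real reboot tooling.
    """
    log = []
    for name, action in steps:
        try:
            ok = bool(action())
        except Exception:
            # Treat any error from a step as a failed attempt and move on.
            ok = False
        log.append("Attempting %s...%s" % (name, "Success" if ok else "Failed"))
        if ok:
            return name, log
    # Every automated step failed; the caller should escalate to a human,
    # for example by filing an IT bug for a manual reboot.
    return None, log
```

With two failing steps, such as `[("SSH reboot", lambda: False), ("PDU reboot", lambda: False)]`, the function returns `None` as the winner, which corresponds to the "Filed IT bug for reboot" outcome in this comment.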
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attempting SSH reboot...Failed. Attempting PDU reboot...Failed. Filed IT bug for reboot (bug 1194615)
Status: REOPENED → RESOLVED
Closed: 11 years ago → 10 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [buildduty][buildslaves][capacity][needs diagnostics] → [problemtracking]
Priority: P3 → --
Back online after re-image.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 7 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard