Closed Bug 759167 Opened 12 years ago Closed 12 years ago

Please run diagnostics on bld-lion-r5-045

Product :: Component: Infrastructure & Operations :: DCOps
Type: task
Platform: x86_64, macOS
Priority: Not set
Severity: normal
Tracking: Not tracked
Status: RESOLVED FIXED
Reporter: coop
Assigned: vinh
Whiteboard: scl3

This slave has failed many integrity checks lately while working with archive files (specifically bz2). Note that this is happening again *after* it was re-imaged just last week.

Can we please run some diagnostics on it, and possibly send it for repair?
Assignee: server-ops-releng → server-ops
Component: Server Operations: RelEng → Server Operations: DCOps
QA Contact: arich → dmoore
chris,

do you need to back up any data on this host? it appears there is an issue with the hard drive and we would like to replace the drive and re-image the host. please let me know when we can proceed.


thanks,
van
(In reply to Van Le from comment #1) 
> do you need to back up any data on this host? it appears there is an issue
> with the hard drive and we would like to replace the drive and re-image the
> host. please let me know when we can proceed.

No, I don't. The only thing I was worried about was the ssh keys on the disk, and I've already removed those.
Chris,
Can I reboot the bld-lion-r5-045 to run the diagnostics right now?
Assignee: server-ops → vhua
Status: NEW → ASSIGNED
(In reply to Vinh Hua [:vinh] from comment #3)
> Can I reboot the bld-lion-r5-045 to run the diagnostics right now?

The machine is disabled in our build environment, so diagnose away!
Whiteboard: scl3
Chris,
I've run the comprehensive diagnostics twice on this Mac Mini. The hardware shows up as "passed".
Chris,
Please see comment #5.

Also, I've noticed from the error log that you had included in bug 758605:

bzip2: Data integrity error when decompressing.
	Input file = Contents/MacOS/XUL.bz2, output file = Contents/MacOS/XUL

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

bzip2: Deleting output file Contents/MacOS/XUL, if it exists.
Couldn't decompress "Contents/MacOS/XUL" at /builds/slave/m-cen-osx64-l10n-ntly/build/mozilla-central/tools/update-packaging/unwrap_full_update.pl line 64.
program finished with exit code 2


Could it be that the file itself is corrupted?
The nature of l10n repacks is that many slaves download the same file and repack the language strings in it. If the archive file itself were corrupt, I would expect all repacks to fail.
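For anyone who wants to check that reasoning on a single host, here is a minimal sketch (not part of the build tooling; `classify_archive` and `payload_digest` are hypothetical names made up for illustration). It decompresses the same bz2 bytes several times and compares digests of the output. A genuinely corrupt archive should fail the same way on every run, while a machine with flaky hardware would be expected to give results that differ between runs. Note the assumption that the bytes are already in memory, so this does not exercise repeated reads from the disk itself.

```python
import bz2
import hashlib
from typing import Callable, Optional


def payload_digest(data: bytes, decompress: Callable[[bytes], bytes]) -> Optional[str]:
    """Return the SHA-256 of the decompressed payload, or None if decompression fails."""
    try:
        return hashlib.sha256(decompress(data)).hexdigest()
    except (OSError, ValueError):
        # bz2.decompress raises OSError for invalid data and
        # ValueError when the stream ends before its end-of-stream marker.
        return None


def classify_archive(
    data: bytes,
    runs: int = 5,
    decompress: Callable[[bytes], bytes] = bz2.decompress,
) -> str:
    """Decompress the same bytes `runs` times and compare the outcomes.

    "ok"           -> every run succeeded with an identical digest
    "corrupt"      -> every run failed (points at the archive itself)
    "inconsistent" -> outcomes differed between runs (points at the host)
    """
    digests = {payload_digest(data, decompress) for _ in range(runs)}
    if len(digests) > 1:
        return "inconsistent"
    if digests == {None}:
        return "corrupt"
    return "ok"


if __name__ == "__main__":
    sample = bz2.compress(b"XUL" * 10000)
    print(classify_archive(sample))        # intact archive
    print(classify_archive(sample[:-8]))   # truncated archive
```

The `decompress` parameter exists only so a failing decompressor can be substituted when trying the sketch out; on a real slave the default `bz2.decompress` would be used against the downloaded file's contents.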
Chris,
Sorry for the delay on this. Is this still problematic?  If so, I'll just send it to Apple and see what they diagnose. I will also need to power down "bld-lion-r5-046.build.releng.scl3" in order to disassemble the chassis.
(In reply to Vinh Hua [:vinh] from comment #8)
> Sorry for the delay on this. Is this still problematic?  If so I'll just
> send it to Apple and see what they diagnose. Will also need to power down
> "bld-lion-r5-046.build.releng.scl3" in order to disassemble the chassis.

The slave has been out of service since this bug was filed pending diagnostic resolution.

I've now disabled bld-lion-r5-046 as well so that bld-lion-r5-045 can be unracked.
Mac Mini has been sent to Apple for repair.  Here's the bug ticket for tracking:

https://bugzilla.mozilla.org/show_bug.cgi?id=768122
Ben, it just came back from repair yesterday. It should be in SCL3 by Monday.
Just got the Mac Mini back today from the desktop team.  Will rack it back in the chassis in the next 2 hours.

FYI - The OS was probably erased by Apple during repair.
When you power it on, please hold down the N key so it re-installs the OS.
Both "bld-lion-r5-045" and "bld-lion-r5-046" have been racked and powered on.  It looks like the OS is still intact, but per Amy's instructions I will netboot "bld-lion-r5-045" to re-install the OS.

Here is the note that I got back from desktop team:

"Returned from Computer Care and placed on Jake's desk. Unit passed all diags and performed to factory specs."
bld-lion-r5-045 has finished rebuilding itself.  Let me know if the original issue still occurs.
Hi all,
Just wanted to check if this bug can be closed?
Sent for repair, reimaged.  Thanks Vinh!
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Infrastructure & Operations