444949 - bm-xserve07 has fts_read problem

Reporter

Description

•

16 years ago

While fixing Bug 443397, I updated tinderbox's code and tried to start it again.
For several cycles, it seems that two instances of tinderbox were running at the same time.
I stopped all instances and decided to remove the obj dir and source dir, but there is one folder that cannot be deleted. There might be some kind of filesystem corruption.

Command run: rm -rf /builds/tinderbox/Tb-Trunk/Darwin_8.8.4_Depend/build/universal/i386/dist/include/dom

It returns this:
rm(315) malloc: *** vm_allocate(size=1069056) failed (error code=3)
rm(315) malloc: *** error: can't allocate region
rm(315) malloc: *** set a breakpoint in szone_error to debug
rm: fts_read: Cannot allocate memory

chizu

Updated

•

16 years ago

Assignee: server-ops → thardcastle

Mark Banner (:standard8)

Comment 1

•

16 years ago

I wonder if there's a way we could force a disk check?

Can we get a rough idea of the work and time it may take to fix this? It was generating our nightlies and we were getting ready to cut 3.0a2.

Nick Thomas [:nthomas] (UTC+12)

Assignee

Comment 2

•

16 years ago

Looks like Trevor rebooted this box shortly after he took the bug. The directory in comment #0 is still broken - a call to ls or rm uses 100% CPU and keeps on grabbing memory until it OOMs and the malloc error is reported.
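
As a minimal sketch of how one might inspect a directory like this without letting the listing grow until it runs out of memory: the helper below reads entries lazily and stops at a cap. The name `probe_directory` and the `max_entries` limit are hypothetical and not part of tinderbox. Whether a lazy read would behave any better than ls or rm on this particular corrupted volume is an assumption that was not tested here.

```python
import os
from itertools import islice


def probe_directory(path, max_entries=10000):
    """Return (names, truncated) for at most max_entries entries of path.

    Entries are read lazily with os.scandir, so the listing is never
    sorted or held in full. If the directory yields more than
    max_entries entries, truncated is True and reading stops.
    """
    names = []
    truncated = False
    with os.scandir(path) as entries:
        # Read one entry past the cap so we can tell "exactly at the cap"
        # apart from "more entries remain".
        for entry in islice(entries, max_entries + 1):
            if len(names) == max_entries:
                truncated = True
                break
            names.append(entry.name)
    return names, truncated
```

A truncated result on a directory that should only hold a few hundred headers would be a hint that the directory structure itself is damaged, rather than that the directory is merely large.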

Nick Thomas [:nthomas] (UTC+12)

Assignee

Updated

•

16 years ago

Assignee: thardcastle → nobody

Component: Server Operations → Release Engineering

Priority: -- → P2

QA Contact: justin → release

Nick Thomas [:nthomas] (UTC+12)

Assignee

Comment 3

•

16 years ago

Trevor didn't mind me grabbing this ;-)

Assignee: nobody → nthomas

Nick Thomas [:nthomas] (UTC+12)

Assignee

Comment 4

•

16 years ago

Disk Utility's Verify Disk says "Keys out of order. The volume Macintosh RAID needs to be repaired". The Repair Disk button is disabled though. It also says the RAID status is OK, so go figure.

Anyway, I can't do much with it while it's booted off the disk that needs repairing, so back to Server Ops for a colo-trip and a boot off CD.

Assignee: nthomas → server-ops

Component: Release Engineering → Server Operations

QA Contact: release → justin

chizu

Updated

•

16 years ago

Flags: colo-trip+

chizu

Updated

•

16 years ago

Severity: critical → major

Justin Fitzhugh

Updated

•

16 years ago

Assignee: server-ops → phong

Philippe M. Chiasson (:gozer)

Comment 5

•

16 years ago

Sounds like the RAID array itself is fine, but the underlying filesystem is corrupted. Not something RAID will protect you against.

Have you tried booting in single-user mode (holding Apple-S down at boot time)?
That will bring you to a running system with / mounted read-only, so you should be able to run a manual fsck, i.e.:

$> fsck -fy

Phong Tran [:phong]

Comment 6

•

16 years ago

Is this server frozen right now? It is not responding with the KVM attached.

Status: NEW → ASSIGNED

Armen [:armenzg]

Reporter

Comment 7

•

16 years ago

I have been able to ssh and VNC into it.

Philippe M. Chiasson (:gozer)

Comment 8

•

16 years ago

Increasing severity, since this is keeping the tree closed. Is there any way I can help? If I had access to the box, I could possibly drive this myself.

Severity: major → critical

Phong Tran [:phong]

Comment 9

•

16 years ago

** /dev/redisk1
** root file system
** checking HFS Plus volume
** checking extents overflow file
** checking Catalog file
   Keys out of order
(4, 22709)
** Rebuilding Catalog B-tree
** The volume Macintosh RAID could not be repaired
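
For anyone scripting this check, a rough sketch of how the final status line of output like the above could be classified. The function name `classify_fsck_output` is hypothetical. Only the "could not be repaired" message appears in this bug; the "was repaired successfully" and "appears to be OK" patterns are assumed from typical fsck_hfs output and should be checked against the fsck version in use.

```python
import re

# Status patterns, checked in order. Only the first message is taken from
# this bug's log; the other two are assumed fsck_hfs wording.
_PATTERNS = [
    ("unrepairable", re.compile(r"The volume (.+?) could not be repaired")),
    ("repaired", re.compile(r"The volume (.+?) was repaired successfully")),
    ("ok", re.compile(r"The volume (.+?) appears to be OK")),
]


def classify_fsck_output(text):
    """Return (status, volume_name) for captured fsck output.

    status is one of "unrepairable", "repaired", "ok", or "unknown".
    volume_name is None when no status line is recognised.
    """
    for status, pattern in _PATTERNS:
        match = pattern.search(text)
        if match:
            return status, match.group(1)
    return "unknown", None
```

Run against the log in this comment, the function would report the volume Macintosh RAID as unrepairable, which is the case where a reimage (or a third-party repair tool) is the remaining option.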

Philippe M. Chiasson (:gozer)

Comment 10

•

16 years ago

Sounds bad. One possibility is to try Disk Warrior (http://www.alsoft.com/DiskWarrior); it has a bootable CD and can often repair more errors than OS X itself can. Worth a shot, IMO.

Phong Tran [:phong]

Comment 11

•

16 years ago

I've booted off the CD and am trying to run the repair that way.

Phong Tran [:phong]

Comment 12

•

16 years ago

Still won't repair from install DVD.

Nick Thomas [:nthomas] (UTC+12)

Assignee

Comment 13

•

16 years ago

We probably have to restore from a cloning image then, if the actual drives are OK. Please check that and then use the Intel/10.4 image.

It's fairly quick to set up tinderbox again.

Phong Tran [:phong]

Comment 14

•

16 years ago

Is there anything I need to save before I wipe it out and reimage this server?

Nick Thomas [:nthomas] (UTC+12)

Assignee

Comment 15

•

16 years ago

Looking now ...

Nick Thomas [:nthomas] (UTC+12)

Assignee

Comment 16

•

16 years ago

I've moved everything we care about to stage.m.o:/tmp/bm-xserve07/, so please go ahead with restoring the clone image.

Phong Tran [:phong]

Comment 17

•

16 years ago

I have images from bm-xserve02 and 03. Which one of those would you like?

Phong Tran [:phong]

Comment 18

•

16 years ago

I also have images from bm-xserve10 and 16.

Nick Thomas [:nthomas] (UTC+12)

Assignee

Comment 19

•

16 years ago

Could we have the one from bm-xserve10, thanks? (x-ref bug 410271 comment #18)

Phong Tran [:phong]

Comment 20

•

16 years ago

Re-imaged with 10.4.8. I also verified the disk and see no errors.

Nick Thomas [:nthomas] (UTC+12)

Assignee

Comment 21

•

16 years ago

Great. You could pass it back to RelEng for tinderbox setup, or we can file a dependent bug.

Nick Thomas [:nthomas] (UTC+12)

Assignee

Comment 22

•

16 years ago

Back to RelEng to finish this off.

Assignee: phong → nobody

Status: ASSIGNED → NEW

Component: Server Operations → Release Engineering

Flags: colo-trip+

QA Contact: justin → release

Nick Thomas [:nthomas] (UTC+12)

Assignee

Updated

•

16 years ago

Assignee: nobody → nthomas

Nick Thomas [:nthomas] (UTC+12)

Assignee

Comment 23

•

16 years ago

Rebuilt tinderbox dirs for Thunderbird and XULRunner, and restarted tinderbox. Will close this once both go green.

Nick Thomas [:nthomas] (UTC+12)

Assignee

Comment 24

•

16 years ago

Builds went green, resolving FIXED. Thanks to IT for doing most of the work here.

Status: NEW → RESOLVED

Closed: 16 years ago

Resolution: --- → FIXED

Nick Thomas [:nthomas] (UTC+12)

Assignee

Comment 25

•

16 years ago

I'd forgotten about the chown stuff [1] so the XULRunner build failed. Fixed that up and forced a clobber, which was green.

[1] http://wiki.mozilla.org/ReferencePlatforms/Mac#chown_scripts

Nobody; OK to take it and work on it

Updated

•

11 years ago

Product: mozilla.org → Release Engineering