Closed
Bug 444949
Opened 16 years ago
Closed 16 years ago
bm-xserve07 has fts_read problem
Categories
(Release Engineering :: General, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: armenzg, Assigned: nthomas)
References
()
Details
I was fixing Bug 443397, in which I updated tinderbox's code and tried to start it again. For several cycles, it seems that two instances of tinderbox were running at the same time. I stopped all instances and decided to remove the obj dir and source dir, but there is one folder that cannot be deleted. There might be some type of corruption.

Command run:
rm -rf /builds/tinderbox/Tb-Trunk/Darwin_8.8.4_Depend/build/universal/i386/dist/include/dom

It returns this:
rm(315) malloc: *** vm_allocate(size=1069056) failed (error code=3)
rm(315) malloc: *** error: can't allocate region
rm(315) malloc: *** set a breakpoint in szone_error to debug
rm: fts_read: Cannot allocate memory
Comment 1•16 years ago
I wonder if there's a way we could force a disk check? Can we get a rough idea of the work and time it may take to fix this? It was generating our nightlies and we were getting ready to cut 3.0a2.
Assignee
Comment 2•16 years ago
Looks like Trevor rebooted this box shortly after he took the bug. The directory in comment #0 is still broken - a call to ls or rm uses 100% CPU and keeps on grabbing memory until it OOMs and the malloc error is reported.
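For anyone who wants to probe a directory like this without letting the tool eat all available memory, a bounded scan is one option. The following is a minimal Python sketch and was not used on this box: `count_entries` and its `limit` parameter are hypothetical names, and it assumes the corruption shows up as a listing that never terminates, which this bug does not confirm. A read that blocks inside the kernel would still hang.

```python
import os
from itertools import islice


def count_entries(path, limit=100_000):
    """Count directory entries lazily, giving up after `limit` entries.

    Returns (count, truncated). `truncated` is True when more than `limit`
    entries were available, which for a normal build directory hints that
    the listing may not be terminating.
    """
    # os.scandir yields entries one at a time instead of building the
    # whole list in memory, so the scan stays cheap even if it is cut off.
    with os.scandir(path) as entries:
        # Read at most one entry beyond the limit to detect overflow.
        seen = sum(1 for _ in islice(entries, limit + 1))
    return min(seen, limit), seen > limit
```

Calling `count_entries` on the broken `dom` directory with a modest limit would, under the assumption above, return with `truncated` set instead of running until the process is out of memory.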
Assignee
Updated•16 years ago
Assignee: thardcastle → nobody
Component: Server Operations → Release Engineering
Priority: -- → P2
QA Contact: justin → release
Assignee
Comment 4•16 years ago
Disk Utility's Verify Disk says "Keys out of order. The volume Macintosh RAID needs to be repaired". The Repair Disk button is disabled though. It also says the RAID status is OK, so go figure. Anyway, I can't do much with it while it's booted off the disk that needs repairing, so back to Server Ops for a colo-trip and a boot off CD.
Assignee: nthomas → server-ops
Component: Release Engineering → Server Operations
QA Contact: release → justin
Updated•16 years ago
Assignee: server-ops → phong
Comment 5•16 years ago
Sounds like the RAID array itself is fine, but the underlying filesystem is corrupted. Not something RAID will protect you against. Have you tried booting in single-user mode (holding Apple-S down at boot time)? That will bring you to a running system with / mounted read-only, so you should be able to run a manual fsck, i.e.:

$> fsck -fy
Comment 6•16 years ago
Is this server frozen right now? It is not responding with the KVM attached.
Status: NEW → ASSIGNED
Reporter
Comment 7•16 years ago
I have been able to ssh and VNC to it.
Comment 8•16 years ago
Increasing severity; this is keeping the tree closed. Is there any way I can help? If I had access to the box, I could possibly drive this myself.
Severity: major → critical
Comment 9•16 years ago
** /dev/redisk1
** root file system
** checking HFS Plus volume
** checking extents overflow file
** checking Catalog file
Keys out of order (4, 22709)
** Rebuilding Catalog B-tree
** The volume Macintosh RAID could not be repaired
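If a check like this were ever scripted rather than read by eye, the final verdict line is the part worth extracting. Below is a minimal Python sketch of that idea. `summarize_fsck` is a hypothetical helper, not an existing tool. Only the "could not be repaired" wording comes from the output above; the "repaired successfully" and "appears to be OK" patterns are assumptions about how fsck phrases its other outcomes.

```python
import re

# Verdict patterns, checked in this order for each line.
# Only the first pattern is taken from the output in this bug;
# the other two are assumed wordings for the remaining outcomes.
_VERDICTS = [
    ("unrepairable", re.compile(r"The volume (.+?) could not be repaired")),
    ("repaired", re.compile(r"The volume (.+?) was repaired successfully")),
    ("ok", re.compile(r"The volume (.+?) appears to be OK")),
]


def summarize_fsck(output):
    """Return (verdict, volume_name) for the first verdict line found.

    Falls back to ("unknown", None) when no known verdict line is present.
    """
    for line in output.splitlines():
        for verdict, pattern in _VERDICTS:
            match = pattern.search(line)
            if match:
                return verdict, match.group(1)
    return "unknown", None
```

Fed the output in this comment, the helper reports the volume Macintosh RAID as unrepairable, which is the signal that a reimage or a third-party repair tool is the next step.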
Comment 10•16 years ago
Sounds bad. One possibility is to try DiskWarrior (http://www.alsoft.com/DiskWarrior); it's got a bootable CD and can often repair more errors than OS X itself can. Worth a shot, IMO.
Comment 11•16 years ago
I've booted off the CD and am trying to run the repair that way.
Comment 12•16 years ago
Still won't repair from install DVD.
Assignee
Comment 13•16 years ago
We probably have to restore from a cloning image then, if the actual drives are OK. Please check that and then use the Intel/10.4 image. It's fairly quick to set up tinderbox again.
Comment 14•16 years ago
Is there anything I need to save before I wipe it out and reimage this server?
Assignee
Comment 15•16 years ago
Looking now ...
Assignee
Comment 16•16 years ago
I've moved everything we care about to stage.m.o:/tmp/bm-xserve07/, so please go ahead with restoring the clone image.
Comment 17•16 years ago
I have images from bm-xserve02 and 03. Which one of those would you like?
Comment 18•16 years ago
I also have images from bm-xserve10 and 16.
Assignee
Comment 19•16 years ago
Could we have the one from bm-xserve10, thanks? (x-ref bug 410271 comment #18)
Comment 20•16 years ago
Re-imaged with 10.4.8. I also verified the disk and see no errors.
Assignee
Comment 21•16 years ago
Great. You could pass it back to RelEng for tinderbox setup, or we can file a dependent bug.
Assignee
Comment 22•16 years ago
Back to RelEng to finish this off.
Assignee: phong → nobody
Status: ASSIGNED → NEW
Component: Server Operations → Release Engineering
Flags: colo-trip+
QA Contact: justin → release
Assignee
Updated•16 years ago
Assignee: nobody → nthomas
Assignee
Comment 23•16 years ago
Rebuilt tinderbox dirs for Thunderbird and XULRunner, and restarted tinderbox. Will close this once both go green.
Assignee
Comment 24•16 years ago
Builds went green, resolving FIXED. Thanks to IT for doing most of the work here.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Assignee
Comment 25•16 years ago
I'd forgotten about the chown stuff [1], so the XULRunner build failed. Fixed that up and forced a clobber, which was green.

[1] http://wiki.mozilla.org/ReferencePlatforms/Mac#chown_scripts
Updated•11 years ago
Product: mozilla.org → Release Engineering