Closed Bug 546490 Opened 14 years ago Closed 14 years ago

some new linux ix machines hung at grub

Categories

(Release Engineering :: General, defect)

x86
macOS
defect
Not set
major

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 546424

People

(Reporter: bhearsum, Unassigned)


Some of the Linux ix machines from bug 545134 are hung at a 'GRUB' screen. Power cycling doesn't help. Can someone look into this, please?

mv-moz2-linux-ix-slave{01,08} are both in this state, and possibly others.
Dropping severity; no one's at the office right now to look.

What changed?  They were working when I handed them off and survived lots of reboots in the process.
Severity: critical → major
One possibility is that we changed the SATA mode from IDE to AHCI.  The latter has about 50x better drive performance.  But they survived plenty of reboots after that too.
Ben, what steps did you undertake before the reboot? Where did you get the kernel from?
I reverted slave01 back to IDE but got the same results.  At this point it'll have to sit until the morning when someone can boot off a rescue image.

Sounds like GRUB is missing its stage2 image or /boot is gone/broken.  Worst case these will all need to be re-imaged and we don't have the master image onsite that I'm aware of.
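
One way to test the "missing stage2 or broken /boot" theory from a rescue environment is to look for the GRUB legacy files directly. This is a minimal sketch, not a procedure anyone in this bug ran: the function name is hypothetical, the /mnt/sysimage default mount point is an assumption, and it assumes the Red Hat-style layout where the menu file is /boot/grub/grub.conf.

```shell
# Report which GRUB legacy files exist (and are non-empty) under a mounted root.
# The default mount point is an assumption; pass the real one as $1.
check_grub_files() {
    root="${1:-/mnt/sysimage}"
    missing=0
    for f in stage1 stage2 grub.conf; do
        if [ -s "$root/boot/grub/$f" ]; then
            echo "ok: $f"
        else
            echo "MISSING: $f"
            missing=1
        fi
    done
    # Non-zero exit status means at least one file was absent or empty.
    return "$missing"
}
```

Calling check_grub_files with the path where the slave's root filesystem is mounted prints one line per file and returns non-zero if anything is missing. A clean result would not rule out a damaged boot sector, since stage1 is also embedded in the MBR.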
(In reply to comment #3)
> Ben, what steps did you undertake before the reboot? Where did you get the
> kernel from?


Also, bhearsum, are there any recent changes to puppet manifests that might have horked these machines all automatically?
(In reply to comment #3)
> Ben, what steps did you undertake before the reboot? Where did you get the
> kernel from?

I don't know where the kernel came from; Catlee did that work.

> Also, bhearsum, are there any recent changes to puppet manifests that might
> have horked these machines all automatically?

If there's a Puppet change that broke things I'd be very surprised to find any of them working, but I'll certainly have a look.
mrz notes

catlee
	slave01, changed SATA type
	rebooted okay, builds faster
	applied BIOS change across all Linux slaves

	Noticed not seeing all 4GB RAM,
		install PAE kernel
		slave01 - installed PAE
			yum install kernel-pae-i686
			rebooted via console
			manually selected PAE kernel
			no NIC driver
			slave01 was booting by default into PAE kernel with network

		slave8 & 25
			SATA mode changed
			PAE kernel installed, not default
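
Since the notes above turn on which kernel each slave boots by default, a quick way to check is to read the default entry out of the GRUB legacy menu file. This is a rough sketch only: the function name is hypothetical, the /boot/grub/grub.conf path is assumed (Red Hat-style layout), and it handles only a numeric `default=` value, treating anything else (such as `default=saved`) as entry 0.

```shell
# Print the title of the entry that GRUB legacy boots by default.
# Entries are counted from 0 in the order their "title" lines appear.
default_grub_title() {
    conf="${1:-/boot/grub/grub.conf}"
    awk '
        BEGIN { want = 0; n = 0 }
        # Pick up "default=N" (or "default N") and remember N.
        /^[ \t]*default[ \t=]/ {
            sub(/^[ \t]*default[ \t=]+/, "")
            want = $0 + 0
        }
        # Print the title whose position matches the default index.
        /^[ \t]*title[ \t]/ {
            if (n++ == want) {
                sub(/^[ \t]*title[ \t]+/, "")
                print
                exit
            }
        }
    ' "$conf"
}
```

Running default_grub_title on a slave (or against the grub.conf of a mounted root) shows whether the PAE entry or the stock kernel is the default without having to watch the console.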
Assigning to mrz since it is paging.
Assignee: server-ops → mrz
I believe catlee has this under control, punting over.
Assignee: mrz → nobody
Component: Server Operations → Release Engineering
QA Contact: mrz → release
Running 'grub-install /dev/sda' seems to have fixed both slave01 and slave08.
Still not sure what the initial cause was.

There's a rescue .iso image on the desktop of admin.b.m.o.  If you boot the
slave off of that, and then select 'grubdisk' from the first menu, then
"AUTOMAGIC BOOT", and then select the 2.6.18 kernel to boot from, you can then
run 'grub-install /dev/sda' as root.
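
For repeating the last step across several slaves, the command above could be wrapped as follows. This is only a sketch: the function name and the DRY_RUN variable are hypothetical conveniences, not part of the rescue image, and the only real command it runs is the 'grub-install /dev/sda' quoted above. It defaults to a dry run so nothing is written unless DRY_RUN=0 is set explicitly.

```shell
# Reinstall GRUB on the given disk (defaults to /dev/sda, as in this bug).
# DRY_RUN is a hypothetical safety switch; it defaults to 1 (print only).
reinstall_grub() {
    device="${1:-/dev/sda}"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: grub-install $device"
        return 0
    fi
    # grub-install writes to the boot sector, so it needs root.
    if [ "$(id -u)" -ne 0 ]; then
        echo "must be run as root" >&2
        return 1
    fi
    grub-install "$device"
}
```

With the function loaded, calling reinstall_grub /dev/sda prints the command it would run; setting DRY_RUN=0 in a root shell on the booted slave performs the actual install.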
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE
Product: mozilla.org → Release Engineering