HP slaves hung trying to run hp-asrd

RESOLVED FIXED

Status

Infrastructure & Operations
RelOps
6 years ago
5 years ago

People

(Reporter: rail, Assigned: dustin)

Attachments

(1 attachment)

(Reporter)

Description

6 years ago
/var/log/boot.log snippet:

  Using Proliant Standard
 	IPMI based System Health Monitor
  Starting Proliant Standard
 	IPMI based System Health Monitor (hpasmlited): /etc/rc3.d/S91hp-health: line 658:  1525 Segmentation fault      (core dumped) $PNAME $PARGS < /dev/null >> $LOGFILE 2>&1
                                                           [FAILED]

I have to restart the hp-health service to allow hp-asrd to proceed properly.
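
To find affected slaves without reading each boot log by hand, the crash signature in the snippet above can be matched mechanically. The following is a minimal sketch, not something from this bug: the function name `find_init_segfaults` and the regular expression are illustrative assumptions, based only on the one log line quoted here.

```python
import re

# Matches the shell's report of a crashed child inside an init script, e.g.
# "/etc/rc3.d/S91hp-health: line 658:  1525 Segmentation fault ..."
# The pattern is an assumption derived from the single line quoted in this bug.
SEGFAULT_RE = re.compile(
    r"(?P<script>/etc/rc\d\.d/\S+): line (?P<line>\d+):\s+(?P<pid>\d+) Segmentation fault"
)


def find_init_segfaults(log_text):
    """Return (script, line number, pid) for each segfault reported in the log text."""
    return [
        (m.group("script"), int(m.group("line")), int(m.group("pid")))
        for m in SEGFAULT_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    # Reads the log path mentioned in the description; adjust as needed.
    with open("/var/log/boot.log") as handle:
        for script, line, pid in find_init_segfaults(handle.read()):
            print(f"{script} (line {line}) reported a segfault in pid {pid}")
```

A host with any match for `S91hp-health` would be a candidate for the restart workaround described above.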

Nagios complains that buildbot has not started.
(Reporter)

Comment 1

6 years ago
bld-centos6-hp-028 is one of those
This seems like a chronic issue that somebody needs to deal with, not a buildduty one.
Component: Release Engineering → Release Engineering: Platform Support
QA Contact: coop
Whiteboard: [buildduty]
(In reply to Rail Aliiev [:rail] from comment #1)
> bld-centos6-hp-028 is one of those

and bld-centos6-hp-016 too
We could try updating the HP tools.  If that doesn't help, we can have the SREs take a look and open a case with HP if this doesn't seem familiar to them.
All next actions live within IT - moving to relops per comment #4 for hp tool update.
Assignee: nobody → server-ops-releng
Component: Release Engineering: Platform Support → Server Operations: RelEng
QA Contact: coop → arich
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&taskId=110&prodSeriesId=3923526&prodTypeId=18964&objectID=c03562076

This is specifically for Red Hat, but I'm assuming the same will apply to CentOS:

"The root cause is a memory conflict between mcelog and hp-health.

When mcelog loads first, it reserves the 9e00 block of memory as read-only.
When hp-health loads, it attempted to change the 9e00 address to read/write, and this causes a segmentation fault.

The workaround allows for hp-health to finish reading the mmap before mcelog loads, and this resolves the problem.

Solution
HP-Health does not need read/write access of the memory, and this bug has been resolved in the 9.1.2 hp-health package (hp-health-9.1.2.4-3.rhel6.x86_64.rpm)"
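
For checking whether a given hp-health package is at or past the release the advisory names, a rough comparison can be sketched as below. This is an illustration only: `hp_health_version` and `has_fix` are hypothetical helpers, and the comparison looks at the dotted version alone rather than implementing full RPM version ordering.

```python
import re

# The release HP's advisory names as containing the fix.
FIXED_VERSION = (9, 1, 2)


def hp_health_version(rpm_name):
    """Extract the dotted version from a name like
    'hp-health-9.1.2.4-3.rhel6.x86_64.rpm' as a tuple of integers."""
    match = re.match(r"hp-health-(\d+(?:\.\d+)*)-", rpm_name)
    if match is None:
        raise ValueError(f"not an hp-health package name: {rpm_name!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def has_fix(rpm_name):
    """True if the package version is at least the release named in the advisory.

    Simplification: ignores the RPM release field and epoch.
    """
    return hp_health_version(rpm_name) >= FIXED_VERSION
```

The same functions accept the output of `rpm -q hp-health`, which has the same shape without the `.rpm` suffix.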
Hm, though further investigation on bld-centos6-hp-028 shows that mcelog isn't even installed.  Is there a machine that's showing this issue now that I can look at to debug?
Assignee: server-ops-releng → arich
It turns out that we installed the hp-health tools in bug 733648 and aren't actually using them for anything.  HP's support of CentOS doesn't include RPMs with this particular issue fixed, and I am loath to try to shoehorn the RHEL RPMs (which may or may not have the fix) onto these systems and possibly break something else.  Since we definitely want to phase out the HPs anyway, I think the best solution here is to just remove the hp-health packages altogether.

Handing this over to Dustin for the puppet work.
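
The actual removal was done through puppet, and the contents of that patch are not shown in this bug. As a rough manual equivalent for a single host, the sketch below builds the corresponding `yum` command. The helper name `removal_command` is hypothetical, and only `hp-health` is listed because it is the only package named in this bug.

```python
# Only hp-health is named in this bug; any related packages would be assumptions.
PACKAGES = ["hp-health"]


def removal_command(packages):
    """Build a non-interactive yum command that removes the given packages."""
    if not packages:
        raise ValueError("no packages given")
    return ["yum", "-y", "remove", *packages]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(removal_command(PACKAGES)))
```

Printing rather than executing keeps the sketch safe to run on a build slave; the printed command would need root to take effect.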
Assignee: arich → dustin
Attachment #721289 - Flags: review? → review?(arich)
Attachment #721289 - Flags: review?(arich) → review+
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations