Closed
Bug 848681
Opened 12 years ago
Closed 12 years ago
[socorro-prod] python script hanging on abrt.socket
Categories
(Infrastructure & Operations Graveyard :: WebOps: Other, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: rhelmer, Assigned: cturra)
One of the Socorro cron jobs (that's been running for years ;)) has developed an issue within the last month: the scripts it calls hang on the following call (from strace):
connect(4, {sa_family=AF_FILE, path="/var/run/abrt/abrt.socket"}, 27
I dug up https://partner-bugzilla.redhat.com/show_bug.cgi?id=614752 but maybe someone more qualified than me can take a look pls?
Thanks!
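For anyone triaging a similar hang, here is a minimal sketch of a probe that attempts the same connect but with a timeout, so it reports a status instead of blocking forever. The socket path is the one from the strace output; the function name `probe_unix_socket` and the 2-second timeout are only illustrative choices. On Linux a busy listener may show up as `error: EAGAIN` rather than `timeout`, so treat either as "the daemon is not accepting connections".

```python
import errno
import socket

# Path taken from the strace output in this bug.
ABRT_SOCKET = "/var/run/abrt/abrt.socket"


def probe_unix_socket(path, timeout=2.0):
    """Try to connect to a Unix stream socket and return a short status string."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    # With a timeout set, connect() cannot block indefinitely.
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return "connected"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # For example ENOENT (no socket file) or ECONNREFUSED (nobody listening).
        return "error: " + errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()


if __name__ == "__main__":
    print(probe_unix_socket(ABRT_SOCKET))
```

Running this on the affected host while the cron job is stuck should show whether the socket file exists and whether anything is accepting connections on it.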
Reporter
Comment 1•12 years ago
This is still happening, any ideas?
Reporter
Updated•12 years ago
Severity: normal → critical
Assignee
Comment 2•12 years ago
grabbing the bug and dropping severity so it stops paging on-call.
Assignee: server-ops-webops → cturra
Severity: critical → major
Comment 3•12 years ago
(In reply to Robert Helmer [:rhelmer] from comment #1)
> This is still happening, any ideas?
Can you supply a hostname where you have experienced this? Some quick research points to a socket closing down, or not being available. I'm curious if this could be related to nic driver issues we have on some servers.
Reporter
Comment 4•12 years ago
(In reply to Rick Bryce [:rbryce] from comment #3)
> (In reply to Robert Helmer [:rhelmer] from comment #1)
> > This is still happening, any ideas?
>
> Can you supply a hostname where you have experienced this? Some quick
> research points to a socket closing down, or not being available. I'm
> curious if this could be related to nic driver issues we have on some
> servers.
The host in question for bug 836671 is sp-admin01.phx1.mozilla.com
Comment 5•12 years ago
(In reply to Robert Helmer [:rhelmer] from comment #4)
> (In reply to Rick Bryce [:rbryce] from comment #3)
> > (In reply to Robert Helmer [:rhelmer] from comment #1)
> > > This is still happening, any ideas?
> >
> > Can you supply a hostname where you have experienced this? Some quick
> > research points to a socket closing down, or not being available. I'm
> > curious if this could be related to nic driver issues we have on some
> > servers.
>
> The host in question for bug 836671 is sp-admin01.phx1.mozilla.com
sp-admin01 has the dreaded bnx2x driver. I can't confirm, but I might bet real money the older firmware on the nic is to blame. There is an update we can perform, but it would require downtime and reboots. Again, I am not certain this is the issue, but some research into the error message points to networking.
Assignee
Comment 6•12 years ago
:rbryce that's along the lines of what i was thinking also.
*note: abrt is a daemon that watches for application crashes and collects/reports on these.
*funny note: this is a crash report (abrt) issue for our crash reporting service (socorro) o_O
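One possible reason a Python script would touch abrt.socket at all: abrt's Python add-on is commonly understood to install a handler for uncaught exceptions via `sys.excepthook`, which then reports to the daemon over that socket. That is an assumption about this host, not something confirmed in this bug. A rough sketch for checking whether the interpreter's hook has been replaced (both helper names are made up for illustration):

```python
import sys


def excepthook_is_default():
    """Return True if sys.excepthook is still the interpreter's built-in hook."""
    return sys.excepthook is sys.__excepthook__


def describe_excepthook():
    """Return 'module.name' for the currently installed excepthook."""
    hook = sys.excepthook
    module = getattr(hook, "__module__", None) or "unknown"
    name = getattr(hook, "__name__", repr(hook))
    return module + "." + name


if __name__ == "__main__":
    print(excepthook_is_default(), describe_excepthook())
```

If the hook turns out to be non-default, the hang would point at an uncaught exception in the cron script being reported, which is worth chasing separately from the socket problem itself.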
Assignee
Comment 7•12 years ago
:rbryce - could you schedule this node for a nic driver review/update?
Flags: needinfo?(rbryce)
Comment 8•12 years ago
(In reply to Chris Turra [:cturra] from comment #7)
> :rbryce - could you schedule this node for a nic driver review/update?
I'm going to do the firmware upgrade tomorrow @ 10:00am PDT. I will be sending out a notice to socorro-dev@m.c shortly, announcing the time and expected impact.
Flags: needinfo?(rbryce)
Comment 9•12 years ago
(In reply to Rick Bryce [:rbryce] from comment #8)
> (In reply to Chris Turra [:cturra] from comment #7)
> > :rbryce - could you schedule this node for a nic driver review/update?
>
> Im going to do the firmware upgrade tomorrow @ 10:00am PDT. I will be
> sending out a notice to socorro-dev@m.c announcing the time and expected
> impact shortly.
We postponed this until the morning of April 9th.
Comment 10•12 years ago
SP-Admin01 has been fully upgraded. The firmware updates and OS upgrades were successful.
Linux sp-admin01.phx1.mozilla.com 2.6.32-358.2.1.el6.x86_64 #1 SMP Wed Feb 20 12:17:37 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
Assignee
Comment 11•12 years ago
:rhelmer - i am going to mark this bug as r/fixed per the work :rbryce did. please re-open the bug if the abrt.socket error returns.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•11 years ago
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Updated•6 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard