Bug 848681
Opened 11 years ago
Closed 11 years ago
[socorro-prod] python script hanging on abrt.socket
Categories
(Infrastructure & Operations Graveyard :: WebOps: Other, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: rhelmer, Assigned: cturra)
References
Details
One of the Socorro cron jobs (that's been running for years ;)) has developed an issue within the last month: the scripts it calls hang on the following call (from strace):
connect(4, {sa_family=AF_FILE, path="/var/run/abrt/abrt.socket"}, 27
I dug up https://partner-bugzilla.redhat.com/show_bug.cgi?id=614752 but maybe someone more qualified than me can take a look, please? Thanks!
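The strace output above shows a connect() to a Unix domain socket that never returns. One way to check whether the socket is accepting connections, without risking another indefinite hang, is to attempt the same connect with a timeout. The following is only a minimal sketch: `probe_unix_socket` is a hypothetical helper written for illustration (it is not part of Socorro or abrt), and the only detail taken from this bug is the socket path.

```python
import errno
import socket


def probe_unix_socket(path, timeout=2.0):
    """Try to connect to a Unix domain socket, giving up after `timeout` seconds.

    Returns "connected", "timeout", or "error: <errno name>".
    Hypothetical diagnostic helper; not part of Socorro or abrt.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    # A timeout keeps connect() from blocking indefinitely.
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return "connected"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # For example ENOENT (no such socket file) or ECONNREFUSED (no listener).
        return "error: " + errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()
```

On the affected host this could be called as `probe_unix_socket("/var/run/abrt/abrt.socket")`. Depending on the platform, a listener that has stopped accepting connections may be reported as an error code rather than as a timeout, so any result other than "connected" is worth a closer look.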
Reporter
Comment 1•11 years ago
This is still happening, any ideas?
Reporter
Updated•11 years ago
Severity: normal → critical
Assignee
Comment 2•11 years ago
grabbing the bug and dropping severity so it stops paging on-call.
Assignee: server-ops-webops → cturra
Severity: critical → major
Comment 3•11 years ago
(In reply to Robert Helmer [:rhelmer] from comment #1)
> This is still happening, any ideas?
Can you supply a hostname where you have experienced this? Some quick research points to a socket closing down or not being available. I'm curious whether this could be related to the NIC driver issues we have on some servers.
Reporter
Comment 4•11 years ago
(In reply to Rick Bryce [:rbryce] from comment #3)
> (In reply to Robert Helmer [:rhelmer] from comment #1)
> > This is still happening, any ideas?
>
> Can you supply a hostname where you have experienced this? Some quick
> research points to a socket closing down, or not being available. I'm
> curious if this could be related to nic driver issues we have on some
> servers.
The host in question for bug 836671 is sp-admin01.phx1.mozilla.com
Comment 5•11 years ago
(In reply to Robert Helmer [:rhelmer] from comment #4)
> (In reply to Rick Bryce [:rbryce] from comment #3)
> > (In reply to Robert Helmer [:rhelmer] from comment #1)
> > > This is still happening, any ideas?
> >
> > Can you supply a hostname where you have experienced this? Some quick
> > research points to a socket closing down, or not being available. I'm
> > curious if this could be related to nic driver issues we have on some
> > servers.
>
> The host in question for bug 836671 is sp-admin01.phx1.mozilla.com
sp-admin01 has the dreaded bnx2x driver. I can't confirm it, but I would bet real money that the older firmware on the NIC is to blame. There is an update we can perform, but it would require downtime and reboots. Again, I am not certain this is the issue, but some research into the error message points to networking.
Assignee
Comment 6•11 years ago
:rbryce that's along the lines of what i was thinking also.
*note: abrt is a daemon that watches for application crashes and collects/reports on these.
*funny note: this is a crash reporter (abrt) issue for our crash reporting service (socorro) o_O
Assignee
Comment 7•11 years ago
:rbryce - could you schedule this node for a NIC driver review/update?
Flags: needinfo?(rbryce)
Comment 8•11 years ago
(In reply to Chris Turra [:cturra] from comment #7)
> :rbryce - could you schedule this node for a nic driver review/update?
I'm going to do the firmware upgrade tomorrow @ 10:00am PDT. I will shortly be sending out a notice to socorro-dev@m.c announcing the time and expected impact.
Flags: needinfo?(rbryce)
Comment 9•11 years ago
(In reply to Rick Bryce [:rbryce] from comment #8)
> (In reply to Chris Turra [:cturra] from comment #7)
> > :rbryce - could you schedule this node for a nic driver review/update?
>
> Im going to do the firmware upgrade tomorrow @ 10:00am PDT. I will be
> sending out a notice to socorro-dev@m.c announcing the time and expected
> impact shortly.
We postponed this until the morning of April 9th.
Comment 10•11 years ago
sp-admin01 has been fully upgraded. The firmware updates and OS upgrades were successful.
Linux sp-admin01.phx1.mozilla.com 2.6.32-358.2.1.el6.x86_64 #1 SMP Wed Feb 20 12:17:37 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
Assignee
Comment 11•11 years ago
:rhelmer - i am going to mark this bug as r/fixed per the work :rbryce did. please re-open the bug if the abrt.socket error returns.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•11 years ago
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard