Closed Bug 804119 Opened 13 years ago Closed 13 years ago

high ftp load, sentry pulling CDNs, etc

Categories

(mozilla.org Graveyard :: Server Operations, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: pir, Assigned: pir)


ftp1-4.dmz.scl3.mozilla.com all spiked in load (that's what alerted), and when checked, iowait had spiked hard for no obvious reason. Ganglia showed outbound network traffic somewhat increased (but not inbound). A small number of sockets appeared in TIME_WAIT all of a sudden, which supports network I/O issues.

03:40:01 AM  CPU  %user  %nice  %system  %iowait  %steal  %idle
03:50:01 AM  all   1.52   0.00     2.12     1.79    0.00  94.58
04:00:01 AM  all   0.25   0.00     1.53     1.64    0.00  96.59
04:10:01 AM  all   0.33   0.00     1.72     3.76    0.00  94.18
04:20:01 AM  all   1.37   0.00     1.95     2.09    0.00  94.59
04:30:01 AM  all   0.24   0.00     1.64     3.36    0.00  94.76
04:40:01 AM  all   0.33   0.00     1.81     3.45    0.00  94.41
04:50:01 AM  all   0.25   0.00     1.65    22.71    0.00  75.39
05:00:01 AM  all   1.40   0.00     2.20    43.92    0.00  52.48
05:10:01 AM  all   1.46   0.00     2.19    49.29    0.00  47.06
05:20:01 AM  all   0.51   0.00     2.07    56.31    0.00  41.11
05:30:01 AM  all   0.31   0.00     1.78    20.10    0.00  77.80

None of the problems mentioned in bug 752399 are currently present. Load recovered; leaving this open for further debugging.
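For anyone wanting to pick the iowait spike out of sar samples like the ones in this comment without eyeballing them, here is a minimal sketch in Python. It assumes the `sar -u` column layout shown in this comment (time, AM/PM, `all`, then %user %nice %system %iowait %steal %idle). The function name `iowait_spikes` and the 20% threshold are illustrative choices, not anything that was run during this incident.

```python
import re

# One sar -u sample for the "all" CPU row: time, AM/PM, then six percentages
# in the order %user %nice %system %iowait %steal %idle.
ROW = re.compile(
    r"(\d{2}:\d{2}:\d{2} [AP]M)\s+all\s+"
    r"([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)"
)


def iowait_spikes(sar_text, threshold=20.0):
    """Return (timestamp, iowait) pairs whose %iowait is >= threshold.

    The threshold is an assumed, arbitrary cut-off for illustration.
    """
    spikes = []
    for match in ROW.finditer(sar_text):
        timestamp = match.group(1)
        iowait = float(match.group(5))  # fourth percentage column is %iowait
        if iowait >= threshold:
            spikes.append((timestamp, iowait))
    return spikes


# Three rows copied from the sar output in this bug; the header row is
# skipped automatically because it has "CPU" where the data rows have "all".
SAMPLE = """
03:40:01 AM CPU %user %nice %system %iowait %steal %idle
04:40:01 AM all 0.33 0.00 1.81 3.45 0.00 94.41
04:50:01 AM all 0.25 0.00 1.65 22.71 0.00 75.39
05:20:01 AM all 0.51 0.00 2.07 56.31 0.00 41.11
"""

if __name__ == "__main__":
    for timestamp, iowait in iowait_spikes(SAMPLE):
        print(timestamp, iowait)
```

Because the pattern uses `finditer`, it works on both one-row-per-line output and output that has been flattened onto a single line.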
16:07 <@gcox:#systems> fox2mike: I killed off a long-running dedupe, which made things clean up pretty well from where I see. ftp2 load average was 1400, now 2.
sentry pulled the CDNs so the ftp servers were getting way more traffic than they should have. This DoSed the ftp servers, the zeus nodes (with various other things like hg being collateral damage) and scl3 in general. sentry has been tweaked to Not Do That, the traffic has reduced and things have settled back to near normal.
Summary: ftp servers spiked in load from iowait → high ftp load, sentry pulling CDNs, etc
Group: infra
We're seeing some more 500 ISEs on ftp; is this still happening, or is that something else?
Blocks: 804740
gcox is looking into the storage backend at the moment.
Depends on: 804413
This bug is still open because, although the major symptoms are not present all the time, the issue is not solved. The bug remains open for further notes and debugging until bug 804413 is resolved (meaning the load on the filer is reduced to a level where it can support the ftp load when it goes above normal background levels) and further testing has established that the problem is fixed. If you have problems that this bug is causing or exacerbating, please make those bugs dependent on this bug.
Peter, now that we know the root cause of the netapp issues, do we still need this open?
Assignee: server-ops → pradcliffe
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard