Closed
Bug 1092242
Opened 10 years ago
Closed 8 years ago
Unexplained spikes in latency resulting in timeouts and tree closures
Categories
(Infrastructure & Operations Graveyard :: NetOps, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: coop, Assigned: jbarnell)
We've had a number of tree closures this week caused by many build/test machines timing out while trying to download required artifacts. Bug 1091707 is one such manifestation. Bug 1039849 might be another.
The smokeping graphs show a number of these events. Here's one for use1:
http://netops2.private.scl3.mozilla.com/smokeping/sm.cgi?displaymode=n;start=2014-10-30%2007:30;end=2014-10-31%2010:00;target=Datacenters.RELENG-SCL3.nagios1-releng-use1~admin1b.private.releng.scl3.mozilla.com
...and one for scl3:
http://netops2.private.scl3.mozilla.com/smokeping/sm.cgi?displaymode=n;start=2014-10-30%2007:30;end=2014-10-31%2010:00;target=Datacenters.RELENG-SCL3.fw1-scl3~admin1b.private.releng.scl3.mozilla.com
Looking at the releng-scl3 smokeping graphs in aggregate, the spikes aren't limited to any one system, i.e. they seem pervasive:
http://netops2.private.scl3.mozilla.com/smokeping/sm.cgi?target=Datacenters.RELENG-SCL3
Can netops provide any insight on what might be causing these events based on the timings? Is there any monitoring we can put in place to figure out why they're happening? Could we be saturating our network at those times and, in essence, DoS-ing ourselves?
Assignee
Comment 1•10 years ago
I'm taking a look ...
Assignee
Updated•10 years ago
Assignee: network-operations → jbarnell
Assignee
Comment 2•8 years ago
We've upgraded the firewalls, which has provided some level of relief on this.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Updated•2 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard